Otherwise anv_descriptor_set is accessed through an unaligned pointer,
which is undefined behavior in C.
```
anv_descriptor_set.c:1620:17: runtime error: member access within misaligned address 0x61900002c2b5
for type 'struct anv_descriptor_set', which requires 8 byte alignment 0x61900002c2b5
```
Fixes: 2570a58bcd ("anv: Implement descriptor pools")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32070>
It could be confusing that a newer platform named with a smaller
number than a half-generation of an older platform like 'gfx20' and
'gfx75' in xml files.
Down the road, it can be a little worse once we pass something like
'gfx40' when there is already a gfx45.xml for the oldest platform.
Unify naming xml files with verx10 numbers to resolve the issue.
Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31943>
While the documentation says to use NUM_SIMD_LANES_PER_DSS for the stack
address calculation, what the HW actually uses is
NUM_SYNC_STACKID_PER_DSS. The former may vary depending on the platform,
while the latter is fixed to 2048 for all current platforms.
Fixes: 6c84cbd8c9 ("intel/dev/xe: Set max_eus_per_subslice using topology query")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32049>
There are two fragment shaders from RDR2 that is hurt for spills and
fills on Lunar Lake.
Totals from 2 (0.00% of 551413) affected shaders:
Spill count: 1252 -> 1317 (+5.19%)
Fill count: 2518 -> 2642 (+4.92%)
Those shaders... have a lot of room for improvement. There are some
patterns in those shaders that we handle very, very poorly. Improving
those patterns would likely improve the spills and fills in these
shaders quite dramatically.
Given how much other platforms are helped, I don't this should block
this commit.
No shader-db or fossil-db changes on any pre-Gfx12.5 Intel platforms.
v2: Add some comments and an additional assertion. Suggested by Ken.
shader-db:
Lunar Lake
total instructions in shared programs: 18094517 -> 18094511 (<.01%)
instructions in affected programs: 809 -> 803 (-0.74%)
helped: 6 / HURT: 0
total cycles in shared programs: 921532158 -> 921532168 (<.01%)
cycles in affected programs: 2266 -> 2276 (0.44%)
helped: 0 / HURT: 3
Meteor Lake and DG2 had similar results. (Meteor Lake shown)
total instructions in shared programs: 19820845 -> 19820839 (<.01%)
instructions in affected programs: 803 -> 797 (-0.75%)
helped: 6 / HURT: 0
total cycles in shared programs: 906372999 -> 906372949 (<.01%)
cycles in affected programs: 3216 -> 3166 (-1.55%)
helped: 6 / HURT: 0
fossil-db:
Lunar Lake
Totals:
Instrs: 141887377 -> 141884465 (-0.00%); split: -0.00%, +0.00%
Cycle count: 21990301498 -> 21990267232 (-0.00%); split: -0.00%, +0.00%
Spill count: 69732 -> 69797 (+0.09%)
Fill count: 128521 -> 128645 (+0.10%)
Totals from 349 (0.06% of 551413) affected shaders:
Instrs: 506117 -> 503205 (-0.58%); split: -0.79%, +0.21%
Cycle count: 32362996 -> 32328730 (-0.11%); split: -0.52%, +0.41%
Spill count: 1951 -> 2016 (+3.33%)
Fill count: 4899 -> 5023 (+2.53%)
Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 152773732 -> 152761383 (-0.01%); split: -0.01%, +0.00%
Cycle count: 17187529968 -> 17187450663 (-0.00%); split: -0.00%, +0.00%
Spill count: 79279 -> 79003 (-0.35%)
Fill count: 148803 -> 147942 (-0.58%)
Scratch Memory Size: 3949568 -> 3946496 (-0.08%)
Max live registers: 31879325 -> 31879230 (-0.00%)
Totals from 366 (0.06% of 633185) affected shaders:
Instrs: 557377 -> 545028 (-2.22%); split: -2.22%, +0.01%
Cycle count: 26171205 -> 26091900 (-0.30%); split: -0.54%, +0.24%
Spill count: 3238 -> 2962 (-8.52%)
Fill count: 10018 -> 9157 (-8.59%)
Scratch Memory Size: 257024 -> 253952 (-1.20%)
Max live registers: 28187 -> 28092 (-0.34%)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32041>
Previously an instruction like
cmp.l.f0.0(16) null:F, v359:F, 0f
would get lowered to
undef(16) v13703:UD
cmp.l.f0.0(16) v13703:F, v359:F, 0f
mov(16) null:UD, v13703:UD
After copy propagation and dead-code elimination are run again, the
original CMP gets turned back into its original form!
Some cases can also emit MOVs from the original NULL register.
It should be possible to not do any lowering here, but there are some
interactions with source lowering passes for things like
cmp.l.f0.0(16) null:HF, g89.1<16,16,1>:HF, 0hf
What inspired this was... diff'ing step-by-step dumps from
INTEL_DEBUG=optimizer had a lot of useless changes due to these MOVs
and undefs. It was very annoying. This low-effort change gets the
majority of the possible benefit.
No shader-db or fossil-db changes on any Intel platform.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32041>
Copy propagation would incorrectly occur in this code
mov(16) v4+2.0:UW, u0<0>:UW NoMask
...
mov(8) v6+2.0:UD, v4+2.0:UD NoMask group0
to create
mov(16) v4+2.0:UW, u0<0>:UW NoMask
...
mov(8) v6+2.0:UD, u0<0>:UD NoMask group0
This has different behavior. I think I just made a mistake when I
changed this condition in e3f502e007.
It seems like this condition could be relaxed to cover cases like (note
the change of destination stride)
mov(16) v4+2.0<2>:UW, u0<0>:UW NoMask
...
mov(8) v6+2.0:UD, v4+2.0:UD NoMask group0
I'm not sure it's worth it.
No shader-db or fossil-db changes on any Intel platform. Even the code
for the test case mentioned in the original commit did not change.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes: e3f502e007 ("intel/fs: Allow copy propagation between MOVs of mixed sizes")
Closes: #12116
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32041>
Specifically, allow two immediate sources for BFE on Gfx12+. I stumbled
on this while trying some stuff with !31852.
v2: Don't be lazy. Add proper assertions for all the things on all the
platforms. Based on a suggestion by Ken.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes: 7bed11fbde ("intel/brw: Allow immediates in the BFE instruction on Gfx12+")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31858>
This has the best fossil-db results across in a sweep from 0..15.
fossil-db results on Alderlake:
Instructions in all programs: 152849904 -> 152824116 (-0.0%)
SENDs in all programs: 7677830 -> 7677830 (+0.0%)
Loops in all programs: 48470 -> 48470 (+0.0%)
Cycles in all programs: 11988670382 -> 11987530942 (-0.0%)
Spills in all programs: 42863 -> 41777 (-2.5%)
Fills in all programs: 77114 -> 73044 (-5.3%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31990>
A while back Matt enabled shader_spilling_rate by default for anv.
But intel_clc doesn't use the driconf mechanism that we use there.
The GRL shaders spill a lot, and with us now compiling additional
generations of the shaders, Mesa build time is getting prohibitively
expensive. By setting this, we drop the time taken for a clean debug
build by approximately 35% on my current laptop.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31993>
Xe KMD added a uAPI to syncronze metrics id changes, so we can make
it wait for all previous workloads in exec_queue and all previous
metrics id changes to finish before start change it again.
This should make Vulkan queries more robust.
So this makes use of intel_bind_timeline to syncronize the metrics id
changes and xe_queue_get_syncobj_for_idle() to syncronize with
exec_queue.
As i915 and some versions of Xe KMD will not support it, this feature
will only be used then intel_bind_timeline parameter is not NULL and
timeline has a valid syncobj id.
At this patch level all callers will set it to NULL, next patch will
add and initialize timeline in ANV when supported by Xe KMD.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31283>
Previously this bit was not clearly documented in PRMs, but gfx12 PRMs
finally list all the instructions where it is present.
Although it's unclear if it's functional for anything other than "if",
"else", and "goto", we probably still should acknowledge its existence
in other instructions.
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31747>
Previously this bit was not clearly documented in PRMs, but gfx12 PRMs
finally list all the instructions where it is present.
Although it's unclear if it's functional for anything other than "if",
"else", and "goto", we probably still should acknowledge its existence
in other instructions.
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31747>
This is a workaround that is still in progress, see HSD 22020521218.
If we don't have these NOPs we may see rendering corruption or even
GPU hangs.
While we still don't fully understand the issue from the hardware
point of view, let's have this workaround so we can pass CTS and move
things forward. If we need to change this later, we can. Besides, the
impact is minimal. Shaderdb/fossilize report no changes for this
patch.
On our Blackops trace, the lack of this patch causes corruption in fog
rendering (rectangles where fog was supposed to be shown don't show
the fog).
On dEQP-VK.graphicsfuzz.cov-array-copies-loops-with-limiters, without
this patch we get a GPU hang.
Backport-to: 24.2
Testcase: dEQP-VK.graphicsfuzz.cov-array-copies-loops-with-limiters
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11813
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31331>
intel_device_info_update_after_hwconfig() updates max_cs_threads
based on max_eus_per_subslice and num_thread_per_eu but in some
platforms simulator the hwconfig don't have the
INTEL_HWCONFIG_MAX_NUM_EU_PER_DSS value, causing max_cs_threads to
be set to a wrong value and then causing issues when programing
CFE_STATE with a invalid value.
Fortunately we can also get max_eus_per_subslice from topology query,
so here moving the hwconfig query and
intel_device_info_update_after_hwconfig() call to after topology.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31850>
If a render area covers an area that is smaller than an attachment's
extent and is not aligned to the CCS block size, we must load the clear
color so that the pixels outside of that area are decompressed with the
right clear color.
Prevents the next patch from causing the following test failure on gfx9:
dEQP-VK.renderpass.suballocation.load_store_op_none.color_load_op_none_store_op_none
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31743>
Store an array of clear values, one for each view format of the image.
Load the clear value based on the view format.
anv_image_msaa_resolve() may override the source or destination with
ISL_FORMAT_UNSUPPORTED, so make anv_image_get_clear_color_addr() handle
that format.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31743>
In later commits, we'll rely on the number of view formats used by an
image to determine the size allocated for an array of clear colors in
the aux-state tracking buffer. Having a single view format for dmabufs
with clear color support allows anv to transparently handle this case.
Restrict the number of view formats by explicitly setting the image
format list to incomplete. Secondly, loosen the non-zero clear color
restriction on clear color supporting dmabufs. Those images can support
any clear color even with an incomplete list because we restrict
problematic accesses for the clear color during the negotiation phase.
Lastly, update add_all_surfaces_explicit_layout() to assert that the
sizing of the imported clear color struct meets expectations.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31743>
There's currently no GL or GLES testing on the iris gallium driver,
and the VKCTS expectations were erroneously listed under iris-*.txt.
Fix the rules set for anv-adl-full, change the GPU_VERSION to anv-adl
and move the expectations around accordingly.
Signed-off-by: Valentine Burley <valentine.burley@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31862>
The InlineData passed to the shader is a fixed size unrelated to the
register size. It happens to match pre-Xe2, but by considering it the
same in Xe2, we ended up reading pushed constants from the wrong place
when they didn't fit in the InlineData.
Fixes: 97b17aa0b1 ("brw/nir: rework inline_data_intel to work with compute")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31856>