Commit Graph

14540 Commits

Author SHA1 Message Date
Lionel Landwerlin 262baafe27 anv: fix partial queries
Partial results should be computed for all types of queries.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36916>
2025-09-04 13:25:26 +03:00
Sagar Ghuge ebbc358db5 blorp: Emit state cache invalidation after every compute dispatch
Implement HSD 16028171704/14025112257:
   LSC state cache livelock: once the state cache entries are full,
   subsequent walker dispatches with two threads per thread group may
   get stuck indefinitely because of a state cache livelock.

   One thread is continuously stuck in a loop doing UGM fence + evict and
   UGM read, waiting for the UGM read to return a certain value, while the
   other thread is supposed to update the value that the first thread is
   waiting for. But since the state cache entries are full, the second
   thread never makes progress.

Closes: #12352
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37128>
2025-09-04 00:14:48 +00:00
Sagar Ghuge 3e0ad0176b anv: Emit state cache invalidation after every compute dispatch
Implement HSD 16028171704/14025112257:
   LSC state cache livelock: once the state cache entries are full,
   subsequent walker dispatches with two threads per thread group may
   get stuck indefinitely because of a state cache livelock.

   One thread is continuously stuck in a loop doing UGM fence + evict and
   UGM read, waiting for the UGM read to return a certain value, while the
   other thread is supposed to update the value that the first thread is
   waiting for. But since the state cache entries are full, the second
   thread never makes progress.

Closes: #12352
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37128>
2025-09-04 00:14:48 +00:00
Caio Oliveira 4e253184de brw: Run validation as soon as we have the CFG around
Fixes: affa7567c2 ("intel/brw: Add phases to backend")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37148>
2025-09-03 20:42:05 +00:00
Yiwei Zhang c0e51bcf24 anv: fix broken utrace
The non-compute end flag should be INTEL_DS_TRACEPOINT_FLAG_END_OF_PIPE.
This fixes the broken anv utrace for anything non-compute that can
potentially overlap (execute in parallel).

Fixes: 6281b207db ("anv: add tracepoints timestamp mode for empty dispatches")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37155>
2025-09-03 08:12:28 +00:00
Calder Young a8e64e83c2 anv: Update video test expectations for layered_dpb
Remove all layered_dpb fails that have a passing separated_dpb equivalent

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35651>
2025-09-03 06:41:44 +00:00
Calder Young 0b911356e5 anv: Report disjoint images as unsupported for video usage
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35651>
2025-09-03 06:41:44 +00:00
Calder Young 9bbb68a817 anv: Add support for using layered surfaces in VP9 video decoding
Reviewed-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35651>
2025-09-03 06:41:44 +00:00
Calder Young d0bf3a96f6 anv: Add support for using layered surfaces in AV1 video decoding
Reviewed-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35651>
2025-09-03 06:41:44 +00:00
Calder Young 30b763f6e2 anv: Add support for using layered surfaces in H.264 and H.265 video coding
Reviewed-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35651>
2025-09-03 06:41:44 +00:00
Calder Young 3fb25cc78a anv: Add support for creating layered surfaces for video encode/decode
Layered surfaces (array textures) with video encode/decode usage bits
will have their slices aligned to make them addressable to the media
engine. Multi-planar layered surfaces will be stored with their slices
interleaved so that a relative offset can be programmed between the
luma and chroma slices.
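
As a rough illustration of why interleaving helps, the sketch below lays out each layer as one aligned luma slice followed by one aligned chroma slice, so the distance between the two planes is the same for every layer. This is not the actual isl layout code: the function names, the plane sizes, and the 4096-byte alignment are all assumptions chosen for the example.

```python
def align_up(value, alignment):
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


def interleaved_layout(luma_size, chroma_size, layers, slice_align):
    """Return per-layer (luma_offset, chroma_offset) pairs in bytes.

    Hypothetical model: every layer stores its luma slice first and its
    chroma slice right after it, each padded to slice_align.
    """
    luma_stride = align_up(luma_size, slice_align)
    chroma_stride = align_up(chroma_size, slice_align)
    layer_stride = luma_stride + chroma_stride
    return [
        (layer * layer_stride, layer * layer_stride + luma_stride)
        for layer in range(layers)
    ]


# Assumed example: 64x64 NV12-like image, 3 layers, 4096-byte alignment.
offsets = interleaved_layout(64 * 64, 64 * 32, layers=3, slice_align=4096)
relative = {chroma - luma for luma, chroma in offsets}
print(offsets, relative)
```

Because the set of relative offsets contains a single value, one luma-to-chroma offset describes every layer under this model.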

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35651>
2025-09-03 06:41:44 +00:00
Calder Young 73608eb8b7 isl: Add support for creating layered surfaces for video encode/decode
Adds support for creating layered surfaces with slices that are addressable
to the media engine for video encoding and decoding.

Co-authored-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35651>
2025-09-03 06:41:44 +00:00
Lionel Landwerlin 0e198f796c anv/utrace: avoid memseting timestamp buffers by using tracepoint flags
Using the flag we can deduce how the timestamp was written and avoid
guessing when reading back.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13806
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37111>
2025-09-02 21:59:56 +00:00
Lionel Landwerlin f262865a90 anv: fix pipeline barriers with pre-rasterization stages
Pre-rasterization stages need a CS stall if they need to wait on the
flushes from a PIPE_CONTROL.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37132>
2025-09-02 20:13:11 +00:00
Tapani Pälli 4035520ca9 anv: change some image qualifiers as coherent for Last Of Us
This fixes graphics artifacts happening with a particular shader.

This 'heuristic' hits a few very similar shaders but should provide better
performance than the current fix of turning off caching for all shaders.

Cc: mesa-stable
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35929>
2025-09-02 11:04:35 +00:00
Renato Pereyra 443446aa82 anv: Enable anv_emulate_read_without_format for Android 15+
shaderStorageImageReadWithoutFormat is required by Android 15+

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37073>
2025-08-29 22:36:12 +00:00
Tim Van Patten c585341552 intel/ds: Skip expensive timestamp query until necessary
The Xe ioctl DRM_XE_DEVICE_QUERY_ENGINE_CYCLES provides accurate
timestamps correlated between the CPU and GPU. However, it is slow and
impacts performance while collecting Perfetto traces.

Instead, use Perfetto's GetBootTimeNs() to track when to emit the
BUILTIN_CLOCK_BOOTTIME clock sync event so it only occurs every 1
second. This reduces the impact of recording gpu.renderstages from
-8% to -4%.
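
The once-per-second throttling described above can be sketched as follows. This is a minimal model, not the Mesa implementation: `ClockSyncThrottle` and `count_syncs` are hypothetical names, and timestamps are passed in as plain nanosecond integers where the driver would read a boot-time clock.

```python
NS_PER_SECOND = 1_000_000_000


class ClockSyncThrottle:
    """Decide when an expensive CPU/GPU clock correlation is due.

    Hypothetical helper; the name and interface are illustrative only.
    """

    def __init__(self, interval_ns=NS_PER_SECOND):
        self.interval_ns = interval_ns
        self.next_sync_ns = 0  # the first call always syncs

    def should_sync(self, now_ns):
        """Return True if the interval has elapsed since the last sync."""
        if now_ns < self.next_sync_ns:
            return False  # cheap path: skip the slow query
        self.next_sync_ns = now_ns + self.interval_ns
        return True


def count_syncs(event_times_ns, throttle):
    """Count how many events would trigger the expensive query."""
    return sum(1 for t in event_times_ns if throttle.should_sync(t))


# Events every 100 ms for 3 seconds: t = 0, 0.1 s, ..., 2.9 s.
events = [i * 100_000_000 for i in range(30)]
syncs = count_syncs(events, ClockSyncThrottle())
print(syncs)
```

With 30 events spread over three seconds, only the events at 0 s, 1 s and 2 s take the slow path in this model.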

More concretely, FPS measurements when tracing Unity BoatAttack demo on
an Intel ADL device:

* gpu.renderstages disabled:            48.044293667
* gpu.renderstages enabled:             38.119778333 (-20.66%)
* gpu.renderstages enabled + this fix:  42.641818333 (-11.24%)
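
For reference, the percentages in parentheses are relative changes against the tracing-disabled baseline. The short check below recomputes them from the FPS values listed above; `percent_change` is a helper name introduced only for this example.

```python
def percent_change(baseline, measured):
    """Relative change of measured versus baseline, in percent."""
    return (measured - baseline) / baseline * 100.0


baseline_fps = 48.044293667  # gpu.renderstages disabled
enabled_fps = 38.119778333   # gpu.renderstages enabled
fixed_fps = 42.641818333     # gpu.renderstages enabled + this fix

enabled_delta = percent_change(baseline_fps, enabled_fps)
fixed_delta = percent_change(baseline_fps, fixed_fps)
print(f"{enabled_delta:.2f}% {fixed_delta:.2f}%")
```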

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37095>
2025-08-29 21:34:43 +00:00
Sagar Ghuge 90daa80d1d anv: Apply pipe flushes for outstanding PC bits
Apply any outstanding accumulated PC bits before we proceed to build
the acceleration structure.

Two reasons for this:
   - some of the data accessed by the build might need to be flushed
     as a result of a previous barrier
   - the scratch buffer might get reused between builds

Cc: mesa-stable
Closes: #13711
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Tested-by: Caleb Callaway <caleb.callaway@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36951>
2025-08-29 20:19:45 +00:00
Lionel Landwerlin 23a4aef14a Revert "brw: move texture offset packing to NIR"
This reverts commit 4346210ae6.

Fixes: 4346210ae6 ("brw: move texture offset packing to NIR")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37050>
2025-08-29 06:29:14 +00:00
Lionel Landwerlin 1f279e6a08 Revert "anv: enable non uniform texture offset lowering"
This reverts commit 23de5abcb5.

Fixes: 23de5abcb5 ("anv: enable non uniform texture offset lowering")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37050>
2025-08-29 06:29:14 +00:00
Lionel Landwerlin d0e1dffcb7 anv: temporary disable KHR_maintenance8
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 47cfc77085 ("anv: expose VK_KHR_maintenance8 support")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37050>
2025-08-29 06:29:13 +00:00
Ian Romanick 49141ad5f2 brw: Strategically place flags initialization to help cmod prop
v2: Rebase on ac2b072312 ("brw: Add more specific brw_builder
helpers"), and fix a bug that caused the new instruction to possibly be
put in the wrong place.

No shader-db changes on any Intel platform.

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 233675305 -> 233641585 (-0.01%)
Cycle count: 32593658094 -> 32591467794 (-0.01%); split: -0.01%, +0.00%

Totals from 33513 (4.25% of 789264) affected shaders:
Instrs: 5200332 -> 5166612 (-0.65%)
Cycle count: 1499831128 -> 1497640828 (-0.15%); split: -0.15%, +0.00%

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35444>
2025-08-28 22:08:20 +00:00
Ian Romanick 3018849535 brw: Don't emit redundant flags initialization for subgroup op lowering
No shader-db changes on any Intel platform.

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 233676039 -> 233675305 (-0.00%)
Cycle count: 32594097814 -> 32593658094 (-0.00%); split: -0.00%, +0.00%

Totals from 325 (0.04% of 789264) affected shaders:
Instrs: 104491 -> 103757 (-0.70%)
Cycle count: 1183870034 -> 1183430314 (-0.04%); split: -0.04%, +0.00%

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35444>
2025-08-28 22:08:20 +00:00
Ian Romanick 4a238f461d brw: Do cmod prop again after brw_lower_subgroup_ops
shader-db:

All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 17114300 -> 17114294 (<.01%)
instructions in affected programs: 3617 -> 3611 (-0.17%)
helped: 6 / HURT: 0

total cycles in shared programs: 886397556 -> 886397454 (<.01%)
cycles in affected programs: 511400 -> 511298 (-0.02%)
helped: 6 / HURT: 0

fossil-db:

Lunar Lake
Totals:
Instrs: 233683694 -> 233676039 (-0.00%); split: -0.00%, +0.00%
Cycle count: 32602038466 -> 32594097814 (-0.02%); split: -0.03%, +0.01%
Spill count: 540908 -> 540704 (-0.04%)
Fill count: 700935 -> 700258 (-0.10%)

Totals from 2200 (0.28% of 789264) affected shaders:
Instrs: 2062360 -> 2054705 (-0.37%); split: -0.37%, +0.00%
Cycle count: 2506073282 -> 2498132630 (-0.32%); split: -0.41%, +0.09%
Spill count: 14423 -> 14219 (-1.41%)
Fill count: 34219 -> 33542 (-1.98%)

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 263545171 -> 263543341 (-0.00%); split: -0.00%, +0.00%
Cycle count: 26480835985 -> 26484748317 (+0.01%); split: -0.01%, +0.03%
Spill count: 554335 -> 554338 (+0.00%)
Fill count: 645486 -> 645498 (+0.00%)

Totals from 610 (0.07% of 903944) affected shaders:
Instrs: 1139871 -> 1138041 (-0.16%); split: -0.17%, +0.01%
Cycle count: 2274612327 -> 2278524659 (+0.17%); split: -0.15%, +0.33%
Spill count: 15153 -> 15156 (+0.02%)
Fill count: 36831 -> 36843 (+0.03%)

Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown)
Totals:
Instrs: 268713723 -> 268712817 (-0.00%); split: -0.00%, +0.00%
Cycle count: 24653238085 -> 24652269669 (-0.00%); split: -0.00%, +0.00%
Fill count: 671369 -> 671361 (-0.00%)

Totals from 666 (0.07% of 899711) affected shaders:
Instrs: 924423 -> 923517 (-0.10%); split: -0.11%, +0.01%
Cycle count: 840380565 -> 839412149 (-0.12%); split: -0.13%, +0.02%
Fill count: 13006 -> 12998 (-0.06%)

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35444>
2025-08-28 22:08:20 +00:00
Lionel Landwerlin c0cfd16da6 anv: move input coverage mask setup to runtime flush
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37060>
2025-08-28 19:08:33 +00:00
Caio Oliveira 84963d6833 intel/brw: Take shader in the brw_generator::generate_code() parameters
Simplify the calls in all the stage compile functions.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33541>
2025-08-28 00:06:20 +00:00
Caio Oliveira c19a4150b5 intel/brw: Simplify variant tracking in brw_compile_fs
Remove the cfg variables and use the shader pointers directly.  Reset
the variant pointer if a shader failed or will not be used.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33541>
2025-08-28 00:06:20 +00:00
Caio Oliveira 834e30d244 intel/brw: Simplify tracking of dispatch_width_limit in brw_compile_fs
Keep it in a variable; that way we don't need to check which shader to
look at for the limit.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33541>
2025-08-28 00:06:20 +00:00
Caio Oliveira 9d53e27579 intel/brw: Remove brw_shader::import_uniforms()
brw_shader::uniforms is now derived from the nir_shader.  The
only exception is compute shaders for older Gfx versions, so we
move the adjustment logic for that case.

The benefit here is untangling the code for compilation variants,
which previously needed to keep track of the first variant that
compiled just to, in most cases, copy an integer.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33541>
2025-08-28 00:06:19 +00:00
Caio Oliveira 0b4d62d340 anv: Allocate prog_data->param array when making internal kernels
As we set prog_data->nr_params, allocate the array like elsewhere.
Current code is getting by because the logic for adding a new element
will realloc it.  But later changes will make the array be accessed
before this reallocation.

This will make sure later patches won't cause tests like

  dEQP-VK.query_pool.statistics_query.compute_shader_invocations.32bits_cmdcopyquerypoolresults_secondary

to fail in gfxver < 125.  Note the bug appears when the DRI option
that tweaks the threshold for using these shaders is set to 0.  This is
done by the GitLab CI, which allowed testing later patches to find
this issue.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33541>
2025-08-28 00:06:19 +00:00
Caio Oliveira b8a35a8a27 brw: Pass per_primitive_offset in brw_shader_params
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33541>
2025-08-28 00:06:19 +00:00
Caio Oliveira 6ca9021758 brw: Add brw_shader_params
And unify the initialization code for brw_shader.  Avoid passing
brw_compile_params since for a single compilation we might have
multiple shaders (the case for BS stage).

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33541>
2025-08-28 00:06:18 +00:00
Caio Oliveira 1c933b6511 brw: Fix checking sources of wrong instruction in opt_address_reg_load
Fixes: 8ac7802ac8 ("brw: move final send lowering up into the IR")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37019>
2025-08-27 22:50:23 +00:00
Lionel Landwerlin 93996c07e2 brw: fix broadcast opcode
The problem with the current code is that there is a disconnect between:
   - the virtual register size allocated
   - the dispatch size
   - the size_written value

Only the last 2 are in sync and this confuses the spiller that only
looks at the destination register allocation & dispatch size to figure
out how much to spill.

The solution in this change is to make BROADCAST more like
MOV_INDIRECT, so that you can do a BROADCAST(8) that actually reads a
SIMD32 register. We put the size of the register read into src2.

Now the spiller sees correct read/write sizes just looking at the
destination register & dispatch size.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 662339a2ff ("brw/build: Use SIMD8 temporaries in emit_uniformize")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13614
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36564>
2025-08-28 00:23:44 +03:00
Lionel Landwerlin e6ca709a4e brw: fix INTEL_DEBUG=spill_fs
We need to mark BRW_DEPENDENCY_INSTRUCTIONS &
BRW_DEPENDENCY_VARIABLES as dirty if anything was spilled.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: a6b0783375 ("brw: Use brw_ip_ranges in scheduling / regalloc")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13233
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36925>
2025-08-27 15:08:35 +00:00
Tapani Pälli ad2ef16198 iris/anv: toggle on CACHE_MODE_0::MsaaFastClearEnabled on BMG G31
This increases the depth fast clear rate on BMG G31
per HSD 22020044224.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35966>
2025-08-26 19:35:34 +00:00
Tapani Pälli c65f5cd36d intel/dev: provide a helper to detect bmg g31 device
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35966>
2025-08-26 19:35:33 +00:00
Tapani Pälli 2c9bc313a0 intel/genxml: update CACHE_MODE_0 register for gfx200
The field that we currently utilize does not change place; however,
there are some new fields, so let's update the contents to match the spec.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35966>
2025-08-26 19:35:33 +00:00
Lionel Landwerlin 3362b8dcb5 brw: use a scalar builder for the load_payload on transpose loads
I noticed SIMD32 shaders have this kind of pattern:

mov(32)         g94<1>D         0D                              { align1 WE_all };
send(1)         g15UD           g94UD           nullUD          0x6210d500                0x02010000
                ugm MsgDesc: ( load, a32, d32, V16, transpose, L1STATE_L3MOCS dst_len = 1, src0_len = 1, src1_len = 0 bti )  BTI 2  base_offset 16  { align1 WE_all 1N I@5 $1 };

Why use a 32 wide register for a SEND that is only going to read the first lane?

We can stick to a single physical register and reduce register pressure.
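
To put a number on the saving, the sketch below counts how many physical registers a temporary occupies for a given SIMD width. The GRF sizes used (32 bytes on DG2, 64 bytes on LNL) are stated here as assumptions, and `grfs_needed` is a hypothetical helper, not a brw function.

```python
def grfs_needed(simd_width, type_bytes, grf_bytes):
    """Physical registers needed to hold simd_width channels of a type."""
    total_bytes = simd_width * type_bytes
    return -(-total_bytes // grf_bytes)  # integer ceiling division


DWORD = 4  # size in bytes of the D type used by mov(32) g94<1>D

# Assumed GRF sizes: 32 bytes (DG2) and 64 bytes (LNL).
simd32_dg2 = grfs_needed(32, DWORD, grf_bytes=32)
simd32_lnl = grfs_needed(32, DWORD, grf_bytes=64)
scalar_dg2 = grfs_needed(1, DWORD, grf_bytes=32)
scalar_lnl = grfs_needed(1, DWORD, grf_bytes=64)
print(simd32_dg2, simd32_lnl, scalar_dg2, scalar_lnl)
```

Under these assumptions the SIMD32 temporary spans several registers while the scalar one always fits in a single register, which is the register pressure reduction the message refers to.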

DG2 fossil-db results:

Totals:
Instrs: 157417515 -> 157417796 (+0.00%); split: -0.00%, +0.00%
Cycle count: 15362185116 -> 15363086774 (+0.01%); split: -0.05%, +0.05%
Max live registers: 29059141 -> 29051166 (-0.03%)
Max dispatch width: 5071256 -> 5075720 (+0.09%); split: +0.33%, -0.24%

Totals from 82132 (14.43% of 569221) affected shaders:
Instrs: 26564632 -> 26564913 (+0.00%); split: -0.00%, +0.00%
Cycle count: 4630907475 -> 4631809133 (+0.02%); split: -0.16%, +0.18%
Max live registers: 5425037 -> 5417062 (-0.15%)
Max dispatch width: 128384 -> 132848 (+3.48%); split: +12.92%, -9.45%

LNL fossil-db results:

Totals:
Instrs: 141870413 -> 141870745 (+0.00%); split: -0.00%, +0.00%
Cycle count: 20176018818 -> 20191262632 (+0.08%); split: -0.07%, +0.14%
Max live registers: 44858167 -> 44838370 (-0.04%)

Totals from 51859 (10.55% of 491590) affected shaders:
Instrs: 16834547 -> 16834879 (+0.00%); split: -0.00%, +0.00%
Cycle count: 5761980106 -> 5777223920 (+0.26%); split: -0.24%, +0.50%
Max live registers: 5893878 -> 5874081 (-0.34%)

Perf A/B testing only reported a 0.5% improvement on DG2 on one trace, no changes on BMG.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36958>
2025-08-26 12:03:22 +00:00
Lionel Landwerlin 27c69acb6a brw: remove uniform from opt_offsets
Those are for push constants; there is no point in doing that because:
   - there are no HW constant offsets in push constants (payload
     delivery), it's just register offset calculation
   - if we have a dynamic value it's already using MOV_INDIRECT

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: e103afe7be ("brw: run the nir_opt_offsets pass and set the maximum offset size")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36958>
2025-08-26 12:03:22 +00:00
Sagar Ghuge 2cd564c1de anv: Add missing L3 flushes
We read some parameters from the IR data structure that were written
previously. On some platforms L3 is not coherent, so explicitly add
those flushes.

Cc: mesa-stable
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36952>
2025-08-25 17:36:08 +00:00
Sagar Ghuge 4473e21e2f anv: Enable CS stall for ACCELERATION_STRUCTURE_COPY stage
Cc: mesa-stable
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36952>
2025-08-25 17:36:08 +00:00
Sagar Ghuge 75d770b4f8 anv: Add missing ACCELERATION_STRUCTURE_READ in barrier handling
Cc: mesa-stable
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36952>
2025-08-25 17:36:08 +00:00
Eric Engestrom fa74e939bf ci/piglit: automatically use LAVA proxy
This avoids having to hardcode the proxy in the traces `download-url` or
jobs setting `PIGLIT_REPLAY_EXTRA_ARGS` and accidentally overriding the
default args when the author meant to append.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36955>
2025-08-25 14:52:38 +00:00
Konstantin Seurer 9df7b48d2f nir: Use nir_def_as_* in more places
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36746>
2025-08-24 14:03:09 +00:00
Yiwei Zhang dcffe932a0 anv: adopt common GetAndroidHardwareBufferPropertiesANDROID
ANV currently carries a partial copy of the gralloc mapper's format
resolving code, while the ground truth solely resides inside the
gralloc. The local copy is delicate and unable to maintain compatibility
with different gralloc implementations because AHB formats like
Y8Cb8Cr8_420 and IMPLEMENTATION_DEFINED are flexible formats, and can be
resolved to different underlying drm fourcc formats depending on the
usage and media IPs.

The common impl is more correct as it relies on the info from the gralloc
mapper side, and it only sets the minimal set of explicit formats to
avoid hitting the spec corner case of allocating AHBs with flexible
formats (missing half of the media usage bits might end up allocating
something different that potentially gets resolved to a different
VkFormat as well).

Reviewed-by: Lucas Fryzek <lfryzek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36866>
2025-08-22 23:40:35 +00:00
Yiwei Zhang a34eb09c89 anv: drop anv_ahb_format_for_vk_format
The vk_image::ahb_format is for drivers that support more than the
common explicit AHB formats. It is used on the AHB image memory export
allocation path; more specifically, vk_device_memory_create will
use that AHB format to allocate the AHB from gralloc. Note that the
export allocation path only deals with explicit formats, not external
formats. So even with the obsolete HAL_PIXEL_FORMAT_NV12_Y_TILED_INTEL
private format, we don't need it either, as multi-planar formats are
supposed to be reported as external formats.

Reviewed-by: Lucas Fryzek <lfryzek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36866>
2025-08-22 23:40:35 +00:00
Yiwei Zhang ef885eb9ac anv: adopt vk_android_get_ahb_image_properties
The current impl misses the probe against gralloc mapper, which is the
required handshake before advertising support. For simplicity, just
adopt the common AHB helper. It does not rely on driver specific format
mapping, since the query doesn't allow external format at all.

Reviewed-by: Lucas Fryzek <lfryzek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36866>
2025-08-22 23:40:34 +00:00
Yiwei Zhang 3b19aa6261 anv: avoid setting image format twice for AHB image
AHB images are created with the right VkFormat when external format
isn't used. When external format does get used, the proper VkFormat has
already been set in the common runtime. Upon AHB props query, we
resolve the external format to a VkFormat and set it in the
externalFormat field to be used by the app. The app would then chain the
exact external format when creating the AHB image if it wants to go down
the external format code path instead of being explicit. So in the end,
the format we resolve is the format we get. Thus no need to set it twice.

Reviewed-by: Lucas Fryzek <lfryzek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36866>
2025-08-22 23:40:34 +00:00
Yiwei Zhang b6427520d6 anv: drop obsolete anv_create_ahw_memory
Reviewed-by: Lucas Fryzek <lfryzek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36866>
2025-08-22 23:40:33 +00:00