Commit Graph

14203 Commits

Author SHA1 Message Date
Lionel Landwerlin 89f3ee4cb2 brw: remove debug printf
Fixes: fcf4401824 ("brw: handle wa_18019110168 with independent shader compilation")
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35815>
2025-06-29 12:39:03 +03:00
Calder Young 646977348b anv: Fix typo when checking format's extended usage flag
Fixes: f4c1753c1a ("anv: report color/storage features on YCbCr images with EXTENDED_USAGE")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35703>
2025-06-28 20:39:18 +00:00
Lionel Landwerlin a742b859bd anv: add support for handling wa_18019110168 with gfx-libs
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35103>
2025-06-28 05:55:35 +00:00
Lionel Landwerlin fcf4401824 brw: handle wa_18019110168 with independent shader compilation
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35103>
2025-06-28 05:55:35 +00:00
Lionel Landwerlin bc8d18aee2 brw: make a helper for vertex attribute offset computation
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35103>
2025-06-28 05:55:34 +00:00
Lionel Landwerlin 8fabcd754f brw: move primitive_id_index field in fs_msaa
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35103>
2025-06-28 05:55:34 +00:00
Lionel Landwerlin 6336cf0ea2 brw: store the remapping table for wa_18019110168 in constant data
That way it can be accessed at runtime.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35103>
2025-06-28 05:55:33 +00:00
Lionel Landwerlin e1a7eb1718 brw: extract out attribute register remapping
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35103>
2025-06-28 05:55:33 +00:00
Lionel Landwerlin 5cc66e2c8d anv/brw: move Wa_18019110168 handling to backend
We simplify the implementation by assuming the worse case, copying
entire per-vertex regions if necessary.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35103>
2025-06-28 05:55:32 +00:00
Lionel Landwerlin 8e7e0ef75a anv: make Wa_18019110168 deal with dynamic provoking vertex
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35103>
2025-06-28 05:55:32 +00:00
Lionel Landwerlin f0f4f9c566 brw: fix vertex attribute offset computation
The formula uses scalar indices (4bytes), not slots (16bytes).

We also incorrectly passed a scalar (vertex case) & slot (mesh case)
offset in the push constants. Use slots instead so that the value is
smaller and we can pack more stuff into fs_msaa_flags.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 18bbcf9a63 ("intel: introduce new VUE layout for separate compiled shader with mesh")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35103>
2025-06-28 05:55:31 +00:00
Lionel Landwerlin 4b5539a0cb brw: fix set_range on load_per_primitive_output
load intrinsics don't have range

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 18bbcf9a63 ("intel: introduce new VUE layout for separate compiled shader with mesh")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35103>
2025-06-28 05:55:31 +00:00
Matt Turner 6842a8179f intel: Add support for float16 as cooperative matrix accumulator
The number of passing tests in ./deqp-vk -n '*cooperative_matrix.khr*'
increases

- on PTL from 787 -> 914
- on RPL from 799 -> 926

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13304
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35616>
2025-06-27 01:26:22 +00:00
Matt Turner 6d786a0e4b brw: Use convert_cmat_intel intrinsic
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35616>
2025-06-27 01:26:22 +00:00
Matt Turner 41cd196886 brw: Implement convert_cmat_intel intrinsic
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35616>
2025-06-27 01:26:22 +00:00
Matt Turner 1215845b5b intel: Increase size of cooperative_matrix_configurations[] to 16
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35616>
2025-06-27 01:26:21 +00:00
Marek Olšák 1754507d49 nir: rename nir_lower_io_to_temporaries -> nir_lower_io_vars_to_temporaries
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35760>
2025-06-26 18:20:54 +00:00
Marek Olšák 1e03827c77 nir: rename nir_lower_io_arrays_to_elements -> nir_lower_io_array_vars_to_elements
same for *_no_indirects

Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35760>
2025-06-26 18:20:54 +00:00
Marek Olšák 12df9b3def nir: rename nir_vectorize_tess_levels -> nir_lower_tess_level_array_vars_to_vec
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35760>
2025-06-26 18:20:50 +00:00
Marek Olšák 2aa94caf82 nir: rename nir_lower_io_to_vector -> nir_opt_vectorize_io_vars
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35760>
2025-06-26 18:20:50 +00:00
Marek Olšák 439d805291 nir: rename nir_lower_io_to_scalar_early -> nir_lower_io_vars_to_scalar
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35760>
2025-06-26 18:20:49 +00:00
Ian Romanick b83f618fb2 brw: Fully write temporary destinations
Consider an innocuous instruction like:

    and(1) v250:UD, g0.3<0,1,0>:UD, 4294967264u NoMask group0

If register allocation decides to spill v250, it will see this
instruction and say, "Oh no! The other components of v250 aren't set, so
I'd better add a fill before that instruction!"

But it gets even worse than that... if register coalesce decided to
merge two of these, the live range gets massively extended because the
writes don't fully initialize the value. This causes the need to spill
these registers in the first place.

Changing that instruction to SIMD16 on Xe2 or SIMD8 on other platforms
alleviates these issues.

shader-db:

Lunar Lake
total instructions in shared programs: 17118324 -> 17113191 (-0.03%)
instructions in affected programs: 93701 -> 88568 (-5.48%)
helped: 42 / HURT: 6

total cycles in shared programs: 895422566 -> 895079488 (-0.04%)
cycles in affected programs: 30111338 -> 29768260 (-1.14%)
helped: 35 / HURT: 40

total spills in shared programs: 3588 -> 3304 (-7.92%)
spills in affected programs: 285 -> 1 (-99.65%)
helped: 10 / HURT: 0

total fills in shared programs: 2218 -> 1663 (-25.02%)
fills in affected programs: 556 -> 1 (-99.82%)
helped: 10 / HURT: 0

Meteor Lake, DG2, Tiger Lake, and Ice Lake had similar results.  (Meteor Lake shown)
total instructions in shared programs: 20059218 -> 20053563 (-0.03%)
instructions in affected programs: 96938 -> 91283 (-5.83%)
helped: 43 / HURT: 6

total cycles in shared programs: 884174588 -> 883536475 (-0.07%)
cycles in affected programs: 22105268 -> 21467155 (-2.89%)
helped: 35 / HURT: 27

total spills in shared programs: 5032 -> 4679 (-7.02%)
spills in affected programs: 355 -> 2 (-99.44%)
helped: 12 / HURT: 0

total fills in shared programs: 4782 -> 4113 (-13.99%)
fills in affected programs: 671 -> 2 (-99.70%)
helped: 12 / HURT: 0

Skylake
total instructions in shared programs: 19097658 -> 19097665 (<.01%)
instructions in affected programs: 14202 -> 14209 (0.05%)
helped: 0 / HURT: 5

total cycles in shared programs: 862058109 -> 862058267 (<.01%)
cycles in affected programs: 3450244 -> 3450402 (<.01%)
helped: 7 / HURT: 11

fossil-db:

Lunar Lake
Totals:
Cycle count: 31439652246 -> 31439652272 (+0.00%)

Totals from 2 (0.00% of 707091) affected shaders:
Cycle count: 2602 -> 2628 (+1.00%)

No other Intel platforms had any fossil-db changes.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35721>
2025-06-26 17:59:47 +00:00
Caio Oliveira 30490de24a intel/executor: allow single line comments in macro lines
Assembler supports them, so allow them on @-macro lines.  For now
we don't bother with multiline comments, if becomes a thing we
can add them later.

Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35699>
2025-06-26 00:58:02 +00:00
Caio Oliveira d14fa6683b intel/executor: update SFID names in macros to match recent changes
After commit 88309a9818, SFID names were renamed

- "dp data 1" became "hdc1"
- "thread_spawner" became "ts/btd"

Update macros in executor to use the new SFID names so the
generated assembly can be parsed correctly.

Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35701>
2025-06-25 17:31:00 -07:00
Iván Briano d964b8d5fa anv: don't report custom sample locations for sample count 1
We can't actually enable MSAA for images with sample count 1, and
without MSAA active, the sample location machinery does not get used.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35504>
2025-06-24 19:44:34 +00:00
Matt Turner 6a47531440 intel: Generate files with newline at end
This generator scripts uses the `write` function that, unlike `print`,
doesn't print a trailing newline. So let's add one to the template.

Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35697>
2025-06-24 14:01:04 +00:00
Dave Airlie 29c599ffea anv: only expose VK_KHR_cooperative_matrix on devices with hw instructions.
Currently anv exposes this on lots of devices, with the intent to be better
than apps can give, but I think this is wrong for a couple of reasons.

Apps want to know if hw exposes the fast path, Vulkan is meant to be explicit,
and telling llama.cpp if the fast path exists lets it make smarter decisions.

It seems unless someone heavily optimises the slow path, that CPU is usually
faster than GPU with llama-bench unless the hw path exists.

v2: added INTEL_LOWER_DPAS support

Acked-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35564>
2025-06-23 21:06:51 +00:00
Konstantin Seurer 4cbbdc0a50 vulkan: Pass a structure to most BVH build callbacks
It is annoying to change all function signatures when a driver needs
more information. There are also some callbacks that have a lot of
parameters and there have already been bugs related to that.

This patch tries to clean the interface by adding a struct that contains
all information that might be relevant for the driver and passing that
to most callbacks.

radv changes are:
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>

anv changes are:
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>

turnip changes are:
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>

vulkan runtime changes are:

Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35385>
2025-06-23 20:43:43 +00:00
Konstantin Seurer 28713789ad vulkan: Replace get_*_key with get_build_config
It is a cleaner API and gives more control about the build to the
driver.

Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35385>
2025-06-23 20:43:43 +00:00
José Roberto de Souza bdd20457ed anv: Emit STATE_COMPUTE_MODE before COMPUTE_WALKER when new async compute limits are needed
Cc: stable
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35563>
2025-06-23 18:57:25 +00:00
José Roberto de Souza b37747ce68 blorp: Emit STATE_COMPUTE_MODE before COMPUTE_WALKER
Cc: stable
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35563>
2025-06-23 18:57:25 +00:00
José Roberto de Souza 59d361043e intel/common: Use as much as possible spec recommended values for compute engine async thread limits
Spec recommended values should give us a good balance between progress
in render and compute engines, also with less possibility of values
it will reduce the number of times that we need to emit
STATE_COMPUTE_MODE reducing the number of stalls in the compute engine.

Cc: stable
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35563>
2025-06-23 18:57:25 +00:00
José Roberto de Souza 080b9a165c intel/common: Add function to compute optimal compute engine async thread limits
Spec has several restrictions to the values we program to compute
engine async thread limits.
Without those we risk hit deadlocks, so here adding a function to
return the optimal value based on those restrictions.

Cc: stable
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35563>
2025-06-23 18:57:24 +00:00
Eric Engestrom 99e8d804bf intel/compiler tests: fix variable type for getopt_long() return value
`getopt_long()` returns an `int`, not a `char`; putting the value in
a `char` before comparing it to `-1` was making the comparison always
fail, resulting in the invalid codepath taken that then fails with:

    option `-' is invalid: ignored

cc: mesa-stable

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34756>
2025-06-23 08:26:29 +00:00
Eric Engestrom f545f9eed4 intel/compiler tests: fix "is there something after the options" check
cc: mesa-stable

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34756>
2025-06-23 08:26:29 +00:00
Eric Engestrom 729922cdae intel/compiler tests: fix path-to-string conversion
cc: mesa-stable

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34756>
2025-06-23 08:26:29 +00:00
Eric Engestrom de6ab1beda intel/compiler tests: rewrite subprocess handling in run-test.py
`subprocess.Popen()` returns immediately, and the subprocess might not
have finished by the time `stdout` is read on the next line, spuriously
failing the tests.

`subprocess.check_output()` makes sure the output is available before
returning, solving this issue; it additionally raises an error if the
subprocess failed, giving a better error than a failed diff later in the
script.

cc: mesa-stable

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34756>
2025-06-23 08:26:29 +00:00
Georg Lehmann 9da23499ff compiler: add float8 glsl types
e4m3fn: 8bit floating point format with 4bit exponent, 3bit mantissa
        and no infinities (finite only)
e5m2:   8bit floating point format with 5bit exponent, 2bit mantissa
        and with infinities.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35434>
2025-06-23 07:59:24 +00:00
Lionel Landwerlin 4e25a4ce1e intel/ci: document a couple of vkd3d failures
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35679>
2025-06-23 05:45:25 +00:00
Lionel Landwerlin 786bace191 anv: fix sampler hashing in set layouts
The logic needs to handle embedded samplers without conversion state.

Fixes vkd3d-proton's test_sampler_border_color

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35679>
2025-06-23 05:45:25 +00:00
Lionel Landwerlin 32b53a7c6a anv: fix clears on single aspect of YCbCr images
Fixes vkd3d-proton's test_planar_video_formats

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35679>
2025-06-23 05:45:25 +00:00
Lionel Landwerlin 691ac65000 isl: handle DISABLE_AUX in get_mcs_surf
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35679>
2025-06-23 05:45:24 +00:00
Yiwei Zhang 64326d0be5 anv: use common vk_android_get_front_buffer_usage helper
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35568>
2025-06-23 00:46:42 +00:00
Rohan Garg e103afe7be brw: run the nir_opt_offsets pass and set the maximum offset size
Perf A/B testing on DG2: no changes
Perf A/B testing on BMG: +2.1% Blackops3, +1.5% Cyberpunk

DG2 stats (mostly insignificant):
Assassins Creed Valhalla:
 Totals from 1169 (55.67% of 2100) affected shaders:
 Instrs: 509237 -> 509215 (-0.00%)
 Cycle count: 30614325 -> 30607419 (-0.02%); split: -0.03%, +0.00%
 Non SSA regs after NIR: 83434 -> 85909 (+2.97%)

Blackops 3:
 Totals from 1045 (64.63% of 1617) affected shaders:
 Instrs: 527312 -> 527310 (-0.00%)
 Cycle count: 496912222 -> 496902846 (-0.00%); split: -0.00%, +0.00%
 Non SSA regs after NIR: 106883 -> 109095 (+2.07%)

Cyberpunk:
 Totals from 706 (56.03% of 1260) affected shaders:
 Instrs: 345976 -> 345974 (-0.00%); split: -0.00%, +0.00%
 Cycle count: 9775138 -> 9775472 (+0.00%); split: -0.00%, +0.00%
 Max live registers: 40295 -> 40297 (+0.00%)
 Non SSA regs after NIR: 93245 -> 94718 (+1.58%)

Fortnite:
 Totals from 4210 (55.98% of 7521) affected shaders:
 Instrs: 2205471 -> 2205469 (-0.00%)
 Cycle count: 91451040 -> 91450956 (-0.00%); split: -0.00%, +0.00%
 Non SSA regs after NIR: 952354 -> 961664 (+0.98%)

LNL stats (notable changes):
Assassins Creed Valhalla:
 Totals from 1684 (83.57% of 2015) affected shaders:
 Instrs: 774305 -> 764501 (-1.27%); split: -1.27%, +0.01%
 Cycle count: 58845842 -> 58699250 (-0.25%); split: -0.98%, +0.73%
 Spill count: 625 -> 638 (+2.08%)
 Fill count: 1490 -> 1503 (+0.87%)
 Scratch Memory Size: 41984 -> 44032 (+4.88%)
 Max live registers: 196424 -> 197561 (+0.58%); split: -0.10%, +0.68%

Blackops 3:
 Totals from 1125 (76.53% of 1470) affected shaders:
 Instrs: 781749 -> 773275 (-1.08%); split: -1.08%, +0.00%
 Subgroup size: 22896 -> 22912 (+0.07%)
 Cycle count: 659864454 -> 654641032 (-0.79%); split: -1.10%, +0.31%
 Max live registers: 116772 -> 116854 (+0.07%); split: -0.01%, +0.08%
 Non SSA regs after NIR: 172648 -> 168260 (-2.54%); split: -2.55%, +0.01%

Control:
 Totals from 378 (51.50% of 734) affected shaders:
 Instrs: 148184 -> 147544 (-0.43%)
 Cycle count: 6905200 -> 6913366 (+0.12%); split: -0.30%, +0.42%
 Max live registers: 41271 -> 41281 (+0.02%)
 Non SSA regs after NIR: 44964 -> 43868 (-2.44%); split: -2.45%, +0.01%

Cyberpunk:
 Totals from 1141 (92.46% of 1234) affected shaders:
 Instrs: 636744 -> 629333 (-1.16%)
 Subgroup size: 24256 -> 24272 (+0.07%)
 Cycle count: 24952258 -> 24801298 (-0.60%); split: -1.39%, +0.78%
 Max live registers: 125848 -> 126855 (+0.80%); split: -0.00%, +0.80%
 Non SSA regs after NIR: 127399 -> 119837 (-5.94%); split: -5.95%, +0.02%

Fortnite:
 Totals from 5497 (83.52% of 6582) affected shaders:
 Instrs: 4072831 -> 4041852 (-0.76%); split: -0.77%, +0.01%
 Subgroup size: 103296 -> 103312 (+0.02%)
 Cycle count: 133046874 -> 132789242 (-0.19%); split: -0.67%, +0.48%
 Spill count: 7218 -> 7254 (+0.50%); split: -0.33%, +0.83%
 Fill count: 11724 -> 11749 (+0.21%); split: -0.34%, +0.55%
 Scratch Memory Size: 591872 -> 599040 (+1.21%)
 Max live registers: 816530 -> 818522 (+0.24%); split: -0.01%, +0.26%
 Non SSA regs after NIR: 1610296 -> 1560284 (-3.11%); split: -3.11%, +0.00%

Hitman3:
 Totals from 4713 (92.39% of 5101) affected shaders:
 Instrs: 2731598 -> 2698224 (-1.22%)
 Cycle count: 186422098 -> 185472640 (-0.51%); split: -1.12%, +0.61%
 Spill count: 3244 -> 3242 (-0.06%)
 Fill count: 9937 -> 9933 (-0.04%)
 Max live registers: 585035 -> 589801 (+0.81%); split: -0.00%, +0.82%
 Non SSA regs after NIR: 347681 -> 324314 (-6.72%); split: -6.73%, +0.01%

Hogwarts Legacy:
 Totals from 930 (59.81% of 1555) affected shaders:
 Instrs: 464146 -> 459526 (-1.00%); split: -1.00%, +0.01%
 Subgroup size: 19104 -> 19120 (+0.08%)
 Cycle count: 24062460 -> 24078964 (+0.07%); split: -0.49%, +0.56%
 Spill count: 2068 -> 1964 (-5.03%); split: -5.22%, +0.19%
 Fill count: 2342 -> 2205 (-5.85%); split: -6.40%, +0.56%
 Scratch Memory Size: 147456 -> 141312 (-4.17%)
 Max live registers: 112384 -> 112787 (+0.36%); split: -0.08%, +0.44%
 Non SSA regs after NIR: 80293 -> 79161 (-1.41%); split: -1.72%, +0.32%

Metro Exodus:
 Totals from 29755 (78.62% of 37846) affected shaders:
 Instrs: 11495578 -> 11492951 (-0.02%); split: -0.02%, +0.00%
 Subgroup size: 644688 -> 644704 (+0.00%)
 Cycle count: 301572068 -> 301548054 (-0.01%); split: -0.03%, +0.02%
 Max live registers: 3369504 -> 3370454 (+0.03%); split: -0.00%, +0.03%
 Non SSA regs after NIR: 2476561 -> 2396090 (-3.25%); split: -3.27%, +0.02%

Red Dead Redemption 2:
 Totals from 4161 (78.61% of 5293) affected shaders:
 Instrs: 2428782 -> 2409032 (-0.81%); split: -0.82%, +0.00%
 Subgroup size: 85344 -> 85360 (+0.02%)
 Cycle count: 8514984142 -> 8533415324 (+0.22%); split: -0.02%, +0.23%
 Spill count: 4659 -> 4674 (+0.32%); split: -0.02%, +0.34%
 Fill count: 11236 -> 11231 (-0.04%); split: -0.19%, +0.14%
 Scratch Memory Size: 398336 -> 397312 (-0.26%)
 Max live registers: 473946 -> 475798 (+0.39%); split: -0.08%, +0.47%
 Non SSA regs after NIR: 616820 -> 567706 (-7.96%); split: -8.09%, +0.12%

Rise Of The Tomb Raider:
 Totals from 68 (46.58% of 146) affected shaders:
 Instrs: 28209 -> 27801 (-1.45%)
 Subgroup size: 1584 -> 1600 (+1.01%)
 Cycle count: 16182992 -> 16249364 (+0.41%); split: -0.97%, +1.38%
 Max live registers: 7320 -> 7296 (-0.33%); split: -0.38%, +0.05%
 Non SSA regs after NIR: 8438 -> 8207 (-2.74%); split: -2.82%, +0.08%

Spiderman Remastered:
 Totals from 6403 (93.87% of 6821) affected shaders:
 Instrs: 5662713 -> 5597949 (-1.14%); split: -1.28%, +0.14%
 Cycle count: 282861519016 -> 279806958122 (-1.08%); split: -1.26%, +0.18%
 Spill count: 61150 -> 60754 (-0.65%); split: -1.13%, +0.48%
 Fill count: 162597 -> 163190 (+0.36%); split: -0.84%, +1.21%
 Scratch Memory Size: 5834752 -> 5804032 (-0.53%); split: -0.70%, +0.18%
 Max live registers: 901926 -> 903820 (+0.21%); split: -0.01%, +0.22%
 Non SSA regs after NIR: 555053 -> 521016 (-6.13%); split: -6.14%, +0.01%

Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35252>
2025-06-22 10:55:24 +00:00
Rohan Garg 8a5e062e5e brw: store the buffer offset for load/store intrinsics
This will later be encoded by the backend into the
LSC extended descriptor message.

Reworks:
  * Sagar: Add nir_intrinsic_ssbo_atomic_swap

Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35252>
2025-06-22 10:55:24 +00:00
Rohan Garg 0186113640 brw: encode the offset into the message descriptor for Xe2
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35252>
2025-06-22 10:55:24 +00:00
Rohan Garg 937d37f0b1 brw: introduce MEMORY_LOGICAL_ADDRESS_OFFSET to encode address offsets
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35252>
2025-06-22 10:55:24 +00:00
Lionel Landwerlin d5a58364b1 brw: add new helper for immediate integer register with type
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35252>
2025-06-22 10:55:24 +00:00
Lionel Landwerlin 16fca611d7 nir: add new intel ssbo intrinsics
Similar to ir3 ones, to optimize offsets in the backend.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35252>
2025-06-22 10:55:23 +00:00
Lionel Landwerlin ba119c73c6 intel: replace RANGE_BASE by BASE for uniform block loads
We're not currently using RANGE_BASE and we'll use BASE for offset
optimizations on Xe2+.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35252>
2025-06-22 10:55:23 +00:00