Ian Romanick
e1bb53bb3c
nir/algebraic: Optimize some trivial bfi
...
In fossil-db, one big compute shader on Hogwarts Legacy is helped for
spills and fills. It has a lot of instances of bfi(0x3f, a, a).
On Tiger Lake and Skylake, a compute shader in Unicom that has a
single instance of this pattern is hurt for spills and fills. I think
this is just due to non-determinism in the register allocation
algorithm.
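A minimal sketch of why bfi(0x3f, a, a) is trivial, assuming NIR's
bfi(mask, insert, base) semantics: shift insert up to the lowest set bit of
mask, then merge it into base under the mask. The Python function below is a
model written for illustration, not Mesa code, and the pattern actually added
by this commit may be more general than this one case.

```python
def bfi(mask: int, insert: int, base: int) -> int:
    """Model of a 32-bit bitfield insert (assumed NIR semantics)."""
    mask &= 0xFFFFFFFF
    if mask == 0:
        return base & 0xFFFFFFFF
    # Line up bit 0 of the inserted value with the lowest set bit of the mask.
    shift = (mask & -mask).bit_length() - 1
    insert = (insert << shift) & 0xFFFFFFFF
    # Keep base outside the mask, take the shifted insert inside it.
    return ((base & ~mask) | (insert & mask)) & 0xFFFFFFFF


def bfi_low_mask_is_identity(a: int) -> bool:
    """With mask 0x3f the shift is 0, so bfi(0x3f, a, a) rebuilds a."""
    return bfi(0x3F, a, a) == (a & 0xFFFFFFFF)
```

Because the mask starts at bit 0, the low six bits come from a and the
remaining bits also come from a, so the whole instruction can be replaced
by a itself.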
shader-db:
All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 16992643 -> 16992548 (<.01%)
instructions in affected programs: 17533 -> 17438 (-0.54%)
helped: 33 / HURT: 0
total cycles in shared programs: 914313986 -> 914316238 (<.01%)
cycles in affected programs: 3734544 -> 3736796 (0.06%)
helped: 26 / HURT: 6
fossil-db:
Lunar Lake, Meteor Lake, DG2, and Ice Lake had similar results. (Lunar Lake shown)
Totals:
Instrs: 208952780 -> 208952537 (-0.00%)
Send messages: 10934879 -> 10934875 (-0.00%)
Cycle count: 30988230904 -> 30988228660 (-0.00%); split: -0.00%, +0.00%
Spill count: 534864 -> 534843 (-0.00%)
Fill count: 667081 -> 667068 (-0.00%)
Max live registers: 65686656 -> 65686624 (-0.00%)
Non SSA regs after NIR: 244185358 -> 244185335 (-0.00%)
Totals from 3 (0.00% of 704834) affected shaders:
Instrs: 4708 -> 4465 (-5.16%)
Send messages: 234 -> 230 (-1.71%)
Cycle count: 264382 -> 262138 (-0.85%); split: -0.88%, +0.03%
Spill count: 91 -> 70 (-23.08%)
Fill count: 73 -> 60 (-17.81%)
Max live registers: 647 -> 615 (-4.95%)
Non SSA regs after NIR: 3957 -> 3934 (-0.58%)
Tiger Lake
Totals:
Instrs: 230516919 -> 230515185 (-0.00%); split: -0.00%, +0.00%
Send messages: 12657684 -> 12657680 (-0.00%)
Cycle count: 23060318600 -> 23060279758 (-0.00%); split: -0.00%, +0.00%
Spill count: 548462 -> 548446 (-0.00%); split: -0.00%, +0.00%
Fill count: 582304 -> 582294 (-0.00%); split: -0.00%, +0.00%
Scratch Memory Size: 19538944 -> 19539968 (+0.01%)
Max live registers: 41713622 -> 41713593 (-0.00%)
Non SSA regs after NIR: 260667939 -> 260667712 (-0.00%); split: -0.00%, +0.00%
Totals from 174 (0.02% of 794323) affected shaders:
Instrs: 158346 -> 156612 (-1.10%); split: -1.13%, +0.04%
Send messages: 14330 -> 14326 (-0.03%)
Cycle count: 24859875 -> 24821033 (-0.16%); split: -0.32%, +0.16%
Spill count: 183 -> 167 (-8.74%); split: -9.29%, +0.55%
Fill count: 284 -> 274 (-3.52%); split: -7.39%, +3.87%
Scratch Memory Size: 9216 -> 10240 (+11.11%)
Max live registers: 12587 -> 12558 (-0.23%)
Non SSA regs after NIR: 164466 -> 164239 (-0.14%); split: -0.16%, +0.02%
Skylake
Totals:
Instrs: 158904982 -> 158903764 (-0.00%)
Send messages: 8490500 -> 8490496 (-0.00%)
Cycle count: 19732284279 -> 19732345496 (+0.00%); split: -0.00%, +0.00%
Spill count: 519127 -> 519115 (-0.00%)
Fill count: 594283 -> 594290 (+0.00%); split: -0.00%, +0.00%
Max live registers: 33708764 -> 33708739 (-0.00%)
Non SSA regs after NIR: 169377234 -> 169377007 (-0.00%); split: -0.00%, +0.00%
Totals from 174 (0.03% of 648725) affected shaders:
Instrs: 160391 -> 159173 (-0.76%)
Send messages: 14354 -> 14350 (-0.03%)
Cycle count: 24776486 -> 24837703 (+0.25%); split: -0.07%, +0.32%
Spill count: 332 -> 320 (-3.61%)
Fill count: 587 -> 594 (+1.19%); split: -0.17%, +1.36%
Max live registers: 12709 -> 12684 (-0.20%)
Non SSA regs after NIR: 166557 -> 166330 (-0.14%); split: -0.16%, +0.02%
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32493>
2024-12-05 21:39:07 +00:00
José Roberto de Souza
04bdbeec31
intel/dev/xe: Fix access to eu_per_dss_mask
...
The DRM_XE_TOPO_EU_PER_DSS and DRM_XE_TOPO_SIMD16_EU_PER_DSS masks can be
any number of bytes long, but the code assumed they were always 4 bytes long.
That was not an issue with the Xe KMD, which returns 4 bytes even if only
1 or 2 bytes are needed, but it is a problem with our HW simulator, which
was returning 2 bytes.
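A minimal sketch of the idea, not the actual Mesa code: build the EU mask
from however many bytes the kernel reported instead of reading a fixed
4-byte word. The function names are hypothetical, and the little-endian byte
order (byte 0 holds the lowest-numbered EUs) is an assumption of this sketch.

```python
def eu_mask_from_bytes(mask: bytes) -> int:
    """Build an integer EU mask from a variable-length byte array."""
    # Assumption: byte 0 carries the lowest-numbered EUs (little-endian).
    return int.from_bytes(mask, byteorder="little")


def eu_count(mask: bytes) -> int:
    """Count the enabled EUs regardless of how many bytes were returned."""
    return bin(eu_mask_from_bytes(mask)).count("1")
```

A 2-byte reply and a zero-padded 4-byte reply then describe the same
topology, which is the behavior a fixed-size read cannot guarantee.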
Fixes: a24d93aa89 ("intel/dev: Query and compute hardware topology for Xe")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32307>
2024-12-05 20:30:44 +00:00
Lionel Landwerlin
371b7a9b0d
anv: set pipeline flags correct for imported libs
...
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 3d49cdb71e ("anv: implement VK_EXT_graphics_pipeline_library")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32507>
2024-12-05 19:53:34 +00:00
Lionel Landwerlin
6e396b400a
anv: fix missing bindings valid dynamic state change check
...
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 9ddd296cd3 ("anv: implement VK_EXT_vertex_input_dynamic_state")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32507>
2024-12-05 19:53:34 +00:00
Adam Jackson
266dfb15c1
docs/envvars: Combine WGL sections
...
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32316>
2024-12-05 19:46:38 +00:00
Adam Jackson
f447e31daa
docs/envvars: Remove mention of IRIS_ENABLE_CLOVER
...
This went away when clover dropped nir driver support.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32316>
2024-12-05 19:46:38 +00:00
Eric R. Smith
a2f96667e2
mesa: update more drivers to handle pipe_blit_info swizzle_enable
...
Handle swizzling by falling through to the software path. Swizzle
should be rarely enabled, so this shouldn't affect performance in
most cases.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31378>
2024-12-05 18:27:37 +00:00
Eric R. Smith
3da4a404ae
aux: add support for dumping the swizzle in pipe_blit_info
...
Just some additional debug code for the new blit swizzle feature.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31378>
2024-12-05 18:27:37 +00:00
Eric R. Smith
b81aefcc19
mesa: when blitting between formats clear any unused components
...
If the state tracker chooses to implement one format with a more
general one (e.g. GL_ALPHA implemented with GL_RGBA) we end up
in a situation where some components should be ignored. Readpix
handles this correctly, but blit does not, which means that if
we blit between different formats we can end up writing garbage
into some components. Work around this by adding an explicit
swizzle to the pipe_blit_info struct, which can re-arrange elements
and/or put 0 or 1 into appropriate channels, and use this to
set the appropriate values into unused channels via the sampler
view.
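A minimal sketch of what such a swizzle does, written for illustration and
not taken from the Mesa sources. The selector names are hypothetical and only
loosely modeled on Gallium's X/Y/Z/W/0/1 swizzle values; the GL_ALPHA mapping
shown is an assumed example of a format stored in an RGBA surface.

```python
# Hypothetical selectors: four source channels plus the constants 0 and 1.
X, Y, Z, W, ZERO, ONE = range(6)


def apply_swizzle(texel, swizzle):
    """Return a 4-channel texel rearranged and filled according to swizzle."""
    out = []
    for sel in swizzle:
        if sel == ZERO:
            out.append(0.0)          # channel the format does not have
        elif sel == ONE:
            out.append(1.0)
        else:
            out.append(texel[sel])   # copy one of the source channels
    return tuple(out)


# Assumed mapping for an alpha-only format kept in RGBA storage:
# RGB may hold garbage, so force it to 0 and keep only alpha.
ALPHA_ONLY = (ZERO, ZERO, ZERO, W)
```

For example, apply_swizzle((0.7, 0.2, 0.9, 0.5), ALPHA_ONLY) discards the
stale RGB values and keeps alpha, so the destination never receives the
garbage components.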
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31378>
2024-12-05 18:27:37 +00:00
Erik Faye-Lund
9f69f7a66d
panvk: free preload-shaders after compiling
...
These shaders are created using nir_builder_init_simple_shader(), which
allocates using a NULL ralloc-parent, so ralloc_free should be the right
function to free them with.
Fixes: 0bc3502ca3 ("panvk: Implement a custom FB preload logic")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32486>
2024-12-05 17:45:16 +00:00
Erik Faye-Lund
43738a9a94
vulkan/meta: plug a couple of memory leaks
...
We create NIR shaders here, and we need to free them when we're done with
them as well.
These shaders are created using nir_builder_init_simple_shader(), which
allocates using a NULL ralloc-parent, so ralloc_free should be the right
function to free them with.
Fixes: 514c10344e ("vulkan/meta: Add a concept of rect pipelines")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32486>
2024-12-05 17:45:16 +00:00
Tomeu Vizoso
3aad0afc30
teflon/tests: Also use the cache for models in the test suite
...
To speed things up now that we have more models under testing.
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32485>
2024-12-05 17:02:27 +00:00
Tomeu Vizoso
74239aeb77
teflon/tests: Add support for models with float inputs and outputs
...
Ended up deciding to drop C++ collections and use C pointers instead
because the template use was starting to get ridiculous.
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32485>
2024-12-05 17:02:27 +00:00
Tomeu Vizoso
f21d8af43a
teflon: Don't crash when a tensor isn't quantized
...
We don't yet support hardware that can deal with floats, but it is
better not to crash.
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32485>
2024-12-05 17:02:27 +00:00
Tomeu Vizoso
a548b17b4e
teflon: Rename model tests so they aren't skipped by gtest-runner
...
The regular expression engine in gtest-runner was matching more tests
than we wanted, so we weren't testing all we thought.
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32485>
2024-12-05 17:02:26 +00:00
Tomeu Vizoso
1e117478d4
teflon: Support tests with inputs with less than 4 dims
...
Needed in models such as YOLOX.
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32485>
2024-12-05 17:02:26 +00:00
Tomeu Vizoso
140150083e
teflon: Add tests for the YOLOX model
...
The model was generated from:
https://github.com/Megvii-BaseDetection/YOLOXa (Apache License 2.0)
Reviewed-by: Philipp Zabel <p.zabel@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32485>
2024-12-05 17:02:26 +00:00
David Rosca
8d3d35bf05
frontends/va: Add support for VA_SURFACE_ATTRIB_MEM_TYPE_DRM_PRIME_3
...
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32113>
2024-12-05 16:34:09 +00:00
Lionel Landwerlin
80c0d2718c
anv: report formats supported by the common bvh framework
...
Enables DXR 1.1 with vkd3d-proton
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kevin Chuang <kaiwenjon23@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32487>
2024-12-05 15:54:10 +00:00
Eric R. Smith
aba90c1523
panfrost: check afbc status in panfrost_query_compression_modifiers
...
In panfrost_query_compression_modifiers we need to ignore AFBC
modifiers if the device does not support AFBC. In order to avoid
duplicating code, we do this by calling panfrost_walk_dmabuf_modifiers
with a flag that indicates we do not want AFRC modifiers.
Reviewed-by: Louis-Francis Ratté-Boulianne <lfrb@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32406>
2024-12-05 14:54:09 +00:00
Marek Olšák
dfc2f054b6
radeonsi/ci: update navi31 failures
...
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32288>
2024-12-05 12:07:06 +00:00
Marek Olšák
ed4606a062
radeonsi/ci: remove --slow
...
The tests were split or reduced in glcts.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32288>
2024-12-05 12:07:06 +00:00
Marek Olšák
c0ccae84a7
radeonsi/ci: remove most flakes and some skips, update navi31 failures
...
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32288>
2024-12-05 12:07:06 +00:00
Marek Olšák
af618dd907
radeonsi/ci: stop using a global flakes list, only use a per-chip flakes list
...
We need to start treating flakes as fails and they are likely different
between chips.
I removed the gfx9 flakes file and renamed the original flakes file
to gfx6-tahiti-flakes.csv, but it would be better to add a new flakes
file for each generation we test.
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32288>
2024-12-05 12:07:06 +00:00
Marek Olšák
3ff8111fc6
radeonsi/ci: handle glinfo errors better
...
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32288>
2024-12-05 12:07:06 +00:00
Marek Olšák
738a501e92
radeonsi: don't compute total_direct_count in si_draw if it's unused
...
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32288>
2024-12-05 12:07:06 +00:00
Marek Olšák
ed372d4b7c
radeonsi: try to fix Navi14 regression in debug builds
...
Assertion failure:
../src/gallium/drivers/radeonsi/si_state_shaders.cpp:1369: unsigned int si_get_input_prim(const
si_shader_selector*, const si_shader_key*, bool): Assertion `gs->stage == MESA_SHADER_VERTEX' failed.
Fixes: 7e959864b2 ("radeonsi: enable NGG culling for non-monolithic TES and GS")
Tested-by: Michel Dänzer <mdaenzer@redhat.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32288>
2024-12-05 12:07:06 +00:00
Marek Olšák
a3c293cdcd
radeonsi: revert to always returning true for load_cull_any_enabled_amd
...
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32288>
2024-12-05 12:07:06 +00:00
Marek Olšák
511a637a5c
radeonsi: pass cull face state via user SGPRs for shader culling
...
The culling code always computes the determinant for culling zero-area
triangles, so passing the state via user SGPRs adds too little shader
code to justify keeping separate shader variants for front/back face
culling, which uses the same determinant.
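A minimal sketch of the reasoning, assuming 2D screen-space vertices and a
counter-clockwise-is-front convention; it is not the actual shader code. The
flags cull_front and cull_back stand in for the state that would arrive in
user SGPRs, and all names here are hypothetical.

```python
def determinant(v0, v1, v2):
    """Twice the signed area of the triangle (v0, v1, v2)."""
    (x0, y0), (x1, y1), (x2, y2) = v0, v1, v2
    return (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)


def should_cull(v0, v1, v2, cull_front, cull_back, ccw_is_front=True):
    """Decide culling from one determinant and two runtime flags."""
    det = determinant(v0, v1, v2)
    if det == 0:
        return True                   # zero-area triangle, always culled
    is_front = (det > 0) == ccw_is_front
    # The face test reuses the determinant computed above, so it costs
    # only a sign check and a select on the runtime flags.
    return cull_front if is_front else cull_back
```

Since the determinant is needed for the zero-area test anyway, selecting on
runtime flags adds only a comparison, which is why a dedicated variant per
cull mode buys little.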
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32288>
2024-12-05 12:07:06 +00:00
Alyssa Rosenzweig
ca9bf43d0b
nir,asahi: make argument alignment configurable
...
this is more flexible. Mali needs 32-bit alignment, for example.
I added an option struct in case we need to make this a callback or something
later.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32398>
2024-12-05 10:58:51 +00:00
Alyssa Rosenzweig
0d77e91ca3
nir/opt_load_store_vectorize: match amul like imul
...
for AGX, we preserve amul all the way until fusing address modes in order to be
able to fuse effectively. so the load/store vectorizer wouldn't vectorize before
fusing.
however, after fusing we get fused intrinsics which are tricky to teach the
vectorizer about as their semantics are pretty subtle. so we can't vectorize
after, either.
the easiest solution is to teach the vectorizer about amul, which can always be
replaced by imul for our pattern matches.
this fixes certain cases of vectorization in OpenCL kernels on asahi.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32398>
2024-12-05 10:58:51 +00:00
Alyssa Rosenzweig
77d4ed0a01
nir/opt_algebraic: optimize sign bit manipulation
...
libclc loves to generate the iand(0x7fffffff) pattern. ior/ixor patterns are
added for completeness.
Shaves 4 instructions off libclc vec4 normalize.
v2: Loop over the bit sizes (Georg).
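A minimal sketch of the bit patterns involved, assuming 32-bit IEEE 754
floats. It only demonstrates what each integer operation does to a float's
bits; which NIR float opcodes the commit rewrites them to is not stated here,
and the helper names are hypothetical.

```python
import struct


def float_to_bits(x: float) -> int:
    """Reinterpret a 32-bit float as an unsigned integer."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_to_float(b: int) -> float:
    """Reinterpret an unsigned 32-bit integer as a float."""
    return struct.unpack("<f", struct.pack("<I", b))[0]


def iand_clear_sign(x: float) -> float:
    """iand(x, 0x7fffffff): clears the sign bit, i.e. an absolute value."""
    return bits_to_float(float_to_bits(x) & 0x7FFFFFFF)


def ixor_flip_sign(x: float) -> float:
    """ixor(x, 0x80000000): flips the sign bit, i.e. a negation."""
    return bits_to_float(float_to_bits(x) ^ 0x80000000)


def ior_set_sign(x: float) -> float:
    """ior(x, 0x80000000): sets the sign bit, i.e. a negated absolute value."""
    return bits_to_float(float_to_bits(x) | 0x80000000)
```

When the result is only consumed by float instructions, these integer
operations behave like float source modifiers, which is what makes the
pattern worth recognizing.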
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> [v1]
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32398>
2024-12-05 10:58:51 +00:00
Alyssa Rosenzweig
be049e1c14
nir/search_helpers: handle bcsel in is_only_used_as_float
...
this lets algebraic see through chains of instructions.
v2: Limit recursion depth (Georg).
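A minimal sketch of the idea on a toy IR, not the NIR data structures: a
value counts as "only used as float" if every use is a float instruction, or
a bcsel that selects it (sources 1 or 2) and whose own result is only used as
float. The Value class, FLOAT_OPS set, and max_depth default are all
hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Value:
    # Each use is (opcode, source index in the user, user's result Value).
    uses: list = field(default_factory=list)


# Hypothetical set of opcodes that read their sources as floats.
FLOAT_OPS = {"fadd", "fmul", "fabs", "fneg"}


def is_only_used_as_float(value, depth=0, max_depth=4):
    """Check all uses, looking through bcsel up to max_depth levels."""
    if depth > max_depth:
        return False                  # give up conservatively on long chains
    for op, src_index, result in value.uses:
        if op == "bcsel" and src_index in (1, 2):
            # bcsel just forwards the value, so inspect its users instead.
            if not is_only_used_as_float(result, depth + 1, max_depth):
                return False
        elif op not in FLOAT_OPS:
            return False              # integer use, or use as a condition
    return True
```

The depth limit keeps the check cheap and guarantees termination on long
select chains, at the cost of occasionally answering "no" when the true
answer is "yes".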
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marek Olšák <marek.olsak@amd.com> [v1]
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32398>
2024-12-05 10:58:51 +00:00
Pavel Ondračka
ecc4d5da67
i915/ci: update CI expectations
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32494>
2024-12-05 09:35:43 +00:00
Boris Brezillon
19231c7ae3
pan: s/NIR_PASS_V/NIR_PASS/
...
Move away from NIR_PASS_V() like other drivers have done long ago.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32480>
2024-12-05 08:49:45 +00:00
Boris Brezillon
b47cf63cca
panvk: s/NIR_PASS_V/NIR_PASS/
...
Move away from NIR_PASS_V() like other drivers have done long ago.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32480>
2024-12-05 08:49:45 +00:00
Boris Brezillon
7e78aa73dd
panfrost: Use nir_shader_intrinsics_pass() for the line_smooth lowering pass
...
We have a helper function to iterate only on intrinsics, so let's use it.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32480>
2024-12-05 08:49:45 +00:00
Boris Brezillon
34beb93635
panfrost: s/NIR_PASS_V/NIR_PASS/
...
Move away from NIR_PASS_V() like other drivers have done long ago.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32480>
2024-12-05 08:49:45 +00:00
Boris Brezillon
98e3c1e6fb
nir: Let nir_lower_texcoord_replace_late() report progress
...
Useful if we want to wrap this pass with a NIR_PASS() to enforce
validation.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32480>
2024-12-05 08:49:45 +00:00
Samuel Pitoiset
ea112cf84d
ci: update VKCTS main to a9f7069b9a5ba94715a175cb1818ed504add0107
...
This contains many more tests for Vulkan 1.4, but the Vulkan loader
probably needs an update too.
This should only affect RADV which is the only user for VKCTS main.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32475>
2024-12-05 08:06:23 +00:00
Samuel Pitoiset
8b755840fc
radv: fix initializing HTILE when the image has VRS rates
...
VRS rates should only be preserved for clears, otherwise the HTILE
buffer should be cleared completely.
This fixes some failures/flakes in CI.
Fixes: 8197d744f5 ("radv: Do not overwrite VRS rates when doing fast clears")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32463>
2024-12-05 07:34:58 +00:00
Samuel Pitoiset
e73fdac9a6
radv: enable DGC IES for compute with ESO
...
This was supposed to be enabled.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32484>
2024-12-05 07:06:17 +00:00
Simon Perretta
e26a383ee8
pco: fix x86 build
...
Use inttypes.h when printing variables whose format specifier changes
across different archs.
Fixes: 37d47913
Fixes: e67e4452
Closes: #12238
Signed-off-by: Simon Perretta <simon.perretta@imgtec.com>
Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32492>
2024-12-05 00:50:16 +00:00
Dylan Baker
43bdc84831
docs: update calendar for 24.3.1
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32491>
2024-12-05 00:43:50 +00:00
Dylan Baker
fd0da8eb80
docs: Add SHA sums for 24.3.1
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32491>
2024-12-05 00:43:50 +00:00
Dylan Baker
a3715349fd
docs: add release notes for 24.3.1
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32491>
2024-12-05 00:43:50 +00:00
Ian Romanick
0754a18621
brw/copy: Allow copy prop into src1 of broadcast
...
This is the selector, and it must always be a uniform UD, so there's no
reason to not propagate into it.
No shader-db change on any Intel platform.
fossil-db:
All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 220507131 -> 220507127 (-0.00%)
Cycle count: 31607052398 -> 31607053364 (+0.00%); split: -0.00%, +0.00%
Totals from 5 (0.00% of 702410) affected shaders:
Instrs: 995 -> 991 (-0.40%)
Cycle count: 86392 -> 87358 (+1.12%); split: -0.07%, +1.19%
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32097>
2024-12-05 00:15:27 +00:00
Ian Romanick
662339a2ff
brw/build: Use SIMD8 temporaries in emit_uniformize
...
The fossil-db results are very different from v1. This is now mostly
helpful on older platforms.
v2: When optimizing BROADCAST or FIND_LIVE_CHANNEL to a simple MOV,
adjust the exec_size to match the size allocated for the destination
register. Fixes EU validation failures in some piglit OpenCL tests
(e.g., atomic_add-global-return.cl).
v3: Use component_size() in emit_uniformize and BROADCAST to properly
account for UQ vs UD destination. This doesn't matter for
emit_uniformize because the type is always UD, but it is technically
more correct.
v4: Update trace checksums. Now amly expects the same checksum as
several other platforms.
v5: Use xbld.dispatch_width() in the builder for when scalar_group()
eventually becomes SIMD1. Suggested by Lionel.
shader-db:
Lunar Lake, Meteor Lake, DG2, and Tiger Lake had similar results. (Lunar Lake shown)
total instructions in shared programs: 18091701 -> 18091586 (<.01%)
instructions in affected programs: 29616 -> 29501 (-0.39%)
helped: 28 / HURT: 18
total cycles in shared programs: 919250494 -> 919123828 (-0.01%)
cycles in affected programs: 12201102 -> 12074436 (-1.04%)
helped: 124 / HURT: 108
LOST: 0
GAINED: 1
Ice Lake and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 20480808 -> 20480624 (<.01%)
instructions in affected programs: 58465 -> 58281 (-0.31%)
helped: 61 / HURT: 20
total cycles in shared programs: 874860168 -> 874960312 (0.01%)
cycles in affected programs: 18240986 -> 18341130 (0.55%)
helped: 113 / HURT: 158
total spills in shared programs: 4557 -> 4555 (-0.04%)
spills in affected programs: 93 -> 91 (-2.15%)
helped: 1 / HURT: 0
total fills in shared programs: 5247 -> 5243 (-0.08%)
fills in affected programs: 224 -> 220 (-1.79%)
helped: 1 / HURT: 0
fossil-db:
Lunar Lake
Totals:
Instrs: 220486064 -> 220486959 (+0.00%); split: -0.00%, +0.00%
Subgroup size: 14102592 -> 14102624 (+0.00%)
Cycle count: 31602733838 -> 31604733270 (+0.01%); split: -0.01%, +0.02%
Max live registers: 65371025 -> 65355084 (-0.02%)
Totals from 12130 (1.73% of 702392) affected shaders:
Instrs: 5162700 -> 5163595 (+0.02%); split: -0.06%, +0.08%
Subgroup size: 388128 -> 388160 (+0.01%)
Cycle count: 751721956 -> 753721388 (+0.27%); split: -0.54%, +0.81%
Max live registers: 1538550 -> 1522609 (-1.04%)
Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 241601142 -> 241599114 (-0.00%); split: -0.00%, +0.00%
Subgroup size: 9631168 -> 9631216 (+0.00%)
Cycle count: 25101781573 -> 25097909570 (-0.02%); split: -0.03%, +0.01%
Max live registers: 41540611 -> 41514296 (-0.06%)
Max dispatch width: 6993456 -> 7000928 (+0.11%); split: +0.15%, -0.05%
Totals from 16852 (2.11% of 796880) affected shaders:
Instrs: 6303937 -> 6301909 (-0.03%); split: -0.11%, +0.07%
Subgroup size: 323592 -> 323640 (+0.01%)
Cycle count: 625455880 -> 621583877 (-0.62%); split: -1.20%, +0.58%
Max live registers: 1072491 -> 1046176 (-2.45%)
Max dispatch width: 76672 -> 84144 (+9.75%); split: +14.04%, -4.30%
Tiger Lake
Totals:
Instrs: 235190395 -> 235193286 (+0.00%); split: -0.00%, +0.00%
Cycle count: 23130855720 -> 23128936334 (-0.01%); split: -0.02%, +0.01%
Max live registers: 41644106 -> 41620052 (-0.06%)
Max dispatch width: 6959160 -> 6981512 (+0.32%); split: +0.34%, -0.02%
Totals from 15102 (1.90% of 793371) affected shaders:
Instrs: 5771042 -> 5773933 (+0.05%); split: -0.06%, +0.11%
Cycle count: 371062226 -> 369142840 (-0.52%); split: -1.04%, +0.52%
Max live registers: 989858 -> 965804 (-2.43%)
Max dispatch width: 61344 -> 83696 (+36.44%); split: +38.42%, -1.98%
Ice Lake and Skylake had similar results. (Ice Lake shown)
Totals:
Instrs: 236063150 -> 236063242 (+0.00%); split: -0.00%, +0.00%
Cycle count: 24516187174 -> 24516027518 (-0.00%); split: -0.00%, +0.00%
Spill count: 567071 -> 567049 (-0.00%)
Fill count: 701323 -> 701273 (-0.01%)
Max live registers: 41914047 -> 41913281 (-0.00%)
Max dispatch width: 7042608 -> 7042736 (+0.00%); split: +0.00%, -0.00%
Totals from 3904 (0.49% of 798473) affected shaders:
Instrs: 2809690 -> 2809782 (+0.00%); split: -0.02%, +0.03%
Cycle count: 182114259 -> 181954603 (-0.09%); split: -0.34%, +0.25%
Spill count: 1696 -> 1674 (-1.30%)
Fill count: 2523 -> 2473 (-1.98%)
Max live registers: 341695 -> 340929 (-0.22%)
Max dispatch width: 32752 -> 32880 (+0.39%); split: +0.44%, -0.05%
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32097>
2024-12-05 00:15:27 +00:00
Ian Romanick
d2b266187d
brw: Use resize_sources several more places
...
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32097>
2024-12-05 00:15:27 +00:00
Ian Romanick
12d1886b87
brw/lower: Don't "fix" regioning of broadcast
...
The next two commits modify the destination regioning in a way that,
while still correct, triggers assertion failures if we try to fix the
regioning here.
Broadcast gets lowered in brw_eu_emit. For the purposes of region
restrictions, let's assume that the final code emission will do the
right thing. Doing a bunch of shuffling here is only going to make a
mess of things.
No shader-db or fossil-db changes on any Intel platform.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32097>
2024-12-05 00:15:27 +00:00