Caio Oliveira
d1dd088ede
brw: Allow DPAS with BF on Gfx125
MTL doesn't support it, but both ACM and ARL-H do.
Fixes: e384ccde28 ("brw: Expand EU validation for DPAS")
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34506>
2025-04-14 18:23:43 +00:00
Caio Oliveira
050acb9def
intel: Disable has_bfloat16 for MTL
Not supported. Some operations *do* work, but proper support
was removed since it also doesn't support DPAS.
Fixes: 9916cc1050 ("brw: Add BRW_TYPE_BF for bfloat16")
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34506>
2025-04-14 18:23:43 +00:00
Caio Oliveira
adfab666a4
intel: Add intel_device_info::has_systolic
Gfx125+ has systolic support, with the exception of MTL and some ARL
variants. Update code and tests to use it.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34506>
2025-04-14 18:23:43 +00:00
Mike Blumenkrantz
bf5273dd38
ci: update VVL to current week
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33651>
2025-04-14 17:51:05 +00:00
Mike Blumenkrantz
0b7611824a
zink: use implicit stride in ntv for temp vars
APPARENTLY explicit stride is illegal for temp vars because they should
just be using the element stride implicitly.
This makes total sense and is very obvious.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33651>
2025-04-14 17:51:05 +00:00
Mike Blumenkrantz
b4e3535650
zink: stop setting ArrayStride on image arrays
this is illegal
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33651>
2025-04-14 17:51:05 +00:00
Mike Blumenkrantz
1c0de360bc
zink: don't set shared block stride without KHR_workgroup_memory_explicit_layout
this is illegal
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33651>
2025-04-14 17:51:05 +00:00
Connor Abbott
74531094cb
ir3: Vectorize shared memory loads/stores
This drastically helps a Path of Exile 2 compute dispatch, going from
4.6ms to 2.7ms.
Totals from 969 (0.59% of 164134) affected shaders:
MaxWaves: 9586 -> 9560 (-0.27%); split: +0.02%, -0.29%
Instrs: 1252433 -> 1234724 (-1.41%); split: -1.47%, +0.05%
CodeSize: 2237424 -> 2195238 (-1.89%); split: -1.91%, +0.03%
NOPs: 362213 -> 360913 (-0.36%); split: -0.92%, +0.56%
MOVs: 58879 -> 59591 (+1.21%); split: -0.62%, +1.83%
Full: 15817 -> 15867 (+0.32%); split: -0.04%, +0.36%
(ss): 35671 -> 35434 (-0.66%); split: -1.80%, +1.14%
(sy): 23953 -> 23964 (+0.05%); split: -0.38%, +0.43%
(ss)-stall: 127807 -> 124930 (-2.25%); split: -3.43%, +1.18%
(sy)-stall: 583947 -> 585886 (+0.33%); split: -0.61%, +0.94%
Early-preamble: 317 -> 316 (-0.32%)
Cat0: 394577 -> 393316 (-0.32%); split: -0.85%, +0.53%
Cat1: 100335 -> 101057 (+0.72%); split: -0.36%, +1.08%
Cat2: 415880 -> 415835 (-0.01%); split: -0.05%, +0.04%
Cat3: 187928 -> 187929 (+0.00%); split: -0.00%, +0.00%
Cat5: 19143 -> 19148 (+0.03%)
Cat6: 69630 -> 52523 (-24.57%)
Cat7: 47160 -> 47136 (-0.05%); split: -0.56%, +0.51%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34441>
2025-04-14 17:22:47 +00:00
Connor Abbott
9977c4d682
ir3: Move load/store vectorization to finalize
Some frontends such as rusticl and turnip call the optimization loop
before choosing the shared memory layout, in order to be able to delete
variables that turn out to be unused. This means that we can't vectorize
shared memory accesses until after the first run of the optimization
loop. Other drivers also seem to do something similar.
This also has the benefit that, by delaying vectorization of UBOs until
after they are lowered from derefs, we don't insert casts that remove
the ability of nir_lower_explicit_io to insert a range, which was
blocking the pushing of vectorized indirect UBO loads. This has a
significant positive impact on fossil-db (see the totals at the end).
However, doing vectorization later exposes a bug where vectorization could
change the bit size after we had used it to determine which descriptor to
use. It happened to work before because vectorization was usually done
early. To fix it, move adjusting the descriptor to a new pass that
runs after finalizing.
Totals:
MaxWaves: 2249140 -> 2281068 (+1.42%); split: +1.43%, -0.01%
Instrs: 49624230 -> 49143117 (-0.97%); split: -1.14%, +0.17%
CodeSize: 103796862 -> 104143744 (+0.33%); split: -0.98%, +1.31%
NOPs: 8489860 -> 8512218 (+0.26%); split: -1.55%, +1.81%
MOVs: 1531650 -> 1574911 (+2.82%); split: -1.37%, +4.20%
Full: 1814334 -> 1748906 (-3.61%); split: -3.64%, +0.03%
(ss): 1155395 -> 1128249 (-2.35%); split: -3.48%, +1.13%
(sy): 608650 -> 567972 (-6.68%); split: -7.32%, +0.64%
(ss)-stall: 4352550 -> 4340473 (-0.28%); split: -2.08%, +1.80%
(sy)-stall: 17852259 -> 16943647 (-5.09%); split: -6.25%, +1.16%
STPs: 24568 -> 24215 (-1.44%)
LDPs: 37799 -> 37468 (-0.88%)
Early-preamble: 115698 -> 113694 (-1.73%); split: +0.17%, -1.90%
Cat0: 9345228 -> 9367782 (+0.24%); split: -1.41%, +1.65%
Cat1: 2445265 -> 2549122 (+4.25%); split: -0.81%, +5.06%
Cat2: 18704736 -> 18377519 (-1.75%); split: -1.76%, +0.01%
Cat3: 14210303 -> 14130558 (-0.56%); split: -0.56%, +0.00%
Cat4: 1346895 -> 1346462 (-0.03%); split: -0.03%, +0.00%
Cat5: 1420418 -> 1420417 (-0.00%); split: -0.07%, +0.07%
Cat6: 745590 -> 549358 (-26.32%); split: -26.66%, +0.34%
Cat7: 1405795 -> 1401899 (-0.28%); split: -0.96%, +0.68%
Totals from 79089 (48.19% of 164134) affected shaders:
MaxWaves: 947648 -> 979576 (+3.37%); split: +3.40%, -0.03%
Instrs: 38664140 -> 38183027 (-1.24%); split: -1.47%, +0.22%
CodeSize: 80179110 -> 80525992 (+0.43%); split: -1.27%, +1.70%
NOPs: 6880907 -> 6903265 (+0.32%); split: -1.91%, +2.23%
MOVs: 1183855 -> 1227116 (+3.65%); split: -1.78%, +5.43%
Full: 1107056 -> 1041628 (-5.91%); split: -5.96%, +0.05%
(ss): 939342 -> 912196 (-2.89%); split: -4.28%, +1.39%
(sy): 457959 -> 417281 (-8.88%); split: -9.73%, +0.85%
(ss)-stall: 3664495 -> 3652418 (-0.33%); split: -2.47%, +2.14%
(sy)-stall: 12266805 -> 11358193 (-7.41%); split: -9.10%, +1.69%
STPs: 7494 -> 7141 (-4.71%)
LDPs: 7050 -> 6719 (-4.70%)
Early-preamble: 46339 -> 44335 (-4.32%); split: +0.43%, -4.75%
Cat0: 7548630 -> 7571184 (+0.30%); split: -1.75%, +2.05%
Cat1: 1823872 -> 1927729 (+5.69%); split: -1.09%, +6.78%
Cat2: 14767716 -> 14440499 (-2.22%); split: -2.22%, +0.01%
Cat3: 10630582 -> 10550837 (-0.75%); split: -0.75%, +0.00%
Cat4: 1150090 -> 1149657 (-0.04%); split: -0.04%, +0.00%
Cat5: 1068913 -> 1068912 (-0.00%); split: -0.09%, +0.09%
Cat6: 554910 -> 358678 (-35.36%); split: -35.82%, +0.45%
Cat7: 1119427 -> 1115531 (-0.35%); split: -1.20%, +0.86%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34441>
2025-04-14 17:22:46 +00:00
Connor Abbott
2f93137308
nir/opt_preamble: Handle load_global_ir3
fossil-db results with turnip:
Totals from 994 (0.60% of 165023) affected shaders:
MaxWaves: 10720 -> 11528 (+7.54%); split: +7.57%, -0.04%
Instrs: 1032004 -> 972314 (-5.78%); split: -5.99%, +0.21%
CodeSize: 1847536 -> 1942472 (+5.14%); split: -0.11%, +5.25%
NOPs: 261089 -> 233279 (-10.65%); split: -10.89%, +0.23%
MOVs: 57217 -> 51434 (-10.11%); split: -14.11%, +4.00%
Full: 16412 -> 14647 (-10.75%); split: -10.96%, +0.21%
(ss): 23330 -> 25594 (+9.70%); split: -5.51%, +15.21%
(sy): 17803 -> 15711 (-11.75%); split: -11.93%, +0.18%
(ss)-stall: 96387 -> 107976 (+12.02%); split: -5.14%, +17.17%
(sy)-stall: 952952 -> 765754 (-19.64%); split: -19.84%, +0.19%
STPs: 494 -> 327 (-33.81%)
LDPs: 1447 -> 1163 (-19.63%)
Early-preamble: 668 -> 22 (-96.71%)
Cat0: 280935 -> 251779 (-10.38%); split: -10.60%, +0.22%
Cat1: 93400 -> 84766 (-9.24%); split: -11.79%, +2.55%
Cat2: 343880 -> 337270 (-1.92%); split: -3.20%, +1.28%
Cat3: 189311 -> 180918 (-4.43%)
Cat4: 21008 -> 19920 (-5.18%)
Cat5: 17788 -> 17783 (-0.03%)
Cat6: 45786 -> 39531 (-13.66%)
Cat7: 39896 -> 40347 (+1.13%); split: -0.43%, +1.56%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34483>
2025-04-14 16:53:34 +00:00
Connor Abbott
ec780eb0e7
ir3: Pass through access flags when lowering global accesses
This will let us do optimizations such as moving loads to a preamble.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34483>
2025-04-14 16:53:34 +00:00
Boris Brezillon
b7ff9dddd4
pan/earlyzs: Fix the read-only ZS optimization
The read-only ZS optimization can only be applied if the ZS tile buffer
is not written, which can only be known once the fixed-function settings
are set.
Change pan_earlyzs_get() to take an enum instead of a boolean, and
differentiate between ZS-read and ZS-read-with-readonly-optimization-allowed.
Fixes: 25a993731087 ("pan/earlyzs: Support the shader ZS read-only case and its optimization on v10+")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34480>
2025-04-14 15:20:06 +00:00
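A minimal C sketch of why an enum fits better than a boolean here: "the shader reads ZS" and "the read-only optimization is allowed" are separate facts, and the second one depends on fixed-function state. Every name below (zs_read_mode, choose_zs_read_mode() and its parameters) is hypothetical, and modeling "the ZS tile buffer is not written" as two write-enable flags is an assumption; this is not the actual pan_earlyzs_get() interface.

```c
#include <stdbool.h>

/* Hypothetical three-state replacement for a "shader reads ZS" boolean. */
enum zs_read_mode {
   ZS_READ_NONE,           /* shader does not read depth/stencil */
   ZS_READ,                /* shader reads ZS, no read-only optimization */
   ZS_READ_RO_OPT_ALLOWED, /* shader reads ZS and the tile buffer is never written */
};

/* The read-only optimization is only safe when the fixed-function state
 * guarantees the ZS tile buffer is not written. Assumption: that state is
 * modeled here as separate depth and stencil write-enable flags. */
static enum zs_read_mode
choose_zs_read_mode(bool shader_reads_zs, bool depth_write_enabled,
                    bool stencil_write_enabled)
{
   if (!shader_reads_zs)
      return ZS_READ_NONE;
   if (!depth_write_enabled && !stencil_write_enabled)
      return ZS_READ_RO_OPT_ALLOWED;
   return ZS_READ;
}
```

Because the decision needs the write-enable state, it can only be made once that state is known, which is the ordering problem the commit describes.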
Eric R. Smith
69a6db4b2b
panfrost: fix transaction elimination crc valid calculation
The setting of the clean_pixel_write_enable flag in pan_prepare_rt
was not consistent with the CRC-valid calculation in pan_emit_fbd.
As a result, the crc_valid flag was inaccurate, which caused transaction
elimination to fail.
Fixes: eac8f1d460 ("Revert "panfrost: Disable CRC by default"")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34408>
2025-04-14 14:56:35 +00:00
Adam Jackson
c4b305079d
meson: Simplify the power8 optimization logic
If it compiles, it works. And there's not a particularly good reason to
disable it, so don't let people disable it.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Dylan Baker <dylan.c.baker@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34239>
2025-04-14 14:12:30 +00:00
Maíra Canal
3122df666e
broadcom/simulator: Fix Indirect CSD jobs for V3D 7.1.6+
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34465>
2025-04-14 12:13:30 +00:00
Maíra Canal
d3ad4e3465
broadcom/simulator: Expose V3D revision number in the simulator interface
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34465>
2025-04-14 12:13:30 +00:00
Erik Faye-Lund
1d5da22dfd
nir/lower_tex: avoid undefined behavior
When texture_index or sampler_index is 32 or greater, we can't check it
against a single 32-bit mask, because shifting a 32-bit value by 32 or
more bits is undefined behavior. This happens, among other cases, when
Panfrost uses preload shaders on v9 and later.
We already guard against this for textures in one case; let's be consistent.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34365>
2025-04-14 11:22:43 +00:00
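The undefined behavior comes from C's shift rules: shifting a 32-bit value left by 32 or more bits is undefined, so a test like `mask & (1u << index)` is only valid for indices 0 to 31. One way to guard such a test is sketched below in C; index_in_mask() is a hypothetical helper and not the code from the commit, and treating out-of-range indices as "not set" is an assumption about the desired semantics.

```c
#include <stdbool.h>
#include <stdint.h>

/* Test bit `index` of a 32-bit mask without undefined behavior.
 * Indices that cannot be represented in the mask are reported as "not set"
 * instead of being used as a shift amount. */
static bool
index_in_mask(uint32_t mask, unsigned index)
{
   if (index >= 32)
      return false;
   return (mask & (UINT32_C(1) << index)) != 0;
}
```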
Erik Faye-Lund
41b136f674
nir/lower_tex: use texture_mask instead of shifting on use
In commit 292ac71a4a ("nir/lower_tex: handle deref casts"), we avoided
using texture_index when a texture instruction contained a variable
deref. There's no good reason to do this for some of the lowering but
not all of it.
So let's fix up the code paths that were added after this change to do
the same.
The first two commits in the Fixes tags below crossed paths with the
commit that introduced texture_mask, so it's unsurprising that the change
was missed. The last one seems to have just copied what was done around
it, propagating the issue.
Fixes: 880b00dc59 ("nir/lower_tex: Add support for lowering YUYV formats")
Fixes: 1358d93650 ("nir/lower_tex: Add support for lowering Y41x formats")
Fixes: 65d6f5aed2 ("nir: add options to lower y_vu, yv_yu, yx_xvxu and xy_vxux")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34365>
2025-04-14 11:22:43 +00:00
Vignesh Raman
8e069e1ef9
ci: Uprev kernel to 6.14
Move to 6.14 for all mesa-ci jobs using gfx-ci/linux, except anv-jsl and
Raven.
Signed-off-by: Vignesh Raman <vignesh.raman@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34401>
2025-04-14 10:53:50 +00:00
Philipp Zabel
39855a8fd1
teflon: Log (un)supported operations
Log all operations with the information used to decide whether they
are supported or unsupported. Include tensor data types, conv2d fused
activation, and dilation parameters in the debug output.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34472>
2025-04-14 10:33:38 +00:00
Philipp Zabel
f23b376e84
etnaviv/ml: Fix padding input/output tensor zero points
For tensors that were converted from signed 8-bit tensors to unsigned
8-bit tensors with an offset zero point, also use the offset zero point
for the TP pad operation.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34474>
2025-04-14 09:16:29 +00:00
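To see why the pad value matters, here is a small C sketch under the usual asymmetric quantization model (real = scale * (q - zero_point)), assuming the int8-to-uint8 conversion adds 128 to both the data and the zero point. The function names are hypothetical, and the sketch says nothing about how the TP unit itself is programmed.

```c
#include <stdint.h>

/* Assumed model: real = scale * (q - zero_point). Converting int8 data to
 * uint8 adds 128 to every element AND to the zero point, which leaves every
 * real value unchanged. */
static uint8_t
s8_to_u8(int8_t q)
{
   return (uint8_t)(q + 128);
}

static uint8_t
zero_point_s8_to_u8(int32_t zero_point)
{
   return (uint8_t)(zero_point + 128);
}

static float
dequantize_u8(uint8_t q, uint8_t zero_point, float scale)
{
   return scale * (float)((int)q - (int)zero_point);
}

/* Padding must encode real 0.0, so for a converted tensor the pad value is
 * the offset zero point, not the original signed one. */
static uint8_t
pad_value_for_converted_tensor(int32_t signed_zero_point)
{
   return zero_point_s8_to_u8(signed_zero_point);
}
```

With a signed zero point of -5, the converted zero point is 123, so padding with 123 decodes to 0.0; a pad value based on the unconverted zero point would not.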
Philipp Zabel
13a120d13c
etnaviv/ml: Drop duplicated function reorder_for_hw_depthwise()
This function is unused, remove it.
An identical copy is found (and used) in etnaviv_ml_nn.c.
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34471>
2025-04-14 08:59:15 +00:00
Samuel Pitoiset
8ea46b14fa
ci: update VKCTS main to 76c1572eaba42d7ddd9bb8eb5788e52dd932068e
RADV is the only driver using VKCTS main.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34299>
2025-04-14 08:24:14 +00:00
Samuel Pitoiset
410f7f9f6e
radv: only enable DCC for invisible VRAM on GFX12
DCC should only be allowed on invisible VRAM; otherwise the CPU could
read the data, and it would read garbage if the data is compressed.
This also caused GPU hangs after suspend/resume, probably because
some buffers were compressed when moved back from GTT to VRAM.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12962
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12922
Fixes: 9af11bf306 ("radv: add initial DCC support on GFX12")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34347>
2025-04-14 07:39:33 +00:00
Samuel Pitoiset
75be860eec
radv: use paired context regs when optimal on GFX12
CP is very slow on GFX12, and parsing packet headers is the main
bottleneck. Using paired context regs reduces the number of packet
headers, so it should be faster.
It doesn't seem worth it when only one context reg is emitted (one packet
header and the same number of DWORDS either way) or when consecutive
context regs are emitted (pairs would increase the number of DWORDS).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34421>
2025-04-14 06:18:13 +00:00
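The trade-off in the paired context regs commit can be checked with a small cost model. This C sketch assumes a SET_CONTEXT_REG packet costs one header dword, one start-offset dword, and one dword per consecutive value (2 + N), and that a SET_CONTEXT_REG_PAIRS packet costs one header plus an (offset, value) pair per register (1 + 2N). These layouts and all names are assumptions for illustration, not RADV code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed packet layouts (illustrative, not taken from RADV):
 *   SET_CONTEXT_REG:       header, start offset, N consecutive values -> 2 + N dwords
 *   SET_CONTEXT_REG_PAIRS: header, then one (offset, value) per reg  -> 1 + 2N dwords
 */
struct emit_cost {
   unsigned headers;
   unsigned dwords;
};

/* Plain packets: one SET_CONTEXT_REG per run of consecutive registers.
 * regs[] holds register offsets in emission order. */
static struct emit_cost
cost_plain(const unsigned *regs, size_t n)
{
   struct emit_cost c = {0, 0};
   for (size_t i = 0; i < n; i++) {
      if (i == 0 || regs[i] != regs[i - 1] + 1) {
         c.headers += 1;
         c.dwords += 2; /* header + start offset */
      }
      c.dwords += 1; /* register value */
   }
   return c;
}

/* Paired packet: a single header covering all n registers. */
static struct emit_cost
cost_pairs(size_t n)
{
   struct emit_cost c = {0, 0};
   if (n > 0) {
      c.headers = 1;
      c.dwords = 1 + 2 * (unsigned)n;
   }
   return c;
}

/* Pairs only pay off when they save packet headers; this rules out a
 * single register and a fully consecutive run. */
static bool
should_use_pairs(const unsigned *regs, size_t n)
{
   return cost_pairs(n).headers < cost_plain(regs, n).headers;
}
```

Under these assumptions, registers {0x10, 0x20, 0x30} need 3 headers and 9 DWORDS as plain packets but 1 header and 7 DWORDS as pairs, while {0x10, 0x11, 0x12} fit in one plain packet of 5 DWORDS, which pairs would grow to 7.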
Samuel Pitoiset
f92f50c58a
radv: add macros for paired context registers on GFX12
Imported from RadeonSI.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34421>
2025-04-14 06:18:13 +00:00
Job Noorman
35ec960f6f
ir3: run cp after ir3_imm_const_to_preamble
...
Now that ir3_cp has an option to not lower immediates to const
registers, we can use it after ir3_imm_const_to_preamble instead of
manually propagating immediates.
This fixes a lot of missed opportunities for early-preamble as we didn't
propagate the mova1 immediate which a caused a GPR to be used in many
preambles.
Totals:
Instrs: 49704517 -> 49703700 (-0.00%); split: -0.16%, +0.16%
CodeSize: 103917968 -> 103187072 (-0.70%); split: -0.82%, +0.11%
NOPs: 8516944 -> 8511764 (-0.06%); split: -0.78%, +0.72%
MOVs: 1534023 -> 1536385 (+0.15%); split: -1.12%, +1.27%
Full: 1816517 -> 1816548 (+0.00%); split: -0.05%, +0.06%
(ss): 1162108 -> 1161490 (-0.05%); split: -1.03%, +0.98%
(sy): 611398 -> 610311 (-0.18%); split: -0.80%, +0.62%
(ss)-stall: 4384529 -> 4388096 (+0.08%); split: -1.22%, +1.30%
(sy)-stall: 17858701 -> 17837101 (-0.12%); split: -0.87%, +0.74%
STPs: 25096 -> 25491 (+1.57%); split: -0.05%, +1.63%
LDPs: 37635 -> 38030 (+1.05%); split: -0.03%, +1.08%
Preamble Instrs: 12589113 -> 11391946 (-9.51%); split: -9.75%, +0.24%
Early Preamble: 115946 -> 122893 (+5.99%); split: +6.05%, -0.06%
Cat0: 9374513 -> 9370393 (-0.04%); split: -0.71%, +0.67%
Cat1: 2443348 -> 2446546 (+0.13%); split: -0.82%, +0.95%
Cat2: 18731502 -> 18731478 (-0.00%); split: -0.00%, +0.00%
Cat7: 1410092 -> 1410221 (+0.01%); split: -0.61%, +0.62%
Totals from 39189 (23.81% of 164575) affected shaders:
Instrs: 30656115 -> 30655298 (-0.00%); split: -0.26%, +0.26%
CodeSize: 61714230 -> 60983334 (-1.18%); split: -1.37%, +0.19%
NOPs: 6074700 -> 6069520 (-0.09%); split: -1.10%, +1.01%
MOVs: 1010392 -> 1012754 (+0.23%); split: -1.70%, +1.93%
Full: 617108 -> 617139 (+0.01%); split: -0.16%, +0.16%
(ss): 778842 -> 778224 (-0.08%); split: -1.54%, +1.46%
(sy): 362803 -> 361716 (-0.30%); split: -1.35%, +1.05%
(ss)-stall: 3203827 -> 3207394 (+0.11%); split: -1.67%, +1.78%
(sy)-stall: 9507680 -> 9486080 (-0.23%); split: -1.63%, +1.40%
STPs: 23004 -> 23399 (+1.72%); split: -0.06%, +1.77%
LDPs: 33942 -> 34337 (+1.16%); split: -0.04%, +1.20%
Preamble Instrs: 8090918 -> 6893751 (-14.80%); split: -15.18%, +0.38%
Early Preamble: 12246 -> 19193 (+56.73%); split: +57.25%, -0.52%
Cat0: 6656706 -> 6652586 (-0.06%); split: -1.00%, +0.94%
Cat1: 1546399 -> 1549597 (+0.21%); split: -1.30%, +1.50%
Cat2: 11642214 -> 11642190 (-0.00%); split: -0.00%, +0.00%
Cat7: 943911 -> 944040 (+0.01%); split: -0.91%, +0.92%
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34397>
2025-04-14 04:37:28 +00:00
Job Noorman
226ec669d8
ir3/cp: ignore alias sources for sam.s2en
ir3_cp asserts that the first source of a sam.s2en is a collect, which
isn't necessarily true after creating alias registers.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34397>
2025-04-14 04:37:28 +00:00
Job Noorman
1618c2495b
ir3/cp: add option to disable immediate to const lowering
This will allow it to be used after ir3_imm_const_to_preamble so that we
don't have to do the propagation of immediates manually there.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34397>
2025-04-14 04:37:27 +00:00
Job Noorman
6546a40225
ir3: remove spaces in shader stats
The shaderdb scripts don't like them.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34397>
2025-04-14 04:37:27 +00:00
Trigger Huang
1e709dbea3
radeonsi: Change program sequence for perf counters
When following the sample usage described in the AMD_performance_monitor
specification (https://registry.khronos.org/OpenGL/extensions/AMD/AMD_performance_monitor.txt),
the value read from SQ_0004 is always 0, while other counters can be read
successfully.
This patch syncs the programming sequence with the one used in AMDVLK:
https://github.com/GPUOpen-Drivers/AMDVLK/releases/tag/v-2023.Q3.3
With it, SQ_0004 and the other counters can all be read successfully.
Signed-off-by: Trigger Huang <Trigger.Huang@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34360>
2025-04-14 10:23:46 +08:00
Karol Herbst
fc7badeac0
zink: don't apply the map_offset when mapping a staging resource in zink_buffer_map
Fixes regressions in the OpenCL CTS allocation tests.
Fixes: 5d46e2bf3c ("zink: implement unsynchronized staging uploads for buffers")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34494>
2025-04-12 17:42:53 +00:00
Faith Ekstrand
fadac25b0c
nil: Multiply by array_stride_B instead of adding
Fixes: 5577128c83 ("nil: Rewrite the TIC code in Rust")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34495>
2025-04-12 17:04:40 +00:00
Faith Ekstrand
5c81b3546f
nvk/nvkmd: Check the correct flag for the Kepler GART workaround
Fixes: 1db57bb414 ("nvk/nvkmd: Rework memory placement flags")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34495>
2025-04-12 17:04:40 +00:00
Konstantin Seurer
985f5e0875
lavapipe: Do not emit aabb handling if no isec shader is used
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34003>
2025-04-12 17:22:50 +02:00
Konstantin Seurer
7113620625
lavapipe: pre-load tmax
tmax is lowered to scratch with ray tracing pipelines.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34003>
2025-04-12 17:22:44 +02:00
Konstantin Seurer
c1a620ae19
lavapipe: Run nir optimizations on ray tracing pipelines
Improves performance by 10%.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34003>
2025-04-12 17:22:37 +02:00
Konstantin Seurer
cdb2e3d2b5
lavapipe: Prefetch 56 bytes of node data during ray traversal
Almost all node types need around 56 bytes of data. This patch fetches
this data in a less divergent block.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34003>
2025-04-12 17:22:27 +02:00
Konstantin Seurer
676e26aed5
radv: Fix rayTracingPositionFetch with multiple geometries
The fix adds more indirections instead of tracking the primitive address,
to avoid increasing register pressure.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34460>
2025-04-11 22:26:08 +00:00
Aleksi Sapon
77eb58baad
draw: fix gl_PrimitiveID in tessellation
Acked-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33415>
2025-04-11 22:01:05 +00:00
Konstantin Seurer
cb31b5a958
clc,libcl: Clean up CL includes
This patch does a couple of things to make CL integration with drivers
as seamless as possible:
- We pull in opencl-c.h and opencl-c-base.h to stop relying on system
  headers.
- Parts of libcl.h are moved to new headers that are incomplete, CL-safe
  variants of libc headers.
- A couple of util headers are changed to remove now-unnecessary
  __OPENCL_VERSION__ guards and make more headers CL-safe.
- Drivers now include src/compiler/libcl and use headers like
  macros.h and u_math.h instead of libcl.h.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33576>
2025-04-11 21:27:37 +00:00
Konstantin Seurer
a80fab3e87
clc: Allow bitfields
Bitfields are not officially supported by OpenCL, but there is a clang
extension that adds support.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33576>
2025-04-11 21:27:37 +00:00
Konstantin Seurer
ed07aab147
clc: Print errors when initializing clang fails
It's nice to know what actually went wrong.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33576>
2025-04-11 21:27:37 +00:00
Dmitry Baryshkov
b9c6afd3a7
meson: disable SIMD blake optimisations on x32 host
On X.org startup, libgallium crashes on x32 hosts inside
blake3_hash_many_sse41(), most likely because of the different pointer
size. Disable the SIMD BLAKE3 implementation if x32 is detected.
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34453>
2025-04-11 20:57:38 +00:00
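The commit makes this decision in meson. For illustration only, the same x32 check can be expressed at compile time in C: x32 code targets x86-64 but uses 32-bit pointers, and GCC and Clang define both `__x86_64__` and `__ILP32__` when building with `-mx32`. The helper names and the fallback policy below are hypothetical.

```c
#include <stdbool.h>

/* x32: x86-64 instructions with 32-bit pointers and longs. Regular x86-64
 * (LP64) builds define __x86_64__ but not __ILP32__. */
#if defined(__x86_64__) && defined(__ILP32__)
#define HOST_IS_X32 1
#else
#define HOST_IS_X32 0
#endif

static bool
host_is_x32(void)
{
   return HOST_IS_X32;
}

/* Hypothetical policy: use the portable hash implementation on x32. */
static bool
use_simd_hash(void)
{
   return !host_is_x32();
}
```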
Kenneth Graunke
eb1ec9cf8e
brw: Don't assert about MAX_VGRF_SIZE in brw_opt_split_virtual_grfs()
This allows us to create temporary VGRFs that are larger than
MAX_VGRF_SIZE(devinfo), which will be split eventually. They may not
be split on the initial pass, because we may need LOAD_PAYLOAD lowering,
copy propagation, and so on to occur first. So we allow registers to
exceed that size initially.
The "Register allocation relies on split_virtual_grfs()" assertion in
brw_reg_allocate.cpp still asserts that all VGRFs which reach the
register allocator have been properly split.
One case where this is useful is for vectorizing convergent block loads.
We create temporaries to splat the SIMD1 values out to SIMD(N), which
can lead to some very large temporaries. However, copy propagation and
so on ultimately eliminate these and they'll get split down to proper
sizes or elided entirely in the end.
(Note: both this and the prior commits from this merge request are
needed to close the linked issue.)
Cc: mesa-stable
Reviewed-by: Matt Turner <mattst88@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12324
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34461>
2025-04-11 20:34:51 +00:00
Kenneth Graunke
a45583f078
brw: Use live->max_vgrf_size in pre-RA scheduling
Post-RA scheduling doesn't use liveness analysis, so we continue using
MAX_VGRF_SIZE(devinfo). But for pre-RA scheduling, we now use
live->max_vgrf_size.
This helps get us to a place where we can emit arbitrarily large VGRFs
early on in compilation, but which will be split and cleaned up prior to
register allocation. It may also allocate smaller arrays in practice,
since MAX_VGRF_SIZE(devinfo) assumes the worst-case scenario for things
we actually could need to allocate.
Cc: mesa-stable
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34461>
2025-04-11 20:34:51 +00:00
Kenneth Graunke
4b27b5895c
brw: Use live->max_vgrf_size in register coalescing
We already require liveness, so just use the actual maximum size we saw
instead of a hardcoded pessimal size.
Cc: mesa-stable
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34461>
2025-04-11 20:34:51 +00:00
Kenneth Graunke
ea468412f6
brw: Track the largest VGRF size in liveness analysis
We're already looking at this data to calculate the per-component
vars_from_vgrf[] and vgrf_from_vars[] mappings, so just record the
largest VGRF size while we're here. This will allow passes to size
arrays based on the actual size needed, rather than hardcoding some
fixed size. In many cases, MAX_VGRF_SIZE(devinfo) is larger than
necessary, because e.g. vec5 sparse sampling results aren't used.
Not hardcoding this means we can also temporarily handle very large
VGRFs which we know will be split eventually, without having to
increase the maximum which is ultimately used for RA classes.
Cc: mesa-stable
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34461>
2025-04-11 20:34:51 +00:00
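A simplified C model of the liveness change: while building the per-component variable mappings, the largest VGRF size falls out of the same loop at no extra cost. The struct and function names are hypothetical, and the model ignores everything else liveness analysis tracks; it is not the brw implementation.

```c
#include <stdlib.h>

/* Simplified model: VGRF i occupies sizes[i] consecutive per-component
 * variables. Field names are illustrative only. */
struct live_vars_sketch {
   unsigned num_vars;
   unsigned max_vgrf_size;
   unsigned *var_from_vgrf; /* first variable of each VGRF */
   unsigned *vgrf_from_var; /* owning VGRF of each variable */
};

/* Build both mappings and record the largest VGRF size in the same loop.
 * Returns 0 on success, -1 on allocation failure. */
static int
live_vars_sketch_init(struct live_vars_sketch *live,
                      const unsigned *sizes, unsigned num_vgrfs)
{
   unsigned total = 0;
   for (unsigned i = 0; i < num_vgrfs; i++)
      total += sizes[i];

   live->num_vars = 0;
   live->max_vgrf_size = 0;
   live->var_from_vgrf = calloc(num_vgrfs ? num_vgrfs : 1, sizeof(unsigned));
   live->vgrf_from_var = calloc(total ? total : 1, sizeof(unsigned));
   if (!live->var_from_vgrf || !live->vgrf_from_var) {
      free(live->var_from_vgrf);
      free(live->vgrf_from_var);
      return -1;
   }

   for (unsigned i = 0; i < num_vgrfs; i++) {
      live->var_from_vgrf[i] = live->num_vars;
      for (unsigned j = 0; j < sizes[i]; j++)
         live->vgrf_from_var[live->num_vars++] = i;
      if (sizes[i] > live->max_vgrf_size)
         live->max_vgrf_size = sizes[i];
   }
   return 0;
}

static void
live_vars_sketch_fini(struct live_vars_sketch *live)
{
   free(live->var_from_vgrf);
   free(live->vgrf_from_var);
}
```

For VGRF sizes {2, 5, 1}, this yields 8 variables and max_vgrf_size == 5, so a pass sizing a per-VGRF scratch array could use 5 instead of a worst-case constant.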
Alyssa Rosenzweig
4a299bea27
hk: drop soft fault assumption in hk_buffer_addr_range
Fixes test_index_buffer_edge_case_stream_output without soft fault.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34486>
2025-04-11 20:16:01 +00:00
Alyssa Rosenzweig
0f9b396588
hk: advertise sparseResidencyBuffer
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34486>
2025-04-11 20:16:01 +00:00