Commit Graph

18829 Commits

Author SHA1 Message Date
Rhys Perry
20af16b4d8 aco: use MTBUF for 64-bit atomic load/store
A 64-bit atomic load/store should be considered entirely out-of-bounds if
any part of it is out-of-bounds. Since we implemented these as 32-bit vec2
load/store, it would have been possible for the first half to be in-bounds
while the second half is out-of-bounds.

From 9.6.1. Robust Buffer Access of Vulkan 1.4.324 specification:
> Any non-atomic access to a uniform, storage, uniform texel, or storage
> texel buffer wider than 32-bits may be treated as multiple 32-bit
> accesses that are separately bounds checked.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36602>
2025-10-07 17:41:31 +00:00
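
To illustrate the bounds-checking problem described in 20af16b4d8, here is a minimal sketch that models the two checking policies. The function names and the byte-based buffer size are hypothetical and are not taken from ACO or the hardware; the sketch only shows why a 64-bit access split into two separately checked 32-bit halves can be half in-bounds.

```python
DWORD = 4  # bytes per 32-bit element


def split_vec2_in_bounds(offset, buffer_size):
    """Per-dword check: each 32-bit half is bounds checked on its own."""
    return tuple(offset + i * DWORD + DWORD <= buffer_size for i in range(2))


def atomic64_in_bounds(offset, buffer_size):
    """All-or-nothing check: the whole 8-byte access must fit."""
    return offset + 2 * DWORD <= buffer_size


# 12-byte buffer, 64-bit access at byte offset 8:
# bytes 8..11 fit, bytes 12..15 do not.
print(split_vec2_in_bounds(8, 12))  # (True, False)
print(atomic64_in_bounds(8, 12))    # False
```

With the split policy the first half would still be accessed, which is acceptable for non-atomic accesses under the quoted spec text but not for a 64-bit atomic.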
Rhys Perry
f905acfada aco: remove barrier acquire/release workaround
This workaround has existed since ccfe9813fb because NIR
had no atomic loads/stores. This is no longer the case.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36602>
2025-10-07 17:41:31 +00:00
Rhys Perry
271b135b03 aco: set atomic semantic for atomic load/store
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36602>
2025-10-07 17:41:30 +00:00
Rhys Perry
74b807cf58 aco: only workaround load tearing for atomic loads
For non-atomic loads, this situation would require a data race.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36602>
2025-10-07 17:41:30 +00:00
Timur Kristóf
c473b0b551 radv/amdgpu: Allow IB2 when primary CS isn't chained
The primary CS doesn't need to use chaining in order to use IB2.
Allow using IB2 packets when chaining is disabled.

Rationale for this patch:
When chaining is enabled (the default), this patch removes a
useless check.
When chaining is disabled (by noibchaining), this patch allows us
to use IB2 without chaining.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37280>
2025-10-07 15:49:02 +00:00
Timur Kristóf
503963c08c radv/amdgpu: Support IB2 without chaining, enable on GFX6
GFX6 supports IB2 but not chaining within an IB2.

To use IB2 on GFX6, disable chaining in secondary CS,
and emit an IB2 packet for each secondary IB.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37280>
2025-10-07 15:49:02 +00:00
Timur Kristóf
92ba76710d ac/gpu_info: Add can_chain_ib2 field to ac_gpu_info
GFX6 supports IB2, but not chaining inside IB2.
It only supports chaining in IB1.
See waCpIb2ChainingUnsupported in PAL.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37280>
2025-10-07 15:49:01 +00:00
Timur Kristóf
2091db2461 radv/amdgpu: Small cleanup of counting submitted IBs
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37280>
2025-10-07 15:49:01 +00:00
Timur Kristóf
fd5c50664e radv/amdgpu: Emit a single 4 dword NOP in chainable CS buffers
This is a small optimization that should slightly reduce the CP
overhead for all GPUs as we now only emit a single NOP packet
instead of 4.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37280>
2025-10-07 15:49:01 +00:00
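
As a rough sketch of what "a single 4 dword NOP" in fd5c50664e could look like, the snippet below builds one type-3 NOP packet that spans four dwords. It assumes the usual PM4 type-3 header layout (packet type in bits 31:30, body dword count minus one in bits 29:16, opcode in bits 15:8) and a NOP opcode of 0x10; the helper names are hypothetical and this is not the RADV helper itself.

```python
PKT3_NOP = 0x10  # assumed PM4 opcode for NOP


def pkt3_header(opcode, count):
    """Type-3 header; count is the number of body dwords minus one."""
    return (3 << 30) | ((count & 0x3FFF) << 16) | ((opcode & 0xFF) << 8)


def nop_packet(num_dwords):
    """One NOP packet occupying num_dwords dwords in total (header included)."""
    if num_dwords < 2:
        raise ValueError("this sketch only covers packets with a body")
    body = [0] * (num_dwords - 1)
    return [pkt3_header(PKT3_NOP, len(body) - 1)] + body


packet = nop_packet(4)
print([hex(dw) for dw in packet])
```

The command processor then parses one header and skips the body, rather than parsing four separate packets.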
Timur Kristóf
e6a1355bd5 radv/amdgpu: Add a helper function to emit NOP packets
No functional changes, just make the code a bit easier to read.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37280>
2025-10-07 15:49:00 +00:00
Timur Kristóf
e20080315b radv/amdgpu: Don't assert chaining match when copying secondary IB
This assertion is useless.

In this code path it is not relevant whether or not the primary
CS supports chaining. And it is already handled when the secondary
has chaining.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37280>
2025-10-07 15:49:00 +00:00
Timur Kristóf
df58cac660 radv: Rename RADV_DEBUG=noibs to noibchaining
Clarify what it actually means.
Also fix the documentation in envvars.rst to better describe it.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37280>
2025-10-07 15:48:59 +00:00
Timur Kristóf
3902cffab7 radv/amdgpu: Rename use_ib to chain_ib
All CS always use IBs, so the naming was confusing.

Rename these fields to chain_ib to better reflect
what it actually means, which is enabling chaining:
radv_amdgpu_winsys::use_ib_bos
radv_amdgpu_cs::chain_ib

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37280>
2025-10-07 15:48:59 +00:00
Georg Lehmann
84f26ed117 nir: optimize atomic isub if supported
Foz-DB Navi48:
Totals from 1 (0.00% of 80287) affected shaders:
Instrs: 1641 -> 1637 (-0.24%)
CodeSize: 8472 -> 8456 (-0.19%)
Latency: 19132 -> 19131 (-0.01%)
InvThroughput: 9566 -> 9565 (-0.01%)
Copies: 126 -> 125 (-0.79%)
VALU: 565 -> 563 (-0.35%)
SALU: 439 -> 438 (-0.23%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37702>
2025-10-07 14:07:56 +00:00
Georg Lehmann
d514696a0c aco/isel: support nir_op_atomic_isub
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37702>
2025-10-07 14:07:56 +00:00
Georg Lehmann
65227ef325 ac/llvm: support nir_atomic_op_isub
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37702>
2025-10-07 14:07:56 +00:00
Georg Lehmann
a173e51541 aco/insert_waitcnt: don't merge waitcnts for LDS clauses
We form LDS clauses because heavily interleaving LDS and VALU leads to false
dependencies. But LDS is completely uncached, so splitting the clause with
waitcnts shouldn't hurt, it might even be beneficial because the first
LDS store can start earlier.

Foz-DB Navi48:
Totals from 170 (0.21% of 80287) affected shaders:
Instrs: 239633 -> 240148 (+0.21%)
CodeSize: 1276584 -> 1278532 (+0.15%)
Latency: 3788507 -> 3789876 (+0.04%); split: -0.01%, +0.04%
InvThroughput: 841637 -> 841694 (+0.01%); split: -0.01%, +0.02%

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37701>
2025-10-07 13:12:45 +00:00
Samuel Pitoiset
c177bf81b4 radv: fix expected disk cache size for meta shaders
Math can go wrong.

If the disk cache size is too small, buckets are evicted and this
might cause stuttering when starting applications.

Fixes: 4fc856af98 ("radv: fix caching on-demand meta shaders")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13930
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37718>
2025-10-07 12:50:41 +00:00
Rhys Perry
dfa8ac6b91 aco: remove buffer_load_lds instructions
They don't exist

See https://github.com/llvm/llvm-project/pull/132916

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14041
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37716>
2025-10-07 09:50:26 +00:00
Samuel Pitoiset
08ddf2f878 radv: lower embedded/immutable samplers earlier
Lowering them earlier right after VTN would allow us to implement
embedded samplers for descriptor heap properly for merged shaders.

Non-immediate samplers are still lowered in
radv_nir_apply_pipeline_layout because they require shader arguments.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37688>
2025-10-07 09:25:28 +00:00
Samuel Pitoiset
cb746e2d84 radv: lower ycbcr tex instructions earlier
There is no real advantage in delaying this lowering.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37688>
2025-10-07 09:25:27 +00:00
Samuel Pitoiset
b8bdc68933 radv/ci: update expected list of failures for VEGA10/NAVI10
Since a8f4a2a9ba ("radv/video: Check FW version before using
WRITE_MEMORY") presumably.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37733>
2025-10-07 08:06:54 +00:00
Benjamin Cheng
364a2488ad radv/video: Report extra image usages
ENCODE_SRC and DECODE_DST are transparent and can have additional
usages.

Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37656>
2025-10-06 21:27:48 +00:00
Benjamin Cheng
d1872c45ae radv/video: Fix video profile reporting
Use vk_video_is_profile_supported first, and add AMD specific
restrictions later.

vulkaninfo reports on Navi31:
    H.264 Decode (4:2:0 8-bit) Baseline progressive
    H.264 Decode (4:2:0 8-bit) Main progressive
    H.264 Decode (4:2:0 8-bit) High progressive
    H.264 Decode (4:2:0 8-bit) Baseline interlaced (interleaved lines)
    H.264 Decode (4:2:0 8-bit) Main interlaced (interleaved lines)
    H.264 Decode (4:2:0 8-bit) High interlaced (interleaved lines)
    H.264 Decode (monochrome 8-bit) High progressive
    H.264 Decode (monochrome 8-bit) High interlaced (interleaved lines)
    H.265 Decode (4:2:0 8-bit) Main
    H.265 Decode (4:2:0 8-bit) Main 10
    H.265 Decode (4:2:0 8-bit) Main Still Picture
    H.265 Decode (4:2:0 10-bit) Main 10
    VP9 Decode (4:2:0 8-bit) Profile 0
    VP9 Decode (4:2:0 10-bit) Profile 2
    AV1 Decode (4:2:0 8-bit) Main with film grain support
    AV1 Decode (4:2:0 8-bit) Main without film grain support
    AV1 Decode (4:2:0 10-bit) Main with film grain support
    AV1 Decode (4:2:0 10-bit) Main without film grain support
    AV1 Decode (4:2:0 12-bit) Professional with film grain support
    AV1 Decode (4:2:0 12-bit) Professional without film grain support
    AV1 Decode (monochrome 8-bit) Main with film grain support
    AV1 Decode (monochrome 8-bit) Main without film grain support
    AV1 Decode (monochrome 10-bit) Main with film grain support
    AV1 Decode (monochrome 10-bit) Main without film grain support
    AV1 Decode (monochrome 12-bit) Professional with film grain support
    AV1 Decode (monochrome 12-bit) Professional without film grain support
    H.264 Encode (4:2:0 8-bit) Baseline
    H.264 Encode (4:2:0 8-bit) Main
    H.264 Encode (4:2:0 8-bit) High
    H.265 Encode (4:2:0 8-bit) Main
    H.265 Encode (4:2:0 8-bit) Main 10
    H.265 Encode (4:2:0 8-bit) Main Still Picture
    H.265 Encode (4:2:0 10-bit) Main 10
    AV1 Encode (4:2:0 8-bit) Main
    AV1 Encode (4:2:0 10-bit) Main

Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37656>
2025-10-06 21:27:48 +00:00
David Rosca
59a3ca2333 radv/video: Fix waiting on encode feedback query
Currently we wait until the second dword in the feedback buffer changes
from 0 to 1, and then the rest of the feedback is read. There is no
guarantee that the rest of the feedback will be available, which can
cause the bitstream size to be incorrectly returned as 0.

Add a write memory command after encode that marks the query as
available, to ensure the entire feedback buffer is ready.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13601
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36772>
2025-10-06 10:32:54 +00:00
David Rosca
a8f4a2a9ba radv/video: Check FW version before using WRITE_MEMORY
Move the version check to a separate function so that it can
also be used elsewhere.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36772>
2025-10-06 10:32:54 +00:00
David Rosca
40c124e67a radv: Change radv_vcn_write_event to a write memory func
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36772>
2025-10-06 10:32:53 +00:00
Samuel Pitoiset
874bc09537 radv: reserve more CS space when executing DGC calls
This can trigger an assert otherwise. The space reserved before
executing DGC IBs is an arbitrary number which should be large enough
in all cases.

Found this while implementing descriptor heap.

Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37681>
2025-10-06 06:28:18 +00:00
Bas Nieuwenhuizen
82d06b58ad radv: use vk_drm_syncobj_copy_payloads
Based on a patch by llyyr <llyyr.public@gmail.com>:

!36827 added the copy_sync_payloads function, but didn't enable use of
it in radv. This commit mirrors similar MRs for anv/panvk/nvk and uses
the common vk_drm_syncobj_copy_payloads function for copy_sync_payloads.

I'm not too familiar with radv internals, so there's potentially a good
reason why this isn't a good change. However, I've personally been using
this patch locally for around a month and have experienced no
regressions and around 8% uplift on vkmark test scores with a 6600 XT.

[vertex] device-local=true: 45110 -> 48489 (+7.5%)
[vertex] device-local=false: 17529 -> 17488 (-0.2%)
[texture] anisotropy=0: 44768 -> 48679 (+8.7%)
[texture] anisotropy=16: 44920 -> 48572 (+8.1%)
[shading] shading=gouraud: 44931 -> 48467 (+7.9%)
[shading] shading=blinn-phong-inf: 44849 -> 48740 (+8.7%)
[shading] shading=phong: 44695 -> 48645 (+8.8%)
[shading] shading=cel: 44809 -> 47938 (+7.0%)
[effect2d] kernel=edge: 45185 -> 47837 (+5.9%)
[effect2d] kernel=blur: 26919 -> 26762 (-0.6%)
[desktop] <default>: 40974 -> 44034 (+7.5%)
[cube] <default>: 45090 -> 49270 (+9.3%)
[clear] <default>: 41102 -> 44375 (+8.0%)

(https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37606)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37640>
2025-10-06 00:45:09 +00:00
Yinjie Yao
f0f95a9ae3 ac/parse_ib: Update vcn ib parser to include missing commands
Signed-off-by: Yinjie Yao <yinjie.yao@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37672>
2025-10-03 14:44:07 +00:00
Samuel Pitoiset
38892cb558 radv: only expose AMD_device_coherent_memory if actually supported
This fixes an issue after a recent update to
dEQP-VK.info.device_mandatory_features.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37663>
2025-10-03 14:26:32 +00:00
Samuel Pitoiset
e2db50c97b Revert "radv/ci: document recent unexpected failures on TAHITI"
This reverts commit abd2a79264.

Fixed by 93ce29c42e ("amd: don't allow unsigned wraps for shared
memory offsets on GFX6").

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37685>
2025-10-03 13:37:16 +02:00
Daniel Schürmann
0e3bc3d8c0 nir/opt_offsets: call allow_offset_wrap() for try_fold_shared2()
This prevents applying wrapping offsets on GFX6.

Fixes: e1a692f74b ('nir/opt_offsets: allow for unsigned wraps when folding load/store_shared2_amd offsets')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37667>
2025-10-03 07:54:12 +00:00
Daniel Schürmann
93ce29c42e amd: don't allow unsigned wraps for shared memory offsets on GFX6
Fixes: 10266e7b21 ('radv: allow for unsigned wraps for shared memory intrinsics in nir_opt_offsets')
Fixes: dd68825feb ('radeonsi: allow for unsigned wraps for shared memory intrinsics in nir_opt_offsets')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37667>
2025-10-03 07:54:12 +00:00
abdelhadi
5c82a3e114 aco: fix debug info offset
Signed-off-by: abdelhadi <abdelhadims@icloud.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37244>
2025-10-02 13:38:56 +00:00
Samuel Pitoiset
abd2a79264 radv/ci: document recent unexpected failures on TAHITI
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37664>
2025-10-02 13:10:32 +00:00
Vitaliy Triang3l Kuzmin
dea20be1b3 ac: Enable HTILE TC Z clear value bug workaround on GFX1013
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Vitaliy Triang3l Kuzmin <triang3l@yandex.ru>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33962>
2025-10-02 08:29:50 +00:00
Vitaliy Triang3l Kuzmin
4e3a5f60e1 radv,ac: Split has_tc_compat_zrange_bug into Z and ZS, document it
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Vitaliy Triang3l Kuzmin <triang3l@yandex.ru>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33962>
2025-10-02 08:29:49 +00:00
Vitaliy Triang3l Kuzmin
5243f292ef radv,ac: GFX10 depth/stencil HTILE mipmap bug info variable
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Vitaliy Triang3l Kuzmin <triang3l@yandex.ru>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33962>
2025-10-02 08:29:48 +00:00
Georg Lehmann
9533e7cdae aco/optimizer: fix incorrect operand order assumption for neg(mul) opt
The code that labels instructions doesn't care about the order either.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14013
Cc: mesa-stable

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37643>
2025-10-01 20:52:12 +00:00
Natalie Vock
52c7b0d20c radv/bvh: Encode empty AS bounds as NaN
If there are no leaves, the root node bounds still span -inf/inf.
Making empty BLASs infinite-sized guarantees ray traversal needs to
enter the BLAS (and immediately exit because it's empty). Remove the
BLAS from the BVH entirely by marking its bounds as NaN. As a bonus,
this works around RADV encountering issues in Silent Hill 2 on RDNA4 due
to infinite-sized BVHs.

Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37492>
2025-10-01 14:27:15 +00:00
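
The effect described in 52c7b0d20c can be sketched with a generic slab test, assuming the box test is written so that any comparison involving NaN counts as a miss. The function below is a hypothetical software model, not the hardware intersection unit or RADV's traversal code.

```python
import math


def ray_hits_aabb(origin, inv_dir, lo, hi):
    """Slab test; inv_dir holds the per-axis reciprocal of the ray direction."""
    t_near, t_far = 0.0, math.inf
    for o, inv, l, h in zip(origin, inv_dir, lo, hi):
        t0, t1 = (l - o) * inv, (h - o) * inv
        if t0 > t1:
            t0, t1 = t1, t0
        # Phrased so that a NaN operand makes the comparison false: a miss.
        if not (t0 <= t_far and t1 >= t_near):
            return False
        t_near, t_far = max(t_near, t0), min(t_far, t1)
    return True


origin, inv_dir = (0.0, 0.0, 0.0), (1.0, 1.0, 1.0)
inf_box = ((-math.inf,) * 3, (math.inf,) * 3)
nan_box = ((math.nan,) * 3, (math.nan,) * 3)
print(ray_hits_aabb(origin, inv_dir, *inf_box))  # True: traversal must enter
print(ray_hits_aabb(origin, inv_dir, *nan_box))  # False: box is never entered
```

An infinite box is hit by every ray, while NaN bounds fail every ordered comparison, so the empty BLAS is effectively culled.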
Samuel Pitoiset
29ccbb21f3 radv: add a helper whether shader fp16 is enabled
To remove code duplication.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37619>
2025-09-29 16:17:11 +00:00
Timur Kristóf
d3579190d6 ac/nir/ngg: Fix scalarized mesh primitive indices
Take the write_mask into account when storing primitive indices,
otherwise they will end up being stored in the wrong place.

Fixes: 8e24d3426d ("ac/nir/ngg: Refactor MS primitive indices for scalarized IO.")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37610>
2025-09-29 08:07:54 +00:00
Timur Kristóf
3dc9c1a91e ac/nir/ngg: Remove dead code for 64-bit mesh shader variables
We already lower all 64-bit I/O to 32-bit before this pass,
and the rest of the code here already asserts that I/O variables
must be 32-bit or smaller.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37610>
2025-09-29 08:07:54 +00:00
Georg Lehmann
a7f8c6ed60 radv: call nir_opt_undef late too
Foz-DB GFX1201:
Totals from 2263 (2.82% of 80287) affected shaders:
MaxWaves: 57164 -> 57016 (-0.26%); split: +0.04%, -0.30%
Instrs: 2711595 -> 2678247 (-1.23%); split: -1.23%, +0.00%
CodeSize: 14066656 -> 13929720 (-0.97%); split: -1.01%, +0.03%
VGPRs: 139452 -> 140004 (+0.40%); split: -0.03%, +0.42%
Latency: 15902794 -> 15875935 (-0.17%); split: -0.17%, +0.00%
InvThroughput: 2179122 -> 2165716 (-0.62%); split: -0.62%, +0.00%
SClause: 61416 -> 61477 (+0.10%); split: -0.01%, +0.11%
Copies: 169781 -> 175175 (+3.18%); split: -0.05%, +3.22%
Branches: 53491 -> 53469 (-0.04%)
PreSGPRs: 114087 -> 114086 (-0.00%)
PreVGPRs: 115702 -> 115697 (-0.00%)
VALU: 1555907 -> 1535514 (-1.31%); split: -1.31%, +0.00%
SALU: 362560 -> 353803 (-2.42%)
SMEM: 106263 -> 106259 (-0.00%)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37552>
2025-09-26 15:11:26 +00:00
Georg Lehmann
8343e45467 aco/lower_branches: update branch hints after changing jump targets
Fixes: 13ad3db43f ("aco/lower_branches: implement try_remove_simple_block() in lower_branches()")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37552>
2025-09-26 15:11:26 +00:00
Simon McVittie
9d36bf891b vulkan: Compute path to write into JSON manifests once, use it everywhere
This reduces duplication: we only need to distinguish between Windows
and Unix in one place.

The previous code was inconsistent about using either the `platforms`
option, or the `host_machine`. Following the logic described in
commit 94379377 "lavapipe: build "Windows" check should use the host machine, not the `platforms` option.",
I've assumed that checking the host machine is the more-correct version
and used that.

Signed-off-by: Simon McVittie <smcv@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37576>
2025-09-26 10:47:31 +00:00
Simon McVittie
be8cac52d3 vulkan: Consistently form driver library names as prefix + name + suffix
This consistently uses `NAME.dll` on Windows, `libNAME.dylib` on Darwin
derivatives such as macOS, and `libNAME.so` on Linux, *BSD and so on.
It's also consistent about using the local variable name `icd_file_name`
for this name in every Vulkan driver, which was already the case in many
but not all drivers.

Some of these drivers probably don't make sense (or don't work) on
Windows and/or macOS, but if this is kept consistent for all drivers,
it should avoid the need for driver-specific commits like
commit 611e9f29e "lavapipe: fix icd generation for windows",
commit 951f3287 "lavapipe: set empty dll prefix",
commit 13e7a39f "lavapipe: fixes for macOS support",
commit 7008e655 "radv: Update JSON generator if Windows" and so on,
each time a driver is found to be relevant on more platforms than
previously believed.

Signed-off-by: Simon McVittie <smcv@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37576>
2025-09-26 10:47:31 +00:00
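
The naming rule from be8cac52d3 (prefix + name + suffix) can be sketched as follows. The real change lives in the Meson build files; this Python function, its `system` strings, and the example library name are only illustrative assumptions.

```python
def icd_file_name(name, system):
    """Form a driver library file name as prefix + name + suffix."""
    if system == "windows":
        prefix, suffix = "", ".dll"
    elif system == "darwin":
        prefix, suffix = "lib", ".dylib"
    else:  # Linux, *BSD and so on
        prefix, suffix = "lib", ".so"
    return prefix + name + suffix


for system in ("windows", "darwin", "linux"):
    print(system, icd_file_name("vulkan_radeon", system))
```

Keeping the platform distinction in one place means a driver that later turns out to be relevant on another platform needs no driver-specific fix.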
Georg Lehmann
cc08786689 aco: use maximum RT vgpr_limit that doesn't reduce wave count
144 instead of 132 with 5 waves, in practice.

Foz-DB Navi31:
Totals from 33 (0.04% of 80273) affected shaders:
Instrs: 3266241 -> 3261329 (-0.15%)
CodeSize: 16885356 -> 16860088 (-0.15%)
VGPRs: 4356 -> 4752 (+9.09%)
SpillVGPRs: 2504 -> 1535 (-38.70%)
Scratch: 264704 -> 216320 (-18.28%)
Latency: 18445909 -> 18395904 (-0.27%)
InvThroughput: 3689182 -> 3679182 (-0.27%)
VClause: 85171 -> 84595 (-0.68%)
SClause: 59365 -> 59320 (-0.08%); split: -0.08%, +0.01%
Copies: 260528 -> 259113 (-0.54%); split: -0.59%, +0.05%
Branches: 92537 -> 92519 (-0.02%)
VALU: 1937426 -> 1935925 (-0.08%); split: -0.08%, +0.01%
SALU: 393075 -> 393047 (-0.01%); split: -0.01%, +0.01%
VMEM: 147914 -> 146003 (-1.29%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37548>
2025-09-26 08:45:05 +00:00
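
The "144 instead of 132 with 5 waves" figure in cc08786689 is consistent with the arithmetic below, assuming a budget of 768 VGPRs per lane and an allocation granule of 12. Both numbers are assumptions chosen for illustration and are not stated in the commit; the function names are hypothetical.

```python
def waves_for_vgprs(physical_vgprs, granule, vgprs):
    """Waves that fit when each wave allocates vgprs rounded up to the granule."""
    allocated = -(-vgprs // granule) * granule  # round up to a granule multiple
    return physical_vgprs // allocated


def max_vgprs_for_waves(physical_vgprs, granule, waves):
    """Largest granule-aligned VGPR count that still allows `waves` waves."""
    return (physical_vgprs // waves) // granule * granule


PHYSICAL, GRANULE = 768, 12  # assumed values, see the note above
print(waves_for_vgprs(PHYSICAL, GRANULE, 132))    # 5
print(max_vgprs_for_waves(PHYSICAL, GRANULE, 5))  # 144
print(waves_for_vgprs(PHYSICAL, GRANULE, 156))    # 4
```

Under these assumptions, raising the limit from 132 to 144 keeps 5 waves resident, while the next granule step would drop to 4.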
Georg Lehmann
8e03505782 aco: don't insert s_sendmsg dealloc_vgprs with little vgprs allocated
Reduces message bus traffic when the benefit is small.

Foz-DB Navi31:
Totals from 3752 (4.67% of 80273) affected shaders:
Instrs: 1999755 -> 1992249 (-0.38%)
CodeSize: 10531824 -> 10501800 (-0.29%)
Latency: 14935247 -> 14935147 (-0.00%)
InvThroughput: 5976053 -> 5975262 (-0.01%)

Foz-DB Navi33:
Totals from 2614 (3.26% of 80273) affected shaders:
Instrs: 969475 -> 964247 (-0.54%)
CodeSize: 5171240 -> 5150328 (-0.40%)
Latency: 7891519 -> 7891434 (-0.00%)
InvThroughput: 4815008 -> 4814287 (-0.01%); split: -0.01%, +0.00%

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37508>
2025-09-26 07:51:02 +00:00