AlexIndustrial/mesa

Author	SHA1	Message	Date
Rhys Perry	20af16b4d8	aco: use MTBUF for 64-bit atomic load/store A 64-bit atomic load/store should be considered entirely out-of-bounds if any part of it is out-of-bounds. Since we implemented these as 32-bit vec2 load/store, it would have been possible for the first half to be in-bounds while the second half is out-of-bounds. From 9.6.1. Robust Buffer Access of Vulkan 1.4.324 specification: > Any non-atomic access to a uniform, storage, uniform texel, or storage > texel buffer wider than 32-bits may be treated as multiple 32-bit > accesses that are separately bounds checked. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36602>	2025-10-07 17:41:31 +00:00
Rhys Perry	f905acfada	aco: remove barrier acquire/release workaround This existed since `ccfe9813fb` because NIR had no atomic loads/stores. This is no longer the case. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36602>	2025-10-07 17:41:31 +00:00
Rhys Perry	271b135b03	aco: set atomic semantic for atomic load/store Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36602>	2025-10-07 17:41:30 +00:00
Rhys Perry	74b807cf58	aco: only workaround load tearing for atomic loads For non-atomic loads, this situation would require a data race. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36602>	2025-10-07 17:41:30 +00:00
Timur Kristóf	c473b0b551	radv/amdgpu: Allow IB2 when primary CS isn't chained The primary CS doesn't need to use chaining in order to use IB2. Allow using IB2 packets when chaining is disabled. Rationale for this patch: When chaining is enabled (the default), this patch removes a useless check. When chaining is disabled (by noibchaining), this patch allows us to use IB2 without chaining. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37280>	2025-10-07 15:49:02 +00:00
Timur Kristóf	503963c08c	radv/amdgpu: Support IB2 without chaining, enable on GFX6 GFX6 supports IB2 but not chaining within an IB2. To use IB2 on GFX6, disable chaining in secondary CS, and emit an IB2 packet for each secondary IB. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37280>	2025-10-07 15:49:02 +00:00
Timur Kristóf	92ba76710d	ac/gpu_info: Add can_chain_ib2 field to ac_gpu_info GFX6 supports IB2, but not chaining inside IB2. It only supports chaining in IB1. See waCpIb2ChainingUnsupported in PAL. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37280>	2025-10-07 15:49:01 +00:00
Timur Kristóf	2091db2461	radv/amdgpu: Small cleanup of counting submitted IBs Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37280>	2025-10-07 15:49:01 +00:00
Timur Kristóf	fd5c50664e	radv/amdgpu: Emit a single 4 dword NOP in chainable CS buffers This is a small optimization that should slightly reduce the CP overhead for all GPUs as we now only emit a single NOP packet instead of 4. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37280>	2025-10-07 15:49:01 +00:00
Timur Kristóf	e6a1355bd5	radv/amdgpu: Add a helper function to emit NOP packets No functional changes, just make the code a bit easier to read. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37280>	2025-10-07 15:49:00 +00:00
Timur Kristóf	e20080315b	radv/amdgpu: Don't assert chaining match when copying secondary IB This assertion is useless. In this code path it is not relevant whether or not the primary CS support chaining. And it is already handled when the secondary has chaining. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37280>	2025-10-07 15:49:00 +00:00
Timur Kristóf	df58cac660	radv: Rename RADV_DEBUG=noibs to noibchaining Clarify what it actually means. Also fix the documentation in envvars.rst to better describe it. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37280>	2025-10-07 15:48:59 +00:00
Timur Kristóf	3902cffab7	radv/amdgpu: Rename use_ib to chain_ib All CS always use IBs, so the naming was confusing. Rename these fields to chain_ib to better reflect what it actually means, which is enabling chaining: radv_amdgpu_winsys::use_ib_bos radv_amdgpu_cs::chain_ib Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37280>	2025-10-07 15:48:59 +00:00
Georg Lehmann	84f26ed117	nir: optimize atomic isub if supported Foz-DB Navi48: Totals from 1 (0.00% of 80287) affected shaders: Instrs: 1641 -> 1637 (-0.24%) CodeSize: 8472 -> 8456 (-0.19%) Latency: 19132 -> 19131 (-0.01%) InvThroughput: 9566 -> 9565 (-0.01%) Copies: 126 -> 125 (-0.79%) VALU: 565 -> 563 (-0.35%) SALU: 439 -> 438 (-0.23%) Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37702>	2025-10-07 14:07:56 +00:00
Georg Lehmann	d514696a0c	aco/isel: support nir_op_atomic_isub Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37702>	2025-10-07 14:07:56 +00:00
Georg Lehmann	65227ef325	ac/llvm: support nir_atomic_op_isub Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37702>	2025-10-07 14:07:56 +00:00
Georg Lehmann	a173e51541	aco/insert_waitcnt: don't merge waitcnts for LDS clauses We form LDS clauses because heavily interleaving LDS and VALU leads to false dependencies. But LDS is completely uncached, so splitting the clause with waitcnts shouldn't hurt, it might even be beneficial because the first LDS store can start earlier. Foz-DB Navi48: Totals from 170 (0.21% of 80287) affected shaders: Instrs: 239633 -> 240148 (+0.21%) CodeSize: 1276584 -> 1278532 (+0.15%) Latency: 3788507 -> 3789876 (+0.04%); split: -0.01%, +0.04% InvThroughput: 841637 -> 841694 (+0.01%); split: -0.01%, +0.02% Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37701>	2025-10-07 13:12:45 +00:00
Samuel Pitoiset	c177bf81b4	radv: fix expected disk cache size for meta shaders Math can go wrong. If the disk cache size is too small, buckets are evicted and this might cause stuttering when starting applications. Fixes: `4fc856af98` ("radv: fix caching on-demand meta shaders") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13930 Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37718>	2025-10-07 12:50:41 +00:00
Rhys Perry	dfa8ac6b91	aco: remove buffer_load_lds instructions They don't exist See https://github.com/llvm/llvm-project/pull/132916 Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14041 Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37716>	2025-10-07 09:50:26 +00:00
Samuel Pitoiset	08ddf2f878	radv: lower embedded/immutable samplers earlier Lowering them earlier right after VTN would allow us to implement embedded samplers for descriptor heap properly for merged shaders. Non-immediate samplers are still lowered in radv_nir_apply_pipeline_layout because they require shader arguments. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37688>	2025-10-07 09:25:28 +00:00
Samuel Pitoiset	cb746e2d84	radv: lower ycbcr tex instructions earlier There is no real advantage to delay this lowering. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37688>	2025-10-07 09:25:27 +00:00
Samuel Pitoiset	b8bdc68933	radv/ci: update expected list of failures for VEGA10/NAVI10 Since `a8f4a2a9ba` ("radv/video: Check FW version before using WRITE_MEMORY") presumably. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37733>	2025-10-07 08:06:54 +00:00
Benjamin Cheng	364a2488ad	radv/video: Report extra image usages ENCODE_SRC and DECODE_DST are transparent and can have additional usages. Reviewed-by: David Rosca <david.rosca@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37656>	2025-10-06 21:27:48 +00:00
Benjamin Cheng	d1872c45ae	radv/video: Fix video profile reporting Use vk_video_is_profile_supported first, and add AMD specific restrictions later. vulkaninfo reports on Navi31: H.264 Decode (4:2:0 8-bit) Baseline progressive H.264 Decode (4:2:0 8-bit) Main progressive H.264 Decode (4:2:0 8-bit) High progressive H.264 Decode (4:2:0 8-bit) Baseline interlaced (interleaved lines) H.264 Decode (4:2:0 8-bit) Main interlaced (interleaved lines) H.264 Decode (4:2:0 8-bit) High interlaced (interleaved lines) H.264 Decode (monochrome 8-bit) High progressive H.264 Decode (monochrome 8-bit) High interlaced (interleaved lines) H.265 Decode (4:2:0 8-bit) Main H.265 Decode (4:2:0 8-bit) Main 10 H.265 Decode (4:2:0 8-bit) Main Still Picture H.265 Decode (4:2:0 10-bit) Main 10 VP9 Decode (4:2:0 8-bit) Profile 0 VP9 Decode (4:2:0 10-bit) Profile 2 AV1 Decode (4:2:0 8-bit) Main with film grain support AV1 Decode (4:2:0 8-bit) Main without film grain support AV1 Decode (4:2:0 10-bit) Main with film grain support AV1 Decode (4:2:0 10-bit) Main without film grain support AV1 Decode (4:2:0 12-bit) Professional with film grain support AV1 Decode (4:2:0 12-bit) Professional without film grain support AV1 Decode (monochrome 8-bit) Main with film grain support AV1 Decode (monochrome 8-bit) Main without film grain support AV1 Decode (monochrome 10-bit) Main with film grain support AV1 Decode (monochrome 10-bit) Main without film grain support AV1 Decode (monochrome 12-bit) Professional with film grain support AV1 Decode (monochrome 12-bit) Professional without film grain support H.264 Encode (4:2:0 8-bit) Baseline H.264 Encode (4:2:0 8-bit) Main H.264 Encode (4:2:0 8-bit) High H.265 Encode (4:2:0 8-bit) Main H.265 Encode (4:2:0 8-bit) Main 10 H.265 Encode (4:2:0 8-bit) Main Still Picture H.265 Encode (4:2:0 10-bit) Main 10 AV1 Encode (4:2:0 8-bit) Main AV1 Encode (4:2:0 10-bit) Main Reviewed-by: David Rosca <david.rosca@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37656>	2025-10-06 21:27:48 +00:00
David Rosca	59a3ca2333	radv/video: Fix waiting on encode feedback query Currently we wait until the second dword in feedback buffer changes from 0 to 1, and then the rest of the feedback is read. There is no guarantee that the rest of the feedback will be available, which can cause bitstream size to be incorrectly returned as 0. Add write memory command after encode, marking the query as available to ensure the entire feedback buffer is ready. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13601 Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36772>	2025-10-06 10:32:54 +00:00
David Rosca	a8f4a2a9ba	radv/video: Check FW version before using WRITE_MEMORY Move the version check to separate function so that it can also be used elsewhere. Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36772>	2025-10-06 10:32:54 +00:00
David Rosca	40c124e67a	radv: Change radv_vcn_write_event to a write memory func Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36772>	2025-10-06 10:32:53 +00:00
Samuel Pitoiset	874bc09537	radv: reserve more CS space when executing DGC calls This can trigger an assert otherwise. The space reserved before executing DGC IBs is an arbitrary number which should be large enough in all cases. Found this while implementing descriptor heap. Cc: mesa-stable Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37681>	2025-10-06 06:28:18 +00:00
Bas Nieuwenhuizen	82d06b58ad	radv: use vk_drm_syncobj_copy_payloads Based on a patch by llyyr <llyyr.public@gmail.com>: !36827 added the copy_sync_payloads function, but didn't enable use of it in radv. This commit mirrors similar MRs for anv/panvk/nvk and uses the common vk_drm_syncobj_copy_payloads function for copy_sync_payloads. I'm not too familiar with radv internals, so there's potentially a good reason why this isn't a good change. However, I've personally been using this patch locally for around a month and have experienced no regressions and around 8% uplift on vkmark test scores with a 6600 XT. [vertex] device-local=true: 45110 -> 48489 (+7.5%) [vertex] device-local=false: 17529 -> 17488 (-0.2%) [texture] anisotropy=0: 44768 -> 48679 (+8.7%) [texture] anisotropy=16: 44920 -> 48572 (+8.1%) [shading] shading=gouraud: 44931 -> 48467 (+7.9%) [shading] shading=blinn-phong-inf: 44849 -> 48740 (+8.7%) [shading] shading=phong: 44695 -> 48645 (+8.8%) [shading] shading=cel: 44809 -> 47938 (+7.0%) [effect2d] kernel=edge: 45185 -> 47837 (+5.9%) [effect2d] kernel=blur: 26919 -> 26762 (-0.6%) [desktop] <default>: 40974 -> 44034 (+7.5%) [cube] <default>: 45090 -> 49270 (+9.3%) [clear] <default>: 41102 -> 44375 (+8.0%) (https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37606) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37640>	2025-10-06 00:45:09 +00:00
Yinjie Yao	f0f95a9ae3	ac/parse_ib: Update vcn ib parser to include missing commands Signed-off-by: Yinjie Yao <yinjie.yao@amd.com> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: David Rosca <david.rosca@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37672>	2025-10-03 14:44:07 +00:00
Samuel Pitoiset	38892cb558	radv: only expose AMD_device_coherent_memory if actually supported This fixes an issue after a recent update to dEQP-VK.info.device_mandatory_features. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37663>	2025-10-03 14:26:32 +00:00
Samuel Pitoiset	e2db50c97b	Revert "radv/ci: document recent unexpected failures on TAHITI" This reverts commit `abd2a79264`. Fixed by `93ce29c42e` ("amd: don't allow unsigned wraps for shared memory offsets on GFX6"). Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37685>	2025-10-03 13:37:16 +02:00
Daniel Schürmann	0e3bc3d8c0	nir/opt_offsets: call allow_offset_wrap() for try_fold_shared2() This prevents applying wrapping offsets on GFX6. Fixes: `e1a692f74b` ('nir/opt_offsets: allow for unsigned wraps when folding load/store_shared2_amd offsets') Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37667>	2025-10-03 07:54:12 +00:00
Daniel Schürmann	93ce29c42e	amd: don't allow unsigned wraps for shared memory offsets on GFX6 Fixes: `10266e7b21` ('radv: allow for unsigned wraps for shared memory intrinsics in nir_opt_offsets') Fixes: `dd68825feb` ('radeonsi: allow for unsigned wraps for shared memory intrinsics in nir_opt_offsets') Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37667>	2025-10-03 07:54:12 +00:00
abdelhadi	5c82a3e114	aco: fix debug info offset Signed-off-by: abdelhadi <abdelhadims@icloud.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37244>	2025-10-02 13:38:56 +00:00
Samuel Pitoiset	abd2a79264	radv/ci: document recent unexpected failures on TAHITI Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37664>	2025-10-02 13:10:32 +00:00
Vitaliy Triang3l Kuzmin	dea20be1b3	ac: Enable HTILE TC Z clear value bug workaround on GFX1013 Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Signed-off-by: Vitaliy Triang3l Kuzmin <triang3l@yandex.ru> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33962>	2025-10-02 08:29:50 +00:00
Vitaliy Triang3l Kuzmin	4e3a5f60e1	radv,ac: Split has_tc_compat_zrange_bug into Z and ZS, document it Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Signed-off-by: Vitaliy Triang3l Kuzmin <triang3l@yandex.ru> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33962>	2025-10-02 08:29:49 +00:00
Vitaliy Triang3l Kuzmin	5243f292ef	radv,ac: GFX10 depth/stencil HTILE mipmap bug info variable Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Signed-off-by: Vitaliy Triang3l Kuzmin <triang3l@yandex.ru> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33962>	2025-10-02 08:29:48 +00:00
Georg Lehmann	9533e7cdae	aco/optimizer: fix incorrect operand order assumption for neg(mul) opt The code that labels instructions doesn't care about the order either. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14013 Cc: mesa-stable Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37643>	2025-10-01 20:52:12 +00:00
Natalie Vock	52c7b0d20c	radv/bvh: Encode empty AS bounds as NaN If there are no leaves, the root node bounds still span -inf/inf. Making empty BLASs infinite-sized guarantees ray traversal needs to enter the BLAS (and immediately exit because it's empty). Remove the BLAS from the BVH entirely by marking its bounds as NaN. As a bonus, this works around RADV encountering issues in Silent Hill 2 on RDNA4 due to infinite-sized BVHs. Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37492>	2025-10-01 14:27:15 +00:00
Samuel Pitoiset	29ccbb21f3	radv: add a helper whether shader fp16 is enabled To remove code duplication. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37619>	2025-09-29 16:17:11 +00:00
Timur Kristóf	d3579190d6	ac/nir/ngg: Fix scalarized mesh primitive indices Take the write_mask into account when storing primitive indices, otherwise they will end up being stored in the wrong place. Fixes: `8e24d3426d` ("ac/nir/ngg: Refactor MS primitive indices for scalarized IO.") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37610>	2025-09-29 08:07:54 +00:00
Timur Kristóf	3dc9c1a91e	ac/nir/ngg: Remove dead code for 64-bit mesh shader variables We already lower all 64-bit I/O to 32-bit before this pass, and the rest of the code here already asserts that I/O variables must be 32-bit or smaller. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37610>	2025-09-29 08:07:54 +00:00
Georg Lehmann	a7f8c6ed60	radv: call nir_opt_undef late too Foz-DB GFX1201: Totals from 2263 (2.82% of 80287) affected shaders: MaxWaves: 57164 -> 57016 (-0.26%); split: +0.04%, -0.30% Instrs: 2711595 -> 2678247 (-1.23%); split: -1.23%, +0.00% CodeSize: 14066656 -> 13929720 (-0.97%); split: -1.01%, +0.03% VGPRs: 139452 -> 140004 (+0.40%); split: -0.03%, +0.42% Latency: 15902794 -> 15875935 (-0.17%); split: -0.17%, +0.00% InvThroughput: 2179122 -> 2165716 (-0.62%); split: -0.62%, +0.00% SClause: 61416 -> 61477 (+0.10%); split: -0.01%, +0.11% Copies: 169781 -> 175175 (+3.18%); split: -0.05%, +3.22% Branches: 53491 -> 53469 (-0.04%) PreSGPRs: 114087 -> 114086 (-0.00%) PreVGPRs: 115702 -> 115697 (-0.00%) VALU: 1555907 -> 1535514 (-1.31%); split: -1.31%, +0.00% SALU: 362560 -> 353803 (-2.42%) SMEM: 106263 -> 106259 (-0.00%) Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37552>	2025-09-26 15:11:26 +00:00
Georg Lehmann	8343e45467	aco/lower_branches: update branch hints after changing jump targets Fixes: `13ad3db43f` ("aco/lower_branches: implement try_remove_simple_block() in lower_branches()") Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37552>	2025-09-26 15:11:26 +00:00
Simon McVittie	9d36bf891b	vulkan: Compute path to write into JSON manifests once, use it everywhere This reduces duplication: we only need to distinguish between Windows and Unix in one place. The previous code was inconsistent about using either the `platforms` option, or the `host_machine`. Following the logic described in commit `94379377` "lavapipe: build "Windows" check should use the host machine, not the `platforms` option.", I've assumed that checking the host machine is the more-correct version and used that. Signed-off-by: Simon McVittie <smcv@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37576>	2025-09-26 10:47:31 +00:00
Simon McVittie	be8cac52d3	vulkan: Consistently form driver library names as prefix + name + suffix This consistently uses `NAME.dll` on Windows, `libNAME.dylib` on Darwin derivatives such as macOS, and `libNAME.so` on Linux, *BSD and so on. It's also consistent about using the local variable name `icd_file_name` for this name in every Vulkan driver, which was already the case in many but not all drivers. Some of these drivers probably don't make sense (or don't work) on Windows and/or macOS, but if this is kept consistent for all drivers, it should avoid the need for driver-specific commits like commit `611e9f29e` "lavapipe: fix icd generation for windows", commit `951f3287` "lavapipe: set empty dll prefix", commit `13e7a39f` "lavapipe: fixes for macOS support", commit `7008e655` "radv: Update JSON generator if Windows" and so on, each time a driver is found to be relevant on more platforms than previously believed. Signed-off-by: Simon McVittie <smcv@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37576>	2025-09-26 10:47:31 +00:00
Georg Lehmann	cc08786689	aco: use maximum RT vgpr_limit that doesn't reduce wave count 144 instead of 132 with 5 waves, in practice. Foz-DB Navi31: Totals from 33 (0.04% of 80273) affected shaders: Instrs: 3266241 -> 3261329 (-0.15%) CodeSize: 16885356 -> 16860088 (-0.15%) VGPRs: 4356 -> 4752 (+9.09%) SpillVGPRs: 2504 -> 1535 (-38.70%) Scratch: 264704 -> 216320 (-18.28%) Latency: 18445909 -> 18395904 (-0.27%) InvThroughput: 3689182 -> 3679182 (-0.27%) VClause: 85171 -> 84595 (-0.68%) SClause: 59365 -> 59320 (-0.08%); split: -0.08%, +0.01% Copies: 260528 -> 259113 (-0.54%); split: -0.59%, +0.05% Branches: 92537 -> 92519 (-0.02%) VALU: 1937426 -> 1935925 (-0.08%); split: -0.08%, +0.01% SALU: 393075 -> 393047 (-0.01%); split: -0.01%, +0.01% VMEM: 147914 -> 146003 (-1.29%) Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37548>	2025-09-26 08:45:05 +00:00
Georg Lehmann	8e03505782	aco: don't insert s_sendmsg dealloc_vgprs with little vgprs allocated Reduces message bus traffic when the benefit is small. Foz-DB Navi31: Totals from 3752 (4.67% of 80273) affected shaders: Instrs: 1999755 -> 1992249 (-0.38%) CodeSize: 10531824 -> 10501800 (-0.29%) Latency: 14935247 -> 14935147 (-0.00%) InvThroughput: 5976053 -> 5975262 (-0.01%) Foz-DB Navi33: Totals from 2614 (3.26% of 80273) affected shaders: Instrs: 969475 -> 964247 (-0.54%) CodeSize: 5171240 -> 5150328 (-0.40%) Latency: 7891519 -> 7891434 (-0.00%) InvThroughput: 4815008 -> 4814287 (-0.01%); split: -0.01%, +0.00% Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37508>	2025-09-26 07:51:02 +00:00

18829 Commits