A 64-bit atomic load/store should be considered entirely out-of-bounds if
any part of it is out-of-bounds. Since we implemented these as 32-bit vec2
load/store, it would have been possible for the first half to be in-bounds
while the second half is out-of-bounds.
From section 9.6.1, "Robust Buffer Access", of the Vulkan 1.4.324 specification:
> Any non-atomic access to a uniform, storage, uniform texel, or storage
> texel buffer wider than 32-bits may be treated as multiple 32-bit
> accesses that are separately bounds checked.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36602>
The primary CS doesn't need to use chaining in order to use IB2.
Allow using IB2 packets when chaining is disabled.
Rationale for this patch:
- When chaining is enabled (the default), this patch removes a
  useless check.
- When chaining is disabled (by noibchaining), this patch allows us
  to use IB2 without chaining.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37280>
Every CS uses IBs, so the naming was confusing.
Rename these fields to chain_ib to better reflect
what they actually mean, which is enabling chaining:
radv_amdgpu_winsys::use_ib_bos
radv_amdgpu_cs::chain_ib
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37280>
We form LDS clauses because heavily interleaving LDS and VALU leads to false
dependencies. But LDS is completely uncached, so splitting the clause with
waitcnts shouldn't hurt. It might even be beneficial, because the first
LDS store can start earlier.
Foz-DB Navi48:
Totals from 170 (0.21% of 80287) affected shaders:
Instrs: 239633 -> 240148 (+0.21%)
CodeSize: 1276584 -> 1278532 (+0.15%)
Latency: 3788507 -> 3789876 (+0.04%); split: -0.01%, +0.04%
InvThroughput: 841637 -> 841694 (+0.01%); split: -0.01%, +0.02%
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37701>
Lowering them earlier, right after VTN, would allow us to properly
implement embedded samplers for descriptor heap in merged shaders.
Non-immediate samplers are still lowered in
radv_nir_apply_pipeline_layout because they require shader arguments.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37688>
Use vk_video_is_profile_supported first, then apply AMD-specific
restrictions afterwards.
vulkaninfo reports on Navi31:
H.264 Decode (4:2:0 8-bit) Baseline progressive
H.264 Decode (4:2:0 8-bit) Main progressive
H.264 Decode (4:2:0 8-bit) High progressive
H.264 Decode (4:2:0 8-bit) Baseline interlaced (interleaved lines)
H.264 Decode (4:2:0 8-bit) Main interlaced (interleaved lines)
H.264 Decode (4:2:0 8-bit) High interlaced (interleaved lines)
H.264 Decode (monochrome 8-bit) High progressive
H.264 Decode (monochrome 8-bit) High interlaced (interleaved lines)
H.265 Decode (4:2:0 8-bit) Main
H.265 Decode (4:2:0 8-bit) Main 10
H.265 Decode (4:2:0 8-bit) Main Still Picture
H.265 Decode (4:2:0 10-bit) Main 10
VP9 Decode (4:2:0 8-bit) Profile 0
VP9 Decode (4:2:0 10-bit) Profile 2
AV1 Decode (4:2:0 8-bit) Main with film grain support
AV1 Decode (4:2:0 8-bit) Main without film grain support
AV1 Decode (4:2:0 10-bit) Main with film grain support
AV1 Decode (4:2:0 10-bit) Main without film grain support
AV1 Decode (4:2:0 12-bit) Professional with film grain support
AV1 Decode (4:2:0 12-bit) Professional without film grain support
AV1 Decode (monochrome 8-bit) Main with film grain support
AV1 Decode (monochrome 8-bit) Main without film grain support
AV1 Decode (monochrome 10-bit) Main with film grain support
AV1 Decode (monochrome 10-bit) Main without film grain support
AV1 Decode (monochrome 12-bit) Professional with film grain support
AV1 Decode (monochrome 12-bit) Professional without film grain support
H.264 Encode (4:2:0 8-bit) Baseline
H.264 Encode (4:2:0 8-bit) Main
H.264 Encode (4:2:0 8-bit) High
H.265 Encode (4:2:0 8-bit) Main
H.265 Encode (4:2:0 8-bit) Main 10
H.265 Encode (4:2:0 8-bit) Main Still Picture
H.265 Encode (4:2:0 10-bit) Main 10
AV1 Encode (4:2:0 8-bit) Main
AV1 Encode (4:2:0 10-bit) Main
Reviewed-by: David Rosca <david.rosca@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37656>
Based on a patch by llyyr <llyyr.public@gmail.com>:
!36827 added the copy_sync_payloads function, but didn't enable use of
it in radv. This commit mirrors similar MRs for anv/panvk/nvk and uses
the common vk_drm_syncobj_copy_payloads function for copy_sync_payloads.
I'm not too familiar with radv internals, so there's potentially a good
reason why this isn't a good change. However, I've personally been using
this patch locally for around a month and have experienced no
regressions and around 8% uplift on vkmark test scores with a 6600 XT.
[vertex] device-local=true: 45110 -> 48489 (+7.5%)
[vertex] device-local=false: 17529 -> 17488 (-0.2%)
[texture] anisotropy=0: 44768 -> 48679 (+8.7%)
[texture] anisotropy=16: 44920 -> 48572 (+8.1%)
[shading] shading=gouraud: 44931 -> 48467 (+7.9%)
[shading] shading=blinn-phong-inf: 44849 -> 48740 (+8.7%)
[shading] shading=phong: 44695 -> 48645 (+8.8%)
[shading] shading=cel: 44809 -> 47938 (+7.0%)
[effect2d] kernel=edge: 45185 -> 47837 (+5.9%)
[effect2d] kernel=blur: 26919 -> 26762 (-0.6%)
[desktop] <default>: 40974 -> 44034 (+7.5%)
[cube] <default>: 45090 -> 49270 (+9.3%)
[clear] <default>: 41102 -> 44375 (+8.0%)
(https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37606)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37640>
If there are no leaves, the root node bounds still span -inf/inf.
Making empty BLASs infinite-sized guarantees that ray traversal has to
enter the BLAS (and immediately exit because it's empty). Instead, remove
the BLAS from the BVH entirely by marking its bounds as NaN. As a bonus,
this works around issues RADV encounters in Silent Hill 2 on RDNA4 due
to infinite-sized BVHs.
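Why NaN bounds cull a box while -inf/inf bounds never do can be illustrated with a software slab test. This is a minimal sketch, not RADV's traversal code or the hardware's intersection logic: the function name `ray_hits_aabb` and its parameters are assumptions, and the test is deliberately written so that any comparison involving NaN rejects the box.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical slab test. `inv_dir` holds 1/direction per axis. Every
 * ordered comparison with NaN is false, so NaN bounds can never pass. */
static bool ray_hits_aabb(const float origin[3], const float inv_dir[3],
                          const float lo[3], const float hi[3])
{
    assert(origin && inv_dir && lo && hi);
    float tnear = 0.0f;
    float tfar = INFINITY;
    for (int i = 0; i < 3; i++) {
        float t0 = (lo[i] - origin[i]) * inv_dir[i];
        float t1 = (hi[i] - origin[i]) * inv_dir[i];
        float tmin = t0 < t1 ? t0 : t1;
        float tmax = t0 < t1 ? t1 : t0;
        /* Reject unless the slab interval overlaps [tnear, tfar]. */
        if (!(tmin <= tfar && tmax >= tnear))
            return false;
        if (tmin > tnear)
            tnear = tmin;
        if (tmax < tfar)
            tfar = tmax;
    }
    return tnear <= tfar;
}
```

In this sketch, a box spanning -inf/inf is hit by every ray, so traversal would always descend into it, whereas a box with NaN bounds is rejected by every ray.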
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37492>
This reduces duplication: we only need to distinguish between Windows
and Unix in one place.
The previous code was inconsistent about whether it checked the
`platforms` option or the `host_machine`. Following the logic described in
commit 94379377 "lavapipe: build "Windows" check should use the host machine, not the `platforms` option.",
I've assumed that checking the host machine is the more correct approach
and used that.
Signed-off-by: Simon McVittie <smcv@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37576>
This consistently uses `NAME.dll` on Windows, `libNAME.dylib` on Darwin
derivatives such as macOS, and `libNAME.so` on Linux, *BSD and so on.
It's also consistent about using the local variable name `icd_file_name`
for this name in every Vulkan driver, which was already the case in many
but not all drivers.
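The naming scheme described above can be sketched as a small C function. The real logic lives in the build files, so this is only an illustration: the `host_os` enum, the `icd_file_name` function and the `vulkan_radeon` name in the usage note are assumptions chosen for the example.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical classification of the host machine. */
enum host_os { HOST_WINDOWS, HOST_DARWIN, HOST_OTHER_UNIX };

/* Writes the ICD library file name for `name` into `buf` and returns
 * the snprintf result (number of characters the full name needs). */
static int icd_file_name(enum host_os os, const char *name,
                         char *buf, size_t buf_size)
{
    assert(name != NULL && strlen(name) > 0);
    switch (os) {
    case HOST_WINDOWS:
        return snprintf(buf, buf_size, "%s.dll", name);
    case HOST_DARWIN:
        return snprintf(buf, buf_size, "lib%s.dylib", name);
    default: /* Linux, *BSD and so on */
        return snprintf(buf, buf_size, "lib%s.so", name);
    }
}
```

For a library named `vulkan_radeon`, this yields `vulkan_radeon.dll`, `libvulkan_radeon.dylib` and `libvulkan_radeon.so` respectively.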
Some of these drivers probably don't make sense (or don't work) on
Windows and/or macOS, but if this is kept consistent for all drivers,
it should avoid the need for driver-specific commits like
commit 611e9f29e "lavapipe: fix icd generation for windows",
commit 951f3287 "lavapipe: set empty dll prefix",
commit 13e7a39f "lavapipe: fixes for macOS support",
commit 7008e655 "radv: Update JSON generator if Windows" and so on,
each time a driver is found to be relevant on more platforms than
previously believed.
Signed-off-by: Simon McVittie <smcv@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37576>