Basically just reserve the memory in the descriptor sets.
On the shader side we construct a buffer descriptor, since
AFAIU VGPR indexing on 32-bit pointers in LLVM is still broken.
This fully supports update after bind and variable descriptor set
sizes. However, the limits are somewhat arbitrary and are mostly
about finding a reasonable division of a 2 GiB max memory size over
the set.
v2: - rebased on top of master (Samuel)
- remove the loading resources rework (Samuel)
- only load UBO descriptors if it's a pointer (Samuel)
- use LLVMBuildPtrToInt to avoid IR failures (Samuel)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (v2)
This is actually fixed now.
This change requires LLVM r358579. Make sure to have it in
your tree, otherwise the following piglit will hang:
tests/spec/arb_shader_storage_buffer_object/execution/ssbo-atomicCompSwap-int.shader_test
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
They are buggy with older LLVM version, see r358579.
Fixes: 78c551aca1 ("ac/nir: use new LLVM 8 intrinsics for SSBO atomics except cmpswap")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
They are buggy with LLVM 8 because they weren't marked as source
of divergence, see r358579.
Fixes: dd0172e865 ("radv: Use structured intrinsics instead of indexing workaround for GFX9.")"
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Use the raw version (ie. IDXEN=0) because vindex is unused.
Use the old intrinsic for compare&swap because the new one
hangs the GPU for some reasons.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Both Vulkan and OpenGL might be using glsl_types simultaneously or we
can also have multiple concurrent Vulkan instances using glsl_types.
Patch adds a one time init to track number of users and will release
types only when last user calls _glsl_type_singleton_decref().
This change fixes glsl_type memory leaks we have with anv driver.
v2: reuse hash_mutex, cleanup, apply fix also to radv driver and
rename helper functions (Jason)
v3: move init, destroy to happen on GL context init and destroy
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Even if we don't use local buffers in general. Turns out that even
though the performance is not the best the kernel still does it
better than our own list.
We still have to keep the radv bo list for buffers that are shared
externally.
This improves Talos on lowest quality setting (so as CPU bound as
possible) by ~10% if the global bo list is enabled.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
In radv we had a separate flag to actually use it + an env option
to experimentally use it.
The common code setting has_local_buffers to false of course broke
that experimental option.
Also the "enable on APU" did not make sense for RADV as it is still
disabled by default.
Fixes: b21a4efb55 "radv/winsys: allow local BOs on APUs"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Seems it was missing the "/ ma + 0.5" and the order was swapped.
Fixes: a1a2a8dfda ('nir: add AMD_gcn_shader extended instructions')
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
v2: remove & operator in a couple of memsets
add some memsets
v3: fixup lima
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v2)
ac_build_image_opcode() casts if necessary and buffer images
are casted too.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Should be safe to enable as all instructions seem to support 16-bit.
Unfortunately, there is no CTS test.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This patch does it as late as possible so the potential extra
basic blocks don't inhibit other optimizations.
Big thanks to Jason for writing the lowering pass.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
When I implemented opt_if_loop_last_continue() I had restricted
this pass from moving other if-statements inside the branch opposite
the continue. At the time it was causing a bunch of spilling in
shader-db for i965.
However Samuel Pitoiset noticed that making this pass more aggressive
significantly improved the performance of Doom on RADV. Below are
the statistics he gathered.
28717 shaders in 14931 tests
Totals:
SGPRS: 1267317 -> 1267549 (0.02 %)
VGPRS: 896876 -> 895920 (-0.11 %)
Spilled SGPRs: 24701 -> 26367 (6.74 %)
Code Size: 48379452 -> 48507880 (0.27 %) bytes
Max Waves: 241159 -> 241190 (0.01 %)
Totals from affected shaders:
SGPRS: 23584 -> 23816 (0.98 %)
VGPRS: 25908 -> 24952 (-3.69 %)
Spilled SGPRs: 503 -> 2169 (331.21 %)
Code Size: 2471392 -> 2599820 (5.20 %) bytes
Max Waves: 586 -> 617 (5.29 %)
The codesize increases is related to Wolfenstein II it seems largely
due to an increase in phis rather than the existing jumps.
This gives +10% FPS with Doom on my Vega56.
Rhys Perry also benchmarked Doom on his VEGA64:
Before: 72.53 FPS
After: 80.77 FPS
v2: disable pass on non-AMD drivers
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This fixes the following LLVM error when using RADV_DEBUG=checkir:
Intrinsic name not mangled correctly for type arguments! Should be: llvm.amdgcn.buffer.atomic.add.i32
i32 (i32, <4 x i32>, i32, i32, i1)* @llvm.amdgcn.buffer.atomic.add
The cmpswap operation still uses the old intrinsic.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Figure it out once in the build system, then just use that all over the place.
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>