Daniel Schürmann
d887eb141b
aco: propagate SGPRs into VOP1 instructions early.
...
This helps DCE. We should reconsider our optimization order
or maybe do the dead code analysis twice
Totals from 106 (0.08% of 136546) affected shaders (RAVEN):
SGPRs: 7184 -> 7152 (-0.45%)
CodeSize: 736912 -> 736052 (-0.12%)
Instrs: 145739 -> 145509 (-0.16%)
Cycles: 2085344 -> 2084268 (-0.05%)
VMEM: 14819 -> 14807 (-0.08%)
SMEM: 7109 -> 7100 (-0.13%); split: +0.04%, -0.17%
SClause: 5383 -> 5385 (+0.04%)
Copies: 13290 -> 13189 (-0.76%)
PreSGPRs: 5265 -> 5221 (-0.84%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6777 >
2020-10-14 15:31:38 +00:00
Samuel Pitoiset
20d73a9049
aco: adjust an assertion about the wavesize in emit_gfx10_wave64_bpermute()
...
This gets rids of one more use of radv_shader_info.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7061 >
2020-10-14 15:09:34 +00:00
Samuel Pitoiset
112e66fa09
aco: compute the CS workgroup size from the shader NIR info
...
cs.block_size is copied from cs.local_size during the shader info pass.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7061 >
2020-10-14 15:09:34 +00:00
Samuel Pitoiset
e3e8d13ada
radv: move compiler statistics to ACO
...
They are really specific to ACO.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7061 >
2020-10-14 15:09:34 +00:00
Samuel Pitoiset
97afb2a0a9
aco: remove unused radv_shader.h includes
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7061 >
2020-10-14 15:09:34 +00:00
Samuel Pitoiset
408195ec53
aco: remove useless occurences of radv_nir_compiler_options
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7061 >
2020-10-14 15:09:34 +00:00
Samuel Pitoiset
8a6f60fc6b
aco: remove stub lower_wqm() prototype
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7061 >
2020-10-14 15:09:34 +00:00
Samuel Pitoiset
48b988e35f
radv: fix ignoring the vertex attribute stride if set as dynamic
...
The vertex attribute stride should be ignored, so make sure it's
initialized to zero if dynamic to avoid computing a wrong offset.
The fact that each element of pStrides must be greater than or equal
to the maximum extent of all vertex input attributes fetched saves us
one user SGPR for the dynamic stride.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3627
Cc: 20.2
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7101 >
2020-10-14 12:29:39 +00:00
James Park
28d02b9d3e
ac,amd/llvm,radv: Initialize structs with {0}
...
Necessary to compile with MSVC.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7123 >
2020-10-14 12:15:23 +00:00
Samuel Pitoiset
b84d1a0c42
radv/aco: disable NGG GS support because it randomly hangs the GPU
...
Disable ACO NGG GS until the random GPU hangs are fixed
(one CTS run == one GPU hang here). No hangs so far after
5 full CTS runs with this disabled.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Acked-by: Timur Kristóf <timur.kristof@gmail.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7108 >
2020-10-14 13:52:42 +02:00
James Park
7758664788
radv: Only close local_fd when valid
...
Necessary when drm_device is bypassed.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7119 >
2020-10-13 22:56:31 +00:00
James Park
1026e2ac0f
radv: Increased const usage
...
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7119 >
2020-10-13 22:56:31 +00:00
James Park
1b551857f9
amd/addrlib: Fix warning list for msvc
...
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7119 >
2020-10-13 22:56:31 +00:00
Rhys Perry
c122315702
aco: fix get_ssbo_size with a vgpr resource
...
The result of load_vulkan_descriptor is passed directly to get_ssbo_size.
This caused convert_pointer_to_64_bit() to skip creating a
v_readfirstlane_b32 if it was necessary.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com >
Fixes: 05b6612b4e ('radv: do not lower UBO/SSBO access to offsets')
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3628
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7095 >
2020-10-13 14:20:28 +00:00
Rhys Perry
bcf7a70008
aco: use nir_opt_uniform_atomics
...
Significantly improves performance of a Control compute shader. Also seems
to increase FPS at the very start of the game by ~9% (RX 580, 1080p,
medium settings, no MSAA).
fossil-db (Navi):
Totals from 315 (0.23% of 135946) affected shaders:
SGPRs: 18296 -> 18336 (+0.22%); split: -0.26%, +0.48%
VGPRs: 11856 -> 11844 (-0.10%); split: -0.81%, +0.71%
CodeSize: 2233800 -> 2457508 (+10.01%)
MaxWaves: 4506 -> 4497 (-0.20%); split: +0.04%, -0.24%
Instrs: 438766 -> 486215 (+10.81%); split: -0.00%, +10.81%
Cycles: 7880180 -> 8963340 (+13.75%); split: -0.00%, +13.75%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6558 >
2020-10-13 12:47:21 +00:00
Rhys Perry
e1120f274f
nir: move divergence analysis options to nir_shader_compiler_options
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6558 >
2020-10-13 12:47:21 +00:00
Rhys Perry
bb5c0ba0d2
aco: implement last_invocation
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6558 >
2020-10-13 12:47:21 +00:00
Rhys Perry
8850a63161
radv/aco,nir/lower_subgroups: don't lower elect
...
ACO can implement this better.
fossil-db (Navi):
Totals from 33 (0.02% of 135946) affected shaders:
SGPRs: 1736 -> 1744 (+0.46%)
VGPRs: 1680 -> 1656 (-1.43%)
CodeSize: 246160 -> 245916 (-0.10%); split: -0.14%, +0.04%
MaxWaves: 449 -> 461 (+2.67%)
Instrs: 48301 -> 48266 (-0.07%); split: -0.12%, +0.05%
Cycles: 469740 -> 469240 (-0.11%); split: -0.18%, +0.08%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6558 >
2020-10-13 12:47:20 +00:00
Rhys Perry
36da9c4aa2
aco: implement elect
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6558 >
2020-10-13 12:47:20 +00:00
Rhys Perry
bf77f539ee
aco: optimize more uniform reductions/scans
...
Uniform atomic optimization will create these.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6558 >
2020-10-13 12:47:20 +00:00
Samuel Pitoiset
b9ca4923d6
aco: implement missing nir_op_unpack_half_2x16_split_{x,y}_flush_to_zero
...
SPIRV->NIR emits nir_op_unpack_half_2x16_flush_to_zero instead of
nir_op_unpack_half_2x16 if the shader enables denorm flush to zero
for 16-bit floating point.
This doesn't fix anything known and CTS doesn't have tests.
Fixes: 56d9bcdded ("radv: enable more float_controls features")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6939 >
2020-10-13 08:35:22 +02:00
Eric Engestrom
c02e933de4
radv: add missing u_atomic.h include
...
Fixes: 7568c97df1 ("radv: Use atomics to read query results.")
Signed-off-by: Eric Engestrom <eric@engestrom.ch >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7050 >
2020-10-12 23:24:18 +02:00
Bas Nieuwenhuizen
1fb3e1fb70
radv: Fix mipmap extent adjustment on GFX9+.
...
With arrays we really have to use the correct size for the base
mipmap to get the right array pitch. In particular, using
surf_pitch results in pitch that is bigger than the base mipmap
and hence results in wrong pitches computed by the HW.
It seems that on GFX9 this has mostly been hidden by the epitch
provided in the descriptor but this is not something we do on
GFX10 anymore.
Now this has some draw-backs:
1. normalized coordinates don't work
2. Bounds checking uses slightly bigger bounds.
2 mostly is not an issue as we still ensure that they're within
the texture memory and not overlapping other layers/mips, but
we can't properly ignore writes.
1 is kinda dead in the water ... On the other hand I'd argue that
using normalized coords & a filter for sampling a block view of
a compressed format is extraordinarily useless.
The old method we employed already had these drawbacks for everything
except the base miplevel of the imageview.
AFAICT this is the same tradeoff AMDVLK makes and no CTS test hits
this. (once it does I think the HW is dead in the water ... Only
workaround I can think of is shader processing which is hard because
we don't know texture formats at compile time.)
I also removed the extra calculations when the image has only 1 mip
level because they ended up being a no-op in that case.
CC: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2292
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2266
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2483
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2906
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3607
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7090 >
2020-10-12 21:00:00 +00:00
Samuel Pitoiset
66d7bb0f23
radv: fix adjusting vertex alpha
...
AC_FETCH_FORMAT_NONE is not zero... Oops.
Fixes: b0829c6af7 ("radv: replace RADV_ALPHA_ADJUST by AC_FETCH_FORMAT")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7103 >
2020-10-12 19:48:35 +02:00
Samuel Pitoiset
b32a8f83dc
radv: move lower_io_arrays_to_elements before lower_io_to_scalar_early
...
nir_lower_io_arrays_to_elements lowers arrays or matrices to elements,
which ends up to vectors for matrices, but a bunch of IO optimizations
only work for scalars.
Calling it before lower_io_to_scalar_early allows nir_link_opt_varyings
to remove duplicated inputs and replace constant inputs.
fossils-db (Navi10):
Totals from 294 (0.22% of 136546) affected shaders:
CodeSize: 861356 -> 860224 (-0.13%); split: -0.13%, +0.00%
Instrs: 161972 -> 161832 (-0.09%); split: -0.09%, +0.00%
Cycles: 1185680 -> 1185120 (-0.05%); split: -0.05%, +0.00%
SMEM: 31422 -> 31424 (+0.01%)
Copies: 9065 -> 9068 (+0.03%)
Only Talos and Dark Souls 3 are affected.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7041 >
2020-10-12 15:01:05 +00:00
Samuel Pitoiset
b0829c6af7
radv: replace RADV_ALPHA_ADJUST by AC_FETCH_FORMAT
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7065 >
2020-10-12 13:13:40 +00:00
Samuel Pitoiset
5000c344cc
ac/llvm: move AC_FETCH_FORMAT to non-LLVM code
...
While we are it, give it a name.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7065 >
2020-10-12 13:13:40 +00:00
Rhys Perry
cf3b638f47
radv: remove RDR2 discard workaround
...
The game appears to use HLSL, so this workaround now lives in
SPIR-V -> NIR.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com >
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7062 >
2020-10-12 11:07:39 +00:00
Bas Nieuwenhuizen
875ff8414f
radv/winsys: Expand scope of allbos lock.
...
With us not creating a bo_list anymore, there is a problem if we
delete a buffer between enumerating all buffers and doing the submission.
Also changes this to a rwlock given the wider scope of the things under
lock. (especialy some of the syncobj stuff is now under the lock)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7091 >
2020-10-12 10:55:08 +00:00
Bas Nieuwenhuizen
ea778693bf
radv: Fix event write cmdbuffer allocation when tracing.
...
The trace emit is another 7 words.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7091 >
2020-10-12 10:55:08 +00:00
Samuel Pitoiset
98f538dfca
radv: remove one leftover TODO in the shader info pass
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7022 >
2020-10-12 09:23:26 +02:00
Samuel Pitoiset
cec12d4f98
radv/llvm: reduce LDS size for tess by using NIR IO assigned locations
...
To match ACO.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7022 >
2020-10-12 09:23:26 +02:00
Samuel Pitoiset
47e26bf334
radv/llvm: reduce the ESGS itemsize by using NIR IO assigned locations
...
There is no longer gaps in the ESGS ring.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7022 >
2020-10-12 09:23:26 +02:00
Samuel Pitoiset
569b894835
radv/llvm: switch to NIR IO assigned locations
...
To match ACO.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7022 >
2020-10-12 09:23:25 +02:00
Samuel Pitoiset
6387341cce
ac/nir: pass the variable location to store_tcs_outputs
...
It's actually simpler for the backend to know the variable location.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7022 >
2020-10-12 09:23:25 +02:00
Samuel Pitoiset
8f8ee5b95b
ac,radv,radeonsi: stop multiplying driver_location by 4
...
It's no longer needed to do that.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Marek Olšák <marek.olsak@amd.com >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7010 >
2020-10-12 08:34:02 +02:00
Samuel Pitoiset
0a90dab6b4
radv/llvm: stop assigning driver_location in NIR->LLVM
...
It's already assigned just after NIR linking shaders.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7010 >
2020-10-12 08:33:57 +02:00
Marek Olšák
a4e4644eff
ac/surface: fix valgrind warnings in DCC retile tile lookups
...
==12920== Conditional jump or move depends on uninitialised value(s)
==12920== at 0x8F39391: util_fast_urem32 (fast_urem_by_const.h:71)
==12920== by 0x8F39391: hash_table_search (hash_table.c:285)
==12920== by 0x8B06D5D: ac_compute_dcc_retile_tile_indices (ac_surface.c:136)
Fixes: a37aeb128d "amd/common: Cache intra-tile addresses for retile map."
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7055 >
2020-10-09 23:13:40 +00:00
Rhys Perry
8e981453ed
radv: use radv_optimize_nir() less in radv_link_shaders()
...
fossil-db (Navi):
Totals from 11 (0.01% of 137413) affected shaders:
CodeSize: 99372 -> 99480 (+0.11%)
Instrs: 19119 -> 19110 (-0.05%)
Cycles: 222144 -> 222000 (-0.06%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com >
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6891 >
2020-10-09 15:48:00 +00:00
Rhys Perry
55254f241f
radv: move optimizations in shader_compile_to_nir() to after io_to_scalar
...
This results in at least one less radv_optimize_nir() iteration.
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com >
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6891 >
2020-10-09 15:47:59 +00:00
Bas Nieuwenhuizen
da132d802b
radv: Set fce metadata correctly on DCC initialization.
...
The fce metadata can always be set to false as we don't care about
the compressed clear color.
Avoiding useless fast clear eliminates improves basemark performance by
1%-1.5%.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7005 >
2020-10-09 13:46:49 +00:00
Timur Kristóf
5ae3656890
aco/ngg: Calculate workgroup size of NGG shaders.
...
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com >
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964 >
2020-10-09 15:26:15 +02:00
Timur Kristóf
61280bb4b6
aco/ngg: Allocate NGG GS space early for const vertex/primitive counts.
...
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com >
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964 >
2020-10-09 15:26:15 +02:00
Timur Kristóf
e8a0409d01
aco/ngg: Use more efficient LDS layout to help reduce bank conflicts.
...
The LLVM backend has a trick which helps reduce LDS bank conflicts
by swizzling the LDS address where each vertex is emitted.
This commit implements the same thing for ACO.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com >
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964 >
2020-10-09 15:26:15 +02:00
Timur Kristóf
9bf92d4357
radv/aco: Enable NGG GS by default.
...
ACO NGG GS now supports everything we need except streamout
(aka. transform feedback), but we don't use NGG anyway when
streamout is needed.
Also add a note to the new features txt.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com >
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964 >
2020-10-09 15:26:15 +02:00
Timur Kristóf
dd73719856
aco/ngg: Add shader query support to NGG GS.
...
In each GS thread, we calculate the number of "real" primitives that
were emitted (points, lines, triangles, not strips). Then we
accumulate the number of "real" primitives emitted by the
entire threadgroup in GDS.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com >
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964 >
2020-10-09 15:26:15 +02:00
Timur Kristóf
df62c8fbea
aco/ngg: Place workgroup barrier outside control flow for NGG GS.
...
Merged shaders have a workgroup barrier which makes sure that
the first half is completed in every wave before the 2nd half
is started.
This barrier is located in divergent control flow, so that waves
that don't have any invocations in the 2nd half can finish as early
as possible. This is problematic for NGG GS because it has more
workgroup barriers after the 2nd half.
So, for NGG GS we need to put the barrier outside
control flow because otherwise the waves that have 0 GS threads
won't be able to wait for the waves which have non-zero GS threads.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com >
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964 >
2020-10-09 15:26:15 +02:00
Timur Kristóf
1129575d5e
aco/ngg: Implement NGG GS output.
...
We store emitted GS vertices in LDS.
Then, at the end of the shader, the emitted vertices are compacted
and each thread loads a single vertex from LDS in order to export
a primitive as needed, and the vertex attributes.
The reason this is done is because there is an impedance mismatch
between how API GS and the NGG HW works. API GS can emit an arbitrary
number of vertices and primites in each thread, but NGG HW can only
export one vertex per thread.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com >
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964 >
2020-10-09 15:26:15 +02:00
Timur Kristóf
62b5012ec3
aco/ngg: Implement workgroup reduce / exclusive scan for NGG GS.
...
This function calculates two things at once:
1. The total number of vertices emitted by the threadgroup.
2. Exclusive scan of emitted vertex count accross the threadgroup.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com >
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964 >
2020-10-09 15:26:15 +02:00
Timur Kristóf
c29e288fb5
aco/ngg: Create LDS layout for NGG GS.
...
For NGG GS, we need to store the following in LDS:
1. The ESGS ring, similarly to legacy ESGS.
2. Emitted vertices from the GS threads.
3. Temporary space used by the workgroup scan.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com >
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964 >
2020-10-09 15:26:15 +02:00