Yonggang Luo
a144f3f80c
radv: Getting radeon_icd to be generated properly on win32
...
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com >
Reviewed-by: Eric Engestrom <eric@igalia.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18747 >
2022-09-22 17:54:24 +00:00
Daniel Schürmann
cd36a29759
aco/optimizer: change inverse_comparison in-place
...
This avoids creating a second comparison which is more expensive
than just using the existing s_not.
Totals from 1089 (0.81% of 134913) affected shaders: (GFX10.3)
VGPRs: 50472 -> 50184 (-0.57%)
CodeSize: 4724692 -> 4760824 (+0.76%); split: -0.03%, +0.79%
MaxWaves: 23964 -> 24012 (+0.20%)
Instrs: 859588 -> 859687 (+0.01%); split: -0.11%, +0.12%
Latency: 10674653 -> 10650353 (-0.23%); split: -0.41%, +0.18%
InvThroughput: 1752987 -> 1750238 (-0.16%); split: -0.20%, +0.04%
VClause: 20921 -> 20872 (-0.23%); split: -0.68%, +0.45%
SClause: 31417 -> 31550 (+0.42%)
Copies: 69428 -> 68738 (-0.99%); split: -1.52%, +0.53%
PreSGPRs: 48033 -> 49649 (+3.36%)
PreVGPRs: 44490 -> 43699 (-1.78%)
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com >
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18253 >
2022-09-22 12:35:11 +00:00
Timur Kristóf
c8445c1691
aco: Change inverse-comparison optimization to work with s_not
...
Some time ago we stopped using s_andn2 with exec for boolean NOT.
The reasoning behind that change was that those booleans will be
always ANDed with exec when necessary.
This inhibited the inverse-comparison optimization in most cases
which is fixed by this patch.
Fossil DB stats on Navi 21:
Totals from 12251 (9.08% of 134913) affected shaders:
VGPRs: 801744 -> 802016 (+0.03%); split: -0.00%, +0.04%
SpillSGPRs: 8863 -> 8893 (+0.34%)
CodeSize: 100593244 -> 100370684 (-0.22%); split: -0.22%, +0.00%
MaxWaves: 204994 -> 204948 (-0.02%); split: +0.00%, -0.02%
Instrs: 18717001 -> 18668965 (-0.26%); split: -0.26%, +0.00%
Latency: 263255046 -> 262874896 (-0.14%); split: -0.16%, +0.02%
InvThroughput: 52760249 -> 52721736 (-0.07%); split: -0.08%, +0.01%
VClause: 329631 -> 329680 (+0.01%); split: -0.03%, +0.04%
SClause: 681563 -> 681435 (-0.02%); split: -0.02%, +0.00%
Copies: 1331612 -> 1372446 (+3.07%); split: -0.03%, +3.10%
Branches: 548325 -> 548301 (-0.00%)
PreSGPRs: 911317 -> 909700 (-0.18%)
PreVGPRs: 766279 -> 767070 (+0.10%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18253 >
2022-09-22 12:35:11 +00:00
Daniel Schürmann
cf5f9854bc
aco/optimizer: optimize s_and(exec, s_and(x, y)) more aggressively
...
It is enough if either of the two operands is respecting the
current exec mask.
Totals from 1680 (1.25% of 134913) affected shaders: (GFX10.3)
CodeSize: 3929436 -> 3922372 (-0.18%); split: -0.18%, +0.00%
Instrs: 730305 -> 728536 (-0.24%); split: -0.24%, +0.00%
Latency: 6839314 -> 6835154 (-0.06%); split: -0.07%, +0.01%
InvThroughput: 1371351 -> 1371267 (-0.01%); split: -0.01%, +0.00%
SClause: 32819 -> 32802 (-0.05%); split: -0.09%, +0.04%
Copies: 33264 -> 33271 (+0.02%); split: -0.01%, +0.03%
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com >
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18253 >
2022-09-22 12:35:11 +00:00
Daniel Schürmann
79a8e8b5b2
aco/optimizer: do can_eliminate_and_exec() optimization later
...
This will allow to optimize s_not(v_cmp()) safely.
Totals from 1024 (0.76% of 134913) affected shaders: (GFX10.3)
CodeSize: 5695424 -> 5701860 (+0.11%); split: -0.00%, +0.11%
Scratch: 242688 -> 241664 (-0.42%)
Instrs: 1040656 -> 1041635 (+0.09%); split: -0.00%, +0.09%
Latency: 16842282 -> 16922790 (+0.48%); split: -0.06%, +0.54%
InvThroughput: 4772728 -> 4810868 (+0.80%); split: -0.10%, +0.90%
VClause: 20013 -> 20000 (-0.06%); split: -0.12%, +0.05%
Copies: 115057 -> 114384 (-0.58%); split: -1.22%, +0.63%
Branches: 34531 -> 34532 (+0.00%); split: -0.00%, +0.01%
PreSGPRs: 46263 -> 46267 (+0.01%)
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com >
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18253 >
2022-09-22 12:35:11 +00:00
Yonggang Luo
c74595ead3
radv/r600/clover: Getting libelf to be optional
...
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com >
Reviewed-by: Jesse Natalie <jenatali@microsoft.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18503 >
2022-09-22 05:07:35 +00:00
Georg Lehmann
84c0529258
aco: Unswizzle v_pk_fma_f16 literals to produce more v_pk_fmac_f16.
...
No Foz-DB difference, but it reduces code size in some angle shaders.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com >
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18676 >
2022-09-21 23:04:06 +00:00
Samuel Pitoiset
578e30f3e6
radv: make sure to initialize wd_switch_on_eop before checking its value
...
This is technically not a bug because it might just trigger
SWITCH_ON_EOI when streamout is used and I think it was fine.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7303
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18700 >
2022-09-21 18:20:58 +00:00
Timur Kristóf
ed76471d08
aco/optimizer_postRA: Clarify terminology.
...
Change the terminology around the post-RA optimizer, primarily this
changes the use of "clobbered" to "overwritten" to avoid confusion,
and it removes some redundant states.
Proposed for backporting to stable, to make sure it is easy to
backport further fixes (if any) on top of this.
Fossil DB stats unaffected on Navi 21.
Cc: mesa-stable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18488 >
2022-09-21 16:56:57 +00:00
Timur Kristóf
a8dd07518c
aco/optimizer_postRA: Fix logical control flow handling.
...
Change reset_block() so it only considers the logical
predecessors for VGPRs. Relevant for some optimizations
across loops.
This commit fixes an assertion failure which was triggered
by Zink in a piglit test.
Fossil DB stats unaffected on Navi 21.
Fixes: 2e56e23420
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18488 >
2022-09-21 16:56:57 +00:00
Timur Kristóf
2eab413cf7
aco/optimizer_postRA: Don't assume all operand registers were written by same instr.
...
This assumption is no longer true since the post-RA optimizer
can work across blocks. It is now possible that some control
flow paths overwrite some but not all registers of an operand.
This commit may prevent invalid optimizations and/or assertion
failures (on debug builds).
Fossil DB stats unaffected on Navi 21.
Fixes: 0e4747d3fb
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18488 >
2022-09-21 16:56:57 +00:00
Timur Kristóf
63063dd5ce
aco/optimizer_postRA: Mark a register overwritten when predecessors disagree.
...
Affects blocks whose some (but not all) predecessors overwrite a register.
This commit fixes glitches in some games which regressed because of the
improved SCC no-compare optimization.
Fossil DB stats on Navi 21:
Totals from 2816 (2.09% of 134906) affected shaders:
CodeSize: 24224276 -> 24241580 (+0.07%)
Instrs: 4570595 -> 4574921 (+0.09%)
Latency: 53680256 -> 53693655 (+0.02%); split: -0.00%, +0.02%
InvThroughput: 9829289 -> 9830573 (+0.01%)
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7257
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7305
Fixes: 2e56e23420
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18488 >
2022-09-21 16:56:57 +00:00
Timur Kristóf
5e80edfa78
aco/tests: Add post-RA SCC no-compare tests cases with control flow.
...
- scc_nocmp_across_cf: passes
- scc_nocmp_across_cf_partially_overwritten: fails (fixed later)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18488 >
2022-09-21 16:56:56 +00:00
Timur Kristóf
d4b3f81d94
aco/tests: Add post-RA DPP test cases with control flow.
...
These are intended to make sure that the post-RA optimizer works
correctly across control flow. The new tests emit a divergent
if-else branch (with full logical+linear CFG).
- dpp_across_cf:
Simple case of DPP optimizable across control flow. Should pass.
- dpp_across_cf_overwritten:
Similar case but the DPP source register is overwritten in CF.
This shows a bug so the test fails now (will be fixed).
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18488 >
2022-09-21 16:56:56 +00:00
Timur Kristóf
d7cd49d54b
aco/tests: Add post-RA optimizer testcase for partially overwritten VCC.
...
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18488 >
2022-09-21 16:56:56 +00:00
Timur Kristóf
2274b26dfb
ac/nir/ngg: Don't initialize same-invocation mesh shader outputs.
...
This is actually not necessary and generates a lot of superfluous
instructions at every phi (setting the value to zero).
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com >
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18566 >
2022-09-21 16:14:59 +00:00
Timur Kristóf
697ea02202
ac/nir/ngg: Don't use LDS for same-invocation indices and cull outputs.
...
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com >
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18566 >
2022-09-21 16:14:59 +00:00
Timur Kristóf
bb4bdba17e
radv: Remove dead shader temps after linking.
...
Prevent nir_lower_scratch accidentally turning these dead variables
into scratch. This can especially happen to arrayed outputs of
eg. tess control or mesh shaders, which become large shader temps.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com >
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18566 >
2022-09-21 16:14:59 +00:00
Timur Kristóf
3e6ad428b6
radv: Change max preferred task workgroup invocations to 64.
...
This was accidentally left at the maximum possible value.
However I now tested this with the cadscene demo app and there is
hardly any benefit from going above 64. Set it to 64 for now.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com >
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18566 >
2022-09-21 16:14:59 +00:00
Daniel Schürmann
98e3c446d8
aco/optimizer: disallow can_eliminate_and_exec() with s_not
...
Totals from 295 (0.22% of 134913) affected shaders: (GFX10.3)
CodeSize: 1016564 -> 1016896 (+0.03%); split: -0.05%, +0.09%
Instrs: 187659 -> 187724 (+0.03%); split: -0.08%, +0.11%
Latency: 2839516 -> 2839541 (+0.00%); split: -0.01%, +0.01%
Copies: 12301 -> 12305 (+0.03%); split: -0.01%, +0.04%
PreSGPRs: 10266 -> 10268 (+0.02%)
Closes : #7024
Cc: mesa-stable
Tested-by: Konstantin Seurer <konstantin.seurer@gmail.com >
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18722 >
2022-09-21 15:33:43 +00:00
Georg Lehmann
94b5225c9b
aco: Use v_fmaak/v_fmamk if two operands are the same literal.
...
Foz-DB Navi21:
Totals from 5744 (4.26% of 134913) affected shaders:
VGPRs: 237128 -> 237056 (-0.03%); split: -0.04%, +0.01%
CodeSize: 16654484 -> 16620668 (-0.20%); split: -0.23%, +0.03%
MaxWaves: 152838 -> 152846 (+0.01%)
Instrs: 3063214 -> 3058572 (-0.15%); split: -0.17%, +0.02%
Latency: 23935195 -> 23934827 (-0.00%); split: -0.03%, +0.03%
InvThroughput: 5478562 -> 5478160 (-0.01%); split: -0.01%, +0.01%
VClause: 60432 -> 60435 (+0.00%); split: -0.02%, +0.03%
SClause: 121032 -> 120896 (-0.11%); split: -0.20%, +0.09%
Copies: 147865 -> 143144 (-3.19%); split: -3.59%, +0.40%
PreSGPRs: 195722 -> 195661 (-0.03%); split: -0.06%, +0.03%
PreVGPRs: 182849 -> 182787 (-0.03%)
Foz-DB Vega10:
Totals from 5290 (3.92% of 135041) affected shaders:
SGPRs: 357952 -> 359616 (+0.46%); split: -0.11%, +0.57%
VGPRs: 204048 -> 203928 (-0.06%); split: -0.08%, +0.02%
CodeSize: 14043176 -> 14003100 (-0.29%); split: -0.29%, +0.00%
MaxWaves: 39401 -> 39398 (-0.01%); split: +0.01%, -0.02%
Instrs: 2636739 -> 2631246 (-0.21%); split: -0.21%, +0.00%
Latency: 25264088 -> 25256482 (-0.03%); split: -0.05%, +0.02%
InvThroughput: 12039643 -> 12039346 (-0.00%); split: -0.00%, +0.00%
VClause: 55603 -> 55584 (-0.03%); split: -0.04%, +0.00%
SClause: 101577 -> 101342 (-0.23%); split: -0.30%, +0.07%
Copies: 213344 -> 207929 (-2.54%); split: -2.58%, +0.05%
Branches: 34053 -> 34054 (+0.00%)
PreSGPRs: 172405 -> 172260 (-0.08%)
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com >
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18645 >
2022-09-21 12:16:47 +00:00
Samuel Pitoiset
f4ec8e1ad5
radv: emit the rasterization samples through an user SGPR if needed
...
When the main FS needs sample positions and the number of samples
isn't known at compile time with GPL.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18677 >
2022-09-21 10:30:33 +00:00
Samuel Pitoiset
deb2dccc75
radv: add barycentric_at_sample lowering when the number of samples is dynamic
...
Use two different paths (static vs dynamic) to avoid running more NIR
pass to remove dead CF code when static is used.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18677 >
2022-09-21 10:30:33 +00:00
Samuel Pitoiset
68bb58a46e
nir,radv: pass the number of samples to load_sample_positions_amd
...
This will be used to lower it when it's dynamic.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18677 >
2022-09-21 10:30:33 +00:00
Samuel Pitoiset
c20d5ee3c2
radv: lower nir_load_rasterization_samples_amd in ABI
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18677 >
2022-09-21 10:30:33 +00:00
Samuel Pitoiset
2a0e4b5ef7
radv: declare shader arguments for the number of samples for FS
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18677 >
2022-09-21 10:30:33 +00:00
Samuel Pitoiset
f85b7e294b
radv: add radv_pipeline_key::dynamic_rasterization_samples
...
With GPL, it's possible to build the main FS without the multisample
state, but the number of rasterization samples is required for
lowering interpolateAtSample(). In this rare situation, the number of
samples will be passed through a new user SGPR.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18677 >
2022-09-21 10:30:33 +00:00
Samuel Pitoiset
45d4ee91e0
radv: constify radv_lookup_user_sgpr()
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18677 >
2022-09-21 10:30:33 +00:00
Samuel Pitoiset
923a864d94
radv: acquire pstate on-demand when capturing with RGP
...
AMDGPU pstate is per-device, not per Vulkan logical devices. The same
AMDGPU device is shared accross logical devices because the driver
creates only one winsys per fd. The kernel only allows one context
at a time per AMDGPU device, otherwise it returns -EBUSY.
Fixes this by acquiring pstate on-demand to avoid this multiple
logical device problem.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17712 >
2022-09-21 09:50:18 +00:00
Samuel Pitoiset
f1566ad500
radv: rename radv_thread_trace_set_pstate() to radv_device_set_pstate()
...
Setting pstate is used for RGP captures and performance counters, so
this name is more generic. Also make it non static.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17712 >
2022-09-21 09:50:18 +00:00
Yonggang Luo
209a89e51d
aco: Convert to use u8 literal for Unicode character to fixes msvc warning
...
Warning:
aco_register_allocation.cpp(383): warning C4819: The file contains a character that cannot be represented in the current code page (0). Save the file in Unicode format to prevent data loss
This warning was treated as error with compiling with msvc
u8 is belongs to c11 standard so it's safe to use it
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com >
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18682 >
2022-09-20 18:40:50 +00:00
Yonggang Luo
b70e92fe04
radv: Remove the redundant #include <gelf.h> and #include <libelf.h> in ac_binary.c
...
It's not access these two header in the source code
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com >
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18682 >
2022-09-20 18:40:50 +00:00
Friedrich Vock
a418ab6654
radv: Correct accel struct header size
...
The size was changed when adding metadata but not updated here.
Fixes: 07eceb4f ("radv: Add metadata to acceleration structures")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18680 >
2022-09-20 14:20:00 +00:00
Samuel Pitoiset
19eec024d2
radv,aco: do not compact MRTs if the pipeline uses a PS epilog
...
We can't detect color attachment without exports when compiling a PS
epilog, so we can't compact MRTs.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18514 >
2022-09-20 13:12:49 +00:00
Rhys Perry
6df5ff7f19
aco: DCE ra_ctx::defs_done
...
This was used to distinguish definitions fixed before and during RA, but
it seems it isn't used anymore.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18547 >
2022-09-20 12:24:03 +00:00
Samuel Pitoiset
0f88f57223
radv: allow to build the main FS in a graphics pipeline library
...
Corner cases like implicit gl_PrimitiveID are currently broken and
will be fixed later, but the general case should work.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18516 >
2022-09-20 11:53:38 +00:00
Samuel Pitoiset
e529745be3
radv: do not link shaders when the next stage is unknown
...
With GPL, it's possible to build the pre-rasterization stages separately
from the fragment stage. Implicit IO (like gl_PrimitiveID) between the
last pre-rast stage and the FS will be addressed later.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18516 >
2022-09-20 11:53:38 +00:00
Marcin Ślusarz
037404b441
nir, anv, hasvk, radv: pull uses_wide_subgroup_intrinsics into shader_info
...
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18504 >
2022-09-20 10:19:21 +00:00
Samuel Pitoiset
704ef1fd3b
radv,aco: lower barycentric_at_sample in NIR
...
fossils-db (NAVI21):
Totals from 158 (0.12% of 134913) affected shaders:
CodeSize: 569456 -> 568824 (-0.11%)
Only Control seems affected.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18615 >
2022-09-20 09:52:37 +00:00
Samuel Pitoiset
9f0b4da875
radv: run nir_opt_cse before lowering FS intrinsics
...
Otherwise, there might be redundant barycentric_at_sample intrinsics
that will be lowered and this will increase code size.
No fossils-db changes.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18615 >
2022-09-20 09:52:37 +00:00
Samuel Pitoiset
7e433e25c8
radv: add nir_intrinsic_load_sample_positions_amd in the ABI
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18615 >
2022-09-20 09:52:37 +00:00
Bas Nieuwenhuizen
266fe31666
ac/surface: Fix some warnings.
...
../mesa/src/amd/common/ac_surface.c:2324:48: warning: implicit conversion from enumeration type 'AddrResourceType' (aka 'enum _AddrResourceType') to different enumeration type 'enum gfx9_resource_type' [-Wenum-conversion]
surf->u.gfx9.resource_type = AddrSurfInfoIn.resourceType;
~ ~~~~~~~~~~~~~~~^~~~~~~~~~~~
../mesa/src/amd/common/ac_surface.c:3046:38: warning: implicit conversion from enumeration type 'const enum gfx9_resource_type' to different enumeration type 'AddrResourceType' (aka 'enum _AddrResourceType') [-Wenum-conversion]
input.resourceType = surf->u.gfx9.resource_type;
~ ~~~~~~~~~~~~~^~~~~~~~~~~~~
../mesa/src/amd/common/ac_surface.c:3069:38: warning: implicit conversion from enumeration type 'const enum gfx9_resource_type' to different enumeration type 'AddrResourceType' (aka 'enum _AddrResourceType') [-Wenum-conversion]
input.resourceType = surf->u.gfx9.resource_type;
The enums are compatible so lets just add some casts.
Reviewed-by: Marek Olšák <marek.olsak@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18694 >
2022-09-20 09:25:09 +00:00
Vinson Lee
97d307406b
radv: Use count_tes_user_sgprs return value.
...
Fix defect reported by Coverity Scan.
Useless call (USELESS_CALL)
side_effect_free: Calling count_tes_user_sgprs(key) is only useful for
its return value, which is ignored.
Fixes: 8253ec3855 ("radv: add shader arguments for dynamic patch control points")
Signed-off-by: Vinson Lee <vlee@freedesktop.org >
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18659 >
2022-09-20 06:14:47 +00:00
Qiang Yu
4d15a06dee
radeonsi: implement nir_intrinsic_load_streamout_buffer_amd
...
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com >
Signed-off-by: Qiang Yu <yuq825@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17456 >
2022-09-20 05:41:50 +00:00
Qiang Yu
8049edb653
radeonsi: implement nir_intrinsic_load_num_vertices_per_primitive_amd
...
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com >
Signed-off-by: Qiang Yu <yuq825@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17456 >
2022-09-20 05:41:50 +00:00
Bas Nieuwenhuizen
b2972cf410
radv: Add scratch stack to reduce LDS stack in RT traversal.
...
The current stack size is a significant limiter for occupancy, and
hence we need smaller stacks in LDS.
Rhys earlier had a patch that just put the N entries closest to the
root in LDS and the rest in scratch. However, this is not ideal for
performance as most of the activity is happening away from the root,
near the leaves. Of course we can't just switch it around, as the
leaf activity likely isn't happening all the way at the end of the
stack.
So what we do is make the LDS stack kinda a ringbuffer by always
accessing it using the stack index modulo the buffer size (always
a power of two so we can efficiently mask). If we then do not have
free space in this buffer we evict the entries closest to the root
to scratch and if we hit the "bottom" of the LDS space we load from
scratch.
Some rough perf numbers for indication with Q2RTX:
| evicting | LDS entries | perf |
|----------|-------------|------|
| no | 76 | 55% |
| no | 32 | 100% |
| no | 24 | 105% |
| yes | 32 | 95% |
| yes | 16 | 100% |
| yes | 8 | 90% |
| yes | 4 | 75% |
(For the case with 4 entries we need to do some extra accounting as
a full batch may not be available to evict)
So an obvious choice is to use a stack of 16 entries.
One might wonder if Q2RTX perf is mainly good due to BVHs with very
little geometry and hence low depth, so I also did some profiling
with control. This is done with RGP instruction timing, so this is
instructions executed not weighted for enabled masks, i.e. divergence
effects included.
| game | LDS entries | scratch action | fraction of iterations |
|---------|-------------|----------------|------------------------|
| Control | 8 | store | 10.3% |
| Control | 8 | load | 34.8% |
| Control | 16 | store | 0.58% |
| Control | 16 | load | 2.62% |
| Q2RTX | 16 | store | 1.00% |
| Q2RTX | 16 | load | 3.07% |
So Q2RTX doesn't seem like an unreasonably good case for this
algorithm.
On the implementation side, we can always place the scratch stack at
address 0 by just reserving the scratch space, and in the case of fixed
callstack size moving that up. In the dynamic case the dynamic stack
base already takes any reserved scratch space into account.
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18541 >
2022-09-20 01:39:20 +00:00
Rhys Perry
7d26fafacf
radv: fix dynamic RT stack size with VGPR spilling
...
VGPR spilling might cause VGPRs to be spilled at scratch offset 0, so we
can't use that.
fossil-db (Sienna Cichlid, Q2RTX and Control):
Totals from 4 (0.26% of 1524) affected shaders:
Instrs: 8734 -> 8737 (+0.03%)
CodeSize: 48492 -> 48504 (+0.02%)
Latency: 384375 -> 384369 (-0.00%)
InvThroughput: 256250 -> 256246 (-0.00%)
Copies: 1312 -> 1313 (+0.08%)
Branches: 256 -> 258 (+0.78%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com >
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18541 >
2022-09-20 01:39:20 +00:00
Bas Nieuwenhuizen
ca04f968d9
radv: Use nested ifs for pushing child nodes in traversal loop.
...
Avoids a bunch of overhead costs if the previous child was empty
already.
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18538 >
2022-09-20 01:29:05 +02:00
Bas Nieuwenhuizen
91a4cd26b3
radv: Use constant for ray traversal exit condition.
...
Make the stack base ssa def dead in the loop, can save a register.
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18538 >
2022-09-20 01:29:04 +02:00
Bas Nieuwenhuizen
40a235c9a8
Revert "radv/rt: use derefs for the traversal stack"
...
This reverts commit 3750663c72 .
Doing things with derefs adds extra instructions for multiplying the
index with the element size, e.g.
BBF0_13:
s_waitcnt vmcnt(0)
v_mov_b32_e32 v27, v55
s_mov_b32 s23, exec_lo
v_cmpx_ne_i32_e32 -1, v27
s_cbranch_execz _L14
BBF0_14:
v_lshlrev_b32_e32 v48, 2, v46 <--
ds_write_b32 v48, v27
v_add_nc_u32_e32 v46, 32, v46
_L14:
s_mov_b32 exec_lo, s23
v_mov_b32_e32 v27, v54
s_mov_b32 s23, exec_lo
v_cmpx_ne_i32_e32 -1, v27
s_cbranch_execz _L15
BBF0_15:
v_lshlrev_b32_e32 v48, 2, v46 <--
ds_write_b32 v48, v27
v_add_nc_u32_e32 v46, 32, v46
On Q2RTC indirect lighting this saves about 2.3 VALU instructions
per loop iteration, which is ~4% of VALU instructions (we're at
58 per iteration now according to RGP).
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18538 >
2022-09-20 01:29:04 +02:00