98 Commits

Author SHA1 Message Date
Qiang Yu 67244fc88a aco: remove p_end_with_regs from needs_exact()
A PS needs to handle WQM:
1. The main part may compute in WQM mode with args from the prolog, so
   the prolog needs to compute these args in WQM mode too.
2. The prolog and main part need to end with exact exec, so that the next
   shader part, which inherits the previous shader part's exec, does not
   do work on behalf of helper threads.

(1) needs p_end_with_regs to operate in WQM mode, so it can't itself
be exact; otherwise some move instructions added by it won't be
in WQM mode, and the helper threads' results are not passed to the
next shader part as args.

(2) is done by the p_end_wqm that finish_program adds automatically
after p_end_with_regs.

Piglit tests can trigger the problem:

1. gl-2.1-polygon-stipple-fs
  a. the PS prolog calls discard_if
  b. the PS main part passes the WQM exec to the epilog
  c. the PS epilog exports color for the discarded pixel

2. fs-fwidth-color.shader_test
  a. the PS prolog needs to pass args computed in WQM mode
  b. making p_end_with_regs exact ends WQM mode before
     the move instructions, so the helper threads' results are not
     passed to the next shader part

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24973>
2023-10-10 02:36:33 +00:00
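The difference between the exact exec mask and the WQM exec mask that the commit above relies on can be sketched as follows. This is a minimal illustration for a 64-lane wave, not code from ACO: the function names `to_wqm` and `helper_lanes` are hypothetical, and the quad expansion follows the documented behavior of the hardware `s_wqm_b64` instruction (if any lane of a quad is active, all four lanes become active).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: expand an exact exec mask to whole quad mode.
// Each quad is 4 consecutive lanes; a quad with at least one active lane
// gets all four of its lanes enabled.
uint64_t to_wqm(uint64_t exact_exec)
{
   uint64_t wqm = 0;
   for (unsigned quad = 0; quad < 16; ++quad) {
      const uint64_t quad_bits = UINT64_C(0xF) << (quad * 4);
      if (exact_exec & quad_bits)
         wqm |= quad_bits;
   }
   return wqm;
}

// Hypothetical helper: lanes that are active only in WQM. These are the
// helper threads whose results must still reach the next shader part as
// args, but which must not export anything.
uint64_t helper_lanes(uint64_t exact_exec)
{
   const uint64_t wqm = to_wqm(exact_exec);
   assert((wqm & exact_exec) == exact_exec); // WQM is a superset of exact
   return wqm & ~exact_exec;
}
```

With a single live pixel in lane 0, `to_wqm(0x1)` yields `0xF` and `helper_lanes(0x1)` yields `0xE`: three helper lanes run alongside the real one until the shader part returns to exact mode.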
Daniel Schürmann 6eaf416f35 aco/insert_exec_mask: Simplify WQM handling (2/2)
by calculating WQM requirements on demand.

No fossil-db changes.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25038>
2023-09-14 09:25:23 +00:00
Daniel Schürmann 5f66723188 aco/insert_exec_mask: Simplify WQM handling (1/2)
by using p_end_wqm as indicator for when to end WQM mode.

Totals from 10049 (13.12% of 76572) affected shaders: (GFX11)

MaxWaves: 301126 -> 301136 (+0.00%)
Instrs: 7061909 -> 7049272 (-0.18%); split: -0.21%, +0.03%
CodeSize: 37720684 -> 37664244 (-0.15%); split: -0.18%, +0.03%
VGPRs: 357204 -> 357180 (-0.01%); split: -0.13%, +0.12%
Latency: 62757830 -> 62827080 (+0.11%); split: -0.06%, +0.17%
InvThroughput: 8589248 -> 8589963 (+0.01%); split: -0.02%, +0.02%
VClause: 132541 -> 132547 (+0.00%); split: -0.03%, +0.03%
SClause: 322916 -> 322964 (+0.01%); split: -0.04%, +0.05%
Copies: 546446 -> 547657 (+0.22%); split: -0.13%, +0.35%
Branches: 189527 -> 188293 (-0.65%)
PreSGPRs: 332792 -> 332529 (-0.08%); split: -0.08%, +0.00%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25038>
2023-09-14 09:25:23 +00:00
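The percentages in the fossil-db totals above appear to be plain relative changes between the two counts. The sketch below reproduces a few of them under that assumption; the function names are hypothetical and this is not how fossil-db itself is implemented.

```cpp
#include <cassert>
#include <cmath>

// Relative change from `before` to `after`, in percent.
double percent_change(double before, double after)
{
   assert(before != 0.0);
   return (after - before) / before * 100.0;
}

// Share of `part` in `total`, in percent.
double percent_of(double part, double total)
{
   assert(total != 0.0);
   return part / total * 100.0;
}

// Round to two decimals, matching the precision used in the stats.
double round2(double v)
{
   return std::round(v * 100.0) / 100.0;
}
```

For the first commit of the series, `round2(percent_change(7061909, 7049272))` gives the -0.18 reported for Instrs, and `round2(percent_of(10049, 76572))` gives the 13.12 reported for affected shaders.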
Daniel Schürmann 45f6d38a76 aco: insert a single p_end_wqm after the last derivative calculation
This new instruction replaces p_wqm.

Totals from 28065 (36.65% of 76572) affected shaders: (GFX11)
MaxWaves: 823922 -> 823952 (+0.00%); split: +0.01%, -0.01%
Instrs: 22221375 -> 22180465 (-0.18%); split: -0.26%, +0.08%
CodeSize: 117310676 -> 117040684 (-0.23%); split: -0.30%, +0.07%
VGPRs: 1183476 -> 1186656 (+0.27%); split: -0.19%, +0.46%
SpillSGPRs: 2305 -> 2302 (-0.13%)
Latency: 176559310 -> 176427793 (-0.07%); split: -0.21%, +0.14%
InvThroughput: 26245204 -> 26195550 (-0.19%); split: -0.26%, +0.07%
VClause: 368058 -> 369460 (+0.38%); split: -0.21%, +0.59%
SClause: 857077 -> 842588 (-1.69%); split: -2.06%, +0.37%
Copies: 1245650 -> 1249434 (+0.30%); split: -0.33%, +0.63%
Branches: 394837 -> 396070 (+0.31%); split: -0.01%, +0.32%
PreSGPRs: 1019139 -> 1019567 (+0.04%); split: -0.02%, +0.06%
PreVGPRs: 925739 -> 931860 (+0.66%); split: -0.00%, +0.66%

Changes are due to scheduling and re-enabling cross-lane optimizations.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25038>
2023-09-14 09:25:23 +00:00
Daniel Schürmann 0907b53740 aco/insert_exec_mask: set Exact mode after p_discard_if when necessary
Fixes: 5e9df85b1a ('aco: optimize discard_if when WQM is not needed afterwards')
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25038>
2023-09-14 09:25:22 +00:00
Rhys Perry 41b6020ff3 aco: remove fast path in insert_exec_mask's process_instructions
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25038>
2023-09-14 09:25:22 +00:00
Samuel Pitoiset 37aa6d25e1 aco: ensure to initialize exec manually for non-monolithic {VS,TES}/GS on GFX9+
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24862>
2023-08-25 10:22:41 +00:00
Samuel Pitoiset 196b355db6 aco: ensure to initialize exec manually for VS as LS on GFX9+
This is needed when VS and TCS are compiled separately with shader objects on GFX9+.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24697>
2023-08-25 07:22:04 +00:00
Qiang Yu 85d9646288 aco: add p_end_with_regs pseudo instruction
Used by radeonsi shader parts to pass args from one part to another.
It takes a variable number of operands, each reserving a fixed register
that holds the wanted value.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24442>
2023-08-16 02:27:45 +00:00
Timur Kristóf 05928f4200 aco: Use ac_hw_stage instead of aco-specific HWStage.
The new ac_hw_stage is going to be used by drivers as well.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23597>
2023-06-23 12:49:04 +00:00
Eric Engestrom 6b21653ab4 aco: reformat according to its .clang-format
Signed-off-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23253>
2023-06-16 19:59:52 +00:00
Friedrich Vock 9de8134410 aco: Fix assert in insert_exec_mask
This assert would trigger on unconditional demotes, because the demotes
don't remove the mask_type_global flag from the exec mask.

Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23594>
2023-06-12 14:20:28 +00:00
Timur Kristóf 8e9d269da6 aco: Don't use nir_selection_control in aco_ir.
We don't want to rely on any NIR structures in ACO, because
we would like to avoid the need to include nir.h in aco_ir.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22241>
2023-04-10 20:01:28 +00:00
Daniel Schürmann caec48529b aco/insert_exec_mask: allow for disconnected CFG
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20853>
2023-03-12 18:07:18 +00:00
Timur Kristóf 81620fc7b0 aco: Enable constant exec mask based optimization on compute shaders.
We know for sure exec is initially -1 when the shader always has full subgroups.

Fossil DB stats on GFX11:
Totals from 3884 (2.88% of 134913) affected shaders:
SpillSGPRs: 1673 -> 1697 (+1.43%); split: -1.67%, +3.11%
SpillVGPRs: 2316 -> 2310 (-0.26%); split: -0.65%, +0.39%
CodeSize: 19584436 -> 19567156 (-0.09%); split: -0.13%, +0.04%
Scratch: 217088 -> 216832 (-0.12%)
Instrs: 3784596 -> 3780303 (-0.11%); split: -0.15%, +0.03%
Latency: 39971204 -> 39794967 (-0.44%); split: -0.47%, +0.03%
InvThroughput: 7885552 -> 7801247 (-1.07%); split: -1.14%, +0.07%
VClause: 74654 -> 74611 (-0.06%); split: -0.07%, +0.01%
SClause: 103139 -> 103043 (-0.09%); split: -0.13%, +0.04%
Copies: 279864 -> 281995 (+0.76%); split: -0.72%, +1.48%
Branches: 92082 -> 92084 (+0.00%); split: -0.03%, +0.03%
PreSGPRs: 155637 -> 149491 (-3.95%)

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20670>
2023-01-26 01:59:26 +00:00
Rhys Perry c3dd1931d9 aco: allow Builder::Result to be dereferenced
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20251>
2023-01-10 16:01:38 +00:00
Samuel Pitoiset bb90d29660 aco: add p_dual_src_export_gfx11 for dual source blending on GFX11
Dual source blending must be in strict WQM mode.

Cc: 22.3 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19643>
2022-11-16 18:35:10 +00:00
Timur Kristóf d8639b7a80 aco: Allow explicitly removing jumps on GFX10+ when beneficial.
"Removing jumps" in ACO means skipping the jump instruction
at the beginning of a divergent branch (but still modify exec).

ACO already supports implicitly removing jumps when it decides
that executing a branch with empty exec mask is more beneficial
than a jump.

This commit adds the possibility to use this explicitly
through nir_selection_control. ACO will respect this
setting and remove the branch instructions when this is specified,
unless it decides that this would cause bugs (eg. exp instruction).

There are two cases that benefit from the new change:

1. When the application requests to "flatten" a branch (ie.
remove control flow), we now respect that.
2. When the compiler stack determines that a divergent branch
is always taken.

v2 by Georg Lehmann: fixed applying sel_ctrl to else blocks

Fossil DB stats on Navi 21:

Totals from 13 (0.01% of 134906) affected shaders:
CodeSize: 136616 -> 136496 (-0.09%)
Instrs: 26196 -> 26166 (-0.11%)
Latency: 417928 -> 417889 (-0.01%)
Branches: 1241 -> 1211 (-2.42%)

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-By: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17921>
2022-10-11 15:42:54 +00:00
Samuel Pitoiset e840ba9ed8 aco: requires Exact for p_jump_to_epilog
Otherwise, in the presence of p_exit_early_if, the main FS will always
jump to the PS epilog regardless of the exact mask.

This fixes dEQP-VK.draw.renderpass.shader_invocation.helper_invocation
and a few vkd3d-proton regressions when PS epilogs are forced.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17617>
2022-07-19 17:52:36 +00:00
Daniel Schürmann ac39e7bf23 aco: fix assertion in insert_exec_mask
The exec mask might also be of type mask_type_loop.

Fixes: d068eb53e8 ('aco/insert_exec_mask: optimize top-level transition to exact before demote')
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17402>
2022-07-19 16:56:37 +00:00
Daniel Schürmann 6de68c5dca aco: Avoid live-range splits in Exact mode
Because the data register of atomic VMEM instructions
is shared between src and dst, it might be necessary
to create live-range splits during RA.
Make the live-range splits explicit in WQM mode.

Totals from 7 (0.01% of 134913) affected shaders: (GFX10.3)
Latency: 17209 -> 17210 (+0.01%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15347>
2022-07-19 16:30:49 +00:00
Rhys Perry d2d94b62f2 aco: initialize scratch base registers on GFX9-GFX10.3
fossil-db (navi21):
Totals from 1142 (0.70% of 162293) affected shaders:
Instrs: 271636 -> 271974 (+0.12%)
CodeSize: 1532020 -> 1533792 (+0.12%)
Latency: 7484066 -> 7485698 (+0.02%)
InvThroughput: 4048824 -> 4049579 (+0.02%)
SClause: 4171 -> 4212 (+0.98%)
PreSGPRs: 11203 -> 12276 (+9.58%)

fossil-db (vega10):
Totals from 3327 (2.06% of 161355) affected shaders:
Instrs: 257413 -> 257601 (+0.07%)
CodeSize: 1424244 -> 1425372 (+0.08%)
Latency: 8598402 -> 8600466 (+0.02%)
InvThroughput: 7906335 -> 7908234 (+0.02%)
SClause: 4932 -> 4973 (+0.83%)
PreSGPRs: 22010 -> 25405 (+15.42%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17079>
2022-07-08 14:49:03 +00:00
Rhys Perry d068eb53e8 aco/insert_exec_mask: optimize top-level transition to exact before demote
fossil-db (Sienna Cichlid):
Totals from 5767 (3.55% of 162293) affected shaders:
Instrs: 3264949 -> 3257527 (-0.23%); split: -0.23%, +0.00%
CodeSize: 17835692 -> 17806004 (-0.17%); split: -0.17%, +0.00%
Latency: 45990060 -> 45987924 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 7643850 -> 7643835 (-0.00%); split: -0.00%, +0.00%
Copies: 193641 -> 186219 (-3.83%); split: -3.84%, +0.01%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15244>
2022-03-08 12:49:59 +00:00
Rhys Perry 42a5be975a aco/insert_exec_mask: use get_exec_op
No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15244>
2022-03-08 12:49:59 +00:00
Rhys Perry aa55ecc296 aco/insert_exec_mask: fix top-level to-exact with non-global exact mask
After transitioning to exact after a discard, the exec stack might be:
[exact|global, wqm, exact]

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15244>
2022-03-08 12:49:59 +00:00
Rhys Perry ceca5e68c4 aco: remove vcc hint from branch definitions
This doesn't seem to have much benefit anymore.

fossil-db (Sienna Cichlid):
Totals from 198 (0.15% of 134913) affected shaders:
CodeSize: 2610536 -> 2610872 (+0.01%); split: -0.01%, +0.02%
Instrs: 479001 -> 479085 (+0.02%); split: -0.01%, +0.03%
Latency: 7310684 -> 7300735 (-0.14%); split: -0.16%, +0.02%
InvThroughput: 2439084 -> 2437446 (-0.07%); split: -0.07%, +0.00%
SClause: 14760 -> 14722 (-0.26%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13432>
2022-03-03 20:21:08 +00:00
Daniel Schürmann 1bbbabedb7 aco/insert_exec_mask: refactor and remove some unnecessary WQM handling code
Some cases cannot happen and don't need to be handled anymore.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14951>
2022-02-11 19:05:30 +00:00
Daniel Schürmann d7d7b9974a aco/insert_exec_mask: refactor and simplify get_block_needs()
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14951>
2022-02-11 19:05:30 +00:00
Daniel Schürmann fcc5dec8d6 aco/insert_exec_mask: remove ever_again_needs and Exact_Branch
This information is not required anymore.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14951>
2022-02-11 19:05:30 +00:00
Daniel Schürmann cbb1b095ca aco/insert_exec_mask: remove some unnecessary WQM loop handling code
These workarounds were necessary to prevent infinite loops
with helper lane registers containing wrong data.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14951>
2022-02-11 19:05:30 +00:00
Daniel Schürmann 580a63b4ac aco/insert_exec_mask: remove Preserve_WQM flag
If WQM is needed anywhere after discard_if(), it will also
be flagged as WQM. We can rely on that to preserve the WQM mask.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14951>
2022-02-11 19:05:30 +00:00
Daniel Schürmann f816dd1be7 aco: don't propagate WQM for p_as_uniform
This was needed, so that in case of active helper lanes,
these contain the correct value. It is now handled implicitly.

Totals from 1004 (0.74% of 134913) affected shaders: (GFX10.3)
CodeSize: 7581020 -> 7580892 (-0.00%); split: -0.00%, +0.00%
Instrs: 1454940 -> 1454908 (-0.00%); split: -0.00%, +0.00%
Latency: 12984953 -> 12984894 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 3173037 -> 3173049 (+0.00%); split: -0.00%, +0.00%
PreSGPRs: 47498 -> 47273 (-0.47%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14951>
2022-02-11 19:05:30 +00:00
Daniel Schürmann 825cd696dc aco/insert_exec_mask: stay in WQM while helper lanes are still needed
This patch flags all instructions WQM which don't require
Exact mode, but depend on the exec mask as long as WQM
is needed on any control flow path afterwards.
This will mostly prevent accidental copies of WQM values
within Exact mode, and also makes a lot of other workarounds
unnecessary.

Totals from 17374 (12.88% of 134913) affected shaders: (GFX10.3)
VGPRs: 526952 -> 527384 (+0.08%); split: -0.01%, +0.09%
CodeSize: 33740512 -> 33766636 (+0.08%); split: -0.06%, +0.14%
MaxWaves: 488166 -> 488108 (-0.01%); split: +0.00%, -0.02%
Instrs: 6254240 -> 6260557 (+0.10%); split: -0.08%, +0.18%
Latency: 66497580 -> 66463472 (-0.05%); split: -0.15%, +0.10%
InvThroughput: 13265741 -> 13264036 (-0.01%); split: -0.03%, +0.01%
VClause: 122962 -> 122975 (+0.01%); split: -0.01%, +0.02%
SClause: 334805 -> 334405 (-0.12%); split: -0.51%, +0.39%
Copies: 275728 -> 282341 (+2.40%); split: -0.91%, +3.31%
Branches: 92546 -> 90990 (-1.68%); split: -1.68%, +0.00%
PreSGPRs: 504119 -> 504352 (+0.05%); split: -0.00%, +0.05%

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14951>
2022-02-11 19:05:30 +00:00
Daniel Schürmann 5e9df85b1a aco: optimize discard_if when WQM is not needed afterwards
Totals from 11560 (8.57% of 134913) affected shaders: (GFX10.3)
CodeSize: 12092560 -> 11997652 (-0.78%)
Instrs: 2205325 -> 2181598 (-1.08%)
Latency: 15376048 -> 15356958 (-0.12%); split: -0.12%, +0.00%
InvThroughput: 3526105 -> 3525120 (-0.03%); split: -0.03%, +0.00%
Copies: 98543 -> 87601 (-11.10%)
Branches: 16919 -> 16873 (-0.27%)
PreSGPRs: 291584 -> 291532 (-0.02%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14805>
2022-02-08 16:16:07 +00:00
Daniel Schürmann 13c3137960 aco: merge block_kind_uses_[demote|discard_if]
These serve the same purpose. The new name is
block_kind_uses_discard.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14805>
2022-02-08 16:16:07 +00:00
Daniel Schürmann e7d1c8cc5e aco: make Preserve_WQM independent from block_kind_uses_discard_if
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14805>
2022-02-08 16:16:07 +00:00
Daniel Schürmann 08b8500dfb aco: remove block_kind_discard
This case doesn't seem to happen in practice.
No need to micro-optimize it.

This patch merges instruction selection for discard/discard_if.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14805>
2022-02-08 16:16:07 +00:00
Daniel Schürmann b67092e685 aco: emit nir_intrinsic_discard() as p_discard_if()
This simplifies the code and emits a slightly better
sequence in some cases.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14805>
2022-02-08 16:16:07 +00:00
Timur Kristóf e66f54e5c8 aco: Allow elect to take advantage of knowing when all lanes are active.
Implement elect using a pseudo-op which is lowered during the
insert_exec_mask pass. This makes it possible to emit a more
optimal sequence when the exec mask is constant.

Fossil DB results on Sienna Cichlid:
Totals from 211 (0.16% of 128647) affected shaders:
CodeSize: 2254356 -> 2240468 (-0.62%); split: -0.62%, +0.00%
Instrs: 438471 -> 434996 (-0.79%); split: -0.80%, +0.01%
Latency: 2717082 -> 2709400 (-0.28%); split: -0.28%, +0.00%
InvThroughput: 566987 -> 566342 (-0.11%); split: -0.11%, +0.00%
Copies: 40058 -> 40162 (+0.26%)
Branches: 31209 -> 31211 (+0.01%)
PreSGPRs: 9927 -> 10125 (+1.99%)

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11458>
2021-07-16 14:31:54 +00:00
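Why a constant exec mask helps elect can be sketched as below. This only illustrates the idea under the assumption that elect picks the lowest active lane; the function names are hypothetical and the code is not ACO's actual lowering.

```cpp
#include <cassert>
#include <cstdint>

// General case: the elected lane is the lowest set bit of exec, which has
// to be computed from the runtime value of the mask.
uint64_t elect_mask(uint64_t exec)
{
   assert(exec != 0); // at least one lane must be active
   return exec & (~exec + 1); // isolate the lowest set bit
}

// Constant case: if every lane is known to be active, the elected lane is
// always lane 0 and no computation on exec is needed.
constexpr uint64_t elect_mask_all_active()
{
   return 1;
}
```

When exec is all ones, `elect_mask(~UINT64_C(0))` and `elect_mask_all_active()` agree, so the runtime sequence can be replaced by a constant.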
Tony Wasserka 66e51dc474 aco: Remove use of deprecated Operand constructors
This migration was done with libclang-based automatic tooling, which
performed these replacements:
* Operand(uint8_t) -> Operand::c8
* Operand(uint16_t) -> Operand::c16
* Operand(uint32_t, false) -> Operand::c32
* Operand(uint32_t, bool) -> Operand::c32_or_c64
* Operand(uint64_t) -> Operand::c64
* Operand(0) -> Operand::zero(num_bytes)

Casts that were previously used for constructor selection have automatically
been removed (e.g. Operand((uint16_t)1) -> Operand::c16(1)).

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11653>
2021-07-13 17:43:26 +00:00
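The motivation for the named constructors listed above can be illustrated with a small stand-in class. `MiniOperand` is hypothetical and far simpler than ACO's real `Operand`; it only shows how a named factory states the constant's size at the call site, where an overloaded constructor would need a cast to select it.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for a constant operand.
class MiniOperand {
public:
   static MiniOperand c8(uint8_t v) { return MiniOperand(v, 1); }
   static MiniOperand c16(uint16_t v) { return MiniOperand(v, 2); }
   static MiniOperand c32(uint32_t v) { return MiniOperand(v, 4); }
   static MiniOperand c64(uint64_t v) { return MiniOperand(v, 8); }

   // Zero constant of an explicit size in bytes.
   static MiniOperand zero(unsigned num_bytes)
   {
      assert(num_bytes == 1 || num_bytes == 2 || num_bytes == 4 || num_bytes == 8);
      return MiniOperand(0, num_bytes);
   }

   uint64_t value() const { return value_; }
   unsigned bytes() const { return bytes_; }

private:
   // Private so that callers must go through a named factory.
   MiniOperand(uint64_t v, unsigned bytes) : value_(v), bytes_(bytes) {}

   uint64_t value_;
   unsigned bytes_;
};
```

A call such as `MiniOperand::c16(1)` is unambiguous without a cast, which is the same effect as the `Operand((uint16_t)1) -> Operand::c16(1)` replacement described in the commit.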
Daniel Schürmann 1e2639026f aco: Format.
Manually adjusted some comments for more intuitive line breaks.

Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11258>
2021-07-12 21:27:31 +00:00
Daniel Schürmann 59fdaa1985 aco: reorder and cleanup #includes
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11271>
2021-07-12 12:09:31 +00:00
Daniel Schürmann 32c7d17120 aco: remove condition operand from branch in invert block
As value numbering only handles logical blocks, this
could lead to invalid IR until insert_exec_mask().
No fossil-db changes.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10894>
2021-05-20 17:44:20 +00:00
Timur Kristóf c4f6e4d6b0 aco/insert_exec_mask: Fixed unused variable warning in release build.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10806>
2021-05-20 17:11:22 +00:00
Timur Kristóf 25a7947da7 aco: Don't use s_and_saveexec with branches when exec is constant.
When exec is constant, we can remember the constant as the old exec,
and just copy the condition and use it as the new exec. There is no
need to save the constant.

Due to using p_parallelcopy which is lowered to s_mov_b64 (or 32),
many exec restores now become copies, hence the increase in the copy
stats.

Fossil DB changes on Sienna Cichlid:

Totals from 73969 (49.37% of 149839) affected shaders:
SpillSGPRs: 1768 -> 1610 (-8.94%)
CodeSize: 99053892 -> 99047884 (-0.01%); split: -0.02%, +0.01%
Instrs: 19372852 -> 19370398 (-0.01%); split: -0.02%, +0.01%
VClause: 515154 -> 515142 (-0.00%); split: -0.00%, +0.00%
SClause: 719236 -> 718395 (-0.12%); split: -0.14%, +0.02%
Copies: 1109770 -> 1254634 (+13.05%); split: -0.07%, +13.12%
Branches: 374338 -> 374348 (+0.00%); split: -0.00%, +0.00%
PreSGPRs: 1776481 -> 1653761 (-6.91%)

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10691>
2021-05-18 11:48:22 +00:00
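The reasoning in the commit above can be sketched by modeling both ways of entering a divergent branch. This is an illustration for a 64-lane wave, not ACO code: the struct and function names are hypothetical, and the first function models the documented semantics of `s_and_saveexec_b64` (save the old exec, then AND exec with the condition).

```cpp
#include <cassert>
#include <cstdint>

struct BranchEntry {
   uint64_t new_exec;   // exec used inside the branch
   uint64_t restore;    // value written back to exec at the merge point
   bool needs_save_reg; // true if `restore` must be kept in an SGPR pair
};

// General case: exec is unknown, so the old value has to be saved.
BranchEntry enter_branch_saveexec(uint64_t exec, uint64_t cond)
{
   return {cond & exec, exec, true};
}

// Constant case: exec is known to be all ones, so the condition can be
// copied into exec and the constant is simply remembered for the restore.
BranchEntry enter_branch_const_exec(uint64_t cond)
{
   const uint64_t full = ~UINT64_C(0);
   return {cond, full, false};
}

// Sanity check: both paths agree whenever exec really is all ones.
void check_equivalence(uint64_t cond)
{
   const BranchEntry a = enter_branch_saveexec(~UINT64_C(0), cond);
   const BranchEntry b = enter_branch_const_exec(cond);
   assert(a.new_exec == b.new_exec);
   assert(a.restore == b.restore);
}
```

The constant path produces the same masks without occupying a register for the saved exec, which is consistent with the PreSGPRs reduction reported above. The restore becomes a plain copy of a constant, which fits the increase in the copy stats.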
Timur Kristóf c850af936a aco: Remember when exec mask is const, and restore the const then.
Previously, we would store even the constant -1 exec mask from the
beginning of every merged shader. With this change it is no longer
necessary because we can restore to constant exec mask directly.

Hence, this frees up a register pair (single register for Wave32)
in every merged shader.

No Fossil DB changes.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10691>
2021-05-18 11:48:22 +00:00
Timur Kristóf 04f90db9a0 aco: Use Operand instead of Temp for the exec mask stack.
This will enable us to store non-temporary values,
such as constant operands there.

No Fossil DB changes.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10691>
2021-05-18 11:48:22 +00:00
Rhys Perry 961361cdc9 aco: ensure loops nested in a WQM loop are in WQM
Fixes a potential empty exec mask in this situation:
enter_wqm()
loop {
   ... wqm code ...
   enter_exact()
   loop {
      ... no wqm code ...
   }
}

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Fixes: f0074a6f05 ("aco: do not flag all blocks WQM to ensure we enter all nested loops in WQM")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4546
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10075>
2021-04-08 09:56:25 +00:00
Timur Kristóf 8205cce007 aco: Use ASSERTED to avoid unused variable warning.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9632>
2021-03-16 21:46:52 +00:00
Rhys Perry 5f1b354472 aco: calculate all p_as_uniform and v_readfirstlane_b32 sources in WQM
We should avoid a situation where a v_readfirstlane_b32 is in WQM but its
source is calculated in Exact.

Fixes hang when running Assassin's Creed: Valhalla benchmark.

fossil-db (GFX10.3):
Totals from 1021 (0.70% of 146267) affected shaders:
CodeSize: 7835228 -> 7842992 (+0.10%); split: -0.00%, +0.10%
Instrs: 1519208 -> 1521149 (+0.13%); split: -0.00%, +0.13%
SClause: 78921 -> 78920 (-0.00%)
Copies: 44456 -> 45421 (+2.17%); split: -0.05%, +2.22%
Branches: 12987 -> 13933 (+7.28%)
PreSGPRs: 47599 -> 47813 (+0.45%)
Cycles: 10037540 -> 10045304 (+0.08%); split: -0.00%, +0.08%
VMEM: 538381 -> 538777 (+0.07%); split: +0.11%, -0.03%
SMEM: 84553 -> 84554 (+0.00%); split: +0.01%, -0.01%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9288>
2021-02-26 13:33:56 +00:00