Timur Kristóf
05928f4200
aco: Use ac_hw_stage instead of aco-specific HWStage.
...
The new ac_hw_stage is going to be used by drivers as well.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com >
Reviewed-by: Qiang Yu <yuq825@gmail.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23597 >
2023-06-23 12:49:04 +00:00
Eric Engestrom
6b21653ab4
aco: reformat according to its .clang-format
...
Signed-off-by: Eric Engestrom <eric@igalia.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23253 >
2023-06-16 19:59:52 +00:00
Friedrich Vock
9de8134410
aco: Fix assert in insert_exec_mask
...
This assert would trigger on unconditional demotes, because the demotes
don't remove the mask_type_global flag from the exec mask.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23594 >
2023-06-12 14:20:28 +00:00
Timur Kristóf
8e9d269da6
aco: Don't use nir_selection_control in aco_ir.
...
We don't want to rely on any NIR structures in ACO, because
we would like to avoid the need to include nir.h in aco_ir.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com >
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22241 >
2023-04-10 20:01:28 +00:00
Daniel Schürmann
caec48529b
aco/insert_exec_mask: allow for disconnected CFG
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20853 >
2023-03-12 18:07:18 +00:00
Timur Kristóf
81620fc7b0
aco: Enable constant exec mask based optimization on compute shaders.
...
We know for sure exec is initially -1 when the shader always has full subgroups.
Fossil DB stats on GFX11:
Totals from 3884 (2.88% of 134913) affected shaders:
SpillSGPRs: 1673 -> 1697 (+1.43%); split: -1.67%, +3.11%
SpillVGPRs: 2316 -> 2310 (-0.26%); split: -0.65%, +0.39%
CodeSize: 19584436 -> 19567156 (-0.09%); split: -0.13%, +0.04%
Scratch: 217088 -> 216832 (-0.12%)
Instrs: 3784596 -> 3780303 (-0.11%); split: -0.15%, +0.03%
Latency: 39971204 -> 39794967 (-0.44%); split: -0.47%, +0.03%
InvThroughput: 7885552 -> 7801247 (-1.07%); split: -1.14%, +0.07%
VClause: 74654 -> 74611 (-0.06%); split: -0.07%, +0.01%
SClause: 103139 -> 103043 (-0.09%); split: -0.13%, +0.04%
Copies: 279864 -> 281995 (+0.76%); split: -0.72%, +1.48%
Branches: 92082 -> 92084 (+0.00%); split: -0.03%, +0.03%
PreSGPRs: 155637 -> 149491 (-3.95%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20670 >
2023-01-26 01:59:26 +00:00
Rhys Perry
c3dd1931d9
aco: allow Builder::Result to be dereferenced
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com >
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20251 >
2023-01-10 16:01:38 +00:00
Samuel Pitoiset
bb90d29660
aco: add p_dual_src_export_gfx11 for dual source blending on GFX11
...
Dual source blending must be in strict WQM mode.
Cc: 22.3 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19643 >
2022-11-16 18:35:10 +00:00
Timur Kristóf
d8639b7a80
aco: Allow explicitly removing jumps on GFX10+ when beneficial.
...
"Removing jumps" in ACO means skipping the jump instruction
at the beginning of a divergent branch (but still modify exec).
ACO already supports implicitly removing jumps when it decides
that executing a branch with empty exec mask is more beneficial
than a jump.
This commit adds the possibility to use this explicitly
through nir_selection_control. ACO will respect this
setting and remove the branch instructions when this is specified,
unless it decides that this would cause bugs (eg. exp instruction).
There are two cases that benefit from the new change:
1. When the application requests to "flatten" a branch (ie.
remove control flow), we now respect that.
2. When the compiler stack determines that a divergent branch
is always taken.
v2 by Georg Lehmann: fixed applying sel_ctrl to else blocks
Fossil DB stats on Navi 21:
Totals from 13 (0.01% of 134906) affected shaders:
CodeSize: 136616 -> 136496 (-0.09%)
Instrs: 26196 -> 26166 (-0.11%)
Latency: 417928 -> 417889 (-0.01%)
Branches: 1241 -> 1211 (-2.42%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Reviewed-By: Georg Lehmann <dadschoorse@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17921 >
2022-10-11 15:42:54 +00:00
Samuel Pitoiset
e840ba9ed8
aco: requires Exact for p_jump_to_epilog
...
Otherwise, in presence of p_exit_early_if the main FS will always
jump to the PS epilog regardless the exact mask.
This fixes dEQP-VK.draw.renderpass.shader_invocation.helper_invocation
and few vkd3d-proton regressions when PS epilogs are forced.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17617 >
2022-07-19 17:52:36 +00:00
Daniel Schürmann
ac39e7bf23
aco: fix assertion in insert_exec_mask
...
The exec mask might also be of type mask_type_loop.
Fixes: d068eb53e8 ('aco/insert_exec_mask: optimize top-level transition to exact before demote')
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17402 >
2022-07-19 16:56:37 +00:00
Daniel Schürmann
6de68c5dca
aco: Avoid live-range splits in Exact mode
...
Because the data register of atomic VMEM instructions
is shared between src and dst, it might be necessary
to create live-range splits during RA.
Make the live-range splits explicit in WQM mode.
Totals from 7 (0.01% of 134913) affected shaders: (GFX10.3)
Latency: 17209 -> 17210 (+0.01%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15347 >
2022-07-19 16:30:49 +00:00
Rhys Perry
d2d94b62f2
aco: initialize scratch base registers on GFX9-GFX10.3
...
fossil-db (navi21):
Totals from 1142 (0.70% of 162293) affected shaders:
Instrs: 271636 -> 271974 (+0.12%)
CodeSize: 1532020 -> 1533792 (+0.12%)
Latency: 7484066 -> 7485698 (+0.02%)
InvThroughput: 4048824 -> 4049579 (+0.02%)
SClause: 4171 -> 4212 (+0.98%)
PreSGPRs: 11203 -> 12276 (+9.58%)
fossil-db (vega10):
Totals from 3327 (2.06% of 161355) affected shaders:
Instrs: 257413 -> 257601 (+0.07%)
CodeSize: 1424244 -> 1425372 (+0.08%)
Latency: 8598402 -> 8600466 (+0.02%)
InvThroughput: 7906335 -> 7908234 (+0.02%)
SClause: 4932 -> 4973 (+0.83%)
PreSGPRs: 22010 -> 25405 (+15.42%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17079 >
2022-07-08 14:49:03 +00:00
Rhys Perry
d068eb53e8
aco/insert_exec_mask: optimize top-level transition to exact before demote
...
fossil-db (Sienna Cichlid):
Totals from 5767 (3.55% of 162293) affected shaders:
Instrs: 3264949 -> 3257527 (-0.23%); split: -0.23%, +0.00%
CodeSize: 17835692 -> 17806004 (-0.17%); split: -0.17%, +0.00%
Latency: 45990060 -> 45987924 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 7643850 -> 7643835 (-0.00%); split: -0.00%, +0.00%
Copies: 193641 -> 186219 (-3.83%); split: -3.84%, +0.01%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15244 >
2022-03-08 12:49:59 +00:00
Rhys Perry
42a5be975a
aco/insert_exec_mask: use get_exec_op
...
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15244 >
2022-03-08 12:49:59 +00:00
Rhys Perry
aa55ecc296
aco/insert_exec_mask: fix top-level to-exact with non-global exact mask
...
After transitioning to exact after a discard, the exec stack might be:
[exact|global, wqm, exact]
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15244 >
2022-03-08 12:49:59 +00:00
Rhys Perry
ceca5e68c4
aco: remove vcc hint from branch definitions
...
This doesn't seem to have much benefit anymore.
fossil-db (Sienna Cichlid):
Totals from 198 (0.15% of 134913) affected shaders:
CodeSize: 2610536 -> 2610872 (+0.01%); split: -0.01%, +0.02%
Instrs: 479001 -> 479085 (+0.02%); split: -0.01%, +0.03%
Latency: 7310684 -> 7300735 (-0.14%); split: -0.16%, +0.02%
InvThroughput: 2439084 -> 2437446 (-0.07%); split: -0.07%, +0.00%
SClause: 14760 -> 14722 (-0.26%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13432 >
2022-03-03 20:21:08 +00:00
Daniel Schürmann
1bbbabedb7
aco/insert_exec_mask: refactor and remove some unnecessary WQM handling code
...
Some cases cannot happen and don't need to be handled anymore.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14951 >
2022-02-11 19:05:30 +00:00
Daniel Schürmann
d7d7b9974a
aco/insert_exec_mask: refactor and simplify get_block_needs()
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14951 >
2022-02-11 19:05:30 +00:00
Daniel Schürmann
fcc5dec8d6
aco/insert_exec_mask: remove ever_again_needs and Exact_Branch
...
This information is not required anymore.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14951 >
2022-02-11 19:05:30 +00:00
Daniel Schürmann
cbb1b095ca
aco/insert_exec_mask: remove some unnecessary WQM loop handling code
...
These workarounds are were necessary to prevent infinite loops
with helper lane registers containing wrong data.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14951 >
2022-02-11 19:05:30 +00:00
Daniel Schürmann
580a63b4ac
aco/insert_exec_mask: remove Preserve_WQM flag
...
If WQM is needed anywhere after discard_if(), it will also
be flagged as WQM. We can rely on that to preserve the WQM mask.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14951 >
2022-02-11 19:05:30 +00:00
Daniel Schürmann
f816dd1be7
aco: don't propagate WQM for p_as_uniform
...
This was needed, so that in case of active helper lanes,
these contain the correct value. It is now handled implicitly.
Totals from 1004 (0.74% of 134913) affected shaders: (GFX10.3)
CodeSize: 7581020 -> 7580892 (-0.00%); split: -0.00%, +0.00%
Instrs: 1454940 -> 1454908 (-0.00%); split: -0.00%, +0.00%
Latency: 12984953 -> 12984894 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 3173037 -> 3173049 (+0.00%); split: -0.00%, +0.00%
PreSGPRs: 47498 -> 47273 (-0.47%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14951 >
2022-02-11 19:05:30 +00:00
Daniel Schürmann
825cd696dc
aco/insert_exec_mask: stay in WQM while helper lanes are still needed
...
This patch flags all instructions WQM which don't require
Exact mode, but depend on the exec mask as long as WQM
is needed on any control flow path afterwards.
This will mostly prevent accidental copies of WQM values
within Exact mode, and also makes a lot of other workarounds
unnecessary.
Totals from 17374 (12.88% of 134913) affected shaders: (GFX10.3)
VGPRs: 526952 -> 527384 (+0.08%); split: -0.01%, +0.09%
CodeSize: 33740512 -> 33766636 (+0.08%); split: -0.06%, +0.14%
MaxWaves: 488166 -> 488108 (-0.01%); split: +0.00%, -0.02%
Instrs: 6254240 -> 6260557 (+0.10%); split: -0.08%, +0.18%
Latency: 66497580 -> 66463472 (-0.05%); split: -0.15%, +0.10%
InvThroughput: 13265741 -> 13264036 (-0.01%); split: -0.03%, +0.01%
VClause: 122962 -> 122975 (+0.01%); split: -0.01%, +0.02%
SClause: 334805 -> 334405 (-0.12%); split: -0.51%, +0.39%
Copies: 275728 -> 282341 (+2.40%); split: -0.91%, +3.31%
Branches: 92546 -> 90990 (-1.68%); split: -1.68%, +0.00%
PreSGPRs: 504119 -> 504352 (+0.05%); split: -0.00%, +0.05%
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14951 >
2022-02-11 19:05:30 +00:00
Daniel Schürmann
5e9df85b1a
aco: optimize discard_if when WQM is not needed afterwards
...
Totals from 11560 (8.57% of 134913) affected shaders: (GFX10.3)
CodeSize: 12092560 -> 11997652 (-0.78%)
Instrs: 2205325 -> 2181598 (-1.08%)
Latency: 15376048 -> 15356958 (-0.12%); split: -0.12%, +0.00%
InvThroughput: 3526105 -> 3525120 (-0.03%); split: -0.03%, +0.00%
Copies: 98543 -> 87601 (-11.10%)
Branches: 16919 -> 16873 (-0.27%)
PreSGPRs: 291584 -> 291532 (-0.02%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14805 >
2022-02-08 16:16:07 +00:00
Daniel Schürmann
13c3137960
aco: merge block_kind_uses_[demote|discard_if]
...
These serve the same purpose. The new name is
block_kind_uses_discard.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14805 >
2022-02-08 16:16:07 +00:00
Daniel Schürmann
e7d1c8cc5e
aco: make Preserve_WQM independent from block_kind_uses_discard_if
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14805 >
2022-02-08 16:16:07 +00:00
Daniel Schürmann
08b8500dfb
aco: remove block_kind_discard
...
This case doesn't seem to happen in practice.
No need to micro-optimize it.
This patch merges instruction selection for discard/discard_if.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14805 >
2022-02-08 16:16:07 +00:00
Daniel Schürmann
b67092e685
aco: emit nir_intrinsic_discard() as p_discard_if()
...
This simplifies the code and emits a slightly better
sequence in some cases.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14805 >
2022-02-08 16:16:07 +00:00
Timur Kristóf
e66f54e5c8
aco: Allow elect to take advantage of knowing when all lanes are active.
...
Implement elect using a pseudo-op which is lowered during the
insert_exec_mask pass. This makes it possible to emit a more
optimal sequence when the exec mask is constant.
Fossil DB results on Sienna Cichlid:
Totals from 211 (0.16% of 128647) affected shaders:
CodeSize: 2254356 -> 2240468 (-0.62%); split: -0.62%, +0.00%
Instrs: 438471 -> 434996 (-0.79%); split: -0.80%, +0.01%
Latency: 2717082 -> 2709400 (-0.28%); split: -0.28%, +0.00%
InvThroughput: 566987 -> 566342 (-0.11%); split: -0.11%, +0.00%
Copies: 40058 -> 40162 (+0.26%)
Branches: 31209 -> 31211 (+0.01%)
PreSGPRs: 9927 -> 10125 (+1.99%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11458 >
2021-07-16 14:31:54 +00:00
Tony Wasserka
66e51dc474
aco: Remove use of deprecated Operand constructors
...
This migration was done with libclang-based automatic tooling, which
performed these replacements:
* Operand(uint8_t) -> Operand::c8
* Operand(uint16_t) -> Operand::c16
* Operand(uint32_t, false) -> Operand::c32
* Operand(uint32_t, bool) -> Operand::c32_or_c64
* Operand(uint64_t) -> Operand::c64
* Operand(0) -> Operand::zero(num_bytes)
Casts that were previously used for constructor selection have automatically
been removed (e.g. Operand((uint16_t)1) -> Operand::c16(1)).
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11653 >
2021-07-13 17:43:26 +00:00
Daniel Schürmann
1e2639026f
aco: Format.
...
Manually adjusted some comments for more intuitive line breaks.
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11258 >
2021-07-12 21:27:31 +00:00
Daniel Schürmann
59fdaa1985
aco: reorder and cleanup #includes
...
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11271 >
2021-07-12 12:09:31 +00:00
Daniel Schürmann
32c7d17120
aco: remove condition operand from branch in invert block
...
As value numbering only handles logical blocks, this
could lead to invalid IR until insert_exec_mask().
No fossil-db changes.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10894 >
2021-05-20 17:44:20 +00:00
Timur Kristóf
c4f6e4d6b0
aco/insert_exec_mask: Fixed unused variable warning in release build.
...
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com >
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10806 >
2021-05-20 17:11:22 +00:00
Timur Kristóf
25a7947da7
aco: Don't use s_and_saveexec with branches when exec is constant.
...
When exec is constant, we can remember the constant as the old exec,
and just copy the condition and use it as the new exec. There is no
need to save the constant.
Due to using p_parallelcopy which is lowered to s_mov_b64 (or 32),
many exec restores now become copies, hence the increase in the copy
stats.
Fossil DB changes on Sienna Cichlid:
Totals from 73969 (49.37% of 149839) affected shaders:
SpillSGPRs: 1768 -> 1610 (-8.94%)
CodeSize: 99053892 -> 99047884 (-0.01%); split: -0.02%, +0.01%
Instrs: 19372852 -> 19370398 (-0.01%); split: -0.02%, +0.01%
VClause: 515154 -> 515142 (-0.00%); split: -0.00%, +0.00%
SClause: 719236 -> 718395 (-0.12%); split: -0.14%, +0.02%
Copies: 1109770 -> 1254634 (+13.05%); split: -0.07%, +13.12%
Branches: 374338 -> 374348 (+0.00%); split: -0.00%, +0.00%
PreSGPRs: 1776481 -> 1653761 (-6.91%)
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com >
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10691 >
2021-05-18 11:48:22 +00:00
Timur Kristóf
c850af936a
aco: Remember when exec mask is const, and restore the const then.
...
Previously, we would store even the constant -1 exec mask from the
beginning of every merged shader. With this change it is no longer
necessary because we can restore to constant exec mask directly.
Hence, this frees up a register pair (single register for Wave32)
in every merged shader.
No Fossil DB changes.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com >
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10691 >
2021-05-18 11:48:22 +00:00
Timur Kristóf
04f90db9a0
aco: Use Operand instead of Temp for the exec mask stack.
...
This will enable us to store non-temporary values,
such as constant operands there.
No Fossil DB changes.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com >
Reviewed-by: Tony Wasserka <tony.wasserka@gmx.de >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10691 >
2021-05-18 11:48:22 +00:00
Rhys Perry
961361cdc9
aco: ensure loops nested in a WQM loop are in WQM
...
Fixes a potential empty exec mask in this situation:
enter_wqm()
loop {
... wqm code ...
enter_exact()
loop {
... no wqm code ...
}
}
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Fixes: f0074a6f05 ("aco: do not flag all blocks WQM to ensure we enter all nested loops in WQM")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4546
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10075 >
2021-04-08 09:56:25 +00:00
Timur Kristóf
8205cce007
aco: Use ASSERTED to avoid unused variable warning.
...
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com >
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9632 >
2021-03-16 21:46:52 +00:00
Rhys Perry
5f1b354472
aco: calculate all p_as_uniform and v_readfirstlane_b32 sources in WQM
...
We should avoid a situation where a v_readfirstlane_b32 is in WQM but it's
source is calculated in Exact.
Fixes hang when running Assassin's Creed: Valhalla benchmark.
fossil-db (GFX10.3):
Totals from 1021 (0.70% of 146267) affected shaders:
CodeSize: 7835228 -> 7842992 (+0.10%); split: -0.00%, +0.10%
Instrs: 1519208 -> 1521149 (+0.13%); split: -0.00%, +0.13%
SClause: 78921 -> 78920 (-0.00%)
Copies: 44456 -> 45421 (+2.17%); split: -0.05%, +2.22%
Branches: 12987 -> 13933 (+7.28%)
PreSGPRs: 47599 -> 47813 (+0.45%)
Cycles: 10037540 -> 10045304 (+0.08%); split: -0.00%, +0.08%
VMEM: 538381 -> 538777 (+0.07%); split: +0.11%, -0.03%
SMEM: 84553 -> 84554 (+0.00%); split: +0.01%, -0.01%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9288 >
2021-02-26 13:33:56 +00:00
Daniel Schürmann
29b866fef6
aco: remove special handling of load_helper_invocation
...
These should now behave the same as is_helper_invocation.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9058 >
2021-02-17 21:53:52 +00:00
Daniel Schürmann
fc6b5be666
aco: fix assertion in insert_exec_mask pass
...
Fixes: a56ddca4e8 ('aco: make all exec accesses non-temporaries ')
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9047 >
2021-02-15 19:50:16 +00:00
Rhys Perry
ddce1ec5f5
aco: fix transition_to_{WQM,Exact} if exec.back() is not in exec
...
This can happen at merge blocks.
fossil-db (GFX10.3):
Totals from 25229 (17.25% of 146267) affected shaders:
CodeSize: 58575920 -> 58571376 (-0.01%); split: -0.01%, +0.00%
Instrs: 10979245 -> 10978109 (-0.01%); split: -0.01%, +0.00%
SClause: 591817 -> 591816 (-0.00%)
Copies: 604987 -> 603851 (-0.19%); split: -0.19%, +0.00%
Cycles: 96088796 -> 96084252 (-0.00%); split: -0.00%, +0.00%
VMEM: 10470372 -> 10470368 (-0.00%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Fixes: a56ddca4e8 ("aco: make all exec accesses non-temporaries")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4299
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9047 >
2021-02-15 19:50:16 +00:00
Daniel Schürmann
8b793f9567
aco: remove dead code for the handling of exec temporaries
...
Totals from 26026 (18.67% of 139391) affected shaders (Navi10):
PreSGPRs: 370993 -> 326177 (-12.08%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8870 >
2021-02-12 22:41:31 +00:00
Daniel Schürmann
a56ddca4e8
aco: make all exec accesses non-temporaries
...
So that they are not counted into the register demand.
Totals from 107336 (77.00% of 139391) affected shaders (Navi10):
VGPRs: 4023452 -> 4023248 (-0.01%); split: -0.01%, +0.01%
SpillSGPRs: 14088 -> 12571 (-10.77%); split: -11.03%, +0.26%
CodeSize: 266816164 -> 266765528 (-0.02%); split: -0.04%, +0.02%
MaxWaves: 1553339 -> 1553374 (+0.00%); split: +0.00%, -0.00%
Instrs: 50977701 -> 50973093 (-0.01%); split: -0.02%, +0.01%
Cycles: 1733911128 -> 1733605320 (-0.02%); split: -0.05%, +0.03%
VMEM: 40867650 -> 40900204 (+0.08%); split: +0.13%, -0.05%
SMEM: 6835980 -> 6829073 (-0.10%); split: +0.10%, -0.20%
VClause: 1032783 -> 1032788 (+0.00%); split: -0.01%, +0.01%
SClause: 2103705 -> 2104115 (+0.02%); split: -0.09%, +0.11%
Copies: 3195658 -> 3193656 (-0.06%); split: -0.30%, +0.24%
Branches: 1140213 -> 1140120 (-0.01%); split: -0.05%, +0.04%
PreSGPRs: 3603785 -> 3437064 (-4.63%); split: -5.13%, +0.50%
PreVGPRs: 3321996 -> 3321850 (-0.00%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8870 >
2021-02-12 22:41:31 +00:00
Daniel Schürmann
e663a15098
aco: don't create unnecessary exec phi on merge blocks
...
No fossil-db changes.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8870 >
2021-02-12 22:41:31 +00:00
Rhys Perry
e2608312d3
aco: remove loop to flag loop blocks as WQM
...
This is no longer necessary.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8446 >
2021-02-09 17:52:17 +00:00
Rhys Perry
ed020008b5
aco: rewrite setting of Exact_Branch
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8446 >
2021-02-09 17:52:17 +00:00
Rhys Perry
f0074a6f05
aco: do not flag all blocks WQM to ensure we enter all nested loops in WQM
...
This should no longer be necessary since the mark_block_wqm() we use to
flag break conditions as WQM now adds block to the worklist. With them
added to the worklist, get_block_needs() will add WQM to block_needs.
Adding WQM to block_needs here without adding the block to the worklist
(like we do here) can cause issues because it does not ensure that the
predecessors' branches are in WQM (needed for it to be possible to
transition to WQM in the block). This happened in an Overwatch shader.
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Fixes: 661922f6ac ("aco: add block to worklist in mark_block_wqm()")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4066
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8446 >
2021-02-09 17:52:17 +00:00