These dependency hints were primarily useful for the vec4 backend, where
it was common to write subsets of a vec4's components across multiple
instructions. In the scalar backend, we rarely used them. They also no
longer exist on Tigerlake and later in favor of software scoreboarding.
Dropping this allows us to clean up the IR a bit.
We still use the hardware hints in the generator in a couple places:
- Gfx9-12.0 scratch headers
- Quad swizzles
- Indirect MOV lowering
In theory we might want them back if we moved that lowering to the IR.
For scratch at least, I suspect it won't have a huge impact, as we're
already incurring the cost of spills/fills. The others are fairly rare
as well, so it may not be worth keeping.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>
NIR is going to use exec_node/list without the C++ code, and may switch to
a different linked list implementation in the future.
GLSL is going to use ir_exec_node/list, which we want to keep private
for GLSL, so that we can change it easily.
Thus, it's better to fork the C++ version of list.h for Intel.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36425>
At this point, using the per-register granularity will only help in
conjuction with fragment shader discard (which is implemented using f1).
v2: Loop restructuring and code cleanups. Suggested by Curro.
v3: Only apply Wa on Gfx12.5+. Suggested by Curro.
v4: Also apply to implicit flag reads. Suggested by Curro. This version
affects a *lot* more shaders (10,936 on Meteor Lake shader-db versus
4,482 before). The results are still very much in the 🤷 territory.
v5: Add missing dependency. I thought I got them all the previous
time. :( Noticed by Curro.
shader-db:
Lunar Lake
total cycles in shared programs: 886315282 -> 886391040 (<.01%)
cycles in affected programs: 204907250 -> 204983008 (0.04%)
helped: 1 / HURT: 6716
LOST: 0
GAINED: 1
Meteor Lake and DG2 had similar results. (Meteor Lake shown)
total cycles in shared programs: 883774789 -> 883921507 (0.02%)
cycles in affected programs: 481836784 -> 481983502 (0.03%)
helped: 4 / HURT: 10936
LOST: 3
GAINED: 7
fossil-db:
Lunar Lake
Totals:
Cycle count: 32600441334 -> 32601862658 (+0.00%); split: -0.00%, +0.00%
Totals from 90283 (11.44% of 789260) affected shaders:
Cycle count: 17265933202 -> 17267354526 (+0.01%); split: -0.00%, +0.01%
Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Cycle count: 26477292677 -> 26480321805 (+0.01%); split: -0.00%, +0.01%
Max dispatch width: 8010440 -> 8010984 (+0.01%)
Totals from 132952 (14.71% of 903925) affected shaders:
Cycle count: 15349555348 -> 15352584476 (+0.02%); split: -0.00%, +0.02%
Max dispatch width: 1085416 -> 1085960 (+0.05%)
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35415>
The unordered mode stored in dependencies might be a bitmask and not
only a single mode. In practice, only the "stronger" mode will stick.
Make sure that the code testing for the mode uses "&" instead of "==",
to avoid prevent some valid combinations to happen, e.g.
```
// ...
add(16) g104<1>F g94<1,1,0>F g34<1,1,0>F { align1 1H @7 $7.dst compacted };
```
which without the fix ends up as
```
// ...
sync nop(1) null<0,1,0>UB { align1 WE_all 1N F@7 };
add(16) g104<1>F g94<1,1,0>F g34<1,1,0>F { align1 1H $7.dst compacted };
```
Enables two tests for the scoreboard pass that illustrate this case.
For measuring the effect, re-enabled the sync.nop accounting on total of
instructions and got the following results.
```
Totals:
Instrs: 322041261 -> 321748285 (-0.09%)
Cycle count: 22864587567 -> 22863073741 (-0.01%)
Max dispatch width: 7989040 -> 7989024 (-0.00%); split: +0.00%, -0.00%
Totals from 88212 (9.78% of 902056) affected shaders:
Instrs: 102282050 -> 101989074 (-0.29%)
Cycle count: 12787629859 -> 12786116033 (-0.01%)
Max dispatch width: 525336 -> 525320 (-0.00%); split: +0.01%, -0.01%
```
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36096>
Make the "block after DO" more stable so that adding instructions after
a DO doesn't require repairing the CFG. Use a new SHADER_OPCODE_FLOW
instruction that is a placeholder representing "go to the next block"
and disappears at code generation.
For some context, there are a few facts about how CFG currently works
- Blocks are assumed to not be empty;
- DO is always by itself in a block, i.e. starts and ends a block;
- There are no empty blocks;
- Predicated WHILE and CONTINUE will link to the "block after DO";
- When nesting loops, it is possible that the "block after DO" is
another "DO".
Reasons and further explanations for those are in the brw_cfg.c comments.
What makes this new change useful is that a pass might want to add
instructions between two DO instructions. When that happens, a new
block must be created and any predicated WHILE and CONTINUE must be
repaired.
So, instead of requiring a repair (which has proven to be tricky in
the past), this change adds a block that can be "virtually" empty but
allow instructions to be added without further changes.
One alternative design would be allowing empty blocks, that would be
a deeper change since the blocks are currently assumed to be not empty
in various places. We'll save that for when other changes are made to
the CFG.
The problem described happens in brw_opt_combine_constants, and a
different patch will clean that up.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33536>
Xe3 supports 32 SBID tokens per thread regardless of the number of
register blocks allocated per thread. Take advantage of the increased
number of SBIDs in the scoreboard pass to reduce the frequency of
false dependencies on Xe3+.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32664>
When a SEND instruction is a EOT, the scoreboard lowering will not
allocate a new SBID for it, since nothing needs to wait for it. In
Gfx12 this allowed the SEND to get out-of-order $.dst or $.src
dependencies.
Starting on Xe2+ this is not supported anymore, in favor of supporting
more combined modes.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32712>