It prevents the hazard in the following case:
ldc.1.k.imm c[a1.x], 0, 1
(ss)mova1 a1.x, 8
The correct way is:
ldc.1.k.imm c[a1.x], 0, 1
(ss)mova1 a1.x, (r)8
Without it, ldc may use the a1.x value that is only set after the ldc.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27462>
This used to work by luck because the current DGC prepare shader
is using one descriptor set and it was the currently bound compute
shader... Using two descriptor sets or starting from 1 would just fail.
For indirect compute pipelines, descriptors must be emitted from the
DGC shader because there is no bound compute pipeline at all. This
solution uses indirect descriptor sets because it's much shorter
and easier to implement. This could be improved, but nothing uses
indirect compute pipelines and this is still experimental.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29700>
On newer devices where ZPASS_DONE events have sample count writing
abilities, the firmware expects these events to come in begin-end pairs,
essentially corresponding to a typical occlusion query usage. Since this
event is also used in the autotuner, we have to avoid emitting event
pairs in an interleaved fashion.
Additional renderpass state now tracks whether a given renderpass contains
an occlusion query. If so, autotuner will emit miscellaneous ZPASS_DONE
events in order to form its own begin-end pairs before and after the
renderpass commands.
Occlusion query behavior inside a renderpass doesn't change. But when a
query is used outside of a renderpass, possible autotuner usage again
requires emitting ZPASS_DONE events so that they form begin-end pairs
both at the start and the end of the query.
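The pairing constraint can be illustrated with a small Python model. This
is only a sketch: the event tuples, the "pad" label and the position of
the padding event inside each pair are assumptions made for illustration,
not the actual command stream emitted by the driver.

```python
def renderpass_events(has_occlusion_query):
    """Return a hypothetical ZPASS_DONE stream as (owner, kind) tuples."""
    events = [("autotune", "sample")]          # autotuner start sample
    if has_occlusion_query:
        events.append(("autotune", "pad"))     # completes the first autotuner pair
        events.append(("query", "begin"))
        events.append(("query", "end"))
        events.append(("autotune", "pad"))     # opens the second autotuner pair
    events.append(("autotune", "sample"))      # autotuner end sample
    return events


def pairs_are_consistent(events):
    """Model of the firmware view: events are consumed two at a time."""
    if len(events) % 2 != 0:
        return False
    return all(events[i][0] == events[i + 1][0]
               for i in range(0, len(events), 2))
```

In this model a naive stream such as autotune, query, query, autotune is
rejected because the first consumed pair mixes two owners.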
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Fixes: 4e6a1f8852 ("tu/autotune: Use `CP_EVENT_WRITE7::ZPASS_DONE` on A7XX")
Tested-by: Mike Lothian <mike@fireburn.co.uk>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29403>
When both an interval and some of its children would be live-in, we used
to add phis for all of them. This could lead to cases where the pressure
after spilling was higher than before.
This happens, for example, when both a split and its parent are live-in.
Before spilling, the split wouldn't add to the pressure because its
parent had already been inserted. After spilling, since we created a phi
for the split, the link with its parent would be lost and it would add
to the pressure.
Fix this by only adding phis for top-level intervals and adding splits
after them.
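A rough Python model of the selection, for illustration only. The
Interval class and both helpers are hypothetical, and "top-level" is
taken here to mean "no ancestor is also live-in", which is an assumption
about the intended rule rather than the ir3 implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Interval:
    name: str
    size: int                              # registers occupied by the value
    parent: Optional["Interval"] = None    # enclosing interval, if any


def plan_live_ins(live_in):
    """Split live-in intervals into those needing a phi and those re-split."""
    live = set(live_in)

    def covered(iv):
        # True if any ancestor of iv is live-in as well.
        p = iv.parent
        while p is not None:
            if p in live:
                return True
            p = p.parent
        return False

    phis = [iv for iv in live_in if not covered(iv)]
    splits = [iv for iv in live_in if covered(iv)]
    return phis, splits


def pressure_from_phis(phis):
    # Every phi is an independent value, so each one adds its full size.
    return sum(iv.size for iv in phis)
```

With a 4-component parent and one of its splits live-in, creating phis
for both counts 5 registers in this model, while a phi for the parent
alone counts 4.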
Fixes: 613eaac7b5 ("ir3: Initial support for spilling non-shared registers")
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29497>
There were a few places that used an instruction pointer to decide where
new instructions should be created. NULL was used to add them at the end
of the block. While fixing a spilling bug, a new option was needed to
add instructions at the beginning of the block. This will be much easier
to implement using cursors.
Fixes: 613eaac7b5 ("ir3: Initial support for spilling non-shared registers")
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29497>
Whenever instructions need to be created at specific locations, ir3
often passes around an instruction pointer. When set, new instructions
are added before or after it (depending on the context). When NULL, new
instructions are added at the end of the block. This whole scheme is
confusing.
This patch adds ir3_cursor and ir3_builder structs and the associated
helper functions. The API mirrors the one from nir_cursor/nir_builder.
This patch does not refactor existing code to use the new API. This will
happen in future patches.
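For readers unfamiliar with the cursor idea, here is a minimal Python
model of it. All names below are stand-ins chosen for this sketch and are
not the ir3_cursor/ir3_builder functions themselves; instructions are
plain strings that are assumed to be unique within a block.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional


class CursorOption(Enum):
    BEFORE_BLOCK = auto()
    AFTER_BLOCK = auto()
    BEFORE_INSTR = auto()
    AFTER_INSTR = auto()


@dataclass
class Block:
    instrs: List[str] = field(default_factory=list)


@dataclass
class Cursor:
    option: CursorOption
    block: Block
    instr: Optional[str] = None


def before_block(block):
    return Cursor(CursorOption.BEFORE_BLOCK, block)


def after_block(block):
    return Cursor(CursorOption.AFTER_BLOCK, block)


def before_instr(block, instr):
    return Cursor(CursorOption.BEFORE_INSTR, block, instr)


def after_instr(block, instr):
    return Cursor(CursorOption.AFTER_INSTR, block, instr)


class Builder:
    def __init__(self, cursor):
        self.cursor = cursor

    def insert(self, instr):
        cur = self.cursor
        instrs = cur.block.instrs
        if cur.option is CursorOption.BEFORE_BLOCK:
            pos = 0
        elif cur.option is CursorOption.AFTER_BLOCK:
            pos = len(instrs)
        elif cur.option is CursorOption.BEFORE_INSTR:
            pos = instrs.index(cur.instr)
        else:
            pos = instrs.index(cur.instr) + 1
        instrs.insert(pos, instr)
        # Advance the cursor so consecutive inserts keep program order.
        self.cursor = after_instr(cur.block, instr)
        return instr
```

The insertion point is explicit in the cursor, so "start of block" no
longer needs a special pointer value the way NULL encoded "end of block".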
Fixes: 613eaac7b5 ("ir3: Initial support for spilling non-shared registers")
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29497>
ir3_force_merge (through merge_merge_sets) expects instructions to be
indexed. However, the instructions created during spilling would not be
automatically indexed at this point.
Fixes: 613eaac7b5 ("ir3: Initial support for spilling non-shared registers")
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29497>
It might happen that a collect that cannot be coalesced with one of its
sources while spilling can be coalesced with it afterwards. In this
case, we might be able to remove it in remove_src_early during spilling
but not afterwards (because it may have a child interval). If this
happens, we could end up with a register pressure that is higher after
spilling than before. Prevent this by never removing collects early
while spilling.
Fixes: 613eaac7b5 ("ir3: Initial support for spilling non-shared registers")
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29497>
Copy propagation often eliminates all uses of an instruction. If we
detect that we've done so, we can eliminate the instruction ourselves
rather than leaving it hanging until the next DCE pass.
This saves some CPU time as other passes don't see dead code.
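The idea can be sketched on a toy straight-line SSA program. This Python
model is an illustration under simplifying assumptions (tuples of
(dst, op, srcs), "mov" as the only copy, every value defined before use)
and is not the backend's pass.

```python
from collections import Counter


def copy_propagate(instrs):
    """Propagate copies and drop copies whose uses were all rewritten."""
    uses = Counter(s for _, _, srcs in instrs for s in srcs)
    copies = {}        # dst of a mov -> value it copies
    rewritten = set()  # movs that had at least one use replaced
    out = []
    for dst, op, srcs in instrs:
        new_srcs = []
        for s in srcs:
            if s in copies:
                uses[s] -= 1
                rewritten.add(s)
                s = copies[s]
                uses[s] += 1
            new_srcs.append(s)
        if op == "mov":
            copies[dst] = new_srcs[0]
        out.append((dst, op, tuple(new_srcs)))
    # A mov is removed only if this pass itself eliminated every use.
    return [i for i in out
            if not (i[1] == "mov" and i[0] in rewritten and uses[i[0]] == 0)]
```

A copy that never had a use rewritten is left alone, since the model
cannot tell whether it is an output of the program.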
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
The new def-based pass works better in many cases, and should be less
resource intensive. However, the limited visibility of the def-based
pass due to many values not being SSA yet makes it unable to fully
replace the old pass. Try the new one, and if it can't make progress,
then try the old one. That way, things will mostly be handled by the
new pass, but everything that was being cleaned up still will be.
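The fallback order amounts to the following, shown as a tiny Python
sketch. The function and parameter names are placeholders; each pass is
assumed to return True when it made progress.

```python
def run_cse(shader, cse_defs, cse_old):
    """Run the def-based pass first; use the old pass only if it did nothing."""
    progress = cse_defs(shader)
    if not progress:
        progress = cse_old(shader)
    return progress
```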
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
While the limited visibility due to partial SSA is a downside to the new
pass, it has a huge number of advantages that make it worth switching
over even now. It's much more efficient, can eliminate redundant memory
loads across blocks, and doesn't generate loads of unnecessary copies
that other passes have to clean up. This means we also eliminate the
infighting between the old CSE, coalescing, and copy propagation passes.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
This has a number of advantages compared to the pass I wrote years ago:
- It can easily perform either Global CSE or block-local CSE, without
  needing to roll any dataflow analysis, thanks to SSA def analysis.
  This global CSE is able to detect and coalesce memory loads across
  blocks. Although it may increase spilling a little, the reduction
  in memory loads seems to more than compensate.
- Because SSA guarantees that values are never written more than once,
  the new CSE pass can directly reuse an existing value. The old pass
  emitted copies at the point where it discovered a value because it
  had no idea whether it'd be mutated later. This led it to generate
  a ton of trash for copy propagation to clean up later, and also a
  nasty fragility where CSE, register coalescing, and copy propagation
  could all fight one another by generating and cleaning up copies,
  leading to infinite optimization loops unless we were really careful.
  Generating less trash improves our CPU efficiency.
- It uses hash tables like nir_instr_set and nir_opt_cse, instead of
  linearly walking lists and comparing each element. This is much more
  CPU efficient.
- It doesn't use liveness analysis, which is one of the most expensive
  analysis passes that we have. Def analysis is cheaper.
In addition to CSE'ing SSA values, we continue to handle flag writes,
as this is a huge source of CSE'able values. These remain block local.
However, we can simply track the last flag write, rather than creating
entire sets of instruction entries like the old pass. Much simpler.
The only real downside to this pass is that, because the backend is
currently only partially SSA, it has limited visibility and isn't able
to see all values. However, the results appear to be good enough that
the new pass can effectively replace the old pass in almost all cases.
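To make the "reuse an existing value" and hash-table points concrete,
here is a minimal Python sketch of hash-based CSE over straight-line SSA.
It is a toy model, not the backend pass: instructions are (dst, op, srcs)
tuples, only one block is handled, and a global version would
additionally have to check that the earlier def dominates the later one.

```python
def cse_ssa(instrs):
    """Return (surviving instructions, map of eliminated dst -> reused dst)."""
    seen = {}    # (op, srcs) -> dst of the first instruction computing it
    remap = {}   # eliminated dst -> surviving dst
    out = []
    for dst, op, srcs in instrs:
        # Rewrite sources that referred to an already eliminated def.
        srcs = tuple(remap.get(s, s) for s in srcs)
        key = (op, srcs)
        if key in seen:
            # SSA values are written once, so the old def can be reused
            # directly and no copy has to be emitted.
            remap[dst] = seen[key]
            continue
        seen[key] = dst
        out.append((dst, op, srcs))
    return out, remap
```

Each lookup is a dictionary probe instead of a walk over a list of
candidate expressions.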
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>
Like NIR, we print SSA defs as %1, %2, and so on. The number here is
the VGRF number. VGRFs that don't correspond to an SSA def remain
printed as vgrf1, vgrf2, and so on.
This makes it much easier to see which values are SSA and which aren't.
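The naming rule, written out as a one-function Python sketch (the helper
name and its boolean parameter are made up for illustration):

```python
def reg_name(vgrf_nr, is_ssa_def):
    """Format a VGRF the way the dump is described above."""
    return f"%{vgrf_nr}" if is_ssa_def else f"vgrf{vgrf_nr}"
```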
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28666>