Instead of creating a cmps.s.ne for every use of a predicate, create
just one and insert it after the instruction whose def is tested. This
reduces the number of compares or, when they are folded into bitwise
operations, those operations.
It also decreases register pressure on GPRs by increasing pressure on
predicate registers. This should be preferred in general since at worst,
the predicate register will be spilled to a GPR again.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27411>
On a6xx+, bitwise operations can directly write to predicate registers.
The result will be 1 iff the result of the non-predicate operation would
be non-zero.
When generating instructions that need a predicate source, ir3 will
insert a cmps.s.ne 0 instruction to guarantee a predicate can be
produced. This is kept in place by this patch and we add a pass that
tries to optimize useless comparisons away.
Concretely:
- Look through chains of multiple cmps.s.ne instructions and remove all
but the first.
- If the source of the cmps.s.ne can write directly to predicates,
remove the cmps.s.ne.
In both cases, no instructions are actually removed but clones are made
and we rely on DCE to remove anything that became unused. Note that it's
fine to always make a clone since even in the case that the original
instruction is also used for non-predicate sources (so it won't be
DCE'd), we replaced a cmps.ne.s with another instruction so this pass
should never increase instruction count.
Note that this pass replaces the double-comparison folding that was
performed by ir3_cp before.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27411>
In principle, validating the result of predicate RA works exactly the
same as for normal RA. There is one slight issue: spilling is
implemented by cloning the instruction that produced the original def
which might cause different defs to legitimately reach the same source.
For example:
bool b = ...;
if (...) {
// b gets spilled and reloaded
} else {
// b is not spilled
}
use b;
The use of b might see different reaching defs. To solve this, RA will
store a pointer to the original def in the instruction's data field.
Validation then uses the original def.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27411>
Up to now, ir3 only supported one predicate register (p0.x). However,
since a6xx, four predicate registers are available. This patch adds a
register allocator for predicate registers that allows all of them to be
used. The RA also works for older generations with only one register.
The use of p0.x was hard-coded in many places in ir3. This has been
replaced by a new flag, IR3_REG_PREDICATE, to indicate that an SSA value
should be allocated to a predicate register.
The RA uses the standard liveness analysis available in ir3. Using this,
registers are allocated in a single pass over all blocks. For each block
we keep track of currently live defs in the registers. Predicate
destinations allocate a new register and sources take the register from
their def.
The live defs of a block are initialized with the intersection of the
live-out defs of their predecessors: if all predecessors have the same
live-out def in the same register, it is used as live-in. However, we
only do this for defs that are actually live-in according to the
liveness analysis.
This doesn't work for loops: since predecessors from back edges are
processed after their successors, we don't know their live-out state
yet. We solve this by ignoring such predecessors while calculating the
live-in state. When this predecessor is later processed, we fix-up its
live-out state to match what its successor expects by reloading defs if
necessary.
Spilling is implemented by reloading, or rematerializing, the
instruction that produced the def. Whenever we need a new register while
none are available, we simply free one. If the freed def is later needed
again, we clone the original instruction in front on the new use. We
keep track of the original def the reload is cloned from so that
subsequent uses can reuse the reload.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27411>
We currently have a bit of a confusing situation where we have both
opcodes for the different branches (OPC_BR, OPC_BRAA,...) and branch
types which are supposed to be used with OPC_B (BRANCH_PLAIN,
BRANCH_AND,...). However, not every kind of branch has a corresponding
type. For example, getone is represented by OPC_GETONE instead of a
branch type.
This patch proposes to get rid of the branch types and use opcodes
everywhere. I think this makes the representation of branches more
consistent. It also removes the for the encoder to translate branch
types into opcodes.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27411>
Instead of using brtype and condition fields in blocks, explicitly add
terminator branches in their instruction lists. This makes it more
uniform to deal with branches in passes. This will be especially useful
for passes that need to deal with predicate registers like the new
register allocator.
Note that only a single terminator is added to blocks: either an
unconditional jump in case of a single successor, or a conditional
branch in case of two successors. In the latter case, the unconditional
jump to the second successor is implicit. This makes it slightly easier
to handle terminators since there is always exactly one.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27411>
Let's say you have an image in R32_UINT format, a view is created in
R32_SFLOAT and used as color attachment.
When resolving the attachment, our current code uses the image format
(R32_UINT in this case). But resolve mode might apply only to SFLOAT,
so we currently run into an assert in blorp.
We should instead use the view format. There is an exception for
depth/stencil view because the format we want to resolve is actually
the depth/stencil format, not just the depth or stencil aspect.
This fixes vkd3d-proton's test_multisample_resolve_formats.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27875>
Using whatever version is the latest at the time of the image build is
bad practice from a stability & reproducibility point of view, and the
latest version is currently broken, preventing any change that rebuilds
the android image from being merged.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27911>
Various improvements to the CS related definitions:
- make the field name consistent across all instructions using the same
pattern
- define missing fields,
- replace the CEU prefix by a CS prefix
- define enums where it makes sense
- re-order instruction definitions by IDs
- add missing instructions
While at it, extend decode_csf.c to support all known instructions.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Antonino Maniscalco <antonino.maniscalco@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26358>
Panthor is a new kernel driver handling CSF-based GPUs. It's designed
around the new VM model where:
- VM management is all explicit (you get to choose where objects are
mapped in the GPU VA space)
- synchronization is explicit too (there's not BO_WAIT, and we don't
pass BOs around to serve as implicit deps)
We add a few panthor specific helpers (those exposed in panthor_kmod.h)
too:
- panthor_kmod_xxx_sync_point() are needed to make pan_kmod_bo_wait()
work with the new synchronization/VM model
- panthor_kmod_get_flush_id() is exposing the LATEST_FLUSH_ID register
- panthor_kmod_vm_handle() is providing a way to query the VM handle
attached to the pan_kmod_vm object (needed for a few panthor ioctls)
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Antonino Maniscalco <antonino.maniscalco@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26358>
Rework the way we compute thread info to make it mostly GPU-agnostic
outside of the kmod backend.
The new logic is based on the following information extracted from
GPU registers:
- mximum number of threads per core
- maximum number ot threads per workgroup
- number of registers per core
If the GPU doesn't provide this information (registers are zero), we
pick the per-arch defaults we had in panfrost_max_thread_count().
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Antonino Maniscalco <antonino.maniscalco@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26358>
This flag reflects whether the backend should track the VM activity or
not. Needed if PAN_KMOD_VM_OP_MODE_DEFER_TO_NEXT_IDLE_POINT is used, so
we can insert proper dependencies on our VM operations.
Patch the gallium driver to set this flag at VM creation time since it
calls pan_kmod_vm_bind() with
mode=PAN_KMOD_VM_OP_MODE_DEFER_TO_NEXT_IDLE_POINT in the BO
destruction path.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Antonino Maniscalco <antonino.maniscalco@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26358>