So far we were packing by hand unnormalized coordinates at the
V3D41_TMU_CONFIG_PARAMETER_1 pack structure. To get this working we
hardcoded V3D_VERSION to 41 at v3dv_uniforms, that works for v71
because the structure are the same. But that is somewhat ugly, and
will not work if a new hw generation have a different structure.
Additionally, we found that for v3d this will be also needed.
So this commit adds a helper on the compiler. For now, and to simplify
it also use just one method for both generations. This solves the
problem of the same code needed on both v3d and v3dv.
But the idea is that in the future we need a similar need, but the
structure different on each generation, it would have used a similar
approach to other generation dependent function calls (like
v3d40_vir_emit_tex), having the implementation on a source file that
can safely include the hw generation headers.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25544>
V3D hardware doesn't support vector access for general TMU load/store
operations like the ones we use for UBO and SSBO, so we need to split
these to scalar operations.
It should be noted that we also have a vectorization pass (which runs
later, during optimization), that may reconstruct some of these into
32-bit operations when possible (i.e. when the resulting operation
is 32-bit aligned).
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14648>
Currently this is useful to clean up after DCEing leading ldunifa
instructions, but it can be expanded to handle more cases which
may allow to simplify the compiler code in places where we have
been trying to optimize manually for similar cases.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9128>
When line smoothing is enabled, the driver now increases the width of
the line so that it can add some semi-transparent pixels to either side
of the line. A lowering pass is added which modifies the alpha component
of every write to fragment output 0 so that if the fragment is outside
the width of the line then the alpha is reduced. It additionally
discards fragments that are completely invisible. It might seem bad to
use discard on a tiled renderer but the assumption is that any bad
effects from using discard will also happen anyway because of enabling
alpha blending.
v2: Disable the line smoothing pass entirely when the framebuffer
contains an integer colour output or one with no alpha channel.
Calculate the coverage once upfront and store in a global variable
instead of calculating each time an output write is modified. Also
do the conditional discard once upfront.
v3: Don’t check whether the output buffer has an alpha channel. Only
look at output 0. Use aa_line_width intrinsic instead of calculating
the real line width in the shader. Clamp the coverage as part of the
global variable, not per output write.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5624>
This implements support for OpenGL logic operations by emitting code to read
from the TLB if needed and blending the fragment output accordingly. It is
similar to VC4's blend lowering pass, but exclusive to logic operations, since
blending is otherwise supported in hardware.
The pass doesn't handle MSAA targets yet.
Fixes the following piglit tests:
spec/!opengl 1.0/gl-1.0-logicop/*
spec/!opengl 1.1/gl-1.1-xor
spec/!opengl 1.1/gl-1.1-xor-copypixels
It also fixes text cursor rendering in Libreoffice with the GTK+2 theme, which
is rendered via glamor using the XOR logic operation.
v2: fix checks for allowed variable location and maximum render target (Eric)
Reviewed-by: Eric Anholt <eric@anholt.net>
We can use the same register spilling infrastructure for our loads/stores
of indirect access of temp variables, instead of doing an if ladder.
Cuts 50% of instructions and max-temps from 2 KSP shaders in shader-db.
Also causes several other KSP shaders with large bodies and large loop
counts to not be force-unrolled.
The change was originally motivated by NOLTIS slightly modifying register
pressure in piglit temp mat4 array read/write tests, triggering register
allocation failures.
Our exec masking introduces lots of redundant flags updates, and even
without that there will be cases where NIR comparisons on the same sources
for different reasons may generate the same comparison instruction before
the selection.
total instructions in shared programs: 6492930 -> 6460934 (-0.49%)
total uniforms in shared programs: 2117460 -> 2115106 (-0.11%)
total spills in shared programs: 4983 -> 4987 (0.08%)
total fills in shared programs: 6408 -> 6416 (0.12%)
The idea is that for repeated use of the same uniform, we could avoid
loading it on each consumer. The results look pretty good.
total instructions in shared programs: 6413571 -> 6521464 (1.68%)
total threads in shared programs: 154214 -> 154000 (-0.14%)
total uniforms in shared programs: 2393604 -> 2119629 (-11.45%)
total spills in shared programs: 4960 -> 4984 (0.48%)
total fills in shared programs: 6350 -> 6418 (1.07%)
Once we do scheduling at the NIR level, the register pressure (and thus
also instructions) issues we see here will drop back down.
This is only exposed on V3D 4.1+, because we didn't have the TMU write
operations for images on 3.3 (To do GLES 3.1 there, you have to lower it
to SSBO load/stores, which is a problem to solve later).
We can do one per instruction, and we have to be careful not to overwrite
raddr_b, but this greatly reduces the pressure on uniform loads
(particularly around ldvpm/stvpm instructions).
total instructions in shared programs: 90768 -> 88220 (-2.81%)
instructions in affected programs: 82711 -> 80163 (-3.08%)
This creates two new internal dependencies, idep_nir_headers and
idep_nir. The former encapsulates the generation of nir_opcodes.h and
nir_builder_opcodes.h and adding src/compiler/nir as an include path.
This ensures that any target that needs nir headers will have the
includes and that the generated headers will be generated before the
target is build. The second, idep_nir, includes the first and
additionally links to libnir.
This is intended to make it easier to avoid race conditions in the build
when using nir, since the number of consumers for libnir and it's
headers are quite high.
Acked-by: Eric Engestrom <eric.engestrom@imgtec.com>
Signed-off-by: Dylan Baker <dylan.c.baker@intel.com>
The HW has no native sampler support for multisample textures, but since
we only need to support txf_ms and the layout is UIF, we just need to
scale up the texcoords and then add in the sample.
This drops the old TEXTURE_MSAA_ADDR special uniform, since we're treating
MSAA textures as textures, rather than basically texbos like VC4 had to.
v2: Default vc5 to off, since it requires the simulator currently. Add
missing dep on the XML generation from libbroadcom_vc5.
Reviewed-by: Dylan Baker <dylan@pnwbakers.com> (v1)