The blob uses these, and it fixes a bunch of dEQP stencil sampling tests
involving border colors. The Z-based samplers probably handle border
colors differently when the stencil swizzle is in use.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
- Use the same instruction area on GC3000 as the Vivante driver.
This allows the same number of instructions on GC3000 as GC2000
instead of half.
- Make sure that the "PE to FE" stall before updating the shader code
or constants is hit (which is conditional on vs_offset > 0x4000). This
is necessary on GC3000 too, as it increases stability.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Update from etnaviv repository rnndb. This adds some newly
discovered state for GC3000 (and some GC2000) features.
Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Acked-by: Christian Gmeiner <christian.gmeiner@gmail.com>
On SM35 there does not appear to be a way to emit an ATOM.EXCH with a
null destination. This should be functionally equivalent to a plain
store however, so just do that.
Fixes GL45-CTS.compute_shader.atomic-case2 on SM35.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
The former logic just plain didn't work at all. We need to write the
subsequent dword to the next buffer location.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
We have logic to short-circuit such retrievals to zero. However "zero"
was an immediate, and some logic expected to get registers (to later be
propagated). Fix this by using loadImm.
Fixes GL45-CTS.gpu_shader5.images_array_indexing
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
gallium's blitter expects that it can set the sample mask even when the
rasterizer doesn't have the flag on.
Between this and the previous test, 10 new ext_framebuffer_multisample
tests start passing.
gallium's quad-based blitter for copying MSAA depth textures expects to be
able to do 4 passes updating a sample at a time using glSampleMask, and
there's no color buffer bound when it's doing that.
In the hardware we only get to declare 8 vertex elements (GLES2's
minimum), so we should be exposing that number here. Fixes an assertion
failure in piglit texrect-many, at the expense of various GL 2.0-ish
minmax tests now complaining that our count is too low.
The kernel will reject our shader if we emit one here, and having 4, 8, or
12 as the top end of our UBO clamp is rare enough that it's not worth
making the kernel let us.
Fixes piglit fs-const-array-of-struct and
fs-const-array-of-struct-of-array since recent GLSL linking changes made
us get this as an indirect load of a uniform, instead of a temporary.
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
We want cached GTT for all non-persistent read mappings.
Set level = 0 on purpose.
Use dma_copy, because resource_copy_region causes a failure in the PBO
read of piglit/getteximage-luminance.
If Rocket League used the READ flag, it should get cached GTT.
v2: mask out UNSYNCHRONIZED
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
UMR is our new debugging tool. It must have +s set for Mesa to use it
without root privileges:
sudo chmod +s .../umr
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
The instruction has an associated label when Instruction.Label == 1,
as can be seen in ureg_emit_label() or tgsi_build_full_instruction().
This fixes dump generating extra :0 labels on conditionals, and virgl
parsing more than the expected tokens and eventually reaching "Illegal
command buffer" (when parsing more than the safety margin of 10 that we
currently have).
Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We just increased the max UBO, so we should also increase the clamp that
we do for robustness. Similarly, as we're including the fileIndex in the
new indirect value, we should reset fileIndex to 0 so that it is not
added in a second time.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Many many many compute shaders only define a 1- or 2-dimensional block,
but then continue to use system values that take the full 3d into
account (like gl_LocalInvocationIndex, etc). So for the special case
that a dimension is exactly 1, we know that the thread id along that
axis will always be 0, so return it as such and allow constant folding
to fix things up.
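
As a rough illustration of the special case above, the sketch below computes
gl_LocalInvocationIndex from per-axis thread ids and substitutes 0 for any
axis whose block dimension is exactly 1. The helper names (tid_along_axis,
local_invocation_index) and the array layout are hypothetical and do not
come from the driver; the real change operates on the compiler IR, not on
host-side C.

```c
#include <stdint.h>

/* Hypothetical helper: thread id along one axis. If the block size along
 * that axis is exactly 1, the id can only be 0, so return the constant and
 * ignore the hardware value. */
static uint32_t tid_along_axis(uint32_t hw_tid, uint32_t block_dim)
{
    return block_dim == 1 ? 0u : hw_tid;
}

/* gl_LocalInvocationIndex = z * dim_x * dim_y + y * dim_x + x.
 * With a 1- or 2-dimensional block, the y and/or z terms become
 * multiplications by a constant 0 that a compiler can fold away. */
static uint32_t local_invocation_index(const uint32_t hw_tid[3],
                                       const uint32_t block[3])
{
    uint32_t x = tid_along_axis(hw_tid[0], block[0]);
    uint32_t y = tid_along_axis(hw_tid[1], block[1]);
    uint32_t z = tid_along_axis(hw_tid[2], block[2]);
    return z * block[0] * block[1] + y * block[0] + x;
}
```

For a 64x1x1 block, the y and z inputs never influence the result, which is
what lets constant folding drop the corresponding system value reads.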
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Kepler and up unfortunately only support up to 8 constbufs. We work
around this by loading from constbufs as if they were storage buffers.
However we were not consistently applying limits to loads from these
buffers. Make sure to do the same thing we do for storage buffers.
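
A minimal sketch of the kind of limit meant here, assuming a policy where an
out-of-range read returns zero (one of the behaviors robust buffer access
permits). The function name bounded_load_u32 and its parameters are
hypothetical; the driver applies the equivalent check in generated shader
code rather than in host C.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical bounds-checked 32-bit load: reads 4 bytes at `offset` only
 * when the whole dword lies inside the first `size` bytes of `buf`, and
 * returns 0 otherwise. The comparison is written so it cannot overflow. */
static uint32_t bounded_load_u32(const uint8_t *buf, uint32_t size,
                                 uint32_t offset)
{
    uint32_t value = 0;
    if (size >= 4 && offset <= size - 4)
        memcpy(&value, buf + offset, sizeof(value));
    return value;
}
```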
Fixes GL45-CTS.robust_buffer_access_behavior.uniform_buffer
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Apparently GL 4.5 requires 14 of these (there's a "*" in the spec, but
it's unclear what it refers to). We need to expose an extra binding
point for the "program parameters", which means this must be 15. Remove
the last vestige of the "use c14 for immediates" idea.
Fixes GL45-CTS.shading_language_420pack.binding_uniform_block_array
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
There's all kinds of logic that doesn't like there being holes in defs
or srcs lists. Avoid them. This also fixes the sched logic for maxwell.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Unfortunately there is no SHF.L/SHF.R instruction pre-SM35. So we have
to do a bit more work to get the job done.
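
To give a feel for the extra work, here is a sketch of a 64-bit left shift
built only from 32-bit shifts and an OR, which is roughly what is needed
when no funnel shift is available. It assumes the operation being lowered is
a 64-bit shift on a lo/hi register pair; the function name shl64_via_32 is
hypothetical and the code is not taken from the driver.

```c
#include <stdint.h>

/* Hypothetical 64-bit shift left on a (lo, hi) pair using 32-bit ops only.
 * The result is packed into a uint64_t purely for convenience. */
static uint64_t shl64_via_32(uint32_t lo, uint32_t hi, uint32_t shift)
{
    uint32_t out_lo, out_hi;

    shift &= 63;
    if (shift == 0) {
        /* Handled separately: lo >> 32 would be undefined in C. */
        out_lo = lo;
        out_hi = hi;
    } else if (shift < 32) {
        /* Bits shifted out of lo move into the bottom of hi. */
        out_hi = (hi << shift) | (lo >> (32 - shift));
        out_lo = lo << shift;
    } else {
        /* Everything in lo moves into hi; lo becomes zero. */
        out_hi = lo << (shift - 32);
        out_lo = 0;
    }
    return ((uint64_t)out_hi << 32) | out_lo;
}
```

A funnel shift would compute the `(hi << shift) | (lo >> (32 - shift))` term
in a single instruction, which is why its absence costs extra operations.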
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
A few thoughts:
- Some of that LegalizeSSA logic should really live much earlier and be
subject to the likes of DCE and other useful passes
- Some of the "lowering" done in from_tgsi should be done later so that
proper optimization might be done.
However this all works and the above can be improved upon later.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Hardware does not support 64-bit integer MAD and MUL operations, so we need
to transform them into 32-bit operations.
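
The decomposition can be sketched as follows for MUL, assuming a 32x32->64
multiply (or a mul/mul-high pair) is available. The function name
mul64_via_32 is hypothetical and the code only illustrates the arithmetic,
not the actual lowering pass.

```c
#include <stdint.h>

/* Hypothetical 64-bit multiply (low 64 bits of the product) built from
 * 32-bit pieces: a * b = a_lo*b_lo + ((a_lo*b_hi + a_hi*b_lo) << 32),
 * with everything above bit 63 discarded. */
static uint64_t mul64_via_32(uint64_t a, uint64_t b)
{
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    /* Full 32x32 -> 64 product of the low halves. */
    uint64_t lo_prod = (uint64_t)a_lo * b_lo;
    uint32_t res_lo = (uint32_t)lo_prod;
    uint32_t res_hi = (uint32_t)(lo_prod >> 32);

    /* Cross terms only contribute to the high word; their own upper
     * halves fall outside the 64-bit result, so 32-bit wraparound
     * multiplies are sufficient. */
    res_hi += a_lo * b_hi;
    res_hi += a_hi * b_lo;

    return ((uint64_t)res_hi << 32) | res_lo;
}
```

A MAD would additionally add the third operand to this result, propagating
the carry from the low word into the high word.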
Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
We were never emitting a .X flag for consuming condition code on SET,
and weren't emitting a signed type for SLCT comparison. Discovered while
working on int64 logic.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
These operations allow you to compute min/max on arbitrary-width
integers, 32 bits at a time.
Note that the low/med ops implicitly set the condition code, and the
med/high ops implicitly consume it.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Nouveau does not currently have logic to implement this as a library
function. Even though such a library could be written, there's no big
advantage to doing it that way for now given that int64 is a very uncommon
use-case. Allow a driver to expose INT64 without supporting division and
modulo operations.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Removed temporary scaffolding in PA, widened the PA_STATE interface
for SIMD16, and implemented PA_STATE_CUT and PA_TESS for SIMD16.
PA_STATE_CUT and PA_TESS now work in SIMD16.
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>