30079 Commits

Author SHA1 Message Date
Ilia Mirkin 3f8b886e73 nv50,nvc0: use alternate samplers for stencil
The blob uses these, and it fixes a bunch of dEQP stencil sampling tests
involving border colors. Probably the Z-based samplers work somehow
differently wrt border colors when using the stencil swizzle.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-02-12 18:22:17 -05:00
Wladimir J. van der Laan 55e00c7cfe etnaviv: Set shader instruction area correctly for GC3000
- Use the same instruction area on GC3000 as the Vivante driver.
  This allows the same number of instructions on GC3000 as GC2000
  instead of half.

- Makes sure that the "PE to FE" stall before updating the shader code
  or constants is hit (which is conditional on vs_offset > 0x4000). This
  is necessary on GC3000 too, as it increases stability.

Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-02-12 20:42:37 +01:00
Wladimir J. van der Laan 0fe60e4fcc etnaviv: Update hw header files
Update from etnaviv repository rnndb. This adds some newly
discovered state for GC3000 (and some GC2000) features.

Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com>
Acked-by: Christian Gmeiner <christian.gmeiner@gmail.com>
2017-02-12 20:38:56 +01:00
Ilia Mirkin 48f04862c1 nvc0: set the render condition in the compute object
Fixes GL45-CTS.compute_shader.conditional-dispatching

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2017-02-11 21:06:52 -05:00
Ilia Mirkin 7e75f0913a gm107/ir: fix address offset bitfield for ATOMS
Fixes GL45-CTS.compute_shader.atomic-case1 on Maxwell

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: mesa-stable@lists.freedesktop.org
2017-02-11 21:06:41 -05:00
Ilia Mirkin b38aab50a0 nv50/ir: convert an ATOM.EXCH without a destination into a store
On SM35 there does not appear to be a way to emit an ATOM.EXCH with a
null destination. This should be functionally equivalent to a plain
store, however, so just do that.

Fixes GL45-CTS.compute_shader.atomic-case2 on SM35.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-02-11 20:25:26 -05:00
Ilia Mirkin 2b0580123e nvc0: fix 64-bit integer query buffer writes
The former logic just plain didn't work at all. We need to write the
subsequent dword to the next buffer location.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-02-11 20:25:26 -05:00
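As a rough illustration of the query buffer fix above (not the driver's actual code): a 64-bit result occupies two consecutive dwords, so the second dword has to land in the next buffer location. The helper name `write_u64_as_dwords` and the low-dword-first order are assumptions made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: store a 64-bit value into a dword-addressed buffer.
 * Assumed layout: low dword at index `idx`, high dword at `idx + 1`. */
static void write_u64_as_dwords(uint32_t *buf, size_t idx, uint64_t value)
{
    assert(buf != NULL);
    buf[idx]     = (uint32_t)(value & 0xffffffffu); /* first dword */
    buf[idx + 1] = (uint32_t)(value >> 32);         /* subsequent dword, next slot */
}
```

Writing both halves to the same index would leave only one of them in the buffer, which is one way such logic can fail outright.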
Ilia Mirkin 399e267f0e nv50/ir: return a register when retrieving thread id sysval
We have logic to short-circuit such retrievals to zero. However "zero"
was an immediate, and some logic expected to get registers (to later be
propagated). Fix this by using loadImm.

Fixes GL45-CTS.gpu_shader5.images_array_indexing

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-02-11 20:25:26 -05:00
Ilia Mirkin 0d1edb01ec nv50/ir: add missing break after DSSG
Recently broken during int64 addition.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-02-11 17:21:55 -05:00
Christian Gmeiner 137ad879d5 etnaviv: shader-db traces
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-By: Wladimir J. van der Laan <laanwj@gmail.com>
2017-02-11 21:22:53 +01:00
Christian Gmeiner 7256ed3c79 etnaviv: keep track of emitted loops
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>
2017-02-11 21:22:48 +01:00
Christian Gmeiner 5a3ea68895 etnaviv: wire up core pipe_debug_callback
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
2017-02-11 21:22:42 +01:00
Eric Anholt 0514b0bdc9 vc4: Enable glSampleMask() even when !rasterizer->multisample.
gallium's blitter expects that it can set the sample mask even when the
rasterizer doesn't have the flag on.

Between this and the previous test, 10 new ext_framebuffer_multisample
tests start passing.
2017-02-10 14:17:05 -08:00
Eric Anholt 5c86f119b9 vc4: Respect glSampleMask() even when we're not writing color.
gallium's quad-based blitter for copying MSAA depth textures expects to be
able to do 4 passes updating a sample at a time using glSampleMask, and
there's no color buffer bound when it's doing that.
2017-02-10 14:17:04 -08:00
Eric Anholt 30237193f5 vc4: Use the nir_builder helper for loading sample mask.
2017-02-10 14:17:04 -08:00
Eric Anholt ce538a443d vc4: Use accurate 1/w in coordinate shader as well as vert shader.
We probably shouldn't be emitting different scaled viewport coordinates
between vertex and coord.
2017-02-10 14:17:04 -08:00
Eric Anholt a0b6841838 vc4: Drop VS inputs to 8.
In the hardware we only get to declare 8 vertex elements (GLES2's
minimum), so we should be exposing that number here.  Fixes an assertion
failure in piglit texrect-many, at the expense of various GL 2.0-ish
minmax tests now complaining that our count is too low.
2017-02-10 14:17:04 -08:00
Eric Anholt b230939303 vc4: Avoid emitting small immediates for UBO indirect load address guards.
The kernel will reject our shader if we emit one here, and having 4, 8, or
12 as the top end of our UBO clamp is rare enough that it's not worth
making the kernel let us.

Fixes piglit fs-const-array-of-struct and
fs-const-array-of-struct-of-array since recent GLSL linking changes made
us get this as an indirect load of a uniform, instead of a temporary.

Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
2017-02-10 14:17:04 -08:00
Emil Velikov 463236bd31 st/nine: update configure options in the README
Cc: Axel Davy <axel.davy@ens.fr>
Signed-off-by: Emil Velikov <emil.velikov@collabora.com>
2017-02-10 11:47:24 +00:00
Marek Olšák 43a2ba1b7d gallium/radeon: use staging for texture read mappings from GTT WC
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-10 11:27:50 +01:00
Marek Olšák dc7483f445 gallium/radeon: ignore the level parameter in buffer_transfer_map
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-10 11:27:50 +01:00
Marek Olšák d86099df0a gallium/radeon: fix performance of buffer readbacks
We want cached GTT for all non-persistent read mappings.
Set level = 0 on purpose.

Use dma_copy, because resource_copy_region causes a failure in the PBO
read of piglit/getteximage-luminance.

If Rocket League used the READ flag, it should get cached GTT.

v2: mask out UNSYNCHRONIZED

Cc: 13.0 17.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-10 11:27:50 +01:00
Marek Olšák 24e3b06408 radeonsi: align vertex buffer descriptor list size for optimal prefetch
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-10 11:27:50 +01:00
Marek Olšák 3a534c5c7d radeonsi: align shader binaries to CP DMA alignment for optimal prefetch
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-10 11:27:50 +01:00
Marek Olšák 1a392a4377 radeonsi: move CP_DMA_ALIGNMENT definition
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-10 11:27:50 +01:00
Marek Olšák 4c288c73ea radeonsi: remove SI_CONTEXT_FLUSH_AND_INV_FRAMEBUFFER
not necessary

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-10 11:27:50 +01:00
Marek Olšák 65df38b191 radeonsi: remove separate CB/DB_META flush flags
not used separately

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-10 11:27:50 +01:00
Marek Olšák 8a2ae4153b radeonsi: reduce the number of FMASK input coordinates
Before:
  image_load v3, v[0:3] ...
After:
  image_load v3, v[0:1] ...

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-10 11:27:50 +01:00
Marek Olšák 28c06b3ceb radeonsi: write shader asm annotated with wave info into GPU hang reports
Note that the disassembly is written twice - first the unmodified compiler
output and then the wave-annotated output only if there are waves executing
the shader.

Sample output from a real GPU hang most likely caused by image_sample:

The number of active waves = 28

Pixel Shader - annotated disassembly:
    s_mov_b64 s[6:7], exec                                        ; BE86017E [PC=0x10f3e3800, off=0, size=4]
    s_wqm_b64 exec, exec                                          ; BEFE077E [PC=0x10f3e3804, off=4, size=4]
...
    image_sample v[7:9], v[0:1], s[12:19], s[20:23] dmask:0x7     ; F0800700 00A30700 [PC=0x10f3e3a94, off=660, size=8]
    s_buffer_load_dword s20, s[0:3], 0x50                         ; C0220500 00000050 [PC=0x10f3e3a9c, off=668, size=8]
    s_load_dwordx4 s[24:27], s[4:5], 0x170                        ; C00A0602 00000170 [PC=0x10f3e3aa4, off=676, size=8]
    s_load_dwordx8 s[12:19], s[4:5], 0x140                        ; C00E0302 00000140 [PC=0x10f3e3aac, off=684, size=8]
    s_buffer_load_dword s11, s[0:3], 0x5c                         ; C02202C0 0000005C [PC=0x10f3e3ab4, off=692, size=8]
    s_buffer_load_dword s21, s[0:3], 0x54                         ; C0220540 00000054 [PC=0x10f3e3abc, off=700, size=8]
    s_buffer_load_dword s22, s[0:3], 0x58                         ; C0220580 00000058 [PC=0x10f3e3ac4, off=708, size=8]
    s_waitcnt vmcnt(0)                                            ; BF8C0F70 [PC=0x10f3e3acc, off=716, size=4]
          ^ SE0 SH0 CU1 SIMD1 WAVE0  EXEC=aaaaaaa555aaaaaa  INST32=BF8C0F70
          ^ SE0 SH0 CU1 SIMD2 WAVE0  EXEC=aaaa85555555552a  INST32=BF8C0F70
          ^ SE0 SH0 CU1 SIMD3 WAVE0  EXEC=000000000000000a  INST32=BF8C0F70
          ^ SE0 SH0 CU6 SIMD1 WAVE0  EXEC=25a5a5aa82aaaaaa  INST32=BF8C0F70
          ^ SE0 SH0 CU6 SIMD3 WAVE0  EXEC=50aaaa8fffa55555  INST32=BF8C0F70
          ^ SE0 SH0 CU7 SIMD0 WAVE0  EXEC=5554aaaaaaa1a555  INST32=BF8C0F70
          ^ SE0 SH0 CU7 SIMD0 WAVE1  EXEC=aaaa5555ffffffff  INST32=BF8C0F70
          ^ SE0 SH0 CU7 SIMD1 WAVE0  EXEC=555557aaaaaaaaa5  INST32=BF8C0F70
          ^ SE0 SH0 CU7 SIMD3 WAVE0  EXEC=5555aaaaaaaaaa85  INST32=BF8C0F70
          ^ SE1 SH0 CU3 SIMD1 WAVE0  EXEC=aaaaaaaaaaaaaaaa  INST32=BF8C0F70
          ^ SE1 SH0 CU4 SIMD0 WAVE0  EXEC=aaaaaaaa5a5a5a5a  INST32=BF8C0F70
          ^ SE1 SH0 CU4 SIMD1 WAVE0  EXEC=aaaaaaa5a5a5a4a5  INST32=BF8C0F70
          ^ SE1 SH0 CU4 SIMD2 WAVE0  EXEC=5555555000000000  INST32=BF8C0F70
          ^ SE1 SH0 CU4 SIMD3 WAVE0  EXEC=aa555554155aaaaa  INST32=BF8C0F70
          ^ SE1 SH0 CU5 SIMD0 WAVE0  EXEC=55ffff55555555aa  INST32=BF8C0F70
          ^ SE1 SH0 CU5 SIMD1 WAVE0  EXEC=555555555aaaaaaa  INST32=BF8C0F70
          ^ SE1 SH0 CU5 SIMD2 WAVE0  EXEC=a0aaaaaaa8555555  INST32=BF8C0F70
          ^ SE1 SH0 CU5 SIMD3 WAVE0  EXEC=8aaaaaaaaaaaa555  INST32=BF8C0F70
          ^ SE1 SH0 CU6 SIMD0 WAVE0  EXEC=000000002aaaaaaa  INST32=BF8C0F70
          ^ SE2 SH0 CU1 SIMD0 WAVE0  EXEC=5aaaa5400aaaa15a  INST32=BF8C0F70
          ^ SE2 SH0 CU1 SIMD1 WAVE0  EXEC=00aaaaaaaa5555aa  INST32=BF8C0F70
          ^ SE2 SH0 CU1 SIMD2 WAVE0  EXEC=aa00005555554555  INST32=BF8C0F70
          ^ SE2 SH0 CU1 SIMD3 WAVE0  EXEC=aaaaaaa000000000  INST32=BF8C0F70
          ^ SE3 SH0 CU4 SIMD0 WAVE0  EXEC=5555aaaaaaaaaaaa  INST32=BF8C0F70
          ^ SE3 SH0 CU4 SIMD2 WAVE0  EXEC=ffaaaaaaaaaa5555  INST32=BF8C0F70
          ^ SE3 SH0 CU4 SIMD3 WAVE0  EXEC=aaaa55555555aa00  INST32=BF8C0F70
          ^ SE3 SH0 CU5 SIMD0 WAVE0  EXEC=00aaaaaaaaaaaa5a  INST32=BF8C0F70
          ^ SE3 SH0 CU5 SIMD1 WAVE0  EXEC=5a555555005555ff  INST32=BF8C0F70
    v_mul_f32_e32 v7, s6, v7                                      ; 0A0E0E06 [PC=0x10f3e3ad0, off=720, size=4]
...

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-10 11:27:50 +01:00
Marek Olšák 3de8c5a3c5 radeonsi: write wave information into GPU hang reports
UMR is our new debugging tool. It must have +s set for Mesa to use it
without root privileges:
  sudo chmod +s .../umr

Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-10 11:27:50 +01:00
Marc-André Lureau dc2d9b8da1 tgsi-dump: dump label if instruction has one
The instruction has an associated label when Instruction.Label == 1,
as can be seen in ureg_emit_label() or tgsi_build_full_instruction().

This fixes dump generating extra :0 labels on conditionals, and virgl
parsing more than the expected tokens and eventually reaching "Illegal
command buffer" (when parsing more than a safety margin of 10 we
currently have).

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-10 12:46:33 +10:00
Marc-André Lureau bd1cab1168 tgsi: remove ureg_label_insn
Unused since commit 2897cb3dba.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2017-02-10 12:46:23 +10:00
Ilia Mirkin c95f821cb4 nvc0/ir: fix ubo max clamp, reset file index
We just increased the max UBO, so we should also increase the clamp that
we do for robustness. Similarly, as we're including the fileIndex in the
new indirect value, we should reset fileIndex to 0 so that it is not
added in a second time.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
2017-02-09 15:50:58 -05:00
Ilia Mirkin e4a698cb97 nv50/ir: always return 0 when trying to read thread id along unit dim
Many many many compute shaders only define a 1- or 2-dimensional block,
but then continue to use system values that take the full 3d into
account (like gl_LocalInvocationIndex, etc). So for the special case
that a dimension is exactly 1, we know that the thread id along that
axis will always be 0, so return it as such and allow constant folding
to fix things up.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Pierre Moreau <pierre.morrow@free.fr>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2017-02-09 15:15:36 -05:00
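The short-circuit described in the commit above can be sketched in plain C. This is only a model of the idea, not the nv50/ir code: `read_tid` and `local_invocation_index` are hypothetical names, and the index formula is the GLSL definition of gl_LocalInvocationIndex.

```c
#include <assert.h>

/* Hypothetical stand-in for reading the thread id system value.
 * If the block is exactly 1 wide along `axis`, the id there is always 0,
 * so return the constant and never touch the hardware value. */
static unsigned read_tid(const unsigned hw_tid[3], const unsigned block[3],
                         unsigned axis)
{
    assert(axis < 3);
    if (block[axis] == 1)
        return 0;
    return hw_tid[axis];
}

/* gl_LocalInvocationIndex = z * size.x * size.y + y * size.x + x */
static unsigned local_invocation_index(const unsigned hw_tid[3],
                                       const unsigned block[3])
{
    unsigned x = read_tid(hw_tid, block, 0);
    unsigned y = read_tid(hw_tid, block, 1);
    unsigned z = read_tid(hw_tid, block, 2);
    return z * block[0] * block[1] + y * block[0] + x;
}
```

For a 64x1x1 block, both the y and z terms are multiplied by a known zero, so the whole expression can fold down to the x thread id.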
Ilia Mirkin 1acdd62847 nvc0/ir: fix robustness guarantees for constbuf loads on kepler+ compute
Kepler and up unfortunately only support up to 8 constbufs. We work
around this by loading from constbufs as if they were storage buffers.
However we were not consistently applying limits to loads from these
buffers. Make sure to do the same thing we do for storage buffers.

Fixes GL45-CTS.robust_buffer_access_behavior.uniform_buffer

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
2017-02-09 15:15:22 -05:00
Ilia Mirkin 59ca352fc5 nvc0: increase number of ubo binding points
Apparently GL 4.5 requires 14 of these (there's a "*" in the spec, but
it's unclear what it refers to). We need to expose an extra binding
point for the "program parameters", which means this must be 15. Remove
the last vestige of the "use c14 for immediates" idea.

Fixes GL45-CTS.shading_language_420pack.binding_uniform_block_array

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
2017-02-09 15:15:08 -05:00
Ilia Mirkin 1e4f5988ed nvc0: expose int64
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-02-09 12:57:49 -05:00
Ilia Mirkin ab00a41a6e nvc0/ir: make it possible to have the flags def in def0
There's all kinds of logic that doesn't like there being holes in defs
or srcs lists. Avoid them. This also fixes the sched logic for maxwell.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-02-09 12:57:48 -05:00
Ilia Mirkin 61d7676df7 nvc0/ir: add support for 64-bit shift lowering on SM20/SM30
Unfortunately there is no SHF.L/SHF.R instruction pre-SM35. So we have
to do a bit more work to get the job done.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-02-09 12:57:48 -05:00
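For readers unfamiliar with what "a bit more work" means for the shift lowering above, here is a minimal sketch of a 64-bit left shift built only from 32-bit operations. It is a generic model, not the sequence the compiler emits; the `u64_halves` type and `shl64_via_32` name are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical representation of a 64-bit value as two 32-bit registers. */
typedef struct { uint32_t lo, hi; } u64_halves;

/* 64-bit left shift using only 32-bit shifts and ORs. */
static u64_halves shl64_via_32(u64_halves v, unsigned shift)
{
    u64_halves r;
    assert(shift < 64);
    if (shift == 0) {
        r = v;                              /* avoid a 32-bit shift by 32 below */
    } else if (shift < 32) {
        r.lo = v.lo << shift;
        r.hi = (v.hi << shift) | (v.lo >> (32 - shift)); /* carry bits across */
    } else {
        r.lo = 0;                           /* low word is shifted out entirely */
        r.hi = v.lo << (shift - 32);
    }
    return r;
}
```

A funnel-shift instruction would cover the middle case in a single operation, which is why hardware without one needs the longer sequence.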
Ilia Mirkin 1aefd6159c nvc0/ir: add support for all the new int64 tgsi opcodes
A few thoughts:
 - Some of that LegalizeSSA logic should really live much earlier and be
   subject to the likes of DCE and other useful passes
 - Some of the "lowering" done in from_tgsi should be done later so that
   proper optimization might be done.

However this all works and the above can be improved upon later.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-02-09 12:57:48 -05:00
Pierre Moreau 009c54aa7a nv50/ir: Split 64-bit integer MAD/MUL operations
Hardware does not support 64-bit integer MAD and MUL operations, so we need
to split them into 32-bit operations.

Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>
2017-02-09 12:57:48 -05:00
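The split described in the commit above follows the usual schoolbook decomposition, sketched here in C under stated assumptions. The `u64_parts` type and both function names are hypothetical, and `mul_hi_u32` uses a C 64-bit multiply only to model a 32x32 "multiply high" instruction; this is not the pass's actual code.

```c
#include <stdint.h>

/* Hypothetical representation of a 64-bit value as two 32-bit registers. */
typedef struct { uint32_t lo, hi; } u64_parts;

/* Models a 32x32 -> high 32 bits multiply instruction. */
static uint32_t mul_hi_u32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* Low 64 bits of a * b, built from 32-bit multiplies and adds.
 * The a.hi * b.hi term only affects bits 64 and up, so it is dropped. */
static u64_parts mul64_via_32(u64_parts a, u64_parts b)
{
    u64_parts r;
    r.lo = a.lo * b.lo;                      /* low 32 bits of the low product */
    r.hi = mul_hi_u32(a.lo, b.lo)            /* carry out of the low product */
         + a.lo * b.hi                       /* cross terms, wrapping mod 2^32 */
         + a.hi * b.lo;
    return r;
}
```

A MAD can be modeled the same way by adding the 64-bit addend afterwards with an add and an add-with-carry.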
Ilia Mirkin 22c705ea8c nvc0/ir: add a "high" subop for shifts, emit shf.l/shf.r for 64-bit
Note that this is not available for SM20/SM30.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-02-09 12:57:48 -05:00
Ilia Mirkin 2e986fa806 nvc0/ir: fix SET and SLCT emission
We were never emitting a .X flag for consuming condition code on SET,
and weren't emitting a signed type for SLCT comparison. Discovered while
working on int64 logic.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-02-09 12:57:48 -05:00
Ilia Mirkin eac5099c11 nvc0/ir: add support for emitting partial min/max ops for int64
These operations allow you to compute min/max on arbitrary-width
integers, 32 bits at a time.

Note that the low/med ops implicitly set the condition code, and the
med/high ops implicitly consume it.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2017-02-09 12:57:48 -05:00
Ilia Mirkin b090033087 gallium: add separate PIPE_CAP_INT64_DIVMOD
Nouveau does not currently have logic to implement this as a library
function. Even though such a library could be written, there's no big
advantage to do it that way for now given that int64 is a very uncommon
use-case. Allow a driver to expose INT64 without supporting division and
modulo operations.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
2017-02-09 12:57:21 -05:00
Tim Rowley c1aa444a3e swr: [rasterizer jitter] Pass LLVM-IR size into jitter
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-02-08 13:58:13 -06:00
Tim Rowley e0a829d320 swr: [rasterizer core] Frontend SIMD16 WIP
Removed temporary scaffolding in PA, widened the PA_STATE interface
for SIMD16, and implemented PA_STATE_CUT and PA_TESS for SIMD16.

PA_STATE_CUT and PA_TESS now work in SIMD16.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-02-08 13:58:06 -06:00
Tim Rowley 79174e52b5 swr: [rasterizer jitter] Disable unsafe FP optimizations in the jitter
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-02-08 13:58:00 -06:00
Tim Rowley db599e316a swr: [rasterizer core] Frontend SIMD16 WIP
Widen simdvertex to SIMD16/simd16vertex in frontend for passing VS
attributes from VS to PA.

Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-02-08 13:57:52 -06:00
Tim Rowley 09c54cfd2d swr: [rasterizer jitter] Add DEBUGTRAP jit builder function
Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>
2017-02-08 13:57:47 -06:00