AlexIndustrial/mesa

Author	SHA1	Message	Date
Ilia Mirkin	3f8b886e73	nv50,nvc0: use alternate samplers for stencil The blob uses these, and it fixes a bunch of dEQP stencil sampling tests involving border colors. Probably the Z-based samplers work somehow differently wrt border colors when using the stencil swizzle. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2017-02-12 18:22:17 -05:00
Wladimir J. van der Laan	55e00c7cfe	etnaviv: Set shader instruction area correctly for GC3000 - Use the same instruction area on GC3000 as the Vivante driver. This allows the same number of instructions on GC3000 as GC2000 instead of half. - Makes sure that the "PE to FE" stall before updating the shader code or constants is hit (which is conditional on vs_offset > 0x4000). This is necessary on GC3000 too, it increases stability. Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com> Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2017-02-12 20:42:37 +01:00
Wladimir J. van der Laan	0fe60e4fcc	etnaviv: Update hw header files Update from etnaviv repository rnndb. This adds some newly discovered state for GC3000 (and some GC2000) features. Signed-off-by: Wladimir J. van der Laan <laanwj@gmail.com> Acked-by: Christian Gmeiner <christian.gmeiner@gmail.com>	2017-02-12 20:38:56 +01:00
Ilia Mirkin	48f04862c1	nvc0: set the render condition in the compute object Fixes GL45-CTS.compute_shader.conditional-dispatching Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: mesa-stable@lists.freedesktop.org	2017-02-11 21:06:52 -05:00
Ilia Mirkin	7e75f0913a	gm107/ir: fix address offset bitfield for ATOMS Fixes GL45-CTS.compute_shader.atomic-case1 on Maxwell Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Cc: mesa-stable@lists.freedesktop.org	2017-02-11 21:06:41 -05:00
Ilia Mirkin	b38aab50a0	nv50/ir: convert an ATOM.EXCH without a destination into a store On SM35 there does not appear to be a way to emit an ATOM.EXCH with a null destination. This should be functionally equivalent to a plain store however, so just do that. Fixes GL45-CTS.compute_shader.atomic-case2 on SM35. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2017-02-11 20:25:26 -05:00
Ilia Mirkin	2b0580123e	nvc0: fix 64-bit integer query buffer writes The former logic just plain didn't work at all. We need to write the subsequent dword to the next buffer location. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2017-02-11 20:25:26 -05:00
Ilia Mirkin	399e267f0e	nv50/ir: return a register when retrieving thread id sysval We have logic to short-circuit such retrievals to zero. However "zero" was an immediate, and some logic expected to get registers (to later be propagated). Fix this by using loadImm. Fixes GL45-CTS.gpu_shader5.images_array_indexing Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2017-02-11 20:25:26 -05:00
Ilia Mirkin	0d1edb01ec	nv50/ir: add missing break after DSSG Recently broken during int64 addition. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2017-02-11 17:21:55 -05:00
Christian Gmeiner	137ad879d5	etnaviv: shader-db traces Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-By: Wladimir J. van der Laan <laanwj@gmail.com>	2017-02-11 21:22:53 +01:00
Christian Gmeiner	7256ed3c79	etnaviv: keep track of emitted loops Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de> Reviewed-by: Wladimir J. van der Laan <laanwj@gmail.com>	2017-02-11 21:22:48 +01:00
Christian Gmeiner	5a3ea68895	etnaviv: wire up core pipe_debug_callback Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com> Reviewed-by: Lucas Stach <l.stach@pengutronix.de>	2017-02-11 21:22:42 +01:00
Eric Anholt	0514b0bdc9	vc4: Enable glSampleMask() even when !rasterizer->multisample. gallium's blitter expects that it can set the sample mask even when the rasterizer doesn't have the flag on. Between this and the previous test, 10 new ext_framebuffer_multisample tests start passing.	2017-02-10 14:17:05 -08:00
Eric Anholt	5c86f119b9	vc4: Respect glSampleMask() even when we're not writing color. gallium's quad-based blitter for copying MSAA depth textures expects to be able to do 4 passes updating a sample at a time using glSampleMask, and there's no color buffer bound when it's doing that.	2017-02-10 14:17:04 -08:00
Eric Anholt	30237193f5	vc4: Use the nir_builder helper for loading sample mask.	2017-02-10 14:17:04 -08:00
Eric Anholt	ce538a443d	vc4: Use accurate 1/w in coordinate shader as well as vert shader. We probably shouldn't be emitting different scaled viewport coordinates between vertex and coord.	2017-02-10 14:17:04 -08:00
Eric Anholt	a0b6841838	vc4: Drop VS inputs to 8. In the hardware we only get to declare 8 vertex elements (GLES2's minimum), so we should be exposing that number here. Fixes an assertion failure in piglit texrect-many, at the expense of various GL 2.0-ish minmax tests now complaining that our count is too low.	2017-02-10 14:17:04 -08:00
Eric Anholt	b230939303	vc4: Avoid emitting small immediates for UBO indirect load address guards. The kernel will reject our shader if we emit one here, and having 4, 8, or 12 as the top end of our UBO clamp is rare enough that it's not worth making the kernel let us. Fixes piglit fs-const-array-of-struct and fs-const-array-of-struct-of-array since recent GLSL linking changes made us get this as an indirect load of a uniform, instead of a temporary. Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org>	2017-02-10 14:17:04 -08:00
Emil Velikov	463236bd31	st/nine: update configure options in the README Cc: Axel Davy <axel.davy@ens.fr> Signed-off-by: Emil Velikov <emil.velikov@collabora.com>	2017-02-10 11:47:24 +00:00
Marek Olšák	43a2ba1b7d	gallium/radeon: use staging for texture read mappings from GTT WC Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-02-10 11:27:50 +01:00
Marek Olšák	dc7483f445	gallium/radeon: ignore the level parameter in buffer_transfer_map Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-02-10 11:27:50 +01:00
Marek Olšák	d86099df0a	gallium/radeon: fix performance of buffer readbacks We want cached GTT for all non-persistent read mappings. Set level = 0 on purpose. Use dma_copy, because resource_copy_region causes a failure in the PBO read of piglit/getteximage-luminance. If Rocket League used the READ flag, it should get cached GTT. v2: mask out UNSYNCHRONIZED Cc: 13.0 17.0 <mesa-stable@lists.freedesktop.org> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-02-10 11:27:50 +01:00
Marek Olšák	24e3b06408	radeonsi: align vertex buffer descriptor list size for optimal prefetch Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-02-10 11:27:50 +01:00
Marek Olšák	3a534c5c7d	radeonsi: align shader binaries to CP DMA alignment for optimal prefetch Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-02-10 11:27:50 +01:00
Marek Olšák	1a392a4377	radeonsi: move CP_DMA_ALIGNMENT definition Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-02-10 11:27:50 +01:00
Marek Olšák	4c288c73ea	radeonsi: remove SI_CONTEXT_FLUSH_AND_INV_FRAMEBUFFER not necessary Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-02-10 11:27:50 +01:00
Marek Olšák	65df38b191	radeonsi: remove separate CB/DB_META flush flags not used separately Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-02-10 11:27:50 +01:00
Marek Olšák	8a2ae4153b	radeonsi: reduce the number of FMASK input coordinates Before: image_load v3, v[0:3] ... After: image_load v3, v[0:1] ... Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-02-10 11:27:50 +01:00
Marek Olšák	28c06b3ceb	radeonsi: write shader asm annotated with wave info into GPU hang reports Note that the disassembly is written twice - first the unmodified compiler output and then the wave-annotated output only if there are waves executing the shader. Sample output from a real GPU hang most likely caused by image_sample: The number of active waves = 28 Pixel Shader - annotated disassembly: s_mov_b64 s[6:7], exec ; BE86017E [PC=0x10f3e3800, off=0, size=4] s_wqm_b64 exec, exec ; BEFE077E [PC=0x10f3e3804, off=4, size=4] ... image_sample v[7:9], v[0:1], s[12:19], s[20:23] dmask:0x7 ; F0800700 00A30700 [PC=0x10f3e3a94, off=660, size=8] s_buffer_load_dword s20, s[0:3], 0x50 ; C0220500 00000050 [PC=0x10f3e3a9c, off=668, size=8] s_load_dwordx4 s[24:27], s[4:5], 0x170 ; C00A0602 00000170 [PC=0x10f3e3aa4, off=676, size=8] s_load_dwordx8 s[12:19], s[4:5], 0x140 ; C00E0302 00000140 [PC=0x10f3e3aac, off=684, size=8] s_buffer_load_dword s11, s[0:3], 0x5c ; C02202C0 0000005C [PC=0x10f3e3ab4, off=692, size=8] s_buffer_load_dword s21, s[0:3], 0x54 ; C0220540 00000054 [PC=0x10f3e3abc, off=700, size=8] s_buffer_load_dword s22, s[0:3], 0x58 ; C0220580 00000058 [PC=0x10f3e3ac4, off=708, size=8] s_waitcnt vmcnt(0) ; BF8C0F70 [PC=0x10f3e3acc, off=716, size=4] ^ SE0 SH0 CU1 SIMD1 WAVE0 EXEC=aaaaaaa555aaaaaa INST32=BF8C0F70 ^ SE0 SH0 CU1 SIMD2 WAVE0 EXEC=aaaa85555555552a INST32=BF8C0F70 ^ SE0 SH0 CU1 SIMD3 WAVE0 EXEC=000000000000000a INST32=BF8C0F70 ^ SE0 SH0 CU6 SIMD1 WAVE0 EXEC=25a5a5aa82aaaaaa INST32=BF8C0F70 ^ SE0 SH0 CU6 SIMD3 WAVE0 EXEC=50aaaa8fffa55555 INST32=BF8C0F70 ^ SE0 SH0 CU7 SIMD0 WAVE0 EXEC=5554aaaaaaa1a555 INST32=BF8C0F70 ^ SE0 SH0 CU7 SIMD0 WAVE1 EXEC=aaaa5555ffffffff INST32=BF8C0F70 ^ SE0 SH0 CU7 SIMD1 WAVE0 EXEC=555557aaaaaaaaa5 INST32=BF8C0F70 ^ SE0 SH0 CU7 SIMD3 WAVE0 EXEC=5555aaaaaaaaaa85 INST32=BF8C0F70 ^ SE1 SH0 CU3 SIMD1 WAVE0 EXEC=aaaaaaaaaaaaaaaa INST32=BF8C0F70 ^ SE1 SH0 CU4 SIMD0 WAVE0 EXEC=aaaaaaaa5a5a5a5a INST32=BF8C0F70 ^ SE1 SH0 CU4 SIMD1 WAVE0 
EXEC=aaaaaaa5a5a5a4a5 INST32=BF8C0F70 ^ SE1 SH0 CU4 SIMD2 WAVE0 EXEC=5555555000000000 INST32=BF8C0F70 ^ SE1 SH0 CU4 SIMD3 WAVE0 EXEC=aa555554155aaaaa INST32=BF8C0F70 ^ SE1 SH0 CU5 SIMD0 WAVE0 EXEC=55ffff55555555aa INST32=BF8C0F70 ^ SE1 SH0 CU5 SIMD1 WAVE0 EXEC=555555555aaaaaaa INST32=BF8C0F70 ^ SE1 SH0 CU5 SIMD2 WAVE0 EXEC=a0aaaaaaa8555555 INST32=BF8C0F70 ^ SE1 SH0 CU5 SIMD3 WAVE0 EXEC=8aaaaaaaaaaaa555 INST32=BF8C0F70 ^ SE1 SH0 CU6 SIMD0 WAVE0 EXEC=000000002aaaaaaa INST32=BF8C0F70 ^ SE2 SH0 CU1 SIMD0 WAVE0 EXEC=5aaaa5400aaaa15a INST32=BF8C0F70 ^ SE2 SH0 CU1 SIMD1 WAVE0 EXEC=00aaaaaaaa5555aa INST32=BF8C0F70 ^ SE2 SH0 CU1 SIMD2 WAVE0 EXEC=aa00005555554555 INST32=BF8C0F70 ^ SE2 SH0 CU1 SIMD3 WAVE0 EXEC=aaaaaaa000000000 INST32=BF8C0F70 ^ SE3 SH0 CU4 SIMD0 WAVE0 EXEC=5555aaaaaaaaaaaa INST32=BF8C0F70 ^ SE3 SH0 CU4 SIMD2 WAVE0 EXEC=ffaaaaaaaaaa5555 INST32=BF8C0F70 ^ SE3 SH0 CU4 SIMD3 WAVE0 EXEC=aaaa55555555aa00 INST32=BF8C0F70 ^ SE3 SH0 CU5 SIMD0 WAVE0 EXEC=00aaaaaaaaaaaa5a INST32=BF8C0F70 ^ SE3 SH0 CU5 SIMD1 WAVE0 EXEC=5a555555005555ff INST32=BF8C0F70 v_mul_f32_e32 v7, s6, v7 ; 0A0E0E06 [PC=0x10f3e3ad0, off=720, size=4] ... Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-02-10 11:27:50 +01:00
Marek Olšák	3de8c5a3c5	radeonsi: write wave information into GPU hang reports UMR is our new debugging tool. It must have +s set for Mesa to use it without root privileges: sudo chmod +s .../umr Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-02-10 11:27:50 +01:00
Marc-André Lureau	dc2d9b8da1	tgsi-dump: dump label if instruction has one The instruction has an associated label when Instruction.Label == 1, as can be seen in ureg_emit_label() or tgsi_build_full_instruction(). This fixes dump generating extra :0 labels on conditionals, and virgl parsing more than the expected tokens and eventually reaching "Illegal command buffer" (when parsing more than a safety margin of 10 we currently have). Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Cc: "13.0 17.0" <mesa-stable@lists.freedesktop.org> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-02-10 12:46:33 +10:00
Marc-André Lureau	bd1cab1168	tgsi: remove ureg_label_insn Unused since commit `2897cb3dba`. Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com> Signed-off-by: Dave Airlie <airlied@redhat.com>	2017-02-10 12:46:23 +10:00
Ilia Mirkin	c95f821cb4	nvc0/ir: fix ubo max clamp, reset file index We just increased the max UBO, so we should also increase the clamp that we do for robustness. Similarly, as we're including the fileIndex in the new indirect value, we should reset fileIndex to 0 so that it is not added in a second time. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Cc: mesa-stable@lists.freedesktop.org	2017-02-09 15:50:58 -05:00
Ilia Mirkin	e4a698cb97	nv50/ir: always return 0 when trying to read thread id along unit dim Many many many compute shaders only define a 1- or 2-dimensional block, but then continue to use system values that take the full 3d into account (like gl_LocalInvocationIndex, etc). So for the special case that a dimension is exactly 1, we know that the thread id along that axis will always be 0, so return it as such and allow constant folding to fix things up. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Pierre Moreau <pierre.morrow@free.fr> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>	2017-02-09 15:15:36 -05:00
Ilia Mirkin	1acdd62847	nvc0/ir: fix robustness guarantees for constbuf loads on kepler+ compute Kepler and up unfortunately only support up to 8 constbufs. We work around this by loading from constbufs as if they were storage buffers. However we were not consistently applying limits to loads from these buffers. Make sure to do the same thing we do for storage buffers. Fixes GL45-CTS.robust_buffer_access_behavior.uniform_buffer Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Cc: mesa-stable@lists.freedesktop.org	2017-02-09 15:15:22 -05:00
Ilia Mirkin	59ca352fc5	nvc0: increase number of ubo binding points Apparently GL 4.5 requires 14 of these (there's a "*" in the spec, but it's unclear what it refers to). We need to expose an extra binding point for the "program parameters", which means this must be 15. Remove the last vestige of the "use c14 for immediates" idea. Fixes GL45-CTS.shading_language_420pack.binding_uniform_block_array Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Cc: mesa-stable@lists.freedesktop.org	2017-02-09 15:15:08 -05:00
Ilia Mirkin	1e4f5988ed	nvc0: expose int64 Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2017-02-09 12:57:49 -05:00
Ilia Mirkin	ab00a41a6e	nvc0/ir: make it possible to have the flags def in def0 There's all kinds of logic that doesn't like there being holes in defs or srcs lists. Avoid them. This also fixes the sched logic for maxwell. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2017-02-09 12:57:48 -05:00
Ilia Mirkin	61d7676df7	nvc0/ir: add support for 64-bit shift lowering on SM20/SM30 Unfortunately there is no SHF.L/SHF.R instruction pre-SM35. So we have to do a bit more work to get the job done. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2017-02-09 12:57:48 -05:00
Ilia Mirkin	1aefd6159c	nvc0/ir: add support for all the new int64 tgsi opcodes A few thoughts: - Some of that LegalizeSSA logic should really live much earlier and be subject to the likes of DCE and other useful passes - Some of the "lowering" done in from_tgsi should be done later so that proper optimization might be done. However this all works and the above can be improved upon later. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2017-02-09 12:57:48 -05:00
Pierre Moreau	009c54aa7a	nv50/ir: Split 64-bit integer MAD/MUL operations Hardware does not support 64-bit integer MAD and MUL operations, so we need to transform them into 32-bit operations. Signed-off-by: Pierre Moreau <pierre.morrow@free.fr>	2017-02-09 12:57:48 -05:00
Ilia Mirkin	22c705ea8c	nvc0/ir: add a "high" subop for shifts, emit shf.l/shf.r for 64-bit Note that this is not available for SM20/SM30. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2017-02-09 12:57:48 -05:00
Ilia Mirkin	2e986fa806	nvc0/ir: fix SET and SLCT emission We were never emitting a .X flag for consuming condition code on SET, and weren't emitting a signed type for SLCT comparison. Discovered while working on int64 logic. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2017-02-09 12:57:48 -05:00
Ilia Mirkin	eac5099c11	nvc0/ir: add support for emitting partial min/max ops for int64 These operations allow you to compute min/max on arbitrary-width integers, 32 bits at a time. Note that the low/med ops implicitly set the condition code, and the med/high ops implicitly consume it. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>	2017-02-09 12:57:48 -05:00
Ilia Mirkin	b090033087	gallium: add separate PIPE_CAP_INT64_DIVMOD Nouveau does not currently have logic to implement this as a library function. Even though such a library could be written, there's no big advantage to do it that way for now given that int64 is a very uncommon use-case. Allow a driver to expose INT64 without supporting division and modulo operations. Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu> Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>	2017-02-09 12:57:21 -05:00
Tim Rowley	c1aa444a3e	swr: [rasterizer jitter] Pass LLVM-IR size into jitter Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2017-02-08 13:58:13 -06:00
Tim Rowley	e0a829d320	swr: [rasterizer core] Frontend SIMD16 WIP Removed temporary scaffolding in PA, widened the PA_STATE interface for SIMD16, and implemented PA_STATE_CUT and PA_TESS for SIMD16. PA_STATE_CUT and PA_TESS now work in SIMD16. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2017-02-08 13:58:06 -06:00
Tim Rowley	79174e52b5	swr: [rasterizer jitter] Disable unsafe FP optimizations in the jitter Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2017-02-08 13:58:00 -06:00
Tim Rowley	db599e316a	swr: [rasterizer core] Frontend SIMD16 WIP Widen simdvertex to SIMD16/simd16vertex in frontend for passing VS attributes from VS to PA. Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2017-02-08 13:57:52 -06:00
Tim Rowley	09c54cfd2d	swr: [rasterizer jitter] Add DEBUGTRAP jit builder function Reviewed-by: Bruce Cherniak <bruce.cherniak@intel.com>	2017-02-08 13:57:47 -06:00

30079 Commits