Commit Graph

116624 Commits

Author SHA1 Message Date
Kenneth Graunke c352cdf970 st/mesa: Silence chatty debug printf
Other debug_printf's in this file are in if (0) blocks.

Trivial.
2019-10-22 18:01:41 -07:00
Chris Wilson 0899bf55d4 st/mesa: Map MESA_FORMAT_RGB_UNORM8 <-> PIPE_FORMAT_R8G8B8_UNORM
This is useful for PBO texture upload with GL_RGB and GL_UNSIGNED_BYTE.

v2: Vasily Khoruzhick provided an update for the Lima CI expectations.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-10-22 22:13:14 +00:00
Lionel Landwerlin 0dfa643feb anv: fix unwind of vkCreateDevice fail
We're skipping the context destruction in some cases which is the
grand scheme of thing is not that important because closing device->fd
will destroy the associated context as well.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reported-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Fixes: b30e01aef5 ("anv: fix memory leak on device destroy")
2019-10-22 20:44:26 +00:00
Rhys Perry 118a32e5ba Revert "aco: only emit waitcnt on loop continues if we there was some load or export"
We don't properly pass on ctx.lgkm_cnt/ctx.barrier_imm/etc, so this
waitcnt was necessary for barriers and correctly waiting for SMEM before
s_dcache_wb on GFX10.

Totals from affected shaders:
SGPRS: 33200 -> 33200 (0.00 %)
VGPRS: 31376 -> 31376 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 2431804 -> 2433956 (0.09 %) bytes
LDS: 316 -> 316 (0.00 %) blocks
Max Waves: 1609 -> 1609 (0.00 %)

This reverts commit 2c050b49b3.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-22 18:52:29 +00:00
Rhys Perry 964ce47abc aco: add missing bld.scc()
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-22 18:52:29 +00:00
Rhys Perry c96289a70e aco: keep can_reorder/barrier when combining addition into SMEM
Affects 30 shaders in the pipeline-db (all youngblood).

Totals from affected shaders:
SGPRS: 2656 -> 2456 (-7.53 %)
VGPRS: 2260 -> 2260 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 240680 -> 240944 (0.11 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 90 -> 90 (0.00 %)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-22 18:52:29 +00:00
Rhys Perry 57c2cfb608 aco: add a few missing checks in value numbering
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-22 18:52:29 +00:00
Rhys Perry a8d0101d69 aco: use ds_read2_b64/ds_write2_b64
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-22 18:52:29 +00:00
Rhys Perry bdf47a1273 aco: properly combine additions into ds_write2_b64/ds_read2_b64
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-22 18:52:29 +00:00
Rhys Perry 58d4aee5df aco: fix sparse store_lds()
p_extract_vector's second operand is in units of the definition size, not
dwords.

v2: move extract_subvector() to right before ds_write_helper

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-22 18:52:29 +00:00
Rhys Perry a856629e8f aco: create load_lds/store_lds helpers
We'll want these for GS, since VS->GS IO on Vega is done using LDS.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-22 18:52:29 +00:00
Rhys Perry a400928f4a aco: fix 64-bit p_extract_vector on 32-bit p_create_vector
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-22 18:52:29 +00:00
Rhys Perry f6f15859de aco: small stage corrections
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-22 18:52:29 +00:00
Marek Olšák f764725b3e st/mesa: replace pipe_shader_state with tgsi_token* in st_vp_variant
we don't need more than that

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-22 14:41:25 -04:00
Marek Olšák a0b711d8e9 nir: allow nir_lower_uniforms_to_ubo to be run repeatedly
for st/mesa

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-22 14:41:23 -04:00
Rob Clark aa8515463e freedreno/ir3: fixup register footprint fixup
Small typo resulted in not converting footprint to vec4, meaning that we
could potentially ask for quite a few more registers than required

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-10-22 17:46:19 +00:00
Rob Clark 4c060235a2 freedreno/ir3: handle scalarized varying inputs
If the load_interpolated_input is scalarized, we would be too
conservative about deciding the tex instruction wasn't a candidate to
pre-fetch:

	vec1 32 ssa_0 = load_const (0x00000000 /* 0.000000 */)
	vec2 32 ssa_1 = intrinsic load_barycentric_pixel () (0) /* interp_mode=0 */
	vec1 32 ssa_2 = intrinsic load_interpolated_input (ssa_1, ssa_0) (0, 0) /* base=0 */ /* component=0 */	/* packed:v_uv,v_uv1 */
	vec1 32 ssa_3 = intrinsic load_interpolated_input (ssa_1, ssa_0) (0, 1) /* base=0 */ /* component=1 */	/* packed:v_uv,v_uv1 */
	vec2 32 ssa_8 = vec2 ssa_2, ssa_3
	vec4 32 ssa_9 = tex ssa_8 (coord), 0 (texture), 0 (sampler)

Really we don't care that the texcoord components come from different
load_interpolated_input instructions, just that they have consecutive
varying offsets.

Reported-by: Eduardo Lima Mitev <elima@igalia.com>
Reviewed-by: Eduardo Lima Mitev <elima@igalia.com>
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-10-22 17:46:19 +00:00
Daniel Schürmann 3a20ef4a32 aco: refactor value numbering
Previously, we used one hashset per BB, so that we could
always initialize the current hashset from the immediate
dominator. This patch changes the behavior to a single
hashmap using the block index per instruction to resolve
dominance.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-10-22 17:18:59 +02:00
Erik Faye-Lund 3a71e1d27b mesa/st: assert that lowering is supported
Some of these lowerings aren't supported for drivers that supports
tesselation and geometry shaders. Let's add a couple of asserts to make
it obvious if these have been enabled when it's not possible.

Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-10-22 12:07:23 +00:00
Michel Dänzer 793f6b30d9 gitlab-ci: Enable llvmpipe in ARM build jobs
v2:
* Use LLVM 8 from buster-backports
v3:
* Use LLVM 7 again for armhf, llvmpipe is still broken there with LLVM 8

Acked-by: Eric Engestrom <eric.engestrom@intel.com>
2019-10-22 10:26:29 +00:00
Michel Dänzer 59e7f1413c gitlab-ci: Update the meson cross file for LLVM_VERSION as well
Cross builds don't use the llvm-config path from the native file.
2019-10-22 10:26:29 +00:00
Michel Dänzer 163ec5d808 gitlab-ci: Use native aarch64 runner for ARM build jobs
This allows running the regression tests.

One downside is that we can't easily build the Vulkan overlay layer,
because only x86 binaries of the glslang validator are available. If
that's important, we could either use those binaries via qemu, or build
it from source.

v2:
* Add :amd64 suffix to existing debian-9/10 job names (Eric Engestrom)

Acked-by: Eric Engestrom <eric.engestrom@intel.com> # v1
2019-10-22 10:26:29 +00:00
Michel Dänzer c5aa2711a4 gitlab-ci: Explicitly list debian-10 in needs: for .deqp-test template
Apparently needs: in a definition overwrites inherited ones. So
.deqp-test effectively didn't declare needs: for debian-10, which means
any jobs based on .deqp-test could spuriously run after the debian-10
job failed or was cancelled.
2019-10-22 10:26:29 +00:00
Michel Dänzer 38d42cf1d5 gitlab-ci: Bring ARM docker image install script in line with x86_64
Use https:// URLs in the APT configuration.

Drop --no-install-recommends, the image generation template disables
installation of recommended packages in /etc/apt/apt.conf.

Run apt-get autoremove at the end, cleaning up packages which were
installed to satisfy dependencies but are no longer needed.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-10-22 10:26:29 +00:00
Michel Dänzer e3c7e04dfa gitlab-ci: Sort ARM docker image packages in alphabetical order
No functional change.

Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
2019-10-22 10:26:29 +00:00
Samuel Pitoiset a13320370e radv: fix updating bound fast ds clear values with different aspects
On GFX9, the driver is able to do an optimized fast depth/stencil
clear with only one aspect (ie. clear the stencil part of a
depth/stencil image). When this happens, the driver should only
update the clear values of the given aspect.

Note that it's currently only supported on GFX9 but I have some
local patches that extend this optimized path for other gens.

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1967
Cc: 19.2 <mesa-stable@lists.freedesktop.org>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-22 11:16:13 +02:00
Sagar Ghuge 97e6d34e66 intel/compiler: Refactor disassembly of sources in 3src instruction
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-10-21 20:32:43 -07:00
Sagar Ghuge 18b28b5654 intel/compiler: Don't move immediate in register
On Gen12, we support mixed mode HF/F operands, and also 3 source
instruction supports immediate value support, so keep immediate as it
is, if it fits properly in 16 bit field.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-10-21 20:32:43 -07:00
Sagar Ghuge bf943bdf24 intel/compiler: Set bits according to source file
On Gen >= 12, if src0 or src2 holds immediate value, we need set
src[0/2]_is_imm bits instead of register file.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-10-21 20:32:43 -07:00
Sagar Ghuge c018c5a339 intel/compiler: Add Immediate support for 3 source instruction
On Gen >= 10, Either src0 or src2 can use 16-bit immediate value, but
not both.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-10-21 20:32:43 -07:00
Eric Anholt fb9362c6fb ci: Disable lima until its farm can get fixed.
It's been throwing the following error today:

"<Fault -32603: 'Internal Server Error (contact server administrator
for details): could not extend file "base/17952/18226": No space left
on device\nHINT: Check free disk space.\n'>"

Reviewed-by: Daniel Stone <daniels@collabora.com>
2019-10-21 20:31:34 -07:00
Sagar Ghuge 7fb75ddfa7 intel: Add missing entry for brw_nir_lower_alpha_to_coverage in Makefile
Fixes: 7ecfbd4f6d ("nir: Add alpha_to_coverage lowering pass")

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2019-10-21 16:19:24 -07:00
Dave Airlie bde08ce4d7 llvmpipe: handle compute shader launch with 0 threads
If you set LP_NUM_THREADS=0 compute shaders would hang,
just execute the workloads in sequence if we have no threads
in the pool.

Fixes: 1b24e3ba75 ("llvmpipe: add compute threadpool + mutex")
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
2019-10-21 22:51:23 +00:00
Marijn Suijten 0141a4cdc0 freedreno/ir3: Add missing ir3_nir_lower_tex_prefetch.c to Android.mk
This file is created in 2a0d45ae6c but
addition to android makefiles was omitted. It breaks the build with
missing references which are defined in this file.
List the file in ir3_SOURCES to make the build succeed.

Signed-off-by: Marijn Suijten <marijns95@gmail.com>
2019-10-21 22:43:00 +00:00
Samuel Pitoiset 39760793b5 ac/llvm: fix ac_to_integer_type() for 32-bit const addr space pointers
This fixes some crashes with dEQP-VK.descriptor_indexing.* when
read_first_invocation has its source from a descriptor.

Most of these tests still fail because of an LLVM bug (they work
with ACO).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-21 22:32:01 +02:00
Rhys Perry 73184e51d1 aco: run opt_algebraic in a loop
Totals from affected shaders:
SGPRS: 13920 -> 13656 (-1.90 %)
VGPRS: 12972 -> 12960 (-0.09 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 1005680 -> 1000648 (-0.50 %) bytes
LDS: 91 -> 91 (0.00 %) blocks
Max Waves: 688 -> 688 (0.00 %)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-21 19:18:30 +00:00
Rhys Perry 132ae89b19 aco: use nir_lower_idiv_precise
v7: rename _nv50/_llvm to _fast/_precise

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-21 18:49:46 +00:00
Rhys Perry 8b98d0954e nir/lower_idiv: add new llvm-based path
v2: make variable names snake_case
v2: minor cleanups in emit_udiv()
v2: fix Panfrost build failure
v3: use an enum instead of a boolean flag in nir_lower_idiv()'s signature
v4: remove nir_op_urcp
v5: drop nv50 path
v5: rebase
v6: add back nv50 path
v6: add comment for nir_lower_idiv_path enum
v7: rename _nv50/_llvm to _fast/_precise
v8: fix etnaviv build failure

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-21 18:49:46 +00:00
Sagar Ghuge f729ecefef intel/compiler: Remove emit_alpha_to_coverage workaround from backend
Remove emit_alpha_to_coverage workaround from backend compiler and start
using ported workaround from NIR.

v2: Copy comment from brw_fs_visitor (Caio Marcelo de Oliveira Filho)

Fixes piglit test on HSW:
- arb_sample_shading-builtin-gl-sample-mask-mrt-alpha-to-coverage-combinations

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-10-21 11:27:29 -07:00
Sagar Ghuge 7ecfbd4f6d nir: Add alpha_to_coverage lowering pass
Importing this pass from fs_visitor::emit_alpha_to_coverage_workaround()
in intel/compiler.

v2 (Caio Marcelo de Oliveira Filho):
- Track store output and sample mask instruction
- Nest math insturction for more readability
- Bail out early if no gl_SampleMask

v3: (Caio Marcelo de Oliveira Filho):
- Do math instructions after instruction block
- Restructure code
- Move pass under src/intel/compiler

v4: (Caio Marcelo de Oliveira Filho):
- Organize dither mask calculation

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-10-21 11:27:29 -07:00
Daniel Schürmann 0e4bd261b1 aco: ensure that uniform booleans are computed in WQM if their uses happen in WQM
This fixes graphical corruption in SC2.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-10-21 17:39:46 +00:00
Dylan Baker a9a9249288 meson: Require meson >= 0.49.1 when using icc or icl
0.49.0 can compile most of mesa with ICC or ICL, but not SWR without
additional workarounds in our meson.build files. Bumping patch version
is easier and shouldn't be a big burden anyway, especially to cover a
niche compiler. The check originally only covered ICC, but now covers
ICL as well.

Fixes: 3740ffb59c
       ("meson: add switches for SWR with MSVC")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1937
Acked-by: Eric Engestrom <eric.engestrom@intel.com>
2019-10-21 17:21:57 +00:00
Juan A. Suarez Romero d33fe2d5eb docs: update calendar, add news item and link release notes for 19.1.8
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
2019-10-21 19:13:55 +02:00
Juan A. Suarez Romero 62a0e8421e docs: add release notes for 19.1.8
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit cc88eeb6ffc4e86d76dfdbfc601d519bc35b6c41)
2019-10-21 19:10:52 +02:00
Juan A. Suarez Romero 7aa63ffe4f docs: add release notes for 19.1.8
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
(cherry picked from commit 5c6d266c591208b1c27e06f61b814210fc6e095f)
2019-10-21 19:10:49 +02:00
Timur Kristóf 7e5f87b533 aco/gfx10: Update constant addresses in fix_branches_gfx10.
Due to a bug in GFX10 hardware, s_nop instructions must be added
if a branch is at 0x3f. We already do this, but forgot to also update
the constant addresses that come after this instruction.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-21 14:33:54 +00:00
Timur Kristóf f380398f8f aco/gfx10: Fix PS exports for SPI_SHADER_32_AR.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-21 14:33:54 +00:00
Timur Kristóf 1749953ea3 aco/gfx10: Wait for pending SMEM stores before loads
Currently if you have an SMEM store followed by an SMEM load that
loads the same location as was written, it won't work because the
store isn't finished before the load is executed. This is NOT
mitigated by an s_nop instruction on GFX10.

Since we currently don't have proper alias analysis, this commit adds
a workaround which will insert an s_waitcnt lgkmcnt(0) before each
SSBO load if they follow a store. We should further refine this in
the future when we can make sure to only add the wait when we load the
same thing as has been stored.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-10-21 14:33:54 +00:00
Boris Brezillon 7fa5cd3ee3 panfrost: Fix the DISCARD_WHOLE_RES case in transfer_map()
The current implementation does not synchronize on BO readiness when
DISCARD_WHOLE_RES flag is set, which can lead to misbehaviours when the
resource being updated is being used by one of the pending or already
flushed batches.

Adding unconditional BO synchronization would do the trick, but we can
sometimes optimize this path by re-allocating a new BO instead of
waiting for the existing one to be ready.

Reported-by: Daniel Stone <daniels@collabora.com>
Reported-by: Heinrich Fink <heinrich.fink@daqri.com>
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-10-21 14:37:02 +02:00
Iago Toral Quiroga 2d5edf2558 st/mesa: only require ESSL 3.1 for geometry shaders
According to the OES_geometry_shader spec, section Dependencies:

   "OpenGL ES 3.1 and OpenGL ES Shading Language 3.10
    are required."

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-10-21 09:09:15 +00:00