Commit Graph

109245 Commits

Author SHA1 Message Date
Karol Herbst 11f736a6f9 nir/tests: add serializer tests
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-12-11 13:00:44 +01:00
Karol Herbst 676232d76f nir/serialize: fix vec8 and vec16
Nir serializes uses nir_ssa_alu_instr_src_components in a few places to
determine how many components a src has, but that's not what this function
returns. It simply returns how many channels are used, which is still fine
for most of the code.

This was breaking code like this:

vec16 32 ssa_1 = intrinsic load_global
vec1  32 ssa_2 = fmax ssa_1.a, ssa_2.b

v2: make the 16bit encoding work for identify swizzles again

Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-12-11 13:00:44 +01:00
Bas Nieuwenhuizen 2e44bfc14f radv: Fix RGBX Android<->Vulkan format correspondence.
This is correct per the Vulkan spec format equivalence table.

Fixes: f36b52740a "radv/android: Add android hardware buffer queries."
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-12-11 11:40:13 +01:00
Tomeu Vizoso 63ae9e61c1 panfrost: Add PAN_MESA_DEBUG=sync
Sometimes it's useful to get information about GPU faults in the
console, so it's synchronized with other messages.

This commit will cause Mesa to wait for completion and check if there
are any faults raised by the GPU.

Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-12-11 08:01:20 +01:00
Kenneth Graunke 2e654db27a iris: Create smaller program keys without legacy features
A lot of the brw_*_prog_key fields are for emulating features on legacy
hardware that iris doesn't support.  In particular, all of the texture
swizzle fields take up a lot of space.  These dead fields make hashing
the shader keys more expensive than it ought to be.

We introduce iris-specific keys with only the information we need, and
translate them to brw keys when actually compiling new variants.  This
way, key comparisons can use the small keys.  The size reductions are:

   VS:  328 bytes ->  8 bytes
   TCS: 312 bytes -> 24 bytes
   TES: 304 bytes -> 24 bytes
   GS:  284 bytes ->  8 bytes
   FS:  304 bytes -> 16 bytes
   CS:  280 bytes ->  4 bytes

Scores for the Piglit drawoverhead microbenchmark case with a shader
program change improve by roughly 30%.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-12-10 22:25:41 -08:00
Pierre Moreau 8ccd3f48a0 compiler/spirv: Fix uses of gnu struct = {} extension
Fixes: a24d6fbae6 ("meson: Add -Werror=gnu-empty-initializer to MSVC compat args")
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Vinson Lee <vlee@freedesktop.org>
Signed-off-by: Pierre Moreau <dev@pmoreau.org>
2019-12-11 06:03:22 +00:00
Vinson Lee 9661fc9cdb util/u_thread: Restrict u_thread_get_time_nano on macOS.
macOS does not have pthread_getcpuclockid.

src/util/u_thread.h:156:4: error: implicit declaration of function 'pthread_getcpuclockid' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
   pthread_getcpuclockid(thread, &cid);
   ^

Fixes: 4913215d14 ("util/u_thread: don't restrict u_thread_get_time_nano() to __linux__")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2171
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Acked-by: Eric Engestrom <eric@engestrom.ch>
2019-12-10 21:35:47 -08:00
Eric Anholt 8bf590b46b tu: Move UBWC layout into fdl6_layout() and use that function.
This gets us shared non-UBWC layout code between gallium and turnip.
Until I fix up the rest of gallium to handle UBWC mipmapping, we do the
single-level UBWC setup in gallium as a fixup after layout.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-12-11 04:24:18 +00:00
Eric Anholt de619d7503 freedreno: Switch the 16-bit workaround to match what turnip does.
Prevents regressions on argb1555 and rgb565 when making turnip use
freedreno's layout.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-12-11 04:24:18 +00:00
Eric Anholt d9cf3e76bd freedreno: Move a6xx's setup_slices() to a shareable helper function.
We pass in all the parameters for setting up the layout, though freedreno
still sets a few of them up early (since it uses layout helpers in making
some decisions about the layout setup parameters that will be cleaned up
once krh's blitter work lands).
2019-12-11 04:24:18 +00:00
Eric Anholt 67258a44d2 tu: Move our image layout into a freedreno_layout struct.
This lets us start using some of the fdl_* helpers and have more obviously
matching code between gallium and turnip.  We can't yet use the fdl_* UBWC
helpers, since the gallium driver doesn't do UBWC mipmaps (which I'm
working on in another branch).

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-12-11 04:24:18 +00:00
Eric Anholt ea7631a9a6 freedreno: Move UBWC layout into a slices array like the non-UBWC slices.
This is a little refactor in preparation for UBWC mipmapping support.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-12-11 04:24:18 +00:00
Eric Anholt bbe84c6c31 freedreno: Refactor the UBWC flags registers emission.
It's the same logic for each of these being emitted, and I was about to
change the rsc->layout.* for UBWC.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-12-11 04:24:18 +00:00
Eric Anholt 97be9503bb freedreno: Drop the extra offset field for mipmap slices.
We can just bake the UBWC-goes-first delta into the slices at setup time.
I did have to fix up the resource shadowing swap path to swap the slice
fields, as it was missing and regressed the format reinterpets otherwise.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-12-11 04:24:18 +00:00
Kenneth Graunke 69d7782b15 intel/decoder: Make get_state_size take a full 64-bit address and a base
i965 wants to use an offset from a base because everything is in a
single buffer whose address may be relocated, and all base addresses
are set to the start of that buffer.

iris wants to use a full 64-bit address, because state lives in separate
buffers which may be in the shader, surface, and dynamic memory zones,
where addresses grow downward from the top of a 4GB zone,  So it's very
possible for a 32-bit offset to exist relative to multiple bases,
leading to the wrong state size.
2019-12-10 19:10:49 -08:00
Dongwon Kim 8a8534a698 iris: INTEL performance query implementation
low-level implementation of INTEL-performance-query APIs in
Intel iris driver. Most of functions and procedures defined here
are adopted from i965 driver (brw_performance_query.c)

v2: - replace genX_init_performance_query with
      iris_init_perfquery_functions which is gen's version agnositic
    - general code clean-up

v3: include gen_perf_gens.h as some of defines were moved to this new
    header file

v4: - checking for kernel 4.13+ won't be needed here as Iris won't be
      loaded anyway without DRM_SYNCOBJ that is enabled after Kernel
      4.13.

    - checking whether gen < 8 or is_cherryview won't be required as
      well because those cases are screened in iris_screen_create.

v5: remove genX(init_performance_query)

v6: - remove oa_metrics_kernel_support as iris works only with kernel
    4.18 and newer.

    - use perf functions defined in separate file, iris_perf.h/c

Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-12-10 17:02:58 -08:00
Mark Janes ca2dd99bf6 iris: separating out common perf code
The configuration of the gen_perf vtable will be the same for
INTEL_performance_query and AMD_performance_monitor.
Initialize the table in a single routine that can be called from both
implementations.

Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-12-10 17:02:58 -08:00
Dongwon Kim 106054ef79 gallium: enable INTEL_PERFORMANCE_QUERY
new state tracker APIs added for INTEL_performance_query
This extension is enabled if all vendor specific functions for it
exist.

v2: add st_cb_perfquery.* to the list of sources in Makefile
v3: minor code clean-up
v4: - add driver hooks for intel-performance-query apis
    - add PIPE level performance counter and type enums that
      match to OpenGL enums
    - do conversion of pipe_perf_counter_type and
      pipe_perf_counter_data_type enums to GL defines in state_tracker

Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-12-10 17:02:58 -08:00
Dylan Baker d0eebda990 meson/broadcom: libbroadcom_cle also needs zlib
Fixes: 1ae8018a6a
       ("meson: Add support for the vc4 driver.")
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-12-11 00:49:44 +00:00
Kenneth Graunke 0f2f561a10 anv: Enable Gen11 Color/Z write merging optimization
TCCNTLREG contains additional L3 cache write merging optimizations.

The default value on my system appears to be:
- URB Partial Write Merging (bit 0)
- L3 Data Partial Write Merging (bit 2)
- TC Disable (bit 3)

Windows drivers appear to set bit 1 as well to enable "Color/Z Partial
Write Merging".  This should solve an issue we were seeing where MRT
benchmarks were using substantially more bandwidth than they ought.
However, we have not observed it to cause measurable FPS gains.

It is unclear whether we should be setting bit 0 or bit 3, so for now
we leave those at the hardware default value.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-12-10 16:19:46 -08:00
Kenneth Graunke 5cc7636993 iris: Enable Gen11 Color/Z write merging optimization
TCCNTLREG contains additional L3 cache write merging optimizations.

The default value on my system appears to be:
- URB Partial Write Merging (bit 0)
- L3 Data Partial Write Merging (bit 2)
- TC Disable (bit 3)

Windows drivers appear to set bit 1 as well to enable "Color/Z Partial
Write Merging".  This should solve an issue we were seeing where MRT
benchmarks were using substantially more bandwidth than they ought.
However, we have not observed it to cause measurable FPS gains.

It is unclear whether we should be setting bit 0 or bit 3, so for now
we leave those at the hardware default value.

Improves performance in Manhattan 3.0 by 6% on ICL 8x8 at a fixed
frequency, according to Felix Degrood.  I didn't see any improvements
at out-of-the-box power management settings, however.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-12-10 16:19:43 -08:00
Kenneth Graunke 0b74f85870 intel/genxml: Add a partial TCCNTLREG definition
TCCNTLREG contains additional cache programming settings.  In
particular, there are several write combining controls we'd like to use.

Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-12-10 16:19:33 -08:00
Kenneth Graunke 74665eaf3a util: Detect use-after-destroy in simple_mtx
This makes simple_mtx_destroy set the counter to an invalid canary
value and then makes lock/unlock assert that the value is legal.

That way, calling lock/unlock after destroy will assert fail,
rather than deadlocking or potentially even working.

This has caught real deadlocks in dEQP multithreaded tests (in st/mesa
shader variant zombie list handling), which have since been fixed.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
2019-12-10 23:48:40 +00:00
Rob Clark fc97643c57 freedreno/a6xx: enable LRZ by default
Now that dEQP should be happy, lets flip the switch.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-12-10 22:55:21 +00:00
Rob Clark 1b4c12d3ee freedreno/a6xx: fix LRZ logic
In particular, we need to invalidate the LRZ state when we cannot be
confident in what the Z state would be during rendering:

1) depth test modes not supported by LRZ
2) stencil test, which would require full rasterization and stencil
   test in the binning pass (whereas LRZ normally just needs to
   determine the min and max z value in an 8x8 quad)

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-12-10 22:55:21 +00:00
Rob Clark 3c479849c5 freedreno/a6xx: fix LRZ layout
Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-12-10 22:55:21 +00:00
Rob Clark 6cf101402d freedreno/a5xx+a6xx: split LRZ layout to per-gen
Seems to be a bit different for a6xx, so let's split this out.

Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-12-10 22:55:21 +00:00
Rob Clark 3b074a2e53 freedreno/a6xx: disable LRZ when blending
Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-12-10 22:55:21 +00:00
Marek Olšák a305543c8d radeonsi: don't rely on CLEAR_STATE to set PA_SC_GENERIC_SCISSOR_*
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-12-10 16:32:37 -05:00
Marek Olšák aced18aa61 radeonsi/gfx10: simplify the tess_turns_off_ngg condition
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-12-10 16:32:36 -05:00
Marek Olšák 42f921387b radeonsi/gfx10: disable vertex grouping
based on PAL.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-12-10 16:32:34 -05:00
Marek Olšák 75ce078a0a radeonsi: enable NIR by default and document GL 4.6 support
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
2019-12-10 15:48:58 -05:00
Marek Olšák 42b28e7ac3 st/dri: assume external consumers of back buffers can write to the buffers
This was reverted needlessly because if was part of another series.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-By: Tapani Pälli <tapani.palli@intel.com>
2019-12-10 15:37:37 -05:00
Jason Ekstrand 41691ac016 ANV: Stop advertising smoothLines support on gen10+
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
2019-12-10 20:13:56 +00:00
Dylan Baker 85a9698ac3 meson/broadcom: libbroadcom_cle needs expat headers
Fixes: 1ae8018a6a
       ("meson: Add support for the vc4 driver.")
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-12-10 10:48:38 -08:00
Lionel Landwerlin 5fdea9f401 anv: fix incorrect VMA alignment for CCS main surfaces
Maybe finer way of dealing with this requirement would be to increase
the number of pdevice->memory.types[] to add a category for special
alignment cases.

Meanwhile this fixes the problem of CCS surface alignment and it's
probably not going to cause issues given the size of our address
space.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 6af8a4acc4 ("anv: Add aux-map translation for gen12+")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-12-10 16:06:54 +00:00
Lionel Landwerlin dcfe1903c3 anv: fix missing gen12 handling
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 181be14d43 ("anv: Build for gen12")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-12-10 16:06:54 +00:00
Eric Engestrom d90e656fa7 meson: drop intel_ prefix on imgui_core
Again, no real effect, just the name of a temporary build file.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-12-10 15:16:02 +00:00
Eric Engestrom 2b0e3e9fd1 meson: drop duplicate lib prefix on libiris_gen*
This has no real effect other than the names of the temporary files in
the build folder.

Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
2019-12-10 15:16:02 +00:00
Samuel Pitoiset e4c8491bdf radv: implement VK_KHR_separate_depth_stencil_layouts
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-12-10 13:16:17 +01:00
Samuel Pitoiset 48ee62178f radv: initialize HTILE for separate depth/stencil aspects
It either clears the whole HTILE buffer or part of it depending
on the HTILE mask parameter.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-12-10 13:09:29 +01:00
Samuel Pitoiset 41cebfc9c1 radv: do not init HTILE as compressed state when dst layout allows it
I don't think this makes much differences and a potential clear
following the initialization will overwrite HTILE anyways.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-12-10 13:09:26 +01:00
Samuel Pitoiset b603cc8c84 radv: synchronize after performing a separate depth/stencil fast clears
For depth+stencil images, the driver might use an optimized path
if only one aspect is cleared. It either clears the depth or the
stencil part of HTILE. Because the two separate aspects might use
the same HTILE memory we have to synchronize.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-12-10 13:09:22 +01:00
Krzysztof Raszkowski cfe00a52f0 gallivm: add TGSI bit arithmetic opcodes support
Add TGSI_OPCODE_BFI, TGSI_OPCODE_POPC, TGSI_OPCODE_LSB,
TGSI_OPCODE_IMSB, TGSI_OPCODE_UMSB, TGSI_OPCODE_IBFE,
TGSI_OPCODE_UBFE, TGSI_OPCODE_BREV support.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Jan Zielinski <jan.zielinski@intel.com>
2019-12-10 10:34:18 +00:00
Samuel Pitoiset 008fe909ca radv: fix possibly wrong PA_SC_AA_CONFIG value for conservative rast
PA_SC_AA_CONFIG might be updated when conversative rasterization is
enabled. Because the driver only re-emits the multisample state if
the number of samples is different, that register value might not
be updated correctly.

Found by inspection, doesn't fix anything known.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-12-10 11:04:43 +01:00
Samuel Pitoiset 4f659224c8 radv: move emission of two PA_SC_* registers to the pipeline CS
They don't have to be updated dynamically.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-12-10 11:04:40 +01:00
Pierre-Eric Pelloux-Prayer 87f7ec8a2c st/dri: use st->flush callback to flush the backbuffer
Previously the flush was done before the call to st->flush but
could lead to problems as FLUSH_VERTICES could push some work
that would change the backbuffer (or modify it).

With this commit, all the backbuffer flushing code is executed
right before the call to st_flush.

Closes: https://gitlab.freedesktop.org/drm/amd/issues/842
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=205049

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-12-10 09:25:28 +01:00
Pierre-Eric Pelloux-Prayer cc0d0afe3b st/mesa: add a notify_before_flush callback param to flush
The new callback is called right before the flush is done to allow
users of st->flush to do some work after all the previous work has
been flushed.

This will be used by dri_flush in the next commit.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-12-10 09:25:28 +01:00
Pierre-Eric Pelloux-Prayer f5c1cb2383 radeonsi: dcc dirty flag
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-12-10 09:25:28 +01:00
Pierre-Eric Pelloux-Prayer e3e91cebcd radeonsi: fix multi plane buffers creation
When using 3 planes, the sequence produces this chain:
  plane0 -> plane2
This commit fixes this to produce:
  plane0 -> plane1 -> plane2

Fixes: 86e60bc265 ("radeonsi: remove si_vid_join_surfaces and use combined planar allocations")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2193
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-12-10 08:52:16 +01:00