Commit Graph

108095 Commits

Author SHA1 Message Date
Samuel Pitoiset d4e0bef1bb radv: fix dumping SPIR-V into hang reports
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-30 13:02:08 +00:00
Tapani Pälli 4f8c86e6a5 mesa: enable ARB_gpu_shader_int64 in compat profile
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-30 14:37:27 +02:00
Tapani Pälli 2d8b8d3bd1 mesa: add [Program]Uniform*64ARB display list support
This is required for int64 to be enabled in compat profile.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-10-30 14:37:27 +02:00
Bas Nieuwenhuizen 396195e8f1 radv: Enable VK_KHR_timeline_semaphore.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-10-30 11:57:07 +01:00
Bas Nieuwenhuizen 4aa75bb3bd radv: Add wait-before-submit support for timelines.
This is actually a non-threaded implementation. I'd summarize this
as event-based submission.

When submit happens we walk a tree of submissions that depend on
the syncobj signal operations to be submitted and if those submission
we no other dependencies we start to execute them immediately.

Or, well I still use a list to avoid issues with long chains and
the stacksize when using recursion.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-10-30 11:57:07 +01:00
Bas Nieuwenhuizen 88d41367b8 radv: Add timelines with a VK_KHR_timeline_semaphore impl.
This does not fully do wait-before-submit, to be done in a follow
up patch.

For kernels without support for timeline syncobjs, this adds an
implementation of non-shareable timelines using legacy syncobjs.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-10-30 11:57:07 +01:00
Bas Nieuwenhuizen 2117c53b72 radv: Add temporary datastructure for submissions.
So we can defer them.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-10-30 11:57:07 +01:00
Bas Nieuwenhuizen c3eae659e7 radv: Split semaphore into two parts as enum+union.
This is in preparation to adding more types.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-10-30 11:57:07 +01:00
Bas Nieuwenhuizen 84d9551b23 radv: Always enable syncobj when supported for all fences/semaphores.
This simplifies code for timeline semaphores by needing to support
less configurations.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-10-30 11:57:07 +01:00
Bas Nieuwenhuizen 45f4a639a8 radv: Improve fence signalling in QueueSubmit.
Only signalling it once.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-10-30 11:57:07 +01:00
Bas Nieuwenhuizen a9c8424e08 radv: Do sparse binding in queue submission.
So we have one place to do queue things if we end up deferring
submissions.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-10-30 11:57:07 +01:00
Bas Nieuwenhuizen 915e9178fa radv: Split out commandbuffer submission.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-10-30 11:57:07 +01:00
Bas Nieuwenhuizen 43ba44357c radv: Clean up unused variable.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2019-10-30 11:57:07 +01:00
Bas Nieuwenhuizen 2e3a635ee6 radv: Add an early exit in the secure compile if we already have the cache entries.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-10-30 11:38:50 +01:00
Bas Nieuwenhuizen d78809632f radv: Compute hashes in secure process for secure compilation.
To prevent poisoning arbitrary cache entries.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-10-30 11:37:41 +01:00
Erik Faye-Lund 4c4ac2d4d5 zink: drop nop descriptor-updates
If there's nothing to be done, let's actually do nothing. Seems like a
good idea.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-10-30 10:29:23 +00:00
Erik Faye-Lund b222f28357 zink: use bitfield for dirty flagging
Bitfields are a bit more ideomatic than explicit flags, and harder to
get wrong.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-10-30 10:29:23 +00:00
Erik Faye-Lund 6d30abb4f1 zink: use dynamic state for line-width
This will lead to fewer pipelines in the cache, which is assumed to
become our most unavoidable performance bottle-neck down the line.

Reviewed-by: Dave Airlie <airlied@redhat.com>
2019-10-30 10:29:23 +00:00
Duncan Hopkins d2bb63c8d4 zink: Use optimal layout instead of general. Reduces valid layer warnings. Fixes RADV image noise.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
2019-10-30 09:09:49 +00:00
Timothy Arceri cf25664686 radv: make use of radv_sc_read()
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-30 04:49:58 +00:00
Timothy Arceri 28fff3efbc radv: add radv_sc_read() helper
This is a function with timeout support for reading from the pipe
between processes used for secure compile.

Initially we hardcode the timeout to 5 seconds. We can adjust the
timeout limit in future if needed.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-30 04:49:58 +00:00
Timothy Arceri 23a6827e4d radv: allow select() calls in secure compile
This will be used in the following patch to support timeouts for
reading the pipe between processes.

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-10-30 04:49:58 +00:00
Lepton Wu 1abf05764b mapi: Improve the x86 tsd stubs performance.
This skips touching %ebx most times and it shows that glGetString performance
increased from 114M/s to 120M/s on my desktop.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Lepton Wu <lepton@chromium.org>
2019-10-29 20:50:05 -07:00
Lepton Wu 41407d5e9f mapi: Inline call x86_current_tls.
This saves one return and a simple benchmark which calls glGetString
repeatedly on my desktop shows it improves calls per second from 123M
to 141M.

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/1997
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Lepton Wu <lepton@chromium.org>
2019-10-29 17:18:06 -07:00
Lepton Wu b2b8639d8e mapi: Clean up entry_patch_public for x86 tls
Remove hard coded 16 and use entry_generate_or_patch to patch
public stubs. The generated code actually is sightly tighter
than before since the "nop" instructions before the final "jmp"
get removed.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Lepton Wu <lepton@chromium.org>
2019-10-29 17:18:06 -07:00
Lepton Wu 1fb75bee90 mapi: split entry_generate_or_patch for x86 tls
The code works exactly the same with before. Just split this function
out so we can reuse it.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Lepton Wu <lepton@chromium.org>
2019-10-29 17:18:06 -07:00
Jonathan Gray 45206d7673 mapi: Adapted libglvnd x86 tsd changes
The x86 assembly language stub in src/mapi/entry_x86_tsd.h does not
generate PIC (position-independent code). This causes text relocations
which bring troubles on recent versions of FreeBSD, OpenBSD, Android.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108541
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Lepton Wu <lepton@chromium.org>
2019-10-29 17:13:14 -07:00
Caio Marcelo de Oliveira Filho 9c3c206e71 spirv: Don't fail if multiple ordering semantics bits are set
Vulkan requires that only one bit for the ordering is set, but old
versions of GLSLang just set all the bits.  This was fixed as part of
https://github.com/KhronosGroup/glslang/commit/c51287d744fb6e7e9ccc09f6f8451e6c64b1dad6
but we can still find older versions (or shaders compiled with it)
around.

So instead of failing, emit a warning and fallback to the effective
result of any combination of multiple bits: AcquireRelease.

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2018
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-10-29 14:53:46 -07:00
Sagar Ghuge f0db4c5204 intel/isl: Allow stencil buffer to support compression on Gen12+
v2: (Nanley Chery)
- Fix commit title
- Fix comment

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2019-10-29 14:46:15 -07:00
Sagar Ghuge b22b349443 iris: Resolve stencil resource prior to copy or used by CPU
v2: Decide aux usage in get_copy_region_aux_settings (Nanley Chery)

v3: Use isl_surf_usage_is_stencil function (Nanley Chery)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2019-10-29 14:46:15 -07:00
Sagar Ghuge 5d331251cf iris: Prepare resources before stencil blit operation
We have to resolve destination surfaces if we are bliting to and from
the same surface.

v2: Revert unrelated change (Nanley Chery)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2019-10-29 14:46:15 -07:00
Sagar Ghuge 4e0ed40ed7 iris: Prepare depth resource if clear_depth enable
Avoid preparing depth resource, if we did fast depth clear before.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2019-10-29 14:46:15 -07:00
Sagar Ghuge 81de49a9f2 iris: Prepare stencil resource before clear depth stencil
Let aux surface state tracker track the stencil buffer's aux state while
clearing depth stencil buffer.

v2: Fix condition check (Nanley Chery)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2019-10-29 14:46:15 -07:00
Sagar Ghuge b8223991b5 iris: Resolve stencil buffer lossless compression with WM_HZ_OP packet
Even though stencil buffer compression looks like regular lossless color
compression w/o fast clear support, we have to resolve stencil buffer
with WM_HZ_OP packet.

v2: Check if resource is stencil with helper function (Nanley Chery)

v3: Remove unnecessary included file (Nanley Chery)

v4: (Nanley Chery)
- Avoid stencil buffer aux state transition by improving condition check

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2019-10-29 14:46:15 -07:00
Sagar Ghuge 87c57b8dae intel/blorp: Set stencil resolve enable bit
When set, the stencil buffer is filled with the true stencil values and
we have to disable stencil buffer clear enable bit.

v2: 1) Refactor code little bit (Nanley Chery)
    2) Fix assertion (Nanley Chery)

v3: 1) Remove unncessary assignment (Nanley Chery)
    2) Fix GEN_GEN check (Nanley Chery)

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2019-10-29 14:46:15 -07:00
Sagar Ghuge c401186762 intel: Track stencil aux usage on Gen12+
Enable stencil compression enable and control surface enable bit if
stencil buffer lossless compression is enabled.

v2: Remove unnecessary GEN_GEN check (Nanley Chery)

v3: (Nanley Chery)
- Change commit subject tag from intel/isl to intel
- Keep assignment order correct

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2019-10-29 14:46:15 -07:00
Sagar Ghuge 53d472df24 intel/blorp: Add helper function for stencil buffer resolve
On Gen12+, Stencil buffer's lossless compression should be resolved
with WM_HZ_OP packet.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2019-10-29 14:46:15 -07:00
Sagar Ghuge ce208be2d8 intel/blorp: Assign correct view while clearing depth stencil
We never saw any failures regarding this typo but it's good to assign
correct stencil view while constructing blorp_params.

Fixes: 0cabf93b80 "intel/blorp: Add an entrypoint for clearing depth and stencil"

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2019-10-29 14:46:15 -07:00
Sagar Ghuge 4287e0a4e4 genxml/gen12: Add Stencil Buffer Resolve Enable bit
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2019-10-29 14:46:15 -07:00
Nanley Chery 0a2a9a4a5b iris: Allocate main and aux surfaces together
On Gen12, the CCS buffer address doesn't have to be referenced in state
packets. In the case of a stencil buffer with CCS, the kernel won't know
the location of the CCS unless an extra call is made to pin its address.
To avoid this extra call, make the CCS part of the main surface.

v2. Update comment above bo_size. (Jordan)

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2019-10-29 14:46:15 -07:00
Nanley Chery ff5bc81b51 iris: Determine aux offsets within configure_aux
If a resource has a modifier, the main and aux surfaces will share a BO.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2019-10-29 14:46:15 -07:00
Nanley Chery f0ed86c6c6 iris: Bail resource creation upon aux creation error
The functions used during aux buffer configuration and creation only
return false for exceptional errors. Don't proceed with surface creation
in those cases.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2019-10-29 14:46:15 -07:00
Nanley Chery 8b62e3d978 iris: Drop iris_resource::aux::extra_aux::bo
The primary and secondary aux buffers are always allocated in the same
BO.

Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2019-10-29 14:46:15 -07:00
Duncan Hopkins bb8e6994cc zink: pass line width from rast_state to gfx_pipeline_state.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
2019-10-29 20:38:26 +00:00
Jason Ekstrand 52aa7f3e05 anv: Reduce the minimum number of relocations
The original value of 256 was under the assumption that you're a batch
buffer which is likely going to have a large number of relocations.
However, pipeline objects on Gen7 will have at most 6 relocations (one
per shader stage and one for the workaround BO) so this is a lot of
per-pipeline wasted space.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-10-29 20:27:52 +00:00
Jason Ekstrand a3153162a9 anv: Delay allocation of relocation lists
The old relocation list code always allocated 256 relocations and a hash
set up-front without knowing whether or not we really need them.  In
particular, in the softpin case, this is two fairly large allocations
that we don't need to be making.  Also, for pipeline objects on haswell
where we don't have softpin, we don't need relocations unless scratch is
used so this is extra data per-pipeline.  Instead, we should do it
on-demand.  This shaves 3.5% off of a cpu-limited example running with
the Dawn WebGPU implementation.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-10-29 20:27:52 +00:00
Plamena Manolova 4fe2317601 anv: Implement new way for setting streamout buffers.
For gen12 we set the streamout buffers using 4 separate
commands instead of 3DSTATE_SO_BUFFER.

Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-10-29 19:21:20 +00:00
Plamena Manolova 0f610e17bc iris: Implement new way for setting streamout buffers.
For gen12 we set the streamout buffers using 4 separate
commands instead of 3DSTATE_SO_BUFFER.

Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-10-29 19:20:25 +00:00
Plamena Manolova 665b81e29a genxml: Add 3DSTATE_SO_BUFFER_INDEX_* instructions
For gen12 we set the streamout buffers using 4 separate
commands instead of 3DSTATE_SO_BUFFER.

Signed-off-by: Plamena Manolova <plamena.manolova@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-10-29 19:19:58 +00:00
Rob Clark ff6e148a3d freedreno/a6xx: add a618 support
Signed-off-by: Rob Clark <robdclark@chromium.org>
2019-10-29 09:19:34 -07:00