Commit Graph

117072 Commits

Author SHA1 Message Date
Alexandros Frantzis f8f222ea36 virgl: Work around possible memory exhaustion
Since we don't normally flush before performing copy transfers, it's
possible in some scenarios to use too much memory for staging resources
and start failing. This can happen either because we exhaust the total
available memory (including system memory virtio-gpu swaps out to), or,
more commonly, because the total size of resources in a command buffer
doesn't fit in virtio-gpu video memory.

To reduce the chances of this happening, force a flush before a copy
transfer if the total size of queued staging resources exceeds a certain
limit. Since after a flush any queued staging resources will be
eventually released, this ensures both that each command buffer doesn't
require too much video memory, and that we don't end up consuming too
much memory for staging resources in total.

Fixes kernel errors reported when running texture_upload tests in glbench.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-07 21:45:45 -07:00
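
The flush heuristic described in f8f222ea36 can be sketched roughly as follows. This is a minimal illustration, not the driver code: `sketch_ctx`, `sketch_flush`, `sketch_copy_transfer` and the `STAGING_FLUSH_LIMIT` value are all hypothetical names and numbers, since the commit message does not state the real limit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limit; the commit message does not give the real value. */
#define STAGING_FLUSH_LIMIT (128u * 1024u * 1024u)

/* Hypothetical stand-in for the driver context. */
struct sketch_ctx {
   uint64_t queued_staging_size; /* bytes of staging data queued since the last flush */
   unsigned flush_count;         /* number of forced flushes, for illustration only */
};

static void sketch_flush(struct sketch_ctx *ctx)
{
   /* After a flush the queued staging resources are eventually released,
    * so the running total starts again from zero. */
   ctx->queued_staging_size = 0;
   ctx->flush_count++;
}

/* Queue a copy transfer whose staging resource occupies `size` bytes. */
static void sketch_copy_transfer(struct sketch_ctx *ctx, uint64_t size)
{
   assert(ctx != NULL);

   /* Force a flush first if too much staging memory is already queued. */
   if (ctx->queued_staging_size > STAGING_FLUSH_LIMIT)
      sketch_flush(ctx);

   ctx->queued_staging_size += size;
}
```

The check runs before the new transfer is queued, so a single command buffer can exceed the limit by at most one transfer's worth of staging data.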
Alexandros Frantzis e34f79c918 virgl: Remove incorrect resource wait condition
Now that we have copy transfers in place, we can remove the incorrect
resource wait condition. Copy transfers and other optimizations minimize
the performance impact of this removal, while providing the correct
behavior.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-07 21:45:43 -07:00
Alexandros Frantzis 236c55f650 virgl: Use copy transfers for textures
Extend copy transfers to also be used for busy textures.

Performance results:
Unigine Valley, qemu before: 22.7 FPS after: 23.1 FPS

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-07 21:45:42 -07:00
Alexandros Frantzis a22c5df079 virgl: Use buffer copy transfers to avoid waiting when mapping
We typically need to wait for a buffer to become ready before mapping,
so that we don't write new contents while the host is still using the
old contents. However, if we are allowed to discard the contents of the
mapped buffer range, then we can avoid waiting by using a staging buffer
range which we guarantee to never be busy, copying from the staging
buffer range to the target buffer in the host.

This commit implements this optimization by utilizing a dedicated
u_upload_mgr for the staging buffer.

Performance results:
Twilight Struggle (Steam/Proton), qemu before: 7 FPS after: 25 FPS
glmark2 ubo, qemu before: 38 FPS after: 331 FPS

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Suggested-by: Gurchetan Singh <gurchetansingh@chromium.org>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-07 21:45:39 -07:00
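
The mapping decision described in a22c5df079 might look something like the sketch below. The flag and enum names are hypothetical stand-ins for the Gallium usage bits, and the real driver checks more conditions than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags standing in for the Gallium transfer usage bits. */
enum sketch_usage {
   SKETCH_MAP_WRITE         = 1 << 0,
   SKETCH_MAP_DISCARD_RANGE = 1 << 1,
};

enum sketch_map_path {
   SKETCH_MAP_DIRECT,  /* resource is idle: map it directly */
   SKETCH_MAP_WAIT,    /* resource is busy and its contents must be preserved */
   SKETCH_MAP_STAGING, /* write into an idle staging range, copy on the host */
};

static enum sketch_map_path
sketch_choose_map_path(unsigned usage, bool resource_busy)
{
   if (!resource_busy)
      return SKETCH_MAP_DIRECT;

   /* The old contents of the range may be discarded, so there is no need
    * to wait for the host to finish with them. */
   if ((usage & SKETCH_MAP_WRITE) && (usage & SKETCH_MAP_DISCARD_RANGE))
      return SKETCH_MAP_STAGING;

   return SKETCH_MAP_WAIT;
}
```

Only a busy resource mapped for a discarding write takes the staging path; every other busy map still has to wait.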
Alexandros Frantzis 6e7726e50c virgl: Support copy transfers
Support transfers that use a different resource as the source of data to
transfer. This will be used in upcoming commits to send data to host
buffers through a transfer upload buffer, in order to avoid waiting
when the buffer resource is busy.

Note that we don't support queueing copy transfers in the transfer
queue. Copy transfers should be emitted directly in the command queue,
which allows us to avoid flushes before them and leads to better
performance.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-07 21:45:36 -07:00
Alexandros Frantzis 199d95f29e virgl: Add copy_transfer3d definitions
Introduce definitions for the copy_transfer3d protocol command and virgl
capability. This command transfers data to the host by copying through
another resource, and will be used in upcoming commits to avoid waiting
when transferring data for busy resources.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-07 21:45:34 -07:00
Alexandros Frantzis ccec1555c1 virgl: Make VIRGL_BIND_STAGING resources cacheable
This could help performance when trying to recreate such resources for
copy transfers.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-07 21:45:33 -07:00
Alexandros Frantzis 636345f496 virgl: Support VIRGL_BIND_STAGING
Support a new virgl bind type for staging buffers which don't require
dedicated host-side storage. These will be used to implement copy
transfers.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-07 21:45:31 -07:00
Alexandros Frantzis f38cdaebac virgl: Avoid unfinished transfer_get with PIPE_TRANSFER_DONTBLOCK
If we are not allowed to block, and we know that we will have to wait,
either because the resource is busy, or because it will become busy due
to a readback, return early to avoid performing an incomplete
transfer_get. Such an incomplete transfer_get may finish at any time,
during which another unsynchronized map could write to the resource
contents, leaving the contents in an undefined state.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Suggested-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-07 21:45:22 -07:00
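
The early return described in f38cdaebac could be sketched as below. `sketch_res`, `sketch_map` and `SKETCH_TRANSFER_DONTBLOCK` are hypothetical simplifications; the real map path tracks busy state and readbacks through the winsys.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for PIPE_TRANSFER_DONTBLOCK. */
#define SKETCH_TRANSFER_DONTBLOCK (1u << 0)

/* Hypothetical, heavily simplified resource. */
struct sketch_res {
   bool busy;           /* the host is still using the resource */
   bool needs_readback; /* mapping requires a transfer_get first */
   int storage;         /* stands in for the mapped memory */
};

static void *sketch_map(struct sketch_res *res, unsigned usage)
{
   assert(res != NULL);

   bool will_wait = res->busy || res->needs_readback;

   /* Bail out before issuing any transfer_get, so that no incomplete
    * readback is left in flight when we are not allowed to block. */
   if (will_wait && (usage & SKETCH_TRANSFER_DONTBLOCK))
      return NULL;

   /* A real driver would issue the readback and wait here. */
   res->busy = false;
   res->needs_readback = false;
   return &res->storage;
}
```

Checking both conditions up front matters because a resource that is idle now still becomes busy once a readback is issued.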
Alexandros Frantzis 8eb8222c10 virgl: Deduplicate checks for resource caching
Also fixes a missed check for VIRGL_BIND_CUSTOM in one of the duplicate
code snippets.

Note that legacy fences also use VIRGL_BIND_CUSTOM, but we ensured they
don't go through the cache in the previous commit.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-07 21:45:20 -07:00
Alexandros Frantzis e0ffcdf16a virgl: Don't try to use cached resources for legacy fences
Resources for fences should not be from the cache, since we are basing
the fence status on the resource creation busy status.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-07 21:45:16 -07:00
Alexandros Frantzis 8089d3658a virgl: More info about chosen alignment value
Add more info about why the value of VIRGL_MAP_BUFFER_ALIGNMENT was chosen.

Signed-off-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
2019-06-07 21:44:53 -07:00
Chia-I Wu 371743157e virgl: store all info about atomic buffers
We will need the full info.  This also speeds up
virgl_attach_res_atomic_buffers and fixes resource leaks when the
context is destroyed.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-06-07 22:47:07 +00:00
Chia-I Wu 98fd742d7e virgl: add shader images to virgl_shader_binding_state
It replaces virgl_context::images.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-06-07 22:47:07 +00:00
Chia-I Wu f965efb3c8 virgl: add SSBOs to virgl_shader_binding_state
It replaces virgl_context::ssbos.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-06-07 22:47:07 +00:00
Chia-I Wu 920c4143f0 virgl: add UBOs to virgl_shader_binding_state
It replaces virgl_context::ubos.

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-06-07 22:47:07 +00:00
Chia-I Wu 2e21d66d7a virgl: add virgl_shader_binding_state
virgl_shader_binding_state will be used to manage all per-stage
shader bindings.  For now, it manages only sampler views.

This replaces virgl_textures_info and fixes some issues:

 - start_slot is now honored
 - views outside of [start_slot, start_slot+count) are unmodified
 - views are released when the context is destroyed

Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Alexandros Frantzis <alexandros.frantzis@collabora.com>
2019-06-07 22:47:07 +00:00
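
The slot handling listed in 2e21d66d7a can be illustrated with a small sketch. The struct and function below are hypothetical and omit reference counting; they only show what honoring `start_slot` and leaving other slots untouched means.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_VIEWS 8 /* hypothetical per-stage slot count */

/* Hypothetical stand-in for the per-stage binding state. */
struct sketch_binding_state {
   const void *views[SKETCH_MAX_VIEWS];
};

/* Bind `count` views starting at `start_slot`; a NULL array unbinds them. */
static void sketch_set_views(struct sketch_binding_state *state,
                             unsigned start_slot, unsigned count,
                             const void *const *views)
{
   assert(state != NULL);
   assert(start_slot + count <= SKETCH_MAX_VIEWS);

   /* Only slots in [start_slot, start_slot + count) are written. */
   for (unsigned i = 0; i < count; i++)
      state->views[start_slot + i] = views ? views[i] : NULL;
}
```

A call with `start_slot = 2` and `count = 2` therefore changes slots 2 and 3 only.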
Kenneth Graunke 30314270d4 iris: Zero shs->cbuf0 when binding a passthrough TCS
Fixes valgrind errors when running two CTS tests back to back:
- KHR-GL45.shader_image_load_store.basic-allTargets-loadStoreT*
(The first test has an actual TCS, the second uses passthrough.)
2019-06-07 15:13:42 -07:00
Jason Ekstrand 1e6b32d08c intel/blorp: Only double the fast-clear rect alignment on HSW
This restriction was accidentally added to the BSpec/PRM as an
unrestricted restriction starting with the HSW docs and it was never
removed.  However, it only ever applied to HSW and actually potentially
causes problems on BDW and above where we have mipmapped fast-clears.

Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
2019-06-07 22:00:55 +00:00
Rob Clark 3c456cf583 freedreno/a6xx: re-arrange program stageobj/group
Split out a separate program config state group to run early before the
other groups.

This seems to help w/ intermittent "missed tiles" (although I had
assumed that was a mem2gmem issue), or at least I can't reproduce that
issue with this patch, but can without.

It has the benefit of HLSQ_VS_CNTL.CONSTLEN matching for VS and BS.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-07 12:07:29 -07:00
Rob Clark 958f6ffb60 freedreno/a6xx: fix hangs with newer sqe fw
With the newer (v1.76) fw, we were getting hangs (compared to older
v1.66 fw).  Re-work the GMEM code to structure things a bit closer to
the blob.  This moves some PKT7 packets from IB2 to IB1, which I think
is what was confusing SQE and causing it to get stuck in an infinite
loop.  But in general structuring things at least closer to the same way
blob does makes it easier to compare cmdstream.

Note: this is a bit on the large side for what I'd normally consider for
stable, but right now it is looking like it is the newer fw that is
headed for linux-firmware.  This should definitely have some soak time on
master, but probably a good idea for this patch to end up in distro mesa
builds by the time a630_sqe.fw hits linux-firmware.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-07 12:07:29 -07:00
Rob Clark 1d002cfade freedreno/a6xx: WFI before RB_CCU_CNTL writes
This seems to be in a block of non buffered/context regs.  Blob always
WFIs before write, so probably a good idea.

Annoyingly, compared to earlier gens, it is a bit harder to tell from the
register offset whether it is a buffered reg; it isn't as simple as
everything below 0x2000, it seems.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-07 12:07:29 -07:00
Rob Clark 8a02ca807d freedreno/a6xx: don't pre-dispatch texture fetch on accident
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-07 12:07:29 -07:00
Rob Clark b820c09fa8 freedreno/a6xx: fix issues with gallium HUD
In some cases the draw for the text wasn't working.  This seems to be
fixed by resyncing some of the "golden registers" from blob (initial
values were based on somewhat older blob version).

Perhaps good to have a bit of soak time on master, but would be good
to eventually land in 19.x stable branches.

Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
2019-06-07 12:07:29 -07:00
Nanley Chery b4198e792c anv/cmd_buffer: Initalize the clear color struct for CNL+
On CNL+, the clear color struct is composed of RGBA channel values and
fields which are either reserved by the HW or used to control
fast-clears. Currently anv initializes the channel values to zero and
allows the other fields to be undefined.

Satisfy the MBZ field requirements by removing an optimization that
doesn't hold true for CNL+ and pulling in the number of dwords to
initialize from ISL.

Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
2019-06-07 18:43:06 +00:00
Jon Turney 87173ded6e glx/windows: Fix compilation with -Werror-format
Fix compilation where the DWORD type is used with a format, after
-Werror-format added by c9c1e261.

Some Win32 API types are different fundamental types in the 32-bit and
64-bit versions. This problem is then further compounded by the fact
that whilst both 32-bit Cygwin and 32-bit MinGW use the ILP32 data
model, 64-bit MinGW uses the LLP64 data model, but 64-bit Cygwin uses
the LP64 data model. This makes it near impossible to write printf
format specifiers which are correct for all those targets.

In the Win32 API, DWORD is an unsigned, 32-bit type.  So, it is defined
in terms of an unsigned long, except in the LP64 data model used by
64-bit Cygwin, where it is an unsigned int.

It should always be safe to cast it to unsigned int and use %u or %x.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-07 11:28:48 -07:00
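
The cast recommended in 87173ded6e can be shown with a portable sketch. `sketch_dword` is a hypothetical stand-in for the Win32 `DWORD` so that the example builds without the Windows headers, and it assumes `unsigned int` is at least 32 bits wide, as it is on all the targets named in the commit.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the Win32 DWORD (an unsigned 32-bit type). */
typedef uint32_t sketch_dword;

/* Format an error code without depending on whether DWORD is defined as
 * unsigned long or unsigned int on the current target. */
static int sketch_format_error(char *buf, size_t len, sketch_dword code)
{
   assert(buf != NULL);
   return snprintf(buf, len, "error %u (0x%08x)",
                   (unsigned int)code, (unsigned int)code);
}
```

With the explicit cast, the same `%u` and `%x` specifiers satisfy format checking under the ILP32, LLP64 and LP64 data models.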
Kenneth Graunke cd796120c9 iris: Rename bind_state to bind_shader_state.
bind_state is possibly the worst name ever.  For create, we used
create_shader_state, which is more descriptive.  Put shader in the name.
2019-06-07 11:26:20 -07:00
Kenneth Graunke d5d2fb5c4c isl: Mark enum isl_channel_select packed so it becomes 1 byte.
I recently discovered that the following code led to valgrind errors:

   struct isl_swizzle swizzle = ISL_SWIZZLE_IDENTITY;
   VALGRIND_CHECK_MEM_IS_DEFINED(&swizzle, sizeof(swizzle));

which is surprising, because struct isl_swizzle is simply:

   struct isl_swizzle {
      enum isl_channel_select r:4;
      enum isl_channel_select g:4;
      enum isl_channel_select b:4;
      enum isl_channel_select a:4;
   };

and the above code initializes all of them with a C99 initializer.
Iván Briano reminded me that C99 initializers don't necessarily zero
padding.  A quick inspection revealed that sizeof(struct isl_swizzle)
was 4 (rather than the expected 2).  Ian Romanick suggested changing
it to uint16_t, since this is essentially dicing up an unsigned, and
that worked.

This patch marks enum isl_channel_select packed, changing its size
from 4 bytes to 1 byte.  This then makes struct isl_swizzle 2 bytes,
with no bogus padding fields.  This eliminates valgrind undefined
memory warnings.

These isl_swizzle values become part of our BLORP blit program keys,
which are then hashed.  This undefined padding was being included in
the hashing, possibly leading to issues.  I originally saw this error
when running KHR-GL45.texture_size_promotion.functional in iris under
valgrind.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-07 11:09:44 -07:00
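
The size change described in d5d2fb5c4c can be reproduced with a small sketch. It assumes a GCC or Clang compatible compiler on a typical Linux ABI; the type names and enumerator values are hypothetical stand-ins for the real ISL definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for enum isl_channel_select. The packed attribute
 * asks the compiler for the smallest integer type that fits the values. */
enum __attribute__((packed)) sketch_channel_select {
   SKETCH_CHANNEL_ZERO,
   SKETCH_CHANNEL_ONE,
   SKETCH_CHANNEL_RED,
   SKETCH_CHANNEL_GREEN,
   SKETCH_CHANNEL_BLUE,
   SKETCH_CHANNEL_ALPHA,
};

/* Four 4-bit fields. Their storage unit follows the enum's size, so a
 * 1-byte enum gives a 2-byte struct with no padding bytes. */
struct sketch_swizzle {
   enum sketch_channel_select r:4;
   enum sketch_channel_select g:4;
   enum sketch_channel_select b:4;
   enum sketch_channel_select a:4;
};
```

Without the attribute the enum is normally 4 bytes, which is what left two uninitialized padding bytes in the struct.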
Alyssa Rosenzweig e1c14b2820 panfrost/ci: Texture wrap tests are legitimately fixed
These depended on the wallpaper reload.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-07 09:05:29 -07:00
Alyssa Rosenzweig 8442dde169 panfrost/midgard: Lower inot to inor with 0
We were previously lowering to inand, but the second arg was not
duplicated so inot would always return ~0. Oops.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-07 09:05:29 -07:00
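
The arithmetic behind 8442dde169 is easy to check in plain C. The helpers below are hypothetical models of the two ALU ops, and the buggy variant assumes the missing second argument read as zero, which is consistent with the "always return ~0" symptom in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical models of the two ALU ops. */
static uint32_t sketch_inand(uint32_t a, uint32_t b) { return ~(a & b); }
static uint32_t sketch_inor(uint32_t a, uint32_t b) { return ~(a | b); }

/* Old lowering: inand needs the source in both arguments. With the second
 * argument left as zero (assumed), ~(a & 0) is ~0 for every input. */
static uint32_t sketch_inot_buggy(uint32_t a) { return sketch_inand(a, 0); }

/* New lowering: ~(a | 0) is ~a, so a constant zero is the correct
 * second argument. */
static uint32_t sketch_inot_fixed(uint32_t a) { return sketch_inor(a, 0); }
```

The inor form needs no duplicated source, which is why the second argument can simply be zero.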
Alyssa Rosenzweig d415748955 panfrost/midgard: Cleanup tag fetch in disassembler
Trivial.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-07 09:05:29 -07:00
Alyssa Rosenzweig d3ad8d6b48 panfrost/midgard: Use fancy iterator
Trivial cleanup.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-07 09:05:29 -07:00
Alyssa Rosenzweig ae20bee75e panfrost/midgard: Cull dead branches
This fixes bugs with complex control flow.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-07 09:05:28 -07:00
Alyssa Rosenzweig c62f2ff852 panfrost/midgard: Add mir_print_bundle helper
This helps with debugging scheduling/emission.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-07 09:05:28 -07:00
Alyssa Rosenzweig fd6d6c1b15 panfrost/midgard/disasm: Pretty-print branch tags
Just makes it a little more obvious what's going on.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-07 09:05:28 -07:00
Alyssa Rosenzweig 2ebf22c399 panfrost/ci: Note some since-fixed tests
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-07 09:05:28 -07:00
Alyssa Rosenzweig de8d49acdc panfrost/midgard: Vectorize I/O
This uses the new mesa/st functionality for NIR I/O vectorization, which
eliminates a number of corner cases (resulting in assorted dEQP
failures and regressions) and should improve performance substantially due
to lessened pressure on the load/store pipe.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-07 09:05:28 -07:00
Alyssa Rosenzweig 4aced18031 panfrost/midgard: Remove varyings delay pass
This pass interfered with the more delicate path required for
non-vectorized I/O. It's also ugly and duplicating the job of an actual
honest-to-goodness scheduler.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-07 09:05:28 -07:00
Alyssa Rosenzweig 43568f2675 panfrost/midgard: Apply component to load_input
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
2019-06-07 09:05:28 -07:00
Eric Engestrom 440fe0eb43 nir: fix s/&&/||/ typo
Fixes: cd73b6174b "nir/lower_to_source_mods: Stop turning add, sat, and neg into mov"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-07 16:06:25 +01:00
Kristian H. Kristensen b9bbac6234 freedreno/a6xx: Drop struct stage array
This now boils down to just picking between binning or vertex shader
and dummy_fs or real fs, which we can do in a couple of lines of code
instead.  The constlen logic isn't doing what it thinks it's doing:
both constlens at this point

  MAX2(s[VS].constlen, align(state->bs->constlen, 4));

are binning shader constlens.  We'll have to revisit the constlen
logic, but this commit doesn't change how it works.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-07 07:33:12 -07:00
Kristian H. Kristensen 9382a3c11d freedreno/a6xx: Drop support for SS6_DIRECT shader upload
a6xx only supports indirect shaders.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-07 07:33:10 -07:00
Kristian H. Kristensen 0ef00ceb2e freedreno/a6xx: Share shader_t_to_opcode
We have a similar function in fd6_program.c. Move to fd6_emit.h and
share.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-07 07:33:03 -07:00
Kristian H. Kristensen 4552162e2d freedreno/a6xx: Consolidate more of dword 0 building in fd6_draw_vbo
There's already a bit of duplicated logic here and tessellation will
add more. Build up dword 0 in fd6_draw_vbo() and drop the a4xx in the
process.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-07 07:32:59 -07:00
Kristian H. Kristensen cae6b4d741 freedreno: Move fd4_size2indextype() helper to freedreno_util.h
In preparation for refactoring fd6_draw.c a bit.

Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-07 07:32:34 -07:00
Samuel Pitoiset 0905189a25 radv: enable VK_EXT_sample_locations
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-07 13:11:17 +02:00
Samuel Pitoiset 05f5fa661f radv: enable HTILE for images that might need variable sample locations
This is now supported.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-07 13:11:14 +02:00
Samuel Pitoiset e7677a697b radv: handle sample locations during automatic layout transitions
From the Vulkan spec 1.1.109:

   "Some implementations may need to evaluate depth image values
    while performing image layout transitions. To accommodate this,
    instances of the VkSampleLocationsInfoEXT structure can be
    specified for each situation where an explicit or automatic
    layout transition has to take place. [...] and
    VkRenderPassSampleLocationsBeginInfoEXT can be chained from
    VkRenderPassBeginInfo to provide sample locations for layout
    transitions performed implicitly by a render pass instance."

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-07 13:11:11 +02:00
Samuel Pitoiset d0d41e58c3 radv: determine the first subpass id for every attachments
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-07 13:11:08 +02:00
Samuel Pitoiset f58e9f6d69 radv: handle sample locations during explicit depth/stencil transitions
From the Vulkan spec 1.1.109:

   "Some implementations may need to evaluate depth image values
    while performing image layout transitions. To accommodate this,
    instances of the VkSampleLocationsInfoEXT structure can be
    specified for each situation where an explicit or automatic
    layout transition has to take place. VkSampleLocationsInfoEXT
    can be chained from VkImageMemoryBarrier structures to provide
    sample locations for layout transitions performed by
    vkCmdWaitEvents and vkCmdPipelineBarrier calls."

This handles explicit depth/stencil layout transitions performed
with CmdWaitEvents() or CmdPipelineBarrier().

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-By: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-06-07 13:11:01 +02:00