The Blitter engine lacks support for 3 components color format so we can
just fallback to RCS companion command buffer for the blit operation.
Even though blitter supports 96-bit support it only supports linear
tiling. We can support other types of tiling by falling back to the RCS
companion command buffer.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26300>
There was a subtle bug related to CFG tracking. Namely, some branch
instructions may point *only* to the block after the DO instruction
for the loop. If the MOV instructions are in the DO block, the may not
have liveness properly tracked.
Like in !25132, having the MOV instructions in blocks that might
contain other instructions helps scheduling.
shader-db:
All Broadwell and newer Intel GPUs had similar results (Ice Lake shown)
total cycles in shared programs: 848577248 -> 848557268 (<.01%)
cycles in affected programs: 78256396 -> 78236416 (-0.03%)
helped: 361 / HURT: 18
fossil-db:
All Skylake and newer Intel GPUs had similar results (Ice Lake shown)
Totals:
Cycles: 15021501924 -> 15021372904 (-0.00%); split: -0.00%, +0.00%
Totals from 735 (0.11% of 656080) affected shaders:
Cycles: 676429502 -> 676300482 (-0.02%); split: -0.02%, +0.00%
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26439>
radv_init_metadata hits several assert failures when the image is
multi-planar. Make sure we use plane 0.
This change should make no difference in practice. Also, this is done
only to follow radeonsi. Since the opaque metadata is mainly for
validations and DCC, and we don't enable DCC for multi-planar images, we
probably don't need to call radv_query_opaque_metadata at all.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25964>
Do not report DCC modifiers for multi-planar formats. We don't support
DCC for them and drmFormatModifierPlaneCount had incorrect values.
Fix vkGetImageSubresourceLayout for multi-planar images with modifiers.
In that case, memory planes and format planes are equivalent.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25964>
This commit nak: implement SHL and SHR on SM50 caused a regression on
KHR-GL45.gpu_shader_fp64.* using zink.
This fixes the regression, by setting the wrap fields.
Fixes: 00be041ffc ("nak: implement SHL and SHR on SM50")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26586>
Summary:
- Add a perf option to force primary ring submission
- Let device own secondary ring(s) for ad-hoc spawn
- For threads where swapchain and command pool are created, track with
TLS to instruct ring dispatch.
- If the pipeline creation or cache retrieval happens on the background
threads not on the hot paths, force synchronous and dispatch to the
secondary ring after waiting for primary ring becoming current.
- If the pipeline creation or cache retrieval happens on the hot paths
threads, dispatch to the primary ring to avoid being blocked by those
tasks on the secondary ring.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26179>
At first, no behavior change in this CL.
The instance level helper for normal command submission is left to work
with the current venus protocol. Meanwhile, we leave the helper to
submit recorded command buffer inside instance to it can later redirect
to the primary ring.
We've internalized a few ring helpers that no longer need to be exposed.
Besides, indirect submission decision is on per-ring basis since the
ring buffer can vary later.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26179>
This change only moves the fields without changing the accessors. It's
better to let ring own its own upload cs encoder (which is backed by
shmem array) to avoid lock contention between indirect submissions
across rings.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26179>
Now we are able to break up the original lock to allow shmem alloc to be
outside the ring mutex, as long as the reply shmem set is still coupled
with ring submission.
Add and expose vn_instance_reply_shmem_alloc helper which will be used
by rings separately later.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26179>
This can be thread-safe only because we have dropped seeking command
stream offset, which requires comparing pool shmem to decide conditional
set stream.
This is to prepare for later sharing reply shmem pool across rings.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26179>
More considerations and details here:
- The seek is a bit lighter than set, since it assumes renderer side
resource being immutable. It does affect perf when Venus is still
making verbose synchronous calls at runtime (e.g. descriptor set,
buffer, device memory, etc).
- Seek still requires lock protection as the reply shmem must be
immutable before the seek and the followed cmd are committed to the
ring.
- Removing seek without doing set requires renderer change to always
bump the encoder end position according to what the original request
is instead of being ad-hoc upon what the host driver tells to write.
The overhead and extra complexity there isn't negligible.
- Further, removing seek requires each ring to track the prior reply
pool shmem in the multi-ring scenario. While the additional host side
resource lookup isn't costy as the number of resources is must less
than the vk object table.
- The nice thing is that we can make shmem pool thead safe to be more
easily shared across rings.
So we just drop it.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26179>