Creating Android swapchain image from gralloc buffer requires to use
VkImageDrmFormatModifierExplicitCreateInfoEXT. To fill the struct info,
we need to query extended resource info from gralloc.
With the queried modifier from gralloc, we can ask the driver for the
plane count of the given format and modifier pair.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10553>
When a TSY barrier is hit, the entire supergroup will be synchronized.
If the supergoup is large and uses all available QPU threads it would
mean that we would sychronize and stall all running threads until all
of them reach the barrier, which may be inefficient.
This patch makes it so that if the compute shader has any such barriers
we limit the supergroup size so each supergroup only takes half of the
QPU threads available at most, so that if one supergroup hits a
barrier we have at least one other supergroup we can run, reducing
idle QPU time.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10541>
Each supergroup executes a number batches. Each batch has 16 elements
(one per QPU lane), except possibly the last batch which might be
incomplete. Until now, we packed a single workgroup in each supergroup,
which can lead to more incomplete batches and less efficient use
of the QPUs depending on the configuration of workgroups being dispatched.
This patch computes a number of workgroups per supergroup so that
we reduce or completely eliminate incomplete batches if possible.
It should be noted however, that TSY barriers act on supergroups,
so larger supergroups lead to larger syncpoints on barriers too.
A follow-up patch will try to find a good balance for compute shaders
that use such barriers.
This improves performance of the Sascha Willem's computecloth demo
by ~13%.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10541>
The V3D hardware allows us to pack multiple workgroups together to avoid
wasting execution lanes in shader cores.
For example, if we dispatch 16 workgroups with a local size of 1 element, we
can pack all 16 workgroups in a single 16-wide dispatch where each lane
executes a different workgroup, instead of 16 1-wide dispatches.
When we do this, we don't have a uniform workgroup id any more.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10541>
NIR provides two helper macros to run transformation passes correctly,
NIR_PASS() and NIR_PASS_V(). So far we've seemingly been a bit haphazard
about when to use them.
Let's correct that, and consistently use the NIR helpers here. This
helps us in two ways:
1. We now run nir_validate_shader after each pass, ensuring we didn't
break the shader
2. We now respect the NIR_PRINT environment variable for all NIR passes,
making debugging much less surprising.
In addition, we had an OPT()-macro that doesn't seem to provide much
help other than to hiding some trivial details. But they make our code
different to other users of NIR, which doesn't seem ideal. So let's drop
that macro while we're at it.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10585>
We've had about 1/day of the texelfetch group in the IRC flake reports
since apr 23, and tex-miplevel-selection that I marked before is actually
all the subtests it looks like. Also, you can't include the ",Fail" if
you want to actually match a test name.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10597>
Since the second source is always a constant that is known to be a
number, this should have the same performance as
GALLIVM_NAN_BEHAVIOR_UNDEFINED.
A lofty goal is to eventually remove GALLIVM_NAN_BEHAVIOR_UNDEFINED.
There's still a lot of (mostly implicit) users, and I don't feel like
tackling that right now. :)
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10532>
Like softpipe in mesa!10419, llvmpipe suffers from improper handling
of NaN in nir_op_fmax and nir_op_fmin. nir_op_fsat is already handled
correctly. OpenCL strictly requires the "NaN cleansing" behavior, so
all of the functionality is in place. Just make the graphics APIs use
the OpenCL path.
The majority of the possible performance penalty incurred here should
be resolved in the next commit.
v2: Add updated checksum for bgfx/39-assao.rdc trace. Rendering goes
from mostly garbage to looking correct to me.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10532>
There's a recently discovered HW bug affecting hardware at least as far
back as Skylake where, if the LOD is out-of-bounds for any SIMD lane,
then garbage may be returned in all SIMD lanes. The easy solution is to
set lower_txs_lod so that we always have a constant LOD of 0 which we
know a priori is always in-bounds. Fortunately, not many shaders
actually use textureSize() with LOD.
Shader-db results on Ice Lake:
total instructions in shared programs: 19948537 -> 19948564 (<.01%)
instructions in affected programs: 3859 -> 3886 (0.70%)
helped: 0
HURT: 7
One of the shaders is in Civilization: Beyond Earth, and the rest are
all in Civilization VI.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10538>
The source_depth_to_render_target flag can get set on old gen4-5 HW in a
few cases which are independent of the app writing gl_FragDepth. It
should be safe to just use fetch_payload_reg in that case instead of
depending in interpolation setup. This fixes a bug with certain very
simple shaders where we might end up not including the depth when we
should have.
While we're here, rework the logic around setting src_depth and add a
comment so it's more clear what's going on.
Fixes: 6d4070f3dd "intel/compiler: add support for fragment coordinate..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10596>
Somehow this was missed, but when we emit a query start/stop we need
have something that will need to be flushed in the batch.
Detected due to TC assert, but this had the potential to cause problems
in the non-TC case as well.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10599>