Fix addition where one summand was already used as input to an earlier
operation, for example in the last operation of MobileNet V2 residual
blocks.
Fixes an assertion when trying to run MobileNet V2:
.../src/gallium/drivers/etnaviv/etnaviv_ml.c:58: etna_ml_create_tensor: Assertion `size == pipe_buffer_size(res)' failed.
Reviewed-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31987>
The hardware only supports unsigned 8-bit tensors, but with the
configurable zero point we can map signed 8-bit integers to unsigned
8-bit integers by adding a constant offset of 128 to all values and to
the zero point setting.
This requires adding 128 to all input tensors and subtracting 128
from all output tensors during inference.
Reviewed-by: Tomeu Vizoso <tomeu@tomeuvizoso.net>
Signed-off-by: Philipp Zabel <p.zabel@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31979>
Prior to VCN4, the encode queue is separate from the decode queue. For
encode, the WRITE_MEMORY command can be executed with similar framing as
for VCN4, but notably there is no signature support, so it must be
skipped.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32400>
Use the new event-based tracing system to capture IDVS/COMPUTE/FRAGMENT
jobs and their context.
When tracing is enabled, the descriptor ring buffer is replaced by
a bigger linear buffer such that descriptors are not recycled before
we get a change to decode the trace.
If the decode buffer is too small and a OOB is detected, the driver will
suggest the user to allocate a bigger buffer with the
PANVK_{DESC,CS}_TRACEBUF_SIZE env vars.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32284>
Interpreting the command buffer only really works if everything is
static, but panvk started to make extensive use of loops, and
conditionals which depends on memory values that get updated by the
command stream itself. This makes it impossible to walk back to the
original state in order to replay the CS actions.
Move away from this approach in favor of an event-based tracing
mechanism recording particular CS commands and their context at
execution time. Of course, that means the auxiliary descriptors
shouldn't be recycled until the traces are decoded, but that's more
tractable. We just need to turn the descriptor ring buffers into
linear buffers with a guard page, and crash on OOB, with a message
suggesting the user to tweak the maximum trace buffer sizes.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32284>
The previous code not only left them undefined, but also
didn't increment the array index, so subsequent PS inputs
would be broken after the undefined one.
Note that this doesn't affect any valid Vulkan apps, but it makes
the code a bit simpler and it makes undefined inputs a little more
forgiving, at no expense for valid PS.
This code actually uncovers a bug in Zink, so I'm also documenting
the failing Zink test case.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32220>
There are some FS built-ins that can be per-vertex or
per-primitive depending on whether a mesh shader is used:
primitive ID (implicit in VS), layer and viewport.
However, the HW requires per-primitive FS inputs to be ordered last.
This causes bugs when the same unlinked FS is used together
with VS/TES/GS and MS (with unlinked ESO or fast-linked GPL).
To solve this problem, we reorder the FS inputs so that these
potentially per-primitive inputs go after per-vertex inputs but
before per-primitive inputs.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32220>
switch libagx to the precompilation pipeline. see the big comment in the
previous commit for why we're doing this.
while doing so, we move some dispatch stuff. there was so much churn from
precompile that this avoids doing the churn twice. that new header will be used
for DGC down the road.
there's also a small vtn/bindgen patch in here to skip bindgen'ing entrypoints,
as that conflicts with the new dispatch macros. this is the sane behaviour, we
just need to do the full precomp switch across the tree at once.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32339>
This seems to be a little faster.
insert_NOPs (navi31):
Difference at 95.0% confidence
-11.484 +/- 6.13377
-1.62767% +/- 0.860593%
(Student's t, pooled s = 5.71913)
insert_NOPs (gfx1200):
Difference at 95.0% confidence
-35.6745 +/- 4.97972
-8.1236% +/- 1.10453%
(Student's t, pooled s = 4.6431)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32374>
After the breaking commit, gbm_bo_create_with_modifiers({LINEAR}) returns
a BO with gbm_bo_get_modifier() = INVALID. This restores the functionality
and fixes most notably, hardware cursors for cards without modifiers.
Fixes#12039.
Fixes: 361f362258 ("dri: Unify createImage and createImageWithModifiers")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31725>
This is currently needed to fix d3d12 for st_unlower_io_to_vars.
The idea is to track the current value of ClipVertex in a temporary
variable, and for every emit_vertex, we load the ClipVertex value from
the temporary (which matches the stored value) and insert new CLIP_DIST
stores before emit_vertex.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32363>
The code for IO variables was interleaved with code for IO intrinsics,
which was difficult to follow.
lower_clip_outputs is split and replaced by more accurate names:
lower_clip_vertex_var and lower_clip_vertex_intrin
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32363>
Predication instructions sometimes need extra nops to workaround what
seems to be a hardware bug: prede needs 6 nops and the second
predt/predf of a predt/predf pair needs 4 nops.
The prede workaround is enabled starting from a6xx gen3 and the
predf/predt workaround from a6xx gen4, following the blob.
Fixes rendering corruption in God of War (2018).
Signed-off-by: Job Noorman <job@noorman.info>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32366>