A7XX doesn't have the UBWC flag buffer coherency issue that A6XX has.
However, for VK_EXT_rasterization_order_attachment_access we still have
to set the prim mode to flushing, since the extension allows applications
to skip explicit synchronization between writes and reads. We could use
FLUSH_PER_OVERLAP in sysmem, though.
Passes:
dEQP-VK.pipeline.*feedback_loop*
dEQP-GLES31.functional.blend_equation_advanced.* (with Zink)
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28597>
Buffer copies/fills/updates go through CCU but need additional
synchronization when the write range is not aligned to 64 bytes. This is
because dst buffer access uses either R8_UNORM or R32_UINT, and the two are
not coherent with each other in CCU, since the format seems to be part of
the cache key.
See: https://gitlab.khronos.org/vulkan/vulkan/-/issues/3306
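As a rough illustration of the alignment condition described above, the
check could look like the sketch below. The names CCU_BUFFER_ALIGN and
buffer_write_needs_sync() are hypothetical and not taken from the driver;
only the 64-byte granularity comes from this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Granularity from the commit message; the macro name is hypothetical. */
#define CCU_BUFFER_ALIGN 64u

/* Returns true when either end of the write range [offset, offset + size)
 * is not 64-byte aligned, i.e. when the write may need the additional
 * synchronization described above. */
static bool
buffer_write_needs_sync(uint64_t offset, uint64_t size)
{
   const uint64_t mask = CCU_BUFFER_ALIGN - 1;

   /* The range must not wrap around the 64-bit address space. */
   assert(size <= UINT64_MAX - offset);

   return ((offset & mask) != 0) || (((offset + size) & mask) != 0);
}
```

A fully aligned range such as offset 0, size 64 needs no extra handling in
this sketch, while offset 4, size 64 does.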
Synchronization with writes from UCHE (e.g. SSBO stores) is handled by the
fact that UCHE has byte-level dirtiness tracking and that the CCU flush
always happens before the UCHE flush in such cases (e.g. both renderpass
and dispatch would flush pending CCU writes).
Additionally see:
https://gitlab.khronos.org/vulkan/vulkan/-/issues/3398#note_400111
Fixes geometry corruption and potentially hangs in Resident Evil 3.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28469>
There are more things to do, e.g. BV mempool dumping and estimating the
BV location. However this is a good start.
The expanded register size is because the reglist includes registers
from other cores and these are read the same as any other GPU register.
Note that this is also the actual range of type4 packets, even though
registers higher than 0xffff are all protected. Right now these are
skipped on page faults but still read with the crashdumper for hangs.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27266>
On pre-Valhall HW, the fragment shader metadata was part of the RSD
(renderer state descriptor), which was emitted at draw time, but
Valhall introduces a shader program descriptor containing only the
shader information, and this one is emitted at shader preparation
time.
If we don't add the FS state BO to the batch, we might end up with a batch
being executed after the shader object has been destroyed, leading to
page faults when the GPU tries to access the shader program descriptor.
We make the panfrost_batch_add_bo() unconditional since it gracefully
handles the NULL case (which will happen on v7-).
Fixes: 087b63cb07 ("panfrost: Allow uploading fragment SPDs")
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Antonino Maniscalco <antonino.maniscalco@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28926>
During etna_assemble(..) we check if the uniform usage is valid for
the target GPU. As we do not fully init the srcs, it can happen that
we look at random data during the uniform check. This generates
false positive "generating instruction that accesses two different uniforms"
errors.
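The failure mode can be sketched as follows. All types and names here
(struct inst_src, struct inst, RGROUP_UNIFORM, uniform_usage_ok()) are
simplified, hypothetical stand-ins and not the actual etnaviv definitions;
the point is only that a check walking every source slot is reliable only
if unused slots are zero-initialized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NUM_SRC 3
#define RGROUP_TEMP 0
#define RGROUP_UNIFORM 2 /* hypothetical value */

/* Hypothetical, simplified source operand. */
struct inst_src {
   unsigned use : 1;    /* slot is actually used by the instruction */
   unsigned rgroup : 3; /* register group (temp, uniform, ...) */
   unsigned reg : 9;    /* register index within the group */
};

struct inst {
   struct inst_src src[NUM_SRC];
};

/* Returns true if all used uniform sources read the same uniform
 * register. Unused slots are skipped, which only works if their
 * 'use' bit is reliably zero. */
static bool
uniform_usage_ok(const struct inst *inst)
{
   assert(inst != NULL);

   int uniform_reg = -1;
   for (size_t i = 0; i < NUM_SRC; i++) {
      const struct inst_src *src = &inst->src[i];

      if (!src->use || src->rgroup != RGROUP_UNIFORM)
         continue;
      if (uniform_reg >= 0 && uniform_reg != (int)src->reg)
         return false;
      uniform_reg = (int)src->reg;
   }
   return true;
}
```

Declaring the instruction as `struct inst inst = {0};` before filling in
the used sources guarantees that the remaining slots are skipped by such a
check instead of being interpreted as random uniform accesses.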
Fixes: 5aede1a157 ("etnaviv: isa: Do src swizzle with isaspec")
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Reviewed-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29048>
Add performance counting support for a7xx in Freedreno, providing the
available performance counter groups along with the lists of countables
that can be counted through related counters.
All the collected countable names and values are provided in enum
definitions, even when the names indicate some countables being reserved.
The perfcounter groups don't include those reserved values.
The countable selection command stream in fdperf is enabled for a7xx,
sharing the same command stream created for 5th- and 6th-gen devices.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27483>
Instead of the ratio of counter value change in a certain sampling window,
display the raw integer change of that counter value. Counters counting
countables with names indicating cycle values still have that ratio
computed and printed alongside the raw value.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27483>
The counter values on Adreno are 64-bit values, but only their lower
32 bits have been read until now.
This change switches to reading the complete 64-bit value, storing that
value and the delta against the previous value for each counter. The same
is done for time values, namely storing the time value and the delta against
the previous value (in microseconds) for each counter. The deltas are then
used to compute the counter change per second, as was done before.
In the curses UI, an early sampling is done during setup in order to avoid
artificially large values popping up in the first update due to the deltas
being calculated against initially-zero values.
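The bookkeeping described above can be sketched roughly as below. The
struct and function names (counter_state, counter_sample(),
counter_rate_per_second()) are hypothetical and do not mirror fdperf's
actual code; only the idea of keeping 64-bit values, microsecond
timestamps and their deltas comes from this message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-counter state. */
struct counter_state {
   uint64_t last_value;    /* full 64-bit counter value at last sample */
   uint64_t last_time_us;  /* timestamp of last sample, in microseconds */
   uint64_t value_delta;   /* change in value since the previous sample */
   uint64_t time_delta_us; /* elapsed time since the previous sample */
};

/* Record a new sample and update both deltas. */
static void
counter_sample(struct counter_state *c, uint64_t value, uint64_t time_us)
{
   assert(c != NULL);

   c->value_delta = value - c->last_value;
   c->time_delta_us = time_us - c->last_time_us;
   c->last_value = value;
   c->last_time_us = time_us;
}

/* Counter change per second, derived from the stored deltas. */
static uint64_t
counter_rate_per_second(const struct counter_state *c)
{
   assert(c != NULL);

   if (c->time_delta_us == 0)
      return 0;
   return (uint64_t)((double)c->value_delta * 1000000.0 /
                     (double)c->time_delta_us);
}
```

Calling counter_sample() once during setup primes last_value and
last_time_us, so the first displayed delta is not computed against zero.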
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27483>
fdperf reserves the first CP counter for measuring the GPU frequency.
A new flag is added to fdperf's counter struct type and is set for the
first CP counter during counter setup.
The various tests on group and counter indices are replaced by tests of
this flag's value. The only exception is restore_counter_groups(), where
this reserved CP counter is now also reselected with the persistent
CP_ALWAYS_COUNT countable value, in case some other program overrides it.
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27483>
Right now select_counter() is called with values of specific countables
that should be tracked in the given counter, but it treats those values
as indices into the array of all available countables for a given counter
group. This works right now since all countable values for any counter
group are sequential, but that won't be the case on a7xx.
To address that, select_counter() is adjusted to find the index of the
specified countable value, and use that to store the label pointer that
should be displayed for the desired counter.
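The lookup described above can be sketched as a simple linear search. The
struct layout and the name find_countable_index() are hypothetical
simplifications, not fdperf's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified countable description. */
struct countable {
   const char *name;  /* label shown in the UI */
   uint32_t selector; /* value programmed into the select register */
};

/* Returns the array index of the countable whose selector value matches,
 * or -1 if the group has no such countable. The selector values do not
 * have to be sequential. */
static int
find_countable_index(const struct countable *countables, size_t count,
                     uint32_t selector)
{
   assert(countables != NULL || count == 0);

   for (size_t i = 0; i < count; i++) {
      if (countables[i].selector == selector)
         return (int)i;
   }
   return -1;
}
```

With a sparse table such as selectors {0, 1, 5}, looking up the value 5
yields index 2, whereas using 5 directly as an array index would read past
the end of the table.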
Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27483>