Until now we have been tracking the dstStageMask of barriers (where they
are consumed) but not where they are produced (the srcStageMask). With
this change we extend our barrier state to keep track of this as well.
This allows the driver to have better knowledge of the intended barrier
semantics so it can limit the amount of synchronization it does only
to the source stages involved with a barrier. We will do this in a
later patch.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16743>
Until now, we have always consumed barriers with the next GPU job
recorded into the command buffer after the barrier even if the job
was not the target of the barrier itself. This works based on the
idea that when we consume a barrier in a job we serialize it against
all queues, so effectively we are ensuring that whatever came
before it has completed, so if the barrier was intended for an
even later job, it would have served its purpose anyway.
It should be noted that CL jobs are special because they are actually
split in two different queues: the binning queue and the render
queue, with a dependency between them to ensure render runs after
binning. With our current implementation, if we have 3 jobs (A, B,
C) and we have a barrier after job A which is intended to block job C
on A's completion, with our implementation we would instead block
B on A's completion. If C is a CL job, and the barrier was targetting
the binning stage then we can have the following scenarios:
1. If B) is a CL job, it will consume the barrier at its binning
stage, so we know that B's binning will not start until A has
completed. Then C's binning will not start until B's binning
has completed, and thus, will not start until A has completed,
as intended.
2. If B) is not a CL job, it will consume the barrier and will not
start until A has completed, however, C's binning job will be
submitted to the binning queue without any sync requirements
and since B did not put any jobs in the binning queue it will
start as soon as A's binning has completed, but not A's render,
which would be incorrect.
Further, since a981ac0539 we now skip consumming BCL barriers if
a job does not have draw calls that can be affected by them. In the
same scenarios as before, now case 1) would also be problematic,
since B may skip the binning sync in that case and start immediately,
and since C's binning would be allowe to start immediately after B's
binning, there is no guarantee that this doesn't happen in parallel
with A's render.
With this patch we fix this situation by tracking the intended
consumer of each barrier: graphics, compute or transfer, and we make
sure to consume them only with jobs that match those semantics.
This fixes flakyness in dEQP-VK.device_group.*
Fixes: a981ac0539 ('v3dv: skip binning sync if binning shaders don't access external resources')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16743>
Until now we have been enabling binning sync if we found a barrier
involving geometry stages (a bcl barrier), however, if the actual
binning shaders involved with the job don't access any external
buffers or images there is no reason to sync at the binning stage.
In this patch we don't immediately consume the bcl barrier flag from
the command buffer state when we create a new job. Instead, we check
this state when we are about to emit a draw call by checking if the
shaders involved with binning may access external resources, such as
vertex buffers, UBOs, or textures. If none of the draw calls in the
job use binning shaders that access external resources then we never
enable binning sync for the job.
It is possible that a binning shader uses resources that are not
synchronized through a barrier though, so we keep track of the
access masks used with barriers for both buffers and images separately
to better identify if the binning shader is affected by the barrier.
If a serialized job never consumes the bcl barrier flag because none
of its draw calls ever required a bcl sync, then the flag will be
cleared when the job is finished.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16322>
Layout transitions are not relevant to us, we only care about barriers
that involve a sync point between read/write actions on the image across
GPU jobs.
Image transitions from undefined layout can only happen before the image
is ever used by the GPU, which means they are never relevant to our
implementation.
This improves performance in vkQuake.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16235>
Instead of having the CPU job execute the CSD job, put both jobs on the
list with the CPU job first which modifies the GPU job which gets kicked
off next. This gives the queue code more visibility into what types of
jobs are actually in the list. In particular, if an indirect compute
job is the last job in a batch buffer, it currently appears as if the
batch ends with CPU work which isn't true because it kicks off GPU work.
In that case, the last job on the list is now a GPU job, which better
matches reality.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15704>
Just as with all other TLB operations, we can only use the TLB if the render
area is aligned to tile boundaries. If it is not, then the operation would
overwrite pixels outside the render area, which is not allowed.
In this case, we can't even emit a previous TLB load to fix this because the
TLB has the multisampled attachment, not the resolve attachment, which is
just a destination buffer for the tile store.
Because the condition for tile alignment has to be determined for each
subpass, we handle this by storing this information in the attachment
state of the command buffer with the start of each subpass. We store
whether the attachment is to be resolved and whether it can use the
TLB (considering tile alignment restrictions).
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14752>
With multiple semaphore support, we can improve the way we handle
wait semaphores considering different job types and cmd buffer
batch scenarios, that means:
- A GPU job depends on wait semaphores whenever it is the first job
submitted to a queue in this command buffer batch (the `first` flag
for the job's queue type is set).
- For the first CPU job, if there are wait semaphores, we should
wait for the CPU and GPU being idle to process the job.
Signed-off-by: Melissa Wen <mwen@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13178>
Double buffer mode splits the tile buffer size in half so we can
start processing the next tile while the current one is being
stored to memory. This mode is available only if MSAA is not enabled
and can, in theory, improve performance by reducing tile store
overhead, however, it comes at the cost of reducing the tile size,
which also causes some overhead of its own.
Testing shows that this helps some cases (i.e the Vulkan Quake
ports) but hurts others (i.e. Unreal Engine 4), so for the time
being we don't enable this by default but we allow to enable it
selectively by using V3D_DEBUG.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14551>
v3dv, radv, and turnip are using several C&P format helpers (most of
them wrappers over util_format_description based helpers). methods.
This commit moves the common helpers to the already existing common
vk_format.h. For the case of v3dv we were able to remove the vk_format
header. For turnip and radv, a local vk_format.h header remains, with
methods that are only used for those drivers.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13858>
When multiview is enabled, queries must begin and end in the
same subpass and N consecutive queries are implicitly used,
where N is the number of views enabled in the subpass.
Implementations decide how results are split across queries.
In our case, only one query is really used, but we still need
to flag all N queries as available by the time we flag the one
we use so that the application doesn't receive unexpected errors
when trying to retrieve values from them.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12034>
The Vulkan spec states that when multiview is enabled the number of
layers in the framebuffer is set to one and that each attachment
must then have at least as many layers as referenced by view masks
in the subpasses in which is used.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12034>
With multilayered framebuffer we want to allocate enough tile state for
all layers involved, so te binner can handle layered rendering where
a geometry shader is used to redirect primitives to specific layers by
writing to gl_Layer.
However, we may also have layered framebuffers in cases where layered
rendering won't be used. Typically this will happen for meta copy/clear
operations, where we setup multilayered framebuffers but then we just
load and/or store the tile buffer without ever rendering a primitive,
let alone use a geometry shader to do layered rendering. In these cases
we can reduce the amount of tile state allocated to a sigle layer.
This patch allows us to specify if we should allocate tile state for all
layers when we start a new frame. We will take advantage of this in
later patches targetting the meta copy/clear code paths.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11923>
This was required to handle the case of secondary command buffers that
did not have framebuffer information available from the primary, since
we used to have an implementation that required this information to
be available for the fallback path of vkCmdClearAttachments. Now that
we can handle our our attachment clears in the current subpass by
emitting draw calls, we no longer need the framebuffer information to
be available and we can remove this.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11783>