This provides most of the implementation, but there are some
things we cannot enable until we improve of kernel submit
interface, namely:
We don't expose capacity to export SYNC_FD, although we do
have the implementation in place. This requires that we
improve our kernel interface and event wait implementation
first so we can cover the corner case where the application
submits a command buffer that includes a VkCmdWaitForEvents
and tries to export a SYNC_FD from its signal semaphores or
fence before it the event is signaled and the command buffer
is sent to the kernel for execution in full.
Likewise, we can't currently import semaphores. This is because
our current kernel submit interface can only take one syncobj.
We have been working around this so far by waiting on the last
syncobj produced from the device whenever we had to wait on any
semaphores (which is obviously suboptimal already), but this
won't work as soon as we allow importing external semaphores,
as those could (and would typically) be produced from a
different device.
Once we address the kernel bits, we should come back and enable
SYNC_FD exports as well as semaphore imports.
Relevant CTS tests:
dEQP-VK.api.external.fence.*
dEQP-VK.api.external.semaphore.*
dEQP-VK.synchronization.cross_instance.*
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11105>
We can (and should) close the descriptor immediately after the import.
Gets the following CTS test to pass without requiring to increase limits
for open file descriptors:
dEQP-VK.synchronization.basic.binary_semaphore.chain
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11105>
Fixes the following building error:
FAILED: out/target/product/x86_64/obj_x86/SHARED_LIBRARIES/i965_dri_intermediates/LINKED/i965_dri.so
...
ld.lld: error: undefined symbol: brw_compile_ff_gs_prog
>>> referenced by brw_ff_gs.c:56 (external/mesa/src/mesa/drivers/dri/i965/brw_ff_gs.c:56)
Fixes: 52e426fd8b ("intel/compiler: add support for compiling fixed function gs")
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10718>
Android 29 introduced general-dynamic TLS variable support ("quick
function call to look up the location of the dynamically allocated
storage"), while Mesa on normal Linux has used initial-exec ("use some of
the startup-time fixed slots, hope for the best!"). Both would be better
options than falling all the way back to pthread_getspecific(), which is
the alternative we have pre-29.
Reviewed-by: Roman Stratiienko <r.stratiienko@gmail.com>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10389>
by using the bind counts, the common cases of rebinds can be immediately
handled without unnecessary iteration, and following this each rebind can
be evaluated to ensure that every necessary descriptor was rebound in order
to catch any remaining corner cases that may not be handled in the optimized
rebind path
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11093>
we want to avoid updating these values when possible in order to reduce
overhead, which means that if a descriptor is being replaced, it should
be updated only if the replacement is not the same resource
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11093>
this is the "lazy" descriptor manager, which aims to perform the least
amount of work calculating updates while ignoring the overhead that an
update may incur: effectively the inverse of the caching manager
in this initial implementation, divergence exists between the descriptor
layouts of the cached manager and the lazy manager in order to avoid
incurring regressions in the existing descriptor architecture; this will
be reconciled in a followup MR that refactors and unifies descriptor layouts
during this interim period and until such reconciliation occurs,
the default descriptor manager is now the lazy manager for testing purposes as
there are no changes here which can affect the existing infrastructure
the caching descriptor manager can be selected with the ZINK_CACHE_DESCRIPTORS
env var and will be automatically used for vulkan drivers which don't support
the features required for lazy mode (templates)
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11093>
We always want to reserve _something_, so reserve what we need at batch
creation time and stop trying to re-reserve in a zillion places after.
This has a neglible (<128 bytes per batch) increase in memory usage for
compute-only workloads, but given the amount of simplication, that's a
fair tradeoff.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11074>
We want dirty tracking for constant buffer uploads, but which dirty
flags are needed depend on what the sysvals are. So for each sysval,
record a corresponding dirty flag at compile time, so at draw-time the
check is easy.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11074>
If there is a preload job needing tiling, but no other jobs, then
first_tiler will be set but not tiler_dep.
Fixes faults when two depth-only (stencil is reloaded) clears are done
in a row.
panfrost ffa30000.gpu: Unhandled Page fault in AS1 at VA 0x0000000044870000
Reason: TODO
raw fault status: 0x49002C1
decoded fault status: SLAVE FAULT
exception type 0xC1: TRANSLATION_FAULT_LEVEL1
access type 0x2: READ
source id 0x490
panfrost ffa30000.gpu: gpu sched timeout, js=0, config=0x3301, status=0x8, head=0x608a300, tail=0x608a300, sched_job=f5b0862d
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11104>
We were using the CT0CA (Control List Executor Current Address) and
CT0EA (Control List Executor End Address) registers, but that would
only wait for the CLE to reach the end of the list, but there could
still be things in the rest of the pipeline.
Even if that seems to work with the current simulator, the correct way
to do that is using the BFC (Binning Mode Flush Count) and RFC
(Rendering Mode Frame Count) registers instead.
In fact, this would be needed with a newer simulator snapshot, in
order to get the followint CTS tests working:
dEQP-VK.api.copy_and_blit.core.resolve_image.whole_array_image.4_bit
dEQP-VK.api.copy_and_blit.core.resolve_image.whole_array_image_one_region.4_bit
dEQP-VK.api.copy_and_blit.core.resolve_image.whole_copy_before_resolving.4_bit
dEQP-VK.api.device_init.create_instance_device_intentional_alloc_fail
dEQP-VK.api.image_clearing.core.clear_color_image.1d.optimal.multiple_layers.r32g32_uint
dEQP-VK.api.image_clearing.core.clear_color_image.1d.optimal.remaining_array_layers_twostep.r16_sint
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11039>
Until now we were waiting until having a dispatch current and/or
queued. But that would only wait for all shaders to have started, it
won't wait for them to have finished.
With this commit we wait until the NUM_COMPLETED_JOBS (that in spite
of that name, it is about dispatches) field got increased.
This is in general safest, and it is needed after the latest simulator
update to get CTS tests like the following ones working:
dEQP-VK.compute.basic.copy_ssbo_multiple_invocations
dEQP-VK.compute.basic.copy_ssbo_single_invocation
dEQP-VK.compute.basic.ssbo_rw_single_invocation
dEQP-VK.compute.basic.ssbo_unsized_arr_single_invocation
dEQP-VK.compute.basic.ubo_to_ssbo_multiple_invocations
dEQP-VK.compute.basic.ubo_to_ssbo_single_invocation
v2 (from Juan feedback):
* Clarify JOBS vs DISPATCHES
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11039>
Current code just assumes that flushes are instant, as simulator
doesn't really model the caches. So right now we have just an assert
that the flush has been done.
But that can change on the future, so let's change the assert for a
wait.
Note that for the l1t case we are writing on the field TMUWCF. So I
understand that then we need to wait for TMUWCF_SET, even if the
previous code was using L2TFLS_SET.
This also happpens on the kernel side. We need to check if this was a
typo on the kernel side.
v2 (from Juan feedback)
* Add comment about the TMUWCF vs L2TFLS difference between this
commit and the kernel.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11039>
So far we were not capturing any HUB interrupt, just core. This could
be a problem if any is fired, as we could enter on an infinite
loop. With this commit we start to capture them. So we split v3d_isr
into core and hub interrupt handling.
As reference we capture the same HUB interrupts that we capture on the
v3d kernel support.
It is worth to note that all those are mostly untested. Now with both
opengl/vulkan driver being stable we were not able to raise those
interrupts.
v2 (Juan feedback):
* Just one V3D_VERSION >= 41 block, more readable
* Assert that the core is 0 at v3d_isr_core (we don't handle
multi-core right now).
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11039>
We were using the num_components to infer it, but in the end it is
VEC2 for CMPXCHG and 32BIT for anything else.
This doesn't affect any test with the real hw, but fixes an assert
with the last version of the simulator.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11039>