Always chain wsi_image_create_info to VkImageCreateInfo, which indicates
that the image is a wsi image and can be transitioned to/from
VK_IMAGE_LAYOUT_PRESENT_SRC_KHR.
Add prime_blit_buffer to the struct as well. When set, it indicates the
prime blit destination and implies that the image is a prime blit
source.
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10789>
This provides most of the implementation, but there are some
things we cannot enable until we improve of kernel submit
interface, namely:
We don't expose capacity to export SYNC_FD, although we do
have the implementation in place. This requires that we
improve our kernel interface and event wait implementation
first so we can cover the corner case where the application
submits a command buffer that includes a VkCmdWaitForEvents
and tries to export a SYNC_FD from its signal semaphores or
fence before it the event is signaled and the command buffer
is sent to the kernel for execution in full.
Likewise, we can't currently import semaphores. This is because
our current kernel submit interface can only take one syncobj.
We have been working around this so far by waiting on the last
syncobj produced from the device whenever we had to wait on any
semaphores (which is obviously suboptimal already), but this
won't work as soon as we allow importing external semaphores,
as those could (and would typically) be produced from a
different device.
Once we address the kernel bits, we should come back and enable
SYNC_FD exports as well as semaphore imports.
Relevant CTS tests:
dEQP-VK.api.external.fence.*
dEQP-VK.api.external.semaphore.*
dEQP-VK.synchronization.cross_instance.*
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11105>
We can (and should) close the descriptor immediately after the import.
Gets the following CTS test to pass without requiring to increase limits
for open file descriptors:
dEQP-VK.synchronization.basic.binary_semaphore.chain
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11105>
We were using the CT0CA (Control List Executor Current Address) and
CT0EA (Control List Executor End Address) registers, but that would
only wait for the CLE to reach the end of the list, but there could
still be things in the rest of the pipeline.
Even if that seems to work with the current simulator, the correct way
to do that is using the BFC (Binning Mode Flush Count) and RFC
(Rendering Mode Frame Count) registers instead.
In fact, this would be needed with a newer simulator snapshot, in
order to get the followint CTS tests working:
dEQP-VK.api.copy_and_blit.core.resolve_image.whole_array_image.4_bit
dEQP-VK.api.copy_and_blit.core.resolve_image.whole_array_image_one_region.4_bit
dEQP-VK.api.copy_and_blit.core.resolve_image.whole_copy_before_resolving.4_bit
dEQP-VK.api.device_init.create_instance_device_intentional_alloc_fail
dEQP-VK.api.image_clearing.core.clear_color_image.1d.optimal.multiple_layers.r32g32_uint
dEQP-VK.api.image_clearing.core.clear_color_image.1d.optimal.remaining_array_layers_twostep.r16_sint
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11039>
Until now we were waiting until having a dispatch current and/or
queued. But that would only wait for all shaders to have started, it
won't wait for them to have finished.
With this commit we wait until the NUM_COMPLETED_JOBS (that in spite
of that name, it is about dispatches) field got increased.
This is in general safest, and it is needed after the latest simulator
update to get CTS tests like the following ones working:
dEQP-VK.compute.basic.copy_ssbo_multiple_invocations
dEQP-VK.compute.basic.copy_ssbo_single_invocation
dEQP-VK.compute.basic.ssbo_rw_single_invocation
dEQP-VK.compute.basic.ssbo_unsized_arr_single_invocation
dEQP-VK.compute.basic.ubo_to_ssbo_multiple_invocations
dEQP-VK.compute.basic.ubo_to_ssbo_single_invocation
v2 (from Juan feedback):
* Clarify JOBS vs DISPATCHES
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11039>
Current code just assumes that flushes are instant, as simulator
doesn't really model the caches. So right now we have just an assert
that the flush has been done.
But that can change on the future, so let's change the assert for a
wait.
Note that for the l1t case we are writing on the field TMUWCF. So I
understand that then we need to wait for TMUWCF_SET, even if the
previous code was using L2TFLS_SET.
This also happpens on the kernel side. We need to check if this was a
typo on the kernel side.
v2 (from Juan feedback)
* Add comment about the TMUWCF vs L2TFLS difference between this
commit and the kernel.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11039>
So far we were not capturing any HUB interrupt, just core. This could
be a problem if any is fired, as we could enter on an infinite
loop. With this commit we start to capture them. So we split v3d_isr
into core and hub interrupt handling.
As reference we capture the same HUB interrupts that we capture on the
v3d kernel support.
It is worth to note that all those are mostly untested. Now with both
opengl/vulkan driver being stable we were not able to raise those
interrupts.
v2 (Juan feedback):
* Just one V3D_VERSION >= 41 block, more readable
* Assert that the core is 0 at v3d_isr_core (we don't handle
multi-core right now).
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11039>
We were using the num_components to infer it, but in the end it is
VEC2 for CMPXCHG and 32BIT for anything else.
This doesn't affect any test with the real hw, but fixes an assert
with the last version of the simulator.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11039>
There are some interactions between these two extensions that need to be
implemented when both are supported. Particularly:
1. Applications can create images that will be bound to swapchain memory
by passing a VkImageSwapchainCreateInfoKHR in the pNext chain
of VkImageCreateInfo. In this case we need to make sure that the
created image takes some of its parameters from the underlying
swapchain.
2. Applications can bind memory from a swapchain image to a VkImage
by passing a VkBindImageMemorySwapchainInfoKHR in the pNext chain
of VkBindImageMemoryInfo.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11037>
This was added with VK_KHR_device_group and allows users to specify
a base offset that will be automatically added to gl_WorkGroupID.
Unfortunately, V3D doesn't support this natively, so we need to add
the base to the workgroup id generated by hardware manually. For this,
we inject add instructions that source from a QUNIFORM that will
retrieve the actual dispatch base from the compute job when it is
dispatched.
Since a compute shader can be dispatched with CmdDispatch and/or
CmdDispatchBase, we always need to add these additional add
instructions and use a base of (0,0,0) for regular dispatches.
Since we don't support any version of OpenGL with this dispatch
base functionality we can avoid the extra instructions there.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11037>
We don't have any special restrictions associated with the number
of descriptors in a set other than maybe not exceeding what we can
put in a single memory allocation, so in practice, applications will
be limited by the per-stage contraints defined by other Vulkan limits.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10970>
We don't do anything for input attachment aspects read by a subpass
since it doesn't have performance implications for us.
We also ignore the the new depth stencil layouts because they don't
have practical implications for our implementation.
We also ignore the new usage info for views since we are not currently
making decisions about views based on their usage.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10951>
v2: Add all the Y21x tests to the A530 expected fail list. All of the
YUV image import tests fail on this platform, and nobody has been able
to investigate why.
v3: Update the comment describing the zeroed bits in Y212. Suggested by
Emma.
v4: Add all Y21x test to the rpi3 expected fail list.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9610>
v2: Don't leak __DRI_IMAGE_FOURCC_RGBA16161616 to applications.
v3: Fix typo in __DRI_IMAGE_FOURCC_RGBA16161616 table entry.
v4: Add the Y412 and Y416 tests to the A530 expected fail list. Many
YUV image import tests fail on this platform, and nobody has been able
to investigate why.
v5: Update the comment describing the zeroed bits in Y412. Suggested by
Emma.
v6: Add all Y41x test to the rpi3 expected fail list.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9610>
In Vulkan we configure our integer RTs to clamp automatically, so with logic
operations we need to be careful and avoid overflows by discarding any bits
that won't fit in the RT component size.
Fixes remaining CTS test failures in:
dEQP-VK.pipeline.logic_op.*
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10801>
This avoids debug builds to assert crash. Components that don't exist
won't be used and will be eventually DCEd, so simply lower them to 0.
Fixes CTS tests like these in debug builds:
dEQP-VK.pipeline.logic_op.r8_uint.clear
dEQP-VK.pipeline.logic_op.r8_uint.and
dEQP-VK.pipeline.logic_op.r8_uint.and_reverse
dEQP-VK.pipeline.logic_op.r8_uint.copy
dEQP-VK.pipeline.logic_op.r8_uint.and_inverted
dEQP-VK.pipeline.logic_op.r8_uint.no_op
dEQP-VK.pipeline.logic_op.r8_uint.xor
dEQP-VK.pipeline.logic_op.r8_uint.or
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10801>
We don't want to have to deal with vector phis in freedreno, because
vectors are always split/unsplit around vectorized instructions anyways,
and the stated reason for not scalarising them (it hurting coalescing)
won't apply to us because we won't be using nir_from_ssa. Add this
option so that we don't have to do the equivalent thing while
translating from NIR.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10809>
We had re-enabled this because of some test regressions:
KHR-GLES31.core.geometry_shader.limits.max_input_components and
ext_transform_feedback-max-varyings failed to register allocate,
but now that we support indirect indexing on vertex shader outputs natively
this is no longer an issue.
Piglit's max-samplers tests failed. These tests use indirect indexing
on samplers which is not supported and fail to link with this error message:
"Failed to link: error: sampler arrays indexed with non-constant expressions
is forbidden in GLSL 110". This is expected. The reason these were passing
before is that loop unrolling was able to turn indirect indexing into
direct indexing. We add them to the expected fail list.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10723>
When spilling a register, the number of temps can be increased when
introducing a temporal variable.
Those nodes are not elegible to be spilled, but we need to take care of
no accessing out-of-bounds of the arrays defined with a size equal to
the original number of temps.
Fixes address sanitizer error on
KHR-GLES3.shaders.uniform_block.random.all_shared_buffer.14 (and many
others).
v2 (Iago):
- Add clarification in assertion.
- Use `vir_get_temp` to increase num_temps.
v3 (Iago):
- Update clarification
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10643>
The current policy is to always favor accumulators if possible, however,
this is not always optimal.
Particularly, accumulators play a crucial role in enabling QPU instruction
merges, since these are limited to both the ADD and the ALU instructions
addressing at most 2 physical registers. For 2-src instructions, this means
that to be able to merge we need them to address at least 2 accumulators.
While favoring accumulators does help the case for instruction merges in
general, it is risky to assign accumulators to variables that have
long life spans. Doing so will make the accumulator unavailable for
any other instructions during that life span, and since we only have a few
accumulators, we can quickly run out and losing our capacity to merge
instructions for large parts of the qpu program.
On the other hand, we also want to avoid the extreme case were we keep
allocating physical registers to the point we run out, even if we have
accumulators available, since accumulators have additional restrictions
and may not be suitable for everything.
This change continues the policy of favoring accumulators, but it only
does so if the life span of the temps is short, to ensure that we can
recycle accumulators often across instructions and avoid running out
for sections of the QPU code, unless we are already running out of
physical registers.
total instructions in shared programs: 13654647 -> 13336921 (-2.33%)
instructions in affected programs: 11015919 -> 10698193 (-2.88%)
helped: 39758
HURT: 17325
Instructions are helped.
total threads in shared programs: 412046 -> 412038 (<.01%)
threads in affected programs: 16 -> 8 (-50.00%)
helped: 0
HURT: 4
Threads are HURT.
total uniforms in shared programs: 3745726 -> 3746003 (<.01%)
uniforms in affected programs: 17296 -> 17573 (1.60%)
helped: 76
HURT: 99
Uniforms are HURT.
total max-temps in shared programs: 2364430 -> 2359942 (-0.19%)
max-temps in affected programs: 109117 -> 104629 (-4.11%)
helped: 2893
HURT: 772
Max-temps are helped.
total spills in shared programs: 5727 -> 5746 (0.33%)
spills in affected programs: 221 -> 240 (8.60%)
helped: 1
HURT: 2
total fills in shared programs: 13121 -> 13139 (0.14%)
fills in affected programs: 466 -> 484 (3.86%)
helped: 1
HURT: 2
total sfu-stalls in shared programs: 33432 -> 34491 (3.17%)
sfu-stalls in affected programs: 18219 -> 19278 (5.81%)
helped: 4459
HURT: 5087
Inconclusive result
total inst-and-stalls in shared programs: 13688079 -> 13371412 (-2.31%)
inst-and-stalls in affected programs: 11030017 -> 10713350 (-2.87%)
helped: 39630
HURT: 17429
Inst-and-stalls are helped.
total nops in shared programs: 335753 -> 333708 (-0.61%)
nops in affected programs: 112659 -> 110614 (-1.82%)
helped: 8726
HURT: 7383
Inconclusive result
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10686>
Once we have exhausted compile strategies at 4 threads and we start
enabling lower thread counts, there is no point in starting compiles
with 4 threads for them, we know these will fail, so let's start at
2 in these cases.
This also has another nice implication: if the driver compiles at 4
threads and fails to register allocate, we were allowing it to try
with 2 threads, but this would only retry the register allocation
process and would not really recompile the shader with 2 threads. This
is not optimal, because at 2 threads we have more TMU fifo space for
each thread and we can do more TMU pipelining, so we were missing that
opportunity.
This improves performance in Sponza by ~1.5% and also seems to help
UE4 slightly.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10647>