If the cache is disabled then we need to destroy the pipelines
manually when they are no longer needed. Do that by adding them
as private objects to the command buffer.
Fixes: 4f26303dbb ('v3dv: add debug option to disable custom pipeline caches for meta operations')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29544>
Right now we have some HW-dependant constants that we are accessing
using the same mechanism that some hw-dependant functions, through a
macro (V3DV_X macro).
But this means that each time that we need to get those constant
values, we need to do a hw version check. Also, right now both the
macro and the defines with each HW value are duplicated on v3d and
v3dv. Also that macro is ugly and has a ugly name.
This commit moves those values to the already common v3d_device_info
structure.
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29535>
Macro values that define values for different HW generations should
use the V3DV_X helper instead of being defined under a V3D_VERSION #if
condition.
Without this change, the original V3D_CLE_READAHEAD and
V3D_CLE_BUFFER_MIN_SIZE definitions used were only working for 4.2 HW.
For the 7.1 HW (RPi5) the 4.2 definitions were applied.
The CLE MMU errors were hidden as they were reported at dmesg as
"MMU error from client PTB (1) at 0x1884200, pte invalid" instead of
client CLE. So fixes all v3dv dmesg warnings for PTB MMU errors on RPi5.
With this change we really don't need different functions per HW generation,
so we rename back file v3dvx_cl.c to v3dv_cl.c. As before, we can use
only the packets definitions for 4.2 HW as they use the same opcode as 7.1 HW.
It fixes also an indentation error introduced with 26c8a5cd72.
Fixes: bb77ac983e ("v3dv: Increase alignment to 16k on CL BO on RPi5")
Fixes: 26c8a5cd72 ("v3dv: fix CLE MMU errors avoiding using last bytes of CL BOs.")
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29496>
This small patch fixes an issue where 'addr' is used uninitialized if
the assert gets removed due to compiling release code and thus
returning uninitialized 'addr'
v2: Modified based on initial review:
a) No need to initialize the 'addr' and 'ret' variables
b) Fix 'ret' variable to be proper type based on hw->get_mem return value
v3: Modified based on additional review:
a) Since both the simulator and mesa have their own version of
'unreachable()' and we cannot use ASSERT for the 'ret' value here,
just use a (void) ret after the assert
Reviewed-by: Eric Engestrom <eric@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29434>
Since we now use the common vulkan runtime to handle pipeline state and
this sets a limit for this at MESA_VK_MAX_VERTEX_BINDING_STRIDE we should
do the same, or else we can run into an assert-fail in the runtime code.
Fixes:
dEQP-VK.pipeline.monolithic.bind_buffers_2.maintenance5.triangle_list.buffers5.stride_offset_rnd321.whole_size
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29454>
We were exposing them as zero, as based on just the name, we assumed
that it was about the descriptors using the
VK_DESCRIPTOR_SET_LAYOUT_CREATE_UPDATE_AFTER_BIND_POOL_BIT bit.
But from spec, that limit takes into account descriptors created *with
or without*, so for example:
"maxPerStageDescriptorUpdateAfterBindUniformBuffers is similar to
maxPerStageDescriptorUniformBuffers but counts descriptors from
descriptor sets created with or without the
VK_DESCRIPTOR_SET_LAYOUT_CREATE_UPDATE_AFTER_BIND_POOL_BIT bit
set."
As we don't support the feature, those limits are the same of the
existing without the DescriptorUpdateAfterBind.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29430>
We were multiplying it by 6, that is the number of possible shader
stages, but from spec it points that we need to multiply by the number
of supported shader stages.
From Vulkan 1.3 spec, chapter 33, "Limits", note 8 on Table 33
"Required Limits":
"The minimum maxDescriptorSet* limit is n times the corresponding
specification minimum maxPerStageDescriptor* limit, where n is the
number of shader stages supported by the VkPhysicalDevice. If all
shader stages are supported, n = 6 (vertex, tessellation control,
tessellation evaluation, geometry, fragment, compute)."
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29430>
This is added with VK_KHR_maintenance5 to allow 64-bit
for pipeline creation flags.
The flags are backwards compatible so we don't need to
change the flag enum values by the new ones.
This patch also addresses a small issue where compute pipelines
where not initializing the flags field in the pipeline object.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29449>
Before VK_KHR_maintenance5 point size is undefined unless the
shader explicitly writes it, but this extension changes this and
expects a default point size of 1.0 if none has been written.
We accomplish this by emitting a POINT_SIZE packet with the
default point size the first time we draw with a POINT primitive
in the job. If the shaders used in the draw call doesn't write
point size then the hardware will take the point size from the
state set by the packet. If the shader does write to point size
then the value written in the shader will be used instead.
Passes all tests we support in:
dEQP-VK.rasterization.primitive_size.default_size.points.*
when forcing maintenance5 enabled.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29413>
If the shader writes point size, then the compiler needs to ensure it
writes it in the appropriate vpm output slot and also clamp its value to
expected limits. This is why we have the per_vertex_point_size in the
shader key, so it doesn't really make sense to set this if the shader
doesn't write point size.
If the shader record flags that the shader writes point size then the
hardware will use the shader written value to override point size state
(set with the POINT_SIZE packet), so again, we really only want to set
this in the shader state record if the shader actually writes its value.
While we could also limit this to point primitives, since these are the
only primitives where point size has an effect, this is not really
required, and skipping this allows us to use the same shader with any
primitive type (otherwise we would have to compile 2 different shaders).
Finally, this change makes the vertex shader setup for point size match the
one we had been doing for geometry shaders, so it makes both stages behave
consistently regarding point size behavior.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29413>
We can emit spill setup before RA if we use scratch. In that case
we have the same situation as during spilling, with the caveat that
we have already emitted the instructions so we need to find them
(they should be the only instructions ones before the instructions
accessing payload registers) and flag them as such.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29343>
We read our payload registers first in the shader so we generally don't have
to care about temps being allocated to them and stomping their value before
we can read them. Hoewer, spilling setup instructions are an exception since
these will be inserted first when there is any spilling in the program.
To fix this, we flag RA nodes involved with these instructions so we can
then try to avoid assiging these registers to them.
Fixes CTS failures with V3D_DEBUG=opt_compile_time, particularly:
dEQP-VK.binding_model.buffer_device_address.set0.depth2.basessbo.convertcheckuv2.nostore.single.std140.comp_offset_nonzero
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29343>
We are skipping additional 357684 tests for the following subgroups
where all tests results are skip to reduce CI usage testing skipped
tests. It reduces the CI time execution by a 2%.
14304 dEQP-VK.binding_model.mutable_descriptor.*
22948 dEQP-VK.binding_model.shader_access.primary_cmd_buf.bind2.*
20258 dEQP-VK.binding_model.shader_access.secondary_cmd_buf.bind2.*
11662 dEQP-VK.compute.shader_object_binary.*
11662 dEQP-VK.compute.shader_object_spirv.*
196299 dEQP-VK.image.host_image_copy.*
14043 dEQP-VK.query_pool.statistics_query.*
54656 dEQP-VK.robustness.robustness2.*
11852 dEQP-VK.sparse_resources.*
We are also using alphabetical order these subgroups.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29328>
As we are marking the last V3D_CLE_READAHEAD bytes as unusable we don't
need to reserve V3D_CL_MAX_INSTR_SIZE bytes for the CLE packet.
This reverts c2601f0690 ("v3dv: ensure at least V3D_CL_MAX_INSTR_SIZE
bytes in last CL instruction")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29023>
We increase the alignment to 16k for BOs allocated for the CL on RPi5 HW.
So we have the same ratio of usable space because of HW readahead as
than on RPi4, as readahead has been increased from 256 to 1024 bytes on
RPi5.
We have also concluded that when the kernel is running with 16k pages
that is the default on Raspberry Pi 5 HW, BO allocations are aligned to
16k so this increase has no cost and we would be using memory more
efficiently.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29023>
The last V3D_CLE_READAHEAD bytes of the CLE buffer are unusable because
using them would prefetch the next readahead bytes of the CL that would
be outside the allocated BO. To guarantee that we can chain a BO to the
current CL we always reserve space for the BRANCH or
RETURN_FROM_SUB_LIST packets.
Not taking this into account has been generating kernel dmesg errors like
"MMU error from client CLE".
As V3D_CLE_READAHEAD is different from RPi4 (256 bytes) to RPi5 (1024 bytes).
So we needed to rename v3dv_cl.c to v3dvX_cl.c to have different objects per
V3D_VERSION.
Extra assertions have been included to validate that we don't write
packets over the usable size of the CL silently.
v2: - Do not declare unusable the space needed for the BRANCH packet,
but take it into account for all space reservations.
v3: - Squash here ("v3dv: Secondary CL needs also to handle CLE readahead")
- Remove spureous parenthesis (Iago Toral)
- Refactor to avoid checking for needs_return_from_sub_list inside
cl_alloc_bo adding unusable_space as new parameter.
v4: - Improved logic for chaining BOs moving it to cl_alloc_bo using
a new enum v3dv_cl_chain_type to identify the different kinds
of BO chaining. Now we increase the size of the BO just before
submitting the BRACH/RETURN_FROM_SUB_LIST packages.
v5: - Assert on BO size updates that we are within the BO size.
(Iago Toral)
v6: - Remove changes at cmd_buffer_end_render_pass_secondary as we
assumed that cl->bo was already allocated when ending the
secondary CL, but it can be NULL. And this was already handle
by current code.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29023>
Currently, the information about the performance counters is duplicated
both in the kernel and in user space. Naturally, this leads to
inconsistency, as the user space might be updated and the kernel isn't.
Aiming to turn the kernel as the "single source of truth", use
DRM_IOCTL_V3D_GET_COUNTER, when available, to get the performance
counter information.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29154>
Now, the kernel has the ability to inform about the maximum number of
performance counters of a V3D device. Let's add this information to the
`struct v3d_device_info` to use it when performing performance queries.
From now on, V3D_PERFCNT_NUM must not be used to retrieve the maximum
number of performance counters. We must use `devinfo->max_perfcnt`,
except on the case that the kernel doesn't support DRM_V3D_PARAM_MAX_PERF_COUNTERS.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29154>
emit_subpass_color_clear_rects and emit_subpass_ds_clear_rects is
receiving a v3dv_render_pass parameter pass, but then using
cmd_buffer->state.pass to access the current pass. All calls to those
methods are already initializing that parameter to that value.
This commit just uses the parameter. An alternative would be to remove
one of the parameters of the function call. I find this option
slightly more readable.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29218>
The resulting pipeline/shaders are different when we are using
multiview (for example, a geometry shader is injected in order to
support multiview).
Doesn't fix any CTS test run individually, but fixes some
dEQP-VK.draw.dynamic_rendering.primary_cmd_buff.multi_draw.overlapping*
CTS tests when run in a batch (using deqp-vk --deqp-caselist-file),
like:
dEQP-VK.draw.dynamic_rendering.primary_cmd_buff.multi_draw.mosaic.indexed_mixed.16_draws.stride_zero.10_instances.vert_only.single_view.offset_6_no_draw_id
dEQP-VK.draw.dynamic_rendering.primary_cmd_buff.multi_draw.overlapping.normal.one_draw.stride_zero.1_instance.vert_only.multiview.no_offset_no_draw_id
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29218>
2712D0 has V3D 7.1.10 which included draw index and
base vertex in the shader state record packet, shuffling
the locations of most of its fields. Handle this at run
time by emitting the appropriate packet based on the
V3D version since our current versioning framework doesn't
support changes based on revision number alone.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29189>
2710D0 has V3D 7.1.10 which included draw index and
base vertex in the shader state record packet, shuffling
the locations of most of its fields. Handle this at run
time by emitting the appropriate packet based on the
V3D version since our current versoning framework doesn't
support changes based on revision number alone.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29189>