Before VK_KHR_maintenance5 point size is undefined unless the
shader explicitly writes it, but this extension changes this and
expects a default point size of 1.0 if none has been written.
We accomplish this by emitting a POINT_SIZE packet with the
default point size the first time we draw with a POINT primitive
in the job. If the shaders used in the draw call doesn't write
point size then the hardware will take the point size from the
state set by the packet. If the shader does write to point size
then the value written in the shader will be used instead.
Passes all tests we support in:
dEQP-VK.rasterization.primitive_size.default_size.points.*
when forcing maintenance5 enabled.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29413>
If the shader writes point size, then the compiler needs to ensure it
writes it in the appropriate vpm output slot and also clamp its value to
expected limits. This is why we have the per_vertex_point_size in the
shader key, so it doesn't really make sense to set this if the shader
doesn't write point size.
If the shader record flags that the shader writes point size then the
hardware will use the shader written value to override point size state
(set with the POINT_SIZE packet), so again, we really only want to set
this in the shader state record if the shader actually writes its value.
While we could also limit this to point primitives, since these are the
only primitives where point size has an effect, this is not really
required, and skipping this allows us to use the same shader with any
primitive type (otherwise we would have to compile 2 different shaders).
Finally, this change makes the vertex shader setup for point size match the
one we had been doing for geometry shaders, so it makes both stages behave
consistently regarding point size behavior.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29413>
We can emit spill setup before RA if we use scratch. In that case
we have the same situation as during spilling, with the caveat that
we have already emitted the instructions so we need to find them
(they should be the only instructions ones before the instructions
accessing payload registers) and flag them as such.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29343>
We read our payload registers first in the shader so we generally don't have
to care about temps being allocated to them and stomping their value before
we can read them. Hoewer, spilling setup instructions are an exception since
these will be inserted first when there is any spilling in the program.
To fix this, we flag RA nodes involved with these instructions so we can
then try to avoid assiging these registers to them.
Fixes CTS failures with V3D_DEBUG=opt_compile_time, particularly:
dEQP-VK.binding_model.buffer_device_address.set0.depth2.basessbo.convertcheckuv2.nostore.single.std140.comp_offset_nonzero
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29343>
We are skipping additional 357684 tests for the following subgroups
where all tests results are skip to reduce CI usage testing skipped
tests. It reduces the CI time execution by a 2%.
14304 dEQP-VK.binding_model.mutable_descriptor.*
22948 dEQP-VK.binding_model.shader_access.primary_cmd_buf.bind2.*
20258 dEQP-VK.binding_model.shader_access.secondary_cmd_buf.bind2.*
11662 dEQP-VK.compute.shader_object_binary.*
11662 dEQP-VK.compute.shader_object_spirv.*
196299 dEQP-VK.image.host_image_copy.*
14043 dEQP-VK.query_pool.statistics_query.*
54656 dEQP-VK.robustness.robustness2.*
11852 dEQP-VK.sparse_resources.*
We are also using alphabetical order these subgroups.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29328>
As we are marking the last V3D_CLE_READAHEAD bytes as unusable we don't
need to reserve V3D_CL_MAX_INSTR_SIZE bytes for the CLE packet.
This reverts c2601f0690 ("v3dv: ensure at least V3D_CL_MAX_INSTR_SIZE
bytes in last CL instruction")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29023>
We increase the alignment to 16k for BOs allocated for the CL on RPi5 HW.
So we have the same ratio of usable space because of HW readahead as
than on RPi4, as readahead has been increased from 256 to 1024 bytes on
RPi5.
We have also concluded that when the kernel is running with 16k pages
that is the default on Raspberry Pi 5 HW, BO allocations are aligned to
16k so this increase has no cost and we would be using memory more
efficiently.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29023>
The last V3D_CLE_READAHEAD bytes of the CLE buffer are unusable because
using them would prefetch the next readahead bytes of the CL that would
be outside the allocated BO. To guarantee that we can chain a BO to the
current CL we always reserve space for the BRANCH or
RETURN_FROM_SUB_LIST packets.
Not taking this into account has been generating kernel dmesg errors like
"MMU error from client CLE".
As V3D_CLE_READAHEAD is different from RPi4 (256 bytes) to RPi5 (1024 bytes).
So we needed to rename v3dv_cl.c to v3dvX_cl.c to have different objects per
V3D_VERSION.
Extra assertions have been included to validate that we don't write
packets over the usable size of the CL silently.
v2: - Do not declare unusable the space needed for the BRANCH packet,
but take it into account for all space reservations.
v3: - Squash here ("v3dv: Secondary CL needs also to handle CLE readahead")
- Remove spureous parenthesis (Iago Toral)
- Refactor to avoid checking for needs_return_from_sub_list inside
cl_alloc_bo adding unusable_space as new parameter.
v4: - Improved logic for chaining BOs moving it to cl_alloc_bo using
a new enum v3dv_cl_chain_type to identify the different kinds
of BO chaining. Now we increase the size of the BO just before
submitting the BRACH/RETURN_FROM_SUB_LIST packages.
v5: - Assert on BO size updates that we are within the BO size.
(Iago Toral)
v6: - Remove changes at cmd_buffer_end_render_pass_secondary as we
assumed that cl->bo was already allocated when ending the
secondary CL, but it can be NULL. And this was already handle
by current code.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29023>
Currently, the information about the performance counters is duplicated
both in the kernel and in user space. Naturally, this leads to
inconsistency, as the user space might be updated and the kernel isn't.
Aiming to turn the kernel as the "single source of truth", use
DRM_IOCTL_V3D_GET_COUNTER, when available, to get the performance
counter information.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29154>
Now, the kernel has the ability to inform about the maximum number of
performance counters of a V3D device. Let's add this information to the
`struct v3d_device_info` to use it when performing performance queries.
From now on, V3D_PERFCNT_NUM must not be used to retrieve the maximum
number of performance counters. We must use `devinfo->max_perfcnt`,
except on the case that the kernel doesn't support DRM_V3D_PARAM_MAX_PERF_COUNTERS.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29154>
emit_subpass_color_clear_rects and emit_subpass_ds_clear_rects is
receiving a v3dv_render_pass parameter pass, but then using
cmd_buffer->state.pass to access the current pass. All calls to those
methods are already initializing that parameter to that value.
This commit just uses the parameter. An alternative would be to remove
one of the parameters of the function call. I find this option
slightly more readable.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29218>
The resulting pipeline/shaders are different when we are using
multiview (for example, a geometry shader is injected in order to
support multiview).
Doesn't fix any CTS test run individually, but fixes some
dEQP-VK.draw.dynamic_rendering.primary_cmd_buff.multi_draw.overlapping*
CTS tests when run in a batch (using deqp-vk --deqp-caselist-file),
like:
dEQP-VK.draw.dynamic_rendering.primary_cmd_buff.multi_draw.mosaic.indexed_mixed.16_draws.stride_zero.10_instances.vert_only.single_view.offset_6_no_draw_id
dEQP-VK.draw.dynamic_rendering.primary_cmd_buff.multi_draw.overlapping.normal.one_draw.stride_zero.1_instance.vert_only.multiview.no_offset_no_draw_id
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29218>
2712D0 has V3D 7.1.10 which included draw index and
base vertex in the shader state record packet, shuffling
the locations of most of its fields. Handle this at run
time by emitting the appropriate packet based on the
V3D version since our current versioning framework doesn't
support changes based on revision number alone.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29189>
2710D0 has V3D 7.1.10 which included draw index and
base vertex in the shader state record packet, shuffling
the locations of most of its fields. Handle this at run
time by emitting the appropriate packet based on the
V3D version since our current versoning framework doesn't
support changes based on revision number alone.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29189>
In the past where backends had to deal with nir_register we needed
a specific path for them here because nir_def_components_read only
worked on ssa defs, but now that we got rid of nir_register we
only have nir_def and we don't need to go out of our way to do this
and we can just always use nir_def_components_read.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28978>
This is similar to what we do for textures where we program a number
of channels matching the number of componentes actually read by the
shader.
Makes tests like dEQP-VK.image.load_store.with_format.2d.r32_uint
drop from 18 instructions to 14 by emitting a single ldtmu instead
of 4.
From dEQP-VK.image.load_store.*:
total instructions in shared programs: 12681 -> 12093 (-4.64%)
instructions in affected programs: 4866 -> 4278 (-12.08%)
helped: 256
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 2.30 x̃: 2
helped stats (rel) min: 4.76% max: 25.00% x̄: 12.35% x̃: 11.11%
95% mean confidence interval for instructions value: -2.40 -2.19
95% mean confidence interval for instructions %-change: -12.99% -11.71%
Instructions are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28978>
Note that we don't support (and clarify on code why) two of the
features of this extension:
* extendedDynamicState2PatchControlPoints: as we don't support
Tessellation Shaders
* extendedDynamicState2LogicOp: as supporting it would need to allow
compile shader variants after pipeline creation, that we try to
avoid as much as possible (and it is not supported right now)
Note that those two features are not mandatory for Vulkan 1.3. From
spec:
"Promotion to Vulkan 1.3
This extension has been partially promoted. The dynamic state
enumerants VK_DYNAMIC_STATE_DEPTH_BIAS_ENABLE_EXT,
VK_DYNAMIC_STATE_PRIMITIVE_RESTART_ENABLE_EXT, and
VK_DYNAMIC_STATE_RASTERIZER_DISCARD_ENABLE_EXT; and the
corresponding entry points in this extension are included in core
Vulkan 1.3, with the EXT suffix omitted. The enumerants and entry
points for dynamic logic operation and patch control points are not
promoted, nor is the feature structure. Extension interfaces that
were promoted remain available as aliases of the core functionality."
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28980>
Note that when it is dynamic, it goes to the codepath of having
enabled raster_enabled at the pipeline, even if at the end it will be
disabled. The fragment shader compilation, and the stage keys, depends
on rasterization being enabled or not. As mentioned, if the state is
dynamic, it assumes that the rasterization is enabled.
That would work, as then the rasterization could be discarded at the
CFG_BITS package, by the command buffer at draw time. We just have a
(discarded) shader slightly more complex that it would have been with
rasterization enabled.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28980>
While working on VK_EXT_extended_dynamic_state2 we found two issues
the stencil emission code, after the update for StencilTestEnable
being dynamic.
Specifically:
* pack_stencil_cfg: if we don't have a ds_info, we need to return,
as pack_single_stencil_cfg uses it to fill it up. Also the check
for MESA_VK_DYNAMIC_DS_STENCIL_TEST_ENABLE was not needed. That
state doesn't affect the content of the STENCIL_CFG
packet. Stencil is enabled/disabled at the CFG_BITS packet.
* cmd_buffer_emit_stencil: we can't use pipeline->emit_stencil_cfg
to filter if it is needed to emit that as since
stencil_test_enable and stencil_op become dynamic.
We also update which states we check that are dynamic. As
mentioned STENCIL_TEST_ENABLE doesn't affect here.
Fixes: 60e9237e81 ("v3dv: StencilOp and StencilTestEnable are now dynamic")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28980>
There were some pending places to update after PrimitiveTopology
become dynamic. FWIW, this was not catched by any CTS test.
As we are here we add a comment to explain why we still use the
topology on the pipeline.
Fixes: 2526f74ade ("v3dv: PrimitiveTopology is now dynamic")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28980>
When logging a failed IOCTL, an errno is more useful than the output of
`drmIoctl()`. When the IOCTL fails, the return is usually -1 and this
value isn't very useful. On the other hand, the errno can help us to
debug the reason why the IOCTL failed.
Signed-off-by: Maíra Canal <mcanal@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29067>
Remove the v3dv_GetPhysicalDeviceProperties and the
v3dv_GetPhysicalDeviceProperties2 functions, replace them
by a private get_device_properties() called at device initialization
time.
(given the diff, the change is best viewed with --diff-algorithm=histogram)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Tested-by: Roman Stratiienko <r.stratiienko@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26386>
When this was promoted to EXT it expanded its properties struct to add a new
supportsNonZeroFirstInstance field.
Fixes: d38ff02c03 ("v3dv: mark some promoted extensions as supported")
Fixes: dEQP-VK.api.info.vulkan1p2_limits_validation.khr_vertex_attribute_divisor
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28964>