pipe_fence_handle is a refcounted object, so it can't be owned by a
container which might have a different lifetime. It needs a dedicated heap
allocation so it can outlive its container.
Make sure that we add a ref to pipe_fence_handle references before
handing them out.
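A minimal sketch of the ownership rule described above, in plain C. The
`demo_fence` and `demo_batch` types and the `demo_*` helpers are hypothetical
stand-ins, not the driver's or Gallium's actual API, and the refcount is
non-atomic for brevity. It only illustrates that the fence is its own heap
allocation and that a reference is taken before a pointer is handed out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted fence object. */
struct demo_fence {
   int refcount;
   uint64_t value;
};

/* Hypothetical container that holds one reference to its fence. */
struct demo_batch {
   struct demo_fence *fence;
};

/* Allocate the fence on the heap; the creator holds the first reference. */
static struct demo_fence *
demo_fence_create(uint64_t value)
{
   struct demo_fence *f = calloc(1, sizeof(*f));
   if (!f)
      return NULL;
   f->refcount = 1;
   f->value = value;
   return f;
}

/* Drop one reference and free the fence when the last one goes away.
 * Returns 1 if the fence was destroyed, 0 otherwise. */
static int
demo_fence_unref(struct demo_fence *f)
{
   assert(f->refcount > 0);
   if (--f->refcount == 0) {
      free(f);
      return 1;
   }
   return 0;
}

/* Hand out a reference: take the ref before returning the pointer. */
static struct demo_fence *
demo_batch_get_fence(struct demo_batch *b)
{
   b->fence->refcount++;
   return b->fence;
}
```

With this shape, the container can drop its own reference at any time and a
caller that obtained the fence earlier still holds a valid object.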
Instead of assuming that a fence_wait call is for the exact fence that we
returned from a given op, mirror what's done on graphics and
opportunistically scan the batches to see what's done, and reclaim
resources for them.
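The opportunistic scan can be pictured roughly as follows. This is a sketch
under assumptions: `demo_inflight_batch`, its fields, and
`demo_reclaim_completed` are hypothetical names, and "reclaiming" is reduced
to clearing a flag. The idea shown is that a wait does not assume which batch
it corresponds to; it checks every in-flight batch against the last completed
fence value.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for a submitted batch. */
struct demo_inflight_batch {
   uint64_t fence_value; /* value signaled when the batch finishes */
   bool in_flight;       /* true until its resources are reclaimed */
};

/* Scan all batches and reclaim every one whose fence value has been
 * reached. Returns the number of batches reclaimed by this call. */
static size_t
demo_reclaim_completed(struct demo_inflight_batch *batches, size_t count,
                       uint64_t completed_value)
{
   size_t reclaimed = 0;
   for (size_t i = 0; i < count; i++) {
      if (batches[i].in_flight &&
          batches[i].fence_value <= completed_value) {
         batches[i].in_flight = false; /* stand-in for freeing resources */
         reclaimed++;
      }
   }
   return reclaimed;
}
```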
Use d3d12_fence helpers to replace a lot of duplicated code.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35900>
Native sync fences represent point-in-time (fence + value) and can have
CPU wait events. Timeline semaphores represent a full timeline, do not
have a CPU wait event, and can have their value updated dynamically.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35900>
This issue was generating unwanted write accesses that
could overwrite previous operations.
Note: This functionality could also be tested with
nir_lower_wrmasks. This problem seems to affect only
SSBOs.
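For illustration only, and assuming the unwanted writes came from components
outside a store's write mask (the message above does not spell this out), a
masked store that touches only the enabled components could look like the
sketch below. `demo_masked_store` is a hypothetical helper, not driver code.
It writes one contiguous run of enabled components at a time and leaves
everything else in the destination untouched.

```c
#include <stdint.h>
#include <string.h>

/* Copy only the components enabled in `mask` from src to dst, one
 * contiguous run at a time. Returns the number of runs written. */
static unsigned
demo_masked_store(uint32_t *dst, const uint32_t *src,
                  unsigned num_comps, unsigned mask)
{
   unsigned runs = 0;
   unsigned i = 0;

   while (i < num_comps) {
      if (!(mask & (1u << i))) {
         i++; /* disabled component: must not be written */
         continue;
      }

      /* Find the end of this run of enabled components. */
      unsigned start = i;
      while (i < num_comps && (mask & (1u << i)))
         i++;

      memcpy(&dst[start], &src[start], (i - start) * sizeof(uint32_t));
      runs++;
   }
   return runs;
}
```

For a mask of 0x5 on a four-component vector, this issues two separate
writes (components 0 and 2) and leaves components 1 and 3 as they were.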
This change was tested on cypress, barts and cayman. Here are the tests fixed:
khr-gl4[3-6]/compute_shader/pipeline-pre-vs: fail pass
khr-gl4[5-6]/direct_state_access/queries_functional: fail pass
khr-gl4[5-6]/es_31_compatibility/shader_image_load_store/advanced-cast-cs: fail pass
khr-gl4[5-6]/es_31_compatibility/shader_image_load_store/advanced-cast-fs: fail pass
khr-gl4[5-6]/es_31_compatibility/shader_storage_buffer_object/advanced-switchbuffers-cs: fail pass
khr-gl4[5-6]/es_31_compatibility/shader_storage_buffer_object/advanced-switchprograms-cs: fail pass
khr-gl4[5-6]/es_31_compatibility/shader_storage_buffer_object/basic-operations-case1-cs: fail pass
khr-gl4[3-6]/shader_storage_buffer_object/advanced-switchbuffers-cs: fail pass
khr-gl4[3-6]/shader_storage_buffer_object/advanced-switchprograms-cs: fail pass
khr-gl4[3-6]/shader_storage_buffer_object/basic-operations-case1-cs: fail pass
khr-gl4[4-6]/texture_buffer/texture_buffer_max_size: fail pass
khr-gles31/core/compute_shader/pipeline-pre-vs: fail pass
khr-gles31/core/shader_image_load_store/advanced-cast-cs: fail pass
khr-gles31/core/shader_image_load_store/advanced-cast-fs: fail pass
khr-gles31/core/shader_storage_buffer_object/advanced-switchbuffers-cs: fail pass
khr-gles31/core/shader_storage_buffer_object/advanced-switchprograms-cs: fail pass
khr-gles31/core/shader_storage_buffer_object/basic-operations-case1-cs: fail pass
khr-gles31/core/texture_buffer/texture_buffer_max_size: fail pass
khr-glesext/texture_buffer/texture_buffer_max_size: fail pass
Cc: mesa-stable
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35830>
Now that we emit these nops at the beginning of a block, we can merge them
with any existing nops.
Totals from 7747 (4.71% of 164575) affected shaders:
Instrs: 10458516 -> 10439473 (-0.18%)
CodeSize: 19276236 -> 19255126 (-0.11%)
NOPs: 2379189 -> 2360146 (-0.80%)
(ss)-stall: 932629 -> 932685 (+0.01%)
(sy)-stall: 3634623 -> 3635354 (+0.02%)
Cat0: 2610461 -> 2591418 (-0.73%)
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35934>
Emitting in the same block as the pred[tfe] caused helper_sched to
sometimes insert unnecessary (eq). For example:
  block i:
    ...
    prede
    (eq)(rpt6)nop
  block i+1:
    (eq)nop
Emitting the quirk nops in the next block (i+1 in this case) prevents
this.
Note that the small number of shaders where NOPs regress are cases
where an extra (eq)nop is inserted in a block that doesn't contain any
other nops (but did contain the quirk nop before this change).
Totals from 3814 (2.32% of 164575) affected shaders:
Instrs: 6732543 -> 6732252 (-0.00%); split: -0.01%, +0.00%
CodeSize: 11978286 -> 11978086 (-0.00%); split: -0.00%, +0.00%
NOPs: 1683239 -> 1682948 (-0.02%); split: -0.02%, +0.01%
(ss)-stall: 635237 -> 634077 (-0.18%)
(sy)-stall: 2562027 -> 2533761 (-1.10%); split: -1.10%, +0.00%
Cat0: 1849898 -> 1849607 (-0.02%); split: -0.02%, +0.01%
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35934>
Some `sm8350-hdk` DUTs are currently failing LAVA health checks in the
Collabora farm, reducing available capacity. To mitigate job delays,
temporarily reduce the parallelism of the `a660-vk` job.
Thanks to previous optimizations and further increasing the
tests_per_group setting, there is no loss in test coverage.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35939>
The semaphore stage is VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
so the src access barrier must also use this stage to ensure it happens
after the acquire.
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35940>
We haven't wired this up in the Midgard compiler, so we can't expose
sample shading on Midgard GPUs. This all seems fixable, because the KILL
instruction can update the coverage without the kill-flag (yeah, a bit
confusing naming), but until someone puts in the time to wire that up,
let's just disable the functionality to avoid crashes.
Fixes: 6bba718027 ("panfrost: Advertise SAMPLE_SHADING")
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35881>
Previously we had to utilize the 3D path for float16 formats since
the hw would implicitly convert f16->f32, canonicalizing NaNs and
resulting in copies that were not bit-exact.
The newly discovered `HALF_PRECISION` bit avoids this
conversion, so we can go back to using the 2D path.
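To see why NaN canonicalization breaks bit-exact copies, consider the small
model below. It is not hardware behavior: `demo_copy_canonicalizing` is a
hypothetical function that collapses every binary16 NaN to a single quiet-NaN
pattern (0x7E00 is assumed here), while `demo_copy_raw` stands for a path that
moves the 16 bits unchanged.

```c
#include <stdint.h>

/* True if the 16-bit pattern encodes an IEEE 754 binary16 NaN:
 * all exponent bits set and a non-zero mantissa. */
static int
demo_half_is_nan(uint16_t h)
{
   return (h & 0x7C00u) == 0x7C00u && (h & 0x03FFu) != 0;
}

/* Model of a copy that canonicalizes NaNs: any NaN input becomes one
 * fixed quiet-NaN pattern (assumed 0x7E00); other values pass through. */
static uint16_t
demo_copy_canonicalizing(uint16_t h)
{
   return demo_half_is_nan(h) ? (uint16_t)0x7E00u : h;
}

/* Model of a copy that moves the raw bits without interpreting them. */
static uint16_t
demo_copy_raw(uint16_t h)
{
   return h;
}
```

A NaN with a payload, such as 0x7D01, survives `demo_copy_raw` unchanged but
comes out of `demo_copy_canonicalizing` as 0x7E00, so the destination no
longer matches the source bit for bit. Ordinary values such as 0x3C00 (1.0)
are unaffected by either path.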
Using the 2D path is faster than the 3D path. Results of Crucible
bench.cast-image show an average improvement of slightly above 50%
for the 1 MiB image->image copy of {4,6,16,32,64}B chunks, and a
slight improvement for the 16 GiB copy.
The affected formats are {R16,R16G16,R16G16A16}_SFLOAT.
Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35709>
Per Ken Graunke, corruption issues with push
constants for render batches on Gen12 graphics
have been observed and worked around by re-emitting
push constants at the start of the batch buffer.
We're seeing similar issues with compute batches,
so we'll apply the same work-around.
Fixes corruption reported in Blender on ADL/RPL.
CC: mesa-stable
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35873>