Support a new vrend command that queries the layout of the backing GBM
buffer for a given vrend resource. Use it to query the stride/modifier of a
PIPE_SHARED resource, passing this info down to WSI for exported resources.
Now venus is able to import vrend resources, making gamescope work in KMS
mode on QEMU. Virgl doesn't use stride/modifier info of winsys when it
imports classic vrend resources, hence this change only affects venus
context when it imports virgl WSI buffers.
Based on initial version of resource-layout command from Daniel Stone.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Acked-by: Yiwei Zhang <zzyiwei@gmail.com>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37646>
The .resource_create_with_modifiers() callback became required after
7d1a32fafd for venus to work in KMS mode. This fixes GBM buffer
allocation failure for vkmark-kms and fixes implicit modifier not
working on host when using Intel i915 driver for running Steam with
gamescope-kms on guest. Note that KMS support for venus on QEMU never
worked before, hence this is not a regression fix.
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37646>
Sadly I don't see an obvious way to use it for int8 matrices, therefore
the code is a bit of a mess right now.
It allows us to vectorize load/stores more often as we can simply
transpose row/col major matrices when needed.
And the movm optimization is also only enabled for 16 bit types, even
though we _could_ do it for 32 bit. It's not clear yet if using it for 32
bit types is an overall advantage or not.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37998>
Initial idea and code from Dave, but this is a complete rewrite of the
patch.
The matrix layouts contain groups of values: for int8 we have vec4 groups,
for fp16, fp32 and int32 we have vec2s. With this we load and store them
as vectors, getting rid of a bunch of address calculation.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37998>
For umul24 we expose the operation as UMUL24_RTOP0 so we can identify
the difference between umul24 as part of a sequence generated from an
imul as "multop+umul24" and a simple umul24 where rtop will always be 0.
For umul24_rtop0 instructions we relax the scheduling restrictions,
so they don't need to be serialized like the multop+umul24 ops. But
we maintain the read dependency with the last_rtop.
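
As a rough illustration of why a standalone umul24 can be treated
differently, here is a host-side C sketch of the arithmetic. It relies
only on the identity that a 32-bit product splits into a 24-bit partial
product plus a shifted high term. The function names are hypothetical,
and mul_high_term() is a stand-in for "whatever extra term a full imul
needs"; it is not a model of the hardware multop/rtop behaviour.

```c
#include <stdint.h>

/* Multiply the low 24 bits of each operand and keep the low 32 bits. */
static uint32_t umul24(uint32_t a, uint32_t b)
{
    uint64_t p = (uint64_t)(a & 0xffffffu) * (uint64_t)(b & 0xffffffu);
    return (uint32_t)p;
}

/* Hypothetical helper: the extra term a full 32-bit multiply needs on
 * top of umul24. It is zero whenever both operands fit in 24 bits. */
static uint32_t mul_high_term(uint32_t a, uint32_t b)
{
    uint32_t ah = a >> 24, bh = b >> 24;
    uint32_t al = a & 0xffffffu, bl = b & 0xffffffu;
    return (ah * bl + al * bh) << 24;
}

/* Full 32-bit multiply rebuilt from the two pieces above. */
static uint32_t imul32(uint32_t a, uint32_t b)
{
    return umul24(a, b) + mul_high_term(a, b);
}
```

In this sketch the two-step imul32() has a data dependency between its
halves, while a plain umul24() on 24-bit operands has no such extra term.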
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38642>
Without this, we will report some image formats as unsupported
and the dedicated sparse binding queue won't work
when sparse support is enabled using RADV_PERFTEST=sparse.
Fixes: dd90c76cea12 ("radv: Advertise sparse features pre Polaris with perftest flag")
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38676>
RDR2 VRAM memory management when resizable BAR is enabled seems
incorrect because it keeps allocating VRAM without freeing anything.
This introduces a drirc option to emulate a fake carveout of 256MiB to
work around this game bug. It also adjusts memory budgets by
distributing them between visible and invisible VRAM because AMDGPU
reports the same value for both when REBAR is enabled.
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12091
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38627>
When running the Vulkan CTS test
dEQP-VK.api.device_init.create_instance_device_intentional_alloc_fail.basic,
the driver sometimes crashes because cleanup sequences try to call
pvr_suballoc_bo_free() on BOs that were never initialized (so an old
stale value remains as the pointer).
Fix the issues that lead to wild pointer accesses (a wrong cleanup
sequence and trying to free BOs that failed to be allocated).
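
The general shape of such a fix can be sketched in plain C. This is a
generic illustration, not the driver code: struct device, device_init()
and the fail_second switch are hypothetical, and malloc()/free() stand in
for the real BO allocator. The idea is to clear the pointers up front and
to unwind only what was actually allocated.

```c
#include <stdlib.h>

/* Hypothetical device holding two buffer objects. */
struct device {
    void *bo_a;
    void *bo_b;
};

/* Returns 0 on success, -1 on failure. fail_second simulates an
 * injected allocation failure for the second buffer. */
static int device_init(struct device *dev, int fail_second)
{
    /* Clear the pointers first so no stale value can ever be freed. */
    dev->bo_a = NULL;
    dev->bo_b = NULL;

    dev->bo_a = malloc(64);
    if (!dev->bo_a)
        goto err;

    dev->bo_b = fail_second ? NULL : malloc(64);
    if (!dev->bo_b)
        goto err_free_a;

    return 0;

err_free_a:
    /* Unwind in reverse order, freeing only what was allocated. */
    free(dev->bo_a);
    dev->bo_a = NULL;
err:
    return -1;
}
```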
The CTS test still fails here with "Allocations still remain, failed on
index 4274", but at least it does not crash now.
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38506>
This was forgotten when advertising the corresponding extension, which
leads to an inconsistency and thus a failure of the
dEQP-VK.api.info.vulkan1p2.feature_extensions_consistency CTS testcase.
Enable the corresponding feature too. I ran all CTS tests with
"mirror_clamp_to_edge" in the name, which were all skipped with
NotSupported before (because the feature was not advertised), and now get
3695/11140 Pass with the remaining ones still NotSupported (no Fail).
This also makes the feature extension consistency CTS testcase pass.
Fixes: 4d34c07b7a ("pvr: advertise VK_KHR_sampler_mirror_clamp_to_edge")
Signed-off-by: Icenowy Zheng <uwu@icenowy.me>
Reviewed-by: Frank Binns <frank.binns@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38653>
M1 chips are more restrictive than M2 and above. We need to enforce memory
coherency when needed through "coherent" for buffer memory and
"memory_coherence_device" for textures. Without these the memory operations
are not visible to other threads.
Reviewed-by: Arcady Goldmints-Orlov <arcady@lunarg.com>
Signed-off-by: Aitor Camacho <aitor@lunarg.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38595>
All non-atomic allocations are on pretty slow paths where we only have a
single invocation running. This means they're technically thread-safe
(assuming only a single queue) but it also means the perf of a single
allocation doesn't matter much. However, as a bunch of things are
becoming helpers that may or may not be run in parallel for things like
multi-draw, it becomes harder to know when non-atomic is safe. We're
probably better off using atomic allocations everywhere.
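
To make the difference concrete, here is a host-side C11 sketch of a bump
allocator with both variants. It is only an illustration: struct heap and
both function names are hypothetical, and the real helpers run on the GPU
rather than through <stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical bump-allocated heap: next is the first free offset. */
struct heap {
    _Atomic uint32_t next;
};

/* Separate read and write: only safe when a single invocation runs,
 * since two callers could read the same offset before either stores. */
static uint32_t heap_alloc_nonatomic(struct heap *h, uint32_t bytes)
{
    uint32_t offset = atomic_load_explicit(&h->next, memory_order_relaxed);
    atomic_store_explicit(&h->next, offset + bytes, memory_order_relaxed);
    return offset;
}

/* One fetch-add: each caller gets a distinct range even in parallel. */
static uint32_t heap_alloc_atomic(struct heap *h, uint32_t bytes)
{
    return atomic_fetch_add_explicit(&h->next, bytes, memory_order_relaxed);
}
```

With a single caller both variants return the same offsets, which is why
switching everything to the atomic form does not change results on the
slow paths.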
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38404>
The original asahi code assumed a subgroup size of 32 and a workgroup
size of 32 * 32 = 1024. This makes doing ctz(ballot(b)) across an
entire workgroup an almost trivial operation. On panfrost, we won't be
so blessed unless we choose a workgroup size of 16 * 16 = 256. It's
also not clear that we want to use workgroups at all and we may be better
off sticking to just subgroup parallelism and cutting out memory
bandwidth by more than half. With the new code, the only requirement
should be that the subgroup size is a power of two (this is always true)
and that the workgroup size is an even multiple of the subgroup size.
Even though the new code looks way more complicated, thanks to the magic
of NIR constant folding, it should all fold down to the original code on
asahi and something even smaller if one opts to go for a single subgroup.
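
The idea of finding the first set invocation across a workgroup made of
several subgroups can be simulated on the host as below. This is a sketch
under assumptions, not the actual shader code: each subgroup's ballot is
modelled as a 32-bit mask (so subgroup_size <= 32), and the names
first_set_invocation(), ctz32() and NO_LANE are hypothetical.

```c
#include <stdint.h>

#define NO_LANE UINT32_MAX

/* Count trailing zeros. The caller must pass a non-zero value. */
static uint32_t ctz32(uint32_t v)
{
    uint32_t n = 0;
    while ((v & 1u) == 0) {
        v >>= 1;
        n++;
    }
    return n;
}

/* ballots[sg] has bit i set if invocation i of subgroup sg voted true.
 * Returns the workgroup-wide index of the first such invocation, or
 * NO_LANE if no invocation voted true. */
static uint32_t first_set_invocation(const uint32_t *ballots,
                                     uint32_t num_subgroups,
                                     uint32_t subgroup_size)
{
    for (uint32_t sg = 0; sg < num_subgroups; sg++) {
        if (ballots[sg] != 0)
            return sg * subgroup_size + ctz32(ballots[sg]);
    }
    return NO_LANE;
}
```

With num_subgroups == 1 the loop collapses to a single ctz of the ballot,
which matches the single-subgroup case mentioned above.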
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38404>
We have access to the poly_vertex_state from the GS so we might as well
use it. Asahi uses a single poly_vertex_state for VS and TCS and just
assumes the tessellator stalls before we update it for TCS. If a driver
wants to use two separate poly_vertex_state buffers, it will be the
driver's responsibility to make the system values return the right one.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38404>
Instead of having the vertex output buffer be a system value and
something the driver needs to manage, put it in poly_vertex_param. We
already need to have it somewhere GPU-writable so we can write it from
indirect setup kernels. Instead of manually allocating 8B all over the
place just to hold this one pointer, stick it in poly_vertex_param.
This also lets us get rid of a NIR intrinsic.
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38404>
The newly rewritten remap_tess_levels_legacy will have already lowered
anything it cares about to URB intrinsics. So the generic remapping
pass won't see them, as it operates on generic input/output intrinsics.
This also drops some of the callback boilerplate we needed temporarily.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>
This unifies the dynamic (SSO) and fixed (linked together) versions.
We emit piles of NIR as if we were doing the dynamic version, but
replace the tess config field access with constant values. It all
should optimize away back to something reasonable. We lower these
directly to URB read/write intrinsics.
It also rewrites the dynamic version to directly read/write the URB
rather than going through temporaries. The old version was broken
in that tessellation control shader invocations can technically use
the shared output area for cross-invocation data sharing with barriers,
although doing so using the built-in tesslevel patch outputs is very
unlikely.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38482>