This clarifies the semantics of the index variables compared to the previous
version, which used the same variables in a slightly different way depending
on whether they were used for downwards moves or upwards ones.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10885>
Otherwise the pb_slabs might be freed by another thread in between.
Valgrind example:
==676841== Invalid read of size 1
==676841== at 0x6B0A8B3: get_slab_wasted_size (amdgpu_bo.c:659)
==676841== by 0x6B0AD7D: amdgpu_bo_slab_destroy (amdgpu_bo.c:684)
==676841== by 0x6ACF94F: pb_destroy (pb_buffer.h:259)
==676841== by 0x6ACF94F: pb_reference_with_winsys (pb_buffer.h:282)
==676841== by 0x6ACF94F: radeon_bo_reference (radeon_winsys.h:754)
==676841== by 0x6ACF94F: si_replace_buffer_storage (si_buffer.c:274)
==676841== by 0x6957036: tc_call_replace_buffer_storage (u_threaded_context.c:1554)
[...]
==676841== by 0x4ECCDEE: clone (clone.S:95)
==676841== Address 0x27879945 is 5 bytes inside a block of size 208 free'd
==676841== at 0x48399AB: free (vg_replace_malloc.c:538)
==676841== by 0x6B0E8BD: amdgpu_bo_slab_free (amdgpu_bo.c:863)
==676841== by 0x6B89D4A: pb_slabs_reclaim_locked (pb_slab.c:84)
==676841== by 0x6B89D4A: pb_slab_alloc (pb_slab.c:130)
==676841== by 0x6B0EE7F: amdgpu_bo_create (amdgpu_bo.c:1429)
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4736
Fixes: 965c6445ad ("winsys/amdgpu,radeonsi: add HUD counters for how much memory is wasted by slabs")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11010>
This extension is basically only wrapping SPV_KHR_storage_buffer_storage_class
which is entirely implemented in the SPIR-V frontend.
Relevant CTS tests:
dEQP-VK.glsl.opaque_type_indexing.ssbo_storage_buffer_decoration.*
dEQP-VK.spirv_assembly.*
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11184>
This add a simple mechanism to select which GPU adapter the d3d12
driver should be using. A new environment variable is introduced.
MESA_D3D12_DEFAULT_ADAPTER_NAME
This represent a substring to search for in the GPU descrition,
for example "NVIDIA" or "INTEL", or "NVIDIA GeForce RTX 3090",
etc...
GPU are searched in order and the first one to include the substring
becomes a match. If no match is found, we default to the first
enumerated GPU.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10710>
The requirements for ES 3.1 are lower than the requirements for desktop
GL. The thread block size can be smaller, and images/buffers/atomics
need not be supported in the fragment stage. Allow a driver to expose
ES 3.1 without flipping on the desktop GL extensions.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10569>
The hardware has no support for 3d image loads/stores. So present the
image as a larger 2d image and fudge the coordinates. Note that a 2d
image (in the shader) may be backed by a slice of a 3d image, so we
always have to do the coordinate adjustments for 2d as well.
This is largely copied from the nv50 support, which has the same
restriction, with extra care taken to differentiate loads (which
specifies the X coordinate in bytes) and stores, which specifies it in
(formatted) pixels.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10820>
Prior to an earlier commit, xfb queries were not being marked as 64-bit.
The end result of this is that they would never appear to be "ready",
which in turn led to there always being a wait happening.
Once these got marked as 64-bit, we started checking the attached fence
for being signalled. However the screen fence does not seem to be enough
to wait for the streamout query data to actually be written out. So
instead we add a bit of extra "data" which emulates the 32-bit query way
of doing things (with the payload in front) which is emitted from the
same "unit" as the other streamout data. This seems to be sufficient.
Note that it does not seem to be required to actually emit the final
32-bit query from the streamout unit, but that seems logical and perhaps
there are edge cases where it is required.
While at it, also make the sequence management/initialization more
similar to the nvc0 driver.
Fixes dEQP-GLES3.functional.transform_feedback.*
Fixes: 58d47ca324 ("nv50: add compute invocations counter")
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10867>
Fix defect reported by Coverity Scan.
Side effect in assertion (ASSERT_SIDE_EFFECT)
assignment_where_comparison_intended: Assignment deviceMask = 1U
has a side effect. This code will work differently in a non-debug
build.
Fixes: 234e1b7356 ("v3dv: implement VK_KHR_device_group")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11197>
On AGX, the special register for front facing is inverted from its meaning in
APIs. We need to lower load_front_face to inot(load_back_face). Doing this in
the backend is trivial, but then we would miss out on algebraic optimizations
for the inot.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11199>
Somehow fairly recently the traces CI job started hitting timeouts, not
all the time but enough to be inconvenient for CI. I tracked it down to
getting into a situation where `ctx->batch->flush == true`, which causes
an infinite loop in the draw_vbo and clear paths (because
fd_batch_lock_submit() checks for flushed batch but fd_context_batch()
does not). I'm not entirely sure how we get into that state, or what
triggered this (seems possibly triggered by !10937). But it is easy
enough to recover.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11196>
Now that we have an idea of how many regs the conflicting allocation uses,
we can just skip to the next one and save repeated tests to find the same
conflicting neighbor again.
shadowrun-returns shader-db time on skl -1.62821% +/- 1.58079% (n=679),
now there's no statistically significant change from the start of the series
(n=420)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9437>
By using the new class type, we don't need to make 1928 different
registers to represent each contigous reg size starting from the actual
128 HW register, or have a mapping between RA regs and HW base regs. With
the number of regs reduced, and the fast q computation when using the new
classes, we no longer need to compute our own q.
This drops the FS RA initialization time on my CFL system from about 1ms to
50us.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9437>
In the fully general case of register classes, to expose an allocation
class of unaligned 2-contiguous-regs allocations, for example, you'd have
your base individual regs (128 on intel), and another set of 127 regs that
each conflicted with the corresponding pair of the base regs. Single-reg
nodes would allocate in the 128, and double-reg nodes would allocate in
the 127 and the user would remap from the 127 down to the base regs with
some irritating table.
If you need many different contiguous allocation sizes (16 is a pretty
common number across drivers), your number of regs explodes, wasting
memory and making the q computation expensive at startup.
If all the user has is contiguous-reg classes, we can easily compute the q
value up front (as found in the intel driver and nouveau, for example),
and we only have to change a couple of places in the conflict-checking
logic so the contiguous-reg classes can use the base registers.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9437>