We take advantage of the needs_quad_helper_invocation information to
only enable the PER_QUAD TMU access on Fragment Shaders when it is needed.
PER_QUAD access is also disabled on stages different to fragment shader.
Being enabled was causing MMU errors when TMU was doing indexed by vertexid
reads on disabled lanes on vertex stage. This problem was exercised by some
shaders from the GTK new GSK_RENDERER=ngl that were accessing a constant buffer
offset[6], but having PER_QUAD enabled on the TMU access by VertexID was
doing hidden incorrect access to not existing vertex 6 and 7 as TMU was
accessing the full quad.
cc: mesa-stable
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28740>
We no longer support the old LINE+MAC lowering, and we already lower
this to MAD in NIR on Gfx11+, so the LINTERP virtual opcode always
corresponds the PLN. The only catch is that LINTERP's operands are
reversed from PLN, so we have to switch them.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28705>
None of the callers of anv_free_sparse_bindings() check for its return
result, and they also don't have a way to propagate it up the stack.
So just don't return error codes that won't be checked. Instead,
add an assertion so at least we can detect failures in our CI or
development runs.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28724>
The device->using_sparse variable is only used at cmd_buffer_barrier()
to decide if we need to apply the heavier-weight flushes that are only
applicable to sparse resources. The big problem here is that we need
to apply the flushes to the non-image and non-buffer memory barriers,
so we were trying to limit those only to applications that ever submit
a sparse resource to the sparse queue.
The reason why we were applying this only to devices that ever
submitted sparse resources is that dxvk games have this thing where
during startup they create and then delete tiny sparse resources, so
switching device->using_sparse to true at resource creation would make
basically every dxvk game start applying the heavier-weight
workaround.
The problem with all that is that even if an application creates a
sparse resource but doesn't ever bind them, the resource should still
behave as an unbound resource (because they are bound with a NULL
bind), so the flushes affecting them should happen. This case is
exercised by vkd3d-proton/test_buffer_feedback_instructions_sm51.
In order to satisfy all the above cases and only really apply the
heavier-weight flushes to applications actually using sparse
resources, let's just count the number of sparse resources that
currently exist and then apply the workaround only if it's not zero.
That covers the dxvk case since dxvk deletes the resources as soon as
they create, so num_sparse_resources goes back to 0.
Testcase: vkd3d-proton/test_buffer_feedback_instructions_sm51
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10960
Fixes: 6368c1445f ("anv/sparse: add the initial code for Sparse Resources")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28724>
Since we moved the dump_anv_vm_bind() call to anv_sparse_bind(), that
BEGIN/END block stopped making sense, so just keep the first set of
messages.
Also wrap everything around a single INTEL_DEBUG() check so we'll only
run this check once when debug is disabled (we don't care about
running the check multiple times if it's enabled).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28724>
Some uses don't have any post-dominator. An example is an atomic that
feeds itself in a loop. No instruction immediately post-dominates
the result of such an atomic because no instruction can strictly
post-dominate itself. This handles that case generally by setting
the root node as the post-dominator for instructions that can't be
reordered.
Fixes: ba54099dce - nir: add a utility computing post-dominance of SSA uses
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28436>
Reorder its members to fill the current padding hole, reducing the
struct size from 32 to 24.
This struct appears multiple times inside struct anv_image and its
members, so this change brings down sizeof(struct anv_image) from
1744 to 1600.
We went from:
struct anv_image_memory_range {
enum anv_image_memory_binding binding; /* 0 4 */
/* XXX 4 bytes hole, try to pack */
uint64_t offset; /* 8 8 */
uint64_t size; /* 16 8 */
uint32_t alignment; /* 24 4 */
/* size: 32, cachelines: 1, members: 4 */
/* sum members: 24, holes: 1, sum holes: 4 */
/* padding: 4 */
/* last cacheline: 32 bytes */
};
to:
struct anv_image_memory_range {
enum anv_image_memory_binding binding; /* 0 4 */
uint32_t alignment; /* 4 4 */
uint64_t size; /* 8 8 */
uint64_t offset; /* 16 8 */
/* size: 24, cachelines: 1, members: 4 */
/* last cacheline: 24 bytes */
};
Considering we can have tens of thousands of anv_image structs
allocated at the same time on gaming workloads, this can save us a few
MB of memory. It ain't much but it's honest work.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28700>