We want to use actual sparse-capable queues as the default
trtt->queue, not copy queues that may have a companion_rcs_batch.
Before this patch, if we expose more than one queue *and* the
application creates a copy queue first, we'll end up setting
trtt->queue as the copy queue, which will GPU hang when we submit the
TR-TT batches as they don't support the pipe_control commands we
issue.
The trtt->queue queue is used for binding/unbinding buffers in code
paths where there's no specific queue coming from user space, such as
when we're creating or destroying a sparse resource.
This is not a problem yet on i915.ko since we are exposing
only a single queue, and it is not a problem for xe.ko since TR-TT is
not the default there. This is also not a problem in applications
that create the render or compute queue first. We plan to expose more
queues when using TR-TT, so this would become a problem without this
patch.
None of VK-GL-CTS seems to exercise that, and none of the Steam games
I tested exercise that as well. I was able to reproduce this issue
using our internal tracing tool.
v2: New implementation that doesn't break when we only have a compute
queue (Lionel).
Fixes: 04bfe828db ("anv/sparse: allow sparse resouces to use TR-TT as its backend")
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30252>
On Gen12 (the oldest we support on Mesa right now for TR-TT) we
started having per-engine TR-TT registers and we are supposed to make
all contexts share the same TR-TT programming.
On LNL+, this is documented in the BSpec page for the TRTT_CNTRL
register (68417), with more details in HSDs 14020454786 and
16022013154.
On Gen12 platforms this information is a little harder to find and
there's a whole trail of HSDs leading up to 1209977595, which links to
the documents that describe the programming. BSpec for TR-TT on Gen12
is very confusing as it still contains registers and other information
from Gen11 that were not removed.
Regarding the additional BLT and COMP registers, please notice that on
the BSpec pages for the TR-TT registers, the "Register Instance"
section only lists the GFX registers as non-privileged. However, the
"User Mode Privileged Commands" lists the other instances of the TR-TT
Regsiters as non-privileged, which matches what we see: there's no
need to put these addresses in the FORCE_TO_NONPRIV registers.
Notice that for now, when TR-TT is being used we only expose a single
queue, so this change effectively does nothing until we start exposing
extra queues. I left that part for later to help bisectability.
v2:
- s/trtt_init_context_state/trtt_init_queues_state/ (José)
- pass device as the argument to init_queues_state (José)
v3:
- use async_submit_end (José)
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30252>
Having this as a separate batch was the normal behavior until
7da5b1caef ("anv: move trtt submissions over to the
anv_async_submit").
While it certainly sounds better to do everything related to TR-TT
initialization in one batch, we need to revert it back to be a
separate batch (but now using the new anv_async_submit infrastructure)
because we'll want to run this batch on every engine.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30252>
Pretty big change... Sorry for that.
I can't exactly remember why I created 2 heaps. I think it's because I
mistakenly thought the samplers in the binding sampler pointers needed
to be indexed from the binding table. But that's not the case, they
just need to be in the dynamic state heap.
In the future, this change will allow to also allocate buffers for
push constant data in the newly created dynamic_visible_pool which
will be useful on < Gfx12.0 where this is the only place push constant
data can live for compute shaders.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30047>
When I915_PARAM_HAS_CONTEXT_FREQ_HINT is not supported the
intel_ioctl(fd, DRM_IOCTL_I915_GETPARAM, &gp) will return -1 and that
will cause i915_gem_get_param() to return false.
val will be different than 1 when not using GuC submission, so we are
forcing val check to ensure this holds good in platforms that doesn't
support GuC submission.
Fixes: d52dd5a9 ("anv/drirc: add option to provide low latency hint")
Signed-off-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30234>
Commit 94989b45a5 ("anv,driconf: Add fake non device local memory WA
for Total War: Warhammer 3") implemented a workaround to make
Warhammer 3 work on ADL, but the game still doesn't work on LNL, which
uses xe.ko, and MTL, which uses i915.ko: it still fails at launch
claiming it couldn't allocate memory.
So in this implementation, instead of clearing DEVICE_LOCAL_BIT we
just duplicate our memory types, one having the bit and one not
having.
v2:
- Check for VK_MAX_MEMORY_TYPES (José)
- Invert the order of the memory types (José)
- Fix white space issue (José)
v3:
- Comment our non-spec-compliance (José)
- Remove useless lines (José)
Link: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8721
Fixes: 94989b45a5 ("anv,driconf: Add fake non device local memory WA for Total War: Warhammer 3")
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30162>
Xe2+ platforms don't use fast-type buffer for its new design.
We don't have to track different fast-clear types, so we just
return the highest level of support.
Fixes: Vulkan CTS
dEQP-VK.api.copy_and_blit.core.resolve_image.whole_array_image
_one_region.8_bit_not_all_remaining_layers
src/intel/vulkan/anv_private.h:5439: anv_image_get_fast_clear_type_addr:
Assertion `device->info->ver < 20' failed.
Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29966>
Coverity has spotted a place where we could in theory overflow. In
reality it wont happen as the potential overflow is a bitfield with a
maximum of two values. Add an `assume()` statement to help out the
compiler and document our assumption.
fixes: dc1aedef2b
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29825>
Function anv_physical_device_try_create() creates the devinfo variable
and then at some point it copies its contents to device->info:
device->info = devinfo;
Much much later we're calling:
intel_common_update_device_info(fd, &devinfo);
... which is updating devinfo but not device->info. As a consequence,
we're only creating one queue, as engine_class_supported_count[klass]
is zero for everybody.
Fixes: 5b8b4f7878 ("intel/dev: Add engine_class_supported_count to intel_device_info")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29927>
This fixes corruption of push constants on Xe2 due to a mismatch in
the uniform layout implemented by the compiler and assumed by the
driver. To fix it we need to align the push constant ranges computed
by the Vulkan driver to a multiple of the GRF size of the platform.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29926>