With this patch, VM binds remain synchronous in relation to vm_bind()
KMD backend calls. However, the syscalls required for VM bind is
reduce in 2(in the optimal cases), the syncobj create and destroy
syscall are replaced by he usage a timeline syncobj.
Next step will be make this completely asynchronous.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26805>
Launch with :
$ MESA_VK_TRACE=rmv MESA_VK_TRACE_TRIGGER=/tmp/trig ./my_app
In another terminal, trigger a capture :
$ touch /tmp/trig
The application with create a snapshot and print out :
RMV capture saved to '/tmp/my_app_2024.01.19_10.56.33.rmv'
Then just open it with RMV :
./RadeonMemoryVisualizer /tmp/my_app_2024.01.19_10.56.33.rmv
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26843>
Sync xe_drm.h with commit a8ff56e160bb ("drm/xe/uapi: Remove reset uevent for now").
This is the last xe_drm.h uAPI break.
The only relevant change for ANV and Iris is that now VM bind uAPI
is asynchronous only so I had to bring back the syncobj creation, wait
and destruction.
Is still in the Xe port TODO list to make VM binds truly asynchronous.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26699>
Batch bos are always allocated with ANV_BO_ALLOC_HOST_CACHED_COHERENT
so there is no need to do cflush calls.
But if we ever decide to change that anv_bo_needs_host_cache_flush()
will make sure cflush is called.
Outside of batch bos, this patch is also removing the
intel_flush_range() call from anv_QueuePresentKHR because
device->debug_frame_desc is offset of workaround_bo that is also
allocated as ANV_BO_ALLOC_HOST_COHERENT.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26457>
As suggested by Lionel, here adding ANV_BO_ALLOC_HOST_COHERENT
and with that ANV_BO_ALLOC_HOST_CACHED_COHERENT is now defined by
(ANV_BO_ALLOC_HOST_COHERENT | ANV_BO_ALLOC_HOST_CACHED).
In some callers of anv_device_alloc_bo() was necessary to add
ANV_BO_ALLOC_HOST_COHERENT as no other flag was set and that
was the default behavior up to now.
A change that could look not related is the removal of the
intel_flush_range() in anv_device_init_trivial_batch(), that was done
because trivial_batch_bo is HOST_COHERENT so no flush is necessary.
And it did not made sense to make it ANV_BO_ALLOC_HOST_CACHED_COHERENT
as it was never read in CPU.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26457>
This changes allow us to support HOST_COHERENT, HOST_CACHED and
HOST_COHERENT + HOST_CACHED memory types for platforms that has
the PAT uAPI.
Be aware that Xe KMD will not be able to support cached only memory
types, anv_xe_physical_device_init_memory_types() will reflect that
but internal usage should not allocate
VK_MEMORY_PROPERTY_HOST_CACHED_BIT only memory, hence the assert
added.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25462>
A "dance" is required with this uAPI, first we need to ask KMD what is
the size of the giving query id, then memory needs to be allocated to
match that size and then query again with the memory address set and
at this time Xe KMD will copy the query data to memory.
This dance was being duplicated in xe_engine_get_info() and
anv_xe_physical_device_get_parameters() and the next patch will also
use it in Iris, so here adding it common/xe and re-using as much
as possible.
There is one more implementation of this function in intel/dev but
due to how libs are linked intel/dev can't depend on to intel/common.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26325>
We need to wait for the batches to complete before we return the BOs
to the pool. We were previously doing this completely synchronously,
which made the code unnecessarily wait. Now we have a timeline syncobj
that signals completion of the previous BOs, so sometimes we check
where we are in the timeline and then return the BOs that we know are
unused.
This, in addition to the previous patch that made us wait for the
other syncobjs through the execbuf ioctl instead of through the CPU,
makes TR-TT batches at least an order of magnitude faster. Still, I
don't think we'll notice any changes in games's FPS as they don't bind
sparse resources that often.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
If we're going to move syncobj waiting/signaling down to the backend
we're going to need a queue to signal as lost in case those operations
fail.
In some places of the stack we don't have a queue available, such as
when we're creating or destroying resources. For those, for vm_bind
cases we don't use the queue for anything so passing it as NULL is
fine. For TR-TT we are already using device->trtt.queue.
For TR-TT specifically this also means we're going to start using the
actual queues from the call stack instead of trtt->queue, but that
shouldn't make any difference since we only ever have one queue.
Still, this is more technigally correct.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
Our ultimate goal is to have the backend functions deal with the wait
and signal syncobjs instead of waiting for them on the CPU inside
anv_queue_submit_sparse_bind_locked(). For that, we'll need waits and
signals parameters to be passed all the way to the backend functions
that actually make the submission, and this is what this patch does,
through struct anv_sparse_submission.
This patch just deals with passing the parameters to the functions,
nothing is using the new variables yet. There should be no functional
changes here. The goal here is to make code review easier.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
TR-TT is a hardware feature supported by both i915.ko and xe.ko, which
means we can now finally have Sparse Resources on i915.ko and we also
have 2 options for xe.ko (and whatever is the best should be the
default).
In this patch we use batch commands to write the page tables and
forever keep them in device memory. We maintain a mirror of both the
L3 and and L2 tables because that helps us never having to read the
tables that are in device memory.
We still have some things to improve, but with this commit, workloads
that didn't work at all due to the lack of sparse resources should
at least run.
This is still all disabled by default in i915.ko, you can turn it on
by exporting ANV_SPARSE=1 before launching the applications. For
xe.ko, switch the default with ANV_SPARSE_USE_TRTT=1.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25512>
When anv_device_map_bo() is called from anv_device_alloc_bo() it gets
VkMemoryPropertyFlags set to 0 so it ends up with a write-combine
caching for integrated platforms with LLC, see 'if (!(property_flags &
VK_MEMORY_PROPERTY_HOST_CACHED_BIT)))'.
Current approach also has issues when mapping with anv_MapMemory2KHR()
as it would not have information to know that BO is a scanout.
It was also not properly calculating mmap mode for platforms with PAT
uAPI before "anv: Change default PAT entry to WC".
So here storing alloc_flags to anv_bo so there is no mismatches
between different code paths then using it to properly
calculate the mmap mode.
alloc_flags in anv_bo will also be used to calculate PAT index in
future patches.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26099>
It was never actually async as it was doing a DRM_IOCTL_SYNCOBJ_WAIT
right after DRM_IOCTL_XE_VM_BIND but it was required to allow the
partial binds required by sparse.
But it is now fixed and we can switch back to sync vm bind.
In future we will switch back to async vm bind to improve performance
but this time it will be properly implemented.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25300>
Sync xe_drm.h with commit xxxxx ("drm/xe/uapi: Fix naming of XE_QUERY_CONFIG_MAX_EXEC_QUEUE_PRIORITY").
One not so straght forward change is that sync VM binds now don't
require a syncobj anymore, the uAPI will return as soon the VM bind
operations are done.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25300>
Our sparse implementation will require us to issue partial unbinds,
but partial unbinds are not supported in synchronous vm_bind ioctls,
requiring us to to have our VM be marked with the ASYNC flag. This is
not properly documented and is subject to change in the next
iterations of the API.
Error handling with async binds is also not documented anywhere and is
being actively discussed in the mailing lists, so whatever we decide
to implement here is likely to end up changing in a few weeks. Also, I
haven't seen these errors happening in the real world, so for now
they're a very corner case. So for now just foward errors to
user-space and hope things work.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24681>
For Sparse Resources we need to be able to specify the address, size
and offsets and we also want to be able to issue multiple binds at the
same time. Extend xe_vm_bind_op() to handle those cases and add
the new vfunc.
v2:
- use STACK_ARRAY() (Lionel)
- no more need to work around xe.ko bug that was fixed (José)
Reviewed-by: José Roberto de Souza <jose.souza@intel.com> (v1)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24681>
The only driver that has a vm_bind ioctl is xe.ko, and its vm_bind
ioctl is not called GEM vm_bind, it's just DRM_IOCTL_XE_VM_BIND
(without GEM anywhere). Back when i915.ko was going to have a vm_bind
ioctl it had GEM on its name, so I guess that's how "gem" appeared in
the naming here, but now nothing does, so let's get rid of it.
Also, these vfuncs we have are specifically made to bind and unbind
whole BOs, so rename them to vm_bind_bo() and vm_unbind_bo() in order
to try to clarify what they mean. The goal is to add a more generic
vm_bind() later that can do anything.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24681>
We need to synchronize main (CCS/BCS) and companion rcs batch, so let's
create an empty batch and make both the batches (CCS/BCS) and companion
RCS batch wait on empty sync batch and signal the fence.
Reason to execute the empty batch is we need to make sure the companion
RCS batch finish as soon as the CCS/BCS batch finish. Preemption could
prevent the companion RCS batch execution and we might end up destroying
the CCS/BCS batch before companion RCS finishes.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23661>
Xe KMD only requires userptr to be bound to VM, so here reusing
workaround_bo->gem_handle id to all userptr bos in Xe version of
gem_create_userptr(). The Xe version of gem_close() will make sure
that workaround_bo->gem_handle is not closed when userptr bos
are closed.
With the same gem_handle for all userptr bos, it was also necessary
skip the anv_device_lookup_bo() and manually allocate memory to store
anv_bo in host heap memory, what lead to some small changes in
anv_device_release_bo() as well.
The remaining changes are the support to VM bind userptr bos and the
gem_vm_bind() call in anv_device_import_bo_from_host_ptr().
Fixes: dEQP-VK.memory.external_memory_host*
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23787>