The combination of removing bottlenecks in userspace (userspace fences,
etc) and slow GPU results in hitting full ringbuffer on a306. Haven't
figured out a reasonable way to work around that in userspace until a
kernel fix is in place, so disable this one for now.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10444>
Move the submit ioctl to it's own thread to unblock the driver thread
and let it move on to the next frame.
Note that I did experiment with doing the append_bo() parts
synchronously on the theory that we should be more likely to hit the
fast path if we did that part of submit merging before the bo was
potentially re-used in the next batch/submit. It helped some things
by a couple percent, but hurt more things.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10444>
This is a bit ugly, but with userspace fences and deferred submits it is
better to poll until the timestamp buffer is ready. Otherwise we could
be triggering deferred submits to flush sooner than they normally would,
changing the behavior of what we are trying to measure.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10444>
For submits flushed with (a) no required fence, and (b) no externally
visible effects (ie. imported/exported bo), we can defer flushing the
submit and merge it into a later submit.
This is a bit more work in userspace, but it cuts down the number of
submit ioctls. And a common case is that later submits overlap in the
bo's used (for example, blit upload to a buffer, which is then used in
the following draw pass), so it reduces the net amount of work needed
to be done in the kernel to handle the submit ioctl.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/19
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10444>
Move everything into a struct assocated with the pipe_fence_handle, so
that the drm layer can fill in the seqn/fd fences directly.
This will give us a comvenient place to insert a util_queue_fence in the
next commit.
While we're at it, extract the uint32_t fence (previously called
'timestamp' in place, a kgsl legacy) into a struct that encapsulates
both the kernel fence and the userspace fence.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10444>
Without deferred flushes, we would track the last_fence any time when
there had be no rendering/etc since the last flush, to avoid creating
empty batches simply for the purpose of getting a fence. We can do the
same thing with deferred flushes, where the fence is created from the
frontend thread before we know what will be flushed, by repopulating
the pre-created fence.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10444>
We only need fences for batches flushed via pctx->flush(). This will
serve as a useful signal about submits that can be merged.
v2. disable this optimization pre-a6xx until I can debug issues that
submit merging exposes
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10444>
Add a per-fd_pipe fence "timeline" so we can detect cases where we don't
need to call into the kernel to determine if a fd_bo is still busy.
This reuses table_lock, rather than introducing a per-bo lock to protect
fence state updates because (a) the common / hotpath pattern is to
update fences on a lot of objects, but checking the fence state of a
single object is less common, and (b) because we already hold the table
lock in common spots where we need to check the bo's fence state (ie.
allocations from the bo-cache).
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10444>
There are a couple cases where we want to use _NOSYNC, but at the same
time we want to ensure that rendering related to a bo is actually
flushed.
This doesn't do anything yet, but when we start deferring/merging
submits we'll need a way to trigger anything deferred to flush.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10444>
Most of them were actually unused. The memory type (KMEM vs SMI) only
applied to very old a2xx era devices that had a small/fast stacked
memory (SMI) vs normal memory (KMEM). And the cache flags are ignored
(ie. everything is writecombine), but we can add new cache flags later
when they actually do something.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10444>