Typically we free them when we upload the QPU code from the variant
to the assembly BO in the pipeline. However, if there is an error
during pipeline compilation, that upload may not happen and we would
leak the QPU code from the variants.
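
A minimal sketch of the idea, assuming hypothetical types and helper
names (struct variant, variant_free_qpu_code and
pipeline_destroy_variants are illustrative, not the driver's real API):
the free is made idempotent so the destroy path can call it whether or
not the upload already released the CPU copy.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the driver's types; names are illustrative. */
#define STAGE_COUNT 3

struct variant {
   uint64_t *qpu_insts;     /* CPU copy of the QPU code, owned by the variant */
   size_t qpu_insts_size;   /* size of that copy in bytes */
};

struct pipeline {
   struct variant *variants[STAGE_COUNT];
};

/* Allocate a variant holding `count` zeroed QPU instructions. */
struct variant *
variant_create(size_t count)
{
   assert(count > 0);
   struct variant *v = calloc(1, sizeof(*v));
   if (!v)
      return NULL;
   v->qpu_insts = calloc(count, sizeof(uint64_t));
   if (!v->qpu_insts) {
      free(v);
      return NULL;
   }
   v->qpu_insts_size = count * sizeof(uint64_t);
   return v;
}

/* Drop the CPU copy of the QPU code. Safe to call more than once, so it
 * can run both after a successful upload and from the destroy path.
 */
void
variant_free_qpu_code(struct variant *v)
{
   if (!v)
      return;
   free(v->qpu_insts);
   v->qpu_insts = NULL;
   v->qpu_insts_size = 0;
}

/* Destroy path: also reached when compilation failed before the upload,
 * so any QPU code still attached to a variant is released here.
 */
void
pipeline_destroy_variants(struct pipeline *p)
{
   for (int i = 0; i < STAGE_COUNT; i++) {
      variant_free_qpu_code(p->variants[i]);
      free(p->variants[i]);
      p->variants[i] = NULL;
   }
}
```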
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16370>
We can output the final NIR form (which we store in the pipeline
stage) and the final QPU (which we can retrieve from the assembly BO).
We should be careful not to fetch the shaders from the cache when
VK_PIPELINE_CREATE_CAPTURE_INTERNAL_REPRESENTATIONS_BIT_KHR is present,
since we don't store the NIR shader in the pipeline shader data that is
cached, so a cache hit would leave us without the NIR shader. The spec
already contemplates this scenario:
"Enabling this flag must not affect the final compiled pipeline but
may disable pipeline caching or otherwise affect pipeline creation
time."
We also prevent disposing of the pipeline stages and the variants when this
flag is requested to ensure this information is available later when
calling vkGetPipelineExecutableInternalRepresentationsKHR.
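
The two rules above can be sketched as a pair of predicates on the
pipeline create flags. The helper names are hypothetical, and the local
flag constant is an assumption that mirrors the value of
VK_PIPELINE_CREATE_CAPTURE_INTERNAL_REPRESENTATIONS_BIT_KHR so the
sketch builds without the Vulkan headers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in, assumed to match
 * VK_PIPELINE_CREATE_CAPTURE_INTERNAL_REPRESENTATIONS_BIT_KHR.
 */
#define CAPTURE_INTERNAL_REPRESENTATIONS_BIT 0x00000080u

/* Hypothetical helper: a cache hit would not give us the NIR shader, so
 * the cache is bypassed whenever internal representations are requested.
 */
bool
pipeline_can_use_cache(uint32_t create_flags)
{
   return (create_flags & CAPTURE_INTERNAL_REPRESENTATIONS_BIT) == 0;
}

/* Hypothetical helper: stages and variants must outlive pipeline creation
 * so they can be reported later on request.
 */
bool
pipeline_must_keep_stages(uint32_t create_flags)
{
   return (create_flags & CAPTURE_INTERNAL_REPRESENTATIONS_BIT) != 0;
}
```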
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16370>
This is actually required by Vulkan 1.2 and to expose the extension,
so let's conform to this requirement. We don't really care, since
image layouts are not relevant to our current implementation.
Fixes: 1442d77bc5 ('v3dv: trivially implement VK_KHR_separate_depth_stencil_layouts')
Fixes: dEQP-VK.info.device_mandatory_features
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16398>
Until now we have been enabling binning sync if we found a barrier
involving geometry stages (a bcl barrier). However, if the actual
binning shaders involved with the job don't access any external
buffers or images there is no reason to sync at the binning stage.
In this patch we don't immediately consume the bcl barrier flag from
the command buffer state when we create a new job. Instead, we check
this state when we are about to emit a draw call by checking if the
shaders involved with binning may access external resources, such as
vertex buffers, UBOs, or textures. If none of the draw calls in the
job use binning shaders that access external resources then we never
enable binning sync for the job.
It is possible that a binning shader uses resources that are not
synchronized through a barrier though, so we keep track of the
access masks used with barriers for both buffers and images separately
to better identify if the binning shader is affected by the barrier.
If a serialized job never consumes the bcl barrier flag because none
of its draw calls ever required a bcl sync, then the flag will be
cleared when the job is finished.
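
A simplified sketch of the deferred consumption described above. All
struct and function names here are hypothetical and the access masks
are reduced to "zero or non-zero"; the real driver state is richer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical command buffer state: a pending bcl barrier plus the access
 * masks seen in buffer and image barriers, tracked separately.
 */
struct barrier_state {
   bool bcl_barrier_pending;
   uint32_t bcl_buffer_access;
   uint32_t bcl_image_access;
};

/* Hypothetical summary of what the binning shaders of a draw may touch. */
struct binning_shader_info {
   bool reads_buffers;   /* vertex buffers, UBOs, ... */
   bool reads_images;    /* textures, ... */
};

struct job {
   bool serialize;        /* job was created behind a barrier */
   bool needs_bcl_sync;   /* binning must wait for previous work */
};

/* Called right before emitting a draw call instead of at job creation. */
void
consume_bcl_barrier_on_draw(struct barrier_state *state, struct job *job,
                            const struct binning_shader_info *bin)
{
   if (!state->bcl_barrier_pending || job->needs_bcl_sync)
      return;

   bool affected =
      (bin->reads_buffers && state->bcl_buffer_access != 0) ||
      (bin->reads_images && state->bcl_image_access != 0);

   if (affected) {
      job->needs_bcl_sync = true;
      state->bcl_barrier_pending = false;
      state->bcl_buffer_access = 0;
      state->bcl_image_access = 0;
   }
}

/* A serialized job that never needed a bcl sync drops the flag at the end. */
void
finish_job(struct barrier_state *state, const struct job *job)
{
   if (job->serialize && !job->needs_bcl_sync) {
      state->bcl_barrier_pending = false;
      state->bcl_buffer_access = 0;
      state->bcl_image_access = 0;
   }
}
```

In this sketch a draw whose binning shaders only sample textures leaves
a buffer-only barrier pending, and a later draw that reads a UBO in its
binning shader is the one that turns on the sync.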
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16322>
Fixes failures on tests like this when the on-disk-cache is enabled:
dEQP-VK.binding_model.descriptor_copy.compute.uniform_buffer_0
We only found them when running full CTS runs. What happens is that we
got a hit from the on-disk shader cache for several tests using the
same shaders. But some tests seem to be using a uniform buffer, and
others an inline buffer. Right now inline buffers lead to some changes
in the final NIR shader, and in the generated assembly, compared with
uniform buffers. So we got a wrong shader. Fortunately we only got an
assert instead of weird behaviour.
With this commit we include the pipeline layout in the pipeline sha1,
so those two cases would get a different sha1. FWIW, this is what other
drivers are already doing.
Surprisingly that didn't cause a problem before.
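
To illustrate why the layout has to feed the key, here is a small
sketch. It uses FNV-1a as a stand-in for SHA-1 to stay self-contained,
and the layout struct and pipeline_key helper are hypothetical. Passing
a NULL layout models the old behaviour, where both tests collide on the
same key.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_BINDINGS 8

enum {
   BINDING_UNIFORM_BUFFER = 0,
   BINDING_INLINE_UNIFORM_BLOCK = 1,
};

/* Hypothetical, byte-only layout description (no padding to worry about). */
struct pipeline_layout {
   uint8_t binding_count;
   uint8_t binding_kind[MAX_BINDINGS];
};

/* FNV-1a, used here only as a simple stand-in for SHA-1. */
static uint64_t
hash_update(uint64_t h, const void *data, size_t size)
{
   const uint8_t *bytes = data;
   for (size_t i = 0; i < size; i++) {
      h ^= bytes[i];
      h *= 0x100000001b3ull;
   }
   return h;
}

/* Key for a pipeline: shader contents plus, when given, the layout. */
uint64_t
pipeline_key(const void *shader, size_t shader_size,
             const struct pipeline_layout *layout)
{
   uint64_t h = 0xcbf29ce484222325ull;
   h = hash_update(h, shader, shader_size);
   if (layout) {
      assert(layout->binding_count <= MAX_BINDINGS);
      h = hash_update(h, &layout->binding_count, 1);
      h = hash_update(h, layout->binding_kind, layout->binding_count);
   }
   return h;
}
```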
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16313>
If we are calling pipeline_cache_upload_shared_data with
from_disk_cache, that means that we used the disk cache to find
that entry. And that should only happen if we didn't find the entry
in the cache. So in that case we can skip searching for it.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16313>
Layout transitions are not relevant to us; we only care about barriers
that involve a sync point between read/write actions on the image across
GPU jobs.
Image transitions from undefined layout can only happen before the image
is ever used by the GPU, which means they are never relevant to our
implementation.
This improves performance in vkQuake.
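
A rough sketch of the filtering rule, under stated assumptions: the
struct and helper are hypothetical, IMAGE_LAYOUT_UNDEFINED is a local
stand-in assumed to match VK_IMAGE_LAYOUT_UNDEFINED, and treating a
barrier with empty access masks as irrelevant is this sketch's reading
of "only care about read/write sync points", not something the commit
states explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in, assumed to match VK_IMAGE_LAYOUT_UNDEFINED. */
#define IMAGE_LAYOUT_UNDEFINED 0u

/* Hypothetical, reduced view of an image memory barrier. */
struct image_barrier {
   uint32_t old_layout;
   uint32_t new_layout;
   uint32_t src_access;
   uint32_t dst_access;
};

/* Returns true if the barrier may require a sync point between GPU jobs. */
bool
image_barrier_is_relevant(const struct image_barrier *b)
{
   /* The image has not been used by the GPU yet: nothing to sync against. */
   if (b->old_layout == IMAGE_LAYOUT_UNDEFINED)
      return false;

   /* Layouts themselves are ignored; only read/write hazards matter.
    * (Assumption of this sketch, see the note above.)
    */
   return b->src_access != 0 || b->dst_access != 0;
}
```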
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16235>
This controls the whole lowering of "make tex ops with implicit
derivatives on non-implicit-derivative stages be tex ops with an explicit
lod of 0 instead", but it's really hard to describe that in a git commit
summary.
All existing callers get it added except:
- nir_to_tgsi which didn't want it.
- nouveau, which didn't want it (fixes regressions in shadowcube and
shadow2darray with NIR, since the shading languages don't expose txl of
those sampler types and thus it's not supported in HW)
- optional lowering passes in mesa/st (lower_rect, YUV lowering, etc)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16156>
The key is created on the stack, so as soon as the function returns the
key is lost and the key inserted in the hashtable is invalid.
Instead, insert a duplicated version on the heap.
This fixes a stack-buffer-overflow when running some Vulkan CTS tests.
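
The pattern can be sketched with a toy table that stores key pointers
rather than key copies. A linear array stands in for the real hash
table, and every name below is hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct cache_key {
   uint8_t sha1[20];
};

#define TABLE_CAPACITY 16

/* Toy table: like a pointer-keyed hash table, it keeps only the key
 * pointer, so the pointed-to key must outlive the entry.
 */
struct table {
   struct cache_key *keys[TABLE_CAPACITY];
   int values[TABLE_CAPACITY];
   unsigned count;
};

/* Insert a heap duplicate of the key, so callers may pass a stack key. */
bool
table_insert(struct table *t, const struct cache_key *key, int value)
{
   if (t->count == TABLE_CAPACITY)
      return false;
   struct cache_key *dup = malloc(sizeof(*dup));
   if (!dup)
      return false;
   memcpy(dup, key, sizeof(*dup));
   t->keys[t->count] = dup;
   t->values[t->count] = value;
   t->count++;
   return true;
}

bool
table_lookup(const struct table *t, const struct cache_key *key, int *value)
{
   for (unsigned i = 0; i < t->count; i++) {
      if (memcmp(t->keys[i], key, sizeof(*key)) == 0) {
         *value = t->values[i];
         return true;
      }
   }
   return false;
}

/* The table owns the duplicated keys and frees them on teardown. */
void
table_finish(struct table *t)
{
   for (unsigned i = 0; i < t->count; i++)
      free(t->keys[i]);
   t->count = 0;
}

/* The key lives on this function's stack and dies on return; the table
 * entry stays valid because table_insert() stored its own copy.
 */
bool
insert_with_stack_key(struct table *t, uint8_t seed, int value)
{
   struct cache_key key;
   memset(key.sha1, seed, sizeof(key.sha1));
   return table_insert(t, &key, value);
}
```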
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16083>
util_cpu_detect is an anti-pattern: it relies on callers high up in the call
chain initializing a local implementation detail. As a real example, I added:
...a Mali compiler unit test
...that called bi_imm_f16() to construct an FP16 immediate
...that calls _mesa_float_to_half internally
...that calls util_get_cpu_caps internally, but only on x86_64!
...that relies on util_cpu_detect having been called before.
As a consequence, this unit test:
...crashes on x86_64 with USE_X86_64_ASM set
...passes on every other architecture
...works on my local arm64 workstation and on my test board
...failed CI which runs on x86_64
...needed to have a random util_cpu_detect() call sprinkled in.
This is a bad design decision. It pollutes the tree with magic, it causes
mysterious CI failures especially for non-x86_64 developers, and it is not
justified by a micro-optimization.
Instead, let's call util_cpu_detect directly from util_get_cpu_caps, avoiding
the footgun where it fails to be called. This cleans up Mesa's design,
simplifies the tree, and avoids a class of (possibly platform-specific)
failures. To mitigate the added overhead, wrap it all in a (fast) atomic
load check and declare the whole thing as ATTRIBUTE_CONST so the
compiler will CSE calls to util_cpu_detect.
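
As an illustration of the "detect on first use behind an atomic load"
idea, here is a sketch in plain C11. The names and the caps struct are
hypothetical, the detection body is a placeholder, and the real change
may use different primitives for the one-time initialization:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical capability record; the fields are placeholders. */
struct cpu_caps {
   int nr_cpus;
   bool has_f16c;
};

static struct cpu_caps caps;
static atomic_int caps_state;   /* 0: not detected, 1: detecting, 2: ready */
static int detect_calls;        /* only here to show detection runs once */

/* Placeholder for the expensive, platform-specific probing. */
static void
cpu_detect(void)
{
   detect_calls++;
   caps.nr_cpus = 1;
   caps.has_f16c = false;
}

/* Callers never need to remember to run detection first. */
const struct cpu_caps *
get_cpu_caps(void)
{
   /* Fast path: a single acquire load once detection has completed. */
   if (atomic_load_explicit(&caps_state, memory_order_acquire) != 2) {
      int expected = 0;
      if (atomic_compare_exchange_strong(&caps_state, &expected, 1)) {
         /* We won the race: detect, then publish with release semantics. */
         cpu_detect();
         atomic_store_explicit(&caps_state, 2, memory_order_release);
      } else {
         /* Another thread is detecting: wait until the result is ready. */
         while (atomic_load_explicit(&caps_state, memory_order_acquire) != 2)
            ;
      }
   }
   return &caps;
}
```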
Co-authored-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15580>
Even if we're the first job on some queue, there may be no wait
semaphores but we still need to ensure things happen in-order. (See
the "Implicit Synchronization Guarantees" section of the Vulkan spec.)
The client can submit back-to-back command buffers with no semaphores
between them and it needs to act the same as if there were a semaphore.
If job->serialize is set because of a barrier or something, we still
need to synchronize across HW queues by waiting on last_job_syncs.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15704>
In order to properly wait for a query to be complete, we need to first
wait for the end query job to flush through on the queue. Since query
end is always handled on the CPU, we can do this with a condition
variable. The 2s timeout is taken from ANV.
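
A sketch of the wait, assuming POSIX threads and hypothetical names
(struct query_sync, query_mark_ended, query_wait_for_end). The CPU side
that processes the end-query job signals the condition variable, and
the waiter gives up after the timeout:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

#define QUERY_WAIT_TIMEOUT_NS 2000000000ull   /* 2s */

/* Hypothetical query state: set once the end-query job ran on the CPU. */
struct query {
   bool end_processed;
};

struct query_sync {
   pthread_mutex_t mutex;
   pthread_cond_t cond;
};

void
query_sync_init(struct query_sync *s)
{
   pthread_mutex_init(&s->mutex, NULL);
   pthread_cond_init(&s->cond, NULL);
}

/* Called by the CPU code that handles the end-query job. */
void
query_mark_ended(struct query_sync *s, struct query *q)
{
   pthread_mutex_lock(&s->mutex);
   q->end_processed = true;
   pthread_cond_broadcast(&s->cond);
   pthread_mutex_unlock(&s->mutex);
}

/* Returns true if the end-query job was processed before the timeout. */
bool
query_wait_for_end(struct query_sync *s, struct query *q, uint64_t timeout_ns)
{
   /* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline
    * for a default-initialized condition variable.
    */
   struct timespec deadline;
   clock_gettime(CLOCK_REALTIME, &deadline);
   deadline.tv_sec += (time_t)(timeout_ns / 1000000000ull);
   deadline.tv_nsec += (long)(timeout_ns % 1000000000ull);
   if (deadline.tv_nsec >= 1000000000L) {
      deadline.tv_sec++;
      deadline.tv_nsec -= 1000000000L;
   }

   pthread_mutex_lock(&s->mutex);
   int ret = 0;
   /* Loop to handle spurious wakeups; stop on timeout or any error. */
   while (!q->end_processed && ret == 0)
      ret = pthread_cond_timedwait(&s->cond, &s->mutex, &deadline);
   bool done = q->end_processed;
   pthread_mutex_unlock(&s->mutex);
   return done;
}
```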
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15704>
Instead of having the CPU job execute the CSD job, put both jobs on the
list, with the CPU job first. The CPU job modifies the GPU job, which
gets kicked off next. This gives the queue code more visibility into
what types of jobs are actually in the list. In particular, if an
indirect compute job is the last job in a batch buffer, it currently
appears as if the batch ends with CPU work, which isn't true because it
kicks off GPU work. With this change, the last job on the list is a GPU
job, which better matches reality.
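
The new ordering can be sketched as follows. The job struct, the enum
values and the helpers are hypothetical simplifications; in particular,
"submitting" a job here only sets a flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum job_type {
   JOB_TYPE_CPU_CSD_INDIRECT,   /* CPU work: read the indirect parameters */
   JOB_TYPE_GPU_CSD,            /* GPU work: the compute dispatch itself */
};

/* Hypothetical job record covering both job types. */
struct job {
   enum job_type type;
   uint32_t wg_counts[3];             /* GPU job: workgroup counts */
   const uint32_t *indirect_counts;   /* CPU job: indirect buffer contents */
   struct job *csd_job;               /* CPU job: GPU job to patch */
   bool submitted;
};

/* The CPU job only patches the GPU job; it no longer executes it. */
static void
run_cpu_csd_indirect(struct job *cpu)
{
   for (int i = 0; i < 3; i++)
      cpu->csd_job->wg_counts[i] = cpu->indirect_counts[i];
}

/* The queue walks the list in order, so the patch lands before the
 * dispatch is kicked off.
 */
void
queue_submit_jobs(struct job **jobs, unsigned count)
{
   for (unsigned i = 0; i < count; i++) {
      if (jobs[i]->type == JOB_TYPE_CPU_CSD_INDIRECT)
         run_cpu_csd_indirect(jobs[i]);
      jobs[i]->submitted = true;
   }
}

/* With both jobs on the list, the queue can see what the batch ends with. */
bool
batch_ends_with_gpu_work(struct job **jobs, unsigned count)
{
   return count > 0 && jobs[count - 1]->type == JOB_TYPE_GPU_CSD;
}
```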
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15704>
This will allow us to drop more GLSL IR code in the future once we
switch all drivers to NIR. It also removes the need for all drivers to
call this pass to remove indirect temps that may have been added during
the NIR varying linking lowering/optimisations.
This patch fixes some tests on i915, d3d12, lima and vc4.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15871>
The array is allocated for VkDrmFormatModifierPropertiesEXT, so
writing entries with type VkDrmFormatModifierProperties2EXT is
bogus.
It seems this was a mistake added with a change intended to get
rid of VK_OUTARRAY_MAKE, that changed the type of the write by
mistake.
Fixes: 56a2ccf058 ('v3dv: Stop using VK_OUTARRAY_MAKE()')
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15819>
We have been reconstructing/rematerializing uniforms for a while, but we
can do this in more scenarios, namely instructions whose result is
immutable along the execution of a shader across all channels.
By doing this we gain the capacity to eliminate TMU spills, which are
not only slower but can also make us drop to a fallback compilation
strategy.
Shader-db results show a small increase in instruction counts caused
by us now being able to choose preferential compiler strategies that
are intended to reduce TMU latency. In some cases, we are now also
able to avoid dropping thread counts:
total instructions in shared programs: 12658092 -> 12659245 (<.01%)
instructions in affected programs: 75812 -> 76965 (1.52%)
helped: 55
HURT: 107
total threads in shared programs: 416286 -> 416412 (0.03%)
threads in affected programs: 126 -> 252 (100.00%)
helped: 63
HURT: 0
total uniforms in shared programs: 3716916 -> 3716396 (-0.01%)
uniforms in affected programs: 19327 -> 18807 (-2.69%)
helped: 94
HURT: 50
total max-temps in shared programs: 2161796 -> 2161578 (-0.01%)
max-temps in affected programs: 3961 -> 3743 (-5.50%)
helped: 80
HURT: 24
total spills in shared programs: 3274 -> 3266 (-0.24%)
spills in affected programs: 98 -> 90 (-8.16%)
helped: 6
HURT: 0
total fills in shared programs: 4657 -> 4642 (-0.32%)
fills in affected programs: 130 -> 115 (-11.54%)
helped: 6
HURT: 0
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15710>