Return the modifier's score, which indicates the driver's preference for the
modifier relative to others. A higher score is better. Zero means
unsupported.
Intended to assist selection of a modifier from an externally provided list,
such as VkImageDrmFormatModifierListCreateInfoEXT.
v2:
- Rename anv_drm_format_mod_score to isl_drm_modifier_get_score.
- Squash all incremental changes to anv_drm_format_mod_score.
v3:
- Drop redundant 'unlikely'. (for nchery)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v2)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v3)
The code did not return error when VK_IMAGE_CREATE_DISJOINT_BIT was
incompatible with the other input params.
If the Vulkan spec forbids a set of input params for vkCreateImage,
but permits them for vkGetPhysicalDeviceImageFormatProperties2,
then vkGetPhysicalDeviceImageFormatProperties2 must reject those input
params with failure.
- v2: Clearer commit message.
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v2)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The general layout can be used for transfers, so we need to make sure
the vulkan driver knows. This will help the driver know when it needs to
flush caches.
While we're at it, also add shader-read, which is another access we use.
We should stop using that one ASAP, but for now this seems like the
right thing to do.
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7652>
This avoids redundant per-layer operations that are the same across
layers or that only need to do once. Namely:
- The sampler for the blit source is the same for all layers.
- The decision about whether we need to load TLB contents or not only
needs to be done once.
- Some command buffer state such as the pipeline, the viewport and the
scissor is the same for all layers and should only be bound once.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7651>
This is much faster than the blit fallback (which requires to upload
the linear buffer to a tiled image) and the CPU path.
A simple stress test involving 100 buffer to image copies of a
single layer image with 10 mipmap levels provides the following
results:
Path | Recording Time | Execution Time |
-------------------------------------------------|
Texel Buffer | 2.954s | 0.137s |
-------------------------------------------------|
Blit | 10.732s | 0.148s |
-------------------------------------------------|
CPU | 0.002s | 1.453s |
-------------------------------------------------|
So generally speaking, this texel buffer copy path is the fastest
of the paths that can do partial copies, however, the CPU path might
provide better results in cases where command buffer recording is
important to overall performance. This is probably the reason why
the CPU path seems to provide slightly better results for vkQuake2.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7651>
src0 and src1 were mixed leading to invalid varying indices. In order to
fix that properly, we first extend load_vary to pass the immediate index
through a dedicated field and add a special boolean. This way, we don't
have to make sure src0 always contains the index, and can instead match
the src numbering defined in ISA.xml.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7636>
mir_estimate_pressure often underestimates the register pressure,
letting too many registers be used for uniforms, causing RA to fail.
Mitigate this by demoting some uniforms back to explicit loads to free
up work registers if register allocation fails.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7616>
We now have NIR opt_large_constants support in place, so we can flip the
switch and get better optimization before lowering to a constant buffer,
but also avoid having constant data mixed in with the shader's uniforms,
which should lower CPU overhead on affected shaders.
Only a few shaders are affected (<.01% impact across shader-db), but for
those the impact is pretty big:
instructions in affected programs: 748 -> 639 (-14.57%)
nops in affected programs: 364 -> 284 (-21.98%)
non-nops in affected programs: 384 -> 355 (-7.55%)
mov in affected programs: 47 -> 27 (-42.55%)
cov in affected programs: 9 -> 6 (-33.33%)
dwords in affected programs: 932 -> 836 (-10.30%)
full in affected programs: 13 -> 14 (7.69%)
constlen in affected programs: 140 -> 64 (-54.29%)
(ss) in affected programs: 14 -> 15 (7.14%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5810>
Right now if the shader indirects on some large constant array, we see NIR
load_consts (usually from the const file) of its contents into general
registers, then indirection on the GPRs. This often results in register
allocation failures, as it's easy to go beyond the ~256 dwords of
registers per invocation.
By moving the large constants to a UBO, we can load an arbitrary number of
them. They also can be theoretically moved to the constant reg file (~2k
dwords), though you're unlikely to hit this path without an indirect load
on your large constant, and we don't yet let UBO indirect loads get moved
to constant regs.
This possibly won't work out right if we have 16-bit load_constants, but
without other MRs in flight we won't see 16-bit temps to be lowered to
this.
This allows 2 kerbal-space-program shaders to compile that previously
would fail, and fixes the new dEQP-VK and -GLES2 tests I wrote that
dynamically index a 40-element temporary array of float/vec2/vec3/vec4
with constant element initializers.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2789
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5810>
If you're loading a 32b word from the const file and doing a cov.u32u16
split to two 16bit values, we can't turn that into a reference of a 16-bit
float value directly from the constbuf, because the
CONSTANT_DEMOTION_ENABLE results in a f2f16 operation on the 32-bit value
that we didn't want.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5810>
cffdump looks at the following 4 instructions to decide if the shader has
*really* ended, so if we pack data after that (such as turnip's next
stage's shader), it might decode instructions that aren't really part of
the shader.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5810>
It's supposed to be ralloced -- there's not even a shader variant destroy
function for freeing, just ralloc_free() on the ir3_shader_variant or the
parent ir3_shader when you're done!
Fixes: f97acb4bb4 ("freedreno/ir3: disk-cache support")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5810>