In 2db8867943, we introduced a new meta-op MOV_FOR_SCRATCH which is
identical to MOV except it lets us identify MOVs emitted during spilling
so we know not to re-spill those instructions. We emit them from
shuffle_for_64bit_data whenever the new for_scratch parameter is true.
Unfortunately, we missed the one used for resolving swizzles.
Fixes: 2db8867943 "intel/vec4: Don't spill fp64 registers more..."
Tested-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11155>
Iris only runs on BDW+ and ANV already handles this by not even trying
on anything older than HSW. The only driver benefiting from this common
check is i965. Moving it out makes the pass more generic and if some
driver comes along which can push UBOs on IVB, it should work for that.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11145>
Fixes the following building error:
FAILED: out/target/product/x86_64/obj_x86/SHARED_LIBRARIES/i965_dri_intermediates/LINKED/i965_dri.so
...
ld.lld: error: undefined symbol: brw_compile_ff_gs_prog
>>> referenced by brw_ff_gs.c:56 (external/mesa/src/mesa/drivers/dri/i965/brw_ff_gs.c:56)
Fixes: 52e426fd8b ("intel/compiler: add support for compiling fixed function gs")
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10718>
A successful AHardwareBuffer_allocate itself will increase a refcount on
the newly allocated AHB. For the import case, the implementation must
acquire a reference on the AHB. So if we layer the exportable allocation
on top of AHB allocation and AHB import, we must release an AHB
reference to avoid leak.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10940>
On XeHP there are restrictions on types of source and destinations
with float types. As shuffle is implemented using MOV we need to make
sure we lower it to supported types.
This fixes tests like :
dEQP-VK.subgroups.arithmetic.framebuffer.subgroupexclusivemax_vec4_vertex
dEQP-VK.subgroups.arithmetic.framebuffer.subgroupexclusivemul_f16vec3_vertex
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Suggested-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10902>
Fixes perf regression introduced from tileY LID order for CS
shaders that access both textures and buffers. Walks LIDs in
X-major fashion, but with blocks of height 4. This maps LIDs per
HW thread for SIMD8/16/32 as (2x4/4x4/8x4), which is always good
for tileY resources and usually good for linear resources.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10733>
Computer shaders that access tileY resources (textures) benefit
from Y-locality accesses. Easiest way to implement this is walk
local ids in Y-major fashion, instead of X-major fashion. Y-major
local ids will reduce partial writes and increase cache locality
for tileY accesses since tileY resources cachelines progress in
Y direction.
Improves performance on TGL:
Borderlands3.dxvk-g2 +1.5%
Y-major can introduce a performance drop on CS that use mixture
of buffers and images. This should be fixed in next commit.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10733>
This is the vec4 equivalent of d0d039a4d3, required for proper UBO
pushing in vertex stages for Vulkan on HSW. Sadly, the implementation
requires us to do everything in ALIGN1 mode and the vec4 instruction
scheduler doesn't understand HW_GRF <-> UNIFORM interference so it's
easier to do the whole thing in the generator. We add an instruction
to the top of the program which just means "emit the blob" and all the
magic happens in codegen.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10571>
In order to avoid switching pull constants to push constants and then
having to back to pull, compute the push ranges up-front. This way we
know by the time we emit code exactly what ranges are pushable. This is
a bit inefficient in the case where the "normal" push constants get
compacted. However, most apps don't use giant piles of dead uniforms
combined with substantial UBO use so this should be ok.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10571>
The way we handle spilling for fp64 in vec4 is to emit a series of MOVs
which swizzles the data around and then a pair of 32-bit spills. This
works great except that the next time we go to pick a spill reg, the
compiler isn't smart enough to figure out that the register has already
been spilled. Normally we do this by looking at the sources of spill
instructions (or destinations of fills) but, because it's separated from
the actual value by a MOV, we can't see it. This commit adds a new
opcode VEC4_OPCODE_MOV_FOR_SCRATCH which is identical to MOV in
semantics except that it lets RA know not to spill again.
Fixes: 82c69426a5 "i965/vec4: support basic spilling of 64-bit registers"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10571>
The vec4 back-end can't push UBOs just yet but it soon will be able.
When it starts pushing UBOs, it will have a lower limit than scalar due
to a crummy register allocator. Mirror that limit in ANV so we don't
run into asserts due to ANV and the back-end making different choices.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10571>
A bunch of performance counters rely on register snapshots on top of
the OA reports. Those are already conditional to the query mode in the
equations :
availability="true $QueryMode &&"
This change allows to disable counters that are only available with
additional register snapshots. This will be useful if you only want to
OA reports to extract performance counter values.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Antonio Caggiano <antonio.caggiano@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10216>
We don't want to have to deal with vector phis in freedreno, because
vectors are always split/unsplit around vectorized instructions anyways,
and the stated reason for not scalarising them (it hurting coalescing)
won't apply to us because we won't be using nir_from_ssa. Add this
option so that we don't have to do the equivalent thing while
translating from NIR.
Reviewed-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10809>