Enabling this fast path, while making CPU side simpler
(and thus reducing driver's CPU overhead), forces us to
generate multiple VERTEX_BUFFER_STATE packets pointing to
the same buffer, but with slightly shifted BufferStartingAddress.
This is fine on GFX version >= 11 and < 8, but on version
8 and 9, thanks to internal details of the VF cache, vertex
data from each VERTEX_BUFFER_STATE is cached separately
and this results in a lot of cache misses.
Disabling this fast path restores previous performance levels.
On my SKL GT2 machine this improves performance in:
- GLB27 Egypt offscreen by 37%
- GLB27 TRex offscreen by 22%
- gfxbench5 Manhattan offscreen by 1.75%
- gfxbench5 Manhattan31 offscreen by 1.9%
- gfxbench5 Aztec Ruins high by 2.3%
In Intel performance CI on GFX version 9 it improves performance in:
- gfxbench5 Manhattan offscreen by 1.65%
- gfxbench5 Aztec Ruins normal offscreen by 1.33%
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3277
Fixes: 42842306d3 ("mesa,st/mesa: add a fast path for non-static VAOs")
Signed-off-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9857>
In some cases we will change the type of the destination register of
an instruction. This is the type we should use to verify that we're
allow to do the replacement.
Otherwise we can hit restrictions on CHV and upcoming Xe-Hp for
instance where the copy propagation transforms this :
send(16) (mlen: 2) vgrf10:UD, 0u, 0u, vgrf35:D, null:UD
mov(16) vgrf11:UW, vgrf10<2>:UW
mov(16) vgrf12:UW, vgrf10+0.2<2>:UW
mov(16) vgrf15:HF, |vgrf11|:HF
mov(16) vgrf16:HF, |vgrf12|:HF
mov(8) vgrf41<2>:UW, vgrf15+0.0:UW group0
mov(8) vgrf42<2>:UW, vgrf15+0.16:UW group8
mov(8) vgrf45<2>:UW, vgrf16+0.0:UW group0
mov(8) vgrf46<2>:UW, vgrf16+0.16:UW group8
into this :
send(16) (mlen: 2) vgrf10:UD, 0u, 0u, vgrf35:D, null:UD
mov(8) vgrf41<2>:HF, |vgrf10+0.0|<2>:HF group0
mov(8) vgrf42<2>:HF, |vgrf10+1.0|<2>:HF group8
mov(8) vgrf45<2>:HF, |vgrf10+0.2|<2>:HF group0
mov(8) vgrf46<2>:HF, |vgrf10+1.2|<2>:HF group8
Because of the floating point use, stride and offets should be the
same.
v2: Fix final destination type selection (Curro)
v3: constify (Curro)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9832>
Otherwise pass->tile_align_w will be 0, leading to a divide by zero and
undefined behavior. In practice, I saw this lead to an infinite loop in
tests like
dEQP-VK.draw.instanced.draw_indexed_indirect_vk_primitive_topology_line_list_attrib_divisor_0_multiview
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9606>
A little low-stakes RE effort as I unwind from fighting CI all day. Comes
from diffing dEQP-VK.clipping.user_defined.clip_distance.vert.* on the
blob and comparing to a6xx behavior. (My blob doesn't do tess, so if
there are equivalent tess fields for some of these, I didn't find them)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9870>
They are not supported by the hardware.
Fixes:
dEQP-GLES2.functional.vertex_arrays.single_attribute.strides.buffer_0_32_short3_vec4_dynamic_draw_quads_1
dEQP-GLES2.functional.vertex_arrays.single_attribute.strides.buffer_0_32_short3_vec4_dynamic_draw_quads_256
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9877>
The indirect draw call already encodes the index bias so that no
additional encoding in the hardware is needed in this case.
This fixes a regression with a number of tests from
dEQP-GLES31.functional.draw_indirect.random.*
Fixes: c6c532faa8
"gallium/u_vbuf: use updated pipe_draw_start_count while using draw_vbo"
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9877>
It seems that this has to be 4 (radeonsi also sets this value to 4)
Fixes deqp-gles31:
functional.texture.texture_buffer.modify.bufferdata.offset_1_alignments
functional.texture.texture_buffer.modify.bufferdata.offset_7_alignments
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9877>
We need to check against the correct define, DRM_FORMAT_INVALID.
Checking an uint32_t against DRM_FORMAT_MOD_INVALID will always produce
false anyway.
Caught by Coverity.
Fixes: b85c6531f6 ("frontends/va: add support for VA_EXPORT_SURFACE_COMPOSED_LAYERS")
Reviewed-by: Simon Ser <contact@emersion.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9897>
pan_blend_shaders.{c,h} files are removed from Makefile.sources
in order to avoid the following building error:
FAILED: out/target/product/x86_64/obj_x86/STATIC_LIBRARIES/libmesa_pipe_panfrost_intermediates/pan_blend_shaders.o
...
clang: error: no such file or directory: 'external/mesa/src/gallium/drivers/panfrost/pan_blend_shaders.c'
clang: error: no input files
Fixes: 7994929e84 ("panfrost: Use the blend shader cache attached to the device")
Acked-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9894>
While looking at the traces emitted by chromium, I saw a lot of those
errors. But looking at the value of ns, it is 0, so it's probably just
the application checking whether work is done or not. Not much point
in printing out an error.
v2: check ret value (Christian)
check both timeout codes (Lionel)
Signed-off-by: Lionel Landwerlin <llandwerlin@gmail.com>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9875>
This changes how the L3 cache affinity code works out the affinity
masks. It works better with multi-CPU systems and should also be
capable of handling big/little type situations if they appear in
the future.
It now iterates over all CPU cores, gets the core count for each
CPU, and works out the L3_ID from the physical CPU ID, and
the current cores L3 cache. It then tracks how many L3 caches
it has seen and reallocate the affinity masks for each one.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/4496
Fixes: d8ea509965 ("util: completely rewrite and do AMD Zen L3 cache pinning correctly")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9782>