We were assertion failing on some large draws due to indices >16bits,
despite asking draw to limit the max indices. I haven't managed to track
it down, so flip us back to the older, non-index drawing path that doesn't
hit this bug until it can get fixed. Leave an I915_DEBUG=vbuf flag around
so we can look into this later.
This is a pretty big performance hit for vertex shaders. Using glmark2 -b
build:use-vbo=true:
i915g-vbuf: 211 fps
i915g-nonvbuf: 185 fps
i915c: 41 fps
Given how massively better i915g still is than i915c (llvmpipe VS instead
of the classic swrast interpreter), I think it's still worth it to get
i915g correct before we fix this perf regression.
Fixes: #4971
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13052>
The common code fails dEQP-VK.wsi.display_control.register_device_event
due to having a stub NOT_IMPLEMENTED return, and thus fails the CTS. This
is one of our last failures, so disable the extension until it can get
finished off, so we can unblock passing the CTS.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13010>
This patch allows to form clauses even if the register pressure
is at the limit with the effect that VMEM instructions are less
scattered after the first clause in a Block.
It respects the previous clause size to avoid excessive moving
of VMEM instructions.
VMEM_CLAUSE_MAX_GRAB_DIST is further reduced to compensate
some of the effects.
Totals from 28922 (19.26% of 150170) affected shaders: (GFX10.3)
VGPRs: 1546568 -> 1523072 (-1.52%); split: -1.52%, +0.00%
CodeSize: 117524892 -> 117510288 (-0.01%); split: -0.08%, +0.07%
MaxWaves: 605554 -> 611120 (+0.92%)
Instrs: 22292568 -> 22291927 (-0.00%); split: -0.10%, +0.09%
Latency: 488975399 -> 490230904 (+0.26%); split: -0.06%, +0.32%
InvThroughput: 117842300 -> 116521653 (-1.12%); split: -1.15%, +0.03%
VClause: 541550 -> 522464 (-3.52%); split: -9.73%, +6.20%
SClause: 718185 -> 718298 (+0.02%); split: -0.00%, +0.02%
Copies: 1420603 -> 1386949 (-2.37%); split: -2.64%, +0.27%
Branches: 559559 -> 559278 (-0.05%); split: -0.06%, +0.01%
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10896>
Fixes several problems in the pan_blit() logic:
1. We actually need the reciprocal of the depth scaling in z_scale (maybe
we should rename this field z_scale_rcp to make it clear)
2. When Z end < Z start we should remove one to the cur_layer/layer_offset
instead of doing it on the last_layer field, otherwise there's an
off-by-one error
3. The Z src offset should be adjusted to account for scaling. If we don't
do that we won't sample from the right layer when upscaling.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12961>
Bit 12 of render->aux1 is GL_CCW/GL_CW. For GL_CCW (default of glFrontFace) we have
to set that bit active.
This is not what the blob does and what the original reverse engineering documentation
says. The blob sets this value inverted and does some bogus negation of the fragment
shaders gl_FrontFacing variable instead.
Anyway, doing it this way does not cause regressions but fixes
dEQP-GLES2.functional.shaders.builtin_variable.frontfacing and 4 piglit tests.
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Andreas Baierl <ichgeh@imkreisrum.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7690>
Again if you get passed an invoc but the exec mask has the
active lane somewhere other than at 0, then if we have an
invoc we should find the active lane and extract the value
from invoc rather than using the idx.
This fixes a bunch of VK 1.2 subgroup tests once 1.2 is enabled:
dEQP-VK.subgroups.ballot_broadcast.compute.subgroupbroadcast_nonconst*
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12953>
this works by tracking 1024-member arrays of images and textures using idalloc
for indexing. each idalloc id is an index into the array that is returned as a handle,
and this handle is used to index into the array in shaders.
in the driver, VK_EXT_descriptor_indexing features are used to enable updates on the live
bindless descriptor set and leave unused members of the arrays unbound, which works as
long as no member is updated while it is in use. to avoid this, idalloc ids must cycle through
a batch once the image/texture handle is destroyed before being returned to the available pool
in shaders, bindless ops come in one of two types:
- i/o variables
- bindless instructions
for i/o, the image/texture variables have to be rewritten back to the integer
handles which represent them so that the successive shader stage utilizing them
can perform the indexing
for instructions, the src representing the image/texture has to be rewritten as a deref
into the bindless image/texture array
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12855>
this requires allocating and using a lazily-allocated msaa surface as a transient
attachment for the base render operation, resolving it
...except vulkan has no "replicate" renderpass attachment mechanism, so for now we're
just gonna allocate a transient surface and hang on to it. sorry tilers!
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12934>