This is based on Timur Kristóf's code, but there are a lot of differences.
The idea is that it doesn't just compute an intersection between a point
and a triangle. It computes the *distance* between a point and a triangle
and it does so in screen space. It accurately takes the subpixel precision
of the rasterizer into account, so that it works optimally at all
resolutions, all MSAA modes, and all quant modes.
The distance is only an approximation because the computation considers
only the infinite lines going through the triangle edges. However, it
seems to be more than sufficient in practice because the existing
rounding-based small prim culling compensates for it.
The performance improvement is up to 10% in some geometry-bound tests,
though targeted microbenchmarks can show a lot more than that.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33361>
Migrate the panfrost-g52-vk job to mt8186-corsola-steelix-sku131072, a
new device in LAVA. This DUT is faster than the Khadas VIM3 device it
replaces, and since more of these devices are available in LAVA, reduce
the DEQP_FRACTION and increase the parallelism for the pre-merge job.
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33996>
We currently just assume that textureCompressionETC2 and
textureCompressionASTC_LDR are always supported. And while that's true
for all the G52s, G610s and G310s we've seen out in the wild, it's not
guaranteed to be true. An SoC vendor might disable support for one of
these formats.
So let's check properly, just for good measure.
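A minimal sketch of the idea, assuming the hardware reports its
supported compressed formats as a bitmask. The bit assignments and all
example_* names are hypothetical and are not the driver's actual
definitions:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit assignments, for illustration only. */
#define EXAMPLE_FMT_ETC2     (1u << 0)
#define EXAMPLE_FMT_ASTC_LDR (1u << 1)

struct example_features {
   bool textureCompressionETC2;
   bool textureCompressionASTC_LDR;
};

/* Derive the feature bits from what the hardware reports instead of
 * hard-coding them to true. */
static struct example_features
example_get_features(uint32_t supported_formats)
{
   return (struct example_features){
      .textureCompressionETC2 =
         (supported_formats & EXAMPLE_FMT_ETC2) != 0,
      .textureCompressionASTC_LDR =
         (supported_formats & EXAMPLE_FMT_ASTC_LDR) != 0,
   };
}
```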
Fixes: d970fe2e9d ("panfrost: Add a Vulkan driver for Midgard/Bifrost GPUs")
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34206>
The panfrost_emit_plane function expects an array, and Coverity
complains about passing a pointer instead. Yeah, that's a bit nit-picky,
but it's easy enough to use an actual array here instead of trying to
fudge it.
This should be a non-functional change.
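A minimal sketch of the pattern, with hypothetical example_* names
rather than the real panfrost_emit_plane signature. The callee indexes
its argument as an array, so the caller declares a one-element array
instead of taking the address of a single object:

```c
#include <stdint.h>

struct example_plane {
   uint32_t stride;
};

/* The callee treats its argument as an array and indexes into it. */
static uint32_t
example_sum_strides(const struct example_plane planes[], unsigned count)
{
   uint32_t sum = 0;

   for (unsigned i = 0; i < count; i++)
      sum += planes[i].stride;

   return sum;
}

static uint32_t
example_emit_single(uint32_t stride)
{
   /* Passing the address of a single struct with count == 1 would also
    * be valid C, but a static analyzer may flag it as treating a lone
    * object as an array. A one-element array sidesteps that. */
   struct example_plane planes[1] = { { .stride = stride } };

   return example_sum_strides(planes, 1);
}
```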
CID: 1636773, 1636744
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34156>
quick_shader is for some reason now excruciatingly slow on the Microsoft
runners, but fine on the GStreamer runners. Until we can figure out why
this is happening - 27min runtimes instead of 3min - just keep them over
there.
Dozen is miraculously unaffected.
Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34280>
Buffer resources are quite special as they are only one dimensional,
always linear, don't have miplevels or array slices, never have a
texture or render compatible sibling, and don't ever use TS.
The gallium context interface acknowledges this fact by providing
separate entry points for buffer maps/unmaps/flushes.
Provide a specialized etna_buffer_resource as a much more lightweight
alternative to the full-blown etna_resource and implement buffer
maps/unmaps in the same straightforward, direct map manner that is
hidden inside all the tiling, TS and resource sibling handling in
etna_transfer_map/unmap. It is expected that further map optimizations
can be added on top of this simple implementation much more easily
than in the merged buffer/texture transfer code.
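A minimal sketch of what a direct buffer map can look like once tiling,
TS and sibling handling are out of the picture. The struct layout and
the example_* names are hypothetical and much simpler than the real
etna_buffer_resource:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lightweight buffer: just linear storage and a size. */
struct example_buffer_resource {
   uint8_t *cpu_ptr; /* CPU mapping of the backing storage */
   size_t size;      /* size of the buffer in bytes */
};

/* Direct map: bounds-check the range and return a pointer into the
 * linear storage. No staging copy or layout conversion is needed. */
static void *
example_buffer_map(struct example_buffer_resource *buf,
                   size_t offset, size_t size)
{
   if (offset > buf->size || size > buf->size - offset)
      return NULL;

   return buf->cpu_ptr + offset;
}
```

Synchronization with the GPU is deliberately left out of this sketch.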
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34061>
A begin/end sequence looks something like this (it's all macro-based):
    radeon_begin(cs);
    radeon_emit(PKT3(PKT3_DRAW_INDEX_AUTO, 1, cmd_buffer->state.predicating));
    radeon_emit(vertex_count);
    radeon_emit(V_0287F0_DI_SRC_SEL_AUTO_INDEX | use_opaque);
    radeon_end();
This is loosely based on RadeonSI (see !8653 (a0978fff)) and it does
indeed seem to be faster overall.
The main goal of this rework is to re-use the same logic as RadeonSI
for paired packets on GFX12 (also GFX11 dGPUs) because it's supposed
to be way faster, especially on GFX12 where the CP is slow. The other
goal is to share more cmdbuf emission between both drivers in the near
future.
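A minimal sketch of how begin/emit/end macros of this kind can be
built, assuming the usual trick of caching the write cursor in local
variables and storing it back once at the end. The example_* names are
hypothetical and the real macros do more than this:

```c
#include <stdint.h>

struct example_cmdbuf {
   uint32_t *buf; /* dword storage */
   unsigned cdw;  /* number of dwords written so far */
};

/* Cache the buffer pointer and write cursor in locals so the compiler
 * can keep them in registers across consecutive emits. */
#define example_begin(cs)                        \
   struct example_cmdbuf *example_cs_ = (cs);    \
   uint32_t *example_buf_ = example_cs_->buf;    \
   unsigned example_num_ = example_cs_->cdw

/* Write one dword and advance the local cursor. */
#define example_emit(value) (example_buf_[example_num_++] = (value))

/* Store the local cursor back into the command buffer. */
#define example_end()                            \
   do {                                          \
      example_cs_->cdw = example_num_;           \
   } while (0)

/* The caller must have reserved enough space; this sketch does no
 * capacity checking. */
static void
example_emit_draw(struct example_cmdbuf *cs, uint32_t header,
                  uint32_t vertex_count, uint32_t src_sel)
{
   example_begin(cs);
   example_emit(header);
   example_emit(vertex_count);
   example_emit(src_sel);
   example_end();
}
```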
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34229>
This is probably a little more code but we're about to add real data for
Turing+ so it's better to have things contained like this. Since Volta
and earlier will always remain hacks, we might as well have those hacks
in the per-SM files rather than pretending we have a general thing in
sched_common.rs.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34302>
We're about to add instruction latencies which are going to be in their
own files because they're also massive. This makes things follow a bit
more of a module structure where sm70.rs is the thing that ties it all
together, sm70_encode.rs is the encoder, and smXX_instr_latencies.rs
will be the individual latency files.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34302>