rather than always early killing and then hitting pathological shuffle
situations, only early-kill when we can prove that we won't need to shuffle. it
turns out that's most of the time.
even with this heuristic, we still get hurt bad in shader-db due to extra moves.
but hopefully, the #s here are small enough that we can move on with our lives
and fix this source of known unsoundness.
this is tagged for backport as it's needed to avoid a perf regression with the
previous patch.
combined stats from this commit and the previous commit:
total instrs in shared programs: 2846065 -> 2852257 (0.22%)
instrs in affected programs: 618734 -> 624926 (1.00%)
total alu in shared programs: 2329477 -> 2335534 (0.26%)
alu in affected programs: 508119 -> 514176 (1.19%)
total gprs in shared programs: 894762 -> 901327 (0.73%)
gprs in affected programs: 36946 -> 43511 (17.77%)
Backport-to: 25.1
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34595>
shader-db stats combined with next commit. this is the rip off the bandaid, next
is the optimize. split to enable bisecting.
the code we have to shuffle clobbered killed sources is broken and, after
thinking about that for a Long time, I don't see a reasonable way to fix it. But
if we late-kill sources - or model our calculations as-if we were late-killing
souces - we never have to shuffle onto a killed source and the problem goes away
entirely.
this is similar in spirit to what NAK does. it's not "optimal", but it's sane.
Backport-to: 25.1
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34595>
This hurts us in two ways:
* slightly more spilling (not actually a big problem)
* slightly worse occupancy (the shaders that are "helped" here are from trying
less hard to fit at higher occupancy levels)
However, in exchange we get a LOT more flexibility in the RA.
total instrs in shared programs: 2847015 -> 2846065 (-0.03%)
instrs in affected programs: 84134 -> 83184 (-1.13%)
total alu in shared programs: 2330406 -> 2329477 (-0.04%)
alu in affected programs: 62305 -> 61376 (-1.49%)
total code size in shared programs: 20497326 -> 20491690 (-0.03%)
code size in affected programs: 586664 -> 581028 (-0.96%)
total gprs in shared programs: 894202 -> 894762 (0.06%)
gprs in affected programs: 8900 -> 9460 (6.29%)
total scratch in shared programs: 13292 -> 13304 (0.09%)
scratch in affected programs: 2924 -> 2936 (0.41%)
total threads in shared programs: 27819712 -> 27814272 (-0.02%)
threads in affected programs: 55296 -> 49856 (-9.84%)
total spills in shared programs: 907 -> 914 (0.77%)
spills in affected programs: 419 -> 426 (1.67%)
total fills in shared programs: 857 -> 862 (0.58%)
fills in affected programs: 389 -> 394 (1.29%)
Backport-to: 25.1
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34595>
FS tex prefetch reads tex coords from r0.x, and it doesn't care
what interpolation they have. Thus we can allow all interpolations
which HLSQ_CONTROL_3_REG/HLSQ_CONTROL_4_REG support. Which would
be: (pixel, centroid, sample) x (nopersp, persp). So all but FLAT
are supported.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34422>
The intention here, long ago, was to ensure that any symbols the DRI
driver needed from libGL were available. We don't have this problem
anymore, libgallium does not import any symbols from the GLX frontend
(or any other one for that matter).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30417>
Since MR !33891 EGL only supports a software driver (LLVM).
Routine dri3_x11_connect at
src/egl/drivers/dri2/platform_x11.c fails if DRI3 is not
available. So at that location variable *allow_dri2 should be set.
Looking at the major codition, we see it is not executed
if LIBGL_DRI3_DISABLE is set. In that case the hardware driver
is activated as desired. Previously this was not needed.
Also it is not practical, and not necessary.
I do not understand the major condition, so I did not change it.
This causes some duplicate coding.
Fixes: 323bad6b18 ("egl/x11: split out dri2 init entirely")
Signed-off-by: GKraats <vd.kraats@hccnet.nl>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34530>
Allocate space for the aliased region first, then allocate the
non-Aliased blocks in sequence after that.
SPV_KHR_workgroup_memory_explicit_layout extension added support for
having Blocks of workgroup (shared) memory, which include layout
decoration. For that extension all such blocks must be decorated with
Aliased.
SPV_KHR_untyped_pointers extension lifts that requirement, allowing
blocks that don't alias in workgroup memory. They are still explicitly
laid out.
The motivation is that untyped pointers provide a different
mechanism to obtain the same effect as the Aliased blocks. Instead of
having two Aliased variables with different types, have a single
variable and use an untyped pointer with a different type to access it.
This patch is a preparation for supporting untyped pointers.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34139>
Move the calculation to nir_lower_vars_to_explicit_types(). This
consolidates the check of shader_info::shared_memory_explicit_layout
in a single place instead of in all drivers.
This is motivated by SPV_KHR_untyped_pointers. Before that extension
we had essentially two modes for shared memory variables
- No layout decorations in the SPIR-V, and both internal layout and
driver location was _given by the driver_.
- Explicitly laid out, i.e. they are blocks, and decorated with Aliased.
Because they all alias, we could assign them driver location directly
to the start of the shared memory.
With the untyped pointers extension, there's a third option, to be added
by a later commit
- Explicitly laid out, i.e. they are blocks, and NOT decorated
with Aliased. Driver location is _given by the driver_. Blocks
with and without Aliased can be mixed.
The driver location of multiple blocks that don't alias depend on
alignment that is driver-specific, which we can more easily do from
the nir_lower_vars_to_explicit_types() that already has access to
a function to obtain such value.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> (hk)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> (v3dv)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (anv/hasvk)
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> (panvk)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> (radv)
Reviewed-by: Rob Clark <robdclark@gmail.com> (tu)
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34139>