This hack in glsl_to_nir() to clean up after the glsl ir linker should
no longer be reachable. These type of linking opts are now done via
a nir based linker long after GLSL IR has been coverted to nir by
this pass.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19104>
This failed to take fabs of the first component, implementing an unintended
formula that would return the right results in some common cases but is wrong in
general:
max { x, |y|, |z| }
instead of the intended
max { |x|, |y|, |z| }
Reexpress the implementation to make correctness obvious.
Fixes: 272e927d0e ("nir/spirv: initial handling of OpenCL.std extension opcodes")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18754>
The loop over sources has to happen for every instruction, regardless of whether
we also need to register allocate the destination. The other source loops handle
this properly, but this one was missed.
Fixes spilling failure in shaders/android/angle/aztec_ruins/16.shader_test when
the input NIR is shuffled a bit (from reordering passes).
Fixes: 129d390bd8 ("pan/mdg: Fix bound setting in RA for sources")
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19093>
The register file on Midgard is not large enough to sustain 256 threads in a
threadgroup when all ISA-defined registers are used. As such, we want
to advertise the smallest MAX_THREADS_PER_BLOCK permissible by the spec to
avoid compiling shaders that will necessarily spill. The minimum-maximum in
OpenGL ES 3.1 is 128, so set that on Midgard.
6 compute shaders LOST in shader-db due to exceeding this new limit. These
shaders would fault if they were attempted to be executed.
Cc: mesa-stable
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19092>
If the pipeline was created with the creation flags
VK_PIPELINE_CREATE_CAPTURE_STATISTICS_BIT_KHR or
VK_PIPELINE_CREATE_CAPTURE_INTERNAL_REPRESENTATIONS_BIT_KHR it is
really likely that methods from VK_KHR_pipeline_executable_properties
that would require having access to the qpu insts around will be
called.
Instead of getting those back from the BO where we upload them, we
just keep them around. This could require more host memory, but would
allow us to avoid needing to handle map/unmap the BO when needed (so
needing the host memory in any case). This can be tricky if those
methods are being called from different threads (so we can avoid
adding a mutex there).
In the same way, if the pipeline was not created with those flags, we
skip collecting data that requires the QPU. Only
GetPipelineExecutableProperties is allowed to be called without any of
those flags, and doesn't require that info.
This fixes a race condition crash at GetPipelineExecutableProperties
when using fossilize-replay with some fossils with several shaders,
and using several threads, as some thread would be unmapping the bo
before other thread stopped to use it.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18859>
KHR_debug provides the same functionality but this extension is
still in use and adding support for it seems fairly harmless.
For example its used by Unity and without it we keep
getting given apitraces from Unity games that just spew out
unsupported function errors due to the missing support.
Acked-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19029>
Turn -DWINDOWS_NO_FUTEX to be pre_args for not need add direct dependencies
to dep_futex for libraries and executables.
So only add dependencies to idep_mesautil is enough.
And this will make sure all source code are either using Windows futex,
or use mtx_t consistently across different sources, other than mixed usage of
futex and mtx_t before this commit.
If -DWINDOWS_NO_FUTEX is not globally available, that would cause
/src/util/simple_mtx.h:116: undefined reference to `futex_wait'
This error is raised when
* compiled with -D min-windows-version=7
* moved futex_wait from futex.h to futex.c
* used simple_mtx_t in more codes
Or linkage error:
src/compiler/libcompiler.a.p/glsl_types.cpp.obj: in function `futex_wake':
/../../src/util/futex.h:154: undefined reference to `WaitOnAddress'
When:
* compiled with -D min-windows-version=7
* used simple_mtx_t in more codes
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7494
Fixes: c002bbeb2f ("util: Add a Win32 futex impl")
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19087>
This reverts commit e749f67f89, which added a CAP
to support drivers that can only do upside-down point coordinates. That was
added specifically for Asahi, since Metal's point coordinate convention is
opposite Mesa's. Since then, additional reverse-engineering aided by the PowerVR
headers led me to the bit doing the flip in hardware, so Asahi does not use the
CAP since baadc1ec13 ("asahi: Don't use lower_wpos_pntc"). Garbage collect it.
[If it's needed for future hardware, we can revive it. But the plan is Vulkan
anyway.]
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Suggested-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19078>
If there is a single use of fmul, and that single use is fadd, it makes
sense to fuse ffma, as we already do. However, if there are multiple
uses, fusing may impede code gen. Consider the source fragment:
a = fmul(x, y)
b = fadd(a, z)
c = fmin(a, t)
d = fmax(b, c)
The fmul has two uses. The current ffma fusing is greedy and will
produce the following "optimized" code.
a = fmul(x, y)
b = ffma(x, y, z)
c = fmin(a, t)
d = fmax(b, c)
Actually, this code is worse! Instead of 1 fmul + 1 fadd, we now have 1
fmul + 1 ffma. In effect, two multiplies (and a fused add) instead of
one multiply and an add. Depending on the ISA, that could impede
scheduling or increase code size. It can also increase register
pressure, extending the live range.
It's tempting to gate on is_used_once, but that would hurt in cases
where we really do fuse everything, e.g.:
a = fmul(x, y)
b = fadd(a, z)
c = fadd(a, t)
For ISAs that fuse ffma, we expect that 2 ffma is faster than 1 fmul + 2
fadd. So what we really want is to fuse ffma iff the fmul will get
deleted. That occurs iff all uses of the fmul are fadd and will
themselves get fused to ffma, leaving fmul to get dead code eliminated.
That's easy to implement with a new NIR search helper, checking that all
uses are fadd.
shader-db results on Mali-G57 [open shader-db + subset of closed]:
total instructions in shared programs: 179491 -> 178991 (-0.28%)
instructions in affected programs: 36862 -> 36362 (-1.36%)
helped: 190
HURT: 27
total cycles in shared programs: 10573.20 -> 10571.75 (-0.01%)
cycles in affected programs: 72.02 -> 70.56 (-2.02%)
helped: 28
HURT: 1
total fma in shared programs: 1590.47 -> 1582.61 (-0.49%)
fma in affected programs: 319.95 -> 312.09 (-2.46%)
helped: 194
HURT: 1
total cvt in shared programs: 812.98 -> 813.03 (<.01%)
cvt in affected programs: 118.53 -> 118.58 (0.04%)
helped: 65
HURT: 81
total quadwords in shared programs: 98968 -> 98840 (-0.13%)
quadwords in affected programs: 2960 -> 2832 (-4.32%)
helped: 20
HURT: 4
total threads in shared programs: 4693 -> 4697 (0.09%)
threads in affected programs: 4 -> 8 (100.00%)
helped: 4
HURT: 0
v2: Update trace checksums for virgl due to numerical differences.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18814>