To be helpful, the thing inside the fsat has to be used with and without
the fsat. Otherwise it just moves a saturate destination modifier
around. To not be harmful, the fsat has to only be used by the bcsel.
All Broadwell and newer Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 20174475 -> 20174449 (<.01%)
instructions in affected programs: 3913 -> 3887 (-0.66%)
helped: 13 / HURT: 0
total cycles in shared programs: 866844832 -> 866844719 (<.01%)
cycles in affected programs: 46037 -> 45924 (-0.25%)
helped: 10 / HURT: 1
All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 161491468 -> 161491372 (-0.0%)
helped: 31 / HURT: 8
Cycles in all programs: 10933090736 -> 10933024716 (-0.0%)
helped: 32 / HURT: 18
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22169>
`nir_create_passthrough_gs` will now take a boolean argument to decide
whether it needs to handle edgeflags.
When true is passed it will output a line strip where edges that
shouldn't be visible are not emitted.
This is usefull because geometry shaders will generally throw away
edgeflags so for a passthrough GS to act transparently it needs to emulate them.
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21238>
NIR i/o lowering and sysval lowering can handle the compact var fine at
this point.
Affects: nouveau, virgl, svga, radeonsi, r600, llvmpipe. Does not affect
PIPE_CAP_NIR_COMPACT_ARRAYS drivers like crocus, iris, d3d12, freedreno,
zink.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21940>
Mali's LD_TILE instruction (mapping to NIR's load_output) requires a "conversion
descriptor" specifying how to convert from the register foramt to the tilebuffer
format. To implement framebuffer fetch on OpenGL without shader variants, we
generate these descriptors in the driver and pass them in a uniform. However, to
comply with the Ekstrand Rule, we can't have magically materialized system
values -- they should come only from the NIR where the driver can lower as it
pleases (e.g. PanVK can lower to a constant because it knows the framebuffer
format at pipeline create time). Add intrinsics to model this.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20906>
We want to lower this in NIR instead of the backend IR to give the driver a
chance to lower the "is multisampled?" system value, which makes more sense to
do in NIR. This gets rid of one of the magic compiler materialized sysvals.
Plus, this will let us constant fold away the lowering in Vulkan when we know
that the pipeline is single-sampled / multi-sampled.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20906>
The mediump lowering tests are important for poking at the lowering pass
behavior, since you can't really assert the behavior in any given driver,
given that the GLSL spec allows any mediump op to be done in highp.
But, in hacking on mediump lowering, I wanted several things that the old
test couldn't do:
- Be able to assert about the actual NIR code we expect to generate for a
hypothetical driver (important if other compiler stages might do invalid
transformations like eliminating highp temps, or if we were to move the
lowering after GLSL IR)
- Run faster (gtest unit tests rather than python forking off the standalone
glsl compiler per testcase).
- Express expectations with a lot less escaping of typical syntax.
- High-quality logs for displaying failures.
This new test does all of that, I think, though I haven't converted all of
the unit tests over yet. In converting, I dropped some of the
combinatorial explosion for float/int variations, instead only doing so
when it gets at some different code path (default precision flags). I've
also included some new tests I wrote in the process of writing my proposed
gl_nir mediump lowering.
Even if the conversion isn't complete, getting these tests to run faster
is probably a good idea on its own, for anyone iterating running Mesa's
unit tests (80 tests in 25ms, compared to 109 tests in 1.5s!).
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21886>
bitCount is still special in that our lowering would try to demote its arg
based on the precision of its output, and it shouldn't do that. But the
other special cases now have appropriate qualifiers on them at the IR
level.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21666>
These have precision qualifiers defined in the spec, in which case we
should emit them them while generating builtin signatures and code. We've
been special-casing them in GLSL lower_precision, but now we can just rely
on the precision qualifier of the builtin if non-NONE.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21666>
No need to generate a temp in this case. Cleanup I noticed while looking
at lower_precision behavior (and I've included a testcase to sanity check
that things work out).
This causes a tiny amount of scheduling change on freedreno:
total instructions in shared programs: 11010012 -> 11010012 (0.00%)
instructions in affected programs: 147 -> 147 (0.00%)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21666>