These are I/O variables which are not going to be removed anyway.
However, get_variable_io_mask handles their location incorrectly.
Found using the GCC undefined behavior sanitizer.
Fixes the following error:
runtime error:
shift exponent 4294967258 is too large
for 64-bit type 'long unsigned int'
Closes: #5319
Fixes: cf5f8f55c3
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12719>
Most modern hardware needs the edge flag added as a hidden vertex input
and needs code added to the vertex shader to copy the input to an
output. Intel hardware is a little different. Gfx4 and Gfx5 hardware
works in the previously described mannter. Gfx6+ hardware needs the
edge flag as a specific vertex shader input, and that input is magically
processed by fixed-function hardware without need for extra shader code.
This flag signals only that the vertex shader input is needed. It would
be nice if we could decouple adding the vertex shader input from
generating the copy-to-output code, but that has proven to be
challenging. Not having that code causes other passes to want to
eliminate that shader input.
v2: Convert conditional to assertion. This pass is only called for
vertex shaders. Suggested by Ken.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12858>
Now that they're no longer ralloc'd, we have to be much more careful
about indirects. We have to make sure every time a source or
destination is overwritten, its indirect (if any) is freed. We also
have to choose a memory ownership convention for the rewrite functions.
Assuming that they will be called with the source from some other
instruction, we choose to always make a copy of the indirect (if any).
It's the responsibility of the caller to ensure its copy of the indirect
is freed.
Unfortunately, all this extra logic is going to make
nir_instr_rewrite/move_src/dest more expensive because they now have
all the logic of nir_src/dest_copy instead of a simple struct
assignment. Fortunately, the vast majority of rewrite calls are done by
nir_ssa_def_rewrite_uses which is an SSA-only fast-path.
Fixes: 879a569884 "nir: Switch from ralloc to malloc for NIR instructions."
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12884>
This code was being tricky with passing a mem_ctx instead of the shader,
then freeing the mem_ctx when the pass was done and all the parallel
copies had been removed from the shader. Use the right type for instr
creation and do a bit of manual list management to prepare the way for
non-ralloc NIR instrs.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11776>
Calling this lower pass twice in a row would cause spurious
set_vertex_and_primitive_count(0, undef) intrinsics after the proper
set_vertex_and_primitive_count intrinsic. This pretty much turns any
geometry shader into garbage.
Fix this by treating nir_intrinsic_emit_vertex_with_counter and
nir_intrinsic_end_primitive_with_counter just like the non-_with_counter
versions. If no blocks would need set_vertex_and_primitive_count
intrinsics added, exit the pass before doing any work. This prevents
the need for DCE to do extra clean up later.
Since this pass is potentially called multiple times via multiple
invocations of a finalize_nir callback, it is (hypothetically?) possible
that control flow could be changed to add new blocks that need this
intrinsic. The check implemented in this commit should be robust
against that possibility.
v2: Add a_block_needs_set_vertex_and_primitive_count. Suggested by
Timur.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12802>
It only makes sense to call this pass for fragment shaders, and the
first thing the pass does is read a FS-specific field out of a union,
so it isn't safe to call it for other shader stages.
We could make it early return, but instead we just assert, so that
drivers know to only call it when appropriate.
(A previous version of this patch, which early returned instead of
asserting, was Reviewed-by: Emma Anholt <emma@anholt.net> as well.)
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12839>
Driver like radeonsi load varying in a scalar manner, so prefer to pack
varying with different interpolation qualifier into same slot to save
space.
But driver like panfrost/bifrost can load varying in vector manner,
so prefer to pack varying with same interpolation qualifier.
Driver can add interpolation qualifiers which are able to be
packed into same varying slot to pack_varying_options nir option.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12537>
The lowered LS and NGG stages use local_invocation_index and they
can benefit from the unsigned upper bound because they can emit a
less expensive integer multiplication instruction.
This was working in the past, but accidentally borked by a refactor.
Fossil DB changes on Sienna Cichlid:
Totals from 956 (0.74% of 128647) affected shaders:
CodeSize: 2354172 -> 2344712 (-0.40%)
Instrs: 434359 -> 434327 (-0.01%)
Latency: 1883949 -> 1876814 (-0.38%)
InvThroughput: 762638 -> 757405 (-0.69%)
Fossil DB changes on Sienna Cichlid (with NGGC enabled):
Totals from 57873 (44.99% of 128647) affected shaders:
CodeSize: 155844192 -> 155607064 (-0.15%)
Instrs: 29799184 -> 29799152 (-0.00%)
Latency: 130959764 -> 130814224 (-0.11%); split: -0.11%, +0.00%
InvThroughput: 21100300 -> 20928635 (-0.81%); split: -0.81%, +0.00%
Fixes: 8af6766062
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12558>