If a deref_cast parent is also a deref, and the parent mode is not
generic, we can safely propagate the parent mode down to the
deref_cast.
This will allow the fixup to work when the deref_cast is used just
to change the glsl_type when accessing a variable/deref-chain.
Reviewed-by: Faith Ekstrand <faith@gfxstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23734>
In 9ffd00bcf1 ("nir_to_tgsi: Pack our tex coords into vec4
nir_tex_src_backend[12]"), Emma added a pair of back-end sources to
nir_tex_instr to allow complex lowering to be done in NIR. This adds a
tiny bit more hw-specific back-end information that a NIR lowering pass
can communicate to the back-end compiler.
While the opcode contains most of the information needed, some thing
such as the presence of offsets is currently only communicated via the
presence of specific source types in the source list. This information
is gone when the texture instruction is lowered to back-end sources.
Adding a backend_flags field fixes this by allowing the lowering pass to
communicate a small amount of side-band information if needed.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22303>
nir_const_value_for_int asserts signed bounds on the input, but we pass in an
unsigned value that would be out-of-bounds for 32-bit channels, causing the
assert to fail for 32-bit channel formats.
Fixes dEQP-VK.pipeline.monolithic.logic_op.r32_uint.* on AGXV (and probably
PanVK).
Fixes: dbd0615e7a ("nir/lower_blend: Avoid useless iand with logic ops")
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Italo Nicola <italonicola@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24252>
Consider the snippet of NIR:
div 32 %447 = @load_reg (%442) (base=0, legacy_fabs=0, legacy_fneg=0)
div 32 %463 = @load_reg (%442) (base=0, legacy_fabs=0, legacy_fneg=0)
con 32 %409 = iadd %17 (0x3), %447
@store_output (%182 (0x601), %463) (base=0, wrmask=x, component=0, src_type=invalid...
@store_reg (%409, %442) (base=0, wrmask=x, legacy_fsat=0)
The load_reg's are trivial, so the %442 read will get folded into store_output.
But under the old definition, the store_reg is also trivial so it gets folded
into the iadd... causing a read-after-write hazard and invalid code generation.
The fix is to amend our definition of store_reg triviality to account for loads
getting folded in. It's not good enough that there's no intervening load_reg,
there can also be no intervening source that gets chased to a load_reg. Handle
that case as well.
Identified in dEQP-VK.geometry.input.basic_primitive.triangles_adjacency on
V3DV.
Fixes: d313eba94e ("nir: Add pass for trivializing register access")
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reported-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24153>
In order for a register load to be trivial, it cannot be used in any
block other than the one in which it is loaded. We're not currently
explicitly doing anything to ensure this invariant holds. It may be
that it holds regardless but I couldn't find any documented reason why
it should so let's explicitly handle that case. Worst case, the newly
added code does nothing.
Fixes: d313eba94e ("nir: Add pass for trivializing register access")
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24153>
Because this pass is intended to be run after out-of-SSA and directly
before injesting the NIR into the back-end, it may come after divergence
analysis and needs to preserve the divergence information. Fortunately,
since all we ever do is insert nir_op_mov, this is easy.
Fixes: d313eba94e ("nir: Add pass for trivializing register access")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24153>
This commit makes three changes:
1. Default all newly created registers divergent because this is the
safer default.
2. Make divergence analysis do something sane with register divergence.
It's not perfect because divergence analysis isn't able to prove
registers divergent based on stores but at least if someone uses
registers a bit they'll end up with safe defaults. This matches
what they'd get with nir_ssa_def_init().
3. Make the load_reg() helper automatically propagate divergence from
the register. Because the defaults for both nir_ssa_def_init() and
nir_decl_reg() are to mark everything divergent, this only means
that nir_load_reg() of a uniform reg is now uniform.
Putting all these together, nir_from_ssa should now be producing
load_reg intrinsics with the proper uniform information.
Fixes: 7229bffcb1 ("nir: Add intrinsics for register access")
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24153>
In the case of:
halt
// succs: b9
if %618 {
block b3:// preds:
break
// succs: b6
} else {
block b4: // preds: , succs: b5
}
block b5: // preds: b4
32 %556 = iadd %617, %2 (0x1)
opt_constant_if() doesn't work because stitch_blocks() can't join blocks if the
before ends in a jump and the after isn't empty.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24235>
Most of the variables in the hash table will never actually be looked up
for any given block so cloning every possible value just creates a bunch
of unrequired memcpy calls.
Here we change the code to only clone the copies array once it is
actually looked up for the first time.
The Blender shader compile time in issue #9326 improves as folows:
151.09 seconds -> 21.11 seconds
The CTS test dEQP-GLES31.functional.ubo.random.all_per_block_buffers.20
improves as follows:
1.67 seconds -> 0.92 seconds
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24227>
If kill alias results in the hash table entry holding an empty
copies array then remove the hash entry and return the dynamic array
to the unused pool.
This helps avoid hash table size getting out of control in very large
shaders.
151.09 seconds -> 118.60 seconds
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24227>
Here we change things to simply clone the entire hash table. This
is much faster than trying to rebuild it and is needed to avoid
slow compilation of very large shaders.
The Blender shader compile time in issue #9326 improves as folows:
251.29 seconds -> 151.09 seconds
The CTS test dEQP-GLES31.functional.ubo.random.all_per_block_buffers.20
improves as follows:
2.38 seconds -> 1.67 seconds
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24227>
Consider code like:
32x4 %2 = @load_interpolated_input (%1, %0 (0x0)) (base=0, component=0, dest_type=float32, io location=VARYING_SLOT_VAR0 slots=1 mediump) // Color
32x4 %3 = fabs %2
32x4 %4 = fsat %3
32x4 %5 = fsin %4
The existing logic would incorrectly tell the backend that both fabs and fsat
could be folded, and then half the shader disappears. Whoops. Fix by stopping
the folding in this case. I choose to do this check in the fsat rather than the
fabs because it's more straightforward (1 source vs N uses) but it's somewhat
arbitrary.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24116>