Some hardware doesn't have a way to check if invocation was demoted,
in such case we have to track it ourselves.
OpIsHelperInvocationEXT is specified as:
"An invocation is currently a helper invocation if it was originally
invoked as a helper invocation or if it has been demoted to a helper
invocation by OpDemoteToHelperInvocationEXT."
Therefore we:
- Set gl_IsHelperInvocationEXT = gl_HelperInvocation
- Add "gl_IsHelperInvocationEXT = true" right before each demote
- Add "gl_IsHelperInvocationEXT = gl_IsHelperInvocationEXT || condition"
right before each demote_if
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9460>
The prog_to_nir->NIR-to-TGSI change ended up causing regressions on r300,
and svga against r300-class hardware, because nir_lower_uniforms_to_ubo()
introduced shifts that nir_lower_ubo_vec4() tried to reverse, but that NIR
couldn't prove are no-ops (since shifting up and back down may drop bits),
and the hardware can't do the integer ops.
Instead, make it so that nir_lower_uniforms_to_ubo can generate
nir_intrinsic_load_ubo_vec4 directly for !INTEGER hardware.
Fixes: cf3fc79cd0 ("st/mesa: Replace mesa_to_tgsi() with prog_to_nir() and nir_to_tgsi().")
Closes: #4602
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10194>
It is useful for the precisions of varyings to match across shader
stages at link-time to enable precision lowering optimizations, which
would otherwise require costly draw-time fixups.
The goal is to enable `producer->precision == consumer->precision` to be
an invariant drivers may rely on for linked shaders.
v2: keep transform feedback outputs at mediump - mareko
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com> (v1)
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9050>
Added:
* a pass that renumbers bases of IO intrinsics
* a pass that converts mediump IO to 16 bits, optionally using the new
packed varying slots
* a pass that sets (forces) mediump in IO intrinsics (for testing)
* a pass that remaps VARYING_SLOT_VAR[0..15]_16BIT to VARYING_SLOT_VAR[0..31]
(if some shader stages don't want packed varyings)
* a pass that folds type conversions around texture opcodes into those
opcodes (e.g. tex(f2f32(coord), ..) is changed into tex accepting f16)
* a pass that changes (legalizes) sampler src and dst types based on specified
hw constraints (e.g. derivatives must be the same type as coordinates)
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9050>
This allows mediump inputs and outputs to be trivially lowered into packed
16-bit varyings where 1 slot is occupied by 2 16-bit vec4s, without any
packing instructions in NIR and without any conflicts with 32-bit varyings.
The only thing that is changed is IO semantics in intrinsics to get packed
16-bit varyings.
This simplifies supporting 16-bit types for drivers that have 32-bit slots
everywhere except the fragment shader where they can do 16-bit interpolation
on either the low or high half of each slot.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9050>
This breaks CLOn12's handling of CL CTS test_basic vector_creation for char3 (at least).
Removing this cast causes us to try to load from a deref with no alignment info.
Fixes: 99bb2a4d ("nir/opt_deref: Don't remove casts with alignment information")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10165>
HLSL doesn't support bitcasting a 64bit integer to a double. DXIL
doesn't have generic pack/unpack instructions, so we lower those to
integer bitwise ops. As a result, NIR generic double pack/unpack would
require our backend to emit a bitcast to get a double, but we want
to match HLSL semantics and emit MakeDouble/SplitDouble.
Adding a dedicated opcode for double pack/unpack allows us to add a
pass to emit that instead, which lets our backend emit the right
instruction to pack and unpack doubles.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10063>
Roundtrip to a larger float and divide there. The extra details for
mod/rem are handled directly in integer space to simplify verification
of rounding details. The one issue is that the mantissa might be
rounded down which will cause issues; adding 1 unconditionally (proposed
by Jonathan Marek) fixes this. The lowerings here were tested
exhaustively on all pairs of 16-bit integers.
v2: Update idiv lowering per Rhys Perry's comment.
v3: Rewrite lowerings.
v4: Remove useless ftrunc, fix 8-bit issue, simplify code.
v5: Remove useless ffloor
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Tested-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8339>
Helps deduplicate some code between the two lowering paths. In
particular, it ports the missing 32-bit? check to the precise pass. This
does not change anything immediately: drivers depending on this to lower
16-bit did not work before due to type mismatches and will not work now
since it'll refuse to lower. But that means sub-32-bit idiv can be
lowered more efficiently in an algebraic pass.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8339>
texture_index is meaningless when a tex_instr has deref src.
Use var->data.binding instead.
This fixes the incorrect lowering on radeonsi where the same
lowering steps was applied to all tex_instr based on the needs
of the first one (since texture_index is always 0).
CC: mesa-stable
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9931>
For nir_address_format_64bit_global_32bit_offset and
nir_address_format_64bit_bounded_global, we use a new intrinsics which
take the base address and offset as separate parameters. For bounds-
checked access, the bound is also included in the intrinsic. This gives
the drive more control over the bounds checking so that UBOs don't
suddenly become massively more expensive.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8635>