Allow folding constants/undef sources by sharing more code with the image_store
16bit folding pass.
Allow more than one set of sources because RADV wants two, one for
G16 (ddx/ddy) and one for A16 (all other sources).
Allow folding cube sampling destination conversions on radeonsi/radv because
I think the limitation only applies to sources.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16978>
This adds support for global 64-bit GPU addresses as a pair of
32-bit values. This is useful for platforms with 32-bit GPUs
that want to support VK_KHR_buffer_device_address, which makes
GPU addresses explicitly 64-bit.
With the new format we also add new global intrinsics with 2x32
suffix that consume the new address format.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17275>
On Intel, we have to do this because we can't ask for the per-sample
barycentrics without setting the per-sample dispatch bit or the GPU will
hang. However, nothing we're doing in this pass is Intel-specific and
it may be a useful optimization for someone else so we may as well make
it a generic NIR pass. This version actually does a bit more than the
current brw_nir_demote_sample_qualifiers() pass as it also handles
pre-nir_lower_io interp_dref_at* as well as a couple system values which
we can easily constant-fold.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14020>
according to the spec, atomic counters can be bound at any offset divisible by 4,
which means that any driver that uses the ssbo lowering pass and doesn't have
a min offset align of 4 is potentially broken
to handle this, use a statevar to inject the misaligned remainder of the offset
into the shader as a uniform. for well-aligned counter binds, the uniform offset
will be 0
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16749>
Doing this in NIR should give better results, but also allows us to
stop calling more GLSL IR optimisations passes.
v2: Skip 8bit and 16bit type that would require further processing
I believe this is an existing bug in the GLSL IR pass also.
v3: rebuild constant initialisers as we want to call this pass
after nir has already lowered them and performed optimisations.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16770>
This function allows to only scalarize instructions down to a desired
vectorization width. nir_lower_alu_to_scalar() was changed to use the
new function with a width of 1.
Swizzles outside vectorization width are considered and reduce
the target width.
This prevents ending up with code like
vec2 16 ssa_2 = iadd ssa_0.xz, ssa_1.xz
which requires to emit shuffle code in backends
and usually is not beneficial.
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13080>
The callback allows to request different vectorization factors
per instruction depending on e.g. bitsize or opcode.
This patch also removes using the vectorize_vec2_16bit option
from nir_opt_vectorize().
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13080>
1. Lowers NV_mesh_shader TASK_COUNT output to launch_mesh_workgroups.
2. Removes all code after launch_mesh_workgroups, enforcing the
fact that it's a terminating instruction.
3. Ensures that task shaders always have at least one
launch_mesh_workgroups instruction, so the backend doesn't
need to implement a special case when the shader doesn't have it.
4. Optionally, implements task_payload using shared memory when
task_payload atomics are used.
This is useful when the backend is otherwise not capable of
handling the same atomic features as it can for shared memory.
If this is used, the backend only has to implement the basic
load/store operations for task_payload.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16720>
Some drivers don't support these indirects and therefore require
loop unrolling if a shader uses a loop induction variable to
access a sampler array.
Here we add a new nir shader compiler option that drivers can set,
this will be the equivalent of the EmitNoIndirectSampler setting
used in the GLSL IR unrolling pass.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16543>
This controls the whole lowering of "make tex ops with implicit
derivatives on non-implicit-derivative stages be tex ops with an explicit
lod of 0 instead", but it's really hard to describe that in a git commit
summary.
All existing callers get it added except:
- nir_to_tgsi which didn't want it.
- nouveau, which didn't want it (fixes regressions in shadowcube and
shadow2darray with NIR, since the shading languages don't expose txl of
those sampler types and thus it's not supported in HW)
- optional lowering passes in mesa/st (lower_rect, YUV lowering, etc)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16156>
NTT can't convert local 64 variables of type vec3 or vec4, therefore,
they need to be split into vec2 + double or vec2 + vec2.
At the same time deal splitting the phi nodes.
The pass goes into the global namespace because it is also useful
for r600.
v2: only lower function_temps (Emma) and handle array of arrays (Jason)
v3: - remove bool parameter in function to merge
vec3 and vec4
- simplify load/store_deref_(array|var)
- use a pointer hash table
- simplify PHI splitting (all Emma)
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15945>
The BITFIELD_MASK() macro is intended for using with actual bitfields,
not with nir_component_mask_t. This means we do some extra work to
handle values that are invalid for nir_component_mask_t in the first
place.
This eliminates some warnings on Clang, where the compiler complains
about casting UINT32_MAX to UINT16_MAX.
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15547>
When we put NIR in the compiler stack for r300, indirect addressing broke
for gallium nine. DX's array indirects round the float value, so the DX
shader gets mapped to a TGSI "ARR ADDR[0] src.x" instruction. Translating
that to NIR maps to r0[f2i32(fround(src.x))]. While we might hope that in
translation back using nir-to-tgsi after optimization we would recognize
the construct and emit ARR again, that's going to be error prone (think
"what if src.x is in a NIR register?") so we need a fallback plan. r300
will be able to handle this lowering, so get it in place first to fix the
regression.
Fixes: #6297
Fixes: 7d2ea9b0ed ("r300: Request NIR shaders from mesa/st and use NIR-to-TGSI.")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15870>
DG2 can only do sample_d and sample_d_c on 1D and 2D surfaces. Cube
maps and 3D surfaces were already handled, but 1D array and 2D array
surfaces were not.
Fixes the following Vulkan CTS failures on DG2:
dEQP-VK.glsl.texture_functions.texturegradclamp.isampler1darray_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.isampler2darray_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler1darray_fixed_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler1darray_float_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler2darray_fixed_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.sampler2darray_float_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.usampler1darray_fragment
dEQP-VK.glsl.texture_functions.texturegradclamp.usampler2darray_fragment
The Fixes: tag below is a bit misleading. This commit adds another
lowering, similar to the one in the Fixes: commit, that probably should
have been added at the same time. I just want to make sure this commit
gets applied everywhere that commit was also applied.
Fixes: 635ed58e52 ("intel/compiler: Lower txd for 3D samplers on XeHP.")
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15681>
As this function doesn't check for any control-flow
dependence, it only returns true for statically
(or globally) uniform values.
The same holds true for is_binding_dynamically_uniform()
in nir_opt_gcm().
Rename to better reflect that property.
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14994>