The option use_interpolated_input_intrinsics will lower these as well
as regular input loads. This is inconvenient for V3D, where we can
produce optimal code for regular input loads based on the input
variable layout qualifiers, so this change adds an option to only
lower instances of interpolateAt().
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
In practice we found that we need this for v3d (specifically for cube
map arrays, as they don't support the default value for wrap_i, so a
sampler object is needed to override that value).
It is worth to note that the main reason behind this auxiliar method
was to identify those cases that we didn't have a sampler object
available for Vulkan. So far, we found that we have a sampler object
coming from nir always for that operation.
Fixes cube map array tests like the following:
dEQP-VK.glsl.texture_functions.query.texturequerylod.usamplercubearray_fragment
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
locations are important for these because they provide info about how
many block indices each ubo takes up
UBO arrays have nonzero values here. all non-array UBOs have either 0
for the base or nonzero for an io lowered block at an offset,
but only arrays need to be changed here because they're the only ones
with absolute values, whereas all the others are relative.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6272>
Before 8e1b75b330 ("nir/algebraic: optimize iand/ior of (n)eq zero") this
optimization didn't need the use of umax/umin. VC4 HW supports only signed
integer max/min operations.
lower_umin and lower_umax are added to allow enabling previous optimizations
behaviour for this cases.
Fixes: 8e1b75b330 ("nir/algebraic: optimize iand/ior of (n)eq zero")
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7083>
After each end_primitive and at the end of the shader before emitting
set_vertex_and_primitive_count, we check if the primitive that is being
emitted has enough vertices or not, and we adjust the vertex and
primitive counters accordingly.
As a result, if the backend uses this option, the backend compiler
will not have to worry about discarding the unneeded vertices
and primitives.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
LLVM loves take advantage of the fact that vec3s in OpenCL are 16B
aligned and so it can just read/write them as vec4s. This results in a
LOT of vec4->vec3 casts on loads and stores. One solution to this
problem is to get rid of all vec3 variables.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6871>
LLVM loves take advantage of the fact that vec3s in OpenCL are 16B
aligned so it can just read/write them as vec4s. This is questionably
legal except that it uses a xyz write-mask when it does it. The result
is a LOT of vec4->vec3 casts on loads and stores. This optimization
detects this case as well as other bit-cast cases and rewrites them to
get rid of the cast.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6871>
These are based on the ones which already existed in the load/store
vectorization pass but I made some improvements while moving them. In
particular,
1. They're both faster if the bit sizes are equal
2. The check is faster if old_bit_size > new_bit_size
3. The check now fails if it would use more than NIR_MAX_VEC_COMPONENTS
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6871>
This pass attempts to optimize three broad categories of memcpy:
1. Self-copies: These we can discard out-of-hand.
2. Vector copies: It doesn't matter what the vector size is or if the
source and destination have different vector types, it's still easy
enough to emit a load/store pair.
3. Tightly packed copies: In the case where a type is tightly packed
(no padding bits), we can replace the memcpy with a copy_deref
instruction which the optimizer is far better at handling.
This has proven capable of getting rid of many of the memcpy instances
in some rather gnarly OpenCL C kernels I've been looking at, even after
coming out of LLVM's optimizer.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6871>
In 9f3c595dfc, we attempted to handle casts in opt_find_array_copies
but missed a critical case. In particular, in the case where we begin
finding a copy but then encounter a cast, we need to discard everything
which might alias that cast.
Fixes: 9f3c595dfc "nir/find_array_copies: Handle cast derefs"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6871>
This adds primarily two passes: One is a lowering pass which turns
these conversion intrinsics into a series of ALU ops. The other is an
optimization pass which attempt to simplify the conversion whenever
possible in the hopes that we can turn it into a "normal" conversion op
which doesn't need special treatment.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6945>