The option use_interpolated_input_intrinsics will lower these as well
as regular input loads. This is inconvenient for V3D, where we can
produce optimal code for regular input loads based on the input
variable layout qualifiers, so this change adds an option to only
lower instances of interpolateAt().
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
In practice we found that we need this for v3d (specifically for cube
map arrays, as they don't support the default value for wrap_i, so a
sampler object is needed to override that value).
It is worth to note that the main reason behind this auxiliar method
was to identify those cases that we didn't have a sampler object
available for Vulkan. So far, we found that we have a sampler object
coming from nir always for that operation.
Fixes cube map array tests like the following:
dEQP-VK.glsl.texture_functions.query.texturequerylod.usamplercubearray_fragment
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6766>
locations are important for these because they provide info about how
many block indices each ubo takes up
UBO arrays have nonzero values here. all non-array UBOs have either 0
for the base or nonzero for an io lowered block at an offset,
but only arrays need to be changed here because they're the only ones
with absolute values, whereas all the others are relative.
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6272>
Before 8e1b75b330 ("nir/algebraic: optimize iand/ior of (n)eq zero") this
optimization didn't need the use of umax/umin. VC4 HW supports only signed
integer max/min operations.
lower_umin and lower_umax are added to allow enabling previous optimizations
behaviour for this cases.
Fixes: 8e1b75b330 ("nir/algebraic: optimize iand/ior of (n)eq zero")
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7083>
After each end_primitive and at the end of the shader before emitting
set_vertex_and_primitive_count, we check if the primitive that is being
emitted has enough vertices or not, and we adjust the vertex and
primitive counters accordingly.
As a result, if the backend uses this option, the backend compiler
will not have to worry about discarding the unneeded vertices
and primitives.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6964>
LLVM loves take advantage of the fact that vec3s in OpenCL are 16B
aligned and so it can just read/write them as vec4s. This results in a
LOT of vec4->vec3 casts on loads and stores. One solution to this
problem is to get rid of all vec3 variables.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6871>
LLVM loves take advantage of the fact that vec3s in OpenCL are 16B
aligned so it can just read/write them as vec4s. This is questionably
legal except that it uses a xyz write-mask when it does it. The result
is a LOT of vec4->vec3 casts on loads and stores. This optimization
detects this case as well as other bit-cast cases and rewrites them to
get rid of the cast.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6871>
These are based on the ones which already existed in the load/store
vectorization pass but I made some improvements while moving them. In
particular,
1. They're both faster if the bit sizes are equal
2. The check is faster if old_bit_size > new_bit_size
3. The check now fails if it would use more than NIR_MAX_VEC_COMPONENTS
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6871>
This pass attempts to optimize three broad categories of memcpy:
1. Self-copies: These we can discard out-of-hand.
2. Vector copies: It doesn't matter what the vector size is or if the
source and destination have different vector types, it's still easy
enough to emit a load/store pair.
3. Tightly packed copies: In the case where a type is tightly packed
(no padding bits), we can replace the memcpy with a copy_deref
instruction which the optimizer is far better at handling.
This has proven capable of getting rid of many of the memcpy instances
in some rather gnarly OpenCL C kernels I've been looking at, even after
coming out of LLVM's optimizer.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6871>
In 9f3c595dfc, we attempted to handle casts in opt_find_array_copies
but missed a critical case. In particular, in the case where we begin
finding a copy but then encounter a cast, we need to discard everything
which might alias that cast.
Fixes: 9f3c595dfc "nir/find_array_copies: Handle cast derefs"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6871>
This adds primarily two passes: One is a lowering pass which turns
these conversion intrinsics into a series of ALU ops. The other is an
optimization pass which attempt to simplify the conversion whenever
possible in the hopes that we can turn it into a "normal" conversion op
which doesn't need special treatment.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6945>
Most of these were originally written by Daniel Stone in the Microsoft
ClOn12 branch, reworked by Jesse Natalie, fixed by Boris Brezillon, and
possibly touched by others along the way. Unfortunately, none of that
is in the commit history thanks to living in the CLOn12 branch.
I ported them to mesa master and further reworked things for better
cosmetics. In particular,
1. They now live in a builder helper rather than in vtn_alu.c.
2. Instead of looping inside each builder helper, we just trust NIR
vector instructions to handle vectors.
3. Lots of re-arranging of the helpers for clarity, better asserting,
and better re-use with the upcoming lowering pass.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6945>
This new intrinsic is capable of handling the full range of conversions
from OpenCL including rounding modes and possible saturation. The
intention is that we'll emit this intrinsic directly from spirv_to_nir
and then lower it to ALU ops later.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6945>
We're about to introduce conversion ops which are going to want two
different types. We may as well just split the one we have rather than
end up with three. There are a couple places where this is mildly
inconvenient but most of the time I find it to actually be nicer.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6945>
It turns out I had missed a case in my enumeration of why everything
currently was vec4-aligned.
Fixes a simple testcase of loading from a vec3[2] array in freedreno with
IR3_SHADER_DEBUG=nouboopt.
Initial shader-db results look devastating:
total instructions in shared programs: 8019997 -> 12829370 (59.97%)
total cat6 in shared programs: 87683 -> 145840 (66.33%)
Hopefully this will recover once we introduce the i/o vectorizer, but that
was blocked on getting the vec3 case fixed.
Reviewed-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6612>