All of the builtin variables mentioned in the GLSL ES spec and the
extensions include a precision declaration which is different
depending on what the variable is used for. This patch makes it set
the corresponding precision when creating the variable. This will make
a difference once we start using the precision information for
optimisation. Previously all of the builtin variables ended up with a
precision of NONE.
v2: Made gl_PointSize and gl_FragCoord highp since GLSL ES 3.00. Fixed
gl_MaxViewPorts to always be highp. (Eric Anholt)
Reviewed-by: Eric Anholt <eric@anholt.net>
Drivers only use lower_io for modes where pointers don't have a
meaningful value, and dereferences can always be traced back to a
variable. But there can be other modes, like global mode with
VK_EXT_buffer_device_address, where pointers cannot be traced back to a
variable, and lower_io would segfault on loads/stores of these since
nir_deref_instr_get_variable() would return NULL.
Just use the mode on the deref itself to filter out these modes before
we try to get the variable.
Fixes: 118a66df99 ("radv: Use NIR barycentric coordinates")
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This commit re-plumbs all of nir_loop_analyze to use nir_ssa_scalar for
all intermediate values so that we can properly handle swizzles. Even
though if conditions are required to be scalars, they may still consume
swizzles so you could have ((a.yzw < b.zzx).xz && c.xx).y == 0 as your
loop termination condition. The old code would just bail the moment it
saw its first non-zero swizzle but we can now properly chase the scalar
from the if condition to all the way to a, b, and c.
Shader-db results on Kaby Lake:
total loops in shared programs: 4388 -> 4364 (-0.55%)
loops in affected programs: 29 -> 5 (-82.76%)
helped: 29
HURT: 5
Shader-db results on Haswell:
total loops in shared programs: 4370 -> 4373 (0.07%)
loops in affected programs: 2 -> 5 (150.00%)
helped: 2
HURT: 5
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This commit reworks both get_induction_and_limit_vars() and
try_find_trip_count_vars_in_iand to return true on success and not
modify their output parameters on failure. This makes their callers
significantly simpler.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
There are various cases in which we want to chase SSA values through ALU
ops ranging from hand-written optimizations to back-end translation
code. In all these cases, it can be very tricky to do properly because
of swizzles. This set of helpers lets you easily work with a single
component of an SSA def and chase through ALU ops safely.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
None of the current code knows what to do with swizzles. Take the safe
option for now and bail if we see one. This does have a small shader-db
impact but it is at least safe.
Shader-db results on Kaby Lake:
total loops in shared programs: 4364 -> 4388 (0.55%)
loops in affected programs: 5 -> 29 (480.00%)
helped: 5
HURT: 29
Shader-db results on Haswell:
total loops in shared programs: 4373 -> 4370 (-0.07%)
loops in affected programs: 5 -> 2 (-60.00%)
helped: 5
HURT: 2
Fixes: 6772a17acc "nir: Add a loop analysis pass"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The current code assumes everything is 32-bit which is very likely true
but not guaranteed by any means. Instead, use nir_eval_const_opcode to
do the calculations in a bit-size-agnostic way. We also use the new
constant constructors to build the correct size constants.
Fixes: 6772a17acc "nir: Add a loop analysis pass"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
One issue was that the original version didn't check that swizzles
matched when comparing ALU instructions so it could end up matching
very different instructions. Using the nir_instrs_equal function from
nir_instr_set.c which we use for CSE should be much more reliable.
Another was that the loop assumes it will only run two iterations which
may not be true. If there's something which guarantees that this case
only happens for phis after ifs, it wasn't documented.
Fixes: 9e6b39e1d5 "nir: detect more induction variables"
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Now that we have the nir_const_value_as_* helpers, every one of these
functions is effectively the same except for the suffix they use so we
can easily define them with a repeated macro. This also means that
they're inline and the fact that the nir_src is being passed by-value
should no longer really hurt anything.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
This gives more flexibility than the normal store_deref/store_output
versions (particularly, it allows us to abuse the type system in awful
ways, which is necessary for efficient format conversion in blend
shaders.)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
We already have nir_imm_float16 and nir_imm_vec4; let's add the ability
to easily make immediate fp16 vectors as well, now that fp16 support is
maturing in NIR/GLSL.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Existing users only operate on instructions with SSA destinations. Some
later patches add new direct calls and indirect calls (via existing NIR
functions) on instructions after going out of SSA. At the very least,
these calls are added by:
intel/vec4: Try to emit a VF source in try_immediate_source
intel/vec4: Try to emit a single load for multiple 3-src instruction operands
The first commit adds direct calls, and the second adds calls via
nir_alu_srcs_equal and nir_alu_srcs_negative_equal.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
When I added this function, I was not sure if swizzles of immediate
values were a thing that occurred in NIR. The only existing user of
these functions is the partial redundancy elimination for compares.
Since comparison instructions are inherently scalar, this does not
occur.
However, a couple later patches, "nir/algebraic: Recognize
open-coded flrp(-1, 1, a) and flrp(1, -1, a)" combined with "intel/vec4:
Try to emit a single load for multiple 3-src instruction operands",
collaborate to create a few thousand instances.
No shader-db changes on any Intel platform.
v2: Handle the swizzle in nir_alu_srcs_negative_equal and leave
nir_const_value_negative_equal unchanged. Suggested by Jason.
v3: Correctly handle write masks. Add note (and assertion) that the
caller is responsible for various compatibility checks. The single
existing caller only calls this for combinations of scalar fadd and
float comparison instructions, so all of the requirements are met. A
later patch (intel/vec4: Try to emit a single load for multiple 3-src
instruction operands) will call this for sources of the same
instruction, so all of the requirements are met.
v4: Add unit test for nir_opt_comparison_pre that is fixed by this
commit.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Each tests has a comment with the expected before and after NIR. The
tests don't actually check this. The tests only check whether or not
the optimization pass reported progress. I couldn't think of a robust,
future-proof way to check the before and after code.
Reviewed-by: Matt Turner <mattst88@gmail.com>
From SPV_EXT_demote_to_helper_invocation. Demote will be implemented
as a variant of discard, so mark uses_discard if it is used.
v2: Add CAN_ELIMINATE flag to the new intrinsic. (Jason)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This is nice to have with radeonsi, where color varyings are handled
specially to avoid recompiles.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Pretty much every driver using nir_lower_io_to_temporaries followed by
nir_lower_io is going to want this. In particular, radv and radeonsi in
the next commits.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
These weren't properly supported. This does pretty much the same thing
that the radv code did.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Right now nir_copy_prop_vars is effectively undoing
nir_lower_io_to_temporaries for inputs by propagating the original
variable through the copy created in lower_io_to_temporaries. A
theoretical variable coalescing pass would have the same issue with
output variables, although that doesn't exist yet. To fix this, add a
new bit to nir_variable, and disable copy propagation when it's set.
This doesn't seem to affect any drivers now, probably since since no one
uses lower_io_to_temporaries for inputs as well as copy_prop_vars, but
it will fix radv once we flip on lower_io_to_temporaries for fs inputs.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It was double-counting cases where multiple variables were assigned to
the same slot, and not handling the case where the last variable is a
compact variable.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
These are used in Vulkan for clip/cull distances, instead of the GLSL
lowering when the clip/cull arrays are shared.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
It isn't really doing anything Gallium-specific, and it's needed for
handling component packing, overlapping, etc.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
load_fragcoord is already handled in common code for radeonsi, so we
don't need to do anything to handle it. However, there were some passes
creating NIR with the varying, so we switch them over to the sysval. In
the case of nir_lower_input_attachments which is used by both radv and
anv, we add handling for both until intel switches to using a sysval.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
On AMD, FragCoord should be a sysval because it is handled separately
from all the other inputs. We were already doing this in radeonsi, but
we weren't doing it with radv. It'll be much more annoying to handle
VARYING_SLOT_POS in fragment shaders when we let NIR lower FS inputs for
us, so here we add an option so that radv can get it as a system value.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
From OpPtrAccessChain description in the SPIR-V spec (1.4 rev 1):
For objects in the Uniform, StorageBuffer, or PushConstant storage
classes, the element’s address or location is calculated using a
stride, which will be the Base-type’s Array Stride when the Base
type is decorated with ArrayStride. For all other objects, the
implementation will calculate the element’s address or location.
For non-CL shaders the driver should layout the Workgroup storage
class, so override any explicitly set ArrayStride in the shader. This
currently fixes only the lower_workgroup_access_to_offsets case, which
is used by anv.
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
When using ARB_gl_spirv, the block names are optional and the uniform
blocks are referred using Bindings instead. Teach
gl_nir_lower_buffers to handle those.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Until now, we were using the uniform explicit location to check if the
current nir variable was already processed while adding entries on the
uniform storage. But for UBOs/SSBOs, entries are added too but we lack
a explicit location.
For those we need to rely on the UBO/SSBO binding and the unifor
storage block_index. In that case several uniforms would need to be
updated at once.
v2: (from Timothy review)
* Improve wording and fix typos of some long comments.
* Rename update_uniform_storage for mark_stage_as_active
v3: (from cmarcelo review)
* Fixed some comment typos
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Specifically, offset, stride (coming from arrays or matrices) and
row_major.
On GLSL, most of that info is computed using the layout qualifier, but
on ARB_gl_spirv they are explicit, and for Mesa, included on the
glsl_type.
From ARB_gl_spirv spec:
"Mapping of layouts
std140/std430 -> explicit *Offset*, *ArrayStride*, and
*MatrixStride* Decoration on struct members""
"7.6.2.spv SPIR-V Uniform Offsets and Strides
The SPIR-V decorations *GLSLShared* or *GLSLPacked* must not be
used. A variable in the *Uniform* Storage Class decorated as a
*Block* must be explicitly laid out using the *Offset*,
*ArrayStride*, and *MatrixStride* decorations"
For offset we needed to include the parent and index_in_parent while
processing the type, as the offset is maintained on glsl_struct_field
of the parent type, not on the type itself.
v2: Fix the default values for MATRIX_STRIDE, ARRAY_STRIDE and
ROW_MAJOR when the variable is not backed by a buffer object
(Antia Puentes).
v3: Update after Jason series "SPIR-V: Use NIR deref instructions for
UBO/SSBO access" that included just one explicit stride, instead
of a previous patch we wrote that had matrix_stride and
array_stride (Alejandro)
Signed-off-by: Antia Puentes <apuentes@igalia.com>
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
For this interfaces, the inner members are added only once as uniforms
or resources, in opposite to other cases, like a uniform array of
structs.
For those guessing why a issue (16) from ARB_program_interface_query
was used, instead of a quote of the core spec: The core spec is not
really clear about how members of arrays of blocks should be
enumerated.
On GLSL this was also problematic, specially when we were trying to
pass the 4.5 CTS tests. See commit "glsl: Fix program interface
queries relating to interface blocks"
(4c4d9e4f03), as a reference. That one
also needed to rely on issue (16) to justify the change, pointing that
the core spec needs to be clarified.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Adding the ability to link uniform blocks and shader storage blocks
using NIR, intended for ARB_gl_spirv support. Among other things, this
linking needs to take into account that everything should work without
names, as they could be not present, while the GLSL IR uniform block
linking was wrote with the names on its core.
The other major difference compared with the GLSL IR linker is that we
don't deal with layouts. There are no references to std140, std430,
etc. Layouts are expressed through explicit offset, array stride and
matrix stride. That simplifies how the buffer size are computed. But
also means that we couldn't use the existing methods at glsl_types, so
we needed to implement new methods.
It is worth to note that this linking do a iteration over the
glsl_types, similarly to what the linking uniforms do. A possible
future improvement would be refactor both cases to try to share more
code that it sharing right now. On GLSL IR there are a class visitor,
specialized on each case, for that sharing. As adding a class visitor
on C would more complicated, for now we are just iterating on both.
Signed-off-by: Alejandro Piñeiro <apinheiro@igalia.com>
Signed-off-by: Neil Roberts <nroberts@igalia.com>
Signed-off-by: Antia Puentes <apuentes@igalia.com>
v2: (from Timothy review)
* Fix variable name convention
* Stop to use _function_name convention
* Don't use // for comments
* "nir/linker: Keep track of the stages referencing an UBO/SSBO"
squashed with this patch
v3: (from Caio review)
* Don't delete the linked shader on failure
* Use rzalloc_array to avoid some explicit initializations
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>