This is what we do in the condition too, so it makes sense.
v2: Only compute without_array() once (Ilia).
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
According to the GLSL ES 3.20, GLSL 4.50, and GLSL 1.20 specs:
"To force all output variables to be invariant, use the pragma
#pragma STDGL invariant(all)
before all declarations in a shader."
Notably, this is only supposed to affect output variables. Furthermore,
"Only variables output from a shader can be candidates for invariance."
It looks like this has been wrong since we first supported the pragma in
2011 (commit 86b4398cd1).
Fixes dEQP-GLES2.functional.shaders.preprocessor.pragmas.pragma_fragment.
v2: Now that all cases are identical (other than compute shaders, which
have no output variables anyway), we can drop the switch statement
entirely. We also don't need the current_function == NULL check;
this was a hold over from when we had a single var_mode_out for both
function parameters and shader varyings, in the bad old days.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
The GL spec will soon be revised to clarify that a buffer binding for
a transform feedback buffer is only required if a variable is actually
defined to use the buffer binding point. Previously a declaration for
the default transform buffer would make it require a binding even if
nothing was declared to use the default buffer.
Affects:
KHR-GL44/45.enhanced_layouts.xfb_stride_of_empty_list
KHR-GL44/45.enhanced_layouts.xfb_stride_of_empty_list_and_api
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
Cc: mesa-stable@lists.freedesktop.org
I think it's more clear to only call emit_access once. The only
difference between the two calls is the value of size_mul used for the
offset parameter... but you really have to look at it to be sure.
The s/is_64bit/is_double/ change is because there are no int64_t or
uint64_t matrix types.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
I was going to squash this with the previous commit, but there's a lot
of churn in that commit.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Without this, the SPIR-V generator has to deal with a bunch of junk
like:
(swiz z (swiz xxx (swiz x (var_ref packed:binormal.z,light_dir))))
It seems better to cull that stuff out than to add code to deal with
it. The problem is the way swizzles to and from scalars have to be
handled in SPIR-V.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
... as can happen with various types like mat4, or else we'll smash the
stack writing past the end of components_local[].
Fixes: 5a0d3e1129 ("nir: Print the components referenced for split or
packed shader in/outs.")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Most of NIR doesn't allow doing array indexing on a vector (though it
does on a matrix). However, nir_lower_io handles it just fine and this
behavior is needed for shared variables in Vulkan. This commit makes
glsl_get_array_element do something sensible for vector types and makes
nir_validate happy with them.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
We were already validating that the parent type goes along with the
child type but we weren't actually validating that the parent type is
reasonable. This fixes that.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The GL_ARB_shader_ballot spec says that gl_SubGroupSizeARB is declared
as a uniform. This means that it cannot change across an invocation
such as a draw call or a compute dispatch. For compute shaders, we're
ok because we only ever use one dispatch size. For fragment, however,
the hardware dynamically chooses between SIMD8 and SIMD16 which violates
the spec. Instead, let's just pick a subgroup size based on the shader
stage. The fixed size we choose for compute shaders is a bit higher
than strictly needed but there's no real harm in that. The advantage is
that, if they do anything interesting with the value, NIR will see it as
an immediate and can optimize better.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Ballot intrinsics return a bitfield of subgroups. In GLSL and some
SPIR-V extensions, they return a uint64_t. In SPV_KHR_shader_ballot,
they return a uvec4. Also, some back-ends would rather pass around
32-bit values because it's easier than messing with 64-bit all the time.
To solve this mess, we make nir_lower_subgroups take a new parameter
called ballot_bit_size and it lowers whichever thing it gets in from the
source language (uint64_t or uvec4) to a scalar with the specified
number of bits. This replaces a chunk of the old lowering code.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
The SUBGROUP_*_MASK system values are uint64_t when coming in from GLSL
but uvec4 when coming in from SPIR-V. Lowering based on type allows us
to nicely handle both.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This way they can return either a uvec4 or a uint64_t. At the moment,
this is a no-op since we still always return a uint64_t.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
This commit pulls nir_lower_read_invocations_to_scalar along with most
of the guts of nir_opt_intrinsics (which mostly does subgroup lowering)
into a new nir_lower_subgroups pass. There are various other bits of
subgroup lowering that we're going to want to do so it makes a bit more
sense to keep it all together in one pass. We also move it in i965 to
happen after nir_lower_system_values to ensure that because we want to
handle the subgroup mask system value intrinsics here.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
We're going to want subgroup ID for SPIR-V subgroups eventually anyway.
We really only want to push one and calculate the other from it. It
makes a bit more sense to push the subgroup ID because it's simpler to
calculate and because it's a real API thing. The only advantage to
pushing the base thread ID is to avoid a single SHL in the shader.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Previously, brw_nir_lower_intrinsics added the param and then emitted a
load_uniform intrinsic to load it directly. This commit switches things
over to use a specific NIR intrinsic for the thread id. The one thing I
don't like about this approach is that we have to copy thread_local_id
over to the new visitor in import_uniforms.
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Commit 259fc50545 added linker error for
mismatching uniform precision, as required by GLES 3.0 specification and
conformance test-suite.
Several Android applications, including Forge of Empires, have shaders
which violate this rule, on a dead varying that will be eliminated.
The problem affects a big number of applications using Cocos2D engine
and other GLES implementations accept this, this poses a serious
application compatibility issue.
Starting from GLSL ES 3.0, declarations with conflicting precision
qualifiers are explicitly prohibited. However GLSL ES 1.00 does not
clearly specify the behavior, except that
"Uniforms are defined to behave as if they are using the same storage in
the vertex and fragment processors and may be implemented this way.
If uniforms are used in both the vertex and fragment shaders, developers
should be warned if the precisions are different. Conversion of
precision should never be implicit."
The word "used" is not clear in this context and might refer to
1) declared (same as GLES 3.x)
2) referred after post-processing, or
3) linked after all optimizations are done.
Looking at existing applications, 2) or 3) seems to be widely adopted.
To avoid compatibility issues, turn the error into a warning if GLSL ES
version is lower than 3.0 and the data is dead in at least one of the
shaders.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97532
Signed-off-by: Tomasz Figa <tfiga@chromium.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The dynamic index of a vector (not array!) is lowered to a sequence of
conditional assignments. However, the interpolate_at_* expressions
require that the interpolant is an l-value of a shader input.
So instead of doing conditional assignments of parts of the shader input
and then interpolating that (which is nonsensical), we interpolate the
entire shader input and then do conditional assignments of the interpolated
result.
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
The intended rule has been clarified in GLSL 4.60, Section 8.13.2
(Interpolation Functions):
"For all of the interpolation functions, interpolant must be an l-value
from an in declaration; this can include a variable, a block or
structure member, an array element, or some combination of these.
Component selection operators (e.g., .xy) may be used when specifying
interpolant."
For members of interface blocks, var->data.must_be_shader_input must be
determined on-the-fly after lowering interface blocks, since we don't want
to disable varying packing for an entire block just because one input in it
is used in interpolateAt*.
v2: keep setting must_be_shader_input in ast_function (Ian)
v3: follow the relaxed rule of GLSL 4.60
v4: only apply the relaxed rules to desktop GL
(the ES WG decided that the relaxed rules may apply in a future version
but not retroactively; see also
dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_centroid.negative.*)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101378
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Fixes many GL 4.5 CTS blend tests, such as:
* GL45-CTS.blend_equation_advanced.extension_directive_enable
* GL45-CTS.blend_equation_advanced.extension_directive_warn
* GL45-CTS.blend_equation_advanced.blend_all.GL_MULTIPLY_KHR_all_qualifier
* GL45-CTS.blend_equation_advanced.blend_specific.GL_COLORBURN_KHR
v2:
* Directly save the BlendSupport field to avoid potentially including
a pointer in the future in the structure is updated. (tarceri)
Cc: Timothy Arceri <tarceri@itsqueeze.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This will be used to disable the shader cache when xfb is enabled
via the api as we don't currently allow for it when generating the
sha for the shader.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
v2 (Jason Ekstrand):
- Various whitespace cleanups
- Add helpers for reading/writing objects
- Rework derefs
- [de]serialize nir_shader::num_*
- Fix uses of blob_reserve_bytes
- Use a bitfield struct for packing tex_instr data
v3:
- Zero nir_variable struct on deserialization. (Jordan)
- Allow nir_serialize.h to be included in C++. (Jordan)
- Handle NULL info.name. (Jason)
- Set info.name to NULL when name is NULL. (Jordan)
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Previously the values were calculated by just shifting ~0 by the
invocation ID. This would end up including bits that are higher than
gl_SubGroupSizeARB. The corresponding CTS test effectively requires that
these high bits be zero so it was failing. There is a Piglit test as
well but this appears to checking the wrong values so it passes.
For the two greater-than bitmasks, this patch adds an extra mask with
(~0>>(64-gl_SubGroupSizeARB)) to force these bits to zero.
Fixes: KHR-GL45.shader_ballot_tests.ShaderBallotBitmasks
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102680#c3
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Neil Roberts <nroberts@igalia.com>
>From GLSL 4.5 spec, section "7.1 Built-In Language Variables", page 130 of
the PDF states:
"If multiple shaders using members of a built-in block belonging to
the same interface are linked together in the same program, they must
all redeclare the built-in block in the same way, as described in
section 4.3.9 “Interface Blocks” for interface-block matching, or a
link-time error will result."
Fixes:
* GL45-CTS.CommonBugs.CommonBug_PerVertexValidation
v2 (Neil Roberts):
Explicitly look for gl_PerVertex in the symbol tables instead of
waiting to find a variable in the interface.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102677
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eduardo Lima Mitev <elima@igalia.com>
Signed-off-by: Neil Roberts <nroberts@igalia.com>
This effectively factorizes a couple of similar routines.
v2 (Neil Roberts): Non-trivial rebase on master
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eduardo Lima Mitev <elima@igalia.com>
Signed-off-by: Neil Roberts <nroberts@igalia.com>
Some symbols gathered in the symbols table during parsing are needed
later for the compile and link stages, so they are moved along the
process. Currently, only functions and non-temporary variables are
copied between symbol tables. However, the built-in gl_PerVertex
interface blocks are also needed during the linking stage (the last
step), to match re-declared blocks of inter-stage shaders.
This patch adds a new utility function that will factorize current code
that copies functions and variables between two symbol tables, and in
addition will copy explicitly declared gl_PerVertex blocks too.
The function will be used in a subsequent patch.
v2 (Neil Roberts):
Allow the src symbol table to be NULL and explicitly copy the
gl_PerVertex symbols in case they are not referenced in the exec_list.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eduardo Lima Mitev <elima@igalia.com>
Signed-off-by: Neil Roberts <nroberts@igalia.com>
NIR does not have these instructions. TGSI and Mesa IR both implement
them using < and >=, repsectively. Removing them deletes a bunch of
code and means I don't have to add code to the SPIR-V generator for
them.
v2: Rebase on 2+ years of change... and fix a major bug added in the
rebase.
text data bss dec hex filename
8255291 268856 294072 8818219 868e2b 32-bit i965_dri.so before
8254235 268856 294072 8817163 868a0b 32-bit i965_dri.so after
7815339 345592 420592 8581523 82f193 64-bit i965_dri.so before
7813995 345560 420592 8580147 82ec33 64-bit i965_dri.so after
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>