nir_variable_create already inserts it in the right list for us so
inserting it again causes a linked list corruption.
Reviewed-by: Matt Turner <mattst88@gmail.com>
This has the better name to use. Aparently, sh->Name is usually 0.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Neil Roberts <neil@linux.intel.com>
For glBlendFunc and glBlendFuncSeparate(), the _UsesDualSrc flag
will be the same for all buffers, so no need to compute it N times.
Reviewed-by: Eric Anholt <eric@anholt.net>
A redundant call to glBlendFuncSeparateiARB() is more likely than getting
invalid values, so do the no-op check first.
Reviewed-by: Eric Anholt <eric@anholt.net>
Streamline the checking for no state change in _mesa_BlendFuncSeparate()
(and _mesa_BlendFunc()). If _BlendFuncPerBuffer is false, we only need
to check the 0th buffer state. Move argument validation after the no-op
check.
I'm looking at an app that issues about 1000 redundant glBlendFunc()
calls per frame!
Reviewed-by: Eric Anholt <eric@anholt.net>
We can skip to the end of _mesa_update_state_locked() if only the
_NEW_LINE flag is set since none of the derived state depends on it
(just like _NEW_CURRENT_ATTRIB). Note that we still call the
ctx->Driver.UpdateState() function, of course.
v2: use bitmask-based test, per Eric.
Reviewed-by: Eric Anholt <eric@anholt.net>
Changing the matrix mode alone has no effect on rendering and does
not need to trigger a flush or state validation.
Reviewed-by: Eric Anholt <eric@anholt.net>
Rather than accepting a void pointer, only to down and up cast around
it, convert the function to take the base (struct gl_program) pointer.
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
V3: use a check_*_allowed style function for requirements checking
rather than has_* which doesn't encapsulate the error message
V2: add missing 's' to the extension name in error messages
and add decimal place in version string
Reviewed-by: Marta Lofstedt <marta.lofstedt@intel.com>
This adds support for setting up the UniformBlock structures for AoA
and also adds support for resizing AoA blocks with a packed layout.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Add support for setting the max access of an unsized member
of an interface array of arrays.
For example ifc[j][k].foo[i] where foo is unsized.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This marks all counters in an AoA as active.
For AoA all but the innermost array are treated as separate
counters/uniforms. The Nvidia binary also goes further and
finds inactive counters in the AoA, in future we should do
this too, however this gets things working for the time being.
This change also removes the use of UniformHash for atomic counters,
this avoids having to generate name strings used as hash keys.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
This allows the correct offset to be calculated for use in indirect
indexing of samplers.
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Currently only one ir assignment is removed for each var in a single
dead code optimisation pass. This means if a var has more than one
assignment, then it requires all the glsl optimisations to be run again
for each additional assignment to be removed.
Another pass is also required to remove the variable itself.
With this change all assignments and the variable are removed in a single
pass.
Some of the arrays of arrays conformance tests that were looping
through 8 dimensions ended up with a var with hundreds of assignments.
This change helps ES31-CTS.arrays_of_arrays.InteractionFunctionCalls1
go from around 3 min 20 sec -> 2 min
ES31-CTS.arrays_of_arrays.InteractionFunctionCalls2 went from
around 9 min 20 sec to 7 min 30 sec
I had difficulty getting the public shader-db to give a consistent result
with or without this change but the results seemed unchanged at between
15-20 seconds.
Thomas Helland measured change with shader-db on his machine from
approx 117 secs to 112 secs.
V3: Simplify freeing of list as suggested by Ian, and spelling fixes.
V2: Add assert to be sure references are counted before assignments.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-By: Thomas Helland <thomashelland90@gmail.com>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
V3: move patch after fixes to ast for AoA and add const to helper
as suggested by Ian
V2: move single dimensional array detection into a helper
Signed-off-by: Timothy Arceri <t_arceri@yahoo.com.au>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
V3: Fix setting of data.location for struct AoA UBO members
V2: Handle arrays of arrays in the same way structures are handled
The ARB_arrays_of_arrays spec doesn't give very many details on how
AoA uniforms are intended to be implemented. However in the
ARB_program_interface_query spec there are details that show AoA are
intended to be handled in a similar way to structs.
Issues 7 from the ARB_program_interface_query spec:
We define rules consistent with our enumeration rules for
other complex types. For existing one-dimensional arrays, we enumerate
a single entry if the array is an array of basic types, or separate
entries for each array element if the array is an array of structures.
We follow similar rules here. For a uniform array such as:
uniform vec4 a[5][4][3];
we enumerate twenty different entries ("a[0][0][0]" through
"a[4][3][0]"), each of which is treated as an array with three elements.
This is morally equivalent to what you'd get if you worked around the
limitation in current GLSL via:
struct ArrayBottom { vec4 c[3]; };
struct ArrayMid { ArrayBottom b[3]; };
uniform ArrayMid a[5];
which would enumerate "a[0].b[0].c[0]" through "a[4].b[3].c[0]".
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Now that we have anv_nir_apply_pipeline_layout, we can hand the backend
compiler intrinsics and texture instructions that use a flat buffer index
just like it wants. There's no longer any reason for any of these hacks.
This patch reworks a bunch of stuff in the way we do descriptor set
layouts. Our previous approach had a couple of problems. First, it was
based on a misunderstanding of arrays in descriptor sets. Second, it
didn't properly handle descriptor sets where some bindings were missing
stages. The new apporach should be correct and also makes some operations,
particularly those on the hot-path, a bit easier.
We use the descriptor set layout for four things:
1) To determine the map from bindings to the actual flattened descriptor
set in vkUpdateDescriptorSets().
2) To determine the descriptor <-> binding table entry mapping to use in
anv_cmd_buffer_flush_descriptor_sets().
3) To determine the mappings of dynamic indices.
4) To determine the (set, binding, array index) -> binding table entry
mapping inside of shaders.
The new approach is directly taylored towards these operations.
Requested by developers outside Intel.
During the driver's pre-release development, let's make the README easy
to find for external experimenters. Keep it at the top of the source
tree.
The ES31-CTS.compute_shader.pipeline-compute-chain test case generates
an unsigned index by using gl_LocalInvocationID.x and
gl_LocalInvocationID.y as array indices.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
The ES31-CTS.compute_shader.pipeline-compute-chain test case generates
an unsigned index by using gl_LocalInvocationID.x and
gl_LocalInvocationID.y as array indices.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>