We use the same nir_deref_mode_is_in_set helper that we use in
nir_lower_vars_to_explicit_types for the same reason. If there are any
generic pointers in play, we have to lower all generic pointer modes at
the same time or else we risk types getting out-of-sync.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6332>
We can only lower a deref to SSA in this pass if it's guaranteed to be
nir_var_function_temp. We already flag any variables with complex uses
(i.e. casts) as not being lowerable and refuse to lower any derefs to
them so we don't have to worry about false negatives.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6332>
For non-explicit nir_lower_io, we use nir_deref_mode_is because there's
no way it works for generic pointers. For nir_lower_vars_to_explicit_types,
and nir_lower_explicit_io, we use nir_deref_mode_is_in_set to ensure we
never get type confusion. For generic pointers, this means that they
must be called with the full set of generic pointer modes.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6332>
NIR derefs currently have exactly one variable mode. This is about to
change so we can handle OpenCL generic pointers. In order to transition
safely, we need to audit every deref->mode check. This commit adds a
set of helpers that provide more nuanced mode checks and converts most
of NIR to use them.
For simple cases, we add nir_deref_mode_is and nir_deref_mode_is_one_of
helpers. These can be used in passes which don't have to bother with
generic pointers and just want to know what mode a thing is. If the
pass ever encounters generic pointers in a way that this check would be
unsafe, it will assert-fail to alert developers that they need to think
harder about things and fix the pass.
For more complex passes which require a more nuanced understanding of
modes, we add nir_deref_mode_may_be and nir_deref_mode_must_be helpers
which accurately describe the compiler's best knowledge about the given
deref. Unfortunately, we may not be able to exactly identify the mode
in a generic pointers scenario so we have to be very careful when we use
these. Conversion of these passes is left to later commits.
For the case of mass lowering of a particular mode (nir_lower_explicit_io
is one good example), we add nir_deref_mode_is_in_set. This is also
pretty assert-happy like nir_deref_mode_is but is for a set containment
comparison on deref modes where you expect the deref to either be all-in
or all-out.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6332>
We already have the variable so we know the mode exactly. Just use that
instead of the deref mode. If these paths ever have to handle variable
pointers (not likely since they're OpenGL-specific), we can fix them to
handle crazy deref modes then.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6332>
In split_var_list_structs where we initalize the splitting, we already
use get_complex_used_vars to avoid splitting any variables that have a
complex use. However, we weren't actually handling the complex uses
properly in the case where we can't actually find the variable.
Fixes: f1cb3348f1 "nir/split_vars: Properly bail in the presence of ..."
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6332>
I can't think of any reason why shared and output aren't in this list.
The real thing we're trying to do is avoid premature scalarization
because of a shader or function temporary variable because we might
lower it to something we don't want scalarized later. Also fix the
version we copy+pasted into GCM.
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6332>
This will create code that is easier to combine into MADs/FMA when the
last component of the vector is 1.0.
nir_opt_algebraic_late has an optimization to do something similar but it
only works for inexact code, if the multiplication-by-1 optimization is
done before it and if the backend enables fuse_ffma.
fossil-db (Navi):
Totals from 4296 (3.75% of 114665) affected shaders:
SGPRs: 283468 -> 283764 (+0.10%); split: -0.02%, +0.12%
VGPRs: 172868 -> 172904 (+0.02%); split: -0.09%, +0.11%
CodeSize: 14045312 -> 14027128 (-0.13%); split: -0.15%, +0.02%
MaxWaves: 59285 -> 59282 (-0.01%); split: +0.04%, -0.05%
Instrs: 2703507 -> 2683187 (-0.75%); split: -0.76%, +0.00%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5631>
This will create code that is easier to combine into MADs/FMA when the
last component is 1.0.
nir_opt_algebraic_late has an optimization to do something similar but it
only works for inexact code, if the multiplication-by-1 optimization is
done before it and if the backend enables fuse_ffma.
fossil-db (Navi):
Totals from 85583 (74.64% of 114665) affected shaders:
SGPRs: 4556060 -> 4558596 (+0.06%); split: -0.07%, +0.12%
VGPRs: 3315060 -> 3312984 (-0.06%); split: -0.23%, +0.17%
SpillSGPRs: 13552 -> 13553 (+0.01%)
CodeSize: 184962756 -> 184431388 (-0.29%); split: -0.32%, +0.03%
MaxWaves: 1208693 -> 1209361 (+0.06%); split: +0.17%, -0.11%
Instrs: 35678819 -> 35361617 (-0.89%); split: -0.91%, +0.02%
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5631>
After we lowered `return` into `break` - the control flow is changed and
the block with this change has a new successor, which means that in this
new successor phis should have additional source.
Since the instructions that use phis in the successor are predicated -
it's ok for a new phi source to be undef.
If `return` is lowered in a nested loop, `break` is inserted in the outer
loops, so all new blocks with break require the same changes to phis
described above.
Examples of NIR before lowering:
block block_0:
loop {
block block_1:
if ssa_2 {
block block_2:
return
// succs: block_6
} else {
block block_2:
break;
// succs: block_5
}
block block_4:
}
block block_5:
// preds: block_3
vec1 32 ssa_4 = phi block_3: ssa_1
// succs: block_6
block block_6:
Here converting return to break should add block_2 to the phis
of block_5.
block block_0:
loop {
block block_1:
loop {
block block_2:
if ssa_2 {
block block_3:
return
// succs: block_8
} else {
block block_4:
break;
// succs: block_6
}
block block_5:
}
block block_6:
break;
// succs: block_7
}
block block_7:
// preds: block_6
vec1 32 ssa_4 = phi block_6: ssa_1
// succs: block_8
block block_8:
Here converting return to break will insert conditional break in
the outer loop, changing block_6 predcessors.
Cc: <mesa-stable@lists.freedesktop.org>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3322
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/3498
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6186>
I think the rationale for not setting the size for inputs is that
when passed between geometry stages the clip and cull distances are
supposed to be treated like any other varying. However, this isn't 100%
the case for the FS, since when it's read by the FS it's also used by
the fixed-function stage. In freedreno we setup varying locations when
compiling the FS, and then tack on VS-only outputs like gl_Position at
the end. Furthermore there's code to compact input locations based on
what's actually read. But this compaction can't happen for clip and cull
distances, because then we won't have space for components that are only
read by the clipper. So, we need to know the original number of
components for both arrays. Modify this pass so that we don't have to go
digging around for it ourselves.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6959>
This patch fixes mesa compiler crash in i965 on shaders like the following one:
```
in VS_OUTPUT {
mat4 data;
} vs_output;
out vec4 fs_output;
vec4 convert(in float val) {
return vec4(val);
}
void main()
{
fs_output = vec4(0.0);
for (int a = -1; a < 5; a++) {
for (int b = -1; b < 5; b++) {
fs_output += convert(vs_output.data[b][a]);
}
}
}
```
Section 5.11 (Out-of-Bounds Accesses) of the GLSL 4.60 spec says:
In the subsections described above for array, vector, matrix and
structure accesses, any out-of-bounds access produced undefined
behavior....
Out-of-bounds reads return undefined values, which
include values from other variables of the active program or zero.
Out-of-bounds writes may be discarded or overwrite
other variables of the active program.
GL_KHR_robustness and GL_ARB_robustness encourage us to return zero
for reads.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6560>
Fix defects reported by Coverity Scan.
Uninitialized scalar field (UNINIT_CTOR)
uninit_member: Non-static class member buffer_access_type is not
initialized in this constructor nor in any functions that it
calls.
uninit_member: Non-static class member progress is not initialized
in this constructor nor in any functions that it calls.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7243>
All these instructions replicate the result of a N-component dot-product
to a vec4. Naming them fdot_replicatedN gives the impression that are
some sort of abstract dot-product that replicates the result to a vecN.
They also deviate from fdph_replicated... which nobody would reasonably
consider naming fdot_replicatedh.
Naming these opcodes fdotN_replicated more closely matches what they
are, and it matches the pattern of fdph_replicated.
I believe that the only reason these opcodes were named this way was
because it simplified the implementation of the binop_reduce function in
nir_opcodes.py. I made some fairly simple changes to that function, and
I think the end result is ok.
The bulk of the changes come from the sed rename:
sed --in-place -e 's/fdot_replicated\([234]\)/fdot\1_replicated/g' \
$(grep -r 'fdot_replicated[234]' src/)
v2: Use a named parameter to binop_reduce instead of using
isinstance(name, str). Suggested by Jason.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5725>
Fix defects reported by Coverity Scan.
Uninitialized scalar field (UNINIT_CTOR)
uninit_member: Non-static class member found_unsupported_op is not
initialized in this constructor nor in any functions that it calls.
uninit_member: Non-static class member found_expensive_op is not
initialized in this constructor nor in any functions that it calls.
uninit_member: Non-static class member found_dynamic_arrayref is not
initialized in this constructor nor in any functions that it calls.
uninit_member: Non-static class member is_then is not initialized in
this constructor nor in any functions that it calls.
uninit_member: Non-static class member then_cost is not initialized in
this constructor nor in any functions that it calls.
uninit_member: Non-static class member else_cost is not initialized in
this constructor nor in any functions that it calls.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7228>
If a pass returning boolean progress reports no change, we shouldn't need
to re-validate. If a pass breaks the NIR but also fails to report
progress correctly, it would be up to the next pass to catch that.
This should hopefully help with test timeouts on
KHR-GL33.texture_swizzle.functional since switching softpipe to
nir-to-tgsi and enabling NIR validation in CI (27s to 20s on my system).
Suggested-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-By: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7239>
live_index had two things going on: 0 meant the instr was an undef and
always dead, and otherwise ssa defs had increasing numbers by instruction
order. We already have a field in the instruction for storing instruction
order, and ssa defs don't need that number to be contiguous (if you want a
compact per-ssa-def number, use ssa->index after reindexing).
We don't use ssa->index for this, because reindexing those would change
nir_print, and that would be rude to people trying to track what's
happening in optimization passes.
This openend up a hole in nir_ssa_def, so we move nir_ssa_def->index
toward the end to shrink the struct from 64 bytes to 56.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3395>