Commit Graph

2199 Commits

Author SHA1 Message Date
Eric Anholt 035fd4fb9f nir/lower_clip: Fix picking of unused driver locations.
This fixes things when the last input/output is a struct or matrix.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4670>
2020-04-23 18:52:46 +00:00
Eric Anholt 91668ae839 nir/lower_two_sided_color: Fix picking of new driver location.
We have shader->num_inputs for "last used input + 1" already, which
respects struct/matrix varyings.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4670>
2020-04-23 18:52:46 +00:00
Gert Wollny 49ce749d0e nir: Add umad24 and umul24 opcodes
So far only the singed versions are defined.

v2: Make umad24 and umul24 non-driver specific (Eric Anholt)

v3: Take care of nir_builder and automatic lowering of the
    opcodes if they are not supported by the backend.

Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4610>
2020-04-23 18:23:04 +00:00
Gert Wollny 42aa348dad nir: Add r600 specific intrinsics for tesselation shader IO
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4610>
2020-04-23 18:23:04 +00:00
Rhys Perry 32d871b48f nir/algebraic: don't undo lowering of 8/16-bit comparisons to 32-bit
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4387>
2020-04-23 10:57:38 +00:00
Rhys Perry 6d79298992 nir/lower_bit_size: fix lowering of {imul,umul}_high
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4387>
2020-04-23 10:57:38 +00:00
Rhys Perry 715ef95700 nir/lower_bit_size: fix lowering of shifts
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4387>
2020-04-23 10:57:38 +00:00
Kenneth Graunke 155bb74ea9 nir: Actually do load/store vectorization beyond vec2
nir_opt_load_store_vectorize has an is_strided_vector() function that
looks for types with weird explicit strides.  It does so by comparing
the explicit stride against the type-size-derived typical stride.

This had a subtle bug.  Simple vector types (vec2/3/4) have no explicit
stride, so glsl_get_explicit_stride() returns 0.  This never matches the
typical stride for a vector, so is_strided_vector() would return true
for basically any vector type, causing the vectorizer to bail.

I found this by looking at a compute shader with scalar SSBO loads at
offsets 0x220, 0x224, 0x228, 0x22c.  nir_opt_load_store_vectorize would
properly vectorize the first two into a vec2 load, but would refuse to
extend it to a vec3 and ultimately vec4 load because is_strided_vector()
saw a vec2 and freaked out.

Neither ACO nor ANV do load/store vectorization before lowering derefs,
so this shouldn't affect them.  However, I'd like to fix this bug to
avoid the trap for anyone who decides to in the future.  In a branch
where anv used this lowering, this cut an additional 38% of the send
messages in the shader by properly vectorizing more things.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4255>
2020-04-22 21:22:36 -07:00
Alejandro Piñeiro 9fd180394b nir: add nir_tex_instr_need_sampler helper
That is basically nir_tex_instr sampler_index documentation comment
expressed as a helper.

Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4677>
2020-04-22 23:43:18 +02:00
Dylan Baker 8e3696137f remove final imports.h and imports.c bits
This moves the fi_types to a new mesa_private.h and removes the
imports.c file. The vast majority of this patch is just removing
pound includes of imports.h and fixing up the recursive includes.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3024>
2020-04-21 11:09:04 -07:00
Jason Ekstrand 7c43b8ce1b nir: Delete the fnoise opcodes
As of the previous commit, they are never used.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4624>
2020-04-21 06:16:13 +00:00
Jonathan Marek 71820c6b02 nir: convert_ycbcr: preserve alpha channel
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: D Scott Phillips <d.scott.phillips@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4528>
2020-04-20 22:01:43 +00:00
Jonathan Marek f8558fb1ce nir: add common convert_ycbcr for vulkan csc
Copied from anv, replaced state with passing model/range directly.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: D Scott Phillips <d.scott.phillips@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4528>
2020-04-20 22:01:43 +00:00
Dave Airlie c2d8a4bf17 nir/linking: fix issue with two compact variables in a row. (v2)
If we have a clip dist float[1] compact followed by a tess factor
float[2] we don't want to overlap them, but the partial check
only happens for non-compact vars.

This fixes some issues seen with my sw vulkan layer with
dEQP-VK.clipping.user_defined.clip_distance*

v2: v1 failed with clip/cull mixtures, since in that
case the cull has a location_frac to follow after the clip
so only reset if we get a location_frac of 0 in a subsequent
clip var

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4635>
2020-04-20 21:08:54 +00:00
Samuel Pitoiset 59427b6d1d nir/opt_algebraic: lower 64-bit fmin3/fmax3/fmed3
This unconditionally lowers 64-bit fmin3/fmax3/fmed3 because
AMD hardware doesn't have native instructions, and no drivers
except RADV uses these instructions.

Fixes dEQP-VK.spirv_assembly.instruction.amd_trinary_minmax.*.f64.*
with ACO.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4570>
2020-04-20 06:59:47 +00:00
Samuel Pitoiset eed0ace466 nir/lower_int64: lower imin3/imax3/umin3/umax3/imed3/umed3
Fixes dEQP-VK.spirv_assembly.instruction.amd_trinary_minmax.*.i64.*
with ACO because this backend compiler expects most of the 64-bit
operations to be lowered.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4570>
2020-04-20 06:59:47 +00:00
Timothy Arceri 839818332c nir/gcm: dont move movs unless we can replace them later with their src
This helps us avoid moving the movs outside if branches when there
src can't be scalarized.

For example it avoids:

   vec4 32 ssa_7 = tex ssa_6 (coord), 0 (texture), 0 (sampler),
   if ... {
      r0 = imov ssa_7.z
      r1 = imov ssa_7.y
      r2 = imov ssa_7.x
      r3 = imov ssa_7.w
      ...
   } else {
      ...
      if ... {
         r0 = imov ssa_7.x
         r1 = imov ssa_7.w
         ...
      else {
         r0 = imov ssa_7.z
         r1 = imov ssa_7.y
         ...
      }
      r2 = imov ssa_7.x
      r3 = imov ssa_7.w
   }
   ...
   vec4 32 ssa_36 = vec4 r0, r1, r2, r3

Becoming something like:

   vec4 32 ssa_7 = tex ssa_6 (coord), 0 (texture), 0 (sampler),
   r0 = imov ssa_7.z
   r1 = imov ssa_7.y
   r2 = imov ssa_7.x
   r3 = imov ssa_7.w

   if ... {
      ...
   } else {
      if ... {
         r0 = imov r2
         r1 = imov r3
         ...
      else {
         ...
      }
      ...
   }

While this is has a smaller instruction count it requires more work
for the same result. With more complex examples we can also end up
shuffling the registers around in a way that requires more registers
to use as temps so that we don't overwrite our original values along
the way.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4636>
2020-04-20 03:46:29 +00:00
Timothy Arceri e4e5beee8a nir/gcm: be more conservative about moving instructions from loops
Here we only pull instructions further up control flow if they are
constant or texture instructions. See the code comment for more
information.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4636>
2020-04-20 03:46:29 +00:00
Timothy Arceri bf4a6c99d2 nir/gcm: allow derivative dependent intrinisics to be moved earlier
We can't move them later as we could move them into non-uniform
control flow, but moving them earlier should be fine.

This helps avoid a bunch of spilling in unigine shaders due to
moving the tex instructions sources earlier (outside if branches)
but not the instruction itself.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4636>
2020-04-20 03:46:29 +00:00
Jason Ekstrand 50a6dd0d65 nir/gcm: Prefer the instruction's original block
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4636>
2020-04-20 03:46:29 +00:00
Jason Ekstrand d4cf2df01a nir/gcm: Delete dead instructions
Classically, global code motion is also a dead code pass.  However, in
the initial implementation, the decision was made to place every
instruction and let conventional DCE clean up the dead ones.  Because
any uses of a dead instruction are unreachable, we have no late block
and the dead instructions are always scheduled early.  The problem is
that, because we place the dead instruction early, it  pushes the
placement of any dependencies of the dead instruction earlier than they
may need to be placed.  In order prevent dead instructions from
affecting the placement of live ones, we need to delete them.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4636>
2020-04-20 03:46:29 +00:00
Jason Ekstrand dca3f351e5 nir/gcm: Add a real concept of "progress"
Now that the GCM pass is more conservative and only moves instructions
to different blocks when it's advantageous to do so, we can have a
proper notion of what it means to make progress.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4636>
2020-04-20 03:46:29 +00:00
Jason Ekstrand 5b1615fdb7 nir/gcm: Move block choosing into a helper function
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4636>
2020-04-20 03:46:29 +00:00
Jason Ekstrand 1f60f1aa3d nir/gcm: Use an array for storing the early block
We are about to adjust our instruction block assignment algorithm and we
will want to know the current block that the instruction lives in.  In
order to allow for this, we can't overwrite nir_instr::block in the
early scheduling pass.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4636>
2020-04-20 03:46:29 +00:00
Jason Ekstrand 6006a9e275 nir/gcm: Loop over blocks in pin_instructions
Now that we have the new block iterators, we can simplify things a bit.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4636>
2020-04-20 03:46:29 +00:00
Jason Ekstrand 4d083b52c0 nir/dominance: Better handle unreachable blocks
v2: Fix minor comments (Ken)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4636>
2020-04-20 03:46:29 +00:00
Arcady Goldmints-Orlov ec1b96fdc8 nir: Lower returns correctly inside nested loops
Inside nested flow control, nir_lower_returns inserts predicated breaks
in the outer block. However, it would omit doing this if the remainder
of the outer block (after the inner block) was empty. This is not
correct in the case of loops, as execution just wraps back around to the
start of the loop, so this change doesn't skip the predication inside
loops.

Fixes: 79dec93ead (nir: Add return lowering pass)
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2724

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4603>
2020-04-19 02:54:08 +00:00
Timothy Arceri c19ebca308 nir: add matrix_layout to nir_variable data
This will be used by the following patch.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4623>
2020-04-18 11:50:44 +00:00
Jason Ekstrand f5deed138a spirv,nir: Move the SPIR-V vector insert code to NIR
This also makes spirv_to_nir a bit simpler because the new
nir_vector_insert helper automatically handles a constant component
selector like nir_vector_extract does.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4495>
2020-04-17 19:21:44 +00:00
Jason Ekstrand acaccff4d3 nir/builder: Handle any bit-size selector in nir_extract
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4495>
2020-04-17 19:21:44 +00:00
Jason Ekstrand 9b17d7caac nir: Add some sanity assertions in opt_large_constants
We make some assumptions in opt_large_constants such as the size_align
function returning the obvious sizes for vectors.  Now that we've got
the deref_size lying around, we may as well assert it's consistent with
our assumptions.  In particular, we now assert that it really claims
booleans are 32-bit.  If anyone's driver ever decides to be clever and
change this, we'll now catch the breakage earlier.

Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4468>
2020-04-16 17:00:13 +00:00
Jason Ekstrand 33eb43349e nir: Add an alignment to nir_intrinsic_load_constant
In f1883cc73d we tried to pass through alignments from load_constant
intrinsics when rewriting them to load_ubo in iris.  However, those
intrinsics don't have ALIGN_MUL or ALIGN_OFFSET indices.  It's easy
enough to add them.  We just call the size/align function on the vector
type at the end of our deref chain and use the alignment returned from
there.  It's possible we could do better by walking the whole deref
chain but this should be good enough.

Fixes: f1883cc73d "iris: Set alignments on cbuf0 and constant reads"
Closes: #2739
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4468>
2020-04-16 17:00:13 +00:00
Connor Abbott abcfb64370 ir3: Fix LDC offset units
I had missed that LDC actually uses vec4 units for its offset. This
means that we have to create a new instruction, and lower it in
ir3_nir_lower_io_offsets, similar to the existing SSBO instructions.
Unfortunately we can't assume that loads are always vec4-aligned, so we
have to use the alignment information that NIR gives us. Unfortunately,
it's currently woefully inadequate, and will have to be fixed to give us
good codegen in the future.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4568>
2020-04-15 22:38:20 +00:00
Connor Abbott 274f3815a5 ir3: Plumb through bindless support
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4358>
2020-04-09 15:56:55 +00:00
Timothy Arceri 52c8bc0130 nir: make opt_if_loop_terminator() less strict
nir_cf_{extract,reinsert}() can't stitch a block together
if the block we are extracting ends in a jump but other jumps
nested in further ifs should be fine to move.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4477>
2020-04-08 01:35:45 +00:00
Caio Marcelo de Oliveira Filho 5dc85abc4f nir: Add per_view attribute to nir_variable
If a nir_variable is tagged with per_view, it must be an array with
size corresponding to the number of views.  For slot-tracking, it is
considered to take just the slot for a single element -- drivers will
take care of expanding this appropriately.

This will be used to implement the ability of having per-view position
in a vertex shader in Intel platforms.

Acked-by: Rafael Antognolli <rafael.antognolli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2313>
2020-04-07 17:16:09 +00:00
Rob Clark 57557783f6 nir/lower_amul: fix slot calculation
Fixes incorrect indexing in
dEQP-GLES31.functional.ssbo.layout.instance_array_basic_type.packed.mat2x3

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4455>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4455>
2020-04-06 18:00:17 +00:00
Rob Clark 4638a16a93 nir: add some swizzle helpers
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4455>
2020-04-06 18:00:17 +00:00
Jason Ekstrand e78a7a1825 nir: Assert memory loads are aligned
We've had alignment parameters on these operations for a while but a
bunch of places weren't setting them.  That should be resolved now so we
can start validating that they're always set.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4441>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4441>
2020-04-06 15:57:30 +00:00
Hyunjun Ko 9f174eb2df nir: fix wrong assignment to buffer in xfb_varyings_info
Tested with dEQP-VK.transform_feedback.fuzz.various_buffers.buffers100_instance_array_vertex

Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Cc: mesa-stable@lists.freedesktop.org
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4459>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4459>
2020-04-06 08:55:05 +00:00
Rob Clark bf64648864 nir: fix definition of imadsh_mix16 for vectors
Fixes: c27b3758fa ("nir/opcodes: Add new 'umul_low' and 'imadsh_mix16' opcodes")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4423>
2020-04-04 00:07:10 +00:00
Daniel Schürmann dc69738b0f nir: fix unpack_64_4x16 in lower_alu_to_scalar()
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4002>
2020-04-03 23:13:15 +01:00
Jason Ekstrand 36a32af008 nir/load_store_vectorize: Add support for nir_var_mem_global
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4367>
2020-04-03 20:26:54 +00:00
Jason Ekstrand b6273291b5 nir/load_store_vectorize: Use nir_iadd_imm for offsets
This makes it capable of handling 64-bit offsets

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4367>
2020-04-03 20:26:54 +00:00
Jason Ekstrand 04d08ea149 nir/load_store_vectorize: Fix shared atomic info
These were clearly copied and pasted from SSBOs.  The shared atomics
don't have an SSBO index so their offset is src0 and data is src1.

Fixes: ce9205c03b "nir: add a load/store vectorization pass"
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4367>
2020-04-03 20:26:54 +00:00
Jason Ekstrand c71c1f44b0 nir/from_ssa: Only chain movs when a src is also a dest
The algorithm we use for resolving parallel copy instructions plays this
little shell game with the values.  The reason for this is that it lets
us handle cases where, for instance we have a -> b and b -> a and we
need to use a temporary to do a swap.  One result of this algorithm is
that it tends to emit a lot of mov chains which are typcially really bad
for GPUs where a mov is far from free.  For instance, it's likely to
turn this:

    r16 = ssa_0; r17 = ssa_0; r18 = ssa_0; r15 = ssa_0

into this:

    r15 = mov ssa_0
    r18 = mov r15
    r17 = mov r18
    r16 = mov r17

which, if it's the only thing in a block (this is common for phis) is
impossible for a scheduler to fix because of the dependencies and you
end up with significant stalling.  If, on the other hand, we only do the
chaining in the actual case where we need to free up a so that it can be
used as a destination, we can emit this:

    r15 = mov ssa_0
    r18 = mov ssa_0
    r17 = mov ssa_0
    r16 = mov ssa_0

which is far nicer to the scheduler.  On Intel, our copy propagation
pass will undo the chain for us so this has no shader-db impact.
However, for less intelligent back-ends, it's probably a lot better.

Reviewed-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4412>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4412>
2020-04-02 19:06:46 +00:00
Mark Janes 90a8b458ac nir: check shader type before writing to shaderinfo.tess union
If the shader is not a tesselation shader, then writing to the tess
member of the shaderinfo union will overwrite other members and crash.

Closes: #2722
Fixes: f1dd81ae10 ("nir: Collect if shader uses cross-invocation or indirect I/O.")
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4408>
2020-04-01 20:25:55 +00:00
Ian Romanick b097e326b8 nir/algebraic: Remove a redundant fabs pattern
Made redundant by 5544b2cbbd ("nir/algebraic: Use value range analysis
to eliminate useless unary ops").

No shader-db changes on any Intel platform.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1359>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1359>
2020-04-01 00:28:38 +00:00
Ian Romanick af1bc7e0c7 nir/algebraic: Use value range analysis to convert fmax to fsat
This is conceptually similar to the 1-fsat(a) <=> fsat(1-a) rearragement
done in:

3b74790941 ("nir/algebraic: Recognize open-coded flrp(a, b, fsat(c))")

2d259713b7 ("nir/algebraic: Commute 1-fsat(a) to fsat(1-a) for all
non-fmul instructions").

Note: this helps the Aztex Ruins shader that was hurt for spills and
fills on Braodwell in the previous commit, but it does not fix the
spills or fills. :(

All Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 14528985 -> 14526116 (-0.02%)
instructions in affected programs: 477300 -> 474431 (-0.60%)
helped: 2332
HURT: 0
helped stats (abs) min: 1 max: 18 x̄: 1.23 x̃: 1
helped stats (rel) min: 0.07% max: 8.89% x̄: 0.88% x̃: 0.64%
95% mean confidence interval for instructions value: -1.27 -1.19
95% mean confidence interval for instructions %-change: -0.92% -0.85%
Instructions are helped.

total cycles in shared programs: 203723684 -> 203692984 (-0.02%)
cycles in affected programs: 4878847 -> 4848147 (-0.63%)
helped: 1764
HURT: 324
helped stats (abs) min: 1 max: 706 x̄: 22.94 x̃: 17
helped stats (rel) min: <.01% max: 17.75% x̄: 1.94% x̃: 1.66%
HURT stats (abs)   min: 1 max: 400 x̄: 30.15 x̃: 10
HURT stats (rel)   min: <.01% max: 17.76% x̄: 1.91% x̃: 0.69%
95% mean confidence interval for cycles value: -16.55 -12.86
95% mean confidence interval for cycles %-change: -1.44% -1.24%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1359>
2020-04-01 00:28:38 +00:00
Ian Romanick 62795475e8 nir/algebraic: Distribute source modifiers into instructions
There are three main classes of cases that are helped by this change:

1. When the negation is applied to a value being type converted (e.g.,
   float(-x)).  This could possibly also be handled with more clever
   code generation.

2. When the negation is applied to a phi node source (e.g., x = -(...);
   at the end of a basic block).  This was the original case that caught
   my attention while looking at shader-db dumps.

3. When the negation is applied to the source of an instruction that
   cannot have source modifiers.  This includes texture instructions and
   math box instructions on pre-Gen7 platforms (see more details below).

In many these cases the negation can be propagated into the instructions
that generate the value (e.g., -(a*b) = (-a)*b).

In addition to the operations implemtned in this patch, I also tried:

 - frcp - Helped 6 or fewer shaders on Gen7+, and hurt just as many on
   pre-Gen7.  On Gen6 and earlier, frcp is a math box instruction, and
   math box instructions cannot have source modifiers.

   I suspect this is why so many more shaders are helped on Gen6 than on
   Gen5 or Gen7.  Gen6 supports OpenGL 3.3, so a lot more shaders
   compile on it.  A lot of these shaders may have things like cos(-x)
   or rcp(-x) that could result in an explicit negation instruction.

 - bcsel - Hurt a few shaders with none helped.  bcsel operates on
   integer sources, so the fabs or fneg cannot be a source modifier in
   the bcsel itself.

 - Integer instructions - No changes on any Intel platform.

Some notes about the shader-db results below.

 - On Tiger Lake, a single Deus Ex fragment shader is hurt for both
   spills and fills.

 - On Haswell, a different Deus Ex fragment shader is hurt for both
   spills and fills.

 - On GM45, the "LOST: 1" and "GAINED: 1" is a single Left4Dead 2
   (very high graphics settings, lol) fragment shader that upgrades
   from SIMD8 to SIMD16.

v2: Add support for fsign.  Add some patterns that remove redundant
negations and redundant absolute value rather than trying to push them
down the tree.

Tiger Lake
total instructions in shared programs: 17611333 -> 17586465 (-0.14%)
instructions in affected programs: 3033734 -> 3008866 (-0.82%)
helped: 10310
HURT: 632
helped stats (abs) min: 1 max: 35 x̄: 2.61 x̃: 1
helped stats (rel) min: 0.04% max: 16.67% x̄: 1.43% x̃: 1.01%
HURT stats (abs)   min: 1 max: 47 x̄: 3.21 x̃: 2
HURT stats (rel)   min: 0.04% max: 5.08% x̄: 0.88% x̃: 0.63%
95% mean confidence interval for instructions value: -2.33 -2.21
95% mean confidence interval for instructions %-change: -1.32% -1.27%
Instructions are helped.

total cycles in shared programs: 338365223 -> 338262252 (-0.03%)
cycles in affected programs: 125291811 -> 125188840 (-0.08%)
helped: 5224
HURT: 2031
helped stats (abs) min: 1 max: 5670 x̄: 46.73 x̃: 12
helped stats (rel) min: <.01% max: 34.78% x̄: 1.91% x̃: 0.97%
HURT stats (abs)   min: 1 max: 2882 x̄: 69.50 x̃: 14
HURT stats (rel)   min: <.01% max: 44.93% x̄: 2.35% x̃: 0.74%
95% mean confidence interval for cycles value: -18.71 -9.68
95% mean confidence interval for cycles %-change: -0.80% -0.63%
Cycles are helped.

total spills in shared programs: 8942 -> 8946 (0.04%)
spills in affected programs: 8 -> 12 (50.00%)
helped: 0
HURT: 1

total fills in shared programs: 9399 -> 9401 (0.02%)
fills in affected programs: 21 -> 23 (9.52%)
helped: 0
HURT: 1

Ice Lake
total instructions in shared programs: 16124348 -> 16102258 (-0.14%)
instructions in affected programs: 2830928 -> 2808838 (-0.78%)
helped: 11294
HURT: 2
helped stats (abs) min: 1 max: 12 x̄: 1.96 x̃: 1
helped stats (rel) min: 0.07% max: 17.65% x̄: 1.32% x̃: 0.93%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 3.45% max: 4.00% x̄: 3.72% x̃: 3.72%
95% mean confidence interval for instructions value: -1.99 -1.93
95% mean confidence interval for instructions %-change: -1.34% -1.29%
Instructions are helped.

total cycles in shared programs: 335393932 -> 335325794 (-0.02%)
cycles in affected programs: 123834609 -> 123766471 (-0.06%)
helped: 5034
HURT: 2128
helped stats (abs) min: 1 max: 3256 x̄: 43.39 x̃: 11
helped stats (rel) min: <.01% max: 35.79% x̄: 1.98% x̃: 1.00%
HURT stats (abs)   min: 1 max: 2634 x̄: 70.63 x̃: 16
HURT stats (rel)   min: <.01% max: 49.49% x̄: 2.73% x̃: 0.62%
95% mean confidence interval for cycles value: -13.66 -5.37
95% mean confidence interval for cycles %-change: -0.69% -0.48%
Cycles are helped.

LOST:   0
GAINED: 2

Skylake
total instructions in shared programs: 14949240 -> 14927930 (-0.14%)
instructions in affected programs: 2594756 -> 2573446 (-0.82%)
helped: 11000
HURT: 2
helped stats (abs) min: 1 max: 12 x̄: 1.94 x̃: 1
helped stats (rel) min: 0.07% max: 18.75% x̄: 1.39% x̃: 0.94%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 4.76% max: 4.76% x̄: 4.76% x̃: 4.76%
95% mean confidence interval for instructions value: -1.97 -1.91
95% mean confidence interval for instructions %-change: -1.42% -1.37%
Instructions are helped.

total cycles in shared programs: 324829346 -> 324821596 (<.01%)
cycles in affected programs: 121566087 -> 121558337 (<.01%)
helped: 4611
HURT: 2147
helped stats (abs) min: 1 max: 3715 x̄: 33.29 x̃: 10
helped stats (rel) min: <.01% max: 36.08% x̄: 1.94% x̃: 1.00%
HURT stats (abs)   min: 1 max: 2551 x̄: 67.88 x̃: 16
HURT stats (rel)   min: <.01% max: 53.79% x̄: 3.69% x̃: 0.89%
95% mean confidence interval for cycles value: -4.25 1.96
95% mean confidence interval for cycles %-change: -0.28% -0.02%
Inconclusive result (value mean confidence interval includes 0).

Broadwell
total instructions in shared programs: 14971203 -> 14949957 (-0.14%)
instructions in affected programs: 2635699 -> 2614453 (-0.81%)
helped: 10982
HURT: 2
helped stats (abs) min: 1 max: 12 x̄: 1.93 x̃: 1
helped stats (rel) min: 0.07% max: 18.75% x̄: 1.39% x̃: 0.94%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 4.76% max: 4.76% x̄: 4.76% x̃: 4.76%
95% mean confidence interval for instructions value: -1.97 -1.90
95% mean confidence interval for instructions %-change: -1.42% -1.37%
Instructions are helped.

total cycles in shared programs: 336215033 -> 336086458 (-0.04%)
cycles in affected programs: 127383198 -> 127254623 (-0.10%)
helped: 4884
HURT: 1963
helped stats (abs) min: 1 max: 25696 x̄: 51.78 x̃: 12
helped stats (rel) min: <.01% max: 58.28% x̄: 2.00% x̃: 1.05%
HURT stats (abs)   min: 1 max: 3401 x̄: 63.33 x̃: 16
HURT stats (rel)   min: <.01% max: 39.95% x̄: 2.20% x̃: 0.70%
95% mean confidence interval for cycles value: -29.99 -7.57
95% mean confidence interval for cycles %-change: -0.89% -0.71%
Cycles are helped.

total fills in shared programs: 24905 -> 24901 (-0.02%)
fills in affected programs: 117 -> 113 (-3.42%)
helped: 4
HURT: 0

LOST:   0
GAINED: 16

Haswell
total instructions in shared programs: 13148927 -> 13131528 (-0.13%)
instructions in affected programs: 2220941 -> 2203542 (-0.78%)
helped: 8017
HURT: 4
helped stats (abs) min: 1 max: 12 x̄: 2.17 x̃: 1
helped stats (rel) min: 0.07% max: 15.25% x̄: 1.40% x̃: 0.93%
HURT stats (abs)   min: 1 max: 7 x̄: 2.50 x̃: 1
HURT stats (rel)   min: 0.33% max: 4.76% x̄: 2.73% x̃: 2.91%
95% mean confidence interval for instructions value: -2.21 -2.13
95% mean confidence interval for instructions %-change: -1.43% -1.37%
Instructions are helped.

total cycles in shared programs: 321221791 -> 321079870 (-0.04%)
cycles in affected programs: 126886055 -> 126744134 (-0.11%)
helped: 4674
HURT: 1729
helped stats (abs) min: 1 max: 23654 x̄: 56.47 x̃: 16
helped stats (rel) min: <.01% max: 53.22% x̄: 2.13% x̃: 1.05%
HURT stats (abs)   min: 1 max: 3694 x̄: 70.58 x̃: 18
HURT stats (rel)   min: <.01% max: 63.06% x̄: 2.48% x̃: 0.90%
95% mean confidence interval for cycles value: -33.31 -11.02
95% mean confidence interval for cycles %-change: -0.99% -0.78%
Cycles are helped.

total spills in shared programs: 19872 -> 19874 (0.01%)
spills in affected programs: 21 -> 23 (9.52%)
helped: 0
HURT: 1

total fills in shared programs: 20941 -> 20941 (0.00%)
fills in affected programs: 62 -> 62 (0.00%)
helped: 1
HURT: 1

LOST:   0
GAINED: 8

Ivy Bridge
total instructions in shared programs: 11875553 -> 11853839 (-0.18%)
instructions in affected programs: 1553112 -> 1531398 (-1.40%)
helped: 7304
HURT: 3
helped stats (abs) min: 1 max: 16 x̄: 2.97 x̃: 2
helped stats (rel) min: 0.07% max: 15.25% x̄: 1.62% x̃: 1.15%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 1.05% max: 3.33% x̄: 2.44% x̃: 2.94%
95% mean confidence interval for instructions value: -3.04 -2.90
95% mean confidence interval for instructions %-change: -1.65% -1.59%
Instructions are helped.

total cycles in shared programs: 178246425 -> 178184484 (-0.03%)
cycles in affected programs: 13702146 -> 13640205 (-0.45%)
helped: 4409
HURT: 1566
helped stats (abs) min: 1 max: 531 x̄: 24.52 x̃: 13
helped stats (rel) min: <.01% max: 38.67% x̄: 2.14% x̃: 1.02%
HURT stats (abs)   min: 1 max: 356 x̄: 29.48 x̃: 10
HURT stats (rel)   min: <.01% max: 64.73% x̄: 1.87% x̃: 0.70%
95% mean confidence interval for cycles value: -11.60 -9.14
95% mean confidence interval for cycles %-change: -1.19% -0.99%
Cycles are helped.

LOST:   0
GAINED: 10

Sandy Bridge
total instructions in shared programs: 10695740 -> 10667483 (-0.26%)
instructions in affected programs: 2337607 -> 2309350 (-1.21%)
helped: 10720
HURT: 1
helped stats (abs) min: 1 max: 49 x̄: 2.64 x̃: 2
helped stats (rel) min: 0.07% max: 20.00% x̄: 1.54% x̃: 1.13%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 1.04% max: 1.04% x̄: 1.04% x̃: 1.04%
95% mean confidence interval for instructions value: -2.69 -2.58
95% mean confidence interval for instructions %-change: -1.57% -1.51%
Instructions are helped.

total cycles in shared programs: 153478839 -> 153416223 (-0.04%)
cycles in affected programs: 22050900 -> 21988284 (-0.28%)
helped: 5342
HURT: 2200
helped stats (abs) min: 1 max: 1020 x̄: 20.34 x̃: 16
helped stats (rel) min: <.01% max: 24.05% x̄: 1.51% x̃: 0.86%
HURT stats (abs)   min: 1 max: 335 x̄: 20.93 x̃: 6
HURT stats (rel)   min: <.01% max: 20.18% x̄: 1.03% x̃: 0.30%
95% mean confidence interval for cycles value: -9.18 -7.42
95% mean confidence interval for cycles %-change: -0.82% -0.71%
Cycles are helped.

Iron Lake
total instructions in shared programs: 8114882 -> 8105574 (-0.11%)
instructions in affected programs: 1232504 -> 1223196 (-0.76%)
helped: 4109
HURT: 2
helped stats (abs) min: 1 max: 6 x̄: 2.27 x̃: 1
helped stats (rel) min: 0.05% max: 8.33% x̄: 0.99% x̃: 0.66%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.94% max: 4.35% x̄: 2.65% x̃: 2.65%
95% mean confidence interval for instructions value: -2.31 -2.21
95% mean confidence interval for instructions %-change: -1.01% -0.96%
Instructions are helped.

total cycles in shared programs: 188504036 -> 188466296 (-0.02%)
cycles in affected programs: 31203798 -> 31166058 (-0.12%)
helped: 3447
HURT: 36
helped stats (abs) min: 2 max: 92 x̄: 11.03 x̃: 8
helped stats (rel) min: <.01% max: 5.41% x̄: 0.21% x̃: 0.13%
HURT stats (abs)   min: 2 max: 30 x̄: 7.33 x̃: 6
HURT stats (rel)   min: 0.01% max: 1.65% x̄: 0.18% x̃: 0.10%
95% mean confidence interval for cycles value: -11.16 -10.51
95% mean confidence interval for cycles %-change: -0.22% -0.20%
Cycles are helped.

LOST:   0
GAINED: 1

GM45
total instructions in shared programs: 4989697 -> 4984531 (-0.10%)
instructions in affected programs: 703952 -> 698786 (-0.73%)
helped: 2493
HURT: 2
helped stats (abs) min: 1 max: 6 x̄: 2.07 x̃: 1
helped stats (rel) min: 0.05% max: 8.33% x̄: 1.03% x̃: 0.66%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.95% max: 4.35% x̄: 2.65% x̃: 2.65%
95% mean confidence interval for instructions value: -2.13 -2.01
95% mean confidence interval for instructions %-change: -1.07% -0.99%
Instructions are helped.

total cycles in shared programs: 128929136 -> 128903886 (-0.02%)
cycles in affected programs: 21583096 -> 21557846 (-0.12%)
helped: 2214
HURT: 17
helped stats (abs) min: 2 max: 92 x̄: 11.44 x̃: 8
helped stats (rel) min: <.01% max: 5.41% x̄: 0.24% x̃: 0.13%
HURT stats (abs)   min: 2 max: 8 x̄: 4.24 x̃: 4
HURT stats (rel)   min: 0.01% max: 1.65% x̄: 0.20% x̃: 0.09%
95% mean confidence interval for cycles value: -11.75 -10.88
95% mean confidence interval for cycles %-change: -0.25% -0.22%
Cycles are helped.

LOST:   1
GAINED: 1

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1359>
2020-04-01 00:28:38 +00:00