Commit Graph

92 Commits

Author SHA1 Message Date
Jason Ekstrand eb93d6dec8 nir/search: Don't match inexact expressions with exact subexpressions
In the first pass of implementing exact handling, I made a mistake with
search-and-replace.  In particular, we only reallly handled exact/inexact
on the root of the tree.  Instead, we need to check every node in the tree
for an exact/inexact match.  As an example of this, consider the following
GLSL code

precise float a = b + c;
if (a < 0) {
   do_stuff();
}

In that case, only the add will be declared "exact" and an expression that
looks for "b + c < 0" will still match and replace it with "b < -c" which
may yield different results.  The solution is to simply bail if any of the
values are exact when matching an inexact expression.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
2016-04-04 13:48:10 -07:00
Jason Ekstrand de60e250f5 nir: Add an opcode for stomping a 32-bit value to 16-bit precision
This correlates directly to the SPIR-V opcode OpQuantizeToF16

Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-04-01 13:52:28 -07:00
Ian Romanick 08ff5f4d1f nir: Simplify a bcsel to logical-or
Oddly, this did not affect the shader where I first noticed the pattern.
That particular shader doesn't get its if-statement converted to a bcsel
because there are two assignments in the else-statement.  This led to me
submitting https://bugs.freedesktop.org/show_bug.cgi?id=94747.

shader-db results:

Sandy Bridge
total instructions in shared programs: 8467384 -> 8467069 (-0.00%)
instructions in affected programs: 36594 -> 36279 (-0.86%)
helped: 46
HURT: 0

total cycles in shared programs: 117573448 -> 117568518 (-0.00%)
cycles in affected programs: 339114 -> 334184 (-1.45%)
helped: 46
HURT: 0

Ivy Bridge / Haswell / Broadwell / Skylake:
total instructions in shared programs: 7774258 -> 7773999 (-0.00%)
instructions in affected programs: 30874 -> 30615 (-0.84%)
helped: 46
HURT: 0

total cycles in shared programs: 65739190 -> 65734530 (-0.01%)
cycles in affected programs: 180380 -> 175720 (-2.58%)
helped: 45
HURT: 1

No change on G45 or Ironlake.

I also tried these expressions, but none of them affected any shaders in
shader-db:

   (('bcsel', a, 'a@bool', 'b@bool'), ('ior', a, b)),
   (('bcsel', a, 'b@bool', False),    ('iand', a, b)),
   (('bcsel', a, 'b@bool', 'a@bool'), ('iand', a, b)),

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2016-03-31 14:59:36 -07:00
Matt Turner 05ee6627d6 nir: Fix typo from commit 6702f1acde. 2016-03-30 19:18:35 -07:00
Matt Turner 6702f1acde nir: Propagate negates up multiplication chains.
total instructions in shared programs: 7112159 -> 7088092 (-0.34%)
instructions in affected programs: 1374915 -> 1350848 (-1.75%)
helped: 7392
HURT: 621

GAINED: 2
LOST:   2
2016-03-30 13:12:34 -07:00
Jason Ekstrand 35e2e96b30 nir: Add a helper for getting the current block from a cursor
Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-03-28 18:32:48 -07:00
Jason Ekstrand be98c47528 nir/lower_out_to_temp: Add an "entrypoint" parameter
Previously, the pass assumed that the entrypoint would be whatever function
happened to have the name "main".  We really shouldn't trust in the
function names.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-03-28 18:32:48 -07:00
Jason Ekstrand 31a5bec93f nir/lower_out_to_temp: Steal the output's constant initializer
Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-03-28 18:32:48 -07:00
Jason Ekstrand 38de85f9a5 nir: Add a helper for getting the unique function in a shader
Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-03-28 18:32:48 -07:00
Jason Ekstrand 49be812be6 nir/sweep: Sweep function parameters
They are no longer in the list of local variables so we need to explicitly
sweep them.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-03-28 18:32:48 -07:00
Jason Ekstrand 1be4c61c95 nir/builder: Add a helper for creating undefs
Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-03-28 18:32:48 -07:00
Jason Ekstrand 6a2479d618 nir/builder: Add a helper for storing to variable derefs
Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-03-28 18:32:48 -07:00
Jason Ekstrand 77e2ac1da7 nir/builder: Add a helper for building fdot instructions
Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-03-28 18:32:48 -07:00
Jason Ekstrand da422663a6 nir: Add a variable_foreach_safe helper
Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-03-28 18:32:48 -07:00
Jason Ekstrand 731870fbe3 nir/Makefile: Fix alphabetization
Reviewed-by: Rob Clark <robdclark@gmail.com>
2016-03-28 18:32:48 -07:00
Jason Ekstrand 22b343a8ec nir: Add a pass to inline functions
This commit adds a new NIR pass that lowers all function calls away by
inlining the functions.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-24 15:20:44 -07:00
Jason Ekstrand debf23ec68 nir/builder: Add helpers for easily inserting copy_var intrinsics
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-24 15:20:44 -07:00
Jason Ekstrand 79dec93ead nir: Add return lowering pass
This commit adds a NIR pass for lowering away returns in functions.  If the
return is in a loop, it is lowered to a break.  If it is not in a loop,
it's lowered away by moving/deleting code as needed.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-24 15:20:44 -07:00
Jason Ekstrand 8d61d72524 nir: Add a cursor helper for getting a cursor after any phi nodes
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-24 15:20:44 -07:00
Jason Ekstrand 18b0166749 nir/builder: Add a helper for inserting jump instructions
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-24 15:20:44 -07:00
Jason Ekstrand 97b663481c nir/cf: Make extracting or re-inserting nothing a no-op
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2016-03-24 15:20:44 -07:00
Jason Ekstrand 7022a673cd nir: Add a function for comparing cursors
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-24 15:20:44 -07:00
Jason Ekstrand 124f229ece nir/cf: Handle relinking top-level blocks
This can happen if a function ends in a return instruction and you remove
the return.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2016-03-24 15:20:44 -07:00
Jason Ekstrand 364212f1ed nir: Add a pass to repair SSA form
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-24 15:20:44 -07:00
Jason Ekstrand ea98d415e4 nir/vars_to_ssa: Use the new nir_phi_builder helper
The efficiency should be approximately the same.  We do a little more work
per phi node because we have to sort the predecessors.  However, we no
longer have to walk the blocks a second time to pop things off the stack.
The bigger advantage, however, is that we can now re-use the phi placement
and per-block SSA value tracking in other passes.

As a side-benifit, the phi builder actually handles unreachable blocks
correctly.  The original vars_to_ssa code, because of the way it iterated
the blocks and added phi sources, didn't add sources corresponding to
predecessors of unreachable blocks.  The new strategy employed by the phi
builder creates a phi source for each predecessor and should correctly
handle unreachable blocks by setting those sources to SSA undefs.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-24 15:20:44 -07:00
Jason Ekstrand 42ddfc611f nir/dominance: Handle unreachable blocks
Previously, nir_dominance.c didn't properly handle unreachable blocks.
This can happen if, for instance, you have something like this:

loop {
   if (...) {
      break;
   } else {
      break;
   }
}

In this case, the block right after the if statement will be unreachable.
This commit makes two changes to handle this.  First, it removes an assert
and allows block->imm_dom to be null if the block is unreachable.  Second,
it properly skips unreachable blocks in calc_dom_frontier_cb.

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2016-03-24 15:20:44 -07:00
Jason Ekstrand e4dc82cfcf nir: Add a phi node placement helper
Right now, we have phi placement code in two places and there are other
places where it would be nice to be able to do this analysis.  Instead of
repeating it all over the place, this commit adds a helper for placing all
of the needed phi nodes for a value.

v2: Add better documentation

Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
2016-03-24 15:20:44 -07:00
Rob Clark 0bea0e7141 nir: fix dangling ssadef->name ptrs
In many places, the convention is to pass an existing ssadef name ptr
when construction/initializing a new nir_ssa_def.  But that goes badly
(as noticed by garbage in nir_print output) when the original string
gets freed.

Just use ralloc_strdup() instead, and add ralloc_free() in the two
places that would care (not that the strings wouldn't eventually get
freed anyways).

Also fixup the nir_search code which was directly setting ssadef->name
to use the parent instruction as memctx.

Signed-off-by: Rob Clark <robclark@freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2016-03-24 08:30:04 -04:00
Jason Ekstrand a984e44abd nir/glsl: Propagate invariant into NIR alu ops
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-23 16:28:07 -07:00
Jason Ekstrand 91d6272c2b nir/alu_to_scalar: Propagate the "exact" bit
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-23 16:28:06 -07:00
Jason Ekstrand 5f39e3e165 nir/cse: Properly handle nir_ssa_def.exact
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-23 16:28:06 -07:00
Jason Ekstrand 0dbda153aa nir/algebraic: Flag inexact optimizations
Many of our optimizations, while great for cutting shaders down to size,
aren't really precision-safe.  This commit tries to flag all of the
inexact floating-point optimizations so they don't get run on values that
are flagged "exact".  It's a bit conservative and maybe flags some safe
optimizations as unsafe but that's better than missing one.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-23 16:28:02 -07:00
Jason Ekstrand ed3a029e80 nir/algebraic: Fix fmin detection to match the spec
The previous transformation got the arguments to fmin backwards.  When NaNs
are involved, the GLSL min/max aren't commutative so it matters.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-23 16:28:00 -07:00
Jason Ekstrand 89545b1314 nir/algebraic: Get rid of an invlid fxor optimization
The fxor opcode is required to return 1.0f or 0.0f but the input variable
may not be 1.0f or 0.0f.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-23 16:27:58 -07:00
Jason Ekstrand 3a7cb6534c nir/algebraic: Allow for flagging operations as being inexact
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-23 16:27:55 -07:00
Jason Ekstrand a6f25fa7d7 nir/search: Propagate exactness into newly created expressions
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-23 16:27:52 -07:00
Jason Ekstrand ded3133d47 nir/builder: Add a flag for setting exact
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-23 16:26:34 -07:00
Jason Ekstrand 4ff89377d9 nir: Add an "exact" bit to nir_alu_instr
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
2016-03-23 16:26:34 -07:00
Jason Ekstrand f849f53990 nir/clone: Export nir_variable_clone
Reviewed-by: Rob Clark <robclark@gmail.com>
2016-03-23 15:26:11 -07:00
Jason Ekstrand 5fe8959912 nir/clone: Expose nir_constant_clone
Reviewed-by: Rob Clark <robclark@gmail.com>
2016-03-23 15:26:08 -07:00
Jason Ekstrand c4c373f156 nir: Fix whitespace
Reviewed-by: Rob Clark <robclark@gmail.com>
2016-03-23 15:25:53 -07:00
Ian Romanick d7a25a9def nir: Don't abs slt and friends
No shader-db changes, but this is symmetric with the previous commit.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-22 14:48:02 -07:00
Ian Romanick 2bb006af68 nir: Don't abs the result of b2f or b2i
In the results below, 2 SIMD16 shaders in Trine are lost.

G4X
total instructions in shared programs: 4012279 -> 4011108 (-0.03%)
instructions in affected programs: 116776 -> 115605 (-1.00%)
helped: 339
HURT: 0

total cycles in shared programs: 84315862 -> 84313584 (-0.00%)
cycles in affected programs: 1767232 -> 1764954 (-0.13%)
helped: 274
HURT: 81

Ironlake
total instructions in shared programs: 6399073 -> 6396998 (-0.03%)
instructions in affected programs: 218050 -> 215975 (-0.95%)
helped: 600
HURT: 0

total cycles in shared programs: 128892088 -> 128888810 (-0.00%)
cycles in affected programs: 2867452 -> 2864174 (-0.11%)
helped: 422
HURT: 137

Sandy Bridge
total instructions in shared programs: 8462174 -> 8460759 (-0.02%)
instructions in affected programs: 178529 -> 177114 (-0.79%)
helped: 596
HURT: 0

total cycles in shared programs: 117542276 -> 117534098 (-0.01%)
cycles in affected programs: 1239166 -> 1230988 (-0.66%)
helped: 369
HURT: 150

Ivy Bridge
total instructions in shared programs: 7775131 -> 7773410 (-0.02%)
instructions in affected programs: 162903 -> 161182 (-1.06%)
helped: 590
HURT: 0

total cycles in shared programs: 65759882 -> 65747268 (-0.02%)
cycles in affected programs: 1004354 -> 991740 (-1.26%)
helped: 467
HURT: 141

Haswell
total instructions in shared programs: 7107786 -> 7106327 (-0.02%)
instructions in affected programs: 140954 -> 139495 (-1.04%)
helped: 590
HURT: 0

total cycles in shared programs: 64668028 -> 64655322 (-0.02%)
cycles in affected programs: 967080 -> 954374 (-1.31%)
helped: 452
HURT: 149

LOST:   2
GAINED: 0

Broadwell
total instructions in shared programs: 8980029 -> 8978287 (-0.02%)
instructions in affected programs: 197232 -> 195490 (-0.88%)
helped: 715
HURT: 0

total cycles in shared programs: 70070448 -> 70055970 (-0.02%)
cycles in affected programs: 975724 -> 961246 (-1.48%)
helped: 471
HURT: 111

LOST:   2
GAINED: 0

Skylake
total instructions in shared programs: 9115178 -> 9113436 (-0.02%)
instructions in affected programs: 203012 -> 201270 (-0.86%)
helped: 715
HURT: 0

total cycles in shared programs: 68848660 -> 68834004 (-0.02%)
cycles in affected programs: 993888 -> 979232 (-1.47%)
helped: 473
HURT: 116

LOST:   2
GAINED: 0

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-22 14:48:02 -07:00
Ian Romanick 348e5a71d8 nir: Simplify 0 < fabs(a)
Sandy Bridge / Ivy Bridge / Haswell
total instructions in shared programs: 8462180 -> 8462174 (-0.00%)
instructions in affected programs: 564 -> 558 (-1.06%)
helped: 6
HURT: 0

total cycles in shared programs: 117542462 -> 117542276 (-0.00%)
cycles in affected programs: 9768 -> 9582 (-1.90%)
helped: 12
HURT: 0

Broadwell / Skylake
total instructions in shared programs: 8980833 -> 8980826 (-0.00%)
instructions in affected programs: 626 -> 619 (-1.12%)
helped: 7
HURT: 0

total cycles in shared programs: 70077900 -> 70077714 (-0.00%)
cycles in affected programs: 9378 -> 9192 (-1.98%)
helped: 12
HURT: 0

G45 and Ironlake showed no change.

v2: Modify the comments to look more like a proof.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-22 14:47:56 -07:00
Ian Romanick 564a8b8a26 nir: Simplify 0 >= b2f(a)
This also prevented some regressions with other patches in my local
tree.

Broadwell / Skylake
total instructions in shared programs: 8980835 -> 8980833 (-0.00%)
instructions in affected programs: 45 -> 43 (-4.44%)
helped: 1
HURT: 0

total cycles in shared programs: 70077904 -> 70077900 (-0.00%)
cycles in affected programs: 122 -> 118 (-3.28%)
helped: 1
HURT: 0

No changes on earlier platforms.

v2: Modify the comments to look more like a proof.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-22 14:44:57 -07:00
Ian Romanick bf0d60aa11 nir: Simplify i2b with negated or abs operand
This enables removing ssa_201 and ssa_202 in sequences like:

                 vec1 ssa_200 = flt ssa_199, ssa_194
                 vec1 ssa_201 = b2i ssa_200
                 vec1 ssa_202 = i2b -ssa_201

shader-db results:

Sandy Bridge
total instructions in shared programs: 8462257 -> 8462180 (-0.00%)
instructions in affected programs: 3846 -> 3769 (-2.00%)
helped: 35
HURT: 0

total cycles in shared programs: 117542934 -> 117542462 (-0.00%)
cycles in affected programs: 20072 -> 19600 (-2.35%)
helped: 20
HURT: 1

Ivy Bridge
total instructions in shared programs: 7775252 -> 7775137 (-0.00%)
instructions in affected programs: 3645 -> 3530 (-3.16%)
helped: 35
HURT: 0

total cycles in shared programs: 65760522 -> 65760068 (-0.00%)
cycles in affected programs: 21082 -> 20628 (-2.15%)
helped: 25
HURT: 2

Haswell
total instructions in shared programs: 7108666 -> 7108589 (-0.00%)
instructions in affected programs: 3253 -> 3176 (-2.37%)
helped: 35
HURT: 0

total cycles in shared programs: 64675726 -> 64675272 (-0.00%)
cycles in affected programs: 21034 -> 20580 (-2.16%)
helped: 26
HURT: 1

Broadwell / Skylake
total instructions in shared programs: 8980912 -> 8980835 (-0.00%)
instructions in affected programs: 3223 -> 3146 (-2.39%)
helped: 35
HURT: 0

total cycles in shared programs: 70077926 -> 70077904 (-0.00%)
cycles in affected programs: 21886 -> 21864 (-0.10%)
helped: 21
HURT: 6

G45 and Ironlake showed no change.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Suggested-by: Jason Ekstrand <jason.ekstrand@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-22 14:43:28 -07:00
Ian Romanick a4079f1cb2 nir: Lower flrp with Boolean interpolator to bcsel
On Intel platforms that don't set lower_flrp, using bcsel instead of
flrp seems to be a small amount worse.  On those platforms, the use of
flrp, bcsel, and multiply of b2f is still an active area of research.
In review, Matt suggested this is because bcsel turns into CMP+SEL, and
because of the flag register we can't schedule instructions well.

shader-db results:

G4X / Ironlake
total instructions in shared programs: 4016538 -> 4012279 (-0.11%)
instructions in affected programs: 161556 -> 157297 (-2.64%)
helped: 1077
HURT: 1

total cycles in shared programs: 84328296 -> 84315862 (-0.01%)
cycles in affected programs: 4174570 -> 4162136 (-0.30%)
helped: 926
HURT: 53

Unsurprisingly, no changes on later platforms.

Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2016-03-22 14:42:42 -07:00
Connor Abbott 58fe7837b8 nir: propagate bitsize information in nir_search
When we replace an expresion we have to compute bitsize information for the
replacement. We do this in two passes to validate that bitsize information
is consistent and correct: first we propagate bitsize from child nodes to
parent, then we do it the other way around, starting from the original's
instruction destination bitsize.

v2 (Iago):
- Always use nir_type_bool32 instead of nir_type_bool when generating
  algebraic optimizations. Before we used nir_type_bool32 with constants
  and nir_type_bool with variables.
- Fix bool comparisons in nir_search.c to account for bitsized types.

v3 (Sam):
- Unpack the double constant value as unsigned long long (8 bytes) in
nir_algrebraic.py.

v4 (Sam):
- Use helpers to get type size and base type from nir_alu_type.

Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-03-17 11:54:45 +01:00
Connor Abbott 3124ce699b nir: add a bit_size parameter to nir_ssa_dest_init
v2: Squash multiple commits addressing the new parameter in different
    files so we don't break the build (Iago)

v3: Fix tgsi (Samuel)

v4: Fix nir_clone.c (Samuel)

v5: Fix vc4 and freedreno (Iago)

v6 (Sam)
- Fix build errors in nir_lower_indirect_derefs
- Use helper to get type size from nir_alu_type.

Signed-off-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Samuel Iglesias Gonsalvez <siglesias@igalia.com>
Tested-by: Rob Clark <robdclark@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
2016-03-17 11:54:45 +01:00
Iago Toral Quiroga 084b24f558 nir: rename nir_const_value fields to include bitsize information
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
2016-03-17 11:16:33 +01:00