81585 Commits

Author SHA1 Message Date
Jason Ekstrand ca3b4d6d17 nir/opt_peephole_ffma: Fix a couple typos in a comment
Acked-by: Matt Turner <mattst88@gmail.com>
2015-04-02 11:09:37 -07:00
Ilia Mirkin 4609ba6ea3 mesa: add ARB_depth_buffer_float to ES3.0 required extension list
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-02 13:35:18 -04:00
Eric Anholt a9152376b4 vc4: Add support for nir_iabs.
Tested using the GLSL 1.30 tests for integer abs().  Not currently used,
but it was one of the new opcodes used by robclark's idiv lowering.
2015-04-02 10:32:35 -07:00
Jason Ekstrand e50cf5faa5 i965/generator: Get rid of the ! in the unreachable statement
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
2015-04-02 10:21:18 -07:00
Jason Ekstrand 0573d0e484 nir/print: Correctly print swizzles for explicitly sized alu sources
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-04-02 10:21:18 -07:00
Ilia Mirkin 4a3c0e9950 freedreno/a3xx: add MRT support
The hardware only supports 4 MRTs. It should be possible to emulate
support for 8, but doesn't seem worth the trouble.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-02 00:09:14 -04:00
Ilia Mirkin 6f4c1976f4 freedreno: convert blit program to array for each number of rts
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-02 00:09:14 -04:00
Ilia Mirkin d9992ab35a freedreno: add support for laying out MRTs in gmem
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-02 00:09:14 -04:00
Ilia Mirkin 602bc6c88d freedreno: add core infrastructure support for MRTs
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-02 00:09:14 -04:00
Ilia Mirkin d13803c76f freedreno/ir3: add support for FS_COLOR0_WRITES_ALL_CBUFS property
This will enable the driver to tell which regids to link up to which
MRT outputs.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-02 00:09:14 -04:00
Ilia Mirkin f27ec59084 freedreno/a3xx: add independent blend function support
This is needed for MRT support

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-02 00:09:14 -04:00
Ilia Mirkin 8efa3e340d freedreno: remove alpha key from ir3_shader
This complication is unnecessary and makes MRTs more complicated and
likely to generate tons of variants.

Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-02 00:09:14 -04:00
Stéphane Marchesin 70eed78cac i915g: Implement EGL_EXT_image_dma_buf_import
This adds all the plumbing to get EGL_EXT_image_dma_buf_import in
i915g.

Signed-off-by: Stéphane Marchesin <marcheu@chromium.org>
2015-04-01 20:13:37 -07:00
Matt Turner a03d0ba78f i965/fs: Relax type check in cmod propagation.
The thing we want to avoid is int/float comparisons, but int/unsigned
comparisons with 0 are equivalent.

total instructions in shared programs: 6194829 -> 6193996 (-0.01%)
instructions in affected programs:     117192 -> 116359 (-0.71%)
helped:                                471

Reviewed-by: Jason Ekstrand <jason.ekstrand@intel.com>
2015-04-01 13:43:57 -07:00
Matt Turner 781badee7a nir: Remove useless ftrunc inside f2i/f2u.
No shader-db changes, probably because they're all removed by the GLSL
compiler optimization added in commit 69ad5fd4.

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-04-01 13:43:57 -07:00
Matt Turner 97e6c1b957 nir: Recognize (a < b || a < c) as a < max(b, c).
Doesn't work for analogous && cases, because of NaNs.

total instructions in shared programs: 6195712 -> 6194829 (-0.01%)
instructions in affected programs:     42000 -> 41117 (-2.10%)
helped:                                403

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-04-01 13:43:57 -07:00
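
The NaN caveat above can be checked with a small sketch. This is illustrative Python, not the NIR pattern itself. The helpers `fmax` and `fmin` are hypothetical and assume C-style semantics, where the non-NaN operand is returned when exactly one operand is NaN.

```python
import math
from itertools import product


def fmax(x, y):
    # Assumption: like C's fmax, return the non-NaN operand if exactly one is NaN.
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return max(x, y)


def fmin(x, y):
    # Assumption: mirror of fmax above.
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return min(x, y)


def or_form(a, b, c):
    return a < b or a < c


def max_form(a, b, c):
    return a < fmax(b, c)


def and_form(a, b, c):
    return a < b and a < c


def min_form(a, b, c):
    return a < fmin(b, c)


SAMPLES = [-1.0, 0.0, 1.0, -math.inf, math.inf, math.nan]

# Every (a, b, c) triple where the rewritten form disagrees with the original.
or_mismatches = [t for t in product(SAMPLES, repeat=3)
                 if or_form(*t) != max_form(*t)]
and_mismatches = [t for t in product(SAMPLES, repeat=3)
                  if and_form(*t) != min_form(*t)]
```

Under these assumptions the `||` rewrite never disagrees on the sample values, while the `&&` analogue does whenever exactly one of `b`, `c` is NaN and `a` is below the other one: `a < NaN` is false, so the `&&` is false, but `fmin` discards the NaN and the rewritten comparison comes out true.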
Matt Turner a2b6e908cf nir: Add addition/multiplication identities of exp/log.
instructions in affected programs:     2858 -> 2808 (-1.75%)
helped:                                12

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-04-01 13:43:57 -07:00
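
As a rough illustration of the kind of identity involved, the sketch below checks the textbook rules exp(a) * exp(b) = exp(a + b) and log(a) + log(b) = log(a * b) in Python. The exact set of patterns added by the commit is not shown in this log, so the function names here are hypothetical and the choice of identities is an assumption.

```python
import math


def exp_product(a, b):
    # Two exponentials and a multiply.
    return math.exp(a) * math.exp(b)


def exp_of_sum(a, b):
    # One add and one exponential.
    return math.exp(a + b)


def log_sum(a, b):
    # Two logarithms and an add (a and b must be positive).
    return math.log(a) + math.log(b)


def log_of_product(a, b):
    # One multiply and one logarithm.
    return math.log(a * b)
```

Each rewritten form uses one fewer transcendental operation. The two sides can differ in the last few bits of a floating-point result, so comparisons should use a tolerance, for example `math.isclose(exp_product(0.5, 1.25), exp_of_sum(0.5, 1.25))`.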
Matt Turner 099c729b4c nir: Add identities for the log function.
The rcp(log(x)) pattern affects instruction counts.

instructions in affected programs:     144 -> 138 (-4.17%)
helped:                                6

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-04-01 13:43:57 -07:00
Matt Turner 8a6ae384b2 nir: Add identities for the exponential function.
No changes in shader-db.

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-04-01 13:43:57 -07:00
Matt Turner e26783d445 nir: Recognize another open coded lrp.
total instructions in shared programs: 6195924 -> 6195768 (-0.00%)
instructions in affected programs:     4876 -> 4720 (-3.20%)
helped:                                58
HURT:                                  10

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-04-01 13:43:57 -07:00
Matt Turner e82437e141 nir: Recognize open coded lrp.
total instructions in shared programs: 6197614 -> 6195924 (-0.03%)
instructions in affected programs:     34773 -> 33083 (-4.86%)
helped:                                147
HURT:                                  6

Reviewed-by: Eric Anholt <eric@anholt.net>
2015-04-01 13:43:57 -07:00
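
For readers unfamiliar with the term, "open coded lrp" means a linear interpolation spelled out as separate multiplies and adds. A minimal Python sketch, assuming the GLSL `mix` definition x * (1 - t) + y * t as the reference; the specific expression trees matched by the commit are not shown in this log, and the function names are hypothetical.

```python
def lrp(x, y, t):
    # Reference definition, same as GLSL mix(x, y, t).
    return x * (1.0 - t) + y * t


def open_coded_lrp(x, y, t):
    # An algebraically equivalent spelling a shader author might write by hand.
    return x + t * (y - x)
```

With inputs that are exact in binary floating point, such as `lrp(2.0, 10.0, 0.25)` and `open_coded_lrp(2.0, 10.0, 0.25)`, both forms give 4.0. In general the two spellings may round differently.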
Kenneth Graunke 25e214db00 nir: Use _mesa_flsll(InputsRead) in prog->nir.
InputsRead is a 64-bit bitfield.  Using _mesa_fls would silently
truncate off the high bits, claiming inputs 32..56 (VARYING_SLOT_MAX)
were never read.

Using <= here was a hack I threw in at the last minute to fix programs
which happened to use input slot 32.  Switch back to using < now that
the underlying problem is fixed.

Fixes crashes in "Euro Truck Simulator 2" when using prog->nir, which
uses input slot 33.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-01 13:30:13 -07:00
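
The truncation described above can be reproduced with a short Python sketch. The helpers `fls32` and `fls64` are hypothetical stand-ins that assume the usual fls convention (1-based index of the highest set bit, 0 when no bit is set); they are not Mesa's implementation.

```python
def fls32(value):
    # Find-last-set after truncating to 32 bits, as a 32-bit helper would see it.
    return (value & 0xFFFFFFFF).bit_length()


def fls64(value):
    # Find-last-set over the full 64-bit value.
    return (value & 0xFFFFFFFFFFFFFFFF).bit_length()


def slots_visited(bitfield, limit):
    # Slots a loop of the form "for (i = 0; i < limit; i++)" would see as read.
    return [slot for slot in range(limit) if bitfield & (1 << slot)]


# A program that reads input slots 0 and 33.
inputs_read = (1 << 0) | (1 << 33)

seen_with_fls32 = slots_visited(inputs_read, fls32(inputs_read))
seen_with_fls64 = slots_visited(inputs_read, fls64(inputs_read))
```

With the 32-bit helper the loop bound is 1, so slot 33 is never visited; with the 64-bit helper the bound is 34 and both slots are found using a plain `<` comparison.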
Kenneth Graunke 3d166b313d mesa: Implement _mesa_flsll().
This is _mesa_fls() for 64-bit values.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-01 13:30:13 -07:00
Kenneth Graunke 4b38c5c783 nir: In prog->nir, don't wrap dot products with ptn_channel(..., X).
ptn_move_dest and nir_fadd already take care of replicating the last
channel out, so we can just use a scalar and skip splatting it.

Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eric Anholt <eric@anholt.net>
2015-04-01 13:30:13 -07:00
Jason Ekstrand 218e45e2f7 i965: Use the same nir options for all gens
If we tell NIR to split ffma's, then we don't need separate options
anymore.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-01 12:51:04 -07:00
Jason Ekstrand b9d7454571 i965/nir: Run DCE again before going out of SSA
We run lowering and optimization passes that might leave garbage lying
around. This keeps the FS cse from having to clean it up.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-01 12:51:04 -07:00
Jason Ekstrand 37703040a1 i965/nir: Run the ffma peephole after the rest of the optimizations
The idea here is that fusing multiply-add combinations too early can reduce
our ability to perform CSE and value-numbering.  Instead, we split ffma
opcodes up-front, hope CSE cleans up, and then fuse after-the-fact.
Unless an algebraic pass does something silly where it inserts something
between the multiply and the add, splitting and re-fusing should never
cause a problem.  We run the late algebraic optimizations after this so
that things like compare-with-zero don't hurt our ability to fuse things.

shader-db results for fragment shaders on Haswell:
total instructions in shared programs: 4390538 -> 4379236 (-0.26%)
instructions in affected programs:     989359 -> 978057 (-1.14%)
helped:                                5308
HURT:                                  97
GAINED:                                78
LOST:                                  5

This does, unfortunately, cause some substantial hurt to a shader in Kerbal
Space Program.  However, the damage is caused by changing a single
instruction from a ffma to an add.  This, in turn, *decreases* register
pressure in one part of the program causing it to fail to register allocate
and spill.  Given the overwhelmingly positive results in other shaders and
the fact that the NIR for the Kerbal shaders is actually better, this
should be considered a positive.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-01 12:51:04 -07:00
Jason Ekstrand 7f344721b1 nir/peephole_ffma: Be less aggressive about fusing multiply-adds
shader-db results for fragment shaders on Haswell:
total instructions in shared programs: 4395688 -> 4389623 (-0.14%)
instructions in affected programs:     355876 -> 349811 (-1.70%)
helped:                                1455
HURT:                                  14
GAINED:                                5
LOST:                                  0

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-01 12:51:04 -07:00
Jason Ekstrand a8c8b3b872 nir: Add a dedicated ffma peephole optimization
i965/nir: Use the dedicated ffma peephole

total instructions in shared programs: 4418748 -> 4394618 (-0.55%)
instructions in affected programs:     1292790 -> 1268660 (-1.87%)
helped:                                5999
HURT:                                  457
GAINED:                                4
LOST:                                  9

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-01 12:51:04 -07:00
Jason Ekstrand e06a3d0282 nir: Move the compare-with-zero optimizations to the late section
total instructions in shared programs: 4422307 -> 4422363 (0.00%)
instructions in affected programs:     4230 -> 4286 (1.32%)
helped:                                0
HURT:                                  12

While this does hurt some things, the losses are minor and it prevents the
compare-with-zero optimization from fighting with ffma which is much more
important.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-01 12:51:03 -07:00
Jason Ekstrand da294f9b2f nir/algebraic: Add a separate section for "late" optimizations
i965/nir: Use the late optimizations

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-01 12:51:03 -07:00
Jason Ekstrand 1779dc060f nir/algebraic: Remove a duplicate optimization
This optimization is repeated verbatim above

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-01 12:51:03 -07:00
Jason Ekstrand 22ee7eeb4e nir/algebraic: #define around structure definitions
Previously, we couldn't generate two algebraic passes in the same file
because of multiple structure definitions.  To solve this, we play the
age-old header file trick and just #define around it.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-01 12:51:03 -07:00
Jason Ekstrand 793a94d6b5 nir/print: Don't print extra swizzle components
Previously, NIR would just print 4 swizzle components if the swizzle was
anything other than foo.xyzw.  This creates lots of noise if, for example,
you have a one-component element with a swizzle of foo.xxxx.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2015-04-01 12:49:49 -07:00
Emil Velikov d99135b2e9 configure: nuke --with-max-{width,height}
Unused as of commit 630ab0d27ba (mesa: remove last of MAX_WIDTH,
MAX_HEIGHT). Update all the remaining references to the defines.

v2: Use the correct variable name in the comments

Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Brian Paul <brianp@vmware.com>
2015-04-01 19:43:34 +00:00
Emil Velikov bd4925c6ac gallium: ship tgsi_to_nir.h in the tarball
Acked-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
2015-04-01 19:33:37 +00:00
Emil Velikov 4008975e6f configure.ac: error out if python/mako is not found when required
In case of using a distribution tarball (or a dirty git tree) one can
have the generated sources locally. Make configure.ac error out
otherwise, to alert the user about the unmet requirement(s) of python/mako.

v2: Check only for a single file for each dependency.

Suggested-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Emil Velikov <emil.l.velikov@gmail.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
2015-04-01 19:33:37 +00:00
Matt Turner 3384179faa glsl: Make sure not to dereference NULL.
Found by Coverity.
2015-04-01 12:25:29 -07:00
Laura Ekstrand 142909f19d main: create_buffers unlocks mutex when throwing OUT_OF_MEMORY.
Ilia Mirkin found that I had forgotten to free the mutex in the error case.

Reviewed-by: Ilia Mirkin <imirkin@alum.mit.edu>
2015-04-01 12:07:28 -07:00
Jose Fonseca 3321724c10 automake,scons: Put NIR source files in a separate var to fix SCons build.
SCons does not build NIR yet.

Trivial.
2015-04-01 19:49:09 +01:00
Jose Fonseca 7f0682cebf automake: Fix out-of-source builds.
Add include path for generated nir_opcodes.h.

Trivial.
2015-04-01 19:48:09 +01:00
Brian Paul 1625d7a87a mesa: don't include colormac.h in format code
Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
2015-04-01 12:04:28 -06:00
Brian Paul 2768a0b1b4 mesa: remove unneeded #include of colormac.h
Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
2015-04-01 12:04:28 -06:00
Brian Paul f1d55017d7 tnl: remove unneeded #include of colormac.h
Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
2015-04-01 12:04:28 -06:00
Brian Paul 8ac9407a83 swrast: remove unneeded #include of colormac.h
Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
2015-04-01 12:04:28 -06:00
Brian Paul 2ad8af1a0c mesa: remove unused macros from colormac.h
Acked-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
2015-04-01 12:04:28 -06:00
Eric Anholt 15b03b7964 nir: Recognize a pattern of bool frobbing from TGSI KILL_IF.
TGSI's conditional discards take a float arg and negate it, so GLSL to TGSI
generates a b2f and negates that value.  Only, in NIR we want a proper
bool once again, so we compare with 0.  This is a lot of pointless extra
instructions.

total instructions in shared programs: 39735 -> 39702 (-0.08%)
instructions in affected programs:     1342 -> 1309 (-2.46%)

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-04-01 10:57:01 -07:00
Eric Anholt 6e8d4a2f80 nir: Recognize a pattern for doing b2f without the opcode.
Since we have patterns based on b2f, generate them if we see the b2f
equivalent using an iand.  This is common when generating NIR from TGSI.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2015-04-01 10:57:01 -07:00
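
The "b2f equivalent using an iand" can be illustrated with a bit-level Python sketch. It assumes booleans are 32-bit values that are either all zeros or all ones, and that the mask is the IEEE 754 bit pattern of 1.0f; the constant and function names are hypothetical.

```python
import struct

TRUE_BITS = 0xFFFFFFFF     # assumption: boolean true is all ones
FALSE_BITS = 0x00000000    # assumption: boolean false is all zeros
ONE_F32_BITS = 0x3F800000  # bit pattern of 1.0 as a 32-bit float


def bits_to_float(bits):
    # Reinterpret a 32-bit integer pattern as a single-precision float.
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def b2f(b):
    # What a dedicated bool-to-float opcode computes.
    return 1.0 if b else 0.0


def b2f_via_iand(b):
    # The same result spelled as a bitwise AND with the pattern of 1.0f.
    return bits_to_float(b & ONE_F32_BITS)
```

Because the AND keeps either every bit of 1.0f or none of them, `b2f_via_iand` yields 1.0 for `TRUE_BITS` and 0.0 for `FALSE_BITS`, matching `b2f`. Rewriting the AND into an explicit b2f lets the existing b2f-based patterns apply.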
Eric Anholt 26261bca21 vc4: Add shader-db dumping of NIR instruction count.
I was previously using temporary disables of VC4 optimization to show the
benefits of improved NIR optimization, but this can get me quick and dirty
numbers for NIR-only improvements without having to add hacks to disable
VC4's code (disabling of which might hide ways that the NIR changes would
hurt actual VC4 codegen).
2015-04-01 10:57:01 -07:00
Eric Anholt 73e2d4837d vc4: Convert to consuming NIR.
NIR brings us better optimization than I would have bothered to write
within the driver, developers sharing future optimization work, and the
ability to share device-specific lowering code that we and other
GLES2-level drivers need.

total uniforms in shared programs: 13421 -> 13422 (0.01%)
uniforms in affected programs:     62 -> 63 (1.61%)
total instructions in shared programs: 39961 -> 39707 (-0.64%)
instructions in affected programs:     15494 -> 15240 (-1.64%)

v2: Add missing imov support, and assert that there are no dest saturates.
v3: Rebase on the target-specific algebraic series.
v4: Rebase on gallium-includes-from-NIR changes in master.
v5: Rebase on variables being in lists instead of hash tables.
v6: Squash in intermediate changes that used the NIR-to-TGSI pass (which
    I'm not committing)
2015-04-01 10:57:01 -07:00