Commit Graph

3786 Commits

Author SHA1 Message Date
Neil Roberts 34d4b3e367 glsl: Set default precision on record members
Record types have their own slot to store the precision for each
member in glsl_struct_field. Previously if the member didn’t have an
explicit precision qualifier this was being left as
GLSL_PRECISION_NONE. This patch makes it take into account the type’s
default precision qualifier like it does for regular variables in
apply_type_qualifier_to_variable.

This has the additional benefit of correctly reporting an error when a
float type is used in a struct without declaring the default type.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-14 09:29:53 +02:00
Neil Roberts 235425771c glsl/linker: Make precision matching optional in intrastage_match
This function is confusingly also used to match interstage interfaces
as well as intrastage. In the interstage case it needs to avoid
comparing the precisions. This patch adds a parameter to specify
whether to take the precision into account or not so that it can be
used for both cases.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-14 09:29:53 +02:00
Neil Roberts 19b27a8569 glsl/linker: Don’t check precision for shader interface
On GLES, the interface between vertex and fragment shaders doesn’t
need to have matching precision.

Section 4.3.10 of the GLSL ES 3.00 spec:

“The type of vertex outputs and fragment inputs with the same name
 must match, otherwise the link command will fail. The precision does
 not need to match.”

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-14 09:29:53 +02:00
Neil Roberts 230d1e8d86 compiler/types: Making comparing record precision optional
On GLES, the interface between vertex and fragment shaders doesn’t
need to have matching precision. This adds an extra argument to
glsl_types::record_compare to disable the precision comparison. This
will later be used for the shader interface check.

In order to make this work this patch also adds a helper function to
recursively compare types while ignoring the precision.

v2: Call record_compare from within compare_no_precision to avoid
    duplicating code (Eric Anholt).

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-14 09:29:53 +02:00
Iago Toral Quiroga 2a2501247b nir: detect more dynamically uniform expressions
Shader-db results for v3d:

total instructions in shared programs: 9132728 -> 9119238 (-0.15%)
instructions in affected programs: 596886 -> 583396 (-2.26%)
helped: 1118
HURT: 224

total threads in shared programs: 234298 -> 234308 (<.01%)
threads in affected programs: 10 -> 20 (100.00%)
helped: 5
HURT: 0

total uniforms in shared programs: 3022949 -> 3022622 (-0.01%)
uniforms in affected programs: 29163 -> 28836 (-1.12%)
helped: 108
HURT: 37

total max-temps in shared programs: 1328030 -> 1327762 (-0.02%)
max-temps in affected programs: 10097 -> 9829 (-2.65%)
helped: 263
HURT: 15

total spills in shared programs: 3793 -> 3777 (-0.42%)
spills in affected programs: 432 -> 416 (-3.70%)
helped: 16
HURT: 0

total fills in shared programs: 4380 -> 4266 (-2.60%)
fills in affected programs: 828 -> 714 (-13.77%)
helped: 16
HURT: 0

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-14 08:00:52 +02:00
Connor Abbott 37b92b0ae6 nir: Don't manually index intrinsic index enum
This fixes a rebase fail in ea51275e07, and prevents it from happening
again. There's no reason to do this manually.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-13 17:10:41 +02:00
Daniel Schürmann 7a858f274c spirv/nir: add support for AMD_shader_ballot and Groups capability
This commit also renames existing AMD capabilities:
 - gcn_shader -> amd_gcn_shader
 - trinary_minmax -> amd_trinary_minmax

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-13 12:44:23 +00:00
Daniel Schürmann ea51275e07 nir: add intrinsics for AMD_shader_ballot
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-13 12:44:23 +00:00
Daniel Schürmann 1b89ebeede nir/spirv: add support for the SubgroupBallotKHR SPIR-V capability
This capability is required for the VK_EXT_shader_subgroup_ballot extension.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-13 12:44:23 +00:00
Daniel Schürmann de56ebadce nir/spirv: add support for the SubgroupVoteKHR SPIR-V capability
This capability is required for the VK_EXT_shader_subgroup_vote extension.

Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
2019-06-13 12:44:23 +00:00
Caio Marcelo de Oliveira Filho 2cb5907508 glsl: Check order and uniqueness of interlock functions
With this commit all remaining compilation tests in Piglit for
ARB_fragment_shader_interlock will pass.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
2019-06-10 14:29:32 -07:00
Caio Marcelo de Oliveira Filho b7c9fc72fd glsl: Make interlock builtins follow same compiler rules as barriers
Generalize the barrier code to provide correct error messages for
other builtins.

Fixes most of piglit compilation tests for
ARB_fragment_shader_interlock.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
2019-06-10 14:29:26 -07:00
Eduardo Lima Mitev fb2169040a nir/opt_algebraic: Fix rules for imadsh_mix16
The rules added in patch 3addd7c are inverted:

It should be:

(al * bh) << 16 + c

instead of:

(ah * bl) << 16 + c

Fixes a number of regressions under
dEQP-GLES31.functional.draw_indirect.compute_interop.large.*
on Freedreno.

Reviewed-by: Rob Clark <robdclark@gmail.com>
2019-06-10 22:27:46 +02:00
Eric Engestrom 440fe0eb43 nir: fix s/&&/||/ typo
Fixes: cd73b6174b "nir/lower_to_source_mods: Stop turning add, sat, and neg into mov"
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-07 16:06:25 +01:00
Eduardo Lima Mitev 3addd7c8d9 nir_algebraic: Add basic optimizations for umul_low and imadsh_mix16
For umul_low (al * bl), zero is returned if the low 16-bits word of either
source is zero.

for imadsh_mix16 (ah * bl << 16 + c), c is returned if either 'ah' or 'bl'
is zero.

A couple of nir_search_helpers are added:

is_upper_half_zero() returns true if the highest word of all components of
an integer NIR alu src are zero.

is_lower_half_zero() returns true if the lowest word of all components of
an integer nir alu src are zero.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-07 08:45:05 +02:00
Eduardo Lima Mitev c27b3758fa nir/opcodes: Add new 'umul_low' and 'imadsh_mix16' opcodes
'umul_low' is the low 32-bits of unsigned integer multiply. It maps
directly to ir3's MULL_U.

'imadsh_mix16' is multiply add with shift and mix, an ir3 specific
instruction that maps directly to ir3's IMADSH_M16.

Both are necessary for the lowering of integer multiplication on
Freedreno, which will be introduced later in this series.

Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-07 08:45:05 +02:00
Jason Ekstrand 7a18ce0b91 glsl/loop_analysis: Don't search for NULL variables in the hash table
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
2019-06-06 00:27:53 +00:00
Jason Ekstrand d96878a66a nir/propagate_invariant: Don't add NULL vars to the hash table
Fixes: 8410cf66d "nir/propagate_invariant: Skip unknown vars"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
2019-06-06 00:27:53 +00:00
Kenneth Graunke c7d1b52a2c nir: Combine lower_fmod16/32 back into a single lower_fmod.
We originally had a single lower_fmod option.  In commit 2ab2d2e5, Sam
split 32 and 64-bit lowering into separate flags, with the rationale
that some drivers might want different options there.  This left 16-bit
unhandled, so Iago added a lower_fmod16 option in commit ca31df6f.

Now that lower_fmod64 is gone (in favor of nir_lower_doubles and
nir_lower_dmod), we re-combine lower_fmod16 and lower_fmod32 into a
single lower_fmod flag again.  I'm not aware of any hardware which
need lowering for one bitsize and not the other.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-05 16:45:12 -07:00
Kenneth Graunke edd45af9ba nir: Drop lower_fmod64 option.
nir_lower_doubles offers a wide variety of fp64 lowering, including
lowering fmod@64.  The version there also better handles imprecisions
due to lowered frcp@64.  Let's consolidate on one version.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2019-06-05 16:45:12 -07:00
Jason Ekstrand fe2fc30cb5 nir: Don't replace the nir_shader when NIR_TEST_SERIALIZE=1
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108957
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-06-05 20:07:28 +00:00
Jason Ekstrand 9eba6d9a88 nir: Don't replace the nir_shader when NIR_TEST_CLONE=1
Instead, we add a new helper which stomps one nir_shader and replaces it
with another.  The new helper effectively just changes which pointer
gets used for the base nir_shader.  It should be 99% as good at testing
cloning but without requiring that everything handle having the shader
swapped out from under it constantly.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108957
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Rob Clark <robdclark@chromium.org>
2019-06-05 20:07:28 +00:00
Alyssa Rosenzweig d2d3cc66cf nir/algebraic: Simplify max(abs(a), 0.0) -> abs(a)
This pattern was noticed in glmark's jellyfish scene.

v2: Add inexact qualifier due to NaN behaviour.

Minimal shader-db changes (slightly helped).

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Elie Tournier <tournier.elie@gmail.com>
2019-06-04 19:57:19 +00:00
Caio Marcelo de Oliveira Filho d482a8f680 spirv: Update the OpenCL.std.h header
This corresponds to commit 8b911bd2ba37677037b38c9bd286c7c05701bcda on
GitHub.

We previously tweaked OpenCL.std.h from upstream to be included in C
code.  Now upstream header can be included, however the symbol names
are slightly different (include an OpenCLstd_ prefix), so this patch
also fixes vtn_opencl.c to use those.

Reviewed-by: Karol Herbst <kherbst@redhat.com>
2019-06-04 12:12:51 -07:00
Jason Ekstrand 5176805471 spirv: Implement SPV_EXT_fragment_shader_interlock
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-06-04 17:30:51 +00:00
Jason Ekstrand b5aa76b1df spirv: Update the headers from latest Khronos master
This corresponds to 8b911bd2ba37677037b38c9bd286c7c05701bcda in
https://github.com/KhronosGroup/SPIRV-Headers.

Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-06-04 17:30:51 +00:00
Caio Marcelo de Oliveira Filho 61de825e11 spirv: Like Uniform, do nothing for UniformId
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-03 17:20:54 -07:00
Caio Marcelo de Oliveira Filho b4eff83180 spirv: Implement SpvOpCopyLogical
This is the same as SpvOpCopyObject but without the type checking,
which is how vtn_composite_copy works, so we just need to hook the
operation.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-03 17:20:54 -07:00
Caio Marcelo de Oliveira Filho 81586e9f53 spirv: Generalize OpSelect
SPIR-V 1.4 supports OpSelect over any composite type, and also allows
scalar boolean condition for vector types -- a case which we already
handled to support old GLSLang.

Added a helper function to recursively perform nir_bcsel, that makes
easier to support structs.

v2: Replace asserts() with vtn_fail_if().  (Jason)

v3: Simplify Condition and Result types verifications.  (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-03 17:20:54 -07:00
Caio Marcelo de Oliveira Filho 17630291e5 spirv: Move OpSelect handling to a function
This will make a later change easier to review.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-03 17:20:54 -07:00
Caio Marcelo de Oliveira Filho ea0e89859c nir/vars_to_ssa: Handle UNDEF_NODE in more places
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110832
Fixes: 911ea2c66f "nir/vars_to_ssa: Use a non-null UNDEF_NODE pointer"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-03 17:09:22 -07:00
Caio Marcelo de Oliveira Filho 1f8546ba2f spirv: Implement OpPtrEqual, OpPtrNotEqual and OpPtrDiff
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-03 13:45:09 -07:00
Caio Marcelo de Oliveira Filho ca164ab495 nir: Add functions to subtract and compare addresses
v2: Fix comparing addresses from formats that have more than one
    component by using nir_ball_iequal().  (Jason)

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-03 13:45:09 -07:00
Caio Marcelo de Oliveira Filho 09cc3389b9 nir: Add nir_ball_iequal() helper
Similar to nir_bany_inequal().  Suggested by Jason.

Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-06-03 13:45:09 -07:00
Jonathan Marek 91672becc3 nir: copy intrinsic type when lowering load input/uniform and store output
Fixes: c1275052 "nir: add type information to load uniform/input and store output intrinsics"

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Erico Nunes <nunes.erico@gmail.com>
Tested-by: Erico Nunes <nunes.erico@gmail.com>
Tested-by: Andreas Baierl <ichgeh@imkreisrum.de>
2019-06-03 12:46:14 +00:00
Caio Marcelo de Oliveira Filho 75590604a9 nir: Return nir_type_invalid for non-numeric base types
Now that the type gathering function look at instructions that might
have other types, return invalid type instead of crashing.  That
invalid will be properly ignored later.

Fixes: c12750527b "nir: add type information to load uniform/input and store output intrinsics"
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 16:27:03 -07:00
Jonathan Marek f387c2b238 nir: remove bool lowering from lower_int_to_float
Removes the bool_to_float logic from the int_to_float pass, so that both
can be used separately. By having separate passes we have better validation
and it makes it possible to use with the lower_ftrunc option (int lowering
generates ftrunc, but lower_ftrunc generates bools, ftrunc lowering should
probably be reworked). For now we always expect lower_bool to come after
lower_int.

Also fixes f2i32 to become ftrunc and adds u2f/f2u cases.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 21:35:26 +00:00
Jonathan Marek f6579ee204 nir: fix lower_{int,bool}_to_float for new mov opcode
It is treated like the vecN instructions which also have no type.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 21:35:26 +00:00
Jonathan Marek f889180ee1 nir: add lower_bitshift option
Add a "lower_bitshift" option, which disables optimizations introducing
bitshifts and lowers ishl by constant to a multiply, so that we don't have
to deal with bitshifts in int_to_float lowering.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 21:35:26 +00:00
Jonathan Marek 887c2a6092 nir: fix gather_ssa_types
Consts and undefs can be used as different types (common with "0" constant)
so don't copy types from consts/undefs, only to them. It doesn't entirely
solve the problem that the type given to the const could be wrong , but
now the only realistic case is with "0" which is the same when casted to
float, so it doesn't matter for lower_int_to_float.

The other change is to get type information for load input/uniform and
store output, and use that to get correct results.

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 21:35:26 +00:00
Jonathan Marek c12750527b nir: add type information to load uniform/input and store output intrinsics
This type information will be used by gather_ssa_types to get usable results

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 21:35:26 +00:00
Jonathan Marek 6016df211f nir: improvements to native_integers removal
Improvements related to the patch that removed native_integers:
 * In glsl_to_nir, special cases for i2f,u2f,etc are no longer needed
 * In prog_to_nir, use sge/slt and let lower_scmp lower it if needed

Signed-off-by: Jonathan Marek <jonathan@marek.ca>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 21:35:26 +00:00
Connor Abbott 3bd0733011 nir/instr_set: Use _mesa_set_search_or_add()
Before this change, we were searching for each instruction twice, once
when checking if it exists and once when figuring out where to insert
it. By using the new function, we can do everything we need to do in one
operation.

Compilation time numbers for my shader-db database:

Difference at 95.0% confidence
	-4.04706 +/- 0.669508
	-0.922142% +/- 0.151948%
	(Student's t, pooled s = 0.95824)

Reviewed-by: Eric Anholt <eric@anholt.net>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
2019-05-31 19:13:59 +02:00
Ian Romanick 3ee2e84c60 nir: Rematerialize compare instructions
On some architectures, Boolean values used to control conditional
branches or condtional selection must be propagated into a flag.  This
generally means that a stored Boolean value must be compared with zero.
Rather than force the generation of extra compares with zero, re-emit
the original comparison instruction.  This can save register pressure by
not needing to store the Boolean value.

There are several possible ares for future improvement to this pass:

1. Be more conservative.  If both sources to the comparison instruction
are non-constants, it may be better for register pressure to emit the
extra compare.  The current shader-db results on Intel GPUs (next
commit) lead me to believe that this is not currently a problem.

2. Be less conservative.  Currently the pass requires that all users of
the comparison match the pattern.  The idea is that after the pass is
complete, no instruction will use the resulting Boolean value.  The only
uses will be of the flag value.  It may be beneficial to relax this
requirement in some cases.

3. Be less conservative.  Also try to rematerialize comparisons used for
discard_if intrinsics.  After changing the way the Intel compiler
generates cod e for discard_if (see MR!935), I tried implementing this
already.  The changes were pretty small.  Instructions were helped in 19
shaders, but, overall, cycles were hurt.  A commit "nir: Rematerialize
comparisons for nir_intrinsic_discard_if too" is on my fd.o cgit.

4. Copy the preceeding ALU instruction.  If the comparison is a
comparison with zero, and it is the only user of a particular ALU
instruction (e.g., (a+b) != 0.0), it may be a further improvment to also
copy the preceeding ALU instruction.  On Intel GPUs, this may enable
cmod propagation to make additional progress.

v2: Use much simpler method to get the prev_block for an if-statement.
Suggested by Tim.

Reviewed-by: Matt Turner <mattst88@gmail.com>
2019-05-31 08:47:03 -07:00
Ian Romanick 336eab0630 nir: Add a shallow clone function for nir_alu_instr
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Suggested-by: Matt Turner <mattst88@gmail.com>
2019-05-31 08:47:03 -07:00
Bas Nieuwenhuizen e24a7840f6 nir: Actually propagate progress in nir_opt_move_load_ubo.
Found with Jasons new metadata rework (https://gitlab.freedesktop.org/mesa/mesa/merge_requests/950).

Fixes: af355aaa07 "nir: add nir_opt_move_load_ubo() optimization pass"
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
2019-05-31 07:45:43 +00:00
Jason Ekstrand f1cb3348f1 nir/split_vars: Properly bail in the presence of complex derefs
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-31 01:08:03 +00:00
Jason Ekstrand cc59503b16 nir/vars_to_ssa: Properly ignore variables with complex derefs
Because the core principle of the vars_to_ssa pass is that it globally
(within a function) looks at all of the uses of a never-indirected path
and does a full into-SSA on that path, it can't handle a path which has
any chance of having aliasing.  If a function_temp variable has a cast
or anything else which may cause aliasing, we have to assume that all
paths to that variable may alias and ignore the entire variable.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-31 01:08:03 +00:00
Jason Ekstrand 911ea2c66f nir/vars_to_ssa: Use a non-null UNDEF_NODE pointer
We're about to change the meaning of get_deref_node returning NULL so we
need a non-NULL value to mean properly undefined.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-31 01:08:03 +00:00
Jason Ekstrand e84194686d nir/deref: Add a has_complex_use helper
This lets passes easily detect derefs which have uses that fall outside
the standard load/store/copy pattern so they can bail appropriately.

Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
2019-05-31 01:08:03 +00:00