Eric Anholt
035fd4fb9f
nir/lower_clip: Fix picking of unused driver locations.
...
This fixes things when the last input/output is a struct or matrix.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com >
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4670 >
2020-04-23 18:52:46 +00:00
Eric Anholt
91668ae839
nir/lower_two_sided_color: Fix picking of new driver location.
...
We have shader->num_inputs for "last used input + 1" already, which
respects struct/matrix varyings.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com >
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4670 >
2020-04-23 18:52:46 +00:00
Gert Wollny
49ce749d0e
nir: Add umad24 and umul24 opcodes
...
So far only the singed versions are defined.
v2: Make umad24 and umul24 non-driver specific (Eric Anholt)
v3: Take care of nir_builder and automatic lowering of the
opcodes if they are not supported by the backend.
Signed-off-by: Gert Wollny <gert.wollny@collabora.com >
Reviewed-by: Eric Anholt <eric@anholt.net >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4610 >
2020-04-23 18:23:04 +00:00
Gert Wollny
42aa348dad
nir: Add r600 specific intrinsics for tesselation shader IO
...
Signed-off-by: Gert Wollny <gert.wollny@collabora.com >
Reviewed-by: Eric Anholt <eric@anholt.net >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4610 >
2020-04-23 18:23:04 +00:00
Rhys Perry
32d871b48f
nir/algebraic: don't undo lowering of 8/16-bit comparisons to 32-bit
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com >
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4387 >
2020-04-23 10:57:38 +00:00
Rhys Perry
6d79298992
nir/lower_bit_size: fix lowering of {imul,umul}_high
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com >
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4387 >
2020-04-23 10:57:38 +00:00
Rhys Perry
715ef95700
nir/lower_bit_size: fix lowering of shifts
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com >
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4387 >
2020-04-23 10:57:38 +00:00
Kenneth Graunke
155bb74ea9
nir: Actually do load/store vectorization beyond vec2
...
nir_opt_load_store_vectorize has an is_strided_vector() function that
looks for types with weird explicit strides. It does so by comparing
the explicit stride against the type-size-derived typical stride.
This had a subtle bug. Simple vector types (vec2/3/4) have no explicit
stride, so glsl_get_explicit_stride() returns 0. This never matches the
typical stride for a vector, so is_strided_vector() would return true
for basically any vector type, causing the vectorizer to bail.
I found this by looking at a compute shader with scalar SSBO loads at
offsets 0x220, 0x224, 0x228, 0x22c. nir_opt_load_store_vectorize would
properly vectorize the first two into a vec2 load, but would refuse to
extend it to a vec3 and ultimately vec4 load because is_strided_vector()
saw a vec2 and freaked out.
Neither ACO nor ANV do load/store vectorization before lowering derefs,
so this shouldn't affect them. However, I'd like to fix this bug to
avoid the trap for anyone who decides to in the future. In a branch
where anv used this lowering, this cut an additional 38% of the send
messages in the shader by properly vectorizing more things.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4255 >
2020-04-22 21:22:36 -07:00
Alejandro Piñeiro
9fd180394b
nir: add nir_tex_instr_need_sampler helper
...
That is basically nir_tex_instr sampler_index documentation comment
expressed as a helper.
Reviewed-by: Eric Anholt <eric@anholt.net >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4677 >
2020-04-22 23:43:18 +02:00
Dylan Baker
8e3696137f
remove final imports.h and imports.c bits
...
This moves the fi_types to a new mesa_private.h and removes the
imports.c file. The vast majority of this patch is just removing
pound includes of imports.h and fixing up the recursive includes.
Reviewed-by: Marek Olšák <marek.olsak@amd.com >
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com >
Reviewed-by: Matt Turner <mattst88@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3024 >
2020-04-21 11:09:04 -07:00
Jason Ekstrand
7c43b8ce1b
nir: Delete the fnoise opcodes
...
As of the previous commit, they are never used.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com >
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org >
Reviewed-by: Eric Anholt <eric@anholt.net >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4624 >
2020-04-21 06:16:13 +00:00
Jonathan Marek
71820c6b02
nir: convert_ycbcr: preserve alpha channel
...
Signed-off-by: Jonathan Marek <jonathan@marek.ca >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
Reviewed-by: D Scott Phillips <d.scott.phillips@intel.com >
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4528 >
2020-04-20 22:01:43 +00:00
Jonathan Marek
f8558fb1ce
nir: add common convert_ycbcr for vulkan csc
...
Copied from anv, replaced state with passing model/range directly.
Signed-off-by: Jonathan Marek <jonathan@marek.ca >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
Acked-by: Jason Ekstrand <jason@jlekstrand.net >
Reviewed-by: D Scott Phillips <d.scott.phillips@intel.com >
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4528 >
2020-04-20 22:01:43 +00:00
Dave Airlie
c2d8a4bf17
nir/linking: fix issue with two compact variables in a row. (v2)
...
If we have a clip dist float[1] compact followed by a tess factor
float[2] we don't want to overlap them, but the partial check
only happens for non-compact vars.
This fixes some issues seen with my sw vulkan layer with
dEQP-VK.clipping.user_defined.clip_distance*
v2: v1 failed with clip/cull mixtures, since in that
case the cull has a location_frac to follow after the clip
so only reset if we get a location_frac of 0 in a subsequent
clip var
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4635 >
2020-04-20 21:08:54 +00:00
Samuel Pitoiset
59427b6d1d
nir/opt_algebraic: lower 64-bit fmin3/fmax3/fmed3
...
This unconditionally lowers 64-bit fmin3/fmax3/fmed3 because
AMD hardware doesn't have native instructions, and no drivers
except RADV uses these instructions.
Fixes dEQP-VK.spirv_assembly.instruction.amd_trinary_minmax.*.f64.*
with ACO.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4570 >
2020-04-20 06:59:47 +00:00
Samuel Pitoiset
eed0ace466
nir/lower_int64: lower imin3/imax3/umin3/umax3/imed3/umed3
...
Fixes dEQP-VK.spirv_assembly.instruction.amd_trinary_minmax.*.i64.*
with ACO because this backend compiler expects most of the 64-bit
operations to be lowered.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4570 >
2020-04-20 06:59:47 +00:00
Timothy Arceri
839818332c
nir/gcm: dont move movs unless we can replace them later with their src
...
This helps us avoid moving the movs outside if branches when there
src can't be scalarized.
For example it avoids:
vec4 32 ssa_7 = tex ssa_6 (coord), 0 (texture), 0 (sampler),
if ... {
r0 = imov ssa_7.z
r1 = imov ssa_7.y
r2 = imov ssa_7.x
r3 = imov ssa_7.w
...
} else {
...
if ... {
r0 = imov ssa_7.x
r1 = imov ssa_7.w
...
else {
r0 = imov ssa_7.z
r1 = imov ssa_7.y
...
}
r2 = imov ssa_7.x
r3 = imov ssa_7.w
}
...
vec4 32 ssa_36 = vec4 r0, r1, r2, r3
Becoming something like:
vec4 32 ssa_7 = tex ssa_6 (coord), 0 (texture), 0 (sampler),
r0 = imov ssa_7.z
r1 = imov ssa_7.y
r2 = imov ssa_7.x
r3 = imov ssa_7.w
if ... {
...
} else {
if ... {
r0 = imov r2
r1 = imov r3
...
else {
...
}
...
}
While this is has a smaller instruction count it requires more work
for the same result. With more complex examples we can also end up
shuffling the registers around in a way that requires more registers
to use as temps so that we don't overwrite our original values along
the way.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4636 >
2020-04-20 03:46:29 +00:00
Timothy Arceri
e4e5beee8a
nir/gcm: be more conservative about moving instructions from loops
...
Here we only pull instructions further up control flow if they are
constant or texture instructions. See the code comment for more
information.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4636 >
2020-04-20 03:46:29 +00:00
Timothy Arceri
bf4a6c99d2
nir/gcm: allow derivative dependent intrinisics to be moved earlier
...
We can't move them later as we could move them into non-uniform
control flow, but moving them earlier should be fine.
This helps avoid a bunch of spilling in unigine shaders due to
moving the tex instructions sources earlier (outside if branches)
but not the instruction itself.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4636 >
2020-04-20 03:46:29 +00:00
Jason Ekstrand
50a6dd0d65
nir/gcm: Prefer the instruction's original block
...
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4636 >
2020-04-20 03:46:29 +00:00
Jason Ekstrand
d4cf2df01a
nir/gcm: Delete dead instructions
...
Classically, global code motion is also a dead code pass. However, in
the initial implementation, the decision was made to place every
instruction and let conventional DCE clean up the dead ones. Because
any uses of a dead instruction are unreachable, we have no late block
and the dead instructions are always scheduled early. The problem is
that, because we place the dead instruction early, it pushes the
placement of any dependencies of the dead instruction earlier than they
may need to be placed. In order prevent dead instructions from
affecting the placement of live ones, we need to delete them.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4636 >
2020-04-20 03:46:29 +00:00
Jason Ekstrand
dca3f351e5
nir/gcm: Add a real concept of "progress"
...
Now that the GCM pass is more conservative and only moves instructions
to different blocks when it's advantageous to do so, we can have a
proper notion of what it means to make progress.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4636 >
2020-04-20 03:46:29 +00:00
Jason Ekstrand
5b1615fdb7
nir/gcm: Move block choosing into a helper function
...
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4636 >
2020-04-20 03:46:29 +00:00
Jason Ekstrand
1f60f1aa3d
nir/gcm: Use an array for storing the early block
...
We are about to adjust our instruction block assignment algorithm and we
will want to know the current block that the instruction lives in. In
order to allow for this, we can't overwrite nir_instr::block in the
early scheduling pass.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4636 >
2020-04-20 03:46:29 +00:00
Jason Ekstrand
6006a9e275
nir/gcm: Loop over blocks in pin_instructions
...
Now that we have the new block iterators, we can simplify things a bit.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4636 >
2020-04-20 03:46:29 +00:00
Jason Ekstrand
4d083b52c0
nir/dominance: Better handle unreachable blocks
...
v2: Fix minor comments (Ken)
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4636 >
2020-04-20 03:46:29 +00:00
Arcady Goldmints-Orlov
ec1b96fdc8
nir: Lower returns correctly inside nested loops
...
Inside nested flow control, nir_lower_returns inserts predicated breaks
in the outer block. However, it would omit doing this if the remainder
of the outer block (after the inner block) was empty. This is not
correct in the case of loops, as execution just wraps back around to the
start of the loop, so this change doesn't skip the predication inside
loops.
Fixes: 79dec93ead (nir: Add return lowering pass)
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/2724
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4603 >
2020-04-19 02:54:08 +00:00
Timothy Arceri
c19ebca308
nir: add matrix_layout to nir_variable data
...
This will be used by the following patch.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4623 >
2020-04-18 11:50:44 +00:00
Jason Ekstrand
f5deed138a
spirv,nir: Move the SPIR-V vector insert code to NIR
...
This also makes spirv_to_nir a bit simpler because the new
nir_vector_insert helper automatically handles a constant component
selector like nir_vector_extract does.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4495 >
2020-04-17 19:21:44 +00:00
Jason Ekstrand
acaccff4d3
nir/builder: Handle any bit-size selector in nir_extract
...
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4495 >
2020-04-17 19:21:44 +00:00
Jason Ekstrand
9b17d7caac
nir: Add some sanity assertions in opt_large_constants
...
We make some assumptions in opt_large_constants such as the size_align
function returning the obvious sizes for vectors. Now that we've got
the deref_size lying around, we may as well assert it's consistent with
our assumptions. In particular, we now assert that it really claims
booleans are 32-bit. If anyone's driver ever decides to be clever and
change this, we'll now catch the breakage earlier.
Reviewed-by: Eric Anholt <eric@anholt.net >
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4468 >
2020-04-16 17:00:13 +00:00
Jason Ekstrand
33eb43349e
nir: Add an alignment to nir_intrinsic_load_constant
...
In f1883cc73d we tried to pass through alignments from load_constant
intrinsics when rewriting them to load_ubo in iris. However, those
intrinsics don't have ALIGN_MUL or ALIGN_OFFSET indices. It's easy
enough to add them. We just call the size/align function on the vector
type at the end of our deref chain and use the alignment returned from
there. It's possible we could do better by walking the whole deref
chain but this should be good enough.
Fixes: f1883cc73d "iris: Set alignments on cbuf0 and constant reads"
Closes : #2739
Reviewed-by: Eric Anholt <eric@anholt.net >
Tested-by: Ian Romanick <ian.d.romanick@intel.com >
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4468 >
2020-04-16 17:00:13 +00:00
Connor Abbott
abcfb64370
ir3: Fix LDC offset units
...
I had missed that LDC actually uses vec4 units for its offset. This
means that we have to create a new instruction, and lower it in
ir3_nir_lower_io_offsets, similar to the existing SSBO instructions.
Unfortunately we can't assume that loads are always vec4-aligned, so we
have to use the alignment information that NIR gives us. Unfortunately,
it's currently woefully inadequate, and will have to be fixed to give us
good codegen in the future.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4568 >
2020-04-15 22:38:20 +00:00
Connor Abbott
274f3815a5
ir3: Plumb through bindless support
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4358 >
2020-04-09 15:56:55 +00:00
Timothy Arceri
52c8bc0130
nir: make opt_if_loop_terminator() less strict
...
nir_cf_{extract,reinsert}() can't stitch a block together
if the block we are extracting ends in a jump but other jumps
nested in further ifs should be fine to move.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4477 >
2020-04-08 01:35:45 +00:00
Caio Marcelo de Oliveira Filho
5dc85abc4f
nir: Add per_view attribute to nir_variable
...
If a nir_variable is tagged with per_view, it must be an array with
size corresponding to the number of views. For slot-tracking, it is
considered to take just the slot for a single element -- drivers will
take care of expanding this appropriately.
This will be used to implement the ability of having per-view position
in a vertex shader in Intel platforms.
Acked-by: Rafael Antognolli <rafael.antognolli@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2313 >
2020-04-07 17:16:09 +00:00
Rob Clark
57557783f6
nir/lower_amul: fix slot calculation
...
Fixes incorrect indexing in
dEQP-GLES31.functional.ssbo.layout.instance_array_basic_type.packed.mat2x3
Signed-off-by: Rob Clark <robdclark@chromium.org >
Reviewed-by: Eric Anholt <eric@anholt.net >
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4455 >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4455 >
2020-04-06 18:00:17 +00:00
Rob Clark
4638a16a93
nir: add some swizzle helpers
...
Signed-off-by: Rob Clark <robdclark@chromium.org >
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com >
Reviewed-by: Eric Anholt <eric@anholt.net >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4455 >
2020-04-06 18:00:17 +00:00
Jason Ekstrand
e78a7a1825
nir: Assert memory loads are aligned
...
We've had alignment parameters on these operations for a while but a
bunch of places weren't setting them. That should be resolved now so we
can start validating that they're always set.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com >
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com >
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4441 >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4441 >
2020-04-06 15:57:30 +00:00
Hyunjun Ko
9f174eb2df
nir: fix wrong assignment to buffer in xfb_varyings_info
...
Tested with dEQP-VK.transform_feedback.fuzz.various_buffers.buffers100_instance_array_vertex
Signed-off-by: Hyunjun Ko <zzoon@igalia.com >
Reviewed-by: Tapani Pälli <tapani.palli@intel.com >
Cc: mesa-stable@lists.freedesktop.org
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4459 >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4459 >
2020-04-06 08:55:05 +00:00
Rob Clark
bf64648864
nir: fix definition of imadsh_mix16 for vectors
...
Fixes: c27b3758fa ("nir/opcodes: Add new 'umul_low' and 'imadsh_mix16' opcodes")
Signed-off-by: Rob Clark <robdclark@chromium.org >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4423 >
2020-04-04 00:07:10 +00:00
Daniel Schürmann
dc69738b0f
nir: fix unpack_64_4x16 in lower_alu_to_scalar()
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4002 >
2020-04-03 23:13:15 +01:00
Jason Ekstrand
36a32af008
nir/load_store_vectorize: Add support for nir_var_mem_global
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4367 >
2020-04-03 20:26:54 +00:00
Jason Ekstrand
b6273291b5
nir/load_store_vectorize: Use nir_iadd_imm for offsets
...
This makes it capable of handling 64-bit offsets
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4367 >
2020-04-03 20:26:54 +00:00
Jason Ekstrand
04d08ea149
nir/load_store_vectorize: Fix shared atomic info
...
These were clearly copied and pasted from SSBOs. The shared atomics
don't have an SSBO index so their offset is src0 and data is src1.
Fixes: ce9205c03b "nir: add a load/store vectorization pass"
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4367 >
2020-04-03 20:26:54 +00:00
Jason Ekstrand
c71c1f44b0
nir/from_ssa: Only chain movs when a src is also a dest
...
The algorithm we use for resolving parallel copy instructions plays this
little shell game with the values. The reason for this is that it lets
us handle cases where, for instance we have a -> b and b -> a and we
need to use a temporary to do a swap. One result of this algorithm is
that it tends to emit a lot of mov chains which are typcially really bad
for GPUs where a mov is far from free. For instance, it's likely to
turn this:
r16 = ssa_0; r17 = ssa_0; r18 = ssa_0; r15 = ssa_0
into this:
r15 = mov ssa_0
r18 = mov r15
r17 = mov r18
r16 = mov r17
which, if it's the only thing in a block (this is common for phis) is
impossible for a scheduler to fix because of the dependencies and you
end up with significant stalling. If, on the other hand, we only do the
chaining in the actual case where we need to free up a so that it can be
used as a destination, we can emit this:
r15 = mov ssa_0
r18 = mov ssa_0
r17 = mov ssa_0
r16 = mov ssa_0
which is far nicer to the scheduler. On Intel, our copy propagation
pass will undo the chain for us so this has no shader-db impact.
However, for less intelligent back-ends, it's probably a lot better.
Reviewed-by: Rob Clark <robdclark@chromium.org >
Reviewed-by: Connor Abbott <cwabbott0@gmail.com >
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4412 >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4412 >
2020-04-02 19:06:46 +00:00
Mark Janes
90a8b458ac
nir: check shader type before writing to shaderinfo.tess union
...
If the shader is not a tesselation shader, then writing to the tess
member of the shaderinfo union will overwrite other members and crash.
Closes : #2722
Fixes: f1dd81ae10 ("nir: Collect if shader uses cross-invocation or indirect I/O.")
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com >
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com >
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4408 >
2020-04-01 20:25:55 +00:00
Ian Romanick
b097e326b8
nir/algebraic: Remove a redundant fabs pattern
...
Made redundant by 5544b2cbbd ("nir/algebraic: Use value range analysis
to eliminate useless unary ops").
No shader-db changes on any Intel platform.
Reviewed-by: Matt Turner <mattst88@gmail.com >
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1359 >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1359 >
2020-04-01 00:28:38 +00:00
Ian Romanick
af1bc7e0c7
nir/algebraic: Use value range analysis to convert fmax to fsat
...
This is conceptually similar to the 1-fsat(a) <=> fsat(1-a) rearragement
done in:
3b74790941 ("nir/algebraic: Recognize open-coded flrp(a, b, fsat(c))")
2d259713b7 ("nir/algebraic: Commute 1-fsat(a) to fsat(1-a) for all
non-fmul instructions").
Note: this helps the Aztex Ruins shader that was hurt for spills and
fills on Braodwell in the previous commit, but it does not fix the
spills or fills. :(
All Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 14528985 -> 14526116 (-0.02%)
instructions in affected programs: 477300 -> 474431 (-0.60%)
helped: 2332
HURT: 0
helped stats (abs) min: 1 max: 18 x̄: 1.23 x̃: 1
helped stats (rel) min: 0.07% max: 8.89% x̄: 0.88% x̃: 0.64%
95% mean confidence interval for instructions value: -1.27 -1.19
95% mean confidence interval for instructions %-change: -0.92% -0.85%
Instructions are helped.
total cycles in shared programs: 203723684 -> 203692984 (-0.02%)
cycles in affected programs: 4878847 -> 4848147 (-0.63%)
helped: 1764
HURT: 324
helped stats (abs) min: 1 max: 706 x̄: 22.94 x̃: 17
helped stats (rel) min: <.01% max: 17.75% x̄: 1.94% x̃: 1.66%
HURT stats (abs) min: 1 max: 400 x̄: 30.15 x̃: 10
HURT stats (rel) min: <.01% max: 17.76% x̄: 1.91% x̃: 0.69%
95% mean confidence interval for cycles value: -16.55 -12.86
95% mean confidence interval for cycles %-change: -1.44% -1.24%
Cycles are helped.
Reviewed-by: Matt Turner <mattst88@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1359 >
2020-04-01 00:28:38 +00:00
Ian Romanick
62795475e8
nir/algebraic: Distribute source modifiers into instructions
...
There are three main classes of cases that are helped by this change:
1. When the negation is applied to a value being type converted (e.g.,
float(-x)). This could possibly also be handled with more clever
code generation.
2. When the negation is applied to a phi node source (e.g., x = -(...);
at the end of a basic block). This was the original case that caught
my attention while looking at shader-db dumps.
3. When the negation is applied to the source of an instruction that
cannot have source modifiers. This includes texture instructions and
math box instructions on pre-Gen7 platforms (see more details below).
In many these cases the negation can be propagated into the instructions
that generate the value (e.g., -(a*b) = (-a)*b).
In addition to the operations implemtned in this patch, I also tried:
- frcp - Helped 6 or fewer shaders on Gen7+, and hurt just as many on
pre-Gen7. On Gen6 and earlier, frcp is a math box instruction, and
math box instructions cannot have source modifiers.
I suspect this is why so many more shaders are helped on Gen6 than on
Gen5 or Gen7. Gen6 supports OpenGL 3.3, so a lot more shaders
compile on it. A lot of these shaders may have things like cos(-x)
or rcp(-x) that could result in an explicit negation instruction.
- bcsel - Hurt a few shaders with none helped. bcsel operates on
integer sources, so the fabs or fneg cannot be a source modifier in
the bcsel itself.
- Integer instructions - No changes on any Intel platform.
Some notes about the shader-db results below.
- On Tiger Lake, a single Deus Ex fragment shader is hurt for both
spills and fills.
- On Haswell, a different Deus Ex fragment shader is hurt for both
spills and fills.
- On GM45, the "LOST: 1" and "GAINED: 1" is a single Left4Dead 2
(very high graphics settings, lol) fragment shader that upgrades
from SIMD8 to SIMD16.
v2: Add support for fsign. Add some patterns that remove redundant
negations and redundant absolute value rather than trying to push them
down the tree.
Tiger Lake
total instructions in shared programs: 17611333 -> 17586465 (-0.14%)
instructions in affected programs: 3033734 -> 3008866 (-0.82%)
helped: 10310
HURT: 632
helped stats (abs) min: 1 max: 35 x̄: 2.61 x̃: 1
helped stats (rel) min: 0.04% max: 16.67% x̄: 1.43% x̃: 1.01%
HURT stats (abs) min: 1 max: 47 x̄: 3.21 x̃: 2
HURT stats (rel) min: 0.04% max: 5.08% x̄: 0.88% x̃: 0.63%
95% mean confidence interval for instructions value: -2.33 -2.21
95% mean confidence interval for instructions %-change: -1.32% -1.27%
Instructions are helped.
total cycles in shared programs: 338365223 -> 338262252 (-0.03%)
cycles in affected programs: 125291811 -> 125188840 (-0.08%)
helped: 5224
HURT: 2031
helped stats (abs) min: 1 max: 5670 x̄: 46.73 x̃: 12
helped stats (rel) min: <.01% max: 34.78% x̄: 1.91% x̃: 0.97%
HURT stats (abs) min: 1 max: 2882 x̄: 69.50 x̃: 14
HURT stats (rel) min: <.01% max: 44.93% x̄: 2.35% x̃: 0.74%
95% mean confidence interval for cycles value: -18.71 -9.68
95% mean confidence interval for cycles %-change: -0.80% -0.63%
Cycles are helped.
total spills in shared programs: 8942 -> 8946 (0.04%)
spills in affected programs: 8 -> 12 (50.00%)
helped: 0
HURT: 1
total fills in shared programs: 9399 -> 9401 (0.02%)
fills in affected programs: 21 -> 23 (9.52%)
helped: 0
HURT: 1
Ice Lake
total instructions in shared programs: 16124348 -> 16102258 (-0.14%)
instructions in affected programs: 2830928 -> 2808838 (-0.78%)
helped: 11294
HURT: 2
helped stats (abs) min: 1 max: 12 x̄: 1.96 x̃: 1
helped stats (rel) min: 0.07% max: 17.65% x̄: 1.32% x̃: 0.93%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 3.45% max: 4.00% x̄: 3.72% x̃: 3.72%
95% mean confidence interval for instructions value: -1.99 -1.93
95% mean confidence interval for instructions %-change: -1.34% -1.29%
Instructions are helped.
total cycles in shared programs: 335393932 -> 335325794 (-0.02%)
cycles in affected programs: 123834609 -> 123766471 (-0.06%)
helped: 5034
HURT: 2128
helped stats (abs) min: 1 max: 3256 x̄: 43.39 x̃: 11
helped stats (rel) min: <.01% max: 35.79% x̄: 1.98% x̃: 1.00%
HURT stats (abs) min: 1 max: 2634 x̄: 70.63 x̃: 16
HURT stats (rel) min: <.01% max: 49.49% x̄: 2.73% x̃: 0.62%
95% mean confidence interval for cycles value: -13.66 -5.37
95% mean confidence interval for cycles %-change: -0.69% -0.48%
Cycles are helped.
LOST: 0
GAINED: 2
Skylake
total instructions in shared programs: 14949240 -> 14927930 (-0.14%)
instructions in affected programs: 2594756 -> 2573446 (-0.82%)
helped: 11000
HURT: 2
helped stats (abs) min: 1 max: 12 x̄: 1.94 x̃: 1
helped stats (rel) min: 0.07% max: 18.75% x̄: 1.39% x̃: 0.94%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 4.76% max: 4.76% x̄: 4.76% x̃: 4.76%
95% mean confidence interval for instructions value: -1.97 -1.91
95% mean confidence interval for instructions %-change: -1.42% -1.37%
Instructions are helped.
total cycles in shared programs: 324829346 -> 324821596 (<.01%)
cycles in affected programs: 121566087 -> 121558337 (<.01%)
helped: 4611
HURT: 2147
helped stats (abs) min: 1 max: 3715 x̄: 33.29 x̃: 10
helped stats (rel) min: <.01% max: 36.08% x̄: 1.94% x̃: 1.00%
HURT stats (abs) min: 1 max: 2551 x̄: 67.88 x̃: 16
HURT stats (rel) min: <.01% max: 53.79% x̄: 3.69% x̃: 0.89%
95% mean confidence interval for cycles value: -4.25 1.96
95% mean confidence interval for cycles %-change: -0.28% -0.02%
Inconclusive result (value mean confidence interval includes 0).
Broadwell
total instructions in shared programs: 14971203 -> 14949957 (-0.14%)
instructions in affected programs: 2635699 -> 2614453 (-0.81%)
helped: 10982
HURT: 2
helped stats (abs) min: 1 max: 12 x̄: 1.93 x̃: 1
helped stats (rel) min: 0.07% max: 18.75% x̄: 1.39% x̃: 0.94%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 4.76% max: 4.76% x̄: 4.76% x̃: 4.76%
95% mean confidence interval for instructions value: -1.97 -1.90
95% mean confidence interval for instructions %-change: -1.42% -1.37%
Instructions are helped.
total cycles in shared programs: 336215033 -> 336086458 (-0.04%)
cycles in affected programs: 127383198 -> 127254623 (-0.10%)
helped: 4884
HURT: 1963
helped stats (abs) min: 1 max: 25696 x̄: 51.78 x̃: 12
helped stats (rel) min: <.01% max: 58.28% x̄: 2.00% x̃: 1.05%
HURT stats (abs) min: 1 max: 3401 x̄: 63.33 x̃: 16
HURT stats (rel) min: <.01% max: 39.95% x̄: 2.20% x̃: 0.70%
95% mean confidence interval for cycles value: -29.99 -7.57
95% mean confidence interval for cycles %-change: -0.89% -0.71%
Cycles are helped.
total fills in shared programs: 24905 -> 24901 (-0.02%)
fills in affected programs: 117 -> 113 (-3.42%)
helped: 4
HURT: 0
LOST: 0
GAINED: 16
Haswell
total instructions in shared programs: 13148927 -> 13131528 (-0.13%)
instructions in affected programs: 2220941 -> 2203542 (-0.78%)
helped: 8017
HURT: 4
helped stats (abs) min: 1 max: 12 x̄: 2.17 x̃: 1
helped stats (rel) min: 0.07% max: 15.25% x̄: 1.40% x̃: 0.93%
HURT stats (abs) min: 1 max: 7 x̄: 2.50 x̃: 1
HURT stats (rel) min: 0.33% max: 4.76% x̄: 2.73% x̃: 2.91%
95% mean confidence interval for instructions value: -2.21 -2.13
95% mean confidence interval for instructions %-change: -1.43% -1.37%
Instructions are helped.
total cycles in shared programs: 321221791 -> 321079870 (-0.04%)
cycles in affected programs: 126886055 -> 126744134 (-0.11%)
helped: 4674
HURT: 1729
helped stats (abs) min: 1 max: 23654 x̄: 56.47 x̃: 16
helped stats (rel) min: <.01% max: 53.22% x̄: 2.13% x̃: 1.05%
HURT stats (abs) min: 1 max: 3694 x̄: 70.58 x̃: 18
HURT stats (rel) min: <.01% max: 63.06% x̄: 2.48% x̃: 0.90%
95% mean confidence interval for cycles value: -33.31 -11.02
95% mean confidence interval for cycles %-change: -0.99% -0.78%
Cycles are helped.
total spills in shared programs: 19872 -> 19874 (0.01%)
spills in affected programs: 21 -> 23 (9.52%)
helped: 0
HURT: 1
total fills in shared programs: 20941 -> 20941 (0.00%)
fills in affected programs: 62 -> 62 (0.00%)
helped: 1
HURT: 1
LOST: 0
GAINED: 8
Ivy Bridge
total instructions in shared programs: 11875553 -> 11853839 (-0.18%)
instructions in affected programs: 1553112 -> 1531398 (-1.40%)
helped: 7304
HURT: 3
helped stats (abs) min: 1 max: 16 x̄: 2.97 x̃: 2
helped stats (rel) min: 0.07% max: 15.25% x̄: 1.62% x̃: 1.15%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 1.05% max: 3.33% x̄: 2.44% x̃: 2.94%
95% mean confidence interval for instructions value: -3.04 -2.90
95% mean confidence interval for instructions %-change: -1.65% -1.59%
Instructions are helped.
total cycles in shared programs: 178246425 -> 178184484 (-0.03%)
cycles in affected programs: 13702146 -> 13640205 (-0.45%)
helped: 4409
HURT: 1566
helped stats (abs) min: 1 max: 531 x̄: 24.52 x̃: 13
helped stats (rel) min: <.01% max: 38.67% x̄: 2.14% x̃: 1.02%
HURT stats (abs) min: 1 max: 356 x̄: 29.48 x̃: 10
HURT stats (rel) min: <.01% max: 64.73% x̄: 1.87% x̃: 0.70%
95% mean confidence interval for cycles value: -11.60 -9.14
95% mean confidence interval for cycles %-change: -1.19% -0.99%
Cycles are helped.
LOST: 0
GAINED: 10
Sandy Bridge
total instructions in shared programs: 10695740 -> 10667483 (-0.26%)
instructions in affected programs: 2337607 -> 2309350 (-1.21%)
helped: 10720
HURT: 1
helped stats (abs) min: 1 max: 49 x̄: 2.64 x̃: 2
helped stats (rel) min: 0.07% max: 20.00% x̄: 1.54% x̃: 1.13%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 1.04% max: 1.04% x̄: 1.04% x̃: 1.04%
95% mean confidence interval for instructions value: -2.69 -2.58
95% mean confidence interval for instructions %-change: -1.57% -1.51%
Instructions are helped.
total cycles in shared programs: 153478839 -> 153416223 (-0.04%)
cycles in affected programs: 22050900 -> 21988284 (-0.28%)
helped: 5342
HURT: 2200
helped stats (abs) min: 1 max: 1020 x̄: 20.34 x̃: 16
helped stats (rel) min: <.01% max: 24.05% x̄: 1.51% x̃: 0.86%
HURT stats (abs) min: 1 max: 335 x̄: 20.93 x̃: 6
HURT stats (rel) min: <.01% max: 20.18% x̄: 1.03% x̃: 0.30%
95% mean confidence interval for cycles value: -9.18 -7.42
95% mean confidence interval for cycles %-change: -0.82% -0.71%
Cycles are helped.
Iron Lake
total instructions in shared programs: 8114882 -> 8105574 (-0.11%)
instructions in affected programs: 1232504 -> 1223196 (-0.76%)
helped: 4109
HURT: 2
helped stats (abs) min: 1 max: 6 x̄: 2.27 x̃: 1
helped stats (rel) min: 0.05% max: 8.33% x̄: 0.99% x̃: 0.66%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.94% max: 4.35% x̄: 2.65% x̃: 2.65%
95% mean confidence interval for instructions value: -2.31 -2.21
95% mean confidence interval for instructions %-change: -1.01% -0.96%
Instructions are helped.
total cycles in shared programs: 188504036 -> 188466296 (-0.02%)
cycles in affected programs: 31203798 -> 31166058 (-0.12%)
helped: 3447
HURT: 36
helped stats (abs) min: 2 max: 92 x̄: 11.03 x̃: 8
helped stats (rel) min: <.01% max: 5.41% x̄: 0.21% x̃: 0.13%
HURT stats (abs) min: 2 max: 30 x̄: 7.33 x̃: 6
HURT stats (rel) min: 0.01% max: 1.65% x̄: 0.18% x̃: 0.10%
95% mean confidence interval for cycles value: -11.16 -10.51
95% mean confidence interval for cycles %-change: -0.22% -0.20%
Cycles are helped.
LOST: 0
GAINED: 1
GM45
total instructions in shared programs: 4989697 -> 4984531 (-0.10%)
instructions in affected programs: 703952 -> 698786 (-0.73%)
helped: 2493
HURT: 2
helped stats (abs) min: 1 max: 6 x̄: 2.07 x̃: 1
helped stats (rel) min: 0.05% max: 8.33% x̄: 1.03% x̃: 0.66%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.95% max: 4.35% x̄: 2.65% x̃: 2.65%
95% mean confidence interval for instructions value: -2.13 -2.01
95% mean confidence interval for instructions %-change: -1.07% -0.99%
Instructions are helped.
total cycles in shared programs: 128929136 -> 128903886 (-0.02%)
cycles in affected programs: 21583096 -> 21557846 (-0.12%)
helped: 2214
HURT: 17
helped stats (abs) min: 2 max: 92 x̄: 11.44 x̃: 8
helped stats (rel) min: <.01% max: 5.41% x̄: 0.24% x̃: 0.13%
HURT stats (abs) min: 2 max: 8 x̄: 4.24 x̃: 4
HURT stats (rel) min: 0.01% max: 1.65% x̄: 0.20% x̃: 0.09%
95% mean confidence interval for cycles value: -11.75 -10.88
95% mean confidence interval for cycles %-change: -0.25% -0.22%
Cycles are helped.
LOST: 1
GAINED: 1
Reviewed-by: Matt Turner <mattst88@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1359 >
2020-04-01 00:28:38 +00:00