Jason Ekstrand
14a49f31d3
nir/lower_int64: Lower 8 and 16-bit downcasts with nir_lower_mov64
...
We have the code to do the lowering, we were just missing the
boilerplate bits to make should_lower_int64_alu_instr return true.
Fixes: 62d55f1281 "nir: Wire up int64 lowering functions"
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com >
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4365 >
2020-03-31 00:18:05 +00:00
Jason Ekstrand
b113170559
nir/opt_loop_unroll: Fix has_nested_loop handling
...
In 87839680c0 , a very subtle mistake was made with the CFG walking
recursion. Instead of setting the local has_nested_loop variable when
process child loops, has_nested_loop_out was passed directly into the
process_loop_in_block call. This broke nested loop detection heuristics
and caused loop unrolling to run massively out of control. In
particular, it makes the following CTS test compile virtually forever:
dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.struct_mixed_types.uniform_buffer_block_geom
Fixes: 87839680c0 "nir: Fix breakage of foreach_list_typed_safe..."
Closes : #2710
Reviewed-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com >
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4380 >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4380 >
2020-03-30 22:20:47 +00:00
Jason Ekstrand
f5b14d983e
nir: Set UBO alignments in lower_uniforms_to_ubo
...
Fixes: fb64954d9d "nir: Validate that memory load/store ops work on..."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org >
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4378 >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4378 >
2020-03-30 19:18:17 +00:00
Jason Ekstrand
fb64954d9d
nir: Validate that memory load/store ops work on whole bytes
...
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org >
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338 >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338 >
2020-03-30 15:46:19 +00:00
Jason Ekstrand
c217ee8d35
nir: Insert b2b1s around booleans in nir_lower_to
...
By inserting a b2b1 around the load_ubo, load_input, etc. intrinsics
generated by nir_lower_io, we can ensure that the intrinsic has the
correct destination bit size. Not having the right size can mess up
passes which try to optimize access. In particular, it was causing
brw_nir_analyze_ubo_ranges to ignore load_ubo of booleans which meant
that booleans uniforms weren't getting pushed as push constants. I
don't think this is an actual functional bug anywhere hence no CC to
stable but it may improve perf somewhere.
Shader-db results on ICL with iris:
total instructions in shared programs: 16076707 -> 16075246 (<.01%)
instructions in affected programs: 129034 -> 127573 (-1.13%)
helped: 487
HURT: 0
helped stats (abs) min: 3 max: 3 x̄: 3.00 x̃: 3
helped stats (rel) min: 0.45% max: 3.00% x̄: 1.33% x̃: 1.36%
95% mean confidence interval for instructions value: -3.00 -3.00
95% mean confidence interval for instructions %-change: -1.37% -1.29%
Instructions are helped.
total cycles in shared programs: 338015639 -> 337983311 (<.01%)
cycles in affected programs: 971986 -> 939658 (-3.33%)
helped: 362
HURT: 110
helped stats (abs) min: 1 max: 1664 x̄: 97.37 x̃: 43
helped stats (rel) min: 0.03% max: 36.22% x̄: 5.58% x̃: 2.60%
HURT stats (abs) min: 1 max: 554 x̄: 26.55 x̃: 18
HURT stats (rel) min: 0.03% max: 10.99% x̄: 1.04% x̃: 0.96%
95% mean confidence interval for cycles value: -79.97 -57.01
95% mean confidence interval for cycles %-change: -4.60% -3.47%
Cycles are helped.
total sends in shared programs: 815037 -> 814550 (-0.06%)
sends in affected programs: 5701 -> 5214 (-8.54%)
helped: 487
HURT: 0
LOST: 2
GAINED: 0
The two lost programs were SIMD16 shaders in CS:GO. However, CS:GO was
also one of the most helped programs where it shaves sends off of 134
programs. This seems to reduce GPU core clocks by about 4% on the first
1000 frames of the PTS benchmark.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338 >
2020-03-30 15:46:19 +00:00
Jason Ekstrand
d2dfcee7f7
nir: Use b2b opcodes for shared and constant memory
...
No shader-db changes on ICL with iris
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com >
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338 >
2020-03-30 15:46:19 +00:00
Jason Ekstrand
b2db84153a
nir: Add b2b opcodes
...
These exist to convert between different types of boolean values. In
particular, we want to use these for uniform and shared memory
operations where we need to convert to a reasonably sized boolean but we
don't care what its format is so we don't want to make the back-end
insert an actual i2b/b2i. In the case of uniforms, Mesa can tweak the
format of the uniform boolean to whatever the driver wants. In the case
of shared, every value in a shared variable comes from the shader so
it's already in the right boolean format.
The new boolean conversion opcodes get replaced with mov in
lower_bool_to_int/float32 so the back-end will hopefully never see them.
However, while we're in the middle of optimizing our NIR, they let us
have sensible load_uniform/ubo intrinsics and also have the bit size
conversion.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com >
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338 >
2020-03-30 15:46:19 +00:00
Samuel Pitoiset
3935a729d9
nir/algebraic: add fexp2(fmul(flog2(a), 0.5) -> fsqrt(a) optimization
...
Helps some Wolfenstein II and Wolfenstein Youngblood shaders.
pipeline-db (VEGA10/ACO):
Totals from affected shaders:
SGPRS: 17904 -> 17904 (0.00 %)
VGPRS: 14492 -> 14492 (0.00 %)
Spilled SGPRs: 20 -> 20 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 1753152 -> 1749708 (-0.20 %) bytes
Max Waves: 2581 -> 2581 (0.00 %)
pipeline-db (VEGA10/LLVM):
Totals from affected shaders:
SGPRS: 26656 -> 26656 (0.00 %)
VGPRS: 23780 -> 23780 (0.00 %)
Spilled SGPRs: 2112 -> 2112 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 2552712 -> 2549236 (-0.14 %) bytes
Max Waves: 3359 -> 3359 (0.00 %)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com >
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4353 >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4353 >
2020-03-30 14:07:43 +00:00
Timur Kristóf
f1dd81ae10
nir: Collect if shader uses cross-invocation or indirect I/O.
...
The following new fields are added to tess shader info:
* `tcs_cross_invocation_inputs_read`
* `tcs_cross_invocation_outputs_read`
These are I/O masks that are a subset of inputs_read and outputs_read
and they contain which per-vertex inputs and outputs are read
cross-invocation.
Additionall, the following new fields are added to shader_info:
* `inputs_read_indirectly`
* `outputs_accessed_indirectly`
* `patch_inputs_read_indirectly`
* `patch_outputs_accessed_indirectly`
These new fields can be used for optimizing TCS in a back-end compiler.
If you can be sure that the TCS doesn't use cross-invocation inputs
or outputs, you can choose a different strategy for storing VS and TCS
outputs. However, such optimizations might need to be disabled when
the inputs/outputs are accessed indirectly due to backend limitations,
so this information is also collected.
Example: RADV currently has to store all VS and TCS outputs in LDS, but
for shaders when only inputs and/or outputs belonging to the current
invocation ID are used, it could skip storing these in LDS entirely.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com >
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165 >
2020-03-30 13:09:08 +00:00
Danylo Piliaiev
87839680c0
nir: Fix breakage of foreach_list_typed_safe assumptions in loop unrolling
...
foreach_list_typed_safe works with assumption that even if current node
becomes invalid, the next will be still valid.
However process_loops broke this assumption, because during iteration
when immediate child is unrolled - not only current node could be removed
but also the one after it.
This doesn't cause issues now but it will cause issues when undefined
behaviour in foreach* macros is fixed.
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com >
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com >
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4189 >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4189 >
2020-03-30 14:41:30 +03:00
Eric Engestrom
79af30768d
meson: inline inc_common
...
Let's make it clear what includes are being added everywhere, so that
they can be cleaned up.
Signed-off-by: Eric Engestrom <eric@engestrom.ch >
Reviewed-by: Marek Olšák <marek.olsak@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4360 >
2020-03-28 21:36:54 +01:00
Marek Olšák
e5339fe4a4
Move compiler.h and imports.h/c from src/mesa/main into src/util
...
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4324 >
2020-03-27 21:00:09 +00:00
Timothy Arceri
b5e00f5c2b
nir: fix packing of TCS varyings not read by the TES
...
Unlike other stages TCS outputs not read by the TES cannot always
be demoted to globals e.g. when they are read by other TCS
invocations.
We were not taking these outputs into account when packing which
could result in other outputs being assigned to the same location.
Here we make sure to gather information on these outputs and group
them together when packing.
This fixes rendering issues in QUBE 2 via Proton.
Closes : #2653
Fixes: 26aa460940 ("nir: rewrite varying component packing")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4328 >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4328 >
2020-03-27 07:26:39 +00:00
Erik Faye-Lund
c98e745e78
compiler/nir: move build_log helper into builtin-builder
...
Reviewed-by: Karol Herbst <kherbst@redhat.com >
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4318 >
2020-03-26 10:14:22 +00:00
Erik Faye-Lund
f59ae68838
compiler/nir: move build_exp helper into builtin-builder
...
Reviewed-by: Karol Herbst <kherbst@redhat.com >
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4318 >
2020-03-26 10:14:22 +00:00
Marek Olšák
85a723975b
nir: add and gather shader_info::writes_memory
...
for out-of-order drawing.
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4152 >
2020-03-26 03:08:34 -04:00
Pierre-Eric Pelloux-Prayer
84da4ded4b
nir: update uses_demote flag in discard_to_demote pass
...
Otherwise the ctx.ac.postponed_kill will not be allocated.
Fixes: ce87da71e9 ("nir: add pass to lower discard() to demote()")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2662
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4301 >
2020-03-25 08:19:33 +01:00
Iago Toral Quiroga
467c9a0faa
nir: add a bool bitsize lowering pass
...
The pass lowers 1-bit booleans produced by NIR to the native bitsize
of the operations that produce them.
v2: change on lower_load_const_instr after upstream changes. Added
TODO2 to explain it, as it was not properly tested yet (see
already existing TODO) (Neil)
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com >
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3885 >
2020-03-24 23:21:21 +00:00
Rhys Perry
9f4ba2d2b4
nir/gather_info: fix per-vertex handling in try_mask_partial_io
...
pipeline-db (Navi, ACO):
Totals from affected shaders:
SGPRS: 6432 -> 6432 (0.00 %)
VGPRS: 11924 -> 11924 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Scratch size: 1596 -> 1596 (0.00 %) dwords per thread
Code Size: 575524 -> 518620 (-9.89 %) bytes
LDS: 12187 -> 12187 (0.00 %) blocks
Max Waves: 2695 -> 2695 (0.00 %)
Helps a few hundred Dark Souls 3 shaders.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com >
CC: <mesa-stable@lists.freedesktop.org >
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net >
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4190 >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4190 >
2020-03-24 11:09:15 +00:00
Rhys Perry
5193688e1a
nir/gather_info: handle emit_vertex_with_counter
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com >
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com >
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
CC: <mesa-stable@lists.freedesktop.org >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4193 >
2020-03-19 15:37:07 +00:00
Marek Olšák
3c03718fd7
nir: fix clip/cull_distance_array_size in nir_lower_clip_cull_distance_arrays
...
This fixes a GPU hang on radeonsi.
It only works if optimizations have already been run.
Cc: 19.3 20.0 <mesa-stable@lists.freedesktop.org >
Reviewed-by: Tapani Pälli <tapani.palli@intel.com >
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4194 >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4194 >
2020-03-19 01:47:28 -04:00
Ian Romanick
b421c0466d
soft-fp64/flt: Perform checks in a different order
...
The change to nir_opt_algebraic cleans up a pattern that was never
produced before the rest of this commit was added.
Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:
Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 843005 -> 841666 (-0.16%)
instructions in affected programs: 460655 -> 459316 (-0.29%)
helped: 64
HURT: 17
helped stats (abs) min: 1 max: 72 x̄: 21.72 x̃: 20
helped stats (rel) min: 0.01% max: 28.07% x̄: 12.67% x̃: 16.07%
HURT stats (abs) min: 1 max: 7 x̄: 3.00 x̃: 2
HURT stats (rel) min: 0.01% max: 0.04% x̄: 0.02% x̃: 0.02%
95% mean confidence interval for instructions value: -20.87 -12.19
95% mean confidence interval for instructions %-change: -12.35% -7.66%
Instructions are helped.
total cycles in shared programs: 6944998 -> 6927246 (-0.26%)
cycles in affected programs: 3891872 -> 3874120 (-0.46%)
helped: 71
HURT: 10
helped stats (abs) min: 2 max: 772 x̄: 254.21 x̃: 156
helped stats (rel) min: <.01% max: 66.44% x̄: 21.72% x̃: 18.40%
HURT stats (abs) min: 18 max: 69 x̄: 29.70 x̃: 20
HURT stats (rel) min: 0.02% max: 0.04% x̄: 0.03% x̃: 0.03%
95% mean confidence interval for cycles value: -270.82 -167.50
95% mean confidence interval for cycles %-change: -24.41% -13.65%
Cycles are helped.
Reviewed-by: Matt Turner <mattst88@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142 >
2020-03-18 20:36:29 +00:00
Ian Romanick
4e3d69ad07
nir/algebraic: Simplify a contradiction that can occur in __flt64_nonnan
...
The pattern is added to opt_algebraic because, for example, comparisons
with constant 0.0 will produce (a1 < 0).
Even with a pass that optimized Boolean expressions, I think this would
be very difficult to automatically recognize and optimize.
Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:
Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 933054 -> 929619 (-0.37%)
instructions in affected programs: 784041 -> 780606 (-0.44%)
helped: 59
HURT: 0
helped stats (abs) min: 2 max: 213 x̄: 58.22 x̃: 44
helped stats (rel) min: 0.02% max: 2.51% x̄: 0.72% x̃: 0.46%
95% mean confidence interval for instructions value: -70.80 -45.64
95% mean confidence interval for instructions %-change: -0.92% -0.53%
Instructions are helped.
total cycles in shared programs: 7304712 -> 7280180 (-0.34%)
cycles in affected programs: 7176260 -> 7151728 (-0.34%)
helped: 92
HURT: 0
helped stats (abs) min: 8 max: 1414 x̄: 266.65 x̃: 166
helped stats (rel) min: 0.04% max: 2.34% x̄: 0.43% x̃: 0.22%
95% mean confidence interval for cycles value: -333.05 -200.26
95% mean confidence interval for cycles %-change: -0.54% -0.31%
Cycles are helped.
Regular shader-db changes:
No changes on any Intel platform.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net >
Reviewed-by: Matt Turner <mattst88@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142 >
2020-03-18 20:36:29 +00:00
Ian Romanick
e0cefc5a23
nir/algebraic: Constant reassociation for bitwise operations too
...
Like 5886cd79a0 , but for iand, ior, and ixor.
Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:
Tiger Lake
total instructions in shared programs: 903108 -> 902830 (-0.03%)
instructions in affected programs: 654910 -> 654632 (-0.04%)
helped: 31
HURT: 5
helped stats (abs) min: 2 max: 31 x̄: 9.58 x̃: 7
helped stats (rel) min: 0.01% max: 0.23% x̄: 0.06% x̃: 0.04%
HURT stats (abs) min: 1 max: 10 x̄: 3.80 x̃: 3
HURT stats (rel) min: 0.01% max: 0.10% x̄: 0.03% x̃: 0.02%
95% mean confidence interval for instructions value: -10.55 -4.89
95% mean confidence interval for instructions %-change: -0.07% -0.03%
Instructions are helped.
total cycles in shared programs: 7059681 -> 7058006 (-0.02%)
cycles in affected programs: 5081309 -> 5079634 (-0.03%)
helped: 33
HURT: 12
helped stats (abs) min: 1 max: 444 x̄: 60.91 x̃: 18
helped stats (rel) min: <.01% max: 2.17% x̄: 0.25% x̃: 0.05%
HURT stats (abs) min: 1 max: 288 x̄: 27.92 x̃: 2
HURT stats (rel) min: <.01% max: 1.00% x̄: 0.23% x̃: 0.02%
95% mean confidence interval for cycles value: -68.32 -6.12
95% mean confidence interval for cycles %-change: -0.28% 0.03%
Inconclusive result (%-change mean confidence interval includes 0).
Ice Lake
total instructions in shared programs: 895384 -> 895159 (-0.03%)
instructions in affected programs: 658678 -> 658453 (-0.03%)
helped: 37
HURT: 0
helped stats (abs) min: 3 max: 16 x̄: 6.08 x̃: 4
helped stats (rel) min: <.01% max: 0.07% x̄: 0.04% x̃: 0.04%
95% mean confidence interval for instructions value: -7.46 -4.70
95% mean confidence interval for instructions %-change: -0.04% -0.03%
Instructions are helped.
total cycles in shared programs: 7092224 -> 7091195 (-0.01%)
cycles in affected programs: 5221666 -> 5220637 (-0.02%)
helped: 35
HURT: 11
helped stats (abs) min: 1 max: 247 x̄: 43.46 x̃: 12
helped stats (rel) min: <.01% max: 2.17% x̄: 0.23% x̃: 0.05%
HURT stats (abs) min: 2 max: 432 x̄: 44.73 x̃: 5
HURT stats (rel) min: <.01% max: 1.00% x̄: 0.25% x̃: 0.02%
95% mean confidence interval for cycles value: -49.00 4.26
95% mean confidence interval for cycles %-change: -0.27% 0.03%
Inconclusive result (value mean confidence interval includes 0).
Regular shader-db results:
All Haswell+ platforms had similar results. (Tiger Lake shown)
total instructions in shared programs: 17611408 -> 17611398 (<.01%)
instructions in affected programs: 1648 -> 1638 (-0.61%)
helped: 2
HURT: 0
total cycles in shared programs: 338366148 -> 338366124 (<.01%)
cycles in affected programs: 124048 -> 124024 (-0.02%)
helped: 2
HURT: 0
No changes on any earlier Intel platforms.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net >
Reviewed-by: Matt Turner <mattst88@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142 >
2020-03-18 20:36:29 +00:00
Ian Romanick
1d36af9338
nir/algebraic: Generalize some and-of-shift-right patterns [v2]
...
Generalizes some of the patterns from 76289fbfa8 and 905ff86198 . In
particular, some of the soft-fp64 code generates (a & 0x7fffffff) << 1
when constant 0.0 is compared (flt or feq).
v2: Reduce the set of added patterns to those that actually help
something. This reduces the size of the state transition tables by
about 29k. Suggested by Jason. Remove the existing patterns that this
commit subsumes.
Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:
Tiger Lake
total instructions in shared programs: 903171 -> 903108 (<.01%)
instructions in affected programs: 635903 -> 635840 (<.01%)
helped: 25
HURT: 11
helped stats (abs) min: 1 max: 16 x̄: 5.04 x̃: 3
helped stats (rel) min: <.01% max: 0.15% x̄: 0.04% x̃: 0.03%
HURT stats (abs) min: 2 max: 14 x̄: 5.73 x̃: 5
HURT stats (rel) min: <.01% max: 0.11% x̄: 0.04% x̃: 0.02%
95% mean confidence interval for instructions value: -3.91 0.41
95% mean confidence interval for instructions %-change: -0.03% <.01%
Inconclusive result (value mean confidence interval includes 0).
total cycles in shared programs: 7059527 -> 7059681 (<.01%)
cycles in affected programs: 5249401 -> 5249555 (<.01%)
helped: 41
HURT: 9
helped stats (abs) min: 2 max: 76 x̄: 11.90 x̃: 10
helped stats (rel) min: <.01% max: 11.86% x̄: 0.99% x̃: 0.01%
HURT stats (abs) min: 2 max: 380 x̄: 71.33 x̃: 12
HURT stats (rel) min: <.01% max: 0.22% x̄: 0.04% x̃: 0.01%
95% mean confidence interval for cycles value: -14.93 21.09
95% mean confidence interval for cycles %-change: -1.40% -0.20%
Inconclusive result (value mean confidence interval includes 0).
Ice Lake
total instructions in shared programs: 895506 -> 895384 (-0.01%)
instructions in affected programs: 658800 -> 658678 (-0.02%)
helped: 37
HURT: 0
helped stats (abs) min: 2 max: 8 x̄: 3.30 x̃: 2
helped stats (rel) min: <.01% max: 0.03% x̄: 0.02% x̃: 0.02%
95% mean confidence interval for instructions value: -4.00 -2.59
95% mean confidence interval for instructions %-change: -0.02% -0.02%
Instructions are helped.
total cycles in shared programs: 7092748 -> 7092224 (<.01%)
cycles in affected programs: 5272008 -> 5271484 (<.01%)
helped: 36
HURT: 14
helped stats (abs) min: 2 max: 440 x̄: 21.67 x̃: 8
helped stats (rel) min: <.01% max: 11.86% x̄: 1.12% x̃: 0.02%
HURT stats (abs) min: 2 max: 122 x̄: 18.29 x̃: 6
HURT stats (rel) min: <.01% max: 0.07% x̄: 0.01% x̃: <.01%
95% mean confidence interval for cycles value: -29.24 8.28
95% mean confidence interval for cycles %-change: -1.40% -0.21%
Inconclusive result (value mean confidence interval includes 0).
Regular shader-db results:
All Haswell+ platforms had similar results. (Tiger Lake shown)
total instructions in shared programs: 17611489 -> 17611408 (<.01%)
instructions in affected programs: 21188 -> 21107 (-0.38%)
helped: 23
HURT: 1
helped stats (abs) min: 1 max: 16 x̄: 3.78 x̃: 3
helped stats (rel) min: 0.03% max: 5.82% x̄: 1.13% x̃: 0.85%
HURT stats (abs) min: 6 max: 6 x̄: 6.00 x̃: 6
HURT stats (rel) min: 0.60% max: 0.60% x̄: 0.60% x̃: 0.60%
95% mean confidence interval for instructions value: -5.27 -1.48
95% mean confidence interval for instructions %-change: -1.70% -0.42%
Instructions are helped.
total cycles in shared programs: 338418502 -> 338366148 (-0.02%)
cycles in affected programs: 2289052 -> 2236698 (-2.29%)
helped: 18
HURT: 3
helped stats (abs) min: 4 max: 18000 x̄: 2909.67 x̃: 38
helped stats (rel) min: 0.09% max: 4.07% x̄: 0.96% x̃: 0.43%
HURT stats (abs) min: 2 max: 14 x̄: 6.67 x̃: 4
HURT stats (rel) min: 0.22% max: 1.13% x̄: 0.66% x̃: 0.64%
95% mean confidence interval for cycles value: -5204.00 217.91
95% mean confidence interval for cycles %-change: -1.31% -0.14%
Inconclusive result (value mean confidence interval includes 0).
Ivy Bridge
total instructions in shared programs: 11875617 -> 11875615 (<.01%)
instructions in affected programs: 1339 -> 1337 (-0.15%)
helped: 2
HURT: 0
No changes on any earlier Intel platforms.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net >
Reviewed-by: Matt Turner <mattst88@gmail.com > [v1]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142 >
2020-03-18 20:36:29 +00:00
Ian Romanick
d6d63aec18
nir/algebraic: optimize ior(ine(a, 0), ine(b, 0)) to ine(ior(a, b), 0)
...
Like 70f9e2589e . Also scrub the unnecessary size qualifier in both
replacement patterns.
This occurs in a handful of places in the soft-fp64 code, and that is
the primary reason for the change.
Perhaps the patterns that generate umin should be conditioned on
something, but I'm not sure what. lower_bitops might cover the cases
that matter, but it seems ugly.
Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:
Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 936505 -> 933388 (-0.33%)
instructions in affected programs: 925719 -> 922602 (-0.34%)
helped: 154
HURT: 1
helped stats (abs) min: 1 max: 211 x̄: 35.45 x̃: 16
helped stats (rel) min: 0.34% max: 9.30% x̄: 2.28% x̃: 0.96%
HURT stats (abs) min: 2342 max: 2342 x̄: 2342.00 x̃: 2342
HURT stats (rel) min: 2.28% max: 2.28% x̄: 2.28% x̃: 2.28%
95% mean confidence interval for instructions value: -51.21 10.99
95% mean confidence interval for instructions %-change: -2.61% -1.89%
Inconclusive result (value mean confidence interval includes 0).
total cycles in shared programs: 7323502 -> 7306184 (-0.24%)
cycles in affected programs: 7220376 -> 7203058 (-0.24%)
helped: 126
HURT: 1
helped stats (abs) min: 2 max: 946 x̄: 159.10 x̃: 95
helped stats (rel) min: 0.01% max: 9.62% x̄: 0.80% x̃: 0.37%
HURT stats (abs) min: 2728 max: 2728 x̄: 2728.00 x̃: 2728
HURT stats (rel) min: 0.37% max: 0.37% x̄: 0.37% x̃: 0.37%
95% mean confidence interval for cycles value: -192.07 -80.66
95% mean confidence interval for cycles %-change: -1.07% -0.51%
Cycles are helped.
total spills in shared programs: 635 -> 817 (28.66%)
spills in affected programs: 635 -> 817 (28.66%)
helped: 0
HURT: 3
total fills in shared programs: 2065 -> 2438 (18.06%)
fills in affected programs: 2019 -> 2392 (18.47%)
helped: 0
HURT: 2
Regular shader-db results:
All Haswell+ platforms had similar results. (Tiger Lake shown)
total instructions in shared programs: 17611506 -> 17611489 (<.01%)
instructions in affected programs: 33442 -> 33425 (-0.05%)
helped: 32
HURT: 6
helped stats (abs) min: 1 max: 6 x̄: 1.69 x̃: 1
helped stats (rel) min: 0.08% max: 1.90% x̄: 0.27% x̃: 0.11%
HURT stats (abs) min: 1 max: 15 x̄: 6.17 x̃: 5
HURT stats (rel) min: 0.09% max: 1.50% x̄: 0.65% x̃: 0.55%
95% mean confidence interval for instructions value: -1.70 0.80
95% mean confidence interval for instructions %-change: -0.30% 0.05%
Inconclusive result (value mean confidence interval includes 0).
total cycles in shared programs: 338419218 -> 338418502 (<.01%)
cycles in affected programs: 385795 -> 385079 (-0.19%)
helped: 42
HURT: 3
helped stats (abs) min: 2 max: 192 x̄: 24.57 x̃: 16
helped stats (rel) min: 0.04% max: 2.09% x̄: 0.33% x̃: 0.22%
HURT stats (abs) min: 64 max: 164 x̄: 105.33 x̃: 88
HURT stats (rel) min: 0.77% max: 1.58% x̄: 1.09% x̃: 0.93%
95% mean confidence interval for cycles value: -29.76 -2.06
95% mean confidence interval for cycles %-change: -0.40% -0.07%
Cycles are helped.
Ivy Bridge and Sandy Bridge had similar results. (Ivy Bridge shown)
total instructions in shared programs: 11875620 -> 11875617 (<.01%)
instructions in affected programs: 421 -> 418 (-0.71%)
helped: 2
HURT: 0
total cycles in shared programs: 178245336 -> 178245326 (<.01%)
cycles in affected programs: 3425 -> 3415 (-0.29%)
helped: 2
HURT: 0
No changes on Gen4 or Gen5.
Reviewed-by: Matt Turner <mattst88@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142 >
2020-03-18 20:36:29 +00:00
Ian Romanick
88eb8f190b
nir/algebraic: Simplify logic to detect sign of an integer
...
This occurs in a handful of places in the soft-fp64 code, and that is
the primary reason for the change.
v2: Fix a typo in a comment. Noticed by Matt. Copy the correct fp64
shader-db results to the commit message. I realized that I used
accidentally used the results from the next commit.
Results on the 308 shaders extracted from the fp64 portion of the OpenGL
CTS:
Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
total instructions in shared programs: 906235 -> 906149 (<.01%)
instructions in affected programs: 353966 -> 353880 (-0.02%)
helped: 31
HURT: 2
helped stats (abs) min: 1 max: 8 x̄: 3.03 x̃: 3
helped stats (rel) min: 0.01% max: 1.59% x̄: 0.10% x̃: 0.04%
HURT stats (abs) min: 3 max: 5 x̄: 4.00 x̃: 4
HURT stats (rel) min: 0.02% max: 0.02% x̄: 0.02% x̃: 0.02%
95% mean confidence interval for instructions value: -3.51 -1.70
95% mean confidence interval for instructions %-change: -0.19% <.01%
Inconclusive result (%-change mean confidence interval includes 0).
total cycles in shared programs: 7076552 -> 7076173 (<.01%)
cycles in affected programs: 2878361 -> 2877982 (-0.01%)
helped: 37
HURT: 2
helped stats (abs) min: 2 max: 48 x̄: 10.81 x̃: 6
helped stats (rel) min: <.01% max: 2.17% x̄: 0.47% x̃: 0.01%
HURT stats (abs) min: 1 max: 20 x̄: 10.50 x̃: 10
HURT stats (rel) min: <.01% max: 0.01% x̄: <.01% x̃: <.01%
95% mean confidence interval for cycles value: -13.96 -5.48
95% mean confidence interval for cycles %-change: -0.72% -0.16%
Cycles are helped.
total fills in shared programs: 2064 -> 2065 (0.05%)
fills in affected programs: 45 -> 46 (2.22%)
helped: 0
HURT: 1
Regular shader-db results:
All Gen7+ platforms had similar results. (Tiger Lake shown)
total instructions in shared programs: 17611530 -> 17611506 (<.01%)
instructions in affected programs: 5934 -> 5910 (-0.40%)
helped: 10
HURT: 0
helped stats (abs) min: 1 max: 5 x̄: 2.40 x̃: 2
helped stats (rel) min: 0.14% max: 1.24% x̄: 0.47% x̃: 0.34%
95% mean confidence interval for instructions value: -3.53 -1.27
95% mean confidence interval for instructions %-change: -0.78% -0.17%
Instructions are helped.
total cycles in shared programs: 338419178 -> 338419218 (<.01%)
cycles in affected programs: 19244 -> 19284 (0.21%)
helped: 4
HURT: 2
helped stats (abs) min: 2 max: 4 x̄: 3.00 x̃: 3
helped stats (rel) min: 0.05% max: 0.11% x̄: 0.08% x̃: 0.08%
HURT stats (abs) min: 26 max: 26 x̄: 26.00 x̃: 26
HURT stats (rel) min: 1.20% max: 1.20% x̄: 1.20% x̃: 1.20%
95% mean confidence interval for cycles value: -9.08 22.41
95% mean confidence interval for cycles %-change: -0.35% 1.04%
Inconclusive result (value mean confidence interval includes 0).
No changes on any earlier Intel platform.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net >
Reviewed-by: Matt Turner <mattst88@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4142 >
2020-03-18 20:36:29 +00:00
Caio Marcelo de Oliveira Filho
bf432cd831
nir: Add pass to combine adjacent scoped memory barriers
...
SPIR-V generates very granular barriers, however HW and backends might
not necessarily take advantage of those. This pass provides a general
mechanism to combine such barriers.
Reviewed-by: Eric Anholt <eric@anholt.net >
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3224 >
2020-03-12 19:21:36 +00:00
Caio Marcelo de Oliveira Filho
d31a8ed8fd
nir: Reorder nir_scopes so wider scope has larger numeric value
...
Makes code comparing and combining scopes slightly more readable.
Reviewed-by: Eric Anholt <eric@anholt.net >
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3224 >
2020-03-12 19:21:36 +00:00
Caio Marcelo de Oliveira Filho
67fc88fbb9
nir: Don't skip a bit in nir_memory_semantics
...
There was another enum entry in the draft versions of
nir_memory_semantics, but when it got dropped the entries were not
updated.
Reviewed-by: Eric Anholt <eric@anholt.net >
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3224 >
2020-03-12 19:21:36 +00:00
Juan A. Suarez Romero
90550b2a3e
nir/algebraic: coalesce fmod lowering
...
As fmod for 16/32/64 bits lowering does the same, let's merge all of
them in a single case.
Fixes dEQP-VK.glsl.builtin.precision_double.mod.compute.* on ACO.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com >
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4118 >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4118 >
2020-03-12 16:42:52 +00:00
Juan A. Suarez Romero
acd0dd3b4b
nir/lower_double_ops: relax lower mod()
...
Currently when lowering mod() we add an extra instruction so if
mod(a,b) == b then 0 is returned instead of b, as mathematically
mod(a,b) is in the interval [0, b).
But Vulkan spec has relaxed this restriction, and allows the result to
be in the interval [0, b].
For the OpenGL case, while the spec does not allow this behaviour, due
the allowed precision errors we can end up having the same result, so
from a practical point of view, this behaviour is allowed (see
https://github.com/KhronosGroup/VK-GL-CTS/issues/51 ).
This commit takes this in account to remove the extra instruction
required to return 0 instead.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4118 >
2020-03-12 16:42:52 +00:00
Timur Kristóf
ec16535b49
nir: Add ability to lower non-const quad broadcasts to const ones.
...
Some hardware doesn't support subgroup shuffle, and on such hardware
it makes no sense to lower quad broadcasts to shuffle. Instead, let's
lower them to four const quad broadcasts, paired with bcsel instructions.
Cc: mesa-stable@lists.freedesktop.org
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com >
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net >
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4147 >
2020-03-12 13:16:07 +00:00
Rob Clark
3535797e8c
nir/print: show variable precision
...
Signed-off-by: Rob Clark <robdclark@chromium.org >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4071 >
2020-03-10 16:01:39 +00:00
Daniel Schürmann
ce87da71e9
nir: add pass to lower discard() to demote()
...
This pass is intended to work around game bugs, only!
It also lowers nir_intrinsic_load_helper_invocation to
nir_intrinsic_is_helper_invocation for consistency.
Reviewed-by: Marek Olšák <marek.olsak@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4047 >
2020-03-09 12:29:32 +00:00
Daniel Schürmann
5adcfa68a9
nir: gather info whether a shader uses demote_to_helper
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4047 >
2020-03-09 12:29:32 +00:00
Samuel Pitoiset
913d2dcd23
nir/lower_input_attachments: remove bogus assert in try_lower_input_texop()
...
It can be a sampler too.
Fixes: 84b08971fb ("nir/lower_input_attachments: lower nir_texop_fragment_{mask}_fetch")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2558
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4043 >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4043 >
2020-03-06 09:13:40 +00:00
Jason Ekstrand
b20693be41
nir: Flush to zero with OOB low exponents in ldexp
...
Reviewed-by: Arcady Goldmints-Orlov <agoldmints@igalia.com >
Reviewed-by: Matt Turner <mattst88@gmail.com >
2020-03-04 11:39:50 -06:00
Icecream95
574b03eebf
nir: Allow nir_format conversions to work on 32-bit values
...
The constant has to changed to unsigned long long, as shifting a
32-bit value by 32 is undefined behaviour.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3943 >
2020-02-28 11:52:40 +00:00
Marek Olšák
6d7b076166
nir: fix 5 warnings
...
Reviewed-by: Eric Anholt <eric@anholt.net >
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3970 >
2020-02-27 22:53:12 -05:00
Marek Olšák
d18d07c9d7
nir: replace GCC unroll with an option that works on GCC < 8.0
...
Reviewed-by: Eric Anholt <eric@anholt.net >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3970 >
2020-02-27 22:53:12 -05:00
Albert Astals Cid
d988061172
cube_face_index: Use fabsf instead of fabs since we know it's floats
...
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com >
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3933 >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3933 >
2020-02-26 21:47:01 +00:00
Albert Astals Cid
6db7467b59
cube_face_coord: Use fabsf instead of fabs since we know it's floats
...
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3933 >
2020-02-26 21:47:01 +00:00
Jason Ekstrand
349898a967
nir: Drop nir_tex_instr::texture_array_size
...
It's set by lots of things and we spend a lot of time maintaining it but
no one actually uses the value for anything useful.
Reviewed-by: Dave Airlie <airlied@redhat.com >
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com >
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3940 >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3940 >
2020-02-26 18:29:49 +00:00
Juan A. Suarez Romero
784c454607
nir/lower_double_ops: add note for lowering mod
...
Add a note to clarify that while Vulkan allows mod(x,y) to be in [0, y]
range, OpenGL does not allow it, so the lowering ensures the result is
always in [0, y) range, as this lowering is shared by the Vulkan and
OpenGL implementation.
Reviewed-by: Elie Tournier <elie.tournier@collabora.com >
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3315 >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3315 >
2020-02-26 10:46:06 +00:00
Caio Marcelo de Oliveira Filho
956e4b2d37
nir, intel: Move use_scoped_memory_barrier to nir_options
...
This option will be used later by GLSL, so move to a common struct.
Because nir_options is filled in the compiler instead of the Vulkan
driver, fix that up. GLSL will ignore that for now.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net >
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3913 >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3913 >
2020-02-24 19:12:11 +00:00
Caio Marcelo de Oliveira Filho
6be766336a
nir/tests: Use nir_scoped_memory_barrier() helper
...
Most of the vars tests already had a local helper, so just drop it in
favor of the one in nir_builder. Remaining two tests changed to use
the helper.
The load_store_vectorizer tests were using the specific memory
barriers, but since scoped barriers are also handled, prefer that.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3913 >
2020-02-24 19:12:11 +00:00
Caio Marcelo de Oliveira Filho
6ff898a653
nir: Add the alias NIR_MEMORY_ACQ_REL
...
This will help upcoming C++ code that will have to combine those two
semantics. In C++ it is not possible to do this without a cast or
adding an operator| to the enum. Since having the short form will
also be convient to C, we picked the former solution.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3913 >
2020-02-24 19:12:11 +00:00
Caio Marcelo de Oliveira Filho
424737da3e
nir/builder: Add nir_scoped_memory_barrier()
...
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3913 >
2020-02-24 19:12:11 +00:00
Eric Anholt
3e16434acd
nir: Move intel's intrinsic_image_coordinate_components() to core nir.
...
This is a query that both Intel and freedreno need to do. We can simplify
it a lot with the new glsl_get_sampler_dim_coordinate_components()
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3728 >
2020-02-24 18:25:02 +00:00