4672 Commits

Author SHA1 Message Date
Ian Romanick c0bdf37c91 nir/algebraic: Change the default cursor location when replacing a unary op
If the expression tree that is being replaced has a unary operation at
its root, set the cursor (location where new instructions are inserted)
at the source instruction instead.

This doesn't do much now because there are very few patterns that have a
unary operation as the root.  Almost all of the patterns that do have a
unary operation as the root have inot.  All of the shaders that are
affected by this commit have expression trees with an inot at the root.

This change prevents some significant, spurious changes caused by the next
commit.  There is further explanation in the large comment added in
the code.

I also considered a couple other options that may still be worth exploring.

1. Add some mark-up to the search pattern to denote where new
   instructions should be added.  I considered using "@" to denote the
   cursor location.  For example,

    (('fneg', ('fadd@', a, b)), ...)

2. To prevent other kinds of unintended code motion, add the ability to
   name expressions in the search pattern so that they can be reused in
   the replacement.  For example,

   (('bcsel', ('ige', ('find_lsb=b', a), 0), ('find_lsb', a), -1), b),

   An alternative would be to add some kind of CSE at the time of
   inserting the replacements.  Create a new instruction, then check to
   see if it already exists.  That option might be better overall.
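
The CSE-at-insertion alternative mentioned in option 2 could be sketched
roughly as below. This is a toy Python model, not NIR code: `Builder`, `emit`,
and the tuple used as an instruction key are hypothetical names chosen for
illustration.

```python
class Builder:
    """Toy instruction builder that reuses an existing identical instruction."""

    def __init__(self):
        self.instructions = []   # emitted instructions, in program order
        self._seen = {}          # (opcode, sources) -> index of that instruction

    def emit(self, opcode, *sources):
        key = (opcode, sources)
        # Check whether an identical instruction already exists before adding one.
        if key in self._seen:
            return self._seen[key]
        self.instructions.append(key)
        index = len(self.instructions) - 1
        self._seen[key] = index
        return index


b = Builder()
first = b.emit("find_lsb", "a")
second = b.emit("find_lsb", "a")   # reuses the earlier find_lsb instead of duplicating it
```

A real implementation would presumably also have to check that the existing
instruction dominates the insertion point; the sketch ignores control flow.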

Over the years I know Matt has heard me complain, "I added a pattern
that just deleted an instruction, but it added a bunch of spills!"  This
was always in large, complex shaders that are very hard to analyze.  I
always blamed these cases on the scheduler being dumb.  I am now very
suspicious that unintended code motion was the real problem.

All Gen4+ Intel platforms had similar results. (Tiger Lake shown)
total instructions in shared programs: 17611405 -> 17611333 (<.01%)
instructions in affected programs: 18613 -> 18541 (-0.39%)
helped: 41
HURT: 13
helped stats (abs) min: 1 max: 18 x̄: 4.46 x̃: 4
helped stats (rel) min: 0.27% max: 5.68% x̄: 1.29% x̃: 1.34%
HURT stats (abs)   min: 1 max: 20 x̄: 8.54 x̃: 7
HURT stats (rel)   min: 0.30% max: 4.20% x̄: 2.15% x̃: 2.38%
95% mean confidence interval for instructions value: -3.29 0.63
95% mean confidence interval for instructions %-change: -0.95% 0.02%
Inconclusive result (value mean confidence interval includes 0).

total cycles in shared programs: 338366118 -> 338365223 (<.01%)
cycles in affected programs: 257889 -> 256994 (-0.35%)
helped: 42
HURT: 15
helped stats (abs) min: 2 max: 120 x̄: 39.38 x̃: 34
helped stats (rel) min: 0.04% max: 2.55% x̄: 0.86% x̃: 0.76%
HURT stats (abs)   min: 6 max: 204 x̄: 50.60 x̃: 34
HURT stats (rel)   min: 0.11% max: 4.75% x̄: 1.12% x̃: 0.56%
95% mean confidence interval for cycles value: -30.39 -1.02
95% mean confidence interval for cycles %-change: -0.66% -0.02%
Cycles are helped.

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1359>
2020-04-01 00:28:38 +00:00
Timothy Arceri 0f4a81430e nir: fix crash in varying packing on interface mismatch
For example when the outputs are scalars but the inputs are struct
members.

Fixes: 26aa460940 ("nir: rewrite varying component packing")

Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4351>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4351>
2020-03-31 23:43:31 +00:00
Jason Ekstrand 7a53e67816 spirv: Implement OpCopyObject and OpCopyLogical as blind copies
Because the types etc. are required to logically match, we can just
copy-propagate the guts of the vtn_value.  This was causing issues with
some new CTS tests that are doing an OpCopyObject of a sampler which is
a special-cased type in spirv_to_nir.  Of course, this is only a partial
solution.  Ideally, we've got a bit of work to do to make all the
composite stuff able to handle all types including images, samplers, and
combined image/samplers but this gets some CTS tests passing.

Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Acked-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4375>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4375>
2020-03-31 17:55:30 +00:00
Jason Ekstrand 9468f0729b nir: Handle vec8/16 in nir_shrink_array_vars
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4365>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4365>
2020-03-31 00:18:05 +00:00
Jason Ekstrand c26bf848ba nir: Handle vec8/16 in opt_undef_vecN
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4365>
2020-03-31 00:18:05 +00:00
Jason Ekstrand 99540edfde nir: Treat vec8/16 as select in opt_peephole_select
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4365>
2020-03-31 00:18:05 +00:00
Jason Ekstrand e3554a293b nir: Handle vec8/16 in opt_split_alu_of_phi
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4365>
2020-03-31 00:18:05 +00:00
Jason Ekstrand 2aab7999e4 nir: Handle vec8/16 in lower_regs_to_ssa
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4365>
2020-03-31 00:18:05 +00:00
Jason Ekstrand 1033255952 nir: Handle vec8/16 in lower_phis_to_scalar
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4365>
2020-03-31 00:18:05 +00:00
Jason Ekstrand ac7a940eba nir: Handle vec8/16 in gather_ssa_types
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4365>
2020-03-31 00:18:05 +00:00
Jason Ekstrand a18c4ee7b0 nir: Handle vec8/16 in bool_to_bitsize
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4365>
2020-03-31 00:18:05 +00:00
Jason Ekstrand f5bbdf7621 nir: Copy propagate through vec8s and vec16s
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4365>
2020-03-31 00:18:05 +00:00
Jason Ekstrand 842338e2f0 nir: Add a nir_op_is_vec helper
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4365>
2020-03-31 00:18:05 +00:00
Jason Ekstrand 84ab61160a nir/algebraic: Add downcast-of-pack opts
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4365>
2020-03-31 00:18:05 +00:00
Jason Ekstrand 14a49f31d3 nir/lower_int64: Lower 8 and 16-bit downcasts with nir_lower_mov64
We have the code to do the lowering, we were just missing the
boilerplate bits to make should_lower_int64_alu_instr return true.

Fixes: 62d55f1281 "nir: Wire up int64 lowering functions"
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4365>
2020-03-31 00:18:05 +00:00
Jason Ekstrand b113170559 nir/opt_loop_unroll: Fix has_nested_loop handling
In 87839680c0, a very subtle mistake was made with the CFG walking
recursion.  Instead of setting the local has_nested_loop variable when
processing child loops, has_nested_loop_out was passed directly into the
process_loop_in_block call.  This broke nested loop detection heuristics
and caused loop unrolling to run massively out of control.  In
particular, it makes the following CTS test compile virtually forever:

dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.struct_mixed_types.uniform_buffer_block_geom
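
The distinction between the local flag and the out-parameter can be modelled
with a simplified Python sketch. It is not the NIR code: `Loop`,
`process_loop`, and the one-element lists standing in for C out-pointers are
assumptions made for illustration.

```python
class Loop:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def process_loop(loop, has_nested_loop_out, report):
    # Local flag: set by child loops, so it describes *this* loop only.
    has_nested_loop = [False]
    for child in loop.children:
        # Passing has_nested_loop_out here instead would leave the local
        # flag False, and this loop would look like it has no nested loop.
        process_loop(child, has_nested_loop, report)
    report[loop.name] = has_nested_loop[0]
    # Tell the parent that it contains a loop (this one).
    has_nested_loop_out[0] = True


report = {}
process_loop(Loop("outer", [Loop("inner")]), [False], report)
```

With the local flag in place, `report` records that `outer` has a nested loop
and `inner` does not.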

Fixes: 87839680c0 "nir: Fix breakage of foreach_list_typed_safe..."
Closes: #2710
Reviewed-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4380>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4380>
2020-03-30 22:20:47 +00:00
Rhys Perry d101ca3f5a glsl: fix race in instance getters
Insertions can modify entry->data. Seems to fix random Fossilize crashes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Eric Anholt <eric@anholt.net>
CC: <mesa-stable@lists.freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4335>
2020-03-30 20:17:43 +00:00
Jason Ekstrand f5b14d983e nir: Set UBO alignments in lower_uniforms_to_ubo
Fixes: fb64954d9d "nir: Validate that memory load/store ops work on..."
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4378>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4378>
2020-03-30 19:18:17 +00:00
Jason Ekstrand fb64954d9d nir: Validate that memory load/store ops work on whole bytes
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338>
2020-03-30 15:46:19 +00:00
Jason Ekstrand c217ee8d35 nir: Insert b2b1s around booleans in nir_lower_io
By inserting a b2b1 around the load_ubo, load_input, etc. intrinsics
generated by nir_lower_io, we can ensure that the intrinsic has the
correct destination bit size.  Not having the right size can mess up
passes which try to optimize access.  In particular, it was causing
brw_nir_analyze_ubo_ranges to ignore load_ubo of booleans which meant
that boolean uniforms weren't getting pushed as push constants.  I
don't think this is an actual functional bug anywhere hence no CC to
stable but it may improve perf somewhere.

Shader-db results on ICL with iris:

    total instructions in shared programs: 16076707 -> 16075246 (<.01%)
    instructions in affected programs: 129034 -> 127573 (-1.13%)
    helped: 487
    HURT: 0
    helped stats (abs) min: 3 max: 3 x̄: 3.00 x̃: 3
    helped stats (rel) min: 0.45% max: 3.00% x̄: 1.33% x̃: 1.36%
    95% mean confidence interval for instructions value: -3.00 -3.00
    95% mean confidence interval for instructions %-change: -1.37% -1.29%
    Instructions are helped.

    total cycles in shared programs: 338015639 -> 337983311 (<.01%)
    cycles in affected programs: 971986 -> 939658 (-3.33%)
    helped: 362
    HURT: 110
    helped stats (abs) min: 1 max: 1664 x̄: 97.37 x̃: 43
    helped stats (rel) min: 0.03% max: 36.22% x̄: 5.58% x̃: 2.60%
    HURT stats (abs)   min: 1 max: 554 x̄: 26.55 x̃: 18
    HURT stats (rel)   min: 0.03% max: 10.99% x̄: 1.04% x̃: 0.96%
    95% mean confidence interval for cycles value: -79.97 -57.01
    95% mean confidence interval for cycles %-change: -4.60% -3.47%
    Cycles are helped.

    total sends in shared programs: 815037 -> 814550 (-0.06%)
    sends in affected programs: 5701 -> 5214 (-8.54%)
    helped: 487
    HURT: 0

    LOST:   2
    GAINED: 0

The two lost programs were SIMD16 shaders in CS:GO.  However, CS:GO was
also one of the most helped programs where it shaves sends off of 134
programs.  This seems to reduce GPU core clocks by about 4% on the first
1000 frames of the PTS benchmark.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338>
2020-03-30 15:46:19 +00:00
Jason Ekstrand d2dfcee7f7 nir: Use b2b opcodes for shared and constant memory
No shader-db changes on ICL with iris

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338>
2020-03-30 15:46:19 +00:00
Jason Ekstrand b2db84153a nir: Add b2b opcodes
These exist to convert between different types of boolean values.  In
particular, we want to use these for uniform and shared memory
operations where we need to convert to a reasonably sized boolean but we
don't care what its format is so we don't want to make the back-end
insert an actual i2b/b2i.  In the case of uniforms, Mesa can tweak the
format of the uniform boolean to whatever the driver wants.  In the case
of shared, every value in a shared variable comes from the shader so
it's already in the right boolean format.

The new boolean conversion opcodes get replaced with mov in
lower_bool_to_int/float32 so the back-end will hopefully never see them.
However, while we're in the middle of optimizing our NIR, they let us
have sensible load_uniform/ubo intrinsics and also have the bit size
conversion.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4338>
2020-03-30 15:46:19 +00:00
Samuel Pitoiset 3935a729d9 nir/algebraic: add fexp2(fmul(flog2(a), 0.5)) -> fsqrt(a) optimization
Helps some Wolfenstein II and Wolfenstein Youngblood shaders.

pipeline-db (VEGA10/ACO):
Totals from affected shaders:
SGPRS: 17904 -> 17904 (0.00 %)
VGPRS: 14492 -> 14492 (0.00 %)
Spilled SGPRs: 20 -> 20 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 1753152 -> 1749708 (-0.20 %) bytes
Max Waves: 2581 -> 2581 (0.00 %)

pipeline-db (VEGA10/LLVM):
Totals from affected shaders:
SGPRS: 26656 -> 26656 (0.00 %)
VGPRS: 23780 -> 23780 (0.00 %)
Spilled SGPRs: 2112 -> 2112 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Code Size: 2552712 -> 2549236 (-0.14 %) bytes
Max Waves: 3359 -> 3359 (0.00 %)
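
The identity behind this pattern, 2^(0.5 * log2(a)) = sqrt(a) for a > 0, can be
checked numerically with a small Python sketch. The function names `before`
and `after` are illustrative only and do not appear in the patch.

```python
import math


def before(a):
    # fexp2(fmul(flog2(a), 0.5))
    return 2.0 ** (math.log2(a) * 0.5)


def after(a):
    # fsqrt(a)
    return math.sqrt(a)


# The two forms agree up to floating-point rounding for positive inputs.
for a in (0.25, 1.0, 2.0, 9.0, 1e6):
    assert math.isclose(before(a), after(a), rel_tol=1e-12)
```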

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4353>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4353>
2020-03-30 14:07:43 +00:00
Timur Kristóf f1dd81ae10 nir: Collect if shader uses cross-invocation or indirect I/O.
The following new fields are added to tess shader info:

* `tcs_cross_invocation_inputs_read`
* `tcs_cross_invocation_outputs_read`

These are I/O masks that are a subset of inputs_read and outputs_read
and they contain which per-vertex inputs and outputs are read
cross-invocation.

Additionally, the following new fields are added to shader_info:

* `inputs_read_indirectly`
* `outputs_accessed_indirectly`
* `patch_inputs_read_indirectly`
* `patch_outputs_accessed_indirectly`

These new fields can be used for optimizing TCS in a back-end compiler.
If you can be sure that the TCS doesn't use cross-invocation inputs
or outputs, you can choose a different strategy for storing VS and TCS
outputs. However, such optimizations might need to be disabled when
the inputs/outputs are accessed indirectly due to backend limitations,
so this information is also collected.

Example: RADV currently has to store all VS and TCS outputs in LDS, but
for shaders where only inputs and/or outputs belonging to the current
invocation ID are used, it could skip storing these in LDS entirely.
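
How such masks relate to each other can be sketched in Python. This is an
illustrative model only: the `Read` tuple and the `gather` function are
hypothetical and do not mirror the actual gathering code; only the mask names
come from the lists above.

```python
from collections import namedtuple

# location: varying slot index
# own_vertex: True if the per-vertex index is the current invocation ID
# indirect: True if the access uses a non-constant index
Read = namedtuple("Read", ["location", "own_vertex", "indirect"])


def gather(reads):
    inputs_read = 0
    tcs_cross_invocation_inputs_read = 0
    inputs_read_indirectly = 0
    for r in reads:
        bit = 1 << r.location
        inputs_read |= bit
        if not r.own_vertex:
            # Read from another invocation's vertex: a subset of inputs_read.
            tcs_cross_invocation_inputs_read |= bit
        if r.indirect:
            inputs_read_indirectly |= bit
    return inputs_read, tcs_cross_invocation_inputs_read, inputs_read_indirectly


masks = gather([
    Read(location=0, own_vertex=True, indirect=False),
    Read(location=1, own_vertex=False, indirect=False),
    Read(location=2, own_vertex=True, indirect=True),
])
```

A back-end could then test whether the cross-invocation mask is zero before
picking a cheaper storage strategy.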

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4165>
2020-03-30 13:09:08 +00:00
Danylo Piliaiev 87839680c0 nir: Fix breakage of foreach_list_typed_safe assumptions in loop unrolling
foreach_list_typed_safe works on the assumption that even if the current
node becomes invalid, the next one will still be valid.

However, process_loops broke this assumption: when an immediate child is
unrolled during iteration, not only the current node could be removed
but also the one after it.

This doesn't cause issues now but it will cause issues when undefined
behaviour in foreach* macros is fixed.
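
The broken assumption can be illustrated with a toy Python linked list. This
is not the Mesa macro: `foreach_safe`, `Node`, and the `removed` flag are
hypothetical stand-ins for a "safe" iteration that caches the next pointer.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.next = None
        self.removed = False


def make_list(names):
    nodes = [Node(n) for n in names]
    for a, b in zip(nodes, nodes[1:]):
        a.next = b
    return nodes[0]


def foreach_safe(head, body):
    """Visit nodes, caching `next` first so the body may remove the current node."""
    visited = []
    node = head
    while node is not None:
        nxt = node.next                      # cached before the body runs
        visited.append((node.name, node.removed))
        body(node)
        node = nxt                           # stale if the body also removed `nxt`
    return visited


def remove_self_and_next(node):
    # Models unrolling a child loop, which can also remove the node after it.
    if node.name == "loop":
        node.removed = True
        if node.next is not None:
            node.next.removed = True


visited = foreach_safe(make_list(["a", "loop", "b", "c"]), remove_self_and_next)
```

Here the iteration still visits `b` even though it was already removed, which
is the situation the safe iteration cannot protect against.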

Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4189>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4189>
2020-03-30 14:41:30 +03:00
Eric Engestrom 79af30768d meson: inline inc_common
Let's make it clear what includes are being added everywhere, so that
they can be cleaned up.

Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4360>
2020-03-28 21:36:54 +01:00
Marek Olšák e5339fe4a4 Move compiler.h and imports.h/c from src/mesa/main into src/util
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4324>
2020-03-27 21:00:09 +00:00
Timothy Arceri b5e00f5c2b nir: fix packing of TCS varyings not read by the TES
Unlike other stages TCS outputs not read by the TES cannot always
be demoted to globals e.g. when they are read by other TCS
invocations.

We were not taking these outputs into account when packing which
could result in other outputs being assigned to the same location.

Here we make sure to gather information on these outputs and group
them together when packing.

This fixes rendering issues in QUBE 2 via Proton.

Closes: #2653
Fixes: 26aa460940 ("nir: rewrite varying component packing")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4328>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4328>
2020-03-27 07:26:39 +00:00
Timothy Arceri 8b9ebbcb54 glsl: fix varying packing for 64bit integers
Without this we can incorrectly end up marking things as making
use of ARB_enhanced_layouts style packing.

Cc: 19.3 20.0 <mesa-stable@lists.freedesktop.org>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4328>
2020-03-27 07:26:39 +00:00
Tapani Pälli 0847fe6e7f glsl: set error_emitted true if type not ok for assignment
The patch also changes the existing assert so that it does not trigger
when we have error types in an assignment.

v2: simplify, cleanup (Ian)

Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2629
Fixes: d1fa69ed61 ("glsl: do not attempt assignment if operand type not parsed correctly")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4178>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4178>
2020-03-26 12:41:12 +00:00
Boris Brezillon efdce97e4b vtn/opencl: add rint-support
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4318>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4318>
2020-03-26 10:14:22 +00:00
Erik Faye-Lund 6d69ed88f8 vtn/opencl: add native exp2/log2-support
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4318>
2020-03-26 10:14:22 +00:00
Erik Faye-Lund 7b2bfb6bc4 vtn/opencl: add native exp10/log10-support
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4318>
2020-03-26 10:14:22 +00:00
Erik Faye-Lund 25cb87bcdd vtn/opencl: add native exp/log-support
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4318>
2020-03-26 10:14:22 +00:00
Erik Faye-Lund c98e745e78 compiler/nir: move build_log helper into builtin-builder
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4318>
2020-03-26 10:14:22 +00:00
Erik Faye-Lund f59ae68838 compiler/nir: move build_exp helper into builtin-builder
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4318>
2020-03-26 10:14:22 +00:00
Erik Faye-Lund 4821ec6d8f vtn/opencl: fully enable OpenCLstd_Clz
Fixes: 7325f6ac98 ("vtn/opencl: add clz support")
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4318>
2020-03-26 10:14:22 +00:00
Marek Olšák 85a723975b nir: add and gather shader_info::writes_memory
for out-of-order drawing.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4152>
2020-03-26 03:08:34 -04:00
Pierre-Eric Pelloux-Prayer 84da4ded4b nir: update uses_demote flag in discard_to_demote pass
Otherwise the ctx.ac.postponed_kill will not be allocated.

Fixes: ce87da71e9 ("nir: add pass to lower discard() to demote()")
Closes: https://gitlab.freedesktop.org/mesa/mesa/issues/2662
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4301>
2020-03-25 08:19:33 +01:00
Neil Roberts fc8432e6d6 glsl/lower_precision: Lower builtins depending on arguments
When an ir_call is encountered that invokes a builtin, it will now try
to generate a lowered version of the builtin. This only happens if all
of the arguments to the function are lowerable. Previously the builtin
would be inlined before the lowering pass is invoked and then the
implementation would be lowered as a consequence of the pass. However
this causes problems if the builtin has multiple arguments and the
implementation has operations on only a few of the arguments before
combining it with the others. In that case the entire builtin should
only be lowered if all of the arguments are lower precision. The
previous approach would end up lowering only parts of the
implementation.

The lowered implementations are cached in a hash table in case they can
be reused.
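
The all-arguments rule and the cache can be sketched in Python. Everything
here is hypothetical (`handle_call`, `lower_builtin`, the precision strings);
it only models the decision described above, not the GLSL IR pass itself.

```python
lowered_cache = {}   # builtin name -> lowered implementation (hypothetical)


def lower_builtin(name):
    # Stand-in for generating a lowered copy of the builtin's body.
    return name + "_lowered"


def handle_call(name, arg_precisions):
    """Return the callee to use for a call to the builtin `name`."""
    # Lower only when every argument is lower precision; otherwise keep
    # the full-precision builtin untouched.
    if not all(p in ("mediump", "lowp") for p in arg_precisions):
        return name
    if name not in lowered_cache:
        lowered_cache[name] = lower_builtin(name)
    return lowered_cache[name]


mixed = handle_call("mix", ["mediump", "mediump", "highp"])
lowered = handle_call("mix", ["mediump", "mediump", "mediump"])
again = handle_call("mix", ["lowp", "mediump", "mediump"])   # served from the cache
```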

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3885>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3885>
2020-03-24 23:21:21 +00:00
Neil Roberts e7434c0a06 glsl: Inline builtins in a separate pass
Previously, the ir_call functions for builtin functions were replaced
with the inline implementation immediately after being added to the
instruction list. This patch replaces that with a separate pass that
lowers them after the conversion from AST to IR is complete. This will
be useful to be able to insert some handling for the precision lowering
pass before the inlining. This needs to happen because the precision
of the operations in the inlined implementation depends on the highest
precision of all of the arguments to the call.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3885>
2020-03-24 23:21:21 +00:00
Iago Toral Quiroga 467c9a0faa nir: add a bool bitsize lowering pass
The pass lowers 1-bit booleans produced by NIR to the native bitsize
of the operations that produce them.

v2: change on lower_load_const_instr after upstream changes. Added
    TODO2 to explain it, as it was not properly tested yet (see
    already existing TODO) (Neil)

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3885>
2020-03-24 23:21:21 +00:00
Neil Roberts cc09745714 glsl: Add unit tests for the lower_precision pass
Adds a unit test script that invokes the standalone compiler with
--lower-precision and verifies that lowered operations are being used.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3885>
2020-03-24 23:21:21 +00:00
Neil Roberts 32cd3bd850 glsl/standalone: Add an option to lower the precision
Adds a --lower-precision option that just sets the LowerPrecision
compiler option. That way it can be used in unit tests to test the
precision lowering pass.

Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3885>
2020-03-24 23:21:21 +00:00
Neil Roberts b83f4b9fa2 glsl: Add an IR lowering pass to convert mediump operations to 16-bit
This works by finding the first rvalue that it can lower using an
ir_rvalue_visitor. In that case it adds a conversion to float16
after each rvalue and a conversion back to float before storing
the assignment.

Also it uses a set to keep track of rvalues that have been
lowered already. The handle_rvalue method of the rvalue visitor doesn’t
provide any way to stop iteration. If we handle a value in
find_precision_visitor we want to be able to stop it from descending into
the lowered rvalue again.

Additionally this pass disallows converting nodes containing non-float.
The can_lower_rvalue function explicitly excludes any branches
that have non-float types except bools. This avoids the need to have
special handling for functions that convert to int or double.

Co-authored-by: Hyunjun Ko <zzoon@igalia.com>

v2. Adds lowering for texture samples

v3. Instead of checking whether each node can be lowered while walking the
tree, a separate tree walk is now done to check all of the nodes in a
single pass. The lowerable nodes are added to a set which is checked
during find_precision_visitor instead of calling can_lower_rvalue.

v4. Move the special case for temporaries to find_lowerable_rvalues. This
needs to be handled while checking for lowerable rvalues so that any
later dereferences of the variable will see the right precision.

v5. Add an override to visit ir_call instructions and apply the same
technique to override the precision of the temporary variable in the
same way as done for builtin temporaries and ir_assignment calls.

v6. Changes the pass so that it doesn’t need to lower an entire subtree in
order to perform a lowering. Instead, certain instructions can be
marked as being independent of their child instructions. For example,
this is the case with array dereferences. The precision of the array
index doesn’t have any bearing on whether things using the result of
the array deref can be lowered.

Now, only toplevel lowerable nodes are added to the lowerable_rvalues
instead of additionally adding all of the subnodes.

It now also only needs one hash table instead of two.

v7. Don’t try to lower sampler types. Instead, the sample instruction is
now treated as an independent point where the result of the sample can
be used in a lowered section. The precision of the sampler type
determines the precision of the sample instruction. This also means
the coordinates to the sampler can be lowered.

v8. Use f2fmp instead of f2f16.

v9.  Disable lowering derivative calculations, which might not work
properly on some hw backends.

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3885>
2020-03-24 23:21:21 +00:00
Neil Roberts c525785edc glsl/hierarchical_visitor: Call leave_callback on leaf nodes
Previously for leaf ir_instructions only the enter callback was
called. This makes it a bit difficult to make a pass that wants to
visit every instruction using a stack. Making it call the leave
callback as well makes it behave less surprisingly.
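
Why a paired leave callback helps a stack-based pass can be shown with a small
Python sketch. The `Node` class and the `visit`, `enter`, and `leave` functions
are hypothetical; they only model the enter/leave pairing, not the real
hierarchical visitor.

```python
class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


def visit(node, enter, leave):
    enter(node)
    for child in node.children:
        visit(child, enter, leave)
    # Called for leaf nodes too, so every enter has a matching leave.
    leave(node)


stack = []    # names of the nodes currently being visited
depths = []   # stack depth observed at each enter


def enter(node):
    stack.append(node.name)
    depths.append(len(stack))


def leave(node):
    # Pops cleanly only because leaves also get a leave callback.
    assert stack.pop() == node.name


tree = Node("assign", [Node("deref"), Node("add", [Node("x"), Node("y")])])
visit(tree, enter, leave)
```

If leaf nodes received only the enter callback, the pass would have to
special-case them to keep the stack balanced.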

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3885>
2020-03-24 23:21:21 +00:00
Neil Roberts 0e1680a1e2 glsl: Add a method to get precision from a deref instruction
Adds ir_dereference::precision(). For a normal variable dereference,
the precision comes from the variable. For a record member it comes
from the field within the record. For an array it can come from
either, depending on where the underlying array is stored. The method
recursively walks the derefs until it finds one of the first two.
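
The recursive walk might look like the following Python sketch. The class
names and fields are simplified stand-ins invented for illustration and do not
match the real ir_dereference hierarchy.

```python
class Variable:
    def __init__(self, precision):
        self.precision = precision


class DerefVariable:
    def __init__(self, var):
        self.var = var


class DerefRecord:
    def __init__(self, record, field_precision):
        self.record = record                    # deref of the containing record
        self.field_precision = field_precision  # precision declared on the field


class DerefArray:
    def __init__(self, array):
        self.array = array                      # deref of the underlying array


def precision(deref):
    if isinstance(deref, DerefVariable):
        return deref.var.precision              # comes from the variable
    if isinstance(deref, DerefRecord):
        return deref.field_precision            # comes from the record field
    if isinstance(deref, DerefArray):
        return precision(deref.array)           # keep walking until one of the above
    raise TypeError("unknown deref type")


from_var = precision(DerefArray(DerefArray(DerefVariable(Variable("mediump")))))
from_field = precision(DerefArray(DerefRecord(DerefVariable(Variable("highp")), "lowp")))
```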

Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3885>
2020-03-24 23:21:21 +00:00
Rhys Perry 9f4ba2d2b4 nir/gather_info: fix per-vertex handling in try_mask_partial_io
pipeline-db (Navi, ACO):
Totals from affected shaders:
SGPRS: 6432 -> 6432 (0.00 %)
VGPRS: 11924 -> 11924 (0.00 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Scratch size: 1596 -> 1596 (0.00 %) dwords per thread
Code Size: 575524 -> 518620 (-9.89 %) bytes
LDS: 12187 -> 12187 (0.00 %) blocks
Max Waves: 2695 -> 2695 (0.00 %)

Helps a few hundred Dark Souls 3 shaders.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
CC: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4190>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4190>
2020-03-24 11:09:15 +00:00
Eric Anholt 050ec8ff53 glsl: Restore the IsES flag on the shader when reading from cache.
I found that when trying to MESA_SHADER_CAPTURE_PATH a trace, I was
getting "GLSL >= 3.00" for the ES shaders I was trying to capture.
Keeping this metadata in the cached shader program lets us capture
correctly.

Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4219>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4219>
2020-03-22 20:49:37 -07:00
Rhys Perry 5193688e1a nir/gather_info: handle emit_vertex_with_counter
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-By: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
CC: <mesa-stable@lists.freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4193>
2020-03-19 15:37:07 +00:00