Alyssa Rosenzweig
3d35ea6a6b
mesa_clc: add depfile support
...
This allows the tool to tell ninja what headers it read, so ninja can
correctly rebuild when necessary.
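As a rough illustration of what such a depfile contains: Ninja reads depfiles in Makefile rule syntax, one output followed by the inputs it depends on. The sketch below is not the mesa_clc implementation; the helper names and file names are hypothetical.

```python
def escape(path):
    # Makefile syntax needs spaces in paths escaped with a backslash.
    return path.replace(" ", "\\ ")


def format_depfile(output, headers):
    """Return one Makefile-style rule: 'output: header1 header2 ...'."""
    return escape(output) + ": " + " ".join(escape(h) for h in headers) + "\n"


# Hypothetical output and header names, for illustration only.
print(format_depfile("kernel.spv", ["common.h", "helpers.h"]), end="")
```

A build rule that declares this file as its depfile lets Ninja rerun the tool whenever one of the listed headers changes.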
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Reviewed-by: Dylan Baker <dylan.c.baker@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32505 >
2024-12-06 13:48:26 -05:00
Dylan Baker
33a1acb0da
clc: Tell clang to track imported dependencies
...
Clang is capable of tracking what headers it imports, as long as we set
it up to do that. While that isn't important for rusticl, it would be
useful for the various `_clc` tools, as they can then tell Ninja which
headers they read to make rebuilds more reliable.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Reviewed-by: Dylan Baker <dylan.c.baker@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32505 >
2024-12-06 13:48:26 -05:00
Karmjit Mahil
047049dcb5
nir: Fix the spelling of compare
...
Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com >
Reviewed-by: Connor Abbott <cwabbott0@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32189 >
2024-12-06 08:42:36 +00:00
Karmjit Mahil
b79994e92d
nir,ir3: Add icsel_eqz
...
In IR3, `sel.b32` selects based on a comparison of its condition against 0, so
add `icsel_eqz` to fuse the cmp and sel that we'd otherwise need.
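A minimal sketch of the fusion, assuming `icsel_eqz(a, x, y)` means "x if a == 0, else y" (the exact operand order is an assumption here, not taken from the commit):

```python
def bcsel(cond, x, y):
    # Plain boolean select.
    return x if cond else y


def icsel_eqz(a, x, y):
    # Assumed semantics: select x when the integer a equals zero.
    return x if a == 0 else y


# The fused opcode should match a separate compare followed by a select.
for a in (0, 1, 5, 0xFFFFFFFF):
    assert icsel_eqz(a, 10, 20) == bcsel(a == 0, 10, 20)
```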
total Instruction Count in shared programs: 1112814 -> 1110473 (-0.21%)
Instruction Count in affected programs: 162701 -> 160360 (-1.44%)
helped: 81
HURT: 29
Instruction count are helped.
total MOV Count in shared programs: 86777 -> 88671 (2.18%)
MOV Count in affected programs: 28119 -> 30013 (6.74%)
helped: 1
HURT: 292
Mov count are HURT.
total COV Count in shared programs: 15070 -> 14962 (-0.72%)
COV Count in affected programs: 5770 -> 5662 (-1.87%)
helped: 76
HURT: 2
Cov count are helped.
total Last helper instruction in shared programs: 592729 -> 590638 (-0.35%)
Last helper instruction in affected programs: 91331 -> 89240 (-2.29%)
helped: 30
HURT: 1
Last helper instruction are helped.
total Instructions with SS sync bit in shared programs: 29336 -> 29546 (0.72%)
Instructions with SS sync bit in affected programs: 4702 -> 4912 (4.47%)
helped: 8
HURT: 43
Instructions with ss sync bit are HURT.
total Estimated cycles stalled on SS in shared programs: 111590 -> 112401 (0.73%)
Estimated cycles stalled on SS in affected programs: 27708 -> 28519 (2.93%)
helped: 21
HURT: 61
Estimated cycles stalled on ss are HURT.
total cat1 instructions in shared programs: 101933 -> 103695 (1.73%)
cat1 instructions in affected programs: 35804 -> 37566 (4.92%)
helped: 18
HURT: 290
Cat1 instructions are HURT.
total cat2 instructions in shared programs: 380299 -> 377499 (-0.74%)
cat2 instructions in affected programs: 128609 -> 125809 (-2.18%)
helped: 322
HURT: 0
Cat2 instructions are helped.
Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com >
Reviewed-by: Connor Abbott <cwabbott0@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32189 >
2024-12-06 08:42:36 +00:00
Karmjit Mahil
aad0aa0a9c
nir/algebraic: turn u{ge,lt} a, 1 to i{ne,eq} a, 0
...
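The rewrite in the title relies on unsigned comparisons against 1 being equivalent to (in)equality with 0. A quick check of that equivalence, with a hypothetical helper name, and a reminder that it does not hold for signed values:

```python
def unsigned_equivalence_holds(a):
    # For unsigned a: a >= 1 is the same as a != 0, and a < 1 is the same as a == 0.
    return ((a >= 1) == (a != 0)) and ((a < 1) == (a == 0))


samples = list(range(16)) + [0x7FFFFFFF, 0x80000000, 0xFFFFFFFF]
assert all(unsigned_equivalence_holds(a) for a in samples)

# Signed counter-example: -1 >= 1 is false, but -1 != 0 is true.
assert (-1 >= 1) != (-1 != 0)
```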
Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com >
Reviewed-by: Connor Abbott <cwabbott0@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32189 >
2024-12-06 08:42:36 +00:00
Ian Romanick
e1bb53bb3c
nir/algebraic: Optimize some trivial bfi
...
In fossil-db, one big compute shader on Hogwarts Legacy is helped for
spills and fills. It has a lot of instances of bfi(0x3f, a, a).
On Tiger Lake and Skylake, a compute shader in Unicom that has a
single instance of this pattern is hurt for spills and fills. I think
this is just due to non-determinism in the register allocation
algorithm.
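To see why `bfi(0x3f, a, a)` is trivial, here is a small model of bitfield insert. It assumes the usual NIR-style semantics (shift `insert` up to the lowest set bit of `mask`, then merge under the mask); treat that as an assumption rather than a quote of the opcode definition.

```python
MASK32 = 0xFFFFFFFF


def bfi(mask, insert, base):
    """Model of a 32-bit bitfield insert (assumed semantics)."""
    if mask == 0:
        return base
    tmp = mask
    # Align 'insert' with the lowest set bit of the mask.
    while not (tmp & 1):
        tmp >>= 1
        insert = (insert << 1) & MASK32
    return ((base & ~mask) | (insert & mask)) & MASK32


# With the same value as insert and base, and a mask starting at bit 0,
# the result is just the original value.
for a in (0, 1, 0x3F, 0x40, 0xDEADBEEF, 0xFFFFFFFF):
    assert bfi(0x3F, a, a) == a
```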
shader-db:
All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 16992643 -> 16992548 (<.01%)
instructions in affected programs: 17533 -> 17438 (-0.54%)
helped: 33 / HURT: 0
total cycles in shared programs: 914313986 -> 914316238 (<.01%)
cycles in affected programs: 3734544 -> 3736796 (0.06%)
helped: 26 / HURT: 6
fossil-db:
Lunar Lake, Meteor Lake, DG2, and Ice Lake had similar results. (Lunar Lake shown)
Totals:
Instrs: 208952780 -> 208952537 (-0.00%)
Send messages: 10934879 -> 10934875 (-0.00%)
Cycle count: 30988230904 -> 30988228660 (-0.00%); split: -0.00%, +0.00%
Spill count: 534864 -> 534843 (-0.00%)
Fill count: 667081 -> 667068 (-0.00%)
Max live registers: 65686656 -> 65686624 (-0.00%)
Non SSA regs after NIR: 244185358 -> 244185335 (-0.00%)
Totals from 3 (0.00% of 704834) affected shaders:
Instrs: 4708 -> 4465 (-5.16%)
Send messages: 234 -> 230 (-1.71%)
Cycle count: 264382 -> 262138 (-0.85%); split: -0.88%, +0.03%
Spill count: 91 -> 70 (-23.08%)
Fill count: 73 -> 60 (-17.81%)
Max live registers: 647 -> 615 (-4.95%)
Non SSA regs after NIR: 3957 -> 3934 (-0.58%)
Tiger Lake
Totals:
Instrs: 230516919 -> 230515185 (-0.00%); split: -0.00%, +0.00%
Send messages: 12657684 -> 12657680 (-0.00%)
Cycle count: 23060318600 -> 23060279758 (-0.00%); split: -0.00%, +0.00%
Spill count: 548462 -> 548446 (-0.00%); split: -0.00%, +0.00%
Fill count: 582304 -> 582294 (-0.00%); split: -0.00%, +0.00%
Scratch Memory Size: 19538944 -> 19539968 (+0.01%)
Max live registers: 41713622 -> 41713593 (-0.00%)
Non SSA regs after NIR: 260667939 -> 260667712 (-0.00%); split: -0.00%, +0.00%
Totals from 174 (0.02% of 794323) affected shaders:
Instrs: 158346 -> 156612 (-1.10%); split: -1.13%, +0.04%
Send messages: 14330 -> 14326 (-0.03%)
Cycle count: 24859875 -> 24821033 (-0.16%); split: -0.32%, +0.16%
Spill count: 183 -> 167 (-8.74%); split: -9.29%, +0.55%
Fill count: 284 -> 274 (-3.52%); split: -7.39%, +3.87%
Scratch Memory Size: 9216 -> 10240 (+11.11%)
Max live registers: 12587 -> 12558 (-0.23%)
Non SSA regs after NIR: 164466 -> 164239 (-0.14%); split: -0.16%, +0.02%
Skylake
Totals:
Instrs: 158904982 -> 158903764 (-0.00%)
Send messages: 8490500 -> 8490496 (-0.00%)
Cycle count: 19732284279 -> 19732345496 (+0.00%); split: -0.00%, +0.00%
Spill count: 519127 -> 519115 (-0.00%)
Fill count: 594283 -> 594290 (+0.00%); split: -0.00%, +0.00%
Max live registers: 33708764 -> 33708739 (-0.00%)
Non SSA regs after NIR: 169377234 -> 169377007 (-0.00%); split: -0.00%, +0.00%
Totals from 174 (0.03% of 648725) affected shaders:
Instrs: 160391 -> 159173 (-0.76%)
Send messages: 14354 -> 14350 (-0.03%)
Cycle count: 24776486 -> 24837703 (+0.25%); split: -0.07%, +0.32%
Spill count: 332 -> 320 (-3.61%)
Fill count: 587 -> 594 (+1.19%); split: -0.17%, +1.36%
Max live registers: 12709 -> 12684 (-0.20%)
Non SSA regs after NIR: 166557 -> 166330 (-0.14%); split: -0.16%, +0.02%
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32493 >
2024-12-05 21:39:07 +00:00
Alyssa Rosenzweig
ca9bf43d0b
nir,asahi: make argument alignment configurable
...
this is more flexible. Mali needs 32-bit alignment, for example.
I added an option struct in case we need to make this a callback or something
later.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32398 >
2024-12-05 10:58:51 +00:00
Alyssa Rosenzweig
0d77e91ca3
nir/opt_load_store_vectorize: match amul like imul
...
for AGX, we preserve amul all the way until fusing address modes in order to be
able to fuse effectively. so the load/store vectorizer wouldn't vectorize before
fusing.
however, after fusing we get fused intrinsics which are tricky to teach the
vectorizer about as their semantics are pretty subtle. so we can't vectorize
after, either.
the easiest solution is to teach the vectorizer about amul, which can always be
replaced by imul for our pattern matches.
this fixes certain cases of vectorization in OpenCL kernels on asahi.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32398 >
2024-12-05 10:58:51 +00:00
Alyssa Rosenzweig
77d4ed0a01
nir/opt_algebraic: optimize sign bit manipulation
...
libclc loves to generate the iand(0x7fffffff) pattern. ior/ixor patterns are
added for completeness.
Shaves 4 instructions off libclc vec4 normalize.
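The integer patterns correspond to float sign operations on the raw 32-bit encoding. The sketch below only demonstrates that correspondence; it is not the algebraic rule itself, and the helper names are made up for the example.

```python
import struct


def f2u(x):
    # Reinterpret a float32 value as its 32-bit integer encoding.
    return struct.unpack("<I", struct.pack("<f", x))[0]


def u2f(u):
    # Reinterpret a 32-bit integer encoding as a float32 value.
    return struct.unpack("<f", struct.pack("<I", u))[0]


for x in (1.5, -1.5, 0.0, -0.0, float("inf"), float("-inf")):
    bits = f2u(x)
    assert u2f(bits & 0x7FFFFFFF) == abs(x)    # iand clears the sign bit: fabs
    assert u2f(bits | 0x80000000) == -abs(x)   # ior sets the sign bit: -fabs
    assert u2f(bits ^ 0x80000000) == -x        # ixor flips the sign bit: fneg
```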
v2: Loop over the bit sizes (Georg).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Reviewed-by: Marek Olšák <marek.olsak@amd.com > [v1]
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32398 >
2024-12-05 10:58:51 +00:00
Alyssa Rosenzweig
be049e1c14
nir/search_helpers: handle bcsel in is_only_used_as_float
...
this lets algebraic see through chains of instructions.
v2: Limit recursion depth (Georg).
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Reviewed-by: Marek Olšák <marek.olsak@amd.com > [v1]
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32398 >
2024-12-05 10:58:51 +00:00
Boris Brezillon
98e3c1e6fb
nir: Let nir_lower_texcoord_replace_late() report progress
...
Useful if we want to wrap this pass with a NIR_PASS() to enforce
validation.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com >
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com >
Reviewed-by: Chia-I Wu <olvaffe@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32480 >
2024-12-05 08:49:45 +00:00
Kai Wasserbäch
8a453669e2
fix(FTBFS): clc/clover: pass a VFS instance explicitly
...
This just replicates what upstream did before breaking mesa with commit
df9a14d7bbf and requiring a VFS instance.
Reported-by: @Lone_Wolf
Reference: <df9a14d7bb >
Closes: <https://gitlab.freedesktop.org/mesa/mesa/-/issues/12223 >
Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org >
Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de >
Reviewed-by: Karol Herbst <kherbst@redhat.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32439 >
2024-12-04 19:55:56 +00:00
Marek Olšák
3effa3d53b
nir/lower_io_passes: lower indirect IO for TCS
...
nir_lower_io_to_temporaries doesn't do anything and gives up when it gets TCS.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424 >
2024-12-04 13:40:41 +00:00
Marek Olšák
f5a0cde125
nir/opt_varyings: fix compile failures in the disabled PRINT code
...
linkage is a pointer, but it was used as a structure.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424 >
2024-12-04 13:40:41 +00:00
Marek Olšák
dd788d0a7f
nir/opt_varyings: remove rare dead output stores after inter-shader code motion
...
Backward inter-shader code motion left dead output stores in the producer
in rare cases. Those dead stores would then make their way into drivers
and hw.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424 >
2024-12-04 13:40:41 +00:00
Marek Olšák
f0c4e71d58
nir/opt_varyings: fix getting deref variables for sysvals
...
This might fix array system values. Noticed by luck.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424 >
2024-12-04 13:40:41 +00:00
Marek Olšák
dcc679ab3a
nir/opt_varyings: add inter-shader code motion for uniform/UBO indexing
...
If input_value, index, index1 or index2 is an input, here are examples of
code that this commit moves from consumers to producers:
* input_value * uniform_array[index]
* uniform_array[index]
* ubo[0].array[index]
* ubo[index].var
* ubo[index1].array[index2]
If the array index is computed from an input, it must be flat or convergent
within a primitive to be moved. If the array index is not an input, it must
be a uniform expression.
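The movability rule for the array index can be restated as a small predicate. This is only a paraphrase of the paragraph above with hypothetical parameter names, not code from the pass:

```python
def index_is_movable(is_input, is_flat_or_convergent, is_uniform_expr):
    """Decide whether an array index allows moving the load to the producer."""
    if is_input:
        # An index computed from an input must be flat or convergent
        # within a primitive.
        return is_flat_or_convergent
    # Otherwise the index has to be a uniform expression.
    return is_uniform_expr


assert index_is_movable(True, True, False)
assert not index_is_movable(True, False, True)
assert index_is_movable(False, False, True)
```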
dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.dynamically_uniform_fragment
has UBO indexing that is moved to the producer by this.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424 >
2024-12-04 13:40:41 +00:00
Marek Olšák
f52ae35d73
nir/opt_varyings: propagate indirect uniform/UBO loads into the next shader
...
Uniform and UBO loads with non-constant indices are now propagated.
The majority of this code implements cloning deref chains.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424 >
2024-12-04 13:40:41 +00:00
Marek Olšák
c0de78f120
nir/opt_varyings: change try_move_postdominator param to nir_instr type
...
We want more instructions to be movable, like
load_deref(var, index = load_input).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424 >
2024-12-04 13:40:41 +00:00
Marek Olšák
8e39e8ed4d
nir/opt_varyings: make top-level compaction code for TES, TCS, GS separate
...
Add a separate "if" block for each and use a helper for repeated code.
So much more code will be added here that keeping the TES, TCS, and GS
compaction code unified would be a mess.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424 >
2024-12-04 13:40:41 +00:00
Marek Olšák
d20e07dbad
nir/opt_varyings: fix max_slot for color varying compaction
...
It should be in units of slots. This was unlikely to break anything.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424 >
2024-12-04 13:40:41 +00:00
Marek Olšák
69b1853ecf
nir/opt_varyings: count the number of unused components for compaction correctly
...
Holes due to indirectly-indexed inputs were ignored, making the compaction
worse when such inputs were present alongside convergent inputs.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424 >
2024-12-04 13:40:41 +00:00
Marek Olšák
1aa9fec542
nir/opt_varyings: fix compaction with sparse indirect FS inputs
...
Without this, compaction can put inputs into vec4 slots already occupied
by indirectly-accessed inputs while ignoring their interpolation qualifier,
which is incorrect.
All input components sharing the same vec4 slot must use interpolation
qualifiers that are compatible with each other.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424 >
2024-12-04 13:40:41 +00:00
Marek Olšák
b01f3cea7a
nir/opt_varyings: remove redundant conditions from a while loop
...
Most of these conditions are repeated below with a continue statement.
This just puts break at the end where all of them are false.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424 >
2024-12-04 13:40:41 +00:00
Marek Olšák
a618a2aa8b
nir/linking_helpers: don't promote interpolated varyings to flat
...
Even the most flexible interpolation that we have in NIR options
(nir_io_has_flexible_input_interpolation_except_flat) doesn't allow
mixing flat and non-flat in the same vec4. This (legacy) optimization
can't promote interpolated inputs to flat if it doesn't consider
the interpolation mode of the whole vec4 slot.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424 >
2024-12-04 13:40:41 +00:00
Timothy Arceri
fcebbfc399
glsl: drop unused array refcount code and tests
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32450 >
2024-12-04 11:50:57 +00:00
Georg Lehmann
34a47e4b14
nir/opt_algebraic: mark a - ffract(a) as nan incorrect.
...
Inf - fract(Inf) -> Inf - NaN -> NaN
floor(Inf) -> Inf
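The mismatch can be reproduced with ordinary IEEE arithmetic, assuming `ffract(x)` is defined as `x - ffloor(x)`. Python's `math.floor` raises on infinities, so the sketch wraps it in a helper that passes them through:

```python
import math


def ffloor(x):
    # IEEE floor leaves infinities and NaN unchanged; math.floor would raise.
    if math.isinf(x) or math.isnan(x):
        return x
    return float(math.floor(x))


def ffract(x):
    # Assumed definition: fractional part as x - floor(x).
    return x - ffloor(x)


inf = float("inf")
assert math.isnan(ffract(inf))           # Inf - Inf
assert math.isnan(inf - ffract(inf))     # a - ffract(a) yields NaN
assert ffloor(inf) == inf                # ffloor(a) yields Inf
assert 2.75 - ffract(2.75) == ffloor(2.75)   # finite inputs still agree
```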
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32393 >
2024-12-03 14:42:18 +00:00
Georg Lehmann
2ee96cf514
nir/opt_algebraic: optimize d3d9 ceil
...
No Foz-DB changes.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32393 >
2024-12-03 14:42:18 +00:00
Georg Lehmann
34caed8adb
nir/opt_algebraic: optimize d3d9 ftrunc
...
Foz-DB Navi21:
Totals from 85 (0.11% of 79395) affected shaders:
MaxWaves: 1972 -> 1968 (-0.20%)
Instrs: 48682 -> 47067 (-3.32%)
CodeSize: 255664 -> 247172 (-3.32%)
VGPRs: 3752 -> 3768 (+0.43%)
Latency: 154414 -> 150360 (-2.63%)
InvThroughput: 37186 -> 35081 (-5.66%)
VClause: 847 -> 865 (+2.13%); split: -0.24%, +2.36%
SClause: 768 -> 796 (+3.65%)
Copies: 2763 -> 2869 (+3.84%); split: -0.14%, +3.98%
VALU: 28133 -> 26781 (-4.81%)
SALU: 7182 -> 6939 (-3.38%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32393 >
2024-12-03 14:42:18 +00:00
Georg Lehmann
ea4aa8e5a6
nir/opt_algebraic: optimize ffma(b2f, b2f, c)
...
Foz-DB Navi21:
Totals from 134 (0.17% of 79395) affected shaders:
Instrs: 153297 -> 153326 (+0.02%); split: -0.03%, +0.05%
CodeSize: 829520 -> 828444 (-0.13%); split: -0.13%, +0.00%
Latency: 900489 -> 899964 (-0.06%); split: -0.07%, +0.01%
InvThroughput: 267838 -> 267478 (-0.13%); split: -0.14%, +0.00%
VClause: 2452 -> 2454 (+0.08%)
Copies: 8331 -> 8353 (+0.26%); split: -0.25%, +0.52%
PreSGPRs: 4974 -> 4964 (-0.20%)
PreVGPRs: 6209 -> 6218 (+0.14%)
VALU: 112317 -> 112092 (-0.20%); split: -0.21%, +0.01%
SALU: 12451 -> 12694 (+1.95%)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32393 >
2024-12-03 14:42:18 +00:00
Timothy Arceri
fd431a5b71
glsl: drop unused ir_equals.cpp
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32448 >
2024-12-03 02:46:39 +00:00
Kenneth Graunke
5712fc48a9
nir: Allow large overfetching holes in the load store vectorizer
...
The load_*_uniform_block_intel intrinsics always load either 8x or 16x
32-bit components worth of data (so 32 byte increments). This leads to
cases where we load a few components from one vec8, followed by a few
components of an adjacent vec8. We want to combine those into a vec16
load, as that loads a whole cacheline at a time, and requires fewer hoops
to calculate addresses and request memory loads.
So, we allow 7 * 4 = 28 bytes of holes, which handles vec8+vec8 where
only the .x component is read.
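The 28-byte figure follows from the layout of two adjacent vec8 blocks when only the first component of each is read. A small worked calculation (constant and function names are illustrative):

```python
COMPONENT_BYTES = 4                    # one 32-bit component
VEC8_BYTES = 8 * COMPONENT_BYTES       # 32 bytes per vec8


def hole_bytes(first_end, second_start):
    # Unread bytes between the end of one access and the start of the next.
    return second_start - first_end


# .x of the first vec8 covers bytes [0, 4); .x of the next vec8 starts at 32.
hole = hole_bytes(0 + COMPONENT_BYTES, VEC8_BYTES)
assert hole == 7 * COMPONENT_BYTES == 28
```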
Most drivers and intrinsics will not want such large holes. I thought
about adding a per-intrinsic max_hole to the core code, but decided that
since we already have driver callbacks, we can just rely on them to
reject what makes sense to them.
No driver callbacks currently allow holes, so this should not currently
affect any drivers. But any work in progress branches may need to be
updated to reject larger holes.
Reviewed-by: Marek Olšák <marek.olsak@amd.com >
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315 >
2024-12-03 02:02:33 +00:00
Marek Olšák
8752401e03
nir/algebraic: optimize (a & b) | (a | c) => a | c, (a & b) & (a | c) => a & b
...
No change in shader-db with ACO, but it doesn't seem to be optimized by
any other patterns.
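Both identities in the title are bitwise, so checking every combination of small values is enough to confirm them. A brute-force check, with a hypothetical helper name:

```python
from itertools import product


def absorption_holds(a, b, c):
    # (a & b) is a subset of a, which is a subset of (a | c).
    return ((a & b) | (a | c)) == (a | c) and ((a & b) & (a | c)) == (a & b)


BITS = 4
assert all(absorption_holds(a, b, c)
           for a, b, c in product(range(1 << BITS), repeat=3))
```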
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32449 >
2024-12-03 01:24:27 +00:00
Marek Olšák
3670d42c74
nir/algebraic: optimize (a | b) | (a | c) ==> (a | b) | c
...
shader-db with ACO:
3 shaders have -0.11% average decrease in the code size
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32449 >
2024-12-03 01:24:27 +00:00
Marek Olšák
978ad93375
nir/algebraic: optimize (a & b) & (a & c) ==> (a & b) & c
...
shader-db with ACO:
3 shaders have -0.57% average decrease in the code size
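This reassociation, like the `ior` variant in the commit listed just above, drops the redundant second use of `a`. A brute-force check of both identities over small values (helper name is illustrative):

```python
from itertools import product


def reassociation_holds(a, b, c):
    and_ok = ((a & b) & (a & c)) == ((a & b) & c)
    or_ok = ((a | b) | (a | c)) == ((a | b) | c)
    return and_ok and or_ok


BITS = 4
assert all(reassociation_holds(a, b, c)
           for a, b, c in product(range(1 << BITS), repeat=3))
```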
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32449 >
2024-12-03 01:24:27 +00:00
Marek Olšák
83b093f95e
nir/algebraic: use is_used_once in a few iand/ior patterns
...
shader-db with ACO:
1 shader has -4 decrease in the code size
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32449 >
2024-12-03 01:24:27 +00:00
Antonino Maniscalco
2b9738ce6d
nir,zink,asahi: support passing through gl_PrimitiveID
...
When this pass is used with Zink, gl_PrimitiveID needs to be passed
through; however, this is unnecessary for other drivers.
Analogous to previous commit
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Fixes: d0342e28b3 ("nir: Add helper to create passthrough GS shader")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32397 >
2024-12-03 00:24:04 +00:00
Kenneth Graunke
92797c6878
nir/algebraic: Reassociate fadd into fmul in DP4-like pattern
...
This extends the optimization from commit 09705747d7 ("nir/algebraic:
Reassociate fadd into fmul in DPH-like pattern") to a chain of 4 ffmas
for a DP4-style pattern.
Moving the add to the other end of the sequence allows it to be fused
into an FMA.
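A sketch of the idea, assuming the matched expression has the shape shown in `dp4_before` (three FMAs around a multiply, followed by a separate add). Integers are used so the comparison is exact; with floats the two forms can round differently, which is why this is a reassociation.

```python
def ffma(a, b, c):
    # Fused multiply-add, modelled exactly on integers.
    return a * b + c


def dp4_before(a, b, c, d, e, f, g, h, i):
    # Assumed original shape: 3 ffma + 1 fmul + 1 trailing fadd (5 operations).
    return ffma(a, b, ffma(c, d, ffma(e, f, g * h))) + i


def dp4_after(a, b, c, d, e, f, g, h, i):
    # The add moves to the innermost multiply: 4 ffma in total.
    return ffma(a, b, ffma(c, d, ffma(e, f, ffma(g, h, i))))


args = (1, 2, 3, 4, 5, 6, 7, 8, 9)
assert dp4_before(*args) == dp4_after(*args) == 109
```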
fossil-db results from Alchemist:
Totals:
Instrs: 158544142 -> 158490516 (-0.03%); split: -0.04%, +0.00%
Subgroup size: 7808912 -> 7808920 (+0.00%); split: +0.00%, -0.00%
Cycle count: 17859550672 -> 17859491966 (-0.00%); split: -0.01%, +0.01%
Spill count: 84652 -> 84494 (-0.19%); split: -0.37%, +0.18%
Fill count: 160728 -> 160623 (-0.07%); split: -0.29%, +0.23%
Scratch Memory Size: 4278272 -> 4272128 (-0.14%); split: -0.29%, +0.14%
Max live registers: 32411695 -> 32409789 (-0.01%); split: -0.01%, +0.00%
Max dispatch width: 5627856 -> 5627920 (+0.00%); split: +0.00%, -0.00%
Non SSA regs after NIR: 185359099 -> 185307703 (-0.03%); split: -0.03%, +0.00%
Totals from 16378 (2.56% of 640872) affected shaders:
Instrs: 9818723 -> 9765097 (-0.55%); split: -0.58%, +0.04%
Subgroup size: 194056 -> 194064 (+0.00%); split: +0.01%, -0.01%
Cycle count: 294967108 -> 294908402 (-0.02%); split: -0.58%, +0.56%
Spill count: 10088 -> 9930 (-1.57%); split: -3.09%, +1.53%
Fill count: 24738 -> 24633 (-0.42%); split: -1.90%, +1.48%
Scratch Memory Size: 439296 -> 433152 (-1.40%); split: -2.80%, +1.40%
Max live registers: 1297204 -> 1295298 (-0.15%); split: -0.22%, +0.07%
Max dispatch width: 133232 -> 133296 (+0.05%); split: +0.14%, -0.10%
Non SSA regs after NIR: 11999084 -> 11947688 (-0.43%); split: -0.43%, +0.00%
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com >
Reviewed-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32197 >
2024-12-02 13:15:16 +00:00
Rhys Perry
9f3607de76
nir/tests: fix SSA dominance in opt_if_merge tests
...
It isn't necessary for these ALU instructions to be used in the next IF.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com >
Reviewed-by: Matt Turner <mattst88@gmail.com >
Fixes: c437f2e79c ("nir/tests: Add tests for opt_if_merge")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12211
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32391 >
2024-12-02 09:38:22 +00:00
Timothy Arceri
6ca81adffc
nir: allow loops with unknown induction var initialiser to unroll
...
If the condition of the loop terminator is based on an unsigned value we
can in some cases find the max number of possible loop trips. With the
max loop trips known, a complex unroll can unroll the loop.
For example:
    uniform uint x;
    uint i = x;
    while (true) {
        if (i >= 4)
            break;
        i += 6;
    }
The above loop can be unrolled even though we don't know the initial
value of the induction variable because it can have at most 1 iteration.
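The bound can be checked by simulating the loop with 32-bit unsigned wraparound for a range of starting values. This only models the example above, not the analysis in the pass:

```python
U32 = 0xFFFFFFFF


def trip_count(x):
    """Count how many times the loop body's increment runs for initial value x."""
    i = x & U32
    trips = 0
    while True:
        if i >= 4:
            break
        i = (i + 6) & U32
        trips += 1
    return trips


samples = [0, 1, 2, 3, 4, 5, 0x7FFFFFFF, 0xFFFFFFFA, 0xFFFFFFFF]
assert max(trip_count(x) for x in samples) == 1
```

Any start below 4 lands in the range 6 to 9 after one increment and exits; any other start exits immediately.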
There were no changes with my shader-db collection. Change was inspired
by MR #31312 where builtin shader code failed to unroll.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31701 >
2024-12-02 11:44:33 +11:00
Dave Airlie
fcaf0f2590
vulkan: update to 302 headers for av1 encode
...
Some of the spirv AMDX stuff probably broke things, but it should
still build.
Acked-by: Konstantin Seurer <konstantin.seurer@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32401 >
2024-12-02 06:29:00 +10:00
Job Noorman
d5d0628728
nir/lower_subgroups: add option to only lower clustered rotates
...
On ir3, we have native support for full rotates but not for clustered
ones.
Signed-off-by: Job Noorman <jnoorman@igalia.com >
Reviewed-by: Connor Abbott <cwabbott0@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31731 >
2024-11-29 16:22:48 +00:00
Job Noorman
5dbd2b08f4
nir/lower_subgroups: disable boolean reduce when not supported
...
lower_boolean_reduce only supports ballot_components == 1. Fall back to
lower_scan_reduce when this is not the case.
Signed-off-by: Job Noorman <jnoorman@igalia.com >
Reviewed-by: Connor Abbott <cwabbott0@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31731 >
2024-11-29 16:22:48 +00:00
Job Noorman
493f7b8084
nir/lower_subgroups: add extra filter data to options
...
It might be convenient for filter implementations to have access to
extra information. This will be used, for example, by ir3 to access
compiler features.
Signed-off-by: Job Noorman <jnoorman@igalia.com >
Reviewed-by: Connor Abbott <cwabbott0@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31731 >
2024-11-29 16:22:48 +00:00
Job Noorman
e6c63a88fb
nir: add read_getlast_ir3 intrinsic
...
Like read_first_invocation but using getlast. Note that I intentionally
used the name of the ir3 instruction in the name as its semantics are
tricky to exactly describe otherwise.
Signed-off-by: Job Noorman <jnoorman@igalia.com >
Reviewed-by: Connor Abbott <cwabbott0@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31731 >
2024-11-29 16:22:47 +00:00
Job Noorman
60e1615ced
nir/lower_subgroups: support unknown subgroup size
...
Some targets (e.g., ir3) don't always know the exact subgroup size.
Calculate the maximum subgroup size in that case by multiplying
ballot_components and ballot_bit_size.
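The fallback is a single multiplication. In the sketch below the function name and the example values are hypothetical; only the formula comes from the description above.

```python
def max_subgroup_size(ballot_components, ballot_bit_size):
    # Each ballot component holds ballot_bit_size invocation bits.
    return ballot_components * ballot_bit_size


# Example values chosen for illustration: a 4-component ballot of 32-bit words.
assert max_subgroup_size(4, 32) == 128
```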
Signed-off-by: Job Noorman <jnoorman@igalia.com >
Reviewed-by: Connor Abbott <cwabbott0@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31731 >
2024-11-29 16:22:47 +00:00
Timothy Arceri
05d2fe2372
glsl: remove glsl/program.h
...
It is now unused.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Reviewed-by: Marek Olšák <marek.olsak@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32402 >
2024-11-29 14:31:30 +11:00
Timothy Arceri
8142797721
glsl: move _mesa_glsl_compile_shader() declaration
...
The function is in glsl_parser_extras.cpp so move the declaration to
glsl_parser_extras.h
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Reviewed-by: Marek Olšák <marek.olsak@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32402 >
2024-11-29 14:30:03 +11:00
Alyssa Rosenzweig
f4a3ba5302
asahi,vtn: precompile kernels
...
switch libagx to the precompilation pipeline. see the big comment in the
previous commit for why we're doing this.
while doing so, we move some dispatch stuff. there was so much churn from
precompile that this avoids doing the churn twice. that new header will be used
for DGC down the road.
there's also a small vtn/bindgen patch in here to skip bindgen'ing entrypoints,
as that conflicts with the new dispatch macros. this is the sane behaviour, we
just need to do the full precomp switch across the tree at once.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32339 >
2024-11-28 17:34:12 +00:00
Alyssa Rosenzweig
e3001352ad
nir: add helpers for precompiled shaders
...
v2: generalize function signatures.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com > [v1]
Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com > [v1]
Acked-by: Mary Guillemard <mary.guillemard@collabora.com > [v2]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32339 >
2024-11-28 17:34:12 +00:00