AlexIndustrial/mesa

Author	SHA1	Message	Date
Alyssa Rosenzweig	3d35ea6a6b	mesa_clc: add depfile support This allows the tool to tell ninja what headers it read, so ninja can correctly rebuild when necessary. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Dylan Baker <dylan.c.baker@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32505>	2024-12-06 13:48:26 -05:00
Dylan Baker	33a1acb0da	clc: Tell clang to track imported dependencies Clang is capable of tracking what headers it imports, as long as we set it up to do that. While that isn't important for rusticl, it would be useful for the various `_clc` tools, as they can then tell Ninja which headers they read to make rebuilds more reliable. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Dylan Baker <dylan.c.baker@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32505>	2024-12-06 13:48:26 -05:00
Karmjit Mahil	047049dcb5	nir: Fix the spelling of compare Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32189>	2024-12-06 08:42:36 +00:00
Karmjit Mahil	b79994e92d	nir,ir3: Add icsel_eqz In IR3 `sel.b32` works based on the 0 so add `icsel_eqz` to fuse the cmp and sel that we'd otherwise need. total Instruction Count in shared programs: 1112814 -> 1110473 (-0.21%) Instruction Count in affected programs: 162701 -> 160360 (-1.44%) helped: 81 HURT: 29 Instruction count are helped. total MOV Count in shared programs: 86777 -> 88671 (2.18%) MOV Count in affected programs: 28119 -> 30013 (6.74%) helped: 1 HURT: 292 Mov count are HURT. total COV Count in shared programs: 15070 -> 14962 (-0.72%) COV Count in affected programs: 5770 -> 5662 (-1.87%) helped: 76 HURT: 2 Cov count are helped. total Last helper instruction in shared programs: 592729 -> 590638 (-0.35%) Last helper instruction in affected programs: 91331 -> 89240 (-2.29%) helped: 30 HURT: 1 Last helper instruction are helped. total Instructions with SS sync bit in shared programs: 29336 -> 29546 (0.72%) Instructions with SS sync bit in affected programs: 4702 -> 4912 (4.47%) helped: 8 HURT: 43 Instructions with ss sync bit are HURT. total Estimated cycles stalled on SS in shared programs: 111590 -> 112401 (0.73%) Estimated cycles stalled on SS in affected programs: 27708 -> 28519 (2.93%) helped: 21 HURT: 61 Estimated cycles stalled on ss are HURT. total cat1 instructions in shared programs: 101933 -> 103695 (1.73%) cat1 instructions in affected programs: 35804 -> 37566 (4.92%) helped: 18 HURT: 290 Cat1 instructions are HURT. total cat2 instructions in shared programs: 380299 -> 377499 (-0.74%) cat2 instructions in affected programs: 128609 -> 125809 (-2.18%) helped: 322 HURT: 0 Cat2 instructions are helped. Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32189>	2024-12-06 08:42:36 +00:00
Karmjit Mahil	aad0aa0a9c	nir/algebraic: turn `u{ge,lt} a, 1` to `i{ne,eq} a, 0` Signed-off-by: Karmjit Mahil <karmjit.mahil@igalia.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32189>	2024-12-06 08:42:36 +00:00
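The rewrite in `aad0aa0a9c` relies on the fact that, for an unsigned `a`, `a >= 1` holds exactly when `a != 0`, and `a < 1` holds exactly when `a == 0`. A minimal brute-force sketch of that equivalence follows, in plain Python rather than NIR; the helper names `uge`, `ult`, `ine`, `ieq`, and `identities_hold` are hypothetical stand-ins and are not taken from Mesa.

```python
# Exhaustive check of the unsigned-compare identities on small unsigned values.
# uge/ult/ine/ieq are hypothetical stand-ins for the NIR opcodes, not Mesa code.

def uge(a, b):
    return a >= b  # unsigned >=; operands are kept non-negative by the caller


def ult(a, b):
    return a < b   # unsigned <


def ine(a, b):
    return a != b


def ieq(a, b):
    return a == b


def identities_hold(bits=8):
    """Return True if both rewrites agree for every `bits`-wide unsigned value."""
    for a in range(1 << bits):
        if uge(a, 1) != ine(a, 0):
            return False
        if ult(a, 1) != ieq(a, 0):
            return False
    return True
```

The check only covers a small bit width, but the argument does not depend on the width: zero is the only unsigned value below one.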
Ian Romanick	e1bb53bb3c	nir/algebraic: Optimize some trivial bfi In fossil-db, one big compute shader on Hogwarts Legacy is helped for spills and fills. It has a lot of instances of bfi(0x3f, a, a). On Tiger Lake and Skylake, a compute shader in Unicom that has a single instance of this pattern is hurt for spills and fills. I think this is just due to non-determinism in the register allocation algorithm. shader-db: All Intel platforms had similar results. (Lunar Lake shown) total instructions in shared programs: 16992643 -> 16992548 (<.01%) instructions in affected programs: 17533 -> 17438 (-0.54%) helped: 33 / HURT: 0 total cycles in shared programs: 914313986 -> 914316238 (<.01%) cycles in affected programs: 3734544 -> 3736796 (0.06%) helped: 26 / HURT: 6 fossil-db: Lunar Lake, Meteor Lake, DG2, and Ice Lake had similar results. (Lunar Lake shown) Totals: Instrs: 208952780 -> 208952537 (-0.00%) Send messages: 10934879 -> 10934875 (-0.00%) Cycle count: 30988230904 -> 30988228660 (-0.00%); split: -0.00%, +0.00% Spill count: 534864 -> 534843 (-0.00%) Fill count: 667081 -> 667068 (-0.00%) Max live registers: 65686656 -> 65686624 (-0.00%) Non SSA regs after NIR: 244185358 -> 244185335 (-0.00%) Totals from 3 (0.00% of 704834) affected shaders: Instrs: 4708 -> 4465 (-5.16%) Send messages: 234 -> 230 (-1.71%) Cycle count: 264382 -> 262138 (-0.85%); split: -0.88%, +0.03% Spill count: 91 -> 70 (-23.08%) Fill count: 73 -> 60 (-17.81%) Max live registers: 647 -> 615 (-4.95%) Non SSA regs after NIR: 3957 -> 3934 (-0.58%) Tiger Lake Totals: Instrs: 230516919 -> 230515185 (-0.00%); split: -0.00%, +0.00% Send messages: 12657684 -> 12657680 (-0.00%) Cycle count: 23060318600 -> 23060279758 (-0.00%); split: -0.00%, +0.00% Spill count: 548462 -> 548446 (-0.00%); split: -0.00%, +0.00% Fill count: 582304 -> 582294 (-0.00%); split: -0.00%, +0.00% Scratch Memory Size: 19538944 -> 19539968 (+0.01%) Max live registers: 41713622 -> 41713593 (-0.00%) Non SSA regs after NIR: 260667939 -> 260667712 (-0.00%); split: -0.00%, +0.00% Totals from 174 (0.02% of 794323) affected shaders: Instrs: 158346 -> 156612 (-1.10%); split: -1.13%, +0.04% Send messages: 14330 -> 14326 (-0.03%) Cycle count: 24859875 -> 24821033 (-0.16%); split: -0.32%, +0.16% Spill count: 183 -> 167 (-8.74%); split: -9.29%, +0.55% Fill count: 284 -> 274 (-3.52%); split: -7.39%, +3.87% Scratch Memory Size: 9216 -> 10240 (+11.11%) Max live registers: 12587 -> 12558 (-0.23%) Non SSA regs after NIR: 164466 -> 164239 (-0.14%); split: -0.16%, +0.02% Skylake Totals: Instrs: 158904982 -> 158903764 (-0.00%) Send messages: 8490500 -> 8490496 (-0.00%) Cycle count: 19732284279 -> 19732345496 (+0.00%); split: -0.00%, +0.00% Spill count: 519127 -> 519115 (-0.00%) Fill count: 594283 -> 594290 (+0.00%); split: -0.00%, +0.00% Max live registers: 33708764 -> 33708739 (-0.00%) Non SSA regs after NIR: 169377234 -> 169377007 (-0.00%); split: -0.00%, +0.00% Totals from 174 (0.03% of 648725) affected shaders: Instrs: 160391 -> 159173 (-0.76%) Send messages: 14354 -> 14350 (-0.03%) Cycle count: 24776486 -> 24837703 (+0.25%); split: -0.07%, +0.32% Spill count: 332 -> 320 (-3.61%) Fill count: 587 -> 594 (+1.19%); split: -0.17%, +1.36% Max live registers: 12709 -> 12684 (-0.20%) Non SSA regs after NIR: 166557 -> 166330 (-0.14%); split: -0.16%, +0.02% Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32493>	2024-12-05 21:39:07 +00:00
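The message for `e1bb53bb3c` mentions `bfi(0x3f, a, a)` without spelling out what it is rewritten to. The sketch below models a bitfield insert in plain Python to show why that instance looks trivial: inserting the low bits of `a` into `a` itself gives back `a`. The operand order `(mask, insert, base)` and the shift-by-lowest-set-bit behaviour are assumptions about the opcode, and the function is a hypothetical model, not Mesa code.

```python
# Hypothetical model of a 32-bit bitfield insert, bfi(mask, insert, base).
# Assumption: `insert` is shifted up to the lowest set bit of `mask`, then
# merged into `base` under `mask`. This mirrors the commonly described
# semantics; it is not copied from Mesa.
MASK32 = 0xFFFFFFFF


def bfi(mask, insert, base):
    mask &= MASK32
    if mask == 0:
        return base & MASK32
    # Index of the lowest set bit in the mask.
    shift = (mask & -mask).bit_length() - 1
    merged = (base & ~mask) | ((insert << shift) & mask)
    return merged & MASK32


def trivial_bfi_is_identity(samples):
    """Check bfi(0x3f, a, a) == a for each sample value."""
    return all(bfi(0x3F, a, a) == (a & MASK32) for a in samples)
```

Because the mask `0x3f` starts at bit 0, no shift is applied, so the inserted bits and the preserved bits both come from `a` unchanged.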
Alyssa Rosenzweig	ca9bf43d0b	nir,asahi: make argument alignment configurable this is more flexible. Mali needs 32-bit alignment, for example. I added an option struct in case we need to make this a callback or something later. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32398>	2024-12-05 10:58:51 +00:00
Alyssa Rosenzweig	0d77e91ca3	nir/opt_load_store_vectorize: match amul like imul for AGX, we preserve amul all the way until fusing address modes in order to be able to fuse effectively. so the load/store vectorizer wouldn't vectorize before fusing. however, after fusing we get fused intrinsics which are tricky to teach the vectorizer about as their semantics are pretty subtle. so we can't vectorize after, either. the easiest solution is to teach the vectorizer about amul, which can always be replaced by imul for our pattern matches. this fixes certain cases of vectorization in OpenCL kernels on asahi. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32398>	2024-12-05 10:58:51 +00:00
Alyssa Rosenzweig	77d4ed0a01	nir/opt_algebraic: optimize sign bit manipulation libclc loves to generate the iand(0x7fffffff) pattern. ior/ixor patterns are added for completeness. Shaves 4 instructions off libclc vec4 normalize. v2: Loop over the bit sizes (Georg). Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Marek Olšák <marek.olsak@amd.com> [v1] Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32398>	2024-12-05 10:58:51 +00:00
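The `iand(0x7fffffff)` pattern named in `77d4ed0a01` clears the IEEE-754 sign bit of a 32-bit float, which is the integer spelling of an absolute value. The sketch below demonstrates that in plain Python, together with the `ior` and `ixor` counterparts. Which float opcodes the commit actually rewrites these to is not stated in the message, so the `fabs` / negated `fabs` / `fneg` reading in the comments is an assumption, and all helper names are hypothetical.

```python
import math
import struct

SIGN_BIT = 0x80000000


def f2u(x):
    """Reinterpret a float32 value as its 32-bit integer encoding."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def u2f(bits):
    """Reinterpret a 32-bit integer encoding as a float32 value."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def iand_clear_sign(x):
    # Clearing the sign bit behaves like fabs(x) (assumed reading).
    return u2f(f2u(x) & 0x7FFFFFFF)


def ior_set_sign(x):
    # Setting the sign bit behaves like -fabs(x) (assumed reading).
    return u2f(f2u(x) | SIGN_BIT)


def ixor_flip_sign(x):
    # Flipping the sign bit behaves like fneg(x) (assumed reading).
    return u2f(f2u(x) ^ SIGN_BIT)
```

The test values used with this sketch are exactly representable in float32, so the round trip through `struct` does not introduce rounding.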
Alyssa Rosenzweig	be049e1c14	nir/search_helpers: handle bcsel in is_only_used_as_float this lets algebraic see through chains of instructions. v2: Limit recursion depth (Georg). Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Marek Olšák <marek.olsak@amd.com> [v1] Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32398>	2024-12-05 10:58:51 +00:00
Boris Brezillon	98e3c1e6fb	nir: Let nir_lower_texcoord_replace_late() report progress Useful if we want to wrap this pass with a NIR_PASS() to enforce validation. Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com> Reviewed-by: Chia-I Wu <olvaffe@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32480>	2024-12-05 08:49:45 +00:00
Kai Wasserbäch	8a453669e2	fix(FTBFS): clc/clover: pass a VFS instance explicitly This just replicates what upstream did before breaking mesa with commit df9a14d7bbf and requiring a VFS instance. Reported-by: @Lone_Wolf Reference: <`df9a14d7bb`> Closes: <https://gitlab.freedesktop.org/mesa/mesa/-/issues/12223> Signed-off-by: Kai Wasserbäch <kai@dev.carbon-project.org> Tested-by: Dieter Nützel <Dieter@nuetzel-hh.de> Reviewed-by: Karol Herbst <kherbst@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32439>	2024-12-04 19:55:56 +00:00
Marek Olšák	3effa3d53b	nir/lower_io_passes: lower indirect IO for TCS nir_lower_io_to_temporaries doesn't do anything and gives up when it gets TCS. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>	2024-12-04 13:40:41 +00:00
Marek Olšák	f5a0cde125	nir/opt_varyings: fix compile failures in the disabled PRINT code linkage is a pointer, but it was used as a structure. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>	2024-12-04 13:40:41 +00:00
Marek Olšák	dd788d0a7f	nir/opt_varyings: remove rare dead output stores after inter-shader code motion Backward inter-shader code motion left dead output stores in the producer in rare cases. Those dead stores would then make their way into drivers and hw. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>	2024-12-04 13:40:41 +00:00
Marek Olšák	f0c4e71d58	nir/opt_varyings: fix getting deref variables for sysvals This might fix array system values. Noticed by luck. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>	2024-12-04 13:40:41 +00:00
Marek Olšák	dcc679ab3a	nir/opt_varyings: add inter-shader code motion for uniform/UBO indexing If input_value, index, index1 or index2 is an input, here are examples of code that this commit moves from consumers to producers: * input_value * uniform_array[index] * uniform_array[index] * ubo[0].array[index] * ubo[index].var * ubo[index1].array[index2] If the array index is computed from an input, it must be flat or convergent within a primitive to be moved. If the array index is not an input, it must be a uniform expression. dEQP-GLES31.functional.shaders.opaque_type_indexing.ubo.dynamically_uniform_fragment has UBO indexing that is moved to the producer by this. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>	2024-12-04 13:40:41 +00:00
Marek Olšák	f52ae35d73	nir/opt_varyings: propagate indirect uniform/UBO loads into the next shader Uniform and UBO loads with non-constant indices are now propagated. The majority of this code implements cloning deref chains. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>	2024-12-04 13:40:41 +00:00
Marek Olšák	c0de78f120	nir/opt_varyings: change try_move_postdominator param to nir_instr type We want more instructions to be movable, like load_deref(var, index = load_input). Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>	2024-12-04 13:40:41 +00:00
Marek Olšák	8e39e8ed4d	nir/opt_varyings: make top-level compaction code for TES, TCS, GS separate Add a separate "if" block for each and use a helper for repeated code. So much more code will be added here that keeping the TES, TCS, and GS compaction code unified would be a mess. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>	2024-12-04 13:40:41 +00:00
Marek Olšák	d20e07dbad	nir/opt_varyings: fix max_slot for color varying compaction It should be in units of slots. This was unlikely to break anything. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>	2024-12-04 13:40:41 +00:00
Marek Olšák	69b1853ecf	nir/opt_varyings: count the number of unused components for compaction correctly Holes due to indirectly-indexed inputs were ignored, making the compaction worse when such inputs were present alongside convergent inputs. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>	2024-12-04 13:40:41 +00:00
Marek Olšák	1aa9fec542	nir/opt_varyings: fix compaction with sparse indirect FS inputs Without this, compaction can put inputs into vec4 slots already occupied by indirectly-accessed inputs while ignoring their interpolation qualifier, which is incorrect. All input components sharing the same vec4 slot must use interpolation qualifiers that are compatible with each other. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>	2024-12-04 13:40:41 +00:00
Marek Olšák	b01f3cea7a	nir/opt_varyings: remove redundant conditions from a while loop Most of these conditions are repeated below with a continue statement. This just puts break at the end where all of them are false. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>	2024-12-04 13:40:41 +00:00
Marek Olšák	a618a2aa8b	nir/linking_helpers: don't promote interpolated varyings to flat Even the most flexible interpolation that we have in NIR options (nir_io_has_flexible_input_interpolation_except_flat) doesn't allow mixing flat and non-flat in the same vec4. This (legacy) optimization can't promote interpolated inputs to flat if it doesn't consider the interpolation mode of the whole vec4 slot. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32424>	2024-12-04 13:40:41 +00:00
Timothy Arceri	fcebbfc399	glsl: drop unused array refcount code and tests Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32450>	2024-12-04 11:50:57 +00:00
Georg Lehmann	34a47e4b14	nir/opt_algebraic: mark a - ffract(a) as nan incorrect. Inf + fract(Inf) -> Inf + NaN -> NaN floor(Inf) -> Inf Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32393>	2024-12-03 14:42:18 +00:00
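The reasoning in `34a47e4b14` can be reproduced numerically: for finite inputs `a - ffract(a)` equals `floor(a)`, but at infinity the fractional part is NaN and the two sides diverge. The sketch below assumes `ffract(x)` is defined as `x - floor(x)`. The helpers are hypothetical Python models, not Mesa code, and `ffloor` passes Inf and NaN through because Python's `math.floor` raises on them.

```python
import math


def ffloor(x):
    # math.floor raises on Inf/NaN, so pass those through as an IEEE floor would.
    if math.isinf(x) or math.isnan(x):
        return x
    return float(math.floor(x))


def ffract(x):
    # Assumed definition: x - floor(x). For Inf this is Inf - Inf = NaN.
    return x - ffloor(x)


def a_minus_ffract(a):
    return a - ffract(a)
```

For a finite input such as `2.75` both forms give `2.0`; for `math.inf` the left side is NaN while `ffloor` returns Inf, which is why the rewrite cannot be treated as exact when NaN and Inf behaviour must be preserved.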
Georg Lehmann	2ee96cf514	nir/opt_algebraic: optimize d3d9 ceil No Foz-DB changes. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32393>	2024-12-03 14:42:18 +00:00
Georg Lehmann	34caed8adb	nir/opt_algebraic: optimize d3d9 ftrunc Foz-DB Navi21: Totals from 85 (0.11% of 79395) affected shaders: MaxWaves: 1972 -> 1968 (-0.20%) Instrs: 48682 -> 47067 (-3.32%) CodeSize: 255664 -> 247172 (-3.32%) VGPRs: 3752 -> 3768 (+0.43%) Latency: 154414 -> 150360 (-2.63%) InvThroughput: 37186 -> 35081 (-5.66%) VClause: 847 -> 865 (+2.13%); split: -0.24%, +2.36% SClause: 768 -> 796 (+3.65%) Copies: 2763 -> 2869 (+3.84%); split: -0.14%, +3.98% VALU: 28133 -> 26781 (-4.81%) SALU: 7182 -> 6939 (-3.38%) Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32393>	2024-12-03 14:42:18 +00:00
Georg Lehmann	ea4aa8e5a6	nir/opt_algebraic: optimize ffma(b2f, b2f, c) Foz-DB Navi21: Totals from 134 (0.17% of 79395) affected shaders: Instrs: 153297 -> 153326 (+0.02%); split: -0.03%, +0.05% CodeSize: 829520 -> 828444 (-0.13%); split: -0.13%, +0.00% Latency: 900489 -> 899964 (-0.06%); split: -0.07%, +0.01% InvThroughput: 267838 -> 267478 (-0.13%); split: -0.14%, +0.00% VClause: 2452 -> 2454 (+0.08%) Copies: 8331 -> 8353 (+0.26%); split: -0.25%, +0.52% PreSGPRs: 4974 -> 4964 (-0.20%) PreVGPRs: 6209 -> 6218 (+0.14%) VALU: 112317 -> 112092 (-0.20%); split: -0.21%, +0.01% SALU: 12451 -> 12694 (+1.95%) Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32393>	2024-12-03 14:42:18 +00:00
Timothy Arceri	fd431a5b71	glsl: drop unused ir_equals.cpp Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32448>	2024-12-03 02:46:39 +00:00
Kenneth Graunke	5712fc48a9	nir: Allow large overfetching holes in the load store vectorizer The `load_*_uniform_block_intel` intrinsics always load either 8x or 16x 32-bit components worth of data (so 32 byte increments). This leads to cases where we load a few components from one vec8, followed by a few components of an adjacent vec8. We want to combine those into a vec16 load, as that loads a whole cacheline at a time, and requires fewer hoops to calculate addresses and request memory loads. So, we allow 7 * 4 = 28 bytes of holes, which handles vec8+vec8 where only the .x component is read. Most drivers and intrinsics will not want such large holes. I thought about adding a per-intrinsic max_hole to the core code, but decided that since we already have driver callbacks, we can just rely on them to reject what makes sense to them. No driver callbacks currently allow holes, so this should not currently affect any drivers. But any work in progress branches may need to be updated to reject larger holes. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32315>	2024-12-03 02:02:33 +00:00
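The 28-byte figure in `5712fc48a9` follows from simple offset arithmetic: if only the `.x` component of each of two adjacent vec8 blocks is read, the first access covers bytes 0 to 3 and the second starts at byte 32, leaving seven unread 32-bit components in between. The sketch below works that out. The constant and function names are hypothetical and do not correspond to the vectorizer's actual callback interface.

```python
COMPONENT_BYTES = 4                 # one 32-bit component
VEC8_BYTES = 8 * COMPONENT_BYTES    # 32 bytes per vec8 block


def hole_bytes(first_offset, first_size, second_offset):
    """Unread bytes between the end of the first access and the start of the second."""
    return second_offset - (first_offset + first_size)


# Only .x of the first vec8 (bytes 0..3) and .x of the adjacent vec8 (byte 32 onward).
worst_case_hole = hole_bytes(0, COMPONENT_BYTES, VEC8_BYTES)
```

Here `worst_case_hole` is 28, matching the seven skipped components of four bytes each.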
Marek Olšák	8752401e03	nir/algebraic: optimize (a & b) \| (a \| c) => a \| c, (a & b) & (a \| c) => a & b No change in shader-db with ACO, but it doesn't seem to be optimized by any other patterns. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32449>	2024-12-03 01:24:27 +00:00
Marek Olšák	3670d42c74	nir/algebraic: optimize (a \| b) \| (a \| c) ==> (a \| b) \| c shader-db with ACO: 3 shaders have -0.11% average decrease in the code size Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32449>	2024-12-03 01:24:27 +00:00
Marek Olšák	978ad93375	nir/algebraic: optimize (a & b) & (a & c) ==> (a & b) & c shader-db with ACO: 3 shaders have -0.57% average decrease in the code size Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32449>	2024-12-03 01:24:27 +00:00
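The bitwise identities behind `8752401e03`, `3670d42c74`, and `978ad93375` hold independently for every bit position, so checking them exhaustively on small integers is enough to confirm them. A minimal sketch in plain Python follows; the function name is hypothetical and the check is not how Mesa validates its algebraic patterns.

```python
from itertools import product


def bitwise_identities_hold(bits=4):
    """Exhaustively verify the four and/or rewrites over all `bits`-wide triples."""
    values = range(1 << bits)
    for a, b, c in product(values, repeat=3):
        if ((a & b) | (a | c)) != (a | c):
            return False
        if ((a & b) & (a | c)) != (a & b):
            return False
        if ((a | b) | (a | c)) != ((a | b) | c):
            return False
        if ((a & b) & (a & c)) != ((a & b) & c):
            return False
    return True
```

The first two identities follow from `a & b` being a subset of the bits of `a`, and `a` being a subset of `a | c`; the last two simply drop a repeated operand.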
Marek Olšák	83b093f95e	nir/algebraic: use is_used_once in a few iand/ior patterns shader-db with ACO: 1 shader has -4 decrease in the code size Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32449>	2024-12-03 01:24:27 +00:00
Antonino Maniscalco	2b9738ce6d	nir,zink,asahi: support passing through gl_PrimitiveID When this pass is used with Zink, gl_PrimitiveID needs to be passed through, however this is unnecessary for other drivers. Analogous to previous commit Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Fixes: `d0342e28b3` ("nir: Add helper to create passthrough GS shader") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32397>	2024-12-03 00:24:04 +00:00
Kenneth Graunke	92797c6878	nir/algebraic: Reassociate fadd into fmul in DP4-like pattern This extends the optimization from commit `09705747d7` ("nir/algebraic: Reassociate fadd into fmul in DPH-like pattern") to a chain of 4 ffmas for a DP4-style pattern. Moving the add to the other end of the sequence allows it to be fused into an FMA. fossil-db results from Alchemist: Totals: Instrs: 158544142 -> 158490516 (-0.03%); split: -0.04%, +0.00% Subgroup size: 7808912 -> 7808920 (+0.00%); split: +0.00%, -0.00% Cycle count: 17859550672 -> 17859491966 (-0.00%); split: -0.01%, +0.01% Spill count: 84652 -> 84494 (-0.19%); split: -0.37%, +0.18% Fill count: 160728 -> 160623 (-0.07%); split: -0.29%, +0.23% Scratch Memory Size: 4278272 -> 4272128 (-0.14%); split: -0.29%, +0.14% Max live registers: 32411695 -> 32409789 (-0.01%); split: -0.01%, +0.00% Max dispatch width: 5627856 -> 5627920 (+0.00%); split: +0.00%, -0.00% Non SSA regs after NIR: 185359099 -> 185307703 (-0.03%); split: -0.03%, +0.00% Totals from 16378 (2.56% of 640872) affected shaders: Instrs: 9818723 -> 9765097 (-0.55%); split: -0.58%, +0.04% Subgroup size: 194056 -> 194064 (+0.00%); split: +0.01%, -0.01% Cycle count: 294967108 -> 294908402 (-0.02%); split: -0.58%, +0.56% Spill count: 10088 -> 9930 (-1.57%); split: -3.09%, +1.53% Fill count: 24738 -> 24633 (-0.42%); split: -1.90%, +1.48% Scratch Memory Size: 439296 -> 433152 (-1.40%); split: -2.80%, +1.40% Max live registers: 1297204 -> 1295298 (-0.15%); split: -0.22%, +0.07% Max dispatch width: 133232 -> 133296 (+0.05%); split: +0.14%, -0.10% Non SSA regs after NIR: 11999084 -> 11947688 (-0.43%); split: -0.43%, +0.00% Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32197>	2024-12-02 13:15:16 +00:00
Rhys Perry	9f3607de76	nir/tests: fix SSA dominance in opt_if_merge tests It isn't necessary for these ALU instructions to be used in the next IF. Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Fixes: `c437f2e79c` ("nir/tests: Add tests for opt_if_merge") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12211 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32391>	2024-12-02 09:38:22 +00:00
Timothy Arceri	6ca81adffc	nir: allow loops with unknown induction var initialiser to unroll If the condition of the loop terminator is based on an unsigned value we can in some cases find the max number of possible loop trips. With the max loop trips known a complex unroll can unroll the loop. For example: uniform uint x; uint i = x; while (true) { if (i >= 4) break; i += 6; } The above loop can be unrolled even though we don't know the initial value of the induction variable because it can have at most 1 iteration. There were no changes with my shader-db collection. Change was inspired by MR #31312 where builtin shader code failed to unroll. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31701>	2024-12-02 11:44:33 +11:00
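The example loop in `6ca81adffc` can be simulated to confirm the "at most 1 iteration" claim: any starting value below 4 becomes at least 6 after one step, and any starting value of 4 or more exits immediately. The sketch below models the 32-bit unsigned induction variable in plain Python. The function name and the default parameters are hypothetical and only mirror the constants in the commit's example.

```python
UINT32_MASK = 0xFFFFFFFF


def trip_count(x, limit=4, step=6):
    """Count loop-body executions of: i = x; while (true) { if (i >= limit) break; i += step; }"""
    i = x & UINT32_MASK
    trips = 0
    while True:
        if i >= limit:
            break
        i = (i + step) & UINT32_MASK  # unsigned 32-bit wraparound
        trips += 1
    return trips


def max_trips(samples):
    return max(trip_count(x) for x in samples)
```

Wraparound never comes into play for this loop, since the increment is only reached when `i` is below 4, which keeps the bound independent of the unknown initial value.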
Dave Airlie	fcaf0f2590	vulkan: update to 302 headers for av1 encode Some of the spirv AMDX stuff probably broke things, but it should still build. Acked-by: Konstantin Seurer <konstantin.seurer@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32401>	2024-12-02 06:29:00 +10:00
Job Noorman	d5d0628728	nir/lower_subgroups: add option to only lower clustered rotates On ir3, we have native support for full rotates but not for clustered ones. Signed-off-by: Job Noorman <jnoorman@igalia.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31731>	2024-11-29 16:22:48 +00:00
Job Noorman	5dbd2b08f4	nir/lower_subgroups: disable boolean reduce when not supported lower_boolean_reduce only supports ballot_components == 1. Fall back to lower_scan_reduce when this is not the case. Signed-off-by: Job Noorman <jnoorman@igalia.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31731>	2024-11-29 16:22:48 +00:00
Job Noorman	493f7b8084	nir/lower_subgroups: add extra filter data to options It might be convenient for filter implementations to have access to extra information. This will be used, for example, by ir3 to access compiler features. Signed-off-by: Job Noorman <jnoorman@igalia.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31731>	2024-11-29 16:22:48 +00:00
Job Noorman	e6c63a88fb	nir: add read_getlast_ir3 intrinsic Like read_first_invocation but using getlast. Note that I intentionally used the name of the ir3 instruction in the name as its semantics are tricky to exactly describe otherwise. Signed-off-by: Job Noorman <jnoorman@igalia.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31731>	2024-11-29 16:22:47 +00:00
Job Noorman	60e1615ced	nir/lower_subgroups: support unknown subgroup size Some targets (e.g., ir3) don't always know the exact subgroup size. Calculate the maximum subgroup size in that case by multiplying ballot_components and ballot_bit_size. Signed-off-by: Job Noorman <jnoorman@igalia.com> Reviewed-by: Connor Abbott <cwabbott0@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31731>	2024-11-29 16:22:47 +00:00
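The fallback described in `60e1615ced` is a single multiplication: when the exact subgroup size is unknown, the upper bound is the number of bits a ballot can hold. A minimal sketch follows. The function signature, the use of `None` for "unknown", and the example values are hypothetical; only the names `ballot_components` and `ballot_bit_size` come from the commit message.

```python
def max_subgroup_size(ballot_components, ballot_bit_size, subgroup_size=None):
    """Return the known subgroup size, or the ballot capacity when it is unknown.

    `subgroup_size=None` is a hypothetical way to model "not known at compile time".
    """
    if subgroup_size is not None:
        return subgroup_size
    return ballot_components * ballot_bit_size


# Illustrative values only: a 4-component ballot of 32-bit words can describe
# at most 128 invocations.
example_bound = max_subgroup_size(ballot_components=4, ballot_bit_size=32)
```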
Timothy Arceri	05d2fe2372	glsl: remove glsl/program.h It is now unused. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32402>	2024-11-29 14:31:30 +11:00
Timothy Arceri	8142797721	glsl: move _mesa_glsl_compile_shader() declaration The function is in glsl_parser_extras.cpp so move the declaration to glsl_parser_extras.h Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32402>	2024-11-29 14:30:03 +11:00
Alyssa Rosenzweig	f4a3ba5302	asahi,vtn: precompile kernels switch libagx to the precompilation pipeline. see the big comment in the previous commit for why we're doing this. while doing so, we move some dispatch stuff. there was so much churn from precompile that this avoids doing the churn twice. that new header will be used for DGC down the road. there's also a small vtn/bindgen patch in here to skip bindgen'ing entrypoints, as that conflicts with the new dispatch macros. this is the sane behaviour, we just need to do the full precomp switch across the tree at once. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32339>	2024-11-28 17:34:12 +00:00
Alyssa Rosenzweig	e3001352ad	nir: add helpers for precompiled shaders v2: generalize function signatures. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com> [v1] Reviewed-by: Mary Guillemard <mary.guillemard@collabora.com> [v1] Acked-by: Mary Guillemard <mary.guillemard@collabora.com> [v2] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32339>	2024-11-28 17:34:12 +00:00


10031 Commits