AlexIndustrial/mesa

Author	SHA1	Message	Date
Lionel Landwerlin	255d1e883d	nir/divergence: fix handling of intel uniform block load Those are normally uniform always, but for the purpose of fused threads handling, we need to check their sources. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `ca1533cd03` ("nir/divergence: add a new mode to cover fused threads on Intel HW") Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37929>	2025-10-21 06:13:10 +00:00
Emma Anholt	0781edc30f	nir/copy_prop_vars: Mask out no-op writes to variables. The pass previously supported removing complete no-op writes, but we can do better by noticing if any channel being written is the current channel's value and masking off those writes. I noticed this happening in Stray's fragment shader, where it looked like some translation layer had turned var[x].zw = vec2(a, b) into var[x] = vec4(var[x].x, var[x].y, a, b). This in turn lets nir_shrink_vec_array_vars be more effective. Totals: MaxWaves: 22158876 -> 22156696 (-0.01%); split: +0.00%, -0.01% Instrs: 401167243 -> 401007996 (-0.04%); split: -0.04%, +0.00% CodeSize: 1004397302 -> 1004133728 (-0.03%); split: -0.03%, +0.00% STPs: 369810 -> 234618 (-36.56%) LDPs: 209430 -> 172011 (-17.87%) Totals from 1884 (0.12% of 1560230) affected shaders: MaxWaves: 12686 -> 10506 (-17.18%); split: +6.97%, -24.15% Instrs: 2099486 -> 1940239 (-7.59%); split: -7.64%, +0.06% CodeSize: 4570472 -> 4306898 (-5.77%); split: -5.81%, +0.05% NOPs: 334399 -> 270881 (-18.99%); split: -20.58%, +1.58% MOVs: 131003 -> 148034 (+13.00%); split: -11.59%, +24.59% COVs: 14512 -> 16921 (+16.60%); split: -0.23%, +16.83% Full: 58120 -> 72399 (+24.57%); split: -6.75%, +31.31% (ss): 79215 -> 45331 (-42.77%); split: -48.46%, +5.68% (sy): 33081 -> 11119 (-66.39%); split: -66.56%, +0.18% (ss)-stall: 302152 -> 115528 (-61.76%); split: -64.34%, +2.57% (sy)-stall: 2706110 -> 498998 (-81.56%); split: -81.68%, +0.12% STPs: 212045 -> 76853 (-63.76%) LDPs: 47337 -> 9918 (-79.05%) Preamble Instrs: 413954 -> 413630 (-0.08%); split: -0.21%, +0.13% Cat0: 370362 -> 306844 (-17.15%); split: -18.58%, +1.43% Cat1: 145629 -> 165003 (+13.30%); split: -10.51%, +23.81% Cat2: 687947 -> 683992 (-0.57%); split: -0.61%, +0.04% Cat3: 362919 -> 360690 (-0.61%); split: -0.72%, +0.11% Cat6: 461411 -> 352375 (-23.63%) Cat7: 16857 -> 16974 (+0.69%); split: -0.35%, +1.04% Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37313>	2025-10-20 19:24:45 +00:00
Emma Anholt	537cc4e0ff	nir/shrink_stores: Don't shrink stores to an invalid num_components. Avoids a regression in the CL CTS on the next commit. Fixes: `2dba7e6056` ("nir: split nir_opt_shrink_stores from nir_opt_shrink_vectors") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37313>	2025-10-20 19:24:45 +00:00
Emma Anholt	d8690f9c60	nir/link_opt_varyings: Make it participate in NIR_DEBUG=print. It's a pass with major effects on shaders, and it's otherwise weird to see your varying disappear between two passes that shouldn't affect them. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37313>	2025-10-20 19:24:45 +00:00
Aitor Camacho	f711c3afed	nir: Add KosmicKrisp required utilities Reviewed-by: Alyssa Anne Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37520>	2025-10-20 16:22:00 +00:00
Job Noorman	ad421cdf2e	nir: mark fneg distribution through fadd/ffma as nsz `df1876f615` ("nir: Mark negative re-distribution on fadd as imprecise") fixed the fadd case by marking it as imprecise. This commit fixes the ffma case for the same reason. However, "imprecise" isn't necessary and nowadays we have "nsz" which is more accurate here. Use that for both fadd and ffma. Signed-off-by: Job Noorman <jnoorman@igalia.com> Fixes: `62795475e8` ("nir/algebraic: Distribute source modifiers into instructions") Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37930>	2025-10-17 08:58:59 +00:00
Mary Guillemard	6f73533094	asahi,nir: Stop relying on zero and scratch page in GS/TESS code Introduce new NIR intrinsics to handle getting a "sink" read-only address and another intrinsic to handle conversion of address to read-write (allowing implementation to replace the "sink" read-only with another address like required for Asahi) Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37914>	2025-10-16 19:25:35 +00:00
Mary Guillemard	71c4943c37	compiler: rename vs.tes_agx bit to vs.tes_poly Preparing to move AGX's GS/TESS lowering code. Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37914>	2025-10-16 19:25:35 +00:00
Mary Guillemard	1e0c18d6cf	nir: Rename stat_query_address_agx to stat_query_address_poly This is used by the geometry lowering that we are going to move to common code. Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37914>	2025-10-16 19:25:35 +00:00
Alyssa Rosenzweig	84d8e6824b	treewide: don't check before free This was something that came up in the slop MR. Not sure it's actually a good idea or not but kind of curious what people think, given we have a sound tool (Coccinelle) to do the transform. Saves a redundant branch but means extra noninlined function calls.. likely no actual perf impact but saves some code. Via Coccinelle patches: @@ expression ptr; @@ -if (ptr) { -free(ptr); -} +free(ptr); @@ expression ptr; @@ -if (ptr) { -FREE(ptr); -} +FREE(ptr); @@ expression ptr; @@ -if (ptr) { -ralloc_free(ptr); -} +ralloc_free(ptr); @@ expression ptr; @@ -if (ptr != NULL) { -free(ptr); -} - +free(ptr); @@ expression ptr; @@ -if (ptr != NULL) { -FREE(ptr); -} - +FREE(ptr); @@ expression ptr; @@ -if (ptr != NULL) { -ralloc_free(ptr); -} - +ralloc_free(ptr); Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> [v3d] Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org> [venus] Reviewed-by: Frank Binns <frank.binns@imgtec.com> [powervr] Reviewed-by: Janne Grunau <j@jannau.net> [asahi] Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> [radv] Reviewed-by: Job Noorman <jnoorman@igalia.com> [ir3] Acked-by: Marek Olšák <maraeo@gmail.com> Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Acked-by: Job Noorman <jnoorman@igalia.com> Acked-by: Yonggang Luo <luoyonggang@gmail.com> Acked-by: Christian Gmeiner <cgmeiner@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37892>	2025-10-15 23:01:33 +00:00
Dave Airlie	543c9be87a	nir/coopmat: fix non square load/store lowering for flexible dimensions This shouldn't affect radv, but we should do the calculations correctly for when non-square matters. Reviewed-by: Karol Herbst <kherbst@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37879>	2025-10-16 07:19:28 +10:00
Timothy Arceri	45a8cce5d6	util/range_remap: switch to using sorted array Here we switch to using a sorted arrays for the binary search which significantly speeds up the lookups. For a shader with ~8000 uniforms its up to 10x faster and the godot-tps-gles3-high.trace in issue #13894 returns to its original runtime length before we switched to using range remap in `e052254066` Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37754>	2025-10-15 02:16:36 +00:00
Timothy Arceri	9536de6eac	util/range_remap: split list node from range entry This will allow us to create an array of entries without the node member in a following patch. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37754>	2025-10-15 02:16:34 +00:00
Timothy Arceri	754c6d0f54	glsl/util: update util_range_remap to use range_remap struct This will allow us to use the linked list for validation of inserts during linking then switch to using an array for fast binary searches. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37754>	2025-10-15 02:16:34 +00:00
Timothy Arceri	35d5e903ac	util/range_remap: dont overwrite entry if ptr is NULL If the ptr value is NULL and we match an existing entry during an insert call we now just return the existing value. This allows us to drop an extra lookup, this will become important as the following patches change `util_range_remap()` lookups to use a sorted array that is not created until after we have added all entries to our linked list. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37754>	2025-10-15 02:16:33 +00:00
Faith Ekstrand	f91f86dded	compiler/rust: Implement dfs() non-recursively Recursion in Rust can be annoyingly expensive. The recursive DFS implementation we have now is probably not terrible but these things bloat in debug builds more than you'd expect. This replaceas it with a non-recursive implementation which uses a Vec instead. Reviewed-by: Mel Henning <mhenning@darkrefraction.com> Reviewed-by: Lorenzo Rossi <git@rossilorenzo.dev> Reviewed-by: Mary Guillemard <mary@mary.zone> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37536>	2025-10-14 22:47:40 +00:00
Faith Ekstrand	97febfde28	compiler/rust/cfg: Use DepthFirstSearch for finding reaches sets Reviewed-by: Lorenzo Rossi <git@rossilorenzo.dev> Reviewed-by: Mary Guillemard <mary@mary.zone> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37536>	2025-10-14 22:47:40 +00:00
Faith Ekstrand	76bba3ffba	compiler/rust/cfg: Use DepthFirstSearch for find_back_edges() Reviewed-by: Lorenzo Rossi <git@rossilorenzo.dev> Reviewed-by: Mary Guillemard <mary@mary.zone> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37536>	2025-10-14 22:47:40 +00:00
Faith Ekstrand	030005a772	compiler/rust/cfg: Use DepthFirstSearch for calc_dominance() Reviewed-by: Lorenzo Rossi <git@rossilorenzo.dev> Reviewed-by: Mary Guillemard <mary@mary.zone> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37536>	2025-10-14 22:47:40 +00:00
Faith Ekstrand	7f73a027ae	compiler/rust/cfg: Use DepthFirstSearch for rev_post_order_sort() Reviewed-by: Lorenzo Rossi <git@rossilorenzo.dev> Reviewed-by: Mary Guillemard <mary@mary.zone> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37536>	2025-10-14 22:47:39 +00:00
Faith Ekstrand	993f5c9e30	compiler/rust: Add a DepthFirstSearch trait There are four depth first searches in cfg.rs. This adds a DFS abstraction which we can later optimize. Reviewed-by: Mel Henning <mhenning@darkrefraction.com> Reviewed-by: Lorenzo Rossi <git@rossilorenzo.dev> Reviewed-by: Mary Guillemard <mary@mary.zone> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37536>	2025-10-14 22:47:38 +00:00
Daniel Schürmann	fad10b91a6	nir/divergence: don't assume that load_sample_positions_amd is always uniform Sample positions aren't uniform when the sample id is divergent. This was a regression when we started lowering fragment shader barycentrics in NIR. Fixes: `7f444fc72c` ("nir: add nir_intrinsic_load_sample_positions_amd") Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37843>	2025-10-14 16:33:10 +00:00
Job Noorman	0b82b803d9	nir,ir3: rename umul_low to umul_16x16 This is more in line with similar opcodes like umul_32x16. Also change its const expr: the masking based on bit size was unnecessary as it is only defined for 32 bits. Use simple casts instead. Signed-off-by: Job Noorman <jnoorman@igalia.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37863>	2025-10-14 12:54:54 +00:00
Georg Lehmann	92d670021a	nir/opt_intrinsics: optimize atomics to atomic load/store Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37822>	2025-10-14 06:24:17 +00:00
Georg Lehmann	e98b218ddd	nir/opt_intrinsics: don't pass nir options around Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37822>	2025-10-14 06:24:17 +00:00
Georg Lehmann	142079759b	nir/opt_uniform_atomics: optimize xchg with uniform address and data Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37822>	2025-10-14 06:24:16 +00:00
Georg Lehmann	e08911dff4	nir: fix nir_get_io_offset_src for global_atomic_swap_amd Fixes: `354df09c88` ("nir: add global_amd to nir_get_io_offset_src/nir_get_io_index_src") Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37821>	2025-10-13 09:41:41 +00:00
Ian Romanick	1e691e68e2	nir/algebraic: Optimize bfi with odd-valued mask to bitfield_select shader-db: Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown) total instructions in shared programs: 17181254 -> 17181046 (<.01%) instructions in affected programs: 35834 -> 35626 (-0.58%) helped: 130 / HURT: 2 total cycles in shared programs: 888543370 -> 888554248 (<.01%) cycles in affected programs: 7443984 -> 7454862 (0.15%) helped: 95 / HURT: 87 fossil-db: Lunar Lake Totals: Instrs: 233260196 -> 233259474 (-0.00%); split: -0.00%, +0.00% Cycle count: 32754567116 -> 32754515890 (-0.00%); split: -0.00%, +0.00% Max live registers: 71738442 -> 71738398 (-0.00%); split: -0.00%, +0.00% Totals from 6842 (0.87% of 790721) affected shaders: Instrs: 5566926 -> 5566204 (-0.01%); split: -0.01%, +0.00% Cycle count: 512487046 -> 512435820 (-0.01%); split: -0.20%, +0.19% Max live registers: 1100656 -> 1100612 (-0.00%); split: -0.00%, +0.00% Meteor Lake and DG2 had similar results. (Meteor Lake shown) Totals: Instrs: 264071212 -> 264066944 (-0.00%); split: -0.00%, +0.00% Cycle count: 26552458051 -> 26553286277 (+0.00%); split: -0.00%, +0.01% Spill count: 530380 -> 530084 (-0.06%) Fill count: 613416 -> 612900 (-0.08%) Scratch Memory Size: 20089856 -> 20075520 (-0.07%) Max live registers: 46558852 -> 46558811 (-0.00%); split: -0.00%, +0.00% Max dispatch width: 8034616 -> 8034584 (-0.00%) Totals from 6653 (0.73% of 905545) affected shaders: Instrs: 5750844 -> 5746576 (-0.07%); split: -0.08%, +0.00% Cycle count: 416414845 -> 417243071 (+0.20%); split: -0.20%, +0.40% Spill count: 1953 -> 1657 (-15.16%) Fill count: 3556 -> 3040 (-14.51%) Scratch Memory Size: 92160 -> 77824 (-15.56%) Max live registers: 566003 -> 565962 (-0.01%); split: -0.01%, +0.00% Max dispatch width: 55768 -> 55736 (-0.06%) No shader-db or fossil-db changes on any previous Intel platforms. Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:11 +00:00
Ian Romanick	f7939f2fdc	nir/range_analysis: Handle bfi and bitfield_select in get_alu_uub I noticed some things related to this while implementing support for bitfield_select / BFN in BRW. shader-db: Lunar Lake total instructions in shared programs: 17183140 -> 17183128 (<.01%) instructions in affected programs: 3830 -> 3818 (-0.31%) helped: 6 / HURT: 0 total cycles in shared programs: 889936934 -> 889936056 (<.01%) cycles in affected programs: 253758 -> 252880 (-0.35%) helped: 4 / HURT: 2 No shader-db changes on any other Intel platform. fossil-db: Lunar Lake Totals: Instrs: 233285343 -> 233284796 (-0.00%); split: -0.00%, +0.00% Cycle count: 32756777978 -> 32756399804 (-0.00%); split: -0.00%, +0.00% Max live registers: 71738646 -> 71738626 (-0.00%) Non SSA regs after NIR: 67837900 -> 67837902 (+0.00%) Totals from 177 (0.02% of 790723) affected shaders: Instrs: 389849 -> 389302 (-0.14%); split: -0.14%, +0.00% Cycle count: 356341872 -> 355963698 (-0.11%); split: -0.11%, +0.01% Max live registers: 39364 -> 39344 (-0.05%) Non SSA regs after NIR: 70453 -> 70455 (+0.00%) Meteor Lake, DG2, and Ice Lake had similar results. (Meteor Lake shown) Totals: Instrs: 264095611 -> 264095358 (-0.00%) Cycle count: 26555705299 -> 26554303407 (-0.01%); split: -0.01%, +0.00% Fill count: 613233 -> 613231 (-0.00%) Totals from 123 (0.01% of 905547) affected shaders: Instrs: 334830 -> 334577 (-0.08%) Cycle count: 326531667 -> 325129775 (-0.43%); split: -0.65%, +0.22% Fill count: 4145 -> 4143 (-0.05%) Tiger Lake and Skylake had similar results. (Tiger Lake shown) Totals: Instrs: 269733849 -> 269733590 (-0.00%) Cycle count: 25240548036 -> 25241435039 (+0.00%); split: -0.00%, +0.01% Totals from 123 (0.01% of 903812) affected shaders: Instrs: 338617 -> 338358 (-0.08%) Cycle count: 326605644 -> 327492647 (+0.27%); split: -0.13%, +0.40% Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:08 +00:00
Ian Romanick	aa53735b66	nir/algebraic: Prefer bfi over bitfield_select for bitfield_insert Intel platforms will soon implement both bfi and bitfield_select. bfi is more efficient for bitfield_insert. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:08 +00:00
Ian Romanick	08ec408061	nir/algebraic: Optimize f2u of negative value to zero The eliminated SENDs are from a single app that has a bunch of fragment shaders with a sequence like: con 32 %495 = fmul! %203.i, %1 (0.000000) con 32 %496 = ffma! %203.j, %1 (0.000000), %495 con 32 %497 = ffma! %203.k, %1 (0.000000), %496 con 32 %498 = ffma! %203.l, %1 (0.000000), %497 con 32 %499 = @load_reloc_const_intel (param_idx=1, base=0) con 32 %500 = @load_reloc_const_intel (param_idx=0, base=0) con 32 %501 = f2u32 %498 con 32 %502 = umin %501, %172 (0x4) con 32 %503 = ishl %502, %172 (0x4) con 32 %504 = load_const (0x00000040 = 64) con 32 %505 = umin %503, %504 (0x40) con 32 %506 = iadd %500, %505 The `f2u` is replaced with 0, and that makes the `ffma` dot-product sequence be unused. Since it is unused, most of the preceeding block gets eliminated. A lot of instructions after the `f2u` are also eliminated by other algebraic optimizations. Most importantly, %203 is the result of a `load_ubo_uniform_block_intel` that is eliminated. No shader-db changes on any Intel platform. fossil-db: All Intel platforms had similar results. (Lunar Lake shown) Totals: Instrs: 919895603 -> 919804051 (-0.01%); split: -0.01%, +0.00% Send messages: 40892036 -> 40887569 (-0.01%) Cycle count: 99176770712 -> 99174971806 (-0.00%); split: -0.00%, +0.00% Max live registers: 190030365 -> 190030367 (+0.00%) Max dispatch width: 47415040 -> 47415024 (-0.00%) Non SSA regs after NIR: 228872538 -> 228863608 (-0.00%); split: -0.00%, +0.00% Totals from 2234 (0.11% of 1955134) affected shaders: Instrs: 1989743 -> 1898191 (-4.60%); split: -4.60%, +0.00% Send messages: 44179 -> 39712 (-10.11%) Cycle count: 25416114 -> 23617208 (-7.08%); split: -7.08%, +0.00% Max live registers: 367357 -> 367359 (+0.00%) Max dispatch width: 39184 -> 39168 (-0.04%) Non SSA regs after NIR: 471173 -> 462243 (-1.90%); split: -1.90%, +0.00% Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:08 +00:00
Ian Romanick	5667459ff1	nir/algebraic: Don't introduce undefined behavior in f2u conversion If the source -1.0 < x < 0.0, simply removing the ftrun will introduce undefined behavior. By chance of how at least Intel and NVIDIA GPUs implement f2u, this has Just Worked. No shader-db changes on any Intel platform. fossil-db: Lunar Lake Totals: Instrs: 913264354 -> 913264366 (+0.00%) Cycle count: 104953995530 -> 104953996854 (+0.00%) Max live registers: 189266026 -> 189266058 (+0.00%) Non SSA regs after NIR: 227779417 -> 227779369 (-0.00%) Totals from 24 (0.00% of 1984794) affected shaders: Instrs: 4669 -> 4681 (+0.26%) Cycle count: 50610 -> 51934 (+2.62%) Max live registers: 1222 -> 1254 (+2.62%) Non SSA regs after NIR: 1174 -> 1126 (-4.09%) Meteor Lake, DG2, Tiger Lake, and Ice Lake had similar results. (Meteor Lake shown) Totals: Instrs: 1001288026 -> 1001288038 (+0.00%) Cycle count: 92813392671 -> 92813392791 (+0.00%) Max live registers: 121935383 -> 121935399 (+0.00%) Max dispatch width: 19949928 -> 19949912 (-0.00%) Totals from 2 (0.00% of 2284670) affected shaders: Instrs: 1380 -> 1392 (+0.87%) Cycle count: 18940 -> 19060 (+0.63%) Max live registers: 136 -> 152 (+11.76%) Max dispatch width: 32 -> 16 (-50.00%) No fossil-db changes on Skylake. Suggested-by: Georg Lehmann Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:07 +00:00
Ian Romanick	4338f7d033	nir/algebraic: Remove useless ftrunc inside f2i/f2u Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:07 +00:00
Ian Romanick	c49d6e0480	nir/algebraic: Elide range clamping of f2u sources There are no shader-db changes on ELK platforms because those platforms don't support 8- or 16-bit integer types. v2: Restrict patterns generated such that the integer limits are exactly representable in the specified floating point format. With the exception of the value 0, this requires that float_sz > int_sz. This had no impact on shader-db or fossil-db on any Intel platform. Noticed by Georg. v3: Add a missing is_a_number. shader-db: All Intel platforms had similar results. (Lunar Lake shown) total cycles in shared programs: 889936056 -> 889934082 (<.01%) cycles in affected programs: 65806 -> 63832 (-3.00%) helped: 2 / HURT: 0 fossil-db: Lunar Lake Totals: Instrs: 233284796 -> 233282917 (-0.00%); split: -0.00%, +0.00% Cycle count: 32756399804 -> 32754972188 (-0.00%); split: -0.01%, +0.00% Spill count: 519861 -> 519813 (-0.01%) Fill count: 663650 -> 663626 (-0.00%); split: -0.01%, +0.01% Max live registers: 71738626 -> 71738696 (+0.00%) Non SSA regs after NIR: 67837902 -> 67837648 (-0.00%) Totals from 1236 (0.16% of 790723) affected shaders: Instrs: 2134504 -> 2132625 (-0.09%); split: -0.09%, +0.01% Cycle count: 604922278 -> 603494662 (-0.24%); split: -0.48%, +0.25% Spill count: 16509 -> 16461 (-0.29%) Fill count: 32760 -> 32736 (-0.07%); split: -0.22%, +0.15% Max live registers: 250112 -> 250182 (+0.03%) Non SSA regs after NIR: 302368 -> 302114 (-0.08%) Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown) Totals: Instrs: 264095370 -> 264094056 (-0.00%); split: -0.00%, +0.00% Cycle count: 26554146277 -> 26553027268 (-0.00%); split: -0.01%, +0.01% Spill count: 530603 -> 530615 (+0.00%) Fill count: 613231 -> 613273 (+0.01%) Max live registers: 46559041 -> 46559087 (+0.00%) Totals from 1237 (0.14% of 905547) affected shaders: Instrs: 2262517 -> 2261203 (-0.06%); split: -0.07%, +0.01% Cycle count: 518219799 -> 517100790 (-0.22%); split: -0.59%, +0.37% Spill count: 17518 -> 17530 (+0.07%) Fill count: 32273 -> 32315 (+0.13%) Max live registers: 128360 -> 128406 (+0.04%) Ice Lake and Skylake had similar results. (Ice Lake shown) Totals: Instrs: 269849640 -> 269848198 (-0.00%); split: -0.00%, +0.00% Cycle count: 26718329643 -> 26718289020 (-0.00%); split: -0.00%, +0.00% Max live registers: 46878430 -> 46878462 (+0.00%) Totals from 1233 (0.14% of 905427) affected shaders: Instrs: 2324225 -> 2322783 (-0.06%); split: -0.06%, +0.00% Cycle count: 531467501 -> 531426878 (-0.01%); split: -0.11%, +0.10% Max live registers: 130782 -> 130814 (+0.02%) Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:07 +00:00
Ian Romanick	986086c846	nir: Add saturating float to integer conversion opcodes v2: Add a comment around has_f2[ui]_sat explaining which opcodes it enables. Suggested by Georg. Cast u_uintN_max and friends to double in nir_opcodes.py. This ensures that an exact conversion is made. Eliminate duplicate conversions from half float to double. Both noticed by Georg. v3: Apply "NaN should be zero" fix suggested by Georg. Co-authored-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:05 +00:00
Lionel Landwerlin	301b71a19f	compiler: add an access flag for intel EU fusion Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Alyssa Anne Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37394>	2025-10-10 11:19:39 +00:00
Lionel Landwerlin	c7ac46a1d8	nir/lower_io: add get_io_index_src_number support for image intrinsics Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Alyssa Anne Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37394>	2025-10-10 11:19:39 +00:00
Lionel Landwerlin	ca1533cd03	nir/divergence: add a new mode to cover fused threads on Intel HW The Intel Gfx12.x generation of GPU has an architecture feature called EU fusion in which 2 subgroups run lock step. A typical case where this happens is a compute shader with 1x1x1 local workgroup size and a dispatch command of 2x1x1. In that case 2 threads will be run in lock step for each of the workgroup. This has been the sources of some troubles in the backend because one subgroup can run with all lanes disabled, requiring care for SEND messages using the NoMask flag (execution regardless of the lane mask). We found out that other things are happening when 2 subgroups run together : - the HW will use the surface/sampler handle from only one subgroup - the HW will use the sampler header from only one subgroup So one of the fused subgroup can access the wrong surface/sampler if the value is different between the 2 subgroups and that can happen even with subgroup uniform values. Fortunately we can flag SEND instructions to disable the fusion behavior (most likely at a performance cost). This change introduce a new divergence mode that tries to compute things divergent between subgroups so that we can flag instructions accordingly. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37394>	2025-10-10 11:19:39 +00:00
Simon Perretta	79923115e7	nir/unlower_io_to_vars: keep io bases intact when keeping intrinsics nir_recompute_io_bases will modify i/o intrinsics, which is not the expected behaviour when the keep_intrinsics flag is set. Fixes: `83aecc8f3f` ("mesa/st, nir: commonize unlower_io_to_vars pass") Signed-off-by: Simon Perretta <simon.perretta@imgtec.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37725>	2025-10-10 11:53:24 +01:00
Juan A. Suarez Romero	9f45f09b86	glsl: use array element type to validate assignment When comparing an vec3 and a vec4 array, scalar type is the same for both (float). Instead use the array element type to compare (that is, vec3 vs vec4). Fixes spec@glsl-1.20@compiler@invalid-vec4-array-to-vec3-array-conversion.vert piglit test. Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Marek Olšák <maraeo@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37783>	2025-10-10 09:19:55 +00:00
Job Noorman	6d59a3e3e7	nir/lower_alu: use Knuth's Algorithm M for [iu]mul_high This significantly simplifies the handling of signed numbers as the same code path can handle signed and unsigned numbers by simply using ishr instead of ushr for some of the shifts. For both cases, the number of additions and shifts are also reduced. Note that LLVM uses the same algorithm. fossil-db stats for Turnip: Totals from 4849 (2.94% of 164705) affected shaders: MaxWaves: 52318 -> 52332 (+0.03%); split: +0.04%, -0.02% Instrs: 5262458 -> 5218922 (-0.83%); split: -0.87%, +0.05% CodeSize: 10831900 -> 10655170 (-1.63%); split: -1.64%, +0.01% NOPs: 829481 -> 836010 (+0.79%); split: -0.95%, +1.74% MOVs: 176187 -> 173788 (-1.36%); split: -3.27%, +1.91% COVs: 104096 -> 86543 (-16.86%); split: -16.87%, +0.01% Full: 90434 -> 90158 (-0.31%); split: -0.33%, +0.03% (ss): 131091 -> 130866 (-0.17%); split: -0.87%, +0.70% (sy): 55550 -> 55769 (+0.39%); split: -0.92%, +1.32% (ss)-stall: 406003 -> 407194 (+0.29%); split: -1.10%, +1.39% (sy)-stall: 1668213 -> 1678082 (+0.59%); split: -1.31%, +1.90% Preamble Instrs: 1105270 -> 1067290 (-3.44%); split: -3.50%, +0.06% Constlen: 423776 -> 423560 (-0.05%) Last helper: 1038202 -> 1035540 (-0.26%); split: -0.42%, +0.16% Last baryf: 38908 -> 38632 (-0.71%) Subgroup size: 336640 -> 336832 (+0.06%) Cat0: 916209 -> 922848 (+0.72%); split: -0.87%, +1.59% Cat1: 282813 -> 262845 (-7.06%); split: -7.49%, +0.43% Cat2: 2198715 -> 2183012 (-0.71%); split: -0.72%, +0.01% Cat3: 1390914 -> 1376421 (-1.04%) Cat7: 123127 -> 123116 (-0.01%); split: -0.24%, +0.23% Signed-off-by: Job Noorman <jnoorman@igalia.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37793>	2025-10-10 05:31:17 +00:00
Job Noorman	18f69890d1	nir: add nir_shr builder Sometimes we need to select between ishr/ushr based some condition; this builder makes this less verbose. Signed-off-by: Job Noorman <jnoorman@igalia.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37793>	2025-10-10 05:31:17 +00:00
Olivia Lee	d600cd4c91	vtn_bindgen2: use anonymous namespace to avoid name collisions All vtn_bindgen2-generated files use the same 'vtn_bindgen_dummy' struct name. When linking more than one file (like in panfrost), the constructor and destructor symbols collide and every instance ends up running the same initialization. In panfrost, this results in us dropping any printf format strings that don't occur in v6. Signed-off-by: Olivia Lee <olivia.lee@collabora.com> Fixes: `b7447a94c8` ("vtn: add vtn_bindgen2 tool") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37798>	2025-10-09 19:18:58 +00:00
Emma Anholt	d01aae2fb1	nir: Add a shader bisect tool. When you're trying to figure out what shader some NIR pass broke, use nir_shader_bisect_select() to decide between NIR pass behaviors, and then nir_shader_bisect.py will help you automatically bisect down to which source_blake3 is at fault. Once it's identified, it prints you a C call you can use for selecting that shader specifically, which you can use for continuing on in your debugging. On a test I was looking at, this took 10 steps to bisect 134 shaders down to the source_blake3 of the NIR shader in question. This idea is heavily lifted from Job Noorman's ir3_shader_bisect. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37468>	2025-10-09 17:56:30 +00:00
Alyssa Rosenzweig	c1d75c6e51	treewide: use BITSET_CALLOC Via Coccinelle patch: @@ expression count; type T; @@ -calloc(BITSET_WORDS(count), sizeof(T)) +BITSET_CALLOC(count) @@ expression count; type T; @@ -calloc(sizeof(T), BITSET_WORDS(count)) +BITSET_CALLOC(count) Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37779>	2025-10-09 12:29:55 +00:00
Romaric Jodin	cb86341829	meson: remove '--outdir' argument in script Usage of '--outdir' argument in python scripts makes it very complicated for tools like ninja-to-soong to generate the Android equivalent build file. This is because the option is less clear on what will be generated. Instead, change it for '--out' where we give the full path of the file to generate. This has the good point of deduplicating the locations of the file name to have it only in 'meson.build'. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37741>	2025-10-08 20:51:20 +00:00
Mike Blumenkrantz	8f0ac427b3	glsl: fix gl_ViewID_OVR type to uint the spec defines this as a uint, and having it as an int breaks anyone trying to actually use it cc: mesa-stable Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37766>	2025-10-08 16:58:52 +00:00
Marek Olšák	3fe651f607	nir: remove load_smem_amd replaced by load_global_amd + ACCESS_SMEM_AMD Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36936>	2025-10-08 08:54:11 +00:00
Daniel Schürmann	7593667b0a	nir/divergence_analysis: check ACCESS_SMEM_AMD Revert "nir/divergence: make smem load_global_amd uniform" This reverts commit `2d0f93631c`. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36936>	2025-10-08 08:53:55 +00:00
Daniel Schürmann	cacb390ec9	nir/load_store_vectorize: Fix parsing offsets through u2u64 Fixes: `cfba417316` ('nir/load_store_vectorize: optimize accesses with u2u64(ishl.nuw(iadd))') Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36936>	2025-10-08 08:53:51 +00:00

1 2 3 4 5 ...

11248 Commits