AlexIndustrial/mesa

Author	SHA1	Message	Date
Simon Perretta	e7c409cd29	pvr: amend num temps calculation when wg_size is not provided Fixes: `7a32dc673b` ("pvr: add device info and functions for calculating ava...") Signed-off-by: Simon Perretta <simon.perretta@imgtec.com> Acked-by: Frank Binns <frank.binns@imgtec.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37724>	2025-10-11 20:28:16 +01:00
Simon Perretta	1c1bc876fb	pvr: amend tile buffer size calculation for eot Fixes: `a67120cda3` ("pvr, pco: full support for tile buffer eot handling") Signed-off-by: Simon Perretta <simon.perretta@imgtec.com> Acked-by: Frank Binns <frank.binns@imgtec.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37724>	2025-10-11 20:28:16 +01:00
Simon Perretta	b0609a30b1	pco: improve early and late algebraic pass ordering Ensures early algebraic passes aren't called again following late algebraic passes, so that the latter's opts aren't undone (e.g. unfusing ffmas). Signed-off-by: Simon Perretta <simon.perretta@imgtec.com> Acked-by: Frank Binns <frank.binns@imgtec.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37724>	2025-10-11 20:28:16 +01:00
Simon Perretta	e637d01ef2	pco: tidy and commonize conversion ops Signed-off-by: Simon Perretta <simon.perretta@imgtec.com> Acked-by: Frank Binns <frank.binns@imgtec.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37724>	2025-10-11 20:28:16 +01:00
Simon Perretta	34b4b35ca8	pco: apply rounding mode to relevant conversion ops The rounding behaviour on [iu]2f32 ops needs to be explicitly set in order to match the implicit behaviour described in the KHR_shader_float_controls properties. Fixes: `e306abc6e6` ("pvr: implement KHR_shader_float_controls") Signed-off-by: Simon Perretta <simon.perretta@imgtec.com> Acked-by: Frank Binns <frank.binns@imgtec.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37724>	2025-10-11 20:28:16 +01:00
Mel Henning	a89ab2993a	nvk: Reduce subc switches with events Reviewed-by: Mary Guillemard <mary@mary.zone> Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37671>	2025-10-11 16:58:24 +00:00
Mel Henning	a3ed200300	nvk/cmd_copy: Pipeline user copy_rect operations Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37671>	2025-10-11 16:58:24 +00:00
Mel Henning	e9432eb3e0	nvk/cmd_copy: Use PIPELINED for user transfers Vulkan requires applications to insert any necessary pipeline barriers. Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37671>	2025-10-11 16:58:24 +00:00
Mel Henning	08861bad46	nvk: WFI on the most recent subc This should be a bit faster. It also matches what the proprietary driver generates, based on the reverse engineering done here: https://gitlab.freedesktop.org/mhenning/re/-/tree/main/vk_test_overlap_exec Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37671>	2025-10-11 16:58:24 +00:00
Mel Henning	8447dba5b3	nvk: INVALIDATE_SHADER_CACHES on most recent subc This should be a bit faster. Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37671>	2025-10-11 16:58:24 +00:00
Mohamed Ahmed	7a0e7d24bb	nvk: Use the compute MME for compute dispatch Switching from compute to 3D and vice versa leads to a long stall which destroys compute performance. This switches to the compute MME on Ampere onwards (which was where it was added) for compute dispatches which eliminates stalling from sub-channel switching in these cases. Reviewed-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37671>	2025-10-11 16:58:24 +00:00
Mohamed Ahmed	146a64524d	nouveau/mme: Add unit tests for sharing between compute and 3D scratch registers Co-developed-by: Mary Guillemard <mary@mary.zone> Reviewed-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Tested-by: Mary Guillemard <mary@mary.zone> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37671>	2025-10-11 16:58:24 +00:00
Faith Ekstrand	0bfe27553d	nvk: Actually reserve 1/2 for FALCON In `03f785083f` ("nvk: Reserve MME scratch area for communicating with FALCON"), we said we reserved these but actually only reserved 0. Only 0 is actually used today but if we're going to claim to reserve registers we should actually do it. Reviewed-by: Mary Guillemard <mary@mary.zone> Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37671>	2025-10-11 16:58:24 +00:00
Mohamed Ahmed	17ab1d463f	nouveau/headers: Add AMPERE_B compute subchannel definition Reviewed-by: Karol Herbst <kherbst@redhat.com> Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Reviewed-by: Mary Guillemard <mary@mary.zone> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37671>	2025-10-11 16:58:24 +00:00
Mel Henning	0e3781df7f	vulkan: Drop vk_pipeline_stage_flags2_has_*_shader These are no longer used anywhere. Moreover, it's not clear that they can be used for a correct implementation of pipeline barriers since a correct implementation cannot ignore execution deps in non-shader stages. Reviewed-by: Mary Guillemard <mary@mary.zone> Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37671>	2025-10-11 16:58:24 +00:00
Mel Henning	2eeef34e35	nvk/cmd_buffer: Remove redundant tests for access In each of these cases, the spec mandates that apps pair a memory barrier specified with access with a relevant exec barrier specified by stages. We therefore don't need to wfi based on access - the tests on stage are sufficient. Acked-by: Mary Guillemard <mary@mary.zone> Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37671>	2025-10-11 16:58:24 +00:00
Mel Henning	515793d5bb	nvk: Fix execution deps in pipeline barriers We were under-synchronizing before. In particular, `stages` form execution barriers even in the absence of a memory barrier in the `access` flags. The particular issue that prompted this was one where we weren't waiting on a pipeline barrier in Baldur's Gate 3 with: srcStageMask == VK_PIPELINE_STAGE_2_FRAGMENT_SHADER_BIT && srcAccessMask == VK_ACCESS_2_SHADER_READ_BIT && dstStageMask == (VK_PIPELINE_STAGE_2_EARLY_FRAGMENT_TESTS_BIT \| VK_PIPELINE_STAGE_2_LATE_FRAGMENT_TESTS_BIT) && dstAccessMask == (VK_ACCESS_2_DEPTH_STENCIL_ATTACHMENT_READ_BIT \| VK_ACCESS_2_DEPTH_STENCIL_ATTACHMENT_WRITE_BIT) Based on the spec and discussion in https://github.com/KhronosGroup/Vulkan-Docs/issues/131 the read bit in srcAccessMask doesn't really matter here - what matters is that there's an execution barrier on the fragment stage which needs to prevent the fragment shader execution from overlapping with the later call's fragment tests (which write to the depth attachment). Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13909 Reviewed-by: Mary Guillemard <mary@mary.zone> Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37671>	2025-10-11 16:58:24 +00:00
Mel Henning	895bbb7601	nvk: Combine BARRIER_{COMPUTE,RENDER}_WFI When we want to WFI, we only need to do so on a single channel. The others will implicitly get a WFI from the channel switch. Reviewed-by: Mary Guillemard <mary@mary.zone> Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37671>	2025-10-11 16:58:24 +00:00
Mel Henning	6c44390e80	nvk: Only run one INVALIDATE_SHADER_CACHES This is presumably the same cache across compute and 3d, so we only need to run one of these, not two. Reviewed-by: Mary Guillemard <mary@mary.zone> Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37671>	2025-10-11 16:58:23 +00:00
Lorenzo Rossi	b56b5b90f7	nvk: Fix QMD buffer length on upload Current code allocates the maximum QMD data for all generations and uploads everything, even on generations where a smaller QMD buffer suffices. This is not only wasteful, but actually crashes Kepler GPUs due to complications with the QMD queue. Only upload the useful bytes of the QMD buffer. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14070 Fixes: `0e268dad00` ("nvk: Allow for larger QMDs") Signed-off-by: Lorenzo Rossi <git@rossilorenzo.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37815>	2025-10-11 08:20:22 +00:00
Surafel Assefa	a219308867	wsi: Implements scaling controls for DRI3 presentation. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30701>	2025-10-11 06:59:37 +00:00
Caio Oliveira	74859c19fb	intel/executor: Add a matrix multiplication example Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37805>	2025-10-11 01:02:45 +00:00
Caio Oliveira	1e0ee84841	intel/executor: Add DPAS examples for HF/F, UB/UD and BF/F Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37805>	2025-10-11 01:02:45 +00:00
Caio Oliveira	62f07dc5e3	intel/executor: Add script directory to `package.path` In Lua, modules (i.e. files with lua code) are loaded by using the standard library require(), e.g. ``` local mylib = require("mylib") mylib.do_something() ``` The require() will decide where to look by peeking at `package.path` table. By default it doesn't include the scripts directory, so running executor from the script directory vs. from the root of the repo would yield different results (require works vs. require fail to find the module). This patch includes the script directory to avoid this issue. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37805>	2025-10-11 01:02:45 +00:00
Caio Oliveira	86947062e9	intel/executor: Expose a devinfo table So we can pull other values from devinfo struct. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37805>	2025-10-11 01:02:44 +00:00
Caio Oliveira	5987269750	intel/executor: Drop check_ver and check_verx10 functions Favor explicit version checks, that can use different types of comparisons other than equality on a list. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37805>	2025-10-11 01:02:44 +00:00
Emma Anholt	f8729ee920	ir3: Use bitset range operations. This sped up the debugoptimized compile of a fossil I was looking at by 7%. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37777>	2025-10-10 23:13:04 +00:00
Emma Anholt	aa85e3331f	ir3/parser: Make sure relative accesses have a size set. This will avoid assertion failures about a size==0 in the upcoming change to regmask bitset handling, when collect_info() uses them to track references into the current alias table. We know that relative accesses won't go to the alias table, but that code doesn't. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37777>	2025-10-10 23:13:04 +00:00
Emma Anholt	30b7772ae4	ir3: Move the big block of C support code out of the parser .y file. This way you get nice syntax highlighting and clang-formatting and all that when trying to edit the C code. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37777>	2025-10-10 23:13:04 +00:00
Lionel Landwerlin	febac6d9bd	anv: fix query copy with shaders First this is only possible on RCS or CCS engines. Second if on CCS, we need to use a compute shader, 3D won't work. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37818>	2025-10-10 21:31:09 +00:00
Jesse Natalie	c2d288bf97	microsoft/compiler: Respect write masks when lowering unaligned loads and stores Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37778>	2025-10-10 19:53:15 +00:00
Jesse Natalie	b3242516ad	microsoft/compiler: Use lower_mem_access_bit_sizes for scratch/shared Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37778>	2025-10-10 19:53:15 +00:00
Emma Anholt	f7cbc7b1c5	radv: Allocate BOs as implicit sync even if the WSI is doing implicit sync. As noted, the flag we allocate with controls whether anyone can implicit sync on the BO through amdgpu interfaces, not just whether our fd does. This restores radv to the behavior before the regressing commit. Fixes: `4dcf32c56e` ("wsi/drm: Don't request implicit sync if we're doing implicit sync ourselves.") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37772>	2025-10-10 19:17:04 +00:00
Emma Anholt	38ac55ebff	radv: Restore marking WSI image's mem->buffer as uncached. Prior to `4dcf32c56e`, radv was getting a request for implicit sync, even when we were doing the work to do implicit sync in the WSI. Once that was turned off, we incidentally dropped flagging WSI's mem->buffer as uncached, due to it being under the wrong condition. Fixes: `4dcf32c56e` ("wsi/drm: Don't request implicit sync if we're doing implicit sync ourselves.") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37772>	2025-10-10 19:17:04 +00:00
Ian Romanick	ca493b5c45	brw: elk: Fix name of function in comment Trivial. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:11 +00:00
Ian Romanick	1e691e68e2	nir/algebraic: Optimize bfi with odd-valued mask to bitfield_select shader-db: Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown) total instructions in shared programs: 17181254 -> 17181046 (<.01%) instructions in affected programs: 35834 -> 35626 (-0.58%) helped: 130 / HURT: 2 total cycles in shared programs: 888543370 -> 888554248 (<.01%) cycles in affected programs: 7443984 -> 7454862 (0.15%) helped: 95 / HURT: 87 fossil-db: Lunar Lake Totals: Instrs: 233260196 -> 233259474 (-0.00%); split: -0.00%, +0.00% Cycle count: 32754567116 -> 32754515890 (-0.00%); split: -0.00%, +0.00% Max live registers: 71738442 -> 71738398 (-0.00%); split: -0.00%, +0.00% Totals from 6842 (0.87% of 790721) affected shaders: Instrs: 5566926 -> 5566204 (-0.01%); split: -0.01%, +0.00% Cycle count: 512487046 -> 512435820 (-0.01%); split: -0.20%, +0.19% Max live registers: 1100656 -> 1100612 (-0.00%); split: -0.00%, +0.00% Meteor Lake and DG2 had similar results. (Meteor Lake shown) Totals: Instrs: 264071212 -> 264066944 (-0.00%); split: -0.00%, +0.00% Cycle count: 26552458051 -> 26553286277 (+0.00%); split: -0.00%, +0.01% Spill count: 530380 -> 530084 (-0.06%) Fill count: 613416 -> 612900 (-0.08%) Scratch Memory Size: 20089856 -> 20075520 (-0.07%) Max live registers: 46558852 -> 46558811 (-0.00%); split: -0.00%, +0.00% Max dispatch width: 8034616 -> 8034584 (-0.00%) Totals from 6653 (0.73% of 905545) affected shaders: Instrs: 5750844 -> 5746576 (-0.07%); split: -0.08%, +0.00% Cycle count: 416414845 -> 417243071 (+0.20%); split: -0.20%, +0.40% Spill count: 1953 -> 1657 (-15.16%) Fill count: 3556 -> 3040 (-14.51%) Scratch Memory Size: 92160 -> 77824 (-15.56%) Max live registers: 566003 -> 565962 (-0.01%); split: -0.01%, +0.00% Max dispatch width: 55768 -> 55736 (-0.06%) No shader-db or fossil-db changes on any previous Intel platforms. Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:11 +00:00
Ian Romanick	b948e6d503	brw: Use BFN to implement nir_opt_bitfield_select shader-db: Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown) total instructions in shared programs: 17181559 -> 17181254 (<.01%) instructions in affected programs: 250921 -> 250616 (-0.12%) helped: 303 / HURT: 0 total cycles in shared programs: 888542568 -> 888543370 (<.01%) cycles in affected programs: 49861772 -> 49862574 (<.01%) helped: 181 / HURT: 110 fossil-db: Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown) Totals: Instrs: 233260591 -> 233260196 (-0.00%); split: -0.00%, +0.00% Cycle count: 32754501248 -> 32754567116 (+0.00%); split: -0.00%, +0.00% Max live registers: 71738476 -> 71738442 (-0.00%) Non SSA regs after NIR: 67837262 -> 67837108 (-0.00%); split: -0.00%, +0.00% Totals from 226 (0.03% of 790721) affected shaders: Instrs: 382227 -> 381832 (-0.10%); split: -0.15%, +0.05% Cycle count: 72863878 -> 72929746 (+0.09%); split: -0.65%, +0.74% Max live registers: 36557 -> 36523 (-0.09%) Non SSA regs after NIR: 60427 -> 60273 (-0.25%); split: -0.26%, +0.00% No shader-db or fossil-db changes on any previous Intel platforms. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:11 +00:00
Ian Romanick	4193895145	brw/cmod: Enable limited cmod propagation for BFN cmod propagation needs more work. Since the result type is always UD, BRW_CONDITION_G should be able to substitute for NZ. Either that or users of the condition could be rewritten to use an inverted condition. v2: Add a couple more unit tests. Suggested by Matt. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:11 +00:00
Ian Romanick	fb193ac190	brw/builder: Add BFN Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:10 +00:00
Ian Romanick	a947e0c4db	brw: Constant propagation and constant combining support for BFN v2: Commute immediate values out of src[1]. Reviewed-by: Matt Turner <mattst88@gmail.com> [v1] Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:10 +00:00
Ian Romanick	8a71f5e672	brw: BFN does not support source modifiers Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:10 +00:00
Ian Romanick	60c07e500d	brw: Basic validation for BFN Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:10 +00:00
Ian Romanick	d2077e24f6	brw/disasm: Pretty print the BFN equation as an annotation Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:09 +00:00
Ian Romanick	fdb01f2a5a	brw/disasm: Fix BFN disassembly of src1 and src2 The negate and abs bits of src1 and src2 are repurposed for some of the function control value bits. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:09 +00:00
Zach Battleman	ca2a067469	brw: Initial bits of BFN support v2 (idr): So much rebasing. Deleted a bunch of code that we're not going to need yet. v3 (Ken): bfn inst encoding fix v4 (idr): Add BFN to brw_get_lowered_simd_width. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:09 +00:00
Ian Romanick	f7939f2fdc	nir/range_analysis: Handle bfi and bitfield_select in get_alu_uub I noticed some things related to this while implementing support for bitfield_select / BFN in BRW. shader-db: Lunar Lake total instructions in shared programs: 17183140 -> 17183128 (<.01%) instructions in affected programs: 3830 -> 3818 (-0.31%) helped: 6 / HURT: 0 total cycles in shared programs: 889936934 -> 889936056 (<.01%) cycles in affected programs: 253758 -> 252880 (-0.35%) helped: 4 / HURT: 2 No shader-db changes on any other Intel platform. fossil-db: Lunar Lake Totals: Instrs: 233285343 -> 233284796 (-0.00%); split: -0.00%, +0.00% Cycle count: 32756777978 -> 32756399804 (-0.00%); split: -0.00%, +0.00% Max live registers: 71738646 -> 71738626 (-0.00%) Non SSA regs after NIR: 67837900 -> 67837902 (+0.00%) Totals from 177 (0.02% of 790723) affected shaders: Instrs: 389849 -> 389302 (-0.14%); split: -0.14%, +0.00% Cycle count: 356341872 -> 355963698 (-0.11%); split: -0.11%, +0.01% Max live registers: 39364 -> 39344 (-0.05%) Non SSA regs after NIR: 70453 -> 70455 (+0.00%) Meteor Lake, DG2, and Ice Lake had similar results. (Meteor Lake shown) Totals: Instrs: 264095611 -> 264095358 (-0.00%) Cycle count: 26555705299 -> 26554303407 (-0.01%); split: -0.01%, +0.00% Fill count: 613233 -> 613231 (-0.00%) Totals from 123 (0.01% of 905547) affected shaders: Instrs: 334830 -> 334577 (-0.08%) Cycle count: 326531667 -> 325129775 (-0.43%); split: -0.65%, +0.22% Fill count: 4145 -> 4143 (-0.05%) Tiger Lake and Skylake had similar results. (Tiger Lake shown) Totals: Instrs: 269733849 -> 269733590 (-0.00%) Cycle count: 25240548036 -> 25241435039 (+0.00%); split: -0.00%, +0.01% Totals from 123 (0.01% of 903812) affected shaders: Instrs: 338617 -> 338358 (-0.08%) Cycle count: 326605644 -> 327492647 (+0.27%); split: -0.13%, +0.40% Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:08 +00:00
Ian Romanick	aa53735b66	nir/algebraic: Prefer bfi over bitfield_select for bitfield_insert Intel platforms will soon implement both bfi and bitfield_select. bfi is more efficient for bitfield_insert. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:08 +00:00
Ian Romanick	08ec408061	nir/algebraic: Optimize f2u of negative value to zero The eliminated SENDs are from a single app that has a bunch of fragment shaders with a sequence like: con 32 %495 = fmul! %203.i, %1 (0.000000) con 32 %496 = ffma! %203.j, %1 (0.000000), %495 con 32 %497 = ffma! %203.k, %1 (0.000000), %496 con 32 %498 = ffma! %203.l, %1 (0.000000), %497 con 32 %499 = @load_reloc_const_intel (param_idx=1, base=0) con 32 %500 = @load_reloc_const_intel (param_idx=0, base=0) con 32 %501 = f2u32 %498 con 32 %502 = umin %501, %172 (0x4) con 32 %503 = ishl %502, %172 (0x4) con 32 %504 = load_const (0x00000040 = 64) con 32 %505 = umin %503, %504 (0x40) con 32 %506 = iadd %500, %505 The `f2u` is replaced with 0, and that makes the `ffma` dot-product sequence be unused. Since it is unused, most of the preceding block gets eliminated. A lot of instructions after the `f2u` are also eliminated by other algebraic optimizations. Most importantly, %203 is the result of a `load_ubo_uniform_block_intel` that is eliminated. No shader-db changes on any Intel platform. fossil-db: All Intel platforms had similar results. (Lunar Lake shown) Totals: Instrs: 919895603 -> 919804051 (-0.01%); split: -0.01%, +0.00% Send messages: 40892036 -> 40887569 (-0.01%) Cycle count: 99176770712 -> 99174971806 (-0.00%); split: -0.00%, +0.00% Max live registers: 190030365 -> 190030367 (+0.00%) Max dispatch width: 47415040 -> 47415024 (-0.00%) Non SSA regs after NIR: 228872538 -> 228863608 (-0.00%); split: -0.00%, +0.00% Totals from 2234 (0.11% of 1955134) affected shaders: Instrs: 1989743 -> 1898191 (-4.60%); split: -4.60%, +0.00% Send messages: 44179 -> 39712 (-10.11%) Cycle count: 25416114 -> 23617208 (-7.08%); split: -7.08%, +0.00% Max live registers: 367357 -> 367359 (+0.00%) Max dispatch width: 39184 -> 39168 (-0.04%) Non SSA regs after NIR: 471173 -> 462243 (-1.90%); split: -1.90%, +0.00% Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:08 +00:00
Ian Romanick	5667459ff1	nir/algebraic: Don't introduce undefined behavior in f2u conversion If the source -1.0 < x < 0.0, simply removing the ftrunc will introduce undefined behavior. By chance of how at least Intel and NVIDIA GPUs implement f2u, this has Just Worked. No shader-db changes on any Intel platform. fossil-db: Lunar Lake Totals: Instrs: 913264354 -> 913264366 (+0.00%) Cycle count: 104953995530 -> 104953996854 (+0.00%) Max live registers: 189266026 -> 189266058 (+0.00%) Non SSA regs after NIR: 227779417 -> 227779369 (-0.00%) Totals from 24 (0.00% of 1984794) affected shaders: Instrs: 4669 -> 4681 (+0.26%) Cycle count: 50610 -> 51934 (+2.62%) Max live registers: 1222 -> 1254 (+2.62%) Non SSA regs after NIR: 1174 -> 1126 (-4.09%) Meteor Lake, DG2, Tiger Lake, and Ice Lake had similar results. (Meteor Lake shown) Totals: Instrs: 1001288026 -> 1001288038 (+0.00%) Cycle count: 92813392671 -> 92813392791 (+0.00%) Max live registers: 121935383 -> 121935399 (+0.00%) Max dispatch width: 19949928 -> 19949912 (-0.00%) Totals from 2 (0.00% of 2284670) affected shaders: Instrs: 1380 -> 1392 (+0.87%) Cycle count: 18940 -> 19060 (+0.63%) Max live registers: 136 -> 152 (+11.76%) Max dispatch width: 32 -> 16 (-50.00%) No fossil-db changes on Skylake. Suggested-by: Georg Lehmann Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:07 +00:00
Ian Romanick	4338f7d033	nir/algebraic: Remove useless ftrunc inside f2i/f2u Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:07 +00:00
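
The reasoning behind the last two entries (5667459ff1 and 4338f7d033) can be modeled with a small Python sketch. It takes from commit 5667459ff1 only the statement that dropping the ftrunc is unsafe for sources in (-1.0, 0.0). The helper names `ftrunc` and `f2u32_is_defined`, and the exact range check, are illustrative assumptions and not NIR's API or its formal semantics.

```python
import math

UINT32_MAX = 2**32 - 1


def ftrunc(x: float) -> float:
    """Round toward zero, keeping the sign (so -0.5 becomes -0.0).

    Only meant for finite inputs; math.trunc raises on NaN and infinity.
    """
    return math.copysign(float(math.trunc(x)), x)


def f2u32_is_defined(x: float) -> bool:
    """Assumed rule for this sketch: float-to-unsigned conversion is
    defined only when the source already lies in [0, UINT32_MAX].
    Negative zero compares equal to zero, so it counts as in range."""
    return 0.0 <= x <= UINT32_MAX


def can_drop_ftrunc(x: float) -> bool:
    """Removing ftrunc is safe for x if the bare conversion is defined
    whenever the truncated conversion was."""
    return f2u32_is_defined(x) or not f2u32_is_defined(ftrunc(x))


if __name__ == "__main__":
    for x in (3.7, 0.25, -0.5, -2.0):
        print(x, can_drop_ftrunc(x))
```

Under these assumptions, only sources strictly between -1.0 and 0.0 make `can_drop_ftrunc` return false: truncation turns them into -0.0, which is in range, while the untruncated value is not.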


197508 Commits