AlexIndustrial/mesa

Author	SHA1	Message	Date
Ian Romanick	60c07e500d	brw: Basic validation for BFN Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:10 +00:00
Ian Romanick	d2077e24f6	brw/disasm: Pretty print the BFN equation as an annotation Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:09 +00:00
Ian Romanick	fdb01f2a5a	brw/disasm: Fix BFN disassembly of src1 and src2 The negate and abs bits of src1 and src2 are repurposed for some of the function control value bits. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:09 +00:00
Zach Battleman	ca2a067469	brw: Initial bits of BFN support v2 (idr): So much rebasing. Deleted a bunch of code that we're not going to need yet. v3 (Ken): bfn inst encoding fix v4 (idr): Add BFN to brw_get_lowered_simd_width. Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:09 +00:00
Ian Romanick	f7939f2fdc	nir/range_analysis: Handle bfi and bitfield_select in get_alu_uub I noticed some things related to this while implementing support for bitfield_select / BFN in BRW. shader-db: Lunar Lake total instructions in shared programs: 17183140 -> 17183128 (<.01%) instructions in affected programs: 3830 -> 3818 (-0.31%) helped: 6 / HURT: 0 total cycles in shared programs: 889936934 -> 889936056 (<.01%) cycles in affected programs: 253758 -> 252880 (-0.35%) helped: 4 / HURT: 2 No shader-db changes on any other Intel platform. fossil-db: Lunar Lake Totals: Instrs: 233285343 -> 233284796 (-0.00%); split: -0.00%, +0.00% Cycle count: 32756777978 -> 32756399804 (-0.00%); split: -0.00%, +0.00% Max live registers: 71738646 -> 71738626 (-0.00%) Non SSA regs after NIR: 67837900 -> 67837902 (+0.00%) Totals from 177 (0.02% of 790723) affected shaders: Instrs: 389849 -> 389302 (-0.14%); split: -0.14%, +0.00% Cycle count: 356341872 -> 355963698 (-0.11%); split: -0.11%, +0.01% Max live registers: 39364 -> 39344 (-0.05%) Non SSA regs after NIR: 70453 -> 70455 (+0.00%) Meteor Lake, DG2, and Ice Lake had similar results. (Meteor Lake shown) Totals: Instrs: 264095611 -> 264095358 (-0.00%) Cycle count: 26555705299 -> 26554303407 (-0.01%); split: -0.01%, +0.00% Fill count: 613233 -> 613231 (-0.00%) Totals from 123 (0.01% of 905547) affected shaders: Instrs: 334830 -> 334577 (-0.08%) Cycle count: 326531667 -> 325129775 (-0.43%); split: -0.65%, +0.22% Fill count: 4145 -> 4143 (-0.05%) Tiger Lake and Skylake had similar results. 
(Tiger Lake shown) Totals: Instrs: 269733849 -> 269733590 (-0.00%) Cycle count: 25240548036 -> 25241435039 (+0.00%); split: -0.00%, +0.01% Totals from 123 (0.01% of 903812) affected shaders: Instrs: 338617 -> 338358 (-0.08%) Cycle count: 326605644 -> 327492647 (+0.27%); split: -0.13%, +0.40% Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:08 +00:00
Ian Romanick	aa53735b66	nir/algebraic: Prefer bfi over bitfield_select for bitfield_insert Intel platforms will soon implement both bfi and bitfield_select. bfi is more efficient for bitfield_insert. Reviewed-by: Matt Turner <mattst88@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:08 +00:00
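As a rough illustration of why either opcode can express a bitfield insert, the sketch below models both in Python. The opcode semantics are written from memory of NIR's opcode definitions and should be treated as an assumption, not a quotation; `bitfield_insert_via_bfi` and `bitfield_insert_via_select` are hypothetical helper names that do not come from the commit.

```python
MASK32 = 0xFFFFFFFF

def bitfield_select(mask, a, b):
    # Assumed semantics: take bits of `a` where mask is 1, bits of `b` elsewhere.
    return ((mask & a) | (~mask & b)) & MASK32

def bfi(mask, insert, base):
    # Assumed semantics: shift `insert` up to the lowest set bit of `mask`,
    # then merge it into `base` under the mask.
    if mask == 0:
        return base & MASK32
    shift = (mask & -mask).bit_length() - 1
    return ((base & ~mask) | ((insert << shift) & mask)) & MASK32

def field_mask(offset, bits):
    return (((1 << bits) - 1) << offset) & MASK32

def bitfield_insert_via_bfi(base, insert, offset, bits):
    # bfi performs the shift of `insert` itself.
    return bfi(field_mask(offset, bits), insert, base)

def bitfield_insert_via_select(base, insert, offset, bits):
    # bitfield_select needs `insert` pre-shifted into position.
    return bitfield_select(field_mask(offset, bits), insert << offset, base)
```

In this model the two paths agree for any non-empty field; the only difference is where the shift of the inserted value happens.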
Ian Romanick	08ec408061	nir/algebraic: Optimize f2u of negative value to zero The eliminated SENDs are from a single app that has a bunch of fragment shaders with a sequence like: con 32 %495 = fmul! %203.i, %1 (0.000000) con 32 %496 = ffma! %203.j, %1 (0.000000), %495 con 32 %497 = ffma! %203.k, %1 (0.000000), %496 con 32 %498 = ffma! %203.l, %1 (0.000000), %497 con 32 %499 = @load_reloc_const_intel (param_idx=1, base=0) con 32 %500 = @load_reloc_const_intel (param_idx=0, base=0) con 32 %501 = f2u32 %498 con 32 %502 = umin %501, %172 (0x4) con 32 %503 = ishl %502, %172 (0x4) con 32 %504 = load_const (0x00000040 = 64) con 32 %505 = umin %503, %504 (0x40) con 32 %506 = iadd %500, %505 The `f2u` is replaced with 0, and that leaves the `ffma` dot-product sequence unused. Since it is unused, most of the preceding block gets eliminated. A lot of instructions after the `f2u` are also eliminated by other algebraic optimizations. Most importantly, %203 is the result of a `load_ubo_uniform_block_intel` that is eliminated. No shader-db changes on any Intel platform. fossil-db: All Intel platforms had similar results.
(Lunar Lake shown) Totals: Instrs: 919895603 -> 919804051 (-0.01%); split: -0.01%, +0.00% Send messages: 40892036 -> 40887569 (-0.01%) Cycle count: 99176770712 -> 99174971806 (-0.00%); split: -0.00%, +0.00% Max live registers: 190030365 -> 190030367 (+0.00%) Max dispatch width: 47415040 -> 47415024 (-0.00%) Non SSA regs after NIR: 228872538 -> 228863608 (-0.00%); split: -0.00%, +0.00% Totals from 2234 (0.11% of 1955134) affected shaders: Instrs: 1989743 -> 1898191 (-4.60%); split: -4.60%, +0.00% Send messages: 44179 -> 39712 (-10.11%) Cycle count: 25416114 -> 23617208 (-7.08%); split: -7.08%, +0.00% Max live registers: 367357 -> 367359 (+0.00%) Max dispatch width: 39184 -> 39168 (-0.04%) Non SSA regs after NIR: 471173 -> 462243 (-1.90%); split: -1.90%, +0.00% Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:08 +00:00
Ian Romanick	5667459ff1	nir/algebraic: Don't introduce undefined behavior in f2u conversion If the source is in the range -1.0 < x < 0.0, simply removing the ftrunc will introduce undefined behavior. Because of how at least Intel and NVIDIA GPUs happen to implement f2u, this has Just Worked. No shader-db changes on any Intel platform. fossil-db: Lunar Lake Totals: Instrs: 913264354 -> 913264366 (+0.00%) Cycle count: 104953995530 -> 104953996854 (+0.00%) Max live registers: 189266026 -> 189266058 (+0.00%) Non SSA regs after NIR: 227779417 -> 227779369 (-0.00%) Totals from 24 (0.00% of 1984794) affected shaders: Instrs: 4669 -> 4681 (+0.26%) Cycle count: 50610 -> 51934 (+2.62%) Max live registers: 1222 -> 1254 (+2.62%) Non SSA regs after NIR: 1174 -> 1126 (-4.09%) Meteor Lake, DG2, Tiger Lake, and Ice Lake had similar results. (Meteor Lake shown) Totals: Instrs: 1001288026 -> 1001288038 (+0.00%) Cycle count: 92813392671 -> 92813392791 (+0.00%) Max live registers: 121935383 -> 121935399 (+0.00%) Max dispatch width: 19949928 -> 19949912 (-0.00%) Totals from 2 (0.00% of 2284670) affected shaders: Instrs: 1380 -> 1392 (+0.87%) Cycle count: 18940 -> 19060 (+0.63%) Max live registers: 136 -> 152 (+11.76%) Max dispatch width: 32 -> 16 (-50.00%) No fossil-db changes on Skylake. Suggested-by: Georg Lehmann Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:07 +00:00
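The hazard described in this commit can be sketched with a small Python model. The model assumes that an unsigned conversion is only defined for inputs in `0 <= x < 2**32` and signals everything else with an exception; that range, and the names `f2u32` and `ftrunc`, are illustrative assumptions that stand in for the NIR opcodes and are not taken from Mesa.

```python
import math

def ftrunc(x):
    # Round toward zero, leaving infinities and NaN untouched.
    return float(math.trunc(x)) if math.isfinite(x) else x

def f2u32(x):
    # Hypothetical model: anything outside [0, 2**32) is "undefined",
    # which this sketch reports as an exception.
    if math.isnan(x) or x < 0.0 or x >= 2.0 ** 32:
        raise ValueError("f2u32 is undefined for this input in this model")
    return int(x)

def convert_with_trunc(x):
    return f2u32(ftrunc(x))
```

With `x = -0.5`, `convert_with_trunc(x)` is well defined because the truncation yields zero first, while calling `f2u32(x)` directly falls into the undefined range. That is the case where dropping the `ftrunc` would change a defined result into an undefined one.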
Ian Romanick	4338f7d033	nir/algebraic: Remove useless ftrunc inside f2i/f2u Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:07 +00:00
Ian Romanick	c49d6e0480	nir/algebraic: Elide range clamping of f2u sources There are no shader-db changes on ELK platforms because those platforms don't support 8- or 16-bit integer types. v2: Restrict patterns generated such that the integer limits are exactly representable in the specified floating point format. With the exception of the value 0, this requires that float_sz > int_sz. This had no impact on shader-db or fossil-db on any Intel platform. Noticed by Georg. v3: Add a missing is_a_number. shader-db: All Intel platforms had similar results. (Lunar Lake shown) total cycles in shared programs: 889936056 -> 889934082 (<.01%) cycles in affected programs: 65806 -> 63832 (-3.00%) helped: 2 / HURT: 0 fossil-db: Lunar Lake Totals: Instrs: 233284796 -> 233282917 (-0.00%); split: -0.00%, +0.00% Cycle count: 32756399804 -> 32754972188 (-0.00%); split: -0.01%, +0.00% Spill count: 519861 -> 519813 (-0.01%) Fill count: 663650 -> 663626 (-0.00%); split: -0.01%, +0.01% Max live registers: 71738626 -> 71738696 (+0.00%) Non SSA regs after NIR: 67837902 -> 67837648 (-0.00%) Totals from 1236 (0.16% of 790723) affected shaders: Instrs: 2134504 -> 2132625 (-0.09%); split: -0.09%, +0.01% Cycle count: 604922278 -> 603494662 (-0.24%); split: -0.48%, +0.25% Spill count: 16509 -> 16461 (-0.29%) Fill count: 32760 -> 32736 (-0.07%); split: -0.22%, +0.15% Max live registers: 250112 -> 250182 (+0.03%) Non SSA regs after NIR: 302368 -> 302114 (-0.08%) Meteor Lake, DG2, and Tiger Lake had similar results. 
(Meteor Lake shown) Totals: Instrs: 264095370 -> 264094056 (-0.00%); split: -0.00%, +0.00% Cycle count: 26554146277 -> 26553027268 (-0.00%); split: -0.01%, +0.01% Spill count: 530603 -> 530615 (+0.00%) Fill count: 613231 -> 613273 (+0.01%) Max live registers: 46559041 -> 46559087 (+0.00%) Totals from 1237 (0.14% of 905547) affected shaders: Instrs: 2262517 -> 2261203 (-0.06%); split: -0.07%, +0.01% Cycle count: 518219799 -> 517100790 (-0.22%); split: -0.59%, +0.37% Spill count: 17518 -> 17530 (+0.07%) Fill count: 32273 -> 32315 (+0.13%) Max live registers: 128360 -> 128406 (+0.04%) Ice Lake and Skylake had similar results. (Ice Lake shown) Totals: Instrs: 269849640 -> 269848198 (-0.00%); split: -0.00%, +0.00% Cycle count: 26718329643 -> 26718289020 (-0.00%); split: -0.00%, +0.00% Max live registers: 46878430 -> 46878462 (+0.00%) Totals from 1233 (0.14% of 905427) affected shaders: Instrs: 2324225 -> 2322783 (-0.06%); split: -0.06%, +0.00% Cycle count: 531467501 -> 531426878 (-0.01%); split: -0.11%, +0.10% Max live registers: 130782 -> 130814 (+0.02%) Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:07 +00:00
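The v2 note about integer limits being exactly representable can be checked with a short Python sketch. It round-trips a value through IEEE half, single, or double precision using the standard `struct` module; the helper name `exactly_representable` is hypothetical and only the unsigned maxima are examined here.

```python
import struct

# struct format characters for IEEE binary16, binary32, and binary64.
FLOAT_FORMATS = {16: "e", 32: "f", 64: "d"}

def exactly_representable(value, float_bits):
    """Return True if the integer `value` survives a round trip through
    a float of the given width without changing."""
    fmt = FLOAT_FORMATS[float_bits]
    try:
        packed = struct.pack(fmt, float(value))
        return int(struct.unpack(fmt, packed)[0]) == value
    except OverflowError:
        # The value rounds past the largest finite float of this width.
        return False

def uint_max(int_bits):
    return (1 << int_bits) - 1
```

In this check, the 8-bit maximum is exact in a 16-bit float and the 16-bit maximum is exact in a 32-bit float, while each maximum fails to round-trip through a float of its own width, which is consistent with the `float_sz > int_sz` restriction in the commit message.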
Ian Romanick	073ffceef6	elk: Enable saturating float to integer conversion opcodes Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:06 +00:00
Ian Romanick	65e8220180	brw: Enable saturating float to integer conversion opcodes Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:06 +00:00
Ian Romanick	986086c846	nir: Add saturating float to integer conversion opcodes v2: Add a comment around has_f2[ui]_sat explaining which opcodes it enables. Suggested by Georg. Cast u_uintN_max and friends to double in nir_opcodes.py. This ensures that an exact conversion is made. Eliminate duplicate conversions from half float to double. Both noticed by Georg. v3: Apply "NaN should be zero" fix suggested by Georg. Co-authored-by: Georg Lehmann <dadschoorse@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37186>	2025-10-10 17:25:05 +00:00
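A minimal Python sketch of what a saturating float-to-integer conversion might compute is given below. Only the "NaN should be zero" rule comes from the commit message; the clamping to the integer range and the truncation toward zero are assumptions about the intended semantics, and `f2u_sat` / `f2i_sat` are used here as plain function names, not as the actual NIR opcode definitions.

```python
import math

def f2u_sat(x, bits=32):
    # NaN maps to zero (from the commit message); the rest is assumed.
    if math.isnan(x):
        return 0
    hi = (1 << bits) - 1
    if x <= 0:
        return 0
    if x >= hi:          # Python compares float and int exactly
        return hi
    return int(x)        # truncate toward zero

def f2i_sat(x, bits=32):
    if math.isnan(x):
        return 0
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    if x <= lo:
        return lo
    if x >= hi:
        return hi
    return int(x)
```

Comparing against the integer limits in exact arithmetic sidesteps the rounding problem that the v2 note addresses by casting the limits to double.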
Pohsiang (John) Hsu	5ce8b34a10	mediafoundation: update version to 1.07 Reviewed-by: Yubo Xie <yuboxie@microsoft.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37820>	2025-10-10 09:36:53 -07:00
Pohsiang (John) Hsu	03baa8ac72	mediafoundation: remove extra ';' Reviewed-by: Yubo Xie <yuboxie@microsoft.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37820>	2025-10-10 09:36:44 -07:00
Pohsiang (John) Hsu	eb088e339f	mediafoundation: periodic clang format - no code changes Reviewed-by: Yubo Xie <yuboxie@microsoft.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37820>	2025-10-10 09:36:13 -07:00
Pohsiang (John) Hsu	d35735b32d	mediafoundation: create sample allocator for SW input sample on demand to save video memory Reviewed-by: Yubo Xie <yuboxie@microsoft.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37820>	2025-10-10 09:35:58 -07:00
Silvio Vilerino	5061b7ba1a	mediafoundation: mftransform async slices parsing, avoid heap allocation inside loop Reviewed-by: Pohsiang (John) Hsu <pohhsu@microsoft.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37820>	2025-10-10 09:34:39 -07:00
Ashish Chauhan	6e189ba6c1	pvr: Drop '-experimental' suffix from the 'imagination' build option The imagination-experimental flag has been replaced with the imagination flag, as the driver is now Vulkan conformant. Signed-off-by: Ashish Chauhan <Ashish.Chauhan@imgtec.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37761>	2025-10-10 15:29:04 +00:00
Ashish Chauhan	1143363a4f	pvr: Drop broken driver environment variable check for BXS-4-64 The Imagination PowerVR Vulkan driver is now conformant on BXS-4-64, so drop the PVR_I_WANT_A_BROKEN_VULKAN_DRIVER runtime check for this GPU. This eliminates the need for the user to explicitly opt in via an environment variable. Signed-off-by: Ashish Chauhan <Ashish.Chauhan@imgtec.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37761>	2025-10-10 15:29:04 +00:00
Frank Binns	206bef1560	docs/features: claim vk 1.2 for pvr Although the PowerVR driver isn't passing Vulkan 1.2 conformance yet, all the required support has been implemented and it's very close to passing all the tests now. Signed-off-by: Frank Binns <frank.binns@imgtec.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37761>	2025-10-10 15:29:03 +00:00
Ashley Smith	a8fb3671e8	panfrost,mesa: Fix versions for EXT_shader_clock ES version was missed from extension table Fixes: `2ce20170` ("mesa: Add support for GL_EXT_shader_clock") Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Signed-off-by: Ashley Smith <ashley.smith@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37794>	2025-10-10 14:58:34 +00:00
Ashley Smith	09d86f9863	panfrost,mesa: Fix versions for EXT_shader_realtime_clock ES version was missed from extension table Fixes: `c5500cd1` ("mesa: Add support for GL_EXT_shader_realtime_clock") Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Signed-off-by: Ashley Smith <ashley.smith@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37794>	2025-10-10 14:58:34 +00:00
Hans-Kristian Arntzen	2848901722	radv: Actually fail custom border color sampler creation. Signed-off-by: Hans-Kristian Arntzen <post@arntzen-software.no> Fixes: `a52483d9e7` ("radv: fix capture/replay with sampler border color") Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37787>	2025-10-10 14:25:54 +00:00
Samuel Pitoiset	183ed8046c	radv: allow VK_FORMAT_S8_UINT with host image copy Depth/stencil formats still need to be properly implemented. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37748>	2025-10-10 13:46:51 +00:00
Samuel Pitoiset	ef900e93fc	ac/surface: fix host image copies with stencil-only Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37748>	2025-10-10 13:46:51 +00:00
Samuel Pitoiset	9a7f1401d8	ac/surface: fix host image copies with 96-bits formats Fixes dEQP-VK.image.host_image_copy.simple.r32g32b32_* with RADV_PERFTEST=hic on RADV. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37748>	2025-10-10 13:46:51 +00:00
Samuel Pitoiset	d063072182	radv: rename radv_mark_descriptor_sets_dirty() Descriptor heaps will be marked as dirty in this function too. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37786>	2025-10-10 13:22:05 +00:00
Samuel Pitoiset	34b3dae3b6	radv: make radv_descriptor_get_va() a static function Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37786>	2025-10-10 13:22:05 +00:00
Samuel Pitoiset	08dbab0600	radv: rename shader arg descriptor_sets to descriptors It's more generic and descriptor heaps will use it too. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37786>	2025-10-10 13:22:03 +00:00
Samuel Pitoiset	609ae4e647	radv: rename indirect_descriptor_sets to indirect_descriptors With descriptor heap the driver will also have to emit indirect descriptor heaps in some cases. Rename a couple of things to make them more generic. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37786>	2025-10-10 13:22:03 +00:00
Samuel Pitoiset	0ff1ce4ac5	radv: use force_indirect_desc_sets when creating RT prologs This is cleaner and this field has been added exactly for that. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37786>	2025-10-10 13:22:02 +00:00
Samuel Pitoiset	055b10a75c	radv: do not initialize HiZ on transfer queue on RDNA4 Emitting compute dispatches on SDMA would just hang. This fixes pending depth/stencil copy tests on transfer queue with RADV_PERFTEST=transfer_queue. Fixes: `e6c485afb0` ("radv: initialize HiZ metadata during image layout transitions") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37795>	2025-10-10 12:50:02 +00:00
Martin Roukala (né Peres)	0fbd9e3894	zink/ci: run the a750 job in pre-merge In order to fit within the time budget, we parallelize the job. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37758>	2025-10-10 11:48:51 +00:00
Martin Roukala (né Peres)	33d7be0d9f	turnip/ci: squeeze a750-vk into 4 jobs The drop in coverage should allow us to enable pre-merge testing on zink. While we are at it, I used the result of a previous stress test to prune the flakes list. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37758>	2025-10-10 11:48:51 +00:00
Lionel Landwerlin	b8ae4ede60	brw: add serialize send stats Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Alyssa Anne Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37394>	2025-10-10 11:19:39 +00:00
Lionel Landwerlin	37a9c5411f	brw: serialize messages on Gfx12.x if required The Intel EU fusion feature needs to be disabled on SEND messages where the texture handle, sampler handle, or sampler header is not identical across fused threads. This is the case in particular with accesses on non-uniform texture/sampler handles but could also strike with dynamic programmable offsets (currently disabled). Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Alyssa Anne Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37394>	2025-10-10 11:19:39 +00:00
Lionel Landwerlin	301b71a19f	compiler: add an access flag for intel EU fusion Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Alyssa Anne Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37394>	2025-10-10 11:19:39 +00:00
Lionel Landwerlin	c7ac46a1d8	nir/lower_io: add get_io_index_src_number support for image intrinsics Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Alyssa Anne Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37394>	2025-10-10 11:19:39 +00:00
Lionel Landwerlin	ca1533cd03	nir/divergence: add a new mode to cover fused threads on Intel HW The Intel Gfx12.x generation of GPUs has an architecture feature called EU fusion in which 2 subgroups run in lock step. A typical case where this happens is a compute shader with a 1x1x1 local workgroup size and a dispatch command of 2x1x1. In that case 2 threads will be run in lock step, one for each of the workgroups. This has been the source of some trouble in the backend because one subgroup can run with all lanes disabled, requiring care for SEND messages using the NoMask flag (execution regardless of the lane mask). We found out that other things happen when 2 subgroups run together: - the HW will use the surface/sampler handle from only one subgroup - the HW will use the sampler header from only one subgroup So one of the fused subgroups can access the wrong surface/sampler if the value is different between the 2 subgroups, and that can happen even with subgroup uniform values. Fortunately we can flag SEND instructions to disable the fusion behavior (most likely at a performance cost). This change introduces a new divergence mode that tries to compute what is divergent between subgroups so that we can flag instructions accordingly. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37394>	2025-10-10 11:19:39 +00:00
Simon Perretta	79923115e7	nir/unlower_io_to_vars: keep io bases intact when keeping intrinsics nir_recompute_io_bases will modify i/o intrinsics, which is not the expected behaviour when the keep_intrinsics flag is set. Fixes: `83aecc8f3f` ("mesa/st, nir: commonize unlower_io_to_vars pass") Signed-off-by: Simon Perretta <simon.perretta@imgtec.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37725>	2025-10-10 11:53:24 +01:00
Kenneth Graunke	dd9e002129	brw: Fix mesh shader asserts in clip/cull distance setting mesh doesn't use brw_vue_prog_data. Also, I had been catching TCS shaders here, and shouldn't. Fixes: `bf76e86bc8` ("brw: Refactor clip/cull distance mask setting into a helper") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37809>	2025-10-10 09:51:26 +00:00
Juan A. Suarez Romero	9f45f09b86	glsl: use array element type to validate assignment When comparing a vec3 array and a vec4 array, the scalar type is the same for both (float). Instead, use the array element type for the comparison (that is, vec3 vs vec4). Fixes the spec@glsl-1.20@compiler@invalid-vec4-array-to-vec3-array-conversion.vert piglit test. Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com> Reviewed-by: Marek Olšák <maraeo@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37783>	2025-10-10 09:19:55 +00:00
Job Noorman	6d59a3e3e7	nir/lower_alu: use Knuth's Algorithm M for [iu]mul_high This significantly simplifies the handling of signed numbers as the same code path can handle signed and unsigned numbers by simply using ishr instead of ushr for some of the shifts. For both cases, the number of additions and shifts are also reduced. Note that LLVM uses the same algorithm. fossil-db stats for Turnip: Totals from 4849 (2.94% of 164705) affected shaders: MaxWaves: 52318 -> 52332 (+0.03%); split: +0.04%, -0.02% Instrs: 5262458 -> 5218922 (-0.83%); split: -0.87%, +0.05% CodeSize: 10831900 -> 10655170 (-1.63%); split: -1.64%, +0.01% NOPs: 829481 -> 836010 (+0.79%); split: -0.95%, +1.74% MOVs: 176187 -> 173788 (-1.36%); split: -3.27%, +1.91% COVs: 104096 -> 86543 (-16.86%); split: -16.87%, +0.01% Full: 90434 -> 90158 (-0.31%); split: -0.33%, +0.03% (ss): 131091 -> 130866 (-0.17%); split: -0.87%, +0.70% (sy): 55550 -> 55769 (+0.39%); split: -0.92%, +1.32% (ss)-stall: 406003 -> 407194 (+0.29%); split: -1.10%, +1.39% (sy)-stall: 1668213 -> 1678082 (+0.59%); split: -1.31%, +1.90% Preamble Instrs: 1105270 -> 1067290 (-3.44%); split: -3.50%, +0.06% Constlen: 423776 -> 423560 (-0.05%) Last helper: 1038202 -> 1035540 (-0.26%); split: -0.42%, +0.16% Last baryf: 38908 -> 38632 (-0.71%) Subgroup size: 336640 -> 336832 (+0.06%) Cat0: 916209 -> 922848 (+0.72%); split: -0.87%, +1.59% Cat1: 282813 -> 262845 (-7.06%); split: -7.49%, +0.43% Cat2: 2198715 -> 2183012 (-0.71%); split: -0.72%, +0.01% Cat3: 1390914 -> 1376421 (-1.04%) Cat7: 123127 -> 123116 (-0.01%); split: -0.24%, +0.23% Signed-off-by: Job Noorman <jnoorman@igalia.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37793>	2025-10-10 05:31:17 +00:00
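The idea of one code path serving both signed and unsigned high multiplies can be sketched in Python on 32-bit patterns. This follows the well-known half-word formulation of Knuth's Algorithm M; whether Mesa's lowering matches it line for line is not stated in the commit, so the structure and the helper names (`ishr`, `ushr`, `mul_high`) should be read as an illustrative assumption.

```python
MASK32 = 0xFFFFFFFF

def to_signed(x):
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def ushr(x, n):
    # Logical shift right on a 32-bit pattern.
    return (x & MASK32) >> n

def ishr(x, n):
    # Arithmetic shift right on a 32-bit pattern.
    return (to_signed(x) >> n) & MASK32

def mul_high(u, v, signed):
    """High 32 bits of the 64-bit product of two 32-bit patterns."""
    shr = ishr if signed else ushr       # the only signed/unsigned difference
    u0, u1 = u & 0xFFFF, shr(u, 16)
    v0, v1 = v & 0xFFFF, shr(v, 16)
    w0 = (u0 * v0) & MASK32              # low halves are always unsigned
    t = (u1 * v0 + ushr(w0, 16)) & MASK32
    w1 = t & 0xFFFF
    w2 = shr(t, 16)
    w1 = (u0 * v1 + w1) & MASK32
    return (u1 * v1 + w2 + shr(w1, 16)) & MASK32

def reference_high(a, b):
    # Exact reference using Python's unbounded integers.
    return ((a * b) >> 32) & MASK32
```

Signed inputs are passed as two's-complement patterns, for example `mul_high(-5 & MASK32, 7, True)`, and the result can be compared against `reference_high(-5, 7)`.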
Job Noorman	18f69890d1	nir: add nir_shr builder Sometimes we need to select between ishr/ushr based on some condition; this builder makes this less verbose. Signed-off-by: Job Noorman <jnoorman@igalia.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37793>	2025-10-10 05:31:17 +00:00
Olivia Lee	10a8defecc	util/macros: coerce likely/unlikely to bool even without __builtin_expect Coercing the argument to a bool when we have __builtin_expect but leaving it unmodified otherwise is a recipe for really subtle bugs. I don't know if any bugs like that exist currently, but I almost introduced one in panfrost. Signed-off-by: Olivia Lee <olivia.lee@collabora.com> Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37801>	2025-10-10 01:37:28 +00:00
Lionel Landwerlin	196c7903b9	anv: fix companion usage for emulated image We need to return true if we need the companion batch. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `e60416b4e4` ("anv: use companion batch for operations with HIZ/STC_CCS destination") Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lucas Fryzek <lfryzek@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37797>	2025-10-09 21:38:33 +00:00
Kenneth Graunke	bb096b0f12	brw: Use BITFIELD_{MASK,RANGE} in clip/cull distance mask handling code Suggested by Alyssa. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37784>	2025-10-09 13:20:04 -07:00
Kenneth Graunke	bf76e86bc8	brw: Refactor clip/cull distance mask setting into a helper This was copy pasted between 4 different stages. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37784>	2025-10-09 13:20:03 -07:00
Kenneth Graunke	b3c511592a	brw: Replace type_size_xvec4 with glsl_count_attribute_slots This is nearly identical, except for bindless sampler/texture/image handling. But we only use it for inputs/outputs, not uniforms, where there are no bindless handles to worry about. Deletes a lot of mostly-duplicated code. Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37784>	2025-10-09 13:20:00 -07:00


213311 Commits