AlexIndustrial/mesa

Author	SHA1	Message	Date
Lionel Landwerlin	ba3ff8b3bb	brw: move barycentric_mode enum to intel_shader_enums.h Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32329>	2024-11-26 13:05:30 +00:00
Lionel Landwerlin	bfcb9bf276	brw: rename brw_sometimes to intel_sometimes Moving it to intel_shader_enums.h The plan is to make it visible to OpenCL shaders. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32329>	2024-11-26 13:05:30 +00:00
Lionel Landwerlin	9016a5458a	brw: change fs_msaa flags checks to test compiled flag first There should be no functional change here. This is just trying to make things more clear, we use the compiled value if != BRW_SOMETIMES and otherwise use the dynamically computed flags. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32329>	2024-11-26 13:05:30 +00:00
Caio Oliveira	0c0b61b029	intel/brw: Dump IR after lower scoreboard pass Acked-by: Iván Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32269>	2024-11-22 21:47:46 +00:00
Caio Oliveira	90343f452d	intel/brw: Fix SWSB output when printing IR The printing routine was ignoring dependencies that were only unordered. Acked-by: Iván Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32269>	2024-11-22 21:47:46 +00:00
Caio Oliveira	1d704af515	intel/brw: Fix decoding of cond_modifier and saturate in EU validation These fields are only valid in certain formats, so set them accordingly. Note the check if !is_send is used because FORMAT_BASIC is reused for SEND/SENDS in some platforms. If we start to see more cases like that, we can create a new FORMAT for it. The cond_modifier is trickier because on top of that, it is not valid for 64-bit immediates in some platforms. Found when EU validation complained about moving 64-bit immediates with higher bits. Fixes: `e4440df2d8` ("intel/brw: Add pred/cmod/sat to brw_hw_decoded_inst") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32287>	2024-11-22 21:15:46 +00:00
Caio Oliveira	3e2599d475	intel/brw: Use <V,W,H> notation for FIXED_GRF and ARF source when printing IR Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32260>	2024-11-21 17:36:34 -08:00
Caio Oliveira	71d362db66	intel/brw: Omit type and region in payload sources when printing IR These are not really used since SEND messages deal with full GRFs. Before ``` send(8) (mlen: 1) (ex_mlen: 1) (null):UD, 0u, 0u, g1:UD, g8:UD send(8) (mlen: 1) g5:UD, 0u, 0u, g4:UD, (null):UD send(8) (mlen: 1) (ex_mlen: 1) (null):UD, 0u, 16777216u, g1:D, g6:UD send(8) (mlen: 1) (EOT) (null):UD, 0u, 0u, g126:UD, (null):UD NoMask ``` and after ``` send(8) (mlen: 1) (ex_mlen: 1) (null), 0u, 0u, g1, g8 send(8) (mlen: 1) g5, 0u, 0u, g4, (null) send(8) (mlen: 1) (ex_mlen: 1) (null), 0u, 16777216u, g1, g6 send(8) (mlen: 1) (EOT) (null), 0u, 0u, g126, (null) NoMask ``` Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32259>	2024-11-22 00:50:40 +00:00
Caio Oliveira	8474dc853d	intel/brw: Add SHADER_OPCODE_QUAD_SWAP For the horizontal, vertical and diagonal variants. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31053>	2024-11-22 00:27:01 +00:00
Caio Oliveira	2bd7592b0b	intel/brw: Add SHADER_OPCODE_BALLOT Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31052>	2024-11-21 19:32:59 +00:00
Marek Olšák	25d4943481	nir: make use_interpolated_input_intrinsics a nir_lower_io parameter This will need to be set to true when the GLSL linker lowers IO, which can later be unlowered by st/mesa, and then drivers can lower it again without load_interpolated_input. Therefore, it can't be a global immutable option. Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32229>	2024-11-20 02:45:37 +00:00
Caio Oliveira	0b66cb1f82	intel/brw: Allow extra SWSB encodings for Xe2 There are new combinations of ordered and unordered dependencies available for the instructions to use, which among others include: - combining FLOAT and INT pipe deps in SENDs; - combining SRC mode deps in regular instructions for the inferred type. This patch enables a couple of tests checking for the first case. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31375>	2024-11-19 04:27:00 +00:00
Caio Oliveira	1b13eea642	intel/brw: Add test for combining SWSB dependencies in SENDs These are currently DISABLED_ since they fail. A later patch will enable them. Reviewed-by: Francisco Jerez <currojerez@riseup.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31375>	2024-11-19 04:27:00 +00:00
Kenneth Graunke	5848035443	brw: Fix try_rebuild_source's ult32/ushr handling to use unsigned types We were accidentally doing a signed integer comparison here for ult32, or a sign-extending shift for ushr. One notable bit of fallout was that load_global_uniform_block_intel address calculations broke on platforms that don't have native 64-bit integer support, as the iadd64 lowering for "do I need to carry?" was using ult32...and performing the wrong comparison. We spotted this in Borderlands 3 on Alchemist once we turned on other optimizations. Thanks to Lionel Landwerlin for helping spot the problem! Fixes: `c7b312ad45` ("brw: factor out source extraction for rematerialization") Fixes: `339630ab05` ("brw: enable A64 loads source rematerialization") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31995>	2024-11-18 12:55:47 +00:00
Kenneth Graunke	0a376a672a	brw: Fix emit_a64_oword_block_header UNIFORM -> VGRF copies This was triggering an assertion in the fs_builder::MOV helper that the destination stride can't be 0 when dispatch_width > 1. What we want to do is copy the single 64-bit channel of data from the UNIFORM file to a VGRF. We can use a SIMD1 builder for that. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31995>	2024-11-18 12:55:47 +00:00
Francisco Jerez	0ad835a929	intel/fs/xe2: Fix up subdword integer region restriction with strided byte src and packed byte dst. This fixes a corner case of the LNL sub-dword integer restrictions that wasn't being detected by has_subdword_integer_region_restriction(), specifically: > if(Src.Type==Byte && Dst.Type==Byte && Dst.Stride==1 && W!=2) { > // ... > if(Src.Stride == 2) && (Src.UniformStride) && (Dst.SubReg%32 == Src.SubReg/2 ) { Allowed } > // ... > } All the other restrictions that require agreement between the SubReg number of source and destination only affect sources with a stride greater than a dword, which is why has_subdword_integer_region_restriction() was returning false except when "byte_stride(srcs[i]) >= 4" evaluated to true, but as implied by the pseudocode above, in the particular case of a packed byte destination, the restriction applies for source strides as narrow as 2B. The form of the equation that relates the subreg numbers is consistent with the existing calculations in brw_fs_lower_regioning (see required_src_byte_offset()), we just need to enable lowering for this corner case, and change lower_dst_region() to call lower_instruction() recursively, since some of the cases where we break this restriction are copy instructions introduced by brw_fs_lower_regioning() itself trying to lower other instructions with byte destinations. This fixes some Vulkan CTS test-cases that were hitting these restrictions with byte data types. Fixes: `217d412360` ("intel/fs/gfx20+: Implement sub-dword integer regioning restrictions.") Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30630>	2024-11-15 07:39:33 +00:00
Lionel Landwerlin	a21cd8c5b6	brw: allocate physical register sizes for spilling All of the spilling code should work with physical register units because for example SEND messages will expect a physical register as destination. So always allocate a full physical register for the spilled/unspilled values and adjust the offsets of the registers to physical sizes too. Cc: mesa-stable Fixes: `aa494cba` ("brw: align spilling offsets to physical register sizes") Closes: mesa/mesa#11967 Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Found-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32124>	2024-11-14 08:44:03 +00:00
Rhys Perry	45c1280d2c	nir_lower_mem_access_bit_sizes: pass access to callback Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31904>	2024-11-13 12:59:26 +00:00
Rhys Perry	61752152f7	nir_lower_mem_access_bit_sizes: add nir_mem_access_shift_method Signed-off-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Georg Lehmann <dadschoorse@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31904>	2024-11-13 12:59:26 +00:00
Iván Briano	aee04bf4fb	intel/rt: fix ray_query stack address calculation While the documentation says to use NUM_SIMD_LANES_PER_DSS for the stack address calculation, what the HW actually uses is NUM_SYNC_STACKID_PER_DSS. The former may vary depending on the platform, while the latter is fixed to 2048 for all current platforms. Fixes: `6c84cbd8c9` ("intel/dev/xe: Set max_eus_per_subslice using topology query") Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32049>	2024-11-08 18:31:52 +00:00
Ian Romanick	7aad19ccd2	brw/lower: Lower invalid source conversion to better code There are two fragment shaders from RDR2 that is hurt for spills and fills on Lunar Lake. Totals from 2 (0.00% of 551413) affected shaders: Spill count: 1252 -> 1317 (+5.19%) Fill count: 2518 -> 2642 (+4.92%) Those shaders... have a lot of room for improvement. There are some patterns in those shaders that we handle very, very poorly. Improving those patterns would likely improve the spills and fills in these shaders quite dramatically. Given how much other platforms are helped, I don't this should block this commit. No shader-db or fossil-db changes on any pre-Gfx12.5 Intel platforms. v2: Add some comments and an additional assertion. Suggested by Ken. shader-db: Lunar Lake total instructions in shared programs: 18094517 -> 18094511 (<.01%) instructions in affected programs: 809 -> 803 (-0.74%) helped: 6 / HURT: 0 total cycles in shared programs: 921532158 -> 921532168 (<.01%) cycles in affected programs: 2266 -> 2276 (0.44%) helped: 0 / HURT: 3 Meteor Lake and DG2 had similar results. (Meteor Lake shown) total instructions in shared programs: 19820845 -> 19820839 (<.01%) instructions in affected programs: 803 -> 797 (-0.75%) helped: 6 / HURT: 0 total cycles in shared programs: 906372999 -> 906372949 (<.01%) cycles in affected programs: 3216 -> 3166 (-1.55%) helped: 6 / HURT: 0 fossil-db: Lunar Lake Totals: Instrs: 141887377 -> 141884465 (-0.00%); split: -0.00%, +0.00% Cycle count: 21990301498 -> 21990267232 (-0.00%); split: -0.00%, +0.00% Spill count: 69732 -> 69797 (+0.09%) Fill count: 128521 -> 128645 (+0.10%) Totals from 349 (0.06% of 551413) affected shaders: Instrs: 506117 -> 503205 (-0.58%); split: -0.79%, +0.21% Cycle count: 32362996 -> 32328730 (-0.11%); split: -0.52%, +0.41% Spill count: 1951 -> 2016 (+3.33%) Fill count: 4899 -> 5023 (+2.53%) Meteor Lake and DG2 had similar results. (Meteor Lake shown) Totals: Instrs: 152773732 -> 152761383 (-0.01%); split: -0.01%, +0.00% Cycle count: 17187529968 -> 17187450663 (-0.00%); split: -0.00%, +0.00% Spill count: 79279 -> 79003 (-0.35%) Fill count: 148803 -> 147942 (-0.58%) Scratch Memory Size: 3949568 -> 3946496 (-0.08%) Max live registers: 31879325 -> 31879230 (-0.00%) Totals from 366 (0.06% of 633185) affected shaders: Instrs: 557377 -> 545028 (-2.22%); split: -2.22%, +0.01% Cycle count: 26171205 -> 26091900 (-0.30%); split: -0.54%, +0.24% Spill count: 3238 -> 2962 (-8.52%) Fill count: 10018 -> 9157 (-8.59%) Scratch Memory Size: 257024 -> 253952 (-1.20%) Max live registers: 28187 -> 28092 (-0.34%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32041>	2024-11-08 17:46:45 +00:00
Ian Romanick	2a57568ebd	brw/build: Add scalar_group() helper Some uses of the old pattern still exist. The use in brw_fs_nir.cpp is deleted by commits !29884. The use in brw_lower_logical_sends.cpp seems different, so I decided to keep it. The next commit wants to use this. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32041>	2024-11-08 17:46:45 +00:00
Ian Romanick	5dfea87623	brw/opt: Always do both kinds of copy propagation before lower_load_payload shader-db: All Intel platforms except Skylake had similar results. (Lunar Lake shown) total instructions in shared programs: 18092932 -> 18092713 (<.01%) instructions in affected programs: 139290 -> 139071 (-0.16%) helped: 103 HURT: 18 helped stats (abs) min: 1 max: 8 x̄: 2.43 x̃: 2 helped stats (rel) min: 0.02% max: 9.09% x̄: 0.73% x̃: 0.29% HURT stats (abs) min: 1 max: 5 x̄: 1.72 x̃: 1 HURT stats (rel) min: 0.02% max: 0.55% x̄: 0.10% x̃: 0.08% 95% mean confidence interval for instructions value: -2.17 -1.45 95% mean confidence interval for instructions %-change: -0.83% -0.38% Instructions are helped. total cycles in shared programs: 922792268 -> 921495900 (-0.14%) cycles in affected programs: 400296984 -> 399000616 (-0.32%) helped: 765 HURT: 635 helped stats (abs) min: 2 max: 77018 x̄: 6739.33 x̃: 60 helped stats (rel) min: <.01% max: 35.59% x̄: 1.98% x̃: 0.32% HURT stats (abs) min: 2 max: 88658 x̄: 6077.51 x̃: 152 HURT stats (rel) min: <.01% max: 51.33% x̄: 2.75% x̃: 0.63% 95% mean confidence interval for cycles value: -1620.41 -231.54 95% mean confidence interval for cycles %-change: -0.10% 0.44% Inconclusive result (%-change mean confidence interval includes 0). LOST: 4 GAINED: 3 Skylake total instructions in shared programs: 18658324 -> 18579715 (-0.42%) instructions in affected programs: 2089957 -> 2011348 (-3.76%) helped: 9842 HURT: 23 helped stats (abs) min: 1 max: 24 x̄: 7.99 x̃: 8 helped stats (rel) min: 0.05% max: 40.00% x̄: 5.37% x̃: 4.52% HURT stats (abs) min: 1 max: 5 x̄: 1.57 x̃: 1 HURT stats (rel) min: 0.02% max: 1.28% x̄: 0.36% x̃: 0.24% 95% mean confidence interval for instructions value: -7.98 -7.95 95% mean confidence interval for instructions %-change: -5.43% -5.29% Instructions are helped. total cycles in shared programs: 860031654 -> 860237548 (0.02%) cycles in affected programs: 449175235 -> 449381129 (0.05%) helped: 7895 HURT: 4416 helped stats (abs) min: 1 max: 14129 x̄: 113.70 x̃: 22 helped stats (rel) min: <.01% max: 40.95% x̄: 1.31% x̃: 0.56% HURT stats (abs) min: 1 max: 33397 x̄: 249.89 x̃: 34 HURT stats (rel) min: <.01% max: 67.47% x̄: 2.65% x̃: 0.65% 95% mean confidence interval for cycles value: 1.46 31.98 95% mean confidence interval for cycles %-change: 0.02% 0.19% Cycles are HURT. LOST: 557 GAINED: 900 fossil-db: Lunar Lake Totals: Instrs: 141933621 -> 141884681 (-0.03%); split: -0.03%, +0.00% Cycle count: 21990657282 -> 21990200212 (-0.00%); split: -0.14%, +0.14% Spill count: 69754 -> 69732 (-0.03%); split: -0.05%, +0.02% Fill count: 128559 -> 128521 (-0.03%); split: -0.05%, +0.02% Scratch Memory Size: 5934080 -> 5925888 (-0.14%) Max live registers: 48021653 -> 48051253 (+0.06%); split: -0.00%, +0.06% Totals from 13510 (2.45% of 551410) affected shaders: Instrs: 19497180 -> 19448240 (-0.25%); split: -0.25%, +0.00% Cycle count: 2455370202 -> 2454913132 (-0.02%); split: -1.25%, +1.23% Spill count: 10975 -> 10953 (-0.20%); split: -0.32%, +0.12% Fill count: 21709 -> 21671 (-0.18%); split: -0.28%, +0.10% Scratch Memory Size: 674816 -> 666624 (-1.21%) Max live registers: 2502653 -> 2532253 (+1.18%); split: -0.01%, +1.19% Meteor Lake and DG2 had similar results. (Meteor Lake shown) Totals: Instrs: 152763523 -> 152772716 (+0.01%); split: -0.00%, +0.01% Cycle count: 17188701887 -> 17187510768 (-0.01%); split: -0.10%, +0.09% Spill count: 79280 -> 79279 (-0.00%); split: -0.00%, +0.00% Fill count: 148809 -> 148803 (-0.00%) Max live registers: 31879240 -> 31879093 (-0.00%); split: -0.00%, +0.00% Max dispatch width: 5559984 -> 5559712 (-0.00%); split: +0.00%, -0.01% Totals from 20524 (3.24% of 633183) affected shaders: Instrs: 20366964 -> 20376157 (+0.05%); split: -0.01%, +0.05% Cycle count: 2406162382 -> 2404971263 (-0.05%); split: -0.68%, +0.63% Spill count: 19935 -> 19934 (-0.01%); split: -0.02%, +0.01% Fill count: 34487 -> 34481 (-0.02%) Max live registers: 1745598 -> 1745451 (-0.01%); split: -0.01%, +0.01% Max dispatch width: 117992 -> 117720 (-0.23%); split: +0.03%, -0.26% Tiger Lake and Ice Lake had similar results. (Tiger Lake shown) Totals: Instrs: 150694108 -> 150683859 (-0.01%); split: -0.01%, +0.00% Cycle count: 15526754059 -> 15529031079 (+0.01%); split: -0.10%, +0.12% Max live registers: 31791599 -> 31791441 (-0.00%); split: -0.00%, +0.00% Max dispatch width: 5569488 -> 5569296 (-0.00%); split: +0.00%, -0.01% Totals from 15000 (2.37% of 632406) affected shaders: Instrs: 10965577 -> 10955328 (-0.09%); split: -0.11%, +0.02% Cycle count: 2025347115 -> 2027624135 (+0.11%); split: -0.80%, +0.91% Max live registers: 983373 -> 983215 (-0.02%); split: -0.02%, +0.00% Max dispatch width: 83064 -> 82872 (-0.23%); split: +0.12%, -0.35% Skylake Totals: Instrs: 140588784 -> 140413758 (-0.12%); split: -0.13%, +0.00% Cycle count: 14724286265 -> 14723402393 (-0.01%); split: -0.04%, +0.04% Fill count: 100130 -> 100129 (-0.00%) Max live registers: 31418029 -> 31417146 (-0.00%); split: -0.00%, +0.00% Max dispatch width: 5513400 -> 5535192 (+0.40%); split: +0.89%, -0.49% Totals from 39733 (6.35% of 625986) affected shaders: Instrs: 17240737 -> 17065711 (-1.02%); split: -1.02%, +0.01% Cycle count: 1994668203 -> 1993784331 (-0.04%); split: -0.31%, +0.27% Fill count: 44481 -> 44480 (-0.00%) Max live registers: 2766781 -> 2765898 (-0.03%); split: -0.03%, +0.00% Max dispatch width: 210600 -> 232392 (+10.35%); split: +23.23%, -12.89% Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32041>	2024-11-08 17:46:45 +00:00
Ian Romanick	be26012f1d	brw/opt: Always do copy prop, DCE, and register coalesce after lower_regioning shader-db: Lunar Lake total instructions in shared programs: 18100289 -> 18083853 (-0.09%) instructions in affected programs: 790048 -> 773612 (-2.08%) helped: 3058 / HURT: 1 total cycles in shared programs: 921691992 -> 921293816 (-0.04%) cycles in affected programs: 37210762 -> 36812586 (-1.07%) helped: 2329 / HURT: 624 LOST: 27 GAINED: 26 Meteor Lake, DG2, Tiger Lake, and Ice Lake had similar results. (Meteor Lake shown) total instructions in shared programs: 19825635 -> 19821391 (-0.02%) instructions in affected programs: 138675 -> 134431 (-3.06%) helped: 877 / HURT: 0 total cycles in shared programs: 907900598 -> 907885713 (<.01%) cycles in affected programs: 7127161 -> 7112276 (-0.21%) helped: 318 / HURT: 242 total spills in shared programs: 5790 -> 5758 (-0.55%) spills in affected programs: 660 -> 628 (-4.85%) helped: 8 / HURT: 0 total fills in shared programs: 6744 -> 6712 (-0.47%) fills in affected programs: 708 -> 676 (-4.52%) helped: 8 / HURT: 0 LOST: 10 GAINED: 0 Skylake total instructions in shared programs: 18722197 -> 18637637 (-0.45%) instructions in affected programs: 2757553 -> 2672993 (-3.07%) helped: 12290 / HURT: 1 total cycles in shared programs: 859716039 -> 859432560 (-0.03%) cycles in affected programs: 113731837 -> 113448358 (-0.25%) helped: 9555 / HURT: 2422 LOST: 265 GAINED: 714 fossil-db: Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown) Totals: Instrs: 142000618 -> 141928331 (-0.05%); split: -0.05%, +0.00% Subgroup size: 10995136 -> 10995072 (-0.00%) Cycle count: 21994723230 -> 21990481140 (-0.02%); split: -0.08%, +0.06% Spill count: 69911 -> 69754 (-0.22%); split: -0.23%, +0.00% Fill count: 128723 -> 128559 (-0.13%); split: -0.15%, +0.02% Scratch Memory Size: 5936128 -> 5934080 (-0.03%) Max live registers: 48006880 -> 48020936 (+0.03%); split: -0.01%, +0.04% Totals from 17450 (3.16% of 551410) affected shaders: Instrs: 14984149 -> 14911862 (-0.48%); split: -0.48%, +0.00% Subgroup size: 365744 -> 365680 (-0.02%) Cycle count: 2585095128 -> 2580853038 (-0.16%); split: -0.71%, +0.54% Spill count: 20893 -> 20736 (-0.75%); split: -0.76%, +0.00% Fill count: 44181 -> 44017 (-0.37%); split: -0.44%, +0.07% Scratch Memory Size: 995328 -> 993280 (-0.21%) Max live registers: 2378069 -> 2392125 (+0.59%); split: -0.20%, +0.79% Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown) Totals: Instrs: 150719758 -> 150676269 (-0.03%); split: -0.04%, +0.01% Subgroup size: 7764560 -> 7764632 (+0.00%) Cycle count: 15526689814 -> 15525687740 (-0.01%); split: -0.03%, +0.02% Spill count: 60120 -> 59472 (-1.08%); split: -1.17%, +0.10% Fill count: 105973 -> 104675 (-1.22%); split: -1.40%, +0.17% Scratch Memory Size: 2396160 -> 2381824 (-0.60%); split: -0.73%, +0.13% Max live registers: 31782879 -> 31788857 (+0.02%); split: -0.01%, +0.03% Max dispatch width: 5569200 -> 5569344 (+0.00%); split: +0.00%, -0.00% Totals from 10089 (1.60% of 632405) affected shaders: Instrs: 6389866 -> 6346377 (-0.68%); split: -0.87%, +0.19% Subgroup size: 102912 -> 102984 (+0.07%) Cycle count: 681310278 -> 680308204 (-0.15%); split: -0.65%, +0.51% Spill count: 19571 -> 18923 (-3.31%); split: -3.61%, +0.30% Fill count: 38229 -> 36931 (-3.40%); split: -3.88%, +0.48% Scratch Memory Size: 808960 -> 794624 (-1.77%); split: -2.15%, +0.38% Max live registers: 677473 -> 683451 (+0.88%); split: -0.45%, +1.33% Max dispatch width: 88672 -> 88816 (+0.16%); split: +0.27%, -0.11% Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32041>	2024-11-08 17:46:45 +00:00
Ian Romanick	b2d7a823be	brw/lower: Don't emit spurious moves to or from NULL register Previously an instruction like cmp.l.f0.0(16) null:F, v359:F, 0f would get lowered to undef(16) v13703:UD cmp.l.f0.0(16) v13703:F, v359:F, 0f mov(16) null:UD, v13703:UD After copy propagation and dead-code elimination are run again, the original CMP gets turned back into its original form! Some cases can also emit MOVs from the original NULL register. It should be possible to not do any lowering here, but there are some interactions with source lowering passes for things like cmp.l.f0.0(16) null:HF, g89.1<16,16,1>:HF, 0hf What inspired this was... diff'ing step-by-step dumps from INTEL_DEBUG=optimizer had a lot of useless changes due to these MOVs and undefs. It was very annoying. This low-effort change gets the majority of the possible benefit. No shader-db or fossil-db changes on any Intel platform. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32041>	2024-11-08 17:46:45 +00:00
Ian Romanick	9aba731d03	brw/cse: Don't eliminate instructions that write flags With other changes in my tree, I observed this code from dEQP-VK.subgroups.vote.compute.subgroupallequal_float have the second cmp.z removed. undef(8) %69:UD cmp.z.f0.0(8) %69:F, %37:F, %57+0.0<0>:F mov(1) v58+0.0:D, 0d NoMask group0 (+f0.0) mov(1) v58+0.0:D, -1d NoMask group0 cmp.nz.f0.0(8) null:D, v58+0.0<0>:D, 0d ... undef(8) %72:UD cmp.z.f0.0(8) %72:F, %37:F, %57+0.0<0>:F mov(1) v63+0.0:D, 0d NoMask group0 (+f0.0) mov(1) v63+0.0:D, -1d NoMask group0 This was also fixed by running dead-code elimination before CSE. That seems more like avoiding the problem than fixing it, though. I believe this affects shader-db results because leaving the second CMP in the shader can give more opportunities for cmod propagation. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Fixes: `234c45c929` ("intel/brw: Write a new global CSE pass that works on defs") shader-db: All Intel platforms had similar results. (Lunar Lake shown) total cycles in shared programs: 922097690 -> 922260862 (0.02%) cycles in affected programs: 3178926 -> 3342098 (5.13%) helped: 130 HURT: 88 helped stats (abs) min: 2 max: 2194 x̄: 296.71 x̃: 16 helped stats (rel) min: <.01% max: 16.56% x̄: 1.86% x̃: 0.18% HURT stats (abs) min: 4 max: 11992 x̄: 2292.55 x̃: 47 HURT stats (rel) min: 0.04% max: 57.32% x̄: 11.82% x̃: 0.61% 95% mean confidence interval for cycles value: 320.36 1176.63 95% mean confidence interval for cycles %-change: 1.59% 5.73% Cycles are HURT. LOST: 2 GAINED: 1 fossil-db: Lunar Lake, Meteor Lake, Tiger Lake had similar results. (Lunar Lake shown) Totals: Instrs: 142022960 -> 142022928 (-0.00%); split: -0.00%, +0.00% Cycle count: 21995242782 -> 21995384040 (+0.00%); split: -0.00%, +0.00% Max live registers: 48013385 -> 48013343 (-0.00%) Totals from 507 (0.09% of 551441) affected shaders: Instrs: 886191 -> 886159 (-0.00%); split: -0.01%, +0.01% Cycle count: 69302492 -> 69443750 (+0.20%); split: -0.66%, +0.86% Max live registers: 94413 -> 94371 (-0.04%) DG2 Totals: Instrs: 152856370 -> 152856093 (-0.00%); split: -0.00%, +0.00% Cycle count: 17237159885 -> 17236804052 (-0.00%); split: -0.00%, +0.00% Fill count: 150673 -> 150631 (-0.03%) Max live registers: 31871520 -> 31871476 (-0.00%) Totals from 506 (0.08% of 633197) affected shaders: Instrs: 831795 -> 831518 (-0.03%); split: -0.04%, +0.01% Cycle count: 55578509 -> 55222676 (-0.64%); split: -1.38%, +0.74% Fill count: 2779 -> 2737 (-1.51%) Max live registers: 51383 -> 51339 (-0.09%) Ice Lake and Skylake had similar results. (Ice Lake shown) Totals: Instrs: 152017826 -> 152017793 (-0.00%); split: -0.00%, +0.00% Cycle count: 15180773451 -> 15180761166 (-0.00%); split: -0.00%, +0.00% Fill count: 106610 -> 106614 (+0.00%) Max live registers: 32195006 -> 32194966 (-0.00%) Totals from 411 (0.06% of 637268) affected shaders: Instrs: 705935 -> 705902 (-0.00%); split: -0.01%, +0.01% Cycle count: 47830019 -> 47817734 (-0.03%); split: -0.05%, +0.02% Fill count: 2865 -> 2869 (+0.14%) Max live registers: 42883 -> 42843 (-0.09%) Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32041>	2024-11-08 17:46:45 +00:00
Ian Romanick	80a5d158ae	brw/copy: Don't copy propagate through smaller entry dest size Copy propagation would incorrectly occur in this code mov(16) v4+2.0:UW, u0<0>:UW NoMask ... mov(8) v6+2.0:UD, v4+2.0:UD NoMask group0 to create mov(16) v4+2.0:UW, u0<0>:UW NoMask ... mov(8) v6+2.0:UD, u0<0>:UD NoMask group0 This has different behavior. I think I just made a mistake when I changed this condition in `e3f502e007`. It seems like this condition could be relaxed to cover cases like (note the change of destination stride) mov(16) v4+2.0<2>:UW, u0<0>:UW NoMask ... mov(8) v6+2.0:UD, v4+2.0:UD NoMask group0 I'm not sure it's worth it. No shader-db or fossil-db changes on any Intel platform. Even the code for the test case mentioned in the original commit did not change. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Fixes: `e3f502e007` ("intel/fs: Allow copy propagation between MOVs of mixed sizes") Closes: #12116 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32041>	2024-11-08 17:46:45 +00:00
Ian Romanick	c1c09e3c4a	brw/emit: Add correct 3-source instruction assertions for each platform Specifically, allow two immediate sources for BFE on Gfx12+. I stumbled on this while trying some stuff with !31852. v2: Don't be lazy. Add proper assertions for all the things on all the platforms. Based on a suggestion by Ken. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Fixes: `7bed11fbde` ("intel/brw: Allow immediates in the BFE instruction on Gfx12+") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31858>	2024-11-08 16:48:57 +00:00
Kenneth Graunke	22b511ef02	intel: Set shader_spilling_rate=11 in intel_clc A while back Matt enabled shader_spilling_rate by default for anv. But intel_clc doesn't use the driconf mechanism that we use there. The GRL shaders spill a lot, and with us now compiling additional generations of the shaders, Mesa build time is getting prohibitively expensive. By setting this, we drop the time taken for a clean debug build by approximately 35% on my current laptop. Reviewed-by: Ivan Briano <ivan.briano@intel.com> Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31993>	2024-11-06 01:57:10 +00:00
Sviatoslav Peleshko	3a962a28e7	intel/elk_asm: Add BranchCtrl support We emit it for gfx8, so the assembler should support it too. Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31747>	2024-11-02 18:01:20 +00:00
Sviatoslav Peleshko	cd4c328408	intel/elk: List all instructions that have BranchCtrl bit Previously this bit was not clearly documented in PRMs, but gfx12 PRMs finally list all the instructions where it is present. Although it's unclear if it's functional for anything other than "if", "else", and "goto", we probably still should acknowledge its existence in other instructions. Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31747>	2024-11-02 18:01:20 +00:00
Sviatoslav Peleshko	445df8d611	intel/brw_asm: Add BranchCtrl support We emit it for gfx9, so the assembler should support it too. Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31747>	2024-11-02 18:01:19 +00:00
Sviatoslav Peleshko	aea7366613	intel/brw: List all instructions that have BranchCtrl bit Previously this bit was not clearly documented in PRMs, but gfx12 PRMs finally list all the instructions where it is present. Although it's unclear if it's functional for anything other than "if", "else", and "goto", we probably still should acknowledge its existence in other instructions. Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31747>	2024-11-02 18:01:19 +00:00
Paulo Zanoni	5ca883505e	brw: add a NOP in between WHILE instructions on LNL This is a workaround that is still in progress, see HSD 22020521218. If we don't have these NOPs we may see rendering corruption or even GPU hangs. While we still don't fully understand the issue from the hardware point of view, let's have this workaround so we can pass CTS and move things forward. If we need to change this later, we can. Besides, the impact is minimal. Shaderdb/fossilize report no changes for this patch. On our Blackops trace, the lack of this patch causes corruption in fog rendering (rectangles where fog was supposed to be shown don't show the fog). On dEQP-VK.graphicsfuzz.cov-array-copies-loops-with-limiters, without this patch we get a GPU hang. Backport-to: 24.2 Testcase: dEQP-VK.graphicsfuzz.cov-array-copies-loops-with-limiters Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11813 Reviewed-by: Ivan Briano <ivan.briano@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31331>	2024-10-31 23:57:10 +00:00
Iván Briano	13db5fad27	brw: fix task/mesh push constant loading The InlineData passed to the shader is a fixed size unrelated to the register size. It happens to match pre-Xe2, but by considering it the same in Xe2, we ended up reading pushed constants from the wrong place when they didn't fit in the InlineData. Fixes: `97b17aa0b1` ("brw/nir: rework inline_data_intel to work with compute") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31856>	2024-10-26 18:12:41 +00:00
Jordan Justen	35ace9d4e2	intel/compiler: Xe2 and Xe3 use the same compaction tables Ref: bspec 56709 Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31838>	2024-10-26 07:39:30 +00:00
Jordan Justen	688a673c5a	intel/brw: Allow Xe3 in brw_stage_has_packed_dispatch() Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31838>	2024-10-26 07:39:30 +00:00
Jordan Justen	cd33b7766a	intel/compiler: Add compiler enum for Xe3 Signed-off-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31838>	2024-10-26 07:39:30 +00:00
Ian Romanick	04e1783278	brw: Call brw_fs_opt_algebraic less often No shader-db or fossil-db changes on any Intel platform. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31729>	2024-10-25 23:39:36 +00:00
Ian Romanick	ac64b78f1f	brw/copy: Perform constant folding with constant propagation No shader-db or fossil-db changes on any Intel platform. v2: Simlify the logic for when to try constant folding. Do commute_immediates before constant folding. Both suggested by Ken. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31729>	2024-10-25 23:39:36 +00:00
Ian Romanick	2cc1575a31	brw/algebraic: Refactor constant folding out of brw_fs_opt_algebraic Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31729>	2024-10-25 23:39:36 +00:00
Ian Romanick	5dcad54902	brw/sat: Convert nearly all tests to use new style builders v2: Use new style builder for second ADD in other_non_saturated_use too. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31834>	2024-10-25 20:31:45 +00:00
Ian Romanick	19ae7aceb5	brw/sat: Fix small typos, copy and paste, etc. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31834>	2024-10-25 20:31:45 +00:00
Ian Romanick	de45273307	brw/builder: Add new style ALU3 builder Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31834>	2024-10-25 20:31:45 +00:00
Ian Romanick	8329c04521	brw/copy: Don't remove instructions w/ conditional modifier Fixes: `9e750f00c3` ("intel/brw: Make opt_copy_propagation_defs clean up its own trash") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31834>	2024-10-25 20:31:44 +00:00
Kenneth Graunke	d949d47f09	brw/emit: Fix align16 3src subregister encodings for HF types Prior to Cherryview, align16 3src instruction sources had to have their subregister number be DWord-aligned. Cherryview added a discontiguous bit in the encoding to represent bit 1 of the subregister number. This allows us to use packed HF sources. Update the ISA encoding helpers to properly handle bit 1. While we're at it, make them take a full subregister number and adjust accordingly, rather than making the callers divide or multiply by some alignment. Note that the destination subregister must still be DWord aligned, so HF destinations must be strided. Thanks to Ian Romanick for discovering that we were botching this. BSpec: 12054, 12081 v2 (idr): Fix ordering of high and low bit parameters to brw_inst_bits. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Tested-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31834>	2024-10-25 20:31:44 +00:00
Kenneth Graunke	33cd5a49f1	brw/validate: Return an error for Align16 access mode on Icelake+ Gfx11+ doesn't support Align16 instructions anymore - only Align1 mode. Bailing early for Align16 is important so that brw_hw_decode_inst doesn't try to read Align16 related instruction fields on generations where they no longer exist (which could trigger assertions). Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31834>	2024-10-25 20:31:44 +00:00
Francisco Jerez	e2eba3c7da	intel/brw/xe2+: Adjust performance analysis divergence weight due to EU fusion removal. This reduces the penalty the heuristic gives to SIMD32 shaders relative to SIMD16 in presence of discard control flow on Xe2+. The penalty was meant to account for the inefficient divergence behavior of SIMD32 shaders on Gfx12.x platforms, since Gfx12 hardware had EUs bundled in groups of two, and each pair shared control flow logic so both EUs could only execute instructions in lockstep, which meant that SIMD32 shaders had an effective warp size of 64 on Gfx12.x. This change switches back to more optimistic modelling of discard divergence. With it we gain about 6% performance in a Shadow of the Tomb Raider trace (tested on BMG). One may wonder if there are still workloads that would suffer materially from enabling SIMD32 for all pixel shaders on Xe2 instead of using this heuristic, since Xe2 EUs have twice the GRF space, twice the FPU throughput and better divergence behavior than Xe, but the answer seems to be yes unfortunately: E.g. Superposition has some pixel shaders where SIMD32 has substantially worse scheduling due to the increased number of false dependencies due to higher register pressure, and using SIMD32 for them reduces performance significantly. The heuristic seems to model this correctly so it doesn't look like we can do without it at least right now on Xe2. Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31697>	2024-10-24 22:06:52 +00:00
Kenneth Graunke	7bed11fbde	intel/brw: Allow immediates in the BFE instruction on Gfx12+ We weren't allowing immediates in BFE at all. Gfx12+ supports immediates in src0 (value) and src2 (width), but not src1 (offset). Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31437>	2024-10-24 21:31:28 +00:00
Daniel Schürmann	87cb42f953	treewide: don't lower to LCSSA before calling nir_divergence_analysis() Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30787>	2024-10-24 10:06:17 +00:00

1 2 3 4 5 ...

3854 Commits