AlexIndustrial/mesa

Author	SHA1	Message	Date
Paulo Zanoni	20c19351b1	anv: be consistent regarding non-render engines on i915.ko Today, on i915.ko, if Sparse Resources is disabled and the kernel is new enough to confirm that the GuC version is good, we expose non-render engines; otherwise we don't. Ever since we merged `5ca224aa0c` ("anv/trtt: make all contexts have the same TR-TT programming"), TR-TT has no longer been the reason we don't enable non-render engines. Our performance team has analyzed workloads and concluded that enabling non-render engines is not worth it on i915.ko today. So here we adjust the code to do three things: - Stop blaming TR-TT - Unify the default behavior for i915.ko - Don't disable non-render engines when TR-TT is being used on xe.ko. v2: - Comments (José) Acked-by: Felix DeGrood <felix.j.degrood@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30627>	2024-08-14 01:09:19 +00:00
Michael Cheng	0324d4bcf5	anv: move trace logic to batch_emit_pipe_control_write Move trace logic from cmd_buffer_apply_pipe_flushes down to genX(batch_emit_pipe_control_write). Signed-off-by: Michael Cheng <michael.cheng@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30623>	2024-08-13 21:42:43 +00:00
José Roberto de Souza	600d88ab3c	intel: Remove INTEL_ENGINE_CLASS_COMPUTE and INTEL_ENGINE_CLASS_COPY parameters It has been a while since the GuC version with the compute engine fix was released, and the same goes for the KMD uAPI to query the GuC firmware version. So at this point these parameters do more harm than good. Also, just setting them doesn't enable the async compute and copy engines, as they are not enabled by default on i915. If a user wants to disable or enable usage of those engines, a better approach is to use ANV_QUEUE_OVERRIDE. Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30593>	2024-08-13 21:15:31 +00:00
José Roberto de Souza	61e3a680a4	anv: Extend ANV_QUEUE_OVERRIDE to blit count Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30593>	2024-08-13 21:15:31 +00:00
José Roberto de Souza	92f4008473	anv: Disable sparse even on Xe KMD with ANV_SPARSE ANV_SPARSE had no effect on Xe KMD. Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30593>	2024-08-13 21:15:31 +00:00
Sagar Ghuge	83c2524124	intel/compiler: Adjust trace ray control field on Xe2 Bspec 64643: Structure_TraceRayPayload::Trace Ray Control Bit field moved from 9-8 to 10-8 on Xe2. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30600>	2024-08-13 20:02:24 +00:00
Sagar Ghuge	c3c62e493f	intel/compiler: Ray query requires write-back register Bspec 57508: Structure_SIMD16TraceRayMessage:: RayQuery Enable "When this bit is set in the header, Trace Ray Message behaves like a Ray Query. This message requires a write-back message indicating RayQuery for all valid Rays (SIMD lanes) have completed." If we don't pass the write-back register, the message somehow steps on the R0 register, which can corrupt scratch space accesses and potentially lead to a GPU hang. This can be seen when running under a simulator trace: send.rta (16|M0) null r124 r126:1 0x0 0x02000100 {$15} // wr:1+1, rd:0; simd16 trace ray R0 = 00000001 00000000 00000000 00000001 00000000 00000000 00000001 00000000 00000000 00000001 00000000 00000000 00000001 00000000 00000000 00000001 Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Suggested-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30600>	2024-08-13 20:02:24 +00:00
Alyssa Rosenzweig	5f437aa24d	elk: fix compute shader derivatives derivatives are not fs only so move to be with the rest of subgroup ops. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11674 Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30634>	2024-08-13 12:19:30 +00:00
Lionel Landwerlin	aaff191356	brw/rt: fix ray_object_(direction|origin) for closest-hit shaders When closest hit shader is called, the BVH object level brw_nir_rt_load_mem_ray origin/direction is 0. What we should be using is the ray origin/direction and apply the transform of the current instance. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `9ba7d459a3` ("intel/rt: Implement the new ray-tracing system values") Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30578>	2024-08-13 10:28:50 +00:00
Ian Romanick	119801e647	intel/brw: Move fsat instructions closer to the source Intel GPUs have a saturate destination modifier, and brw_fs_opt_saturate_propagation tries to replace explicit saturate operations with this destination modifier. That pass is limited in several ways. If the source of the explicit saturate is in a different block or if the source of the explicit saturate is live after the explicit saturate, brw_fs_opt_saturate_propagation will be unable to make progress. This optimization exists to help brw_fs_opt_saturate_propagation make more progress. It tries to move NIR fsat instructions to the same block that contains the definition of its source. It does this only in cases where it will not create additional live values. It also attempts to do this only in cases where the explicit saturate will ultimately be converted to a destination modifier. v2: Fix metadata_preserve when there's no progress and use nir_metadata_control_flow when there is progress. All suggested by Alyssa. v3: Fix a typo in the file header comment. Noticed by Ken. Don't require nir_metadata_instr_index. Use nir_def_rewrite_uses_after instead of open-coding something slightly more specific. Both suggested by Ken. shader-db: All Intel platforms had similar results. (Meteor Lake shown) total instructions in shared programs: 19733645 -> 19733028 (<.01%) instructions in affected programs: 193300 -> 192683 (-0.32%) helped: 246 HURT: 1 helped stats (abs) min: 2 max: 48 x̄: 2.51 x̃: 2 helped stats (rel) min: 0.18% max: 0.39% x̄: 0.33% x̃: 0.34% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.31% max: 0.31% x̄: 0.31% x̃: 0.31% 95% mean confidence interval for instructions value: -2.87 -2.13 95% mean confidence interval for instructions %-change: -0.34% -0.32% Instructions are helped. 
total cycles in shared programs: 916180971 -> 916264656 (<.01%) cycles in affected programs: 30197180 -> 30280865 (0.28%) helped: 194 HURT: 142 helped stats (abs) min: 1 max: 21251 x̄: 872.75 x̃: 19 helped stats (rel) min: <.01% max: 23.17% x̄: 2.59% x̃: 0.23% HURT stats (abs) min: 1 max: 28058 x̄: 1781.68 x̃: 399 HURT stats (rel) min: <.01% max: 37.21% x̄: 4.85% x̃: 1.63% 95% mean confidence interval for cycles value: -196.84 694.97 95% mean confidence interval for cycles %-change: -0.17% 1.27% Inconclusive result (value mean confidence interval includes 0). fossil-db: Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown) Totals: Instrs: 151512021 -> 151511351 (-0.00%); split: -0.00%, +0.00% Cycle count: 17209013596 -> 17209840995 (+0.00%); split: -0.02%, +0.02% Max live registers: 32013312 -> 32013549 (+0.00%) Max dispatch width: 5512304 -> 5512136 (-0.00%) Totals from 774 (0.12% of 630172) affected shaders: Instrs: 1559285 -> 1558615 (-0.04%); split: -0.05%, +0.01% Cycle count: 1312656268 -> 1313483667 (+0.06%); split: -0.24%, +0.30% Max live registers: 82195 -> 82432 (+0.29%) Max dispatch width: 6664 -> 6496 (-2.52%) Ice Lake Totals: Instrs: 151416791 -> 151416137 (-0.00%); split: -0.00%, +0.00% Cycle count: 15162468885 -> 15163298824 (+0.01%); split: -0.00%, +0.01% Max live registers: 32471367 -> 32471603 (+0.00%) Max dispatch width: 5623752 -> 5623712 (-0.00%) Totals from 733 (0.12% of 635598) affected shaders: Instrs: 877965 -> 877311 (-0.07%); split: -0.09%, +0.01% Cycle count: 190763628 -> 191593567 (+0.44%); split: -0.21%, +0.64% Max live registers: 72067 -> 72303 (+0.33%) Max dispatch width: 6216 -> 6176 (-0.64%) Skylake Totals: Instrs: 140794845 -> 140794075 (-0.00%); split: -0.00%, +0.00% Cycle count: 14665159301 -> 14665320514 (+0.00%); split: -0.00%, +0.01% Max live registers: 31783341 -> 31783662 (+0.00%); split: -0.00%, +0.00% Totals from 659 (0.11% of 625670) affected shaders: Instrs: 829061 -> 828291 (-0.09%); split: -0.09%, 
+0.00% Cycle count: 185478478 -> 185639691 (+0.09%); split: -0.33%, +0.41% Max live registers: 67491 -> 67812 (+0.48%); split: -0.01%, +0.48% Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29774>	2024-08-09 14:26:10 -07:00
Ian Romanick	f5815a003e	intel/brw: Use def analysis for simple cases of saturate propagation I had hoped this would improve compilation performance too. I tried several different long running fossils, and there was no difference. Fossil-db results are all over the place from platform to platform. All of the Tiger Lake shaders hurt for spills and fills are fragment shaders in rdr2. shader-db: All Intel platforms had similar results. (Meteor Lake shown) total instructions in shared programs: 19734088 -> 19733645 (<.01%) instructions in affected programs: 71200 -> 70757 (-0.62%) helped: 186 HURT: 0 helped stats (abs) min: 1 max: 7 x̄: 2.38 x̃: 1 helped stats (rel) min: 0.06% max: 2.79% x̄: 0.83% x̃: 0.48% 95% mean confidence interval for instructions value: -2.69 -2.07 95% mean confidence interval for instructions %-change: -0.93% -0.72% Instructions are helped. total cycles in shared programs: 916290473 -> 916180971 (-0.01%) cycles in affected programs: 3403719 -> 3294217 (-3.22%) helped: 89 HURT: 88 helped stats (abs) min: 1 max: 36685 x̄: 1424.13 x̃: 10 helped stats (rel) min: <.01% max: 26.75% x̄: 1.66% x̃: 0.46% HURT stats (abs) min: 1 max: 8750 x̄: 195.98 x̃: 7 HURT stats (rel) min: <.01% max: 17.12% x̄: 1.57% x̃: 0.19% 95% mean confidence interval for cycles value: -1199.88 -37.43 95% mean confidence interval for cycles %-change: -0.66% 0.56% Inconclusive result (%-change mean confidence interval includes 0). fossil-db: Meteor Lake and DG2 had similar results. 
(Meteor Lake shown) Totals: Instrs: 151458346 -> 151457413 (-0.00%) Cycle count: 17202426472 -> 17202406469 (-0.00%); split: -0.00%, +0.00% Max live registers: 31989626 -> 31989959 (+0.00%); split: -0.00%, +0.00% Max dispatch width: 5500560 -> 5500384 (-0.00%) Totals from 479 (0.08% of 628970) affected shaders: Instrs: 398836 -> 397903 (-0.23%) Cycle count: 18064565 -> 18044562 (-0.11%); split: -0.40%, +0.29% Max live registers: 36663 -> 36996 (+0.91%); split: -0.02%, +0.92% Max dispatch width: 4392 -> 4216 (-4.01%) Tiger Lake Totals: Instrs: 149913036 -> 149912182 (-0.00%); split: -0.00%, +0.00% Cycle count: 15560086488 -> 15560135139 (+0.00%); split: -0.00%, +0.00% Spill count: 61241 -> 61251 (+0.02%) Fill count: 107304 -> 107314 (+0.01%) Max live registers: 31964752 -> 31965119 (+0.00%); split: -0.00%, +0.00% Max dispatch width: 5517568 -> 5517248 (-0.01%) Totals from 486 (0.08% of 628673) affected shaders: Instrs: 396065 -> 395211 (-0.22%); split: -0.23%, +0.01% Cycle count: 17677691 -> 17726342 (+0.28%); split: -0.23%, +0.51% Spill count: 1302 -> 1312 (+0.77%) Fill count: 3746 -> 3756 (+0.27%) Max live registers: 37538 -> 37905 (+0.98%); split: -0.02%, +0.99% Max dispatch width: 4576 -> 4256 (-6.99%) Ice Lake Totals: Instrs: 151348422 -> 151347463 (-0.00%) Cycle count: 15155678386 -> 15155691726 (+0.00%); split: -0.00%, +0.00% Fill count: 108114 -> 108111 (-0.00%) Max live registers: 32444479 -> 32444814 (+0.00%); split: -0.00%, +0.00% Max dispatch width: 5611288 -> 5611256 (-0.00%) Totals from 483 (0.08% of 634352) affected shaders: Instrs: 393333 -> 392374 (-0.24%) Cycle count: 16706439 -> 16719779 (+0.08%); split: -0.14%, +0.22% Fill count: 3654 -> 3651 (-0.08%) Max live registers: 37246 -> 37581 (+0.90%); split: -0.02%, +0.92% Max dispatch width: 4312 -> 4280 (-0.74%) Skylake Totals: Instrs: 140741190 -> 140734481 (-0.00%); split: -0.00%, +0.00% Cycle count: 14659096516 -> 14659116346 (+0.00%); split: -0.00%, +0.00% Max live registers: 31757558 -> 31757725 
(+0.00%) Max dispatch width: 5470040 -> 5469920 (-0.00%) Totals from 3542 (0.57% of 624449) affected shaders: Instrs: 3081309 -> 3074600 (-0.22%); split: -0.22%, +0.00% Cycle count: 228843073 -> 228862903 (+0.01%); split: -0.11%, +0.12% Max live registers: 304531 -> 304698 (+0.05%) Max dispatch width: 31016 -> 30896 (-0.39%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29774>	2024-08-09 14:26:05 -07:00
Ian Romanick	adcce2bba4	intel/brw: Small code refactor in brw_fs_opt_saturate_propagation This bit of code will have a second use in the next commit. v2: Fix some broken indentation. Noticed by Ken. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29774>	2024-08-09 14:26:03 -07:00
Ian Romanick	9125b7c1b4	intel/elk: Don't propagate saturate to an instruction that writes flags There are two problems. 1. This is not NaN safe. 'add.le.sat dst F, Inf F, -Inf F' has a different result than 'add dst F, Inf F, -Inf F; cmp.le null, dst F, 0F'. 2. Ignoring the first problem, this only produces the desired flags for LE and G. All other cases can produce the wrong result. shader-db: All Intel platforms had similar results. (Broadwell shown) total instructions in shared programs: 18282314 -> 18282316 (<.01%) instructions in affected programs: 78 -> 80 (2.56%) helped: 0 HURT: 2 total cycles in shared programs: 952924234 -> 952924252 (<.01%) cycles in affected programs: 584 -> 602 (3.08%) helped: 0 HURT: 2 Fixes: `e6022281f2` ("intel/elk: Rename files to use elk prefix") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29774>	2024-08-09 14:26:01 -07:00
Ian Romanick	3d8fea0e09	intel/brw: Don't propagate saturate to an instruction that writes flags There are two problems. 1. This is not NaN safe. 'add.le.sat dst F, Inf F, -Inf F' has a different result than 'add dst F, Inf F, -Inf F; cmp.le null, dst F, 0F'. 2. Ignoring the first problem, this only produces the desired flags for LE and G. All other cases can produce the wrong result. For example, batman_arkham_city_goty.foz 6a63c4caacaa0dae has the following code: mad.ge.f0.0(8) g51<1>F g50<8,8,1>F g46<8,8,1>F g11<1,1,1>F mov.sat(8) g52<1>F g51<1,1,0>F ... (+f0.0) sel(8) g54<1>UD g53<8,8,1>UD 0x3f000000UD Without this commit, the saturate is incorrectly propagated to the MAD. A similar case exists in witcher_3_dxvk_g2.foz 5b03243be667a275. There are even worse cases like total_war_warhammer3.dx12vk-g6.foz 78328466761ef7ab and ee920491573860fc. The former has the following code (and the latter has very similar code): mad.l.f0.0(16) g95<1>F g93<8,8,1>F g62<8,8,1>F g68<1,1,1>F ... mov.sat(16) g109<1>F -g95<1,1,0>F ... (+f0.0) sel(16) g68<1>UD g111<1,1,0>UD g54<1,1,0>UD (+f0.0) sel(16) g70<1>UD g113<1,1,0>UD g56<1,1,0>UD (+f0.0) sel(16) g72<1>UD g115<1,1,0>UD g58<1,1,0>UD Saturate propagation makes a hash of this code: mad.sat.l.f0.0(16) g106<1>F -g93<8,8,1>F -g62<8,8,1>F g68<1,1,1>F ... (+f0.0) sel(16) g70<1>UD g110<1,1,0>UD g56<1,1,0>UD (+f0.0) sel(16) g72<1>UD g112<1,1,0>UD g58<1,1,0>UD (+f0.0) sel(16) g68<1>UD g108<1,1,0>UD g54<1,1,0>UD Not only is the saturate incorrectly applied to the MAD, but the MAD result is negated without changing the conditional modifier to G! NOTE: Backports of this commit to stable branches may need to be more like the following commit to elk. shader-db: All Intel platforms had similar results. 
(Meteor Lake shown) total instructions in shared programs: 19729375 -> 19729377 (<.01%) instructions in affected programs: 112 -> 114 (1.79%) helped: 0 HURT: 2 total cycles in shared programs: 916234266 -> 916234288 (<.01%) cycles in affected programs: 636 -> 658 (3.46%) helped: 0 HURT: 2 fossil-db: All Intel platforms had similar results. (Meteor Lake shown) Totals: Instrs: 151531594 -> 151531601 (+0.00%) Cycle count: 17209107419 -> 17209107474 (+0.00%); split: -0.00%, +0.00% Totals from 6 (0.00% of 630198) affected shaders: Instrs: 4550 -> 4557 (+0.15%) Cycle count: 194629 -> 194684 (+0.03%); split: -0.00%, +0.03% Fixes: `947c828d5c` ("i965/fs: Add a saturation propagation optimization pass.") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29774>	2024-08-09 14:25:57 -07:00
Ian Romanick	6da4649191	intel/brw: Eliminate dead flag writes This prevents a couple small regressions in the next commit. The only changes in shader-db or fossil-db were on Skylake. This seems to eliminate an unused flags write that doesn't exist on other platforms. With that flag write eliminated, a later CMP can be scheduled better. I did not investigate this further. v2: Clean up some unnecessary bits and add some comments to can_elminate_conditional_mod. Suggested by Ken and Matt. Skylake Totals: Cycle count: 14665454524 -> 14665454444 (-0.00%) Totals from 10 (0.00% of 625685) affected shaders: Cycle count: 38630 -> 38550 (-0.21%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29774>	2024-08-09 14:25:54 -07:00
Alyssa Rosenzweig	bf9a17e2d5	elk: switch to derivative intrinsics Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30566>	2024-08-09 17:07:59 +00:00
Alyssa Rosenzweig	eec02246f8	brw: switch to derivative intrinsics Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30566>	2024-08-09 17:07:59 +00:00
Kenneth Graunke	b6f4f64b43	intel/brw: Drop image_{load,store}_raw_intel handling Gfx8 required us to emulate image load store with untyped messages, whereas Gfx9 just has typed message support for everything. brw no longer supports Gfx8, so all of this code is effectively dead. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30576>	2024-08-09 07:20:08 +00:00
Tapani Pälli	7a4020e129	anv: implement workaround for Wa_18038825448 Description states that we need to enable PS_EXTRA state EnablePSdependencyonCPsizechange whenever PixelShaderIsPerCoarsePixel state changes. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30475>	2024-08-09 07:30:03 +03:00
Tapani Pälli	9582de9ee3	anv: refactor cmd_buffer_flush_gfx_runtime_state for dirty state Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30475>	2024-08-09 07:30:03 +03:00
Lionel Landwerlin	bbfafc71da	anv: limit some state dirtying after blorp/simpler-shaders Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30475>	2024-08-09 07:30:03 +03:00
Tapani Pälli	ff8953f666	anv: fix a cmd_buffer reference in simple shader In utrace timestamp copy case cmd_buffer is NULL. Fixes: `dbbcd5c32c` ("anv: factor out generation kernel dispatch into helper") Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30475>	2024-08-09 07:30:03 +03:00
Tapani Pälli	8dbd38ae32	blorp: support new flag for setting cps dependency This is used with Wa_18038825448 implementation. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30475>	2024-08-09 07:30:03 +03:00
Tapani Pälli	91f9da524e	intel/dev: update mesa_defs.json from workaround database Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30475>	2024-08-09 07:30:03 +03:00
Caio Oliveira	2e2b83f72d	intel/brw: Use CSE for LOAD_SUBGROUP_INVOCATION Instead of emitting a single one at the top, and making reference to it, emit the virtual instruction as needed and let CSE do its job. Since load_subgroup_invocation now can appear not at the start of the shader, use UNDEF in all cases to ensure that the liveness of the destination doesn't extend to the first partial write done here (it was being used only for SIMD > 8 before). Note this option was considered in the past `6132992cdb` but at the time dismissed. The difference now is that the lowering of the virtual instruction happens earlier than the scheduling. The motivation for this change is to allow passes other than the NIR conversion to use this value. The alternative of storing a `brw_reg` in the shader (instead of NIR state) gets complicated by passes like compact_vgrfs, that move VGRFs around (and update the instructions). This and maybe other passes would have to care about the brw_reg. Fossil-db numbers, TGL ``` * Shaders only in 'after' results are ignored: steam-native/shadow_of_the_tomb_raider/c683ea5067ee157d/fs.32/0, steam-native/shadow_of_the_tomb_raider/f4df450c3cef40b4/fs.32/0, steam-native/shadow_of_the_tomb_raider/94b708fb8e3d9597/fs.32/0, steam-native/shadow_of_the_tomb_raider/19d44c328edabd30/fs.32/0, steam-native/shadow_of_the_tomb_raider/8a7dcbd5a74a19bf/fs.32/0, and 366 more from 4 apps: steam-dxvk/alan_wake, steam-dxvk/batman_arkham_city_goty, steam-dxvk/batman_arkham_origins, steam-native/shadow_of_the_tomb_raider * Shaders only in 'before' results are ignored: steam-dxvk/octopath_traveler/aaa3d10acb726906/fs.32/0, steam-dxvk/batman_arkham_origins/e6872ae23569c35f/fs.32/0, steam-dxvk/octopath_traveler/fd33a99fa5c271a8/fs.32/0, steam-dxvk/octopath_traveler/9a077cdc16f24520/fs.32/0, steam-dxvk/batman_arkham_city_goty/fac7b438ad52f622/fs.32/0, and 12 more from 4 apps: steam-dxvk/batman_arkham_city_goty, steam-dxvk/batman_arkham_origins, steam-dxvk/octopath_traveler, 
steam-native/shadow_of_the_tomb_raider Totals: Instrs: 149752381 -> 149751337 (-0.00%); split: -0.00%, +0.00% Cycle count: 11553609349 -> 11549970294 (-0.03%); split: -0.06%, +0.03% Spill count: 42763 -> 42764 (+0.00%); split: -0.01%, +0.01% Fill count: 75650 -> 75651 (+0.00%); split: -0.00%, +0.01% Max live registers: 31725096 -> 31671792 (-0.17%) Max dispatch width: 5546008 -> 5551672 (+0.10%); split: +0.11%, -0.00% Totals from 52574 (8.34% of 630441) affected shaders: Instrs: 9535159 -> 9534115 (-0.01%); split: -0.03%, +0.02% Cycle count: 1006627109 -> 1002988054 (-0.36%); split: -0.65%, +0.29% Spill count: 11588 -> 11589 (+0.01%); split: -0.03%, +0.03% Fill count: 21057 -> 21058 (+0.00%); split: -0.01%, +0.02% Max live registers: 1992493 -> 1939189 (-2.68%) Max dispatch width: 559696 -> 565360 (+1.01%); split: +1.06%, -0.05% ``` and DG2 ``` * Shaders only in 'after' results are ignored: steam-native/shadow_of_the_tomb_raider/1f95a9d3db21df85/fs.32/0, steam-native/shadow_of_the_tomb_raider/56b87c4a46613a2a/fs.32/0, steam-native/shadow_of_the_tomb_raider/a74b4137f85dbbd3/fs.32/0, steam-native/shadow_of_the_tomb_raider/e07e38d3f48e8402/fs.32/0, steam-native/shadow_of_the_tomb_raider/206336789c48996c/fs.32/0, and 268 more from 4 apps: steam-dxvk/alan_wake, steam-dxvk/batman_arkham_city_goty, steam-dxvk/batman_arkham_origins, steam-native/shadow_of_the_tomb_raider * Shaders only in 'before' results are ignored: steam-native/shadow_of_the_tomb_raider/0420d7c3a2ea99ec/fs.32/0, steam-native/shadow_of_the_tomb_raider/2ff39f8bf7d24abb/fs.32/0, steam-native/shadow_of_the_tomb_raider/92d7be2824bd9659/fs.32/0, steam-native/shadow_of_the_tomb_raider/f09ca6d2ecf18015/fs.32/0, steam-native/shadow_of_the_tomb_raider/490f8ffd59e52949/fs.32/0, and 205 more from 3 apps: steam-dxvk/batman_arkham_city_goty, steam-dxvk/batman_arkham_origins, steam-native/shadow_of_the_tomb_raider Totals: Instrs: 151597619 -> 151599914 (+0.00%); split: -0.00%, +0.00% Subgroup size: 7699776 -> 7699784 
(+0.00%) Cycle count: 12738501989 -> 12739841170 (+0.01%); split: -0.01%, +0.02% Spill count: 61283 -> 61274 (-0.01%) Fill count: 119886 -> 119849 (-0.03%) Max live registers: 31810432 -> 31758920 (-0.16%) Max dispatch width: 5540128 -> 5541136 (+0.02%); split: +0.08%, -0.06% Totals from 49286 (7.81% of 631231) affected shaders: Instrs: 8607753 -> 8610048 (+0.03%); split: -0.01%, +0.04% Subgroup size: 857752 -> 857760 (+0.00%) Cycle count: 305939495 -> 307278676 (+0.44%); split: -0.28%, +0.72% Spill count: 6339 -> 6330 (-0.14%) Fill count: 12571 -> 12534 (-0.29%) Max live registers: 1788346 -> 1736834 (-2.88%) Max dispatch width: 510920 -> 511928 (+0.20%); split: +0.85%, -0.66% ``` Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30489>	2024-08-08 18:20:49 +00:00
Lionel Landwerlin	10533e7b4c	anv/blorp: force CC_VIEWPORT reallocation when programming 3DSTATE_VIEWPORT_STATE_POINTERS_CC Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11647 Fixes: `fe1baa6481` ("anv: reduce blorp dynamic state emissions") Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30532>	2024-08-08 14:13:39 +00:00
David Heidelberg	c2bbdda39b	intel/genxml: fix length of HCP_FQM_STATE for gen20 and 125 Fixes: `7f280e1e93` ("intel/genxml: fix some length of HCP_FQM_STATE") Acked-by: Hyunjun Ko <zzoon@igalia.com> Reviewed-by: Dave Airlie <airlied@redhat.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: David Heidelberg <david@ixit.cz> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30558>	2024-08-08 19:43:41 +09:00
Iván Briano	90defc0087	anv: handle VK_PIPELINE_CREATE_VIEW_INDEX_FROM_DEVICE_INDEX_BIT Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30329>	2024-08-07 19:09:55 +00:00
Paulo Zanoni	0e38b794e2	intel: fix compute SLM sizes on Xe2 and newer Before the patch, intel_device_info_get_max_preferred_slm_size() returns values in kilobytes, but then intel_device_info_get_max_slm_size() is multiplying it by 1024. As a result, LNL is reporting maxComputeSharedMemorySize to be 134217728, which is 128 MiB. Fix this by making intel_device_info_get_max_slm_size() not multiply it by 1024. This should fix at least the following dEQP tests: dEQP-VK.compute.pipeline.zero_initialize_workgroup_memory.max_workgroup_memory.1 dEQP-VK.compute.pipeline.zero_initialize_workgroup_memory.max_workgroup_memory.128 dEQP-VK.compute.pipeline.zero_initialize_workgroup_memory.max_workgroup_memory.16 dEQP-VK.compute.pipeline.zero_initialize_workgroup_memory.max_workgroup_memory.2 dEQP-VK.compute.pipeline.zero_initialize_workgroup_memory.max_workgroup_memory.4 dEQP-VK.compute.pipeline.zero_initialize_workgroup_memory.max_workgroup_memory.64 Some tests were failing with: deqp-vk: ../../src/intel/common/intel_compute_slm.c:24: slm_encode_lookup: Assertion `kbytes <= table[table_len - 1].size_in_kb' failed. while other tests were triggering the OOM. v2: - Make everybody return sizes in bytes (José). v3: - Rename variable to bytes (José, Jordan). Fixes: `fd368f5521` ("anv: Set maxComputeSharedMemorySize value for Xe2 platforms") Reviewed-by: Jordan Justen <jordan.l.justen@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30541>	2024-08-07 16:14:02 +00:00
Nanley Chery	54631ebc68	anv: Batch MCS and CCS aux-op flushes The PRMs suggest that certain classes of auxiliary surface operations will automatically synchronize when performed back-to-back: Any transition from any value in {Clear, Render, Resolve} to a different value in {Clear, Render, Resolve} requires end of pipe synchronization. Make use of this functionality by batching CCS and MCS flushes when compatible auxiliary surface operations are performed within a command buffer. Ref: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11325 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29922>	2024-08-07 15:25:37 +00:00
Nanley Chery	f854161928	anv,iris: Use WriteImmediate instead of Z flush for WA According to the HSD, this is an alternative option for Wa_14016712196. Taking this option allows us to combine this workaround with a couple other depth workarounds. Make sure to execute these workarounds before the workaround for the depth register mode, so that the stalling flush is not impacted. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29922>	2024-08-07 15:25:37 +00:00
Nanley Chery	db6ae41c65	intel/blorp: Use WA helpers for depth pipecontrol Instead of unconditionally emitting a pipe control on gfx11+, use the workaround helpers for workarounds 1408224581 and 14014097488. Also, add a check for workaround 14016712196, which is also impacted. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29922>	2024-08-07 15:25:37 +00:00
Nanley Chery	77e4f9690d	anv: Drop flush from unused depth workaround This flush was introduced with the following commits: `8949d27bb8` ("anv: implement gen9 post sync pipe control workaround") `bcb611361b` ("anv: implement gen12 post sync pipe control workaround") The flush became unused with the following commit: `e79e1ca304` ("intel: Drop Tigerlake revision 0 workarounds") This prevents some extra pipecontrols caused by a following patch. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29922>	2024-08-07 15:25:37 +00:00
Aditya Swarup	ae85f59645	anv: Disable fast clear when surface height is 16k As suggested in WA_16021232440: Disable fast clear when surface height equals 16k. Signed-off-by: Aditya Swarup <aditya.swarup@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29182>	2024-08-06 19:14:04 +00:00
Lionel Landwerlin	6145798022	intel/mi_builder: enable control flow API on Gfx9+ Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30539>	2024-08-06 17:55:19 +00:00
Lionel Landwerlin	8cc492cb26	genxml: unify some bits between Gfx8/Gfx11/Gfx12.5 Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30539>	2024-08-06 17:55:18 +00:00
Lionel Landwerlin	343e569ab7	anv: ensure max_plane_count is at least 1 This simplifies a bunch of checks throughout the driver. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30539>	2024-08-06 17:55:18 +00:00
Lionel Landwerlin	4f093b2e2b	anv: add missing MEDIA_STATE_FLUSH for internal shaders Replicating what we do in genX_cmd_compute.c Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `7ca5c84804` ("anv: add support for simple internal compute shaders") Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30539>	2024-08-06 17:55:18 +00:00
Lionel Landwerlin	0bd96e868c	intel-clc: missing printf lowering Useful for printf() debugging in our opencl shader snippets. Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30539>	2024-08-06 17:55:18 +00:00
Lionel Landwerlin	398e6cf38b	anv: reuse cs_prog_data pointer Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30539>	2024-08-06 17:55:18 +00:00
Lionel Landwerlin	f4a812a229	anv: remove some unused includes Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30539>	2024-08-06 17:55:18 +00:00
Lionel Landwerlin	cde72181b7	anv: prevent asserts with debug printf in internal shaders Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30539>	2024-08-06 17:55:18 +00:00
Kenneth Graunke	32cce2f397	intel/brw: Set appropriate types for 16-bit sampler trailing components 16-bit SIMD8 sampler writeback messages come with a bit of padding in them, requiring us to emit a LOAD_PAYLOAD to reorganize the data into the padding-free format expected by NIR. Additionally, we may reduce the response length on the sampler messages based on which components of the (always vec4) NIR destination are actually in use. When we do that, dest_size > read_size, and the trailing components are all empty BAD_FILE registers, indicating the contents are undefined. Unfortunately, we can't ignore those trailing components entirely. In the past, we left them default-initialized, giving us a BAD_FILE register with UD type (which didn't matter, since all sampler returns were 32-bit). But with 16-bit, this was confusing the LOAD_PAYLOAD. For example, writing RGB and skipping A (without sparse) would produce read_size = 3 and dest_size = 4 and nir_dest[5] containing: nir_dest[] = <R:hf, G:hf, B:hf, blank-A:ud, blank-sparse:ud> We'd then call LOAD_PAYLOAD on the first 4 sources, causing it to see 3 HF's and a UD, and try to copy the full 32-bit value at the end, instead of 16-bits of pad like we intended. This meant it would overflow the destination register's size, triggering validation errors. Thanks to Ian Romanick for noticing this, writing a test, and also coming up with a nearly identical fix. Fixes: `0116430d39` ("intel/brw: Handle 16-bit sampler return payloads") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11617 References: https://gitlab.freedesktop.org/mesa/crucible/-/merge_requests/152 Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30529>	2024-08-06 17:26:05 +00:00
Alyssa Rosenzweig	d99c2ef059	nir/opt_uniform_atomics: add fs atomics predicated? flag on agx (and mali), we predicate atomics on "if (!helper)", so doing so again in this pass is redundant. and would cause a problem since we'd then have to lower the "is helper inv?" flag late. so just skip the extra lowering code. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Acked-by: Rhys Perry <pendingchaos02@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30488>	2024-08-06 11:48:17 -04:00
Alvin Wong	0413e1f7dc	hasvk: Conditionally expose VK_KHR_present_wait Gate it behind driconf query for now. Co-authored-by: Hans-Kristian Arntzen <post@arntzen-software.no> Acked-by: Hans-Kristian Arntzen <post@arntzen-software.no> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30480>	2024-08-06 11:39:38 +08:00
Kenneth Graunke	c19e5a0a75	intel/brw: Replace predicated break optimization with a simple peephole We can achieve most of what brw_fs_opt_predicated_break() does with simple peepholes at NIR -> BRW conversion time. For predicated break and continue, we can simply look at an IF ... ENDIF sequence after emitting it. If there's a single instruction between the two, and it's a BREAK or CONTINUE, then we can move the predicate from the IF onto the jump, and delete the IF/ENDIF. Because we haven't built the CFG at this stage, we only need to remove them from the linked list of instructions, which is trivial to do. For the predicated while optimization, we can rely on the fact that we already did the predicated break optimization, and simply look for a predicated BREAK just before the WHILE. If so, we move the predicate onto the WHILE, invert it, and remove the BREAK. There are a few cases where this approach does a worse job than the old one: nir_convert_from_ssa may introduce load_reg and store_reg in blocks containing break, and nir_trivialize_registers may decide it needs to insert movs into those blocks. So, at NIR -> BRW time, we'll actually emit some MOVs there, which might have been possible to copy propagate out after later optimizations. However, the fossil-db results show that it's still pretty competitive. For instructions, 1017 shaders were helped (average -1.87 instructions), while only 62 were hurt (average +2.19 instructions). In affected shaders, it was -0.08% for instructions. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>	2024-08-05 19:17:55 -07:00
Kenneth Graunke	fad63d6483	intel/brw: Delete the brw_fs_opt_dead_control_flow_eliminate() pass With the select peephole gone, this no longer does much of anything. No instruction changes in fossil-db on Alchemist. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>	2024-08-05 19:17:55 -07:00
Kenneth Graunke	06e8335e11	intel/brw: Delete the brw_fs_opt_peephole_select() pass Now that we can handle load_ubo in NIR's peephole select pass, the backend pass isn't really useful anymore. fossil-db results on Alchemist show almost no impact: Totals: Instrs: 150646561 -> 150647106 (+0.00%); split: -0.00%, +0.00% Cycles: 12633748945 -> 12633760459 (+0.00%) Totals from 261 (0.04% of 630008) affected shaders: Instrs: 404946 -> 405491 (+0.13%); split: -0.00%, +0.14% Cycles: 23947172 -> 23958686 (+0.05%) Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>	2024-08-05 19:17:55 -07:00
Kenneth Graunke	7c579f448f	intel/brw: Mark all UBO access with a direct buffer index as speculative UBO loads with a non-indirect buffer index should be safe to perform speculatively. With a direct offset, we may sometimes turn them into push constants, at which point it's just reading a register with no cost at all. Otherwise, we access them via messages that use surface state, and automatically perform bounds checking. So we shouldn't have any issues with reading out of bounds and page faulting, for example. This allows nir_opt_peephole_sel() to operate on load_ubo intrinsics, so we can turn simple if's with loads on both sides to bcsels. In some cases this can collapse a surprising amount of control flow, allowing other optimizations to work better. The i965 OpenGL driver used load_uniform intrinsics, which are allowed in NIR's peephole select pass. But iris uses the Gallium NIR pass that translates uniforms to loads from UBO 0, so we haven't been able to take advantage of NIR's peephole select pass there. The backend pass was still able to handle this to some extent, however. fossil-db results on Alchemist: Totals: Instrs: 150656329 -> 150645307 (-0.01%); split: -0.01%, +0.00% Cycles: 12635230179 -> 12633696811 (-0.01%); split: -0.02%, +0.00% Send messages: 7416330 -> 7416261 (-0.00%) Spill count: 52471 -> 52473 (+0.00%) Fill count: 100818 -> 100803 (-0.01%); split: -0.02%, +0.00% Scratch Memory Size: 3197952 -> 3198976 (+0.03%) Totals from 1848 (0.29% of 630003) affected shaders: Instrs: 1412300 -> 1401278 (-0.78%); split: -0.80%, +0.02% Cycles: 1809789567 -> 1808256199 (-0.08%); split: -0.11%, +0.03% Send messages: 59829 -> 59760 (-0.12%) Spill count: 3870 -> 3872 (+0.05%) Fill count: 9693 -> 9678 (-0.15%); split: -0.18%, +0.02% Scratch Memory Size: 174080 -> 175104 (+0.59%) Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30498>	2024-08-05 19:17:55 -07:00
Felix DeGrood	0eed818588	anv/measure: ignore events from reused command buffers INTEL_MEASURE currently does not support measuring events in parallel from reused command buffers. When this case is detected, warn user and disallow. Fixes observed segfaults in such apps. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30523>	2024-08-05 23:45:41 +00:00

12524 Commits