AlexIndustrial/mesa

Author	SHA1	Message	Date
Rohan Garg	1f06e70bdc	anv: migrate indirect mesh draws to indirect draws on ARL+ Backport-to: 24.2 Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30690>	2024-08-20 09:41:51 +00:00
Rohan Garg	f69c74b6d5	anv: dispatch indirect draws with a count buffer through the XI hardware on ARL+ ARL+ can dispatch indirect draws through the hardware. Backport-to: 24.2 Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30690>	2024-08-20 09:41:51 +00:00
Rohan Garg	74cd70841d	anv: refactor indirect draw support into its own function ARL+ supports some form of indirect draws. Instead of trying to mash support for indirect draws across various generations, let's make things cleaner by factoring out XI support into its own function. Backport-to: 24.2 Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30690>	2024-08-20 09:41:51 +00:00
Rohan Garg	c1af71c9c2	anv,iris: prefix the argument format with XI for an upcoming refactor Backport-to: 24.2 Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30690>	2024-08-20 09:41:51 +00:00
Rohan Garg	dc23db2a0d	anv: program a custom byte stride on Xe2 for indirect draws Xe2 allows us to program in a custom byte stride for indirect draws Backport-to: 24.2 Signed-off-by: Rohan Garg <rohan.garg@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30690>	2024-08-20 09:41:50 +00:00
Tapani Pälli	d4e8c8f874	anv: move setting 3DSTATE_CLIP::MaximumVPIndex from loop Loop iterates viewports but for MaximumVPIndex we only need viewport count and last stage that writes viewport. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30732>	2024-08-20 06:48:50 +00:00
Jianxun Zhang	8c623b6a7e	Revert "anv: Disable PAT-based compression on depth images (xe2)" This reverts commit `6073f091bb`. With the progress on Xe2 platforms, we are not seeing many issues caused by compression on depth buffers. Backport-to: 24.2 Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com> Reviewed-by: Nanley Chery <nanley.g.chery@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30653>	2024-08-19 17:50:10 -07:00
José Roberto de Souza	12656571fd	anv/gfx20: Enable depth buffer write through for multi sampled images BSpec: 56419 Backport-to: 24.2 Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29615>	2024-08-19 20:04:36 +00:00
Nanley Chery	ebe3eabda6	anv: Add want_hiz_wt_for_image() Backport-to: 24.2 Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29615>	2024-08-19 20:04:36 +00:00
José Roberto de Souza	2553878fba	intel/isl/gfx20: Allow hierarchical depth buffer write through for multi sampled surfaces BSpec: 56419 Backport-to: 24.2 Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29615>	2024-08-19 20:04:36 +00:00
Lionel Landwerlin	e10cbb59a5	anv: add assert to detect problematic instruction merges We stick to a rule in the driver that each field is only set in a single place in the driver. Therefore when merging instructions, we should never have any bit set to 1 from both sides. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30684>	2024-08-19 11:02:44 +00:00
Lionel Landwerlin	982106e676	anv: only set 3DSTATE_CLIP::MaximumVPIndex once Currently we can end up merging 2 prepacked 3DSTATE_CLIP instructions where 2 different places in the driver fill the MaximumVPIndex. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `50f6903bd9` ("anv: add new low level emission & dirty state tracking") Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30684>	2024-08-19 11:02:44 +00:00
Lionel Landwerlin	7c73346549	anv: remove unused macro Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30684>	2024-08-19 11:02:44 +00:00
Lionel Landwerlin	9eff285a46	anv: fix extended buffer flags usages Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `bcc0ec8e6c` ("anv: enable KHR_maintenance5") Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30714>	2024-08-19 10:13:09 +00:00
Caio Oliveira	40f77b6936	intel/brw: Avoid modifying the shader in assign_curb_setup if not needed If there are no uniforms to push, don't emit the AND or invalidate the shader analysis. This affects only compute shaders. Not a significant impact since lots of shaders end up pushing uniforms. Fossil-db numbers (restricted to compute pipelines only) for DG2 ``` Totals: Instrs: 3071016 -> 3070894 (-0.00%) Cycle count: 8320268863 -> 8320264519 (-0.00%) Totals from 122 (2.70% of 4520) affected shaders: Instrs: 10675 -> 10553 (-1.14%) Cycle count: 2060003 -> 2055659 (-0.21%) ``` Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30631>	2024-08-17 16:25:01 -07:00
José Roberto de Souza	38c989ada2	anv: Nuke anv_utrace_submit::trace_bo There is no usage for this bo. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30676>	2024-08-16 19:38:19 +00:00
José Roberto de Souza	f7b386bd6d	anv: Use batch_bo_pool in utrace anv_async_submit_init() calls In practice, the only change here is that batch_bo_pool buffers are captured in error dumps. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30676>	2024-08-16 19:38:19 +00:00
José Roberto de Souza	168e26fc04	anv: Add trivial_batch and query-pool to the error capture Those are batch buffers that are not allocated from batch_bo_pool, so they were left out of error capture without the capture-all parameter. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30676>	2024-08-16 19:38:18 +00:00
Sagar Ghuge	c4f2a8d984	intel/compiler: Fix indirect offset in GS input read for Xe2+ Make sure to take the new GRF size into consideration and adjust the indirect offset accordingly, so that when we do the indirect load with the address register, we load the right values. This helps pass the following tests: - dEQP-VK.binding_model.descriptor_buffer.mutable_descriptor.geom - dEQP-VK.ray_query.geometry_shader. Backport-to: 24.2 Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30679>	2024-08-16 18:40:13 +00:00
Ian Romanick	c8038643b8	intel/brw: Make ifind_msb SSA friendly No shader-db changes on any Intel platform. v2: Use negate(tmp) instead of creating a new temporary. Suggested by Ken. fossil-db: Meteor Lake, DG2, and Skylake had similar results. (Meteor Lake shown) Totals: Instrs: 152535897 -> 152535883 (-0.00%); split: -0.00%, +0.00% Cycle count: 17112329592 -> 17112406110 (+0.00%); split: -0.06%, +0.06% Totals from 40 (0.01% of 633223) affected shaders: Instrs: 458813 -> 458799 (-0.00%); split: -0.01%, +0.00% Cycle count: 4358016282 -> 4358092800 (+0.00%); split: -0.23%, +0.24% Tiger Lake and Ice Lake had similar results. (Tiger Lake shown) Totals: Instrs: 150560511 -> 150560465 (-0.00%); split: -0.00%, +0.00% Cycle count: 15484534441 -> 15482372893 (-0.01%); split: -0.12%, +0.11% Spill count: 59795 -> 59794 (-0.00%) Fill count: 103513 -> 103509 (-0.00%) Totals from 40 (0.01% of 632445) affected shaders: Instrs: 368877 -> 368831 (-0.01%); split: -0.01%, +0.00% Cycle count: 3918398264 -> 3916236716 (-0.06%); split: -0.49%, +0.43% Spill count: 16896 -> 16895 (-0.01%) Fill count: 27819 -> 27815 (-0.01%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30650>	2024-08-16 14:52:04 +00:00
Ian Romanick	e9c151fde6	intel/brw: Make 16-bit ishl, ishr, and ushr SSA friendly No shader-db changes on any Intel platform. fossil-db: All Intel platforms had similar results. (Meteor Lake shown) Totals: Instrs: 152536266 -> 152535897 (-0.00%); split: -0.00%, +0.00% Cycle count: 17124901233 -> 17112329592 (-0.07%); split: -0.07%, +0.00% Spill count: 78571 -> 78525 (-0.06%) Fill count: 148178 -> 148132 (-0.03%) Totals from 210 (0.03% of 633223) affected shaders: Instrs: 514525 -> 514156 (-0.07%); split: -0.16%, +0.08% Cycle count: 4003540698 -> 3990969057 (-0.31%); split: -0.32%, +0.00% Spill count: 15632 -> 15586 (-0.29%) Fill count: 26241 -> 26195 (-0.18%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30650>	2024-08-16 14:52:04 +00:00
Lionel Landwerlin	fbafa9cabd	intel/nir: remove load_global_const_block_intel intrinsic load_global_constant_uniform_block_intel is equivalent in terms of loading, then for the predicate we just do a bcsel afterward in places where that is required. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30659>	2024-08-16 11:12:39 +00:00
Connor Abbott	de1d36d054	ci: Uprev VK-CTS to 1.3.9.0 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29766>	2024-08-15 09:01:26 +00:00
Caio Oliveira	2150bc6d80	intel/brw: Use %td format for pointer difference Fixes build for 32-bit, again. Fixes: `e72bf2d02f` ("intel: Add executor tool") Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30669>	2024-08-14 17:28:41 -07:00
Caio Oliveira	8a44b4812a	intel/executor: Use PRIx64 to fix building in 32-bit Fixes: `e72bf2d02f` ("intel: Add executor tool") Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30668>	2024-08-14 21:41:28 +00:00
Tapani Pälli	a43f18dd04	intel/dev: update mesa_defs.json from workaround database Most importantly this enables 18038825448 for LNL. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30639>	2024-08-14 11:20:40 +00:00
Caio Oliveira	e72bf2d02f	intel: Add executor tool Add a tool that programs the hardware the minimum amount to be able to execute compute shaders and then executes a script that can perform data manipulation and dispatch execution of the shaders (written in Xe assembly). The goal is to have a tool to experiment directly with certain assembly instructions and the shared units without having to instrument the drivers. To make it more convenient to write assembly, a few macros (indicated by the @-symbol) will be processed into the full instruction. For example, the script ``` local r = execute { data={ [42] = 0x100 }, src=[[ @mov g1 42 @read g2 g1 @id g3 add(8) g4<1>UD g2<8,8,1>UD g3<8,8,1>UD { align1 @1 1Q }; @write g3 g4 @eot ]] } dump(r, 4) ``` produces ``` [0x00000000] 0x00000100 0x00000101 0x00000102 0x00000103 ``` There's a help message inside the code that describes the script environment and the macros for assembly sources. Acked-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30062>	2024-08-14 03:03:46 +00:00
Caio Oliveira	6267585778	intel/brw: Also return the size of the assembled shader Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30062>	2024-08-14 03:03:46 +00:00
Paulo Zanoni	20c19351b1	anv: be consistent regarding non-render engines on i915.ko Today, on i915.ko, if Sparse Resources is disabled and the kernel is new enough to confirm to us that the GuC version is good, we'll expose non-render engines; otherwise we don't. Ever since we merged `5ca224aa0c` ("anv/trtt: make all contexts have the same TR-TT programming"), TR-TT is no longer the reason why we're not enabling non-render engines. Our performance team has analyzed workloads and concluded enabling non-render engines is not worth it on i915.ko today. So here we adjust the code to do three things: - Stop blaming TR-TT - Unify the default behavior for i915.ko - Don't disable non-render engines when TR-TT is being used on xe.ko. v2: - Comments (José) Acked-by: Felix DeGrood <felix.j.degrood@intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30627>	2024-08-14 01:09:19 +00:00
Michael Cheng	0324d4bcf5	anv: move trace logic to batch_emit_pipe_control_write Move trace logic from cmd_buffer_apply_pipe_flushes down to genX(batch_emit_pipe_control_write). Signed-off-by: Michael Cheng <michael.cheng@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30623>	2024-08-13 21:42:43 +00:00
José Roberto de Souza	600d88ab3c	intel: Remove INTEL_ENGINE_CLASS_COMPUTE and INTEL_ENGINE_CLASS_COPY parameters It has been a while since the GuC version with the compute engine fix was released, and the same goes for the KMD uAPI to query the GuC firmware version. So at this point these parameters do more harm than good. Also, just setting them doesn't enable the async compute and copy engines, as these are not enabled by default on i915. If a user wants to disable or enable usage of those engines, a better approach is to use ANV_QUEUE_OVERRIDE. Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30593>	2024-08-13 21:15:31 +00:00
José Roberto de Souza	61e3a680a4	anv: Extend ANV_QUEUE_OVERRIDE to blit count Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30593>	2024-08-13 21:15:31 +00:00
José Roberto de Souza	92f4008473	anv: Disable sparse even on Xe KMD with ANV_SPARSE ANV_SPARSE had no effect on Xe KMD. Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30593>	2024-08-13 21:15:31 +00:00
Sagar Ghuge	83c2524124	intel/compiler: Adjust trace ray control field on Xe2 Bspec 64643: Structure_TraceRayPayload::Trace Ray Control Bit field moved from 9-8 to 10-8 on Xe2. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30600>	2024-08-13 20:02:24 +00:00
Sagar Ghuge	c3c62e493f	intel/compiler: Ray query requires write-back register Bspec 57508: Structure_SIMD16TraceRayMessage:: RayQuery Enable "When this bit is set in the header, Trace Ray Message behaves like a Ray Query. This message requires a write-back message indicating RayQuery for all valid Rays (SIMD lanes) have completed." If we don't pass the write-back register, the message somehow steps on the R0 register, which can corrupt scratch space accesses and potentially lead to a GPU hang. This can be observed when running under a simulator trace: send.rta (16\|M0) null r124 r126:1 0x0 0x02000100 {$15} // wr:1+1, rd:0; simd16 trace ray R0 = 00000001 00000000 00000000 00000001 00000000 00000000 00000001 00000000 00000000 00000001 00000000 00000000 00000001 00000000 00000000 00000001 Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Suggested-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30600>	2024-08-13 20:02:24 +00:00
Alyssa Rosenzweig	5f437aa24d	elk: fix compute shader derivatives derivatives are not fs only so move to be with the rest of subgroup ops. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11674 Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30634>	2024-08-13 12:19:30 +00:00
Lionel Landwerlin	aaff191356	brw/rt: fix ray_object_(direction\|origin) for closest-hit shaders When closest hit shader is called, the BVH object level brw_nir_rt_load_mem_ray origin/direction is 0. What we should be using is the ray origin/direction and apply the transform of the current instance. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `9ba7d459a3` ("intel/rt: Implement the new ray-tracing system values") Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30578>	2024-08-13 10:28:50 +00:00
Ian Romanick	119801e647	intel/brw: Move fsat instructions closer to the source Intel GPUs have a saturate destination modifier, and brw_fs_opt_saturate_propagation tries to replace explicit saturate operations with this destination modifier. That pass is limited in several ways. If the source of the explicit saturate is in a different block or if the source of the explicit saturate is live after the explicit saturate, brw_fs_opt_saturate_propagation will be unable to make progress. This optimization exists to help brw_fs_opt_saturate_propagation make more progress. It tries to move NIR fsat instructions to the same block that contains the definition of its source. It does this only in cases where it will not create additional live values. It also attempts to do this only in cases where the explicit saturate will ultimately be converted to a destination modifier. v2: Fix metadata_preserve when there's no progress and use nir_metadata_control_flow when there is progress. All suggested by Alyssa. v3: Fix a typo in the file header comment. Noticed by Ken. Don't require nir_metadata_instr_index. Use nir_def_rewrite_uses_after instead of open-coding something slightly more specific. Both suggested by Ken. shader-db: All Intel platforms had similar results. (Meteor Lake shown) total instructions in shared programs: 19733645 -> 19733028 (<.01%) instructions in affected programs: 193300 -> 192683 (-0.32%) helped: 246 HURT: 1 helped stats (abs) min: 2 max: 48 x̄: 2.51 x̃: 2 helped stats (rel) min: 0.18% max: 0.39% x̄: 0.33% x̃: 0.34% HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1 HURT stats (rel) min: 0.31% max: 0.31% x̄: 0.31% x̃: 0.31% 95% mean confidence interval for instructions value: -2.87 -2.13 95% mean confidence interval for instructions %-change: -0.34% -0.32% Instructions are helped. 
total cycles in shared programs: 916180971 -> 916264656 (<.01%) cycles in affected programs: 30197180 -> 30280865 (0.28%) helped: 194 HURT: 142 helped stats (abs) min: 1 max: 21251 x̄: 872.75 x̃: 19 helped stats (rel) min: <.01% max: 23.17% x̄: 2.59% x̃: 0.23% HURT stats (abs) min: 1 max: 28058 x̄: 1781.68 x̃: 399 HURT stats (rel) min: <.01% max: 37.21% x̄: 4.85% x̃: 1.63% 95% mean confidence interval for cycles value: -196.84 694.97 95% mean confidence interval for cycles %-change: -0.17% 1.27% Inconclusive result (value mean confidence interval includes 0). fossil-db: Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown) Totals: Instrs: 151512021 -> 151511351 (-0.00%); split: -0.00%, +0.00% Cycle count: 17209013596 -> 17209840995 (+0.00%); split: -0.02%, +0.02% Max live registers: 32013312 -> 32013549 (+0.00%) Max dispatch width: 5512304 -> 5512136 (-0.00%) Totals from 774 (0.12% of 630172) affected shaders: Instrs: 1559285 -> 1558615 (-0.04%); split: -0.05%, +0.01% Cycle count: 1312656268 -> 1313483667 (+0.06%); split: -0.24%, +0.30% Max live registers: 82195 -> 82432 (+0.29%) Max dispatch width: 6664 -> 6496 (-2.52%) Ice Lake Totals: Instrs: 151416791 -> 151416137 (-0.00%); split: -0.00%, +0.00% Cycle count: 15162468885 -> 15163298824 (+0.01%); split: -0.00%, +0.01% Max live registers: 32471367 -> 32471603 (+0.00%) Max dispatch width: 5623752 -> 5623712 (-0.00%) Totals from 733 (0.12% of 635598) affected shaders: Instrs: 877965 -> 877311 (-0.07%); split: -0.09%, +0.01% Cycle count: 190763628 -> 191593567 (+0.44%); split: -0.21%, +0.64% Max live registers: 72067 -> 72303 (+0.33%) Max dispatch width: 6216 -> 6176 (-0.64%) Skylake Totals: Instrs: 140794845 -> 140794075 (-0.00%); split: -0.00%, +0.00% Cycle count: 14665159301 -> 14665320514 (+0.00%); split: -0.00%, +0.01% Max live registers: 31783341 -> 31783662 (+0.00%); split: -0.00%, +0.00% Totals from 659 (0.11% of 625670) affected shaders: Instrs: 829061 -> 828291 (-0.09%); split: -0.09%, 
+0.00% Cycle count: 185478478 -> 185639691 (+0.09%); split: -0.33%, +0.41% Max live registers: 67491 -> 67812 (+0.48%); split: -0.01%, +0.48% Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29774>	2024-08-09 14:26:10 -07:00
Ian Romanick	f5815a003e	intel/brw: Use def analysis for simple cases of saturate propagation I had hoped this would improve compilation performance too. I tried several different long running fossils, and there was no difference. Fossil-db results are all over the place from platform to platform. All of the Tiger Lake shaders hurt for spills and fills are fragment shaders in rdr2. shader-db: All Intel platforms had similar results. (Meteor Lake shown) total instructions in shared programs: 19734088 -> 19733645 (<.01%) instructions in affected programs: 71200 -> 70757 (-0.62%) helped: 186 HURT: 0 helped stats (abs) min: 1 max: 7 x̄: 2.38 x̃: 1 helped stats (rel) min: 0.06% max: 2.79% x̄: 0.83% x̃: 0.48% 95% mean confidence interval for instructions value: -2.69 -2.07 95% mean confidence interval for instructions %-change: -0.93% -0.72% Instructions are helped. total cycles in shared programs: 916290473 -> 916180971 (-0.01%) cycles in affected programs: 3403719 -> 3294217 (-3.22%) helped: 89 HURT: 88 helped stats (abs) min: 1 max: 36685 x̄: 1424.13 x̃: 10 helped stats (rel) min: <.01% max: 26.75% x̄: 1.66% x̃: 0.46% HURT stats (abs) min: 1 max: 8750 x̄: 195.98 x̃: 7 HURT stats (rel) min: <.01% max: 17.12% x̄: 1.57% x̃: 0.19% 95% mean confidence interval for cycles value: -1199.88 -37.43 95% mean confidence interval for cycles %-change: -0.66% 0.56% Inconclusive result (%-change mean confidence interval includes 0). fossil-db: Meteor Lake and DG2 had similar results. 
(Meteor Lake shown) Totals: Instrs: 151458346 -> 151457413 (-0.00%) Cycle count: 17202426472 -> 17202406469 (-0.00%); split: -0.00%, +0.00% Max live registers: 31989626 -> 31989959 (+0.00%); split: -0.00%, +0.00% Max dispatch width: 5500560 -> 5500384 (-0.00%) Totals from 479 (0.08% of 628970) affected shaders: Instrs: 398836 -> 397903 (-0.23%) Cycle count: 18064565 -> 18044562 (-0.11%); split: -0.40%, +0.29% Max live registers: 36663 -> 36996 (+0.91%); split: -0.02%, +0.92% Max dispatch width: 4392 -> 4216 (-4.01%) Tiger Lake Totals: Instrs: 149913036 -> 149912182 (-0.00%); split: -0.00%, +0.00% Cycle count: 15560086488 -> 15560135139 (+0.00%); split: -0.00%, +0.00% Spill count: 61241 -> 61251 (+0.02%) Fill count: 107304 -> 107314 (+0.01%) Max live registers: 31964752 -> 31965119 (+0.00%); split: -0.00%, +0.00% Max dispatch width: 5517568 -> 5517248 (-0.01%) Totals from 486 (0.08% of 628673) affected shaders: Instrs: 396065 -> 395211 (-0.22%); split: -0.23%, +0.01% Cycle count: 17677691 -> 17726342 (+0.28%); split: -0.23%, +0.51% Spill count: 1302 -> 1312 (+0.77%) Fill count: 3746 -> 3756 (+0.27%) Max live registers: 37538 -> 37905 (+0.98%); split: -0.02%, +0.99% Max dispatch width: 4576 -> 4256 (-6.99%) Ice Lake Totals: Instrs: 151348422 -> 151347463 (-0.00%) Cycle count: 15155678386 -> 15155691726 (+0.00%); split: -0.00%, +0.00% Fill count: 108114 -> 108111 (-0.00%) Max live registers: 32444479 -> 32444814 (+0.00%); split: -0.00%, +0.00% Max dispatch width: 5611288 -> 5611256 (-0.00%) Totals from 483 (0.08% of 634352) affected shaders: Instrs: 393333 -> 392374 (-0.24%) Cycle count: 16706439 -> 16719779 (+0.08%); split: -0.14%, +0.22% Fill count: 3654 -> 3651 (-0.08%) Max live registers: 37246 -> 37581 (+0.90%); split: -0.02%, +0.92% Max dispatch width: 4312 -> 4280 (-0.74%) Skylake Totals: Instrs: 140741190 -> 140734481 (-0.00%); split: -0.00%, +0.00% Cycle count: 14659096516 -> 14659116346 (+0.00%); split: -0.00%, +0.00% Max live registers: 31757558 -> 31757725 
(+0.00%) Max dispatch width: 5470040 -> 5469920 (-0.00%) Totals from 3542 (0.57% of 624449) affected shaders: Instrs: 3081309 -> 3074600 (-0.22%); split: -0.22%, +0.00% Cycle count: 228843073 -> 228862903 (+0.01%); split: -0.11%, +0.12% Max live registers: 304531 -> 304698 (+0.05%) Max dispatch width: 31016 -> 30896 (-0.39%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29774>	2024-08-09 14:26:05 -07:00
Ian Romanick	adcce2bba4	intel/brw: Small code refactor in brw_fs_opt_saturate_propagation This bit of code will have a second use in the next commit. v2: Fix some broken indentation. Noticed by Ken. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29774>	2024-08-09 14:26:03 -07:00
Ian Romanick	9125b7c1b4	intel/elk: Don't propagate saturate to an instruction that writes flags There are two problems. 1. This is not NaN safe. 'add.le.sat dst F, Inf F, -Inf F' has a different result than 'add dst F, Inf F, -Inf F; cmp.le null, dst F, 0F'. 2. Ignoring the first problem, this only produces the desired flags for LE and G. All other cases can produce the wrong result. shader-db: All Intel platforms had similar results. (Broadwell shown) total instructions in shared programs: 18282314 -> 18282316 (<.01%) instructions in affected programs: 78 -> 80 (2.56%) helped: 0 HURT: 2 total cycles in shared programs: 952924234 -> 952924252 (<.01%) cycles in affected programs: 584 -> 602 (3.08%) helped: 0 HURT: 2 Fixes: `e6022281f2` ("intel/elk: Rename files to use elk prefix") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29774>	2024-08-09 14:26:01 -07:00
Ian Romanick	3d8fea0e09	intel/brw: Don't propagate saturate to an instruction that writes flags There are two problems. 1. This is not NaN safe. 'add.le.sat dst F, Inf F, -Inf F' has a different result than 'add dst F, Inf F, -Inf F; cmp.le null, dst F, 0F'. 2. Ignoring the first problem, this only produces the desired flags for LE and G. All other cases can produce the wrong result. For example, batman_arkham_city_goty.foz 6a63c4caacaa0dae has the following code: mad.ge.f0.0(8) g51<1>F g50<8,8,1>F g46<8,8,1>F g11<1,1,1>F mov.sat(8) g52<1>F g51<1,1,0>F ... (+f0.0) sel(8) g54<1>UD g53<8,8,1>UD 0x3f000000UD Without this commit, the saturate is incorrectly propagated to the MAD. A similar case exists in witcher_3_dxvk_g2.foz 5b03243be667a275. There are even worse cases like total_war_warhammer3.dx12vk-g6.foz 78328466761ef7ab and ee920491573860fc. The former has the following code (and the latter has very similar code): mad.l.f0.0(16) g95<1>F g93<8,8,1>F g62<8,8,1>F g68<1,1,1>F ... mov.sat(16) g109<1>F -g95<1,1,0>F ... (+f0.0) sel(16) g68<1>UD g111<1,1,0>UD g54<1,1,0>UD (+f0.0) sel(16) g70<1>UD g113<1,1,0>UD g56<1,1,0>UD (+f0.0) sel(16) g72<1>UD g115<1,1,0>UD g58<1,1,0>UD Saturate propagation makes a hash of this code: mad.sat.l.f0.0(16) g106<1>F -g93<8,8,1>F -g62<8,8,1>F g68<1,1,1>F ... (+f0.0) sel(16) g70<1>UD g110<1,1,0>UD g56<1,1,0>UD (+f0.0) sel(16) g72<1>UD g112<1,1,0>UD g58<1,1,0>UD (+f0.0) sel(16) g68<1>UD g108<1,1,0>UD g54<1,1,0>UD Not only is the saturate incorrectly applied to the MAD, but the MAD result is negated without changing the conditional modifier to G! NOTE: Backports of this commit to stable branches may need to be more like the following commit to elk. shader-db: All Intel platforms had similar results. 
(Meteor Lake shown) total instructions in shared programs: 19729375 -> 19729377 (<.01%) instructions in affected programs: 112 -> 114 (1.79%) helped: 0 HURT: 2 total cycles in shared programs: 916234266 -> 916234288 (<.01%) cycles in affected programs: 636 -> 658 (3.46%) helped: 0 HURT: 2 fossil-db: All Intel platforms had similar results. (Meteor Lake shown) Totals: Instrs: 151531594 -> 151531601 (+0.00%) Cycle count: 17209107419 -> 17209107474 (+0.00%); split: -0.00%, +0.00% Totals from 6 (0.00% of 630198) affected shaders: Instrs: 4550 -> 4557 (+0.15%) Cycle count: 194629 -> 194684 (+0.03%); split: -0.00%, +0.03% Fixes: `947c828d5c` ("i965/fs: Add a saturation propagation optimization pass.") Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29774>	2024-08-09 14:25:57 -07:00
Ian Romanick	6da4649191	intel/brw: Eliminate dead flag writes This prevents a couple small regressions in the next commit. The only changes in shader-db or fossil-db were on Skylake. This seems to eliminate an unused flags write that doesn't exist on other platforms. With that flag write eliminated, a later CMP can be scheduled better. I did not investigate this further. v2: Clean up some unnecessary bits and add some comments to can_elminate_conditional_mod. Suggested by Ken and Matt. Skylake Totals: Cycle count: 14665454524 -> 14665454444 (-0.00%) Totals from 10 (0.00% of 625685) affected shaders: Cycle count: 38630 -> 38550 (-0.21%) Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29774>	2024-08-09 14:25:54 -07:00
Alyssa Rosenzweig	bf9a17e2d5	elk: switch to derivative intrinsics Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30566>	2024-08-09 17:07:59 +00:00
Alyssa Rosenzweig	eec02246f8	brw: switch to derivative intrinsics Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30566>	2024-08-09 17:07:59 +00:00
Kenneth Graunke	b6f4f64b43	intel/brw: Drop image_{load,store}_raw_intel handling Gfx8 required us to emulate image load store with untyped messages, whereas Gfx9 just has typed message support for everything. brw no longer supports Gfx8, so all of this code is effectively dead. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30576>	2024-08-09 07:20:08 +00:00
Tapani Pälli	7a4020e129	anv: implement workaround for Wa_18038825448 Description states that we need to enable PS_EXTRA state EnablePSdependencyonCPsizechange whenever PixelShaderIsPerCoarsePixel state changes. Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30475>	2024-08-09 07:30:03 +03:00
Tapani Pälli	9582de9ee3	anv: refactor cmd_buffer_flush_gfx_runtime_state for dirty state Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30475>	2024-08-09 07:30:03 +03:00
Lionel Landwerlin	bbfafc71da	anv: limit some state dirtying after blorp/simpler-shaders Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30475>	2024-08-09 07:30:03 +03:00
Tapani Pälli	ff8953f666	anv: fix a cmd_buffer reference in simple shader In utrace timestamp copy case cmd_buffer is NULL. Fixes: `dbbcd5c32c` ("anv: factor out generation kernel dispatch into helper") Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30475>	2024-08-09 07:30:03 +03:00

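Commit e10cbb59a5 above states the rule that, when two pre-packed instructions are merged, no bit should be set to 1 on both sides. A rough sketch of such a check in C might look like the following. `merge_packed_dwords` is a hypothetical helper, not the anv macro, and it assumes that only one of the two inputs carries any shared bits such as the instruction header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the anv implementation): OR two pre-packed
 * instruction images into dst, asserting that no bit is set in both.
 * A bit set on both sides would mean the same field was filled in two
 * different places. */
static void
merge_packed_dwords(uint32_t *dst, const uint32_t *a, const uint32_t *b,
                    size_t num_dwords)
{
   for (size_t i = 0; i < num_dwords; i++) {
      assert((a[i] & b[i]) == 0);
      dst[i] = a[i] | b[i];
   }
}
```

A check of this shape only catches overlaps where both sides set a bit to 1; two sources writing 0 into the same field would go unnoticed, which matches the wording of the commit message.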

12552 Commits