Commit Graph

12552 Commits

Author SHA1 Message Date
Rohan Garg 1f06e70bdc anv: migrate indirect mesh draws to indirect draws on ARL+
Backport-to: 24.2
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30690>
2024-08-20 09:41:51 +00:00
Rohan Garg f69c74b6d5 anv: dispatch indirect draws with a count buffer through the XI hardware on ARL+
ARL+ can dispatch indirect draws through the hardware.

Backport-to: 24.2
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30690>
2024-08-20 09:41:51 +00:00
Rohan Garg 74cd70841d anv: refactor indirect draw support into it's own function
ARL+ supports some form of indirect draws, instead of trying to mash
support for indirect draws across various generations, let's make things
cleaner by factoring out XI support into it's own function.

Backport-to: 24.2
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30690>
2024-08-20 09:41:51 +00:00
Rohan Garg c1af71c9c2 anv,iris: prefix the argument format with XI for a upcoming refactor
Backport-to: 24.2
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30690>
2024-08-20 09:41:51 +00:00
Rohan Garg dc23db2a0d anv: program a custom byte stride on Xe2 for indirect draws
Xe2 allows us to program in a custom byte stride for indirect draws

Backport-to: 24.2
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30690>
2024-08-20 09:41:50 +00:00
Tapani Pälli d4e8c8f874 anv: move setting 3DSTATE_CLIP::MaximumVPIndex from loop
Loop iterates viewports but for MaximumVPIndex we only need viewport
count and last stage that writes viewport.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30732>
2024-08-20 06:48:50 +00:00
Jianxun Zhang 8c623b6a7e Revert "anv: Disable PAT-based compression on depth images (xe2)"
This reverts commit 6073f091bb.

With the progress on Xe2 platforms, we are not seeing many issues
caused by compression on depth buffers.

Backport-to: 24.2
Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30653>
2024-08-19 17:50:10 -07:00
José Roberto de Souza 12656571fd anv/gfx20: Enable depth buffer write through for multi sampled images
BSpec: 56419
Backport-to: 24.2
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29615>
2024-08-19 20:04:36 +00:00
Nanley Chery ebe3eabda6 anv: Add want_hiz_wt_for_image()
Backport-to: 24.2
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29615>
2024-08-19 20:04:36 +00:00
José Roberto de Souza 2553878fba intel/isl/gfx20: Alow hierarchial depth buffer write through for multi sampled surfaces
BSpec: 56419
Backport-to: 24.2
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29615>
2024-08-19 20:04:36 +00:00
Lionel Landwerlin e10cbb59a5 anv: add assert to detect problematic instruction merges
We stick to a rule in the driver that each field is only set in a
single place in the driver. Therefore when merging instructions, we
should never have any bit set to 1 from both sides.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30684>
2024-08-19 11:02:44 +00:00
Lionel Landwerlin 982106e676 anv: only set 3DSTATE_CLIP::MaximumVPIndex once
Currently we can end up merging 2 prepacked 3DSTATE_CLIP instructions
where 2 different places in the driver fill the MaximumVPIndex.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 50f6903bd9 ("anv: add new low level emission & dirty state tracking")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30684>
2024-08-19 11:02:44 +00:00
Lionel Landwerlin 7c73346549 anv: remove unused macro
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30684>
2024-08-19 11:02:44 +00:00
Lionel Landwerlin 9eff285a46 anv: fix extended buffer flags usages
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: bcc0ec8e6c ("anv: enable KHR_maintenance5")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30714>
2024-08-19 10:13:09 +00:00
Caio Oliveira 40f77b6936 intel/brw: Avoid modifying the shader in assign_curb_setup if not needed
If there are no uniforms to push, don't emit the AND or invalidate the
shader analysis.  This affects only compute shaders.

Not a significant impact since lots of shaders end up pushing
uniforms.  Fossil-db numbers (restricted to compute pipelines only) for DG2

```
Totals:
Instrs: 3071016 -> 3070894 (-0.00%)
Cycle count: 8320268863 -> 8320264519 (-0.00%)

Totals from 122 (2.70% of 4520) affected shaders:
Instrs: 10675 -> 10553 (-1.14%)
Cycle count: 2060003 -> 2055659 (-0.21%)
```

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30631>
2024-08-17 16:25:01 -07:00
José Roberto de Souza 38c989ada2 anv: Nuke anv_utrace_submit::trace_bo
There is no usage for this bo.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30676>
2024-08-16 19:38:19 +00:00
José Roberto de Souza f7b386bd6d anv: Use batch_bo_pool in utrace anv_async_submit_init() calls
In pratical the only change here is that batch_bo_pool
are captured to error dumps.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30676>
2024-08-16 19:38:19 +00:00
José Roberto de Souza 168e26fc04 anv: Add trivial_batch and query-pool to the error capture
Those are batch buffers that are not allocated from batch_bo_pool,
so they were left out of error capture without the capture-all
parameter.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30676>
2024-08-16 19:38:18 +00:00
Sagar Ghuge c4f2a8d984 intel/compiler: Fix indirect offset in GS input read for Xe2+
Make sure to take new GRF size into consideration and adjust the
indirect offset according to new size so that when we do the indirect
load with address register, we load right values.

This helps pass the following tests:
   - dEQP-VK.binding_model.descriptor_buffer.mutable_descriptor.*geom*
   - dEQP-VK.ray_query.*geometry_shader.*

Backport-to: 24.2
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30679>
2024-08-16 18:40:13 +00:00
Ian Romanick c8038643b8 intel/brw: Make ifind_msb SSA friendly
No shader-db changes on any Intel platform.

v2: Use negate(tmp) instead of creating a new temporary. Suggested by
Ken.

fossil-db:

Meteor Lake, DG2, and Skylake had similar results. (Meteor Lake shown)
Totals:
Instrs: 152535897 -> 152535883 (-0.00%); split: -0.00%, +0.00%
Cycle count: 17112329592 -> 17112406110 (+0.00%); split: -0.06%, +0.06%

Totals from 40 (0.01% of 633223) affected shaders:
Instrs: 458813 -> 458799 (-0.00%); split: -0.01%, +0.00%
Cycle count: 4358016282 -> 4358092800 (+0.00%); split: -0.23%, +0.24%

Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
Totals:
Instrs: 150560511 -> 150560465 (-0.00%); split: -0.00%, +0.00%
Cycle count: 15484534441 -> 15482372893 (-0.01%); split: -0.12%, +0.11%
Spill count: 59795 -> 59794 (-0.00%)
Fill count: 103513 -> 103509 (-0.00%)

Totals from 40 (0.01% of 632445) affected shaders:
Instrs: 368877 -> 368831 (-0.01%); split: -0.01%, +0.00%
Cycle count: 3918398264 -> 3916236716 (-0.06%); split: -0.49%, +0.43%
Spill count: 16896 -> 16895 (-0.01%)
Fill count: 27819 -> 27815 (-0.01%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30650>
2024-08-16 14:52:04 +00:00
Ian Romanick e9c151fde6 intel/brw: Make 16-bit ishl, ishr, and ushr SSA friendly
No shader-db changes on any Intel platform.

fossil-db:

All Intel platforms had similar results. (Meteor Lake shown)
Totals:
Instrs: 152536266 -> 152535897 (-0.00%); split: -0.00%, +0.00%
Cycle count: 17124901233 -> 17112329592 (-0.07%); split: -0.07%, +0.00%
Spill count: 78571 -> 78525 (-0.06%)
Fill count: 148178 -> 148132 (-0.03%)

Totals from 210 (0.03% of 633223) affected shaders:
Instrs: 514525 -> 514156 (-0.07%); split: -0.16%, +0.08%
Cycle count: 4003540698 -> 3990969057 (-0.31%); split: -0.32%, +0.00%
Spill count: 15632 -> 15586 (-0.29%)
Fill count: 26241 -> 26195 (-0.18%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30650>
2024-08-16 14:52:04 +00:00
Lionel Landwerlin fbafa9cabd intel/nir: remove load_global_const_block_intel intrinsic
load_global_constant_uniform_block_intel is equivalent in terms of
loading, then for the predicate we just do a bcsel afterward in places
where that is required.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30659>
2024-08-16 11:12:39 +00:00
Connor Abbott de1d36d054 ci: Uprev VK-CTS to 1.3.9.0
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29766>
2024-08-15 09:01:26 +00:00
Caio Oliveira 2150bc6d80 intel/brw: Use %td format for pointer difference
Fixes build for 32-bit, again.

Fixes: e72bf2d02f ("intel: Add executor tool")
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30669>
2024-08-14 17:28:41 -07:00
Caio Oliveira 8a44b4812a intel/executor: Use PRIx64 to fix building in 32-bit
Fixes: e72bf2d02f ("intel: Add executor tool")
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30668>
2024-08-14 21:41:28 +00:00
Tapani Pälli a43f18dd04 intel/dev: update mesa_defs.json from workaround database
Most importantly this enables 18038825448 for LNL.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30639>
2024-08-14 11:20:40 +00:00
Caio Oliveira e72bf2d02f intel: Add executor tool
Add a tool that programs the hardware the minimum amount to be
able to execute compute shaders and then executes a script that
can perform data manipulation and dispatch execution of the shaders
(written in Xe assembly).

The goal is to have a tool to experiment directly with certain
assembly instructions and the shared units without having to
instrument the drivers.

To make more convenient to write assembly, a few macros (indicated by
the @-symbol) will be processed into the full instruction.

For example, the script

```
  local r = execute {
    data={ [42] = 0x100 },
    src=[[
      @mov     g1      42
      @read    g2      g1

      @id      g3

      add(8)   g4<1>UD  g2<8,8,1>UD  g3<8,8,1>UD  { align1 @1 1Q };

      @write   g3       g4
      @eot
    ]]
  }

  dump(r, 4)
```

produces

```
  [0x00000000] 0x00000100 0x00000101 0x00000102 0x00000103
```

There's a help message inside the code that describes the script
environment and the macros for assembly sources.

Acked-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30062>
2024-08-14 03:03:46 +00:00
Caio Oliveira 6267585778 intel/brw: Also return the size of the assembled shader
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30062>
2024-08-14 03:03:46 +00:00
Paulo Zanoni 20c19351b1 anv: be consistent regarding non-render engines on i915.ko
Today, on i915.ko, if Sparse Resources is disabled and the Kernel is
new enough to confirm to us that the GuC version is good, we'll expose
non-render engines, otherwise we don't.

Ever since we merged 5ca224aa0c ("anv/trtt: make all contexts have
the same TR-TT programming"), TR-TT is not anymore the reason why
we're not enabling non-render engines. Our performance team has
analyzed workloads and concluded enabling non-render engines is not
worth it on i915.ko today.

So here we adjust the code to do three things:
 - Stop blaming TR-TT
 - Unify the default behavior for i915.ko
 - Don't disable non-render engines when TR-TT is being used on xe.ko.

v2:
- Comments (José)

Acked-by: Felix DeGrood <felix.j.degrood@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30627>
2024-08-14 01:09:19 +00:00
Michael Cheng 0324d4bcf5 anv: move trace logic to batch_emit_pipe_control_write
Move trace logic from cmd_buffer_apply_pipe_flushes down to
genX(batch_emit_pipe_control_write).

Signed-off-by: Michael Cheng <michael.cheng@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30623>
2024-08-13 21:42:43 +00:00
José Roberto de Souza 600d88ab3c intel: Remove INTEL_ENGINE_CLASS_COMPUTE and INTEL_ENGINE_CLASS_COPY parameters
It has been a while that the GuC version with the compute engine fix
was released, same for the KMD uAPI to query the GuC firmware version.
So at this point this parameters do more harm than good.

Also just setting those don't enable the async compute and copy engines
this is not enabled by default on i915.

If user wants to disable or enable usage of those engines a better
approach would be use ANV_QUEUE_OVERRIDE.

Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30593>
2024-08-13 21:15:31 +00:00
José Roberto de Souza 61e3a680a4 anv: Extend ANV_QUEUE_OVERRIDE to blit count
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30593>
2024-08-13 21:15:31 +00:00
José Roberto de Souza 92f4008473 anv: Disable sparse even on Xe KMD with ANV_SPARSE
ANV_SPARSE had no effect on Xe KMD.

Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30593>
2024-08-13 21:15:31 +00:00
Sagar Ghuge 83c2524124 intel/compiler: Adjust trace ray control field on Xe2
Bspec 64643: Structure_TraceRayPayload::Trace Ray Control

Bit field moved from 9-8 to 10-8 on Xe2.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30600>
2024-08-13 20:02:24 +00:00
Sagar Ghuge c3c62e493f intel/compiler: Ray query requires write-back register
Bspec 57508: Structure_SIMD16TraceRayMessage:: RayQuery Enable

   "When this bit is set in the header, Trace Ray Message behaves like a
   Ray Query. This message requires a write-back message indicating
   RayQuery for all valid Rays (SIMD lanes) have completed."

If we don't pass the write-back register, somehow it was stepping on
over R0 register and can mess up the scratch space accesses which could
potentially lead to GPU hang. It can be noticed while running it under
simulator trace.

send.rta (16|M0)         null     r124  r126:1  0x0            0x02000100           {$15} // wr:1+1, rd:0; simd16 trace ray
R0 = 00000001 00000000 00000000 00000001 00000000 00000000 00000001 00000000 00000000 00000001 00000000 00000000 00000001 00000000 00000000 00000001

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30600>
2024-08-13 20:02:24 +00:00
Alyssa Rosenzweig 5f437aa24d elk: fix compute shader derivatives
derivatives are not fs only so move to be with the rest of subgroup ops.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11674
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30634>
2024-08-13 12:19:30 +00:00
Lionel Landwerlin aaff191356 brw/rt: fix ray_object_(direction|origin) for closest-hit shaders
When closest hit shader is called, the BVH object level
brw_nir_rt_load_mem_ray origin/direction is 0. What we should be using
is the ray origin/direction and apply the transform of the current
instance.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 9ba7d459a3 ("intel/rt: Implement the new ray-tracing system values")
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30578>
2024-08-13 10:28:50 +00:00
Ian Romanick 119801e647 intel/brw: Move fsat instructions closer to the source
Intel GPUs have a saturate destination modifier, and
brw_fs_opt_saturate_propagation tries to replace explicit saturate
operations with this destination modifier. That pass is limited in
several ways. If the source of the explicit saturate is in a different
block or if the source of the explicit saturate is live after the
explicit saturate, brw_fs_opt_saturate_propagation will be unable to
make progress.

This optimization exists to help brw_fs_opt_saturate_propagation make
more progress. It tries to move NIR fsat instructions to the same block
that contains the definition of its source. It does this only in cases
where it will not create additional live values. It also attempts to do
this only in cases where the explicit saturate will ultimiately be
converted to a destination modifier.

v2: Fix metadata_preserve when theres no progress and use
nir_metadata_control_flow when there is progress. All suggested by
Alyssa.

v3: Fix a typo in the file header comment. Noticed by Ken.  Don't
require nir_metadata_instr_index. Use nir_def_rewrite_uses_after instead
of open-coding something slightly more specific. Both suggested by Ken.

shader-db:

All Intel platforms had similar results. (Meteor Lake shown)
total instructions in shared programs: 19733645 -> 19733028 (<.01%)
instructions in affected programs: 193300 -> 192683 (-0.32%)
helped: 246
HURT: 1
helped stats (abs) min: 2 max: 48 x̄: 2.51 x̃: 2
helped stats (rel) min: 0.18% max: 0.39% x̄: 0.33% x̃: 0.34%
HURT stats (abs)   min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel)   min: 0.31% max: 0.31% x̄: 0.31% x̃: 0.31%
95% mean confidence interval for instructions value: -2.87 -2.13
95% mean confidence interval for instructions %-change: -0.34% -0.32%
Instructions are helped.

total cycles in shared programs: 916180971 -> 916264656 (<.01%)
cycles in affected programs: 30197180 -> 30280865 (0.28%)
helped: 194
HURT: 142
helped stats (abs) min: 1 max: 21251 x̄: 872.75 x̃: 19
helped stats (rel) min: <.01% max: 23.17% x̄: 2.59% x̃: 0.23%
HURT stats (abs)   min: 1 max: 28058 x̄: 1781.68 x̃: 399
HURT stats (rel)   min: <.01% max: 37.21% x̄: 4.85% x̃: 1.63%
95% mean confidence interval for cycles value: -196.84 694.97
95% mean confidence interval for cycles %-change: -0.17% 1.27%
Inconclusive result (value mean confidence interval includes 0).

fossil-db:

Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown)
Totals:
Instrs: 151512021 -> 151511351 (-0.00%); split: -0.00%, +0.00%
Cycle count: 17209013596 -> 17209840995 (+0.00%); split: -0.02%, +0.02%
Max live registers: 32013312 -> 32013549 (+0.00%)
Max dispatch width: 5512304 -> 5512136 (-0.00%)

Totals from 774 (0.12% of 630172) affected shaders:
Instrs: 1559285 -> 1558615 (-0.04%); split: -0.05%, +0.01%
Cycle count: 1312656268 -> 1313483667 (+0.06%); split: -0.24%, +0.30%
Max live registers: 82195 -> 82432 (+0.29%)
Max dispatch width: 6664 -> 6496 (-2.52%)

Ice Lake
Totals:
Instrs: 151416791 -> 151416137 (-0.00%); split: -0.00%, +0.00%
Cycle count: 15162468885 -> 15163298824 (+0.01%); split: -0.00%, +0.01%
Max live registers: 32471367 -> 32471603 (+0.00%)
Max dispatch width: 5623752 -> 5623712 (-0.00%)

Totals from 733 (0.12% of 635598) affected shaders:
Instrs: 877965 -> 877311 (-0.07%); split: -0.09%, +0.01%
Cycle count: 190763628 -> 191593567 (+0.44%); split: -0.21%, +0.64%
Max live registers: 72067 -> 72303 (+0.33%)
Max dispatch width: 6216 -> 6176 (-0.64%)

Skylake
Totals:
Instrs: 140794845 -> 140794075 (-0.00%); split: -0.00%, +0.00%
Cycle count: 14665159301 -> 14665320514 (+0.00%); split: -0.00%, +0.01%
Max live registers: 31783341 -> 31783662 (+0.00%); split: -0.00%, +0.00%

Totals from 659 (0.11% of 625670) affected shaders:
Instrs: 829061 -> 828291 (-0.09%); split: -0.09%, +0.00%
Cycle count: 185478478 -> 185639691 (+0.09%); split: -0.33%, +0.41%
Max live registers: 67491 -> 67812 (+0.48%); split: -0.01%, +0.48%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29774>
2024-08-09 14:26:10 -07:00
Ian Romanick f5815a003e intel/brw: Use def analysis for simple cases of saturate propagation
I had hoped this would improve compilation performance too. I tried
several different long running fossils, and there was no difference.

Fossil-db results are all over the place from platform to platform.

All of the Tiger Lake shaders hurt for spills and fills are fragment
shaders in rdr2.

shader-db:

All Intel platforms had similar results. (Meteor Lake shown)
total instructions in shared programs: 19734088 -> 19733645 (<.01%)
instructions in affected programs: 71200 -> 70757 (-0.62%)
helped: 186
HURT: 0
helped stats (abs) min: 1 max: 7 x̄: 2.38 x̃: 1
helped stats (rel) min: 0.06% max: 2.79% x̄: 0.83% x̃: 0.48%
95% mean confidence interval for instructions value: -2.69 -2.07
95% mean confidence interval for instructions %-change: -0.93% -0.72%
Instructions are helped.

total cycles in shared programs: 916290473 -> 916180971 (-0.01%)
cycles in affected programs: 3403719 -> 3294217 (-3.22%)
helped: 89
HURT: 88
helped stats (abs) min: 1 max: 36685 x̄: 1424.13 x̃: 10
helped stats (rel) min: <.01% max: 26.75% x̄: 1.66% x̃: 0.46%
HURT stats (abs)   min: 1 max: 8750 x̄: 195.98 x̃: 7
HURT stats (rel)   min: <.01% max: 17.12% x̄: 1.57% x̃: 0.19%
95% mean confidence interval for cycles value: -1199.88 -37.43
95% mean confidence interval for cycles %-change: -0.66% 0.56%
Inconclusive result (%-change mean confidence interval includes 0).

fossil-db:

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 151458346 -> 151457413 (-0.00%)
Cycle count: 17202426472 -> 17202406469 (-0.00%); split: -0.00%, +0.00%
Max live registers: 31989626 -> 31989959 (+0.00%); split: -0.00%, +0.00%
Max dispatch width: 5500560 -> 5500384 (-0.00%)

Totals from 479 (0.08% of 628970) affected shaders:
Instrs: 398836 -> 397903 (-0.23%)
Cycle count: 18064565 -> 18044562 (-0.11%); split: -0.40%, +0.29%
Max live registers: 36663 -> 36996 (+0.91%); split: -0.02%, +0.92%
Max dispatch width: 4392 -> 4216 (-4.01%)

Tiger Lake
Totals:
Instrs: 149913036 -> 149912182 (-0.00%); split: -0.00%, +0.00%
Cycle count: 15560086488 -> 15560135139 (+0.00%); split: -0.00%, +0.00%
Spill count: 61241 -> 61251 (+0.02%)
Fill count: 107304 -> 107314 (+0.01%)
Max live registers: 31964752 -> 31965119 (+0.00%); split: -0.00%, +0.00%
Max dispatch width: 5517568 -> 5517248 (-0.01%)

Totals from 486 (0.08% of 628673) affected shaders:
Instrs: 396065 -> 395211 (-0.22%); split: -0.23%, +0.01%
Cycle count: 17677691 -> 17726342 (+0.28%); split: -0.23%, +0.51%
Spill count: 1302 -> 1312 (+0.77%)
Fill count: 3746 -> 3756 (+0.27%)
Max live registers: 37538 -> 37905 (+0.98%); split: -0.02%, +0.99%
Max dispatch width: 4576 -> 4256 (-6.99%)

Ice Lake
Totals:
Instrs: 151348422 -> 151347463 (-0.00%)
Cycle count: 15155678386 -> 15155691726 (+0.00%); split: -0.00%, +0.00%
Fill count: 108114 -> 108111 (-0.00%)
Max live registers: 32444479 -> 32444814 (+0.00%); split: -0.00%, +0.00%
Max dispatch width: 5611288 -> 5611256 (-0.00%)

Totals from 483 (0.08% of 634352) affected shaders:
Instrs: 393333 -> 392374 (-0.24%)
Cycle count: 16706439 -> 16719779 (+0.08%); split: -0.14%, +0.22%
Fill count: 3654 -> 3651 (-0.08%)
Max live registers: 37246 -> 37581 (+0.90%); split: -0.02%, +0.92%
Max dispatch width: 4312 -> 4280 (-0.74%)

Skylake
Totals:
Instrs: 140741190 -> 140734481 (-0.00%); split: -0.00%, +0.00%
Cycle count: 14659096516 -> 14659116346 (+0.00%); split: -0.00%, +0.00%
Max live registers: 31757558 -> 31757725 (+0.00%)
Max dispatch width: 5470040 -> 5469920 (-0.00%)

Totals from 3542 (0.57% of 624449) affected shaders:
Instrs: 3081309 -> 3074600 (-0.22%); split: -0.22%, +0.00%
Cycle count: 228843073 -> 228862903 (+0.01%); split: -0.11%, +0.12%
Max live registers: 304531 -> 304698 (+0.05%)
Max dispatch width: 31016 -> 30896 (-0.39%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29774>
2024-08-09 14:26:05 -07:00
Ian Romanick adcce2bba4 intel/brw: Small code refactor in brw_fs_opt_saturate_propagation
This bit of code will have a second use in the next commit.

v2: Fix some broken indentation. Noticed by Ken.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29774>
2024-08-09 14:26:03 -07:00
Ian Romanick 9125b7c1b4 intel/elk: Don't propagate saturate to an instruction that writes flags
There are two problems.

1. This is not NaN safe. 'add.le.sat dst F, Inf F, -Inf F' has a
   different result than 'add dst F, Inf F, -Inf F; cmp.le null, dst F, 0F'.

2. Ignoring the first problem, this only produces the desired flags
   for LE and G. All other cases can produce the wrong result.

shader-db:

All Intel platforms had similar results. (Broadwell shown)
total instructions in shared programs: 18282314 -> 18282316 (<.01%)
instructions in affected programs: 78 -> 80 (2.56%)
helped: 0
HURT: 2

total cycles in shared programs: 952924234 -> 952924252 (<.01%)
cycles in affected programs: 584 -> 602 (3.08%)
helped: 0
HURT: 2

Fixes: e6022281f2 ("intel/elk: Rename files to use elk prefix")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29774>
2024-08-09 14:26:01 -07:00
Ian Romanick 3d8fea0e09 intel/brw: Don't propagate saturate to an instruction that writes flags
There are two problems.

1. This is not NaN safe. 'add.le.sat dst F, Inf F, -Inf F' has a
   different result than 'add dst F, Inf F, -Inf F; cmp.le null, dst F, 0F'.

2. Ignoring the first problem, this only produces the desired flags
   for LE and G. All other cases can produce the wrong result.

For example, batman_arkham_city_goty.foz 6a63c4caacaa0dae has the
following code:

    mad.ge.f0.0(8)  g51<1>F         g50<8,8,1>F     g46<8,8,1>F     g11<1,1,1>F
    mov.sat(8)      g52<1>F         g51<1,1,0>F
    ...
    (+f0.0) sel(8)  g54<1>UD        g53<8,8,1>UD    0x3f000000UD

Without this commit, the saturate is incorrectly propagated to the MAD.

A similar case exists in witcher_3_dxvk_g2.foz 5b03243be667a275.

There are even worse cases like total_war_warhammer3.dx12vk-g6.foz
78328466761ef7ab and ee920491573860fc. The former has the following
code (and the latter has very similar code):

    mad.l.f0.0(16)  g95<1>F         g93<8,8,1>F     g62<8,8,1>F     g68<1,1,1>F
    ...
    mov.sat(16)     g109<1>F        -g95<1,1,0>F
    ...
    (+f0.0) sel(16) g68<1>UD        g111<1,1,0>UD   g54<1,1,0>UD
    (+f0.0) sel(16) g70<1>UD        g113<1,1,0>UD   g56<1,1,0>UD
    (+f0.0) sel(16) g72<1>UD        g115<1,1,0>UD   g58<1,1,0>UD

Saturate propagation makes a hash of this code:

    mad.sat.l.f0.0(16) g106<1>F     -g93<8,8,1>F    -g62<8,8,1>F    g68<1,1,1>F
    ...
    (+f0.0) sel(16) g70<1>UD        g110<1,1,0>UD   g56<1,1,0>UD
    (+f0.0) sel(16) g72<1>UD        g112<1,1,0>UD   g58<1,1,0>UD
    (+f0.0) sel(16) g68<1>UD        g108<1,1,0>UD   g54<1,1,0>UD

Not only is the saturate incorrectly applied to the MAD, but the MAD
result is negated without changing the conditional modifier to G!

NOTE: Backports of this commit to stable branches may need to be more
like the following commit to elk.

shader-db:

All Intel platforms had similar results. (Meteor Lake shown)
total instructions in shared programs: 19729375 -> 19729377 (<.01%)
instructions in affected programs: 112 -> 114 (1.79%)
helped: 0
HURT: 2

total cycles in shared programs: 916234266 -> 916234288 (<.01%)
cycles in affected programs: 636 -> 658 (3.46%)
helped: 0
HURT: 2

fossil-db:

All Intel platforms had similar results. (Meteor Lake shown)
Totals:
Instrs: 151531594 -> 151531601 (+0.00%)
Cycle count: 17209107419 -> 17209107474 (+0.00%); split: -0.00%, +0.00%

Totals from 6 (0.00% of 630198) affected shaders:
Instrs: 4550 -> 4557 (+0.15%)
Cycle count: 194629 -> 194684 (+0.03%); split: -0.00%, +0.03%

Fixes: 947c828d5c ("i965/fs: Add a saturation propagation optimization pass.")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29774>
2024-08-09 14:25:57 -07:00
Ian Romanick 6da4649191 intel/brw: Eliminate dead flag writes
This prevents a couple small regressions in the next commit.

The only changes in shader-db or fossil-db were on Skylake. This seems
to eliminate an unused flags write that doesn't exist on other platforms.
With that flag write eliminated, a later CMP can be scheduled better.

I did not investigate this further.

v2: Clean up some unnecessary bits and add some comments to
can_elminate_conditional_mod. Suggested by Ken and Matt.

Skylake
Totals:
Cycle count: 14665454524 -> 14665454444 (-0.00%)

Totals from 10 (0.00% of 625685) affected shaders:
Cycle count: 38630 -> 38550 (-0.21%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29774>
2024-08-09 14:25:54 -07:00
Alyssa Rosenzweig bf9a17e2d5 elk: switch to derivative intrinsics
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30566>
2024-08-09 17:07:59 +00:00
Alyssa Rosenzweig eec02246f8 brw: switch to derivative intrinsics
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30566>
2024-08-09 17:07:59 +00:00
Kenneth Graunke b6f4f64b43 intel/brw: Drop image_{load,store}_raw_intel handling
Gfx8 required us to emulate image load store with untyped messages,
whereas Gfx9 just has typed message support for everything.  brw no
longer supports Gfx8, so all of this code is effectively dead.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30576>
2024-08-09 07:20:08 +00:00
Tapani Pälli 7a4020e129 anv: implement workaround for Wa_18038825448
Description states that we need to enable PS_EXTRA state
EnablePSdependencyonCPsizechange whenever PixelShaderIsPerCoarsePixel
state changes.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30475>
2024-08-09 07:30:03 +03:00
Tapani Pälli 9582de9ee3 anv: refactor cmd_buffer_flush_gfx_runtime_state for dirty state
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30475>
2024-08-09 07:30:03 +03:00
Lionel Landwerlin bbfafc71da anv: limit some state dirtying after blorp/simpler-shaders
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30475>
2024-08-09 07:30:03 +03:00
Tapani Pälli ff8953f666 anv: fix a cmd_buffer reference in simple shader
In utrace timestamp copy case cmd_buffer is NULL.

Fixes: dbbcd5c32c ("anv: factor out generation kernel dispatch into helper")
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30475>
2024-08-09 07:30:03 +03:00