Files
mesa/src/amd
Georg Lehmann 4683187f49 radv/nir/lower_cmat: load gfx11 8bit ACC using the B layout to get aligned loads
This allows us to use aligned loads that can be vectorized, without any
downside as 8bit scalar loads always write 16bits of a register.

Foz-DB Navi31:
Totals from 10 out of 14 FSR4 shader:
MaxWaves: 71 -> 68 (-4.23%)
Instrs: 60146 -> 59781 (-0.61%); split: -0.67%, +0.06%
CodeSize: 412448 -> 413428 (+0.24%); split: -0.11%, +0.35%
VGPRs: 2112 -> 2160 (+2.27%)
SpillVGPRs: 89 -> 68 (-23.60%)
Scratch: 11776 -> 8704 (-26.09%)
Latency: 196628 -> 193770 (-1.45%); split: -2.62%, +1.17%
InvThroughput: 224944 -> 226274 (+0.59%); split: -0.02%, +0.61%
VClause: 862 -> 796 (-7.66%)
Copies: 3166 -> 3342 (+5.56%); split: -6.22%, +11.78%
Branches: 37 -> 38 (+2.70%)
PreSGPRs: 311 -> 312 (+0.32%)
PreVGPRs: 2153 -> 2214 (+2.83%); split: -1.35%, +4.18%
VALU: 51073 -> 51448 (+0.73%); split: -0.03%, +0.77%
SALU: 1072 -> 1074 (+0.19%)
VMEM: 3275 -> 2765 (-15.57%)
VOPD: 1739 -> 1783 (+2.53%); split: +7.99%, -5.46%

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36117>
2025-07-30 07:25:51 +00:00
..
2025-07-24 06:31:17 +00:00