4683187f49
This allows us to use aligned loads that can be vectorized, without any
downside, as 8-bit scalar loads always write 16 bits of a register.

Foz-DB Navi31:
Totals from 10 out of 14 FSR4 shaders:
MaxWaves: 71 -> 68 (-4.23%)
Instrs: 60146 -> 59781 (-0.61%); split: -0.67%, +0.06%
CodeSize: 412448 -> 413428 (+0.24%); split: -0.11%, +0.35%
VGPRs: 2112 -> 2160 (+2.27%)
SpillVGPRs: 89 -> 68 (-23.60%)
Scratch: 11776 -> 8704 (-26.09%)
Latency: 196628 -> 193770 (-1.45%); split: -2.62%, +1.17%
InvThroughput: 224944 -> 226274 (+0.59%); split: -0.02%, +0.61%
VClause: 862 -> 796 (-7.66%)
Copies: 3166 -> 3342 (+5.56%); split: -6.22%, +11.78%
Branches: 37 -> 38 (+2.70%)
PreSGPRs: 311 -> 312 (+0.32%)
PreVGPRs: 2153 -> 2214 (+2.83%); split: -1.35%, +4.18%
VALU: 51073 -> 51448 (+0.73%); split: -0.03%, +0.77%
SALU: 1072 -> 1074 (+0.19%)
VMEM: 3275 -> 2765 (-15.57%)
VOPD: 1739 -> 1783 (+2.53%); split: +7.99%, -5.46%

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36117>