aco: use maximum RT vgpr_limit that doesn't reduce wave count

In practice, the limit becomes 144 VGPRs instead of 132, still with 5 waves.

Foz-DB Navi31:
Totals from 33 (0.04% of 80273) affected shaders:
Instrs: 3266241 -> 3261329 (-0.15%)
CodeSize: 16885356 -> 16860088 (-0.15%)
VGPRs: 4356 -> 4752 (+9.09%)
SpillVGPRs: 2504 -> 1535 (-38.70%)
Scratch: 264704 -> 216320 (-18.28%)
Latency: 18445909 -> 18395904 (-0.27%)
InvThroughput: 3689182 -> 3679182 (-0.27%)
VClause: 85171 -> 84595 (-0.68%)
SClause: 59365 -> 59320 (-0.08%); split: -0.08%, +0.01%
Copies: 260528 -> 259113 (-0.54%); split: -0.59%, +0.05%
Branches: 92537 -> 92519 (-0.02%)
VALU: 1937426 -> 1935925 (-0.08%); split: -0.08%, +0.01%
SALU: 393075 -> 393047 (-0.01%); split: -0.01%, +0.01%
VMEM: 147914 -> 146003 (-1.29%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37548>
Georg Lehmann
2025-09-24 15:59:33 +02:00
committed by Marge Bot
parent 4b24bc7c70
commit cc08786689

@@ -132,8 +132,12 @@ init_program(Program* program, Stage stage, const struct aco_shader_info* info,
       program->dev.sgpr_limit = 104;
    }
-   if (program->stage == raytracing_cs)
-      program->dev.vgpr_limit = util_align_npot(128, program->dev.vgpr_alloc_granule);
+   if (program->stage == raytracing_cs) {
+      unsigned vgpr_limit = util_align_npot(128, program->dev.vgpr_alloc_granule);
+      unsigned min_waves = program->dev.physical_vgprs / vgpr_limit;
+      vgpr_limit = program->dev.physical_vgprs / min_waves;
+      program->dev.vgpr_limit = util_round_down_npot(vgpr_limit, program->dev.vgpr_alloc_granule);
+   }
    program->dev.scratch_alloc_granule = gfx_level >= GFX11 ? 256 : 1024;
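
A worked example of the arithmetic in the hunk above, as a standalone sketch. The helpers align_npot and round_down_npot are local stand-ins for Mesa's util_align_npot and util_round_down_npot, and the inputs (768 physical VGPRs, allocation granule 12) are assumptions chosen because they reproduce the "144 instead of 132 with 5 waves" figures from the commit message; the real values are set elsewhere in init_program and vary by hardware and wave size.

```cpp
// Local stand-ins for Mesa's util_align_npot / util_round_down_npot
// (hypothetical re-implementations so the sketch is self-contained).
constexpr unsigned align_npot(unsigned value, unsigned granule)
{
   return (value + granule - 1) / granule * granule;
}

constexpr unsigned round_down_npot(unsigned value, unsigned granule)
{
   return value / granule * granule;
}

// Old behaviour: 128 rounded up to the allocation granule.
constexpr unsigned old_rt_vgpr_limit(unsigned granule)
{
   return align_npot(128, granule);
}

// New behaviour: the largest granule-aligned limit that still fits the
// same number of waves as the old limit did.
constexpr unsigned new_rt_vgpr_limit(unsigned physical_vgprs, unsigned granule)
{
   unsigned vgpr_limit = align_npot(128, granule);
   unsigned min_waves = physical_vgprs / vgpr_limit;
   vgpr_limit = physical_vgprs / min_waves;
   return round_down_npot(vgpr_limit, granule);
}

// Assumed inputs: 768 physical VGPRs, granule of 12.
static_assert(old_rt_vgpr_limit(12) == 132, "old limit");
static_assert(768 / 132 == 5, "waves at the old limit");
static_assert(new_rt_vgpr_limit(768, 12) == 144, "new limit");
static_assert(768 / 144 == 5, "wave count is unchanged at the new limit");
```

Under these assumptions, each of the 5 waves gains 12 VGPRs without reducing the wave count, which is consistent with the lower SpillVGPRs and Scratch totals reported above.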