aco: use maximum RT vgpr_limit that doesn't reduce wave count

144 instead of 132 with 5 waves, in practice.

Foz-DB Navi31:
Totals from 33 (0.04% of 80273) affected shaders:
Instrs: 3266241 -> 3261329 (-0.15%)
CodeSize: 16885356 -> 16860088 (-0.15%)
VGPRs: 4356 -> 4752 (+9.09%)
SpillVGPRs: 2504 -> 1535 (-38.70%)
Scratch: 264704 -> 216320 (-18.28%)
Latency: 18445909 -> 18395904 (-0.27%)
InvThroughput: 3689182 -> 3679182 (-0.27%)
VClause: 85171 -> 84595 (-0.68%)
SClause: 59365 -> 59320 (-0.08%); split: -0.08%, +0.01%
Copies: 260528 -> 259113 (-0.54%); split: -0.59%, +0.05%
Branches: 92537 -> 92519 (-0.02%)
VALU: 1937426 -> 1935925 (-0.08%); split: -0.08%, +0.01%
SALU: 393075 -> 393047 (-0.01%); split: -0.01%, +0.01%
VMEM: 147914 -> 146003 (-1.29%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37548>
@@ -132,8 +132,12 @@ init_program(Program* program, Stage stage, const struct aco_shader_info* info,
       program->dev.sgpr_limit = 104;
    }

-   if (program->stage == raytracing_cs)
-      program->dev.vgpr_limit = util_align_npot(128, program->dev.vgpr_alloc_granule);
+   if (program->stage == raytracing_cs) {
+      unsigned vgpr_limit = util_align_npot(128, program->dev.vgpr_alloc_granule);
+      unsigned min_waves = program->dev.physical_vgprs / vgpr_limit;
+      vgpr_limit = program->dev.physical_vgprs / min_waves;
+      program->dev.vgpr_limit = util_round_down_npot(vgpr_limit, program->dev.vgpr_alloc_granule);
+   }

    program->dev.scratch_alloc_granule = gfx_level >= GFX11 ? 256 : 1024;
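
The arithmetic behind "144 instead of 132 with 5 waves" can be checked with a small stand-alone sketch of the calculation in the hunk above. The helpers `align_npot` and `round_down_npot` are local stand-ins written for this example, not Mesa's `util_align_npot` and `util_round_down_npot`. The inputs (768 physical VGPRs and an allocation granule of 12) are assumptions chosen because they reproduce the figures in the commit message; they are not read from a real device.

```cpp
// Stand-alone sketch of the old and new RT VGPR limit calculations.
// Helper names and hardware numbers are illustrative assumptions.

// Round value up to the next multiple of granule (granule need not be a power of two).
constexpr unsigned align_npot(unsigned value, unsigned granule)
{
   return (value + granule - 1) / granule * granule;
}

// Round value down to the previous multiple of granule.
constexpr unsigned round_down_npot(unsigned value, unsigned granule)
{
   return value - value % granule;
}

// Old behaviour: 128 rounded up to the allocation granule.
constexpr unsigned old_rt_vgpr_limit(unsigned granule)
{
   return align_npot(128, granule);
}

// New behaviour: the largest granule-aligned limit that still allows as many
// waves as the old limit did.
constexpr unsigned new_rt_vgpr_limit(unsigned physical_vgprs, unsigned granule)
{
   unsigned vgpr_limit = align_npot(128, granule);
   unsigned min_waves = physical_vgprs / vgpr_limit; // waves that fit at the old limit
   vgpr_limit = physical_vgprs / min_waves;          // per-wave budget at that wave count
   return round_down_npot(vgpr_limit, granule);
}

// Assumed inputs: 768 physical VGPRs, allocation granule of 12.
static_assert(old_rt_vgpr_limit(12) == 132, "old limit");
static_assert(new_rt_vgpr_limit(768, 12) == 144, "new limit");
// Both limits fit the same number of waves, so occupancy is unchanged.
static_assert(768 / 132 == 5 && 768 / 144 == 5, "wave count preserved");
```

Under these assumed inputs, the extra 12 registers per wave cost no occupancy, which is consistent with the reported drop in SpillVGPRs and Scratch.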