intel/brw/xe3+: Override P value of GRF register classes to increase thread parallelism.

This causes the graph coloring allocator to use the optimistic
coloring codepath for all nodes whose total Q value exceeds the
threshold of 96 GRFs, so that it does a better job of minimizing the
register requirement of programs even when they are trivially
colorable.  Starting at the threshold of 96 GRFs, the number of
threads available per EU decreases as the number of register blocks
requested by the program increases, so reducing register use can
increase performance.
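
As a rough illustration of why the P value matters, the sketch below
models the trivial-colorability (P/Q) test on a single node.  The
struct and function names are hypothetical and simplified, not the
allocator's own, and the sketch assumes a node counts as trivially
colorable when its total Q value is below the P value of its class.

```c
#include <stdbool.h>

/* Hypothetical, simplified node: only the fields the test needs. */
struct sketch_node {
   unsigned q_total; /* Sum of Q contributions from neighbors still in the graph. */
   unsigned p;       /* P value of the node's register class. */
};

/* Assumed rule: trivially colorable when q_total is below p. */
static bool
sketch_pq_test(const struct sketch_node *n)
{
   return n->q_total < n->p;
}
```

Under this assumption a node with a total Q of 100 passes the test
when P is 256 but fails it when P is 96, which is what sends it down
the optimistic path.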

The problem showed up in some test cases as a performance inversion
after VRT was enabled: extending the register set to 256 GRFs has the
side effect of making some non-trivially colorable programs trivially
colorable.  The optimistic coloring path is then skipped, so the
register allocator does a worse job of ordering the (trivial)
allocations, leading to increased register use and reduced
performance.

The following Traci test cases improve significantly as a result of
this change (4 iterations, 5% significance):

MetroExodus-trace-dx11-2160p-ultra:                 1.90% ±0.85%
BaldursGate3-trace-dx11-1440p-ultra:                1.47% ±0.38%
Palworld-trace-dx11-1080p-med:                      1.01% ±0.09%
TerminatorResistance-trace-dx11-2160p-ultra:        0.95% ±0.29%
Control-trace-dx11-1440p-high:                      0.87% ±0.50%

In theory, lowering the P value threshold is expected to cost compile
time due to the increased use of the slower optimistic path of the
graph coloring allocator.  In practice this doesn't show up in my
numbers: the shader-db and fossil-db compile times don't show any
statistically significant change (13 iterations, 5% significance).

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36618>
Francisco Jerez
2025-07-16 15:55:29 -07:00
committed by Marge Bot
parent 74168a601e
commit 760437c4c4
@@ -119,6 +119,9 @@ brw_alloc_reg_sets(struct brw_compiler *compiler)
       for (int reg = 0; reg <= base_reg_count - class_sizes[i]; reg++)
          ra_class_add_reg(classes[i], reg);
+      if (devinfo->ver >= 30 && !INTEL_DEBUG(DEBUG_NO_VRT))
+         ra_class_override_p(classes[i], 96 - class_sizes[i] + 1);
    }
    ra_set_finalize(regs, NULL);
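
The arithmetic in the override can be checked with a small standalone
sketch.  It assumes the default P of a contiguous class equals the
number of base registers the loop above adds to it, and that
base_reg_count is 256 when VRT is enabled.  Both helper names are
hypothetical and exist only for this illustration.

```c
/* Number of valid base registers for a contiguous class of `size` GRFs,
 * matching the loop bound `reg <= base_reg_count - size`. */
static unsigned
sketch_default_p(unsigned base_reg_count, unsigned size)
{
   return base_reg_count - size + 1;
}

/* Value passed to ra_class_override_p() in the hunk above. */
static unsigned
sketch_overridden_p(unsigned size)
{
   return 96 - size + 1;
}
```

For a class of 4 GRFs this gives a P of 253 with a 256 GRF register
set and 93 after the override, the same value a 96 GRF register set
would produce under the assumption above.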