From 760437c4c4b30510302c79cc132fced2f2aa4fa5 Mon Sep 17 00:00:00 2001
From: Francisco Jerez
Date: Wed, 16 Jul 2025 15:55:29 -0700
Subject: [PATCH] intel/brw/xe3+: Override P value of GRF register classes to increase thread parallelism.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This causes the graph coloring allocator to use the optimistic coloring
codepath for all nodes whose total Q value exceeds the threshold of 96
GRFs, in order to do a better job of minimizing the register requirements
of programs even when they are trivially colorable. Beyond 96 GRFs, the
number of threads available per EU decreases as the number of register
blocks requested by the program increases, so reducing the number of
registers used can increase performance.

This showed up in some test cases as a performance inversion caused by
enabling VRT. Extending the register set to 256 GRFs has the side effect
of making some previously non-trivially colorable programs trivially
colorable. The register allocator then skips the optimistic coloring path
for them and does a worse job of ordering the (trivial) allocations,
which leads to increased register use and reduced performance.

The following Traci test cases improve significantly as a result of this
change (4 iterations, 5% significance):

MetroExodus-trace-dx11-2160p-ultra:           1.90% ±0.85%
BaldursGate3-trace-dx11-1440p-ultra:          1.47% ±0.38%
Palworld-trace-dx11-1080p-med:                1.01% ±0.09%
TerminatorResistance-trace-dx11-2160p-ultra:  0.95% ±0.29%
Control-trace-dx11-1440p-high:                0.87% ±0.50%

Even though lowering the P value threshold is theoretically expected to
cost some compile time, due to more frequent use of the slower optimistic
path of the graph coloring allocator, this doesn't show up in practice:
my shader-db and fossil-db compile-time numbers don't show any
statistically significant change (13 iterations, 5% significance).
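To make the threshold arithmetic concrete, here is a minimal sketch of
the P/Q trivial-colorability test for contiguous GRF classes. It assumes,
as we understand Mesa's generic allocator, that a node counts as
trivially colorable when the sum of its per-neighbor Q values is below
the P value of its class, and that Q for two contiguous classes of b and
c GRFs is b + c - 1. All helper names below are hypothetical and are not
part of the register_allocate API. Under the default P of
base_reg_count - size + 1, a 4-GRF node with ten 8-GRF neighbors
(Q total 10 * 11 = 110) is trivially colorable in a 256-GRF file, while
the overridden P of 96 - 4 + 1 = 93 sends the same node down the
optimistic path.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the P/Q test for contiguous GRF register classes.
 * Helper names are illustrative only, not Mesa's register_allocate API.
 */

/* Default P: number of start offsets available to a class of
 * `class_size` contiguous GRFs in a file of `base_reg_count` GRFs. */
unsigned
default_p(unsigned base_reg_count, unsigned class_size)
{
   assert(class_size <= base_reg_count);
   return base_reg_count - class_size + 1;
}

/* Overridden P as computed in the patch: only start offsets that keep
 * the allocation within the first 96 GRFs are counted. */
unsigned
xe3_override_p(unsigned class_size)
{
   assert(class_size <= 96);
   return 96 - class_size + 1;
}

/* Assumed Q for two contiguous classes: one allocation of `c_size` GRFs
 * can block at most b_size + c_size - 1 start offsets of a `b_size`
 * class. */
unsigned
contig_q(unsigned b_size, unsigned c_size)
{
   return b_size + c_size - 1;
}

/* Trivially colorable iff the summed Q over all neighbors is below P;
 * otherwise the allocator would take the optimistic coloring path. */
bool
trivially_colorable(unsigned node_size, const unsigned *neighbor_sizes,
                    unsigned num_neighbors, unsigned p)
{
   unsigned q_total = 0;

   for (unsigned i = 0; i < num_neighbors; i++)
      q_total += contig_q(node_size, neighbor_sizes[i]);

   return q_total < p;
}
```

Since the overridden P equals 97 - size, the check q_total < P is
equivalent to q_total + size <= 96 in this model, which is one way to
read "total Q value exceeds the threshold of 96 GRFs".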
Reviewed-by: Lionel Landwerlin
Part-of:
---
 src/intel/compiler/brw_reg_allocate.cpp | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/src/intel/compiler/brw_reg_allocate.cpp b/src/intel/compiler/brw_reg_allocate.cpp
index 1aa79fe5151..9170d0f60c3 100644
--- a/src/intel/compiler/brw_reg_allocate.cpp
+++ b/src/intel/compiler/brw_reg_allocate.cpp
@@ -119,6 +119,9 @@ brw_alloc_reg_sets(struct brw_compiler *compiler)
 
       for (int reg = 0; reg <= base_reg_count - class_sizes[i]; reg++)
          ra_class_add_reg(classes[i], reg);
+
+      if (devinfo->ver >= 30 && !INTEL_DEBUG(DEBUG_NO_VRT))
+         ra_class_override_p(classes[i], 96 - class_sizes[i] + 1);
    }
 
    ra_set_finalize(regs, NULL);