intel/brw: Fix behavior of scheduler around flag register writes.

We were treating explicit flag reads and writes as a full scheduler barrier. This is unnecessary, since the tracking we already do handles explicit flag access correctly, so there is no reason to take a possibly large performance hit from add_barrier_deps(). Found by inspection while trying to understand the poor scheduling of some fragment shaders.

Improves performance by a small but statistically significant amount (4 iterations, 5% significance) for the following Traci tests, in combination with a subsequent commit that makes the pre-RA scheduler sensitive to instruction latencies:

SpaceEngineers-trace-dx11-2160p-high: 0.66% ±0.30%
MountAndBlade2-trace-dx11-1440p-veryhigh: 0.62% ±0.23%

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36618>
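
For illustration only, the intent of the change can be sketched as a standalone predicate. This is a simplified model, not the driver code: `RegFile`, `Reg`, `kArfFlag`, `kArfMask` and `needs_barrier()` are hypothetical stand-ins, the numeric register ranges are assumed values, and the real register_needs_barrier() has further cases that are not modelled here.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the driver's register description.
enum class RegFile { GRF, ARF };

struct Reg {
    RegFile file;
    std::uint8_t nr;   // architecture register number within the file
    bool is_null;
};

// Assumed ARF number range for flag registers (illustrative values only).
const std::uint8_t kArfFlag = 0x30;
const std::uint8_t kArfMask = 0x40;

// True if the register lies in the (assumed) flag register range.
inline bool is_flag_reg(const Reg &r) {
    return r.file == RegFile::ARF && r.nr >= kArfFlag && r.nr < kArfMask;
}

// Simplified model of the barrier decision after this change.
inline bool needs_barrier(const Reg &r) {
    // Only non-null ARF registers are barrier candidates.
    if (r.file != RegFile::ARF || r.is_null)
        return false;

    // Flag registers are left to ordinary dependency tracking.
    if (is_flag_reg(r))
        return false;

    // Conservative default for every other ARF register in this sketch.
    return true;
}

// Small self-check exercising each branch of the model.
inline void self_check() {
    assert(!needs_barrier(Reg{RegFile::GRF, 0x10, false}));     // not ARF
    assert(!needs_barrier(Reg{RegFile::ARF, 0x00, true}));      // null register
    assert(!needs_barrier(Reg{RegFile::ARF, kArfFlag, false})); // flag register
    assert(needs_barrier(Reg{RegFile::ARF, kArfMask, false}));  // first non-flag ARF
}
```

In this model a flag access only orders against other accesses to the same flag register, instead of against every instruction in the block.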
2025-07-16 15:39:27 -07:00
parent 17b068ed1c
commit 501b1cbc2c
1 changed files with 3 additions and 0 deletions
@@ -1141,6 +1141,9 @@ brw_instruction_scheduler::register_needs_barrier(const brw_reg &reg)
   if (reg.file != ARF || reg.is_null())
      return false;

+   if (reg.nr >= BRW_ARF_FLAG && reg.nr < BRW_ARF_MASK)
+      return false;
+
   /* If you look at SR register layout, there is nothing in there that
    * depends on other instructions. This is just fixed dispatch information.
    *