intel/brw: Fix behavior of scheduler around flag register writes.

The scheduler was treating explicit flag register reads and writes as
a full scheduler barrier.  This is unnecessary because the dependency
tracking we already do handles explicit flag access correctly, so
there is no reason to take the possibly large performance hit from
add_barrier_deps().
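
As a rough illustration of the resulting predicate, here is a minimal
standalone sketch.  Every name ending in _sketch is a hypothetical
stand-in for the driver's own types, the numeric ARF values are only
illustrative, and the final "return true" is a simplification: the real
register_needs_barrier() has further cases that are not modelled here.

```cpp
#include <cassert>

// Hypothetical stand-ins for the driver's register files (assumption).
enum reg_file_sketch { GRF_SKETCH, ARF_SKETCH };

// Illustrative ARF register numbers; treat the values as assumptions.
const unsigned ARF_NULL_SKETCH = 0x00;
const unsigned ARF_FLAG_SKETCH = 0x30;
const unsigned ARF_MASK_SKETCH = 0x40;

struct reg_sketch {
   reg_file_sketch file;
   unsigned nr;

   bool is_null() const
   {
      return file == ARF_SKETCH && nr == ARF_NULL_SKETCH;
   }
};

// Returns true if an access to `reg` should act as a full scheduling
// barrier in this simplified model.
inline bool
needs_barrier_sketch(const reg_sketch &reg)
{
   // Non-ARF registers and the null register never need a barrier.
   if (reg.file != ARF_SKETCH || reg.is_null())
      return false;

   // Flag registers are covered by ordinary dependency tracking, so an
   // explicit flag access does not need a barrier either.
   if (reg.nr >= ARF_FLAG_SKETCH && reg.nr < ARF_MASK_SKETCH)
      return false;

   // Simplification: treat every other ARF access conservatively.
   return true;
}
```

In this model an explicit flag access such as {ARF_SKETCH, 0x30} no
longer forces a barrier, while other ARF registers still do.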

Found by inspection while trying to understand the poor scheduling of
some fragment shaders.  Improves performance by a small but
statistically significant amount (4 iterations, 5% significance) for
the following Traci tests in combination with a subsequent commit that
makes the pre-RA scheduler sensitive to instruction latencies:

SpaceEngineers-trace-dx11-2160p-high:               0.66% ±0.30%
MountAndBlade2-trace-dx11-1440p-veryhigh:           0.62% ±0.23%

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36618>
Francisco Jerez
2025-07-16 15:39:27 -07:00
committed by Marge Bot
parent 17b068ed1c
commit 501b1cbc2c
@@ -1141,6 +1141,9 @@ brw_instruction_scheduler::register_needs_barrier(const brw_reg &reg)
   if (reg.file != ARF || reg.is_null())
      return false;
   if (reg.nr >= BRW_ARF_FLAG && reg.nr < BRW_ARF_MASK)
      return false;
   /* If you look at SR register layout, there is nothing in there that
    * depends on other instructions. This is just fixed dispatch information.
    *