To get us started, this is designed to be pretty much the simplest thing
possible. It runs post-RA so we don't need to worry about hurting
occupancy and it uses the classic textbook algorithm for local (single
block) scheduling with the usual latency-weighted-depth heuristic.
-14.22% static cycle count on shaderdb
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32311>