This tries to calculate an underestimate (lower bound) for the register
pressure at various SIMD widths, by counting live values in the NIR
shader. This fundamentally won't be accurate, but it can give us an
idea of whether it's even worth trying a certain SIMD-width compile.
Doing this at the NIR level means we:
- Can use SSA structure rather than fuzzy liveness intervals
- Can avoid the backend scheduler aggressively trying to hide latency,
  presenting an overinflated view of the register pressure
- Have divergence information on-hand, making it easier to "scale up"
- Can skip cloning and optimizing NIR for compute shader SIMD widths
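As a rough illustration of the idea, the sketch below counts live bytes at each instruction and scales divergent values by the SIMD width. It is a toy model and not the pass itself: struct toy_value, toy_value_bytes(), toy_estimate_pressure() and the reg_bytes parameter are all hypothetical names, and per-value live ranges stand in for whatever the real pass derives from the SSA structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an SSA value; this is not a NIR type. */
struct toy_value {
   unsigned def_ip;      /* instruction index where the value is defined */
   unsigned last_use_ip; /* instruction index of its final use */
   unsigned bytes;       /* size of one invocation's copy, in bytes */
   bool divergent;       /* true if invocations may hold different values */
};

/* Divergent values need one copy per invocation; uniform values need one. */
static unsigned
toy_value_bytes(const struct toy_value *v, unsigned simd_width)
{
   return v->divergent ? v->bytes * simd_width : v->bytes;
}

/* Peak live bytes over instructions [0, num_ips), rounded up to whole
 * registers of reg_bytes each (reg_bytes is an assumed parameter).
 */
static unsigned
toy_estimate_pressure(const struct toy_value *vals, size_t count,
                      unsigned num_ips, unsigned simd_width,
                      unsigned reg_bytes)
{
   assert(reg_bytes > 0 && simd_width > 0);

   unsigned peak = 0;
   for (unsigned ip = 0; ip < num_ips; ip++) {
      unsigned live = 0;
      for (size_t i = 0; i < count; i++) {
         /* A value is live from its definition through its last use. */
         if (vals[i].def_ip <= ip && ip <= vals[i].last_use_ip)
            live += toy_value_bytes(&vals[i], simd_width);
      }
      if (live > peak)
         peak = live;
   }
   return (peak + reg_bytes - 1) / reg_bytes;
}
```

Calling toy_estimate_pressure() once per candidate width on the same value list gives a cheap way to compare widths without compiling each variant.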
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36750>
We were doing a lot of NIR work repeatedly for each SIMD variant of
compute and mesh shaders. Instead, do it once before cloning, and
just do one final optimization loop and out-of-SSA for each.
fossil-db results on Arc B580:
Totals:
Instrs: 233771096 -> 233794024 (+0.01%); split: -0.01%, +0.02%
Subgroup size: 15922768 -> 15922736 (-0.00%); split: +0.00%, -0.00%
Send messages: 12095619 -> 12098234 (+0.02%); split: -0.00%, +0.02%
Loop count: 137562 -> 137523 (-0.03%)
Cycle count: 32600323744 -> 32667411252 (+0.21%); split: -0.06%, +0.27%
Spill count: 540908 -> 542027 (+0.21%); split: -0.07%, +0.28%
Fill count: 700938 -> 698983 (-0.28%); split: -0.73%, +0.45%
Scratch Memory Size: 37266432 -> 37304320 (+0.10%); split: -0.10%, +0.20%
Max live registers: 72691728 -> 72692987 (+0.00%); split: -0.00%, +0.00%
Non SSA regs after NIR: 67690309 -> 67688352 (-0.00%); split: -0.01%, +0.00%
Totals from 3576 (0.45% of 789301) affected shaders:
Instrs: 6932956 -> 6955884 (+0.33%); split: -0.41%, +0.74%
Subgroup size: 88816 -> 88784 (-0.04%); split: +0.09%, -0.13%
Send messages: 329168 -> 331783 (+0.79%); split: -0.02%, +0.81%
Loop count: 8753 -> 8714 (-0.45%)
Cycle count: 15153678820 -> 15220766328 (+0.44%); split: -0.14%, +0.58%
Spill count: 213751 -> 214870 (+0.52%); split: -0.18%, +0.71%
Fill count: 282616 -> 280661 (-0.69%); split: -1.82%, +1.13%
Scratch Memory Size: 13056000 -> 13093888 (+0.29%); split: -0.27%, +0.56%
Max live registers: 834757 -> 836016 (+0.15%); split: -0.11%, +0.26%
Non SSA regs after NIR: 995033 -> 993076 (-0.20%); split: -0.48%, +0.28%
Looking at a few of the shaders with substantial instruction count
increases, it appears that it is largely due to more loops being
unrolled, which is probably actually a good thing.
The compile time impact of this patch appears to be negligible.
However, doing postprocessing before SIMD cloning allows us to
examine the postprocessed SSA-form NIR for improvements in an
upcoming patch.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36750>
brw_postprocess_nir contains a lot of stuff these days. The first part
does a bunch of lowering and cleanup optimizations in SSA form. The
second part does some post-optimization lowering and the out-of-SSA
conversion.
We may want to do additional work before the post-optimization/post-SSA
phase. Splitting this allows us to insert such tasks in the "middle".
For convenience, brw_postprocess_nir() becomes a wrapper which invokes
both parts, so callers can continue working as they did until they have
a reason to do otherwise.
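The shape of the split can be sketched roughly as follows. This is an illustration only: struct toy_shader and every toy_* function are hypothetical stand-ins, not the driver's actual types or function names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a shader; not the real nir_shader. */
struct toy_shader {
   bool lowered;    /* SSA-form lowering and cleanup has run */
   bool extra_work; /* optional work inserted between the two halves */
   bool out_of_ssa; /* post-optimization lowering and out-of-SSA has run */
};

/* First half: lowering and cleanup optimizations in SSA form. */
static void
toy_postprocess_opts(struct toy_shader *s)
{
   s->lowered = true;
}

/* Second half: post-optimization lowering and out-of-SSA conversion. */
static void
toy_postprocess_out_of_ssa(struct toy_shader *s)
{
   assert(s->lowered);
   s->out_of_ssa = true;
}

/* Wrapper with the old behavior, so existing callers need no changes. */
static void
toy_postprocess(struct toy_shader *s)
{
   toy_postprocess_opts(s);
   toy_postprocess_out_of_ssa(s);
}

/* A caller that wants to do something in the "middle" calls both halves. */
static void
toy_postprocess_with_extra_work(struct toy_shader *s)
{
   toy_postprocess_opts(s);
   s->extra_work = true; /* e.g. inspect the still-SSA shader here */
   toy_postprocess_out_of_ssa(s);
}
```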
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36750>
This allows us to lower known subgroup size cases earlier, giving us
some earlier optimization opportunities. We would need to know the
actual SIMD width to handle certain cases, but we can just pass 0 here,
which will lead to get_subgroup_size returning 0 - the same as leaving
this unset. We can come back to that later during the per-SIMD-width
postprocessing.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36750>
float_controls2 may have marked these as needing to preserve NaN or
other values. If so, our newly contracted ffma needs to as well.
Fixes dEQP-VK.spirv_assembly.instruction.compute.float_controls2.*.input_args.mat_det_testedWithout_NotNan*
when nir_opt_algebraic is run after this pass.
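A minimal sketch of the flag propagation, under the assumption that the fused instruction should carry the union of the preservation flags on the multiply and the add. The enum, struct toy_alu and toy_fuse_ffma() are hypothetical and do not mirror the NIR data structures.

```c
#include <assert.h>

/* Hypothetical per-instruction floating-point requirements. */
enum toy_fp_flags {
   TOY_PRESERVE_NAN         = 1 << 0,
   TOY_PRESERVE_INF         = 1 << 1,
   TOY_PRESERVE_SIGNED_ZERO = 1 << 2,
};

/* Hypothetical stand-in for an ALU instruction. */
struct toy_alu {
   const char *op;
   unsigned fp_flags; /* bitmask of enum toy_fp_flags */
};

/* Contract fadd(fmul(a, b), c) into a single ffma. Assumption: anything
 * either source instruction had to preserve, the fused one must too, so
 * later optimizations do not treat the ffma as freely rewritable.
 */
static struct toy_alu
toy_fuse_ffma(const struct toy_alu *mul, const struct toy_alu *add)
{
   struct toy_alu ffma = {
      .op = "ffma",
      .fp_flags = mul->fp_flags | add->fp_flags,
   };
   return ffma;
}
```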
Cc: mesa-stable
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36750>
In hk_create_drm_physical_device() we might call vk_free() passing
pdev->vk.instance->alloc as the first argument, but if we've arrived there
via fail_pdev_alloc the instance has not yet been installed into the
physical device, potentially triggering a SIGSEGV.
Fix it by using a direct reference to the instance as the first argument.
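The failure mode can be illustrated with a small self-contained model. All toy_* types and functions below are hypothetical and only mimic the shape of the error path; they are not the driver or Vulkan runtime code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the allocator, instance and physical device. */
struct toy_alloc { int frees; };
struct toy_instance { struct toy_alloc alloc; };
struct toy_pdev { struct toy_instance *instance; };

static void
toy_free(struct toy_alloc *alloc, void *mem)
{
   alloc->frees++;
   free(mem);
}

/* Returns 0 on success, -1 on failure. fail_early simulates an error that
 * happens before the instance is installed into the physical device.
 */
static int
toy_create_pdev(struct toy_instance *instance, int fail_early,
                struct toy_pdev **out)
{
   struct toy_pdev *pdev = calloc(1, sizeof(*pdev));
   if (!pdev)
      return -1;

   if (fail_early)
      goto fail_pdev_alloc; /* pdev->instance is still NULL here */

   pdev->instance = instance;
   *out = pdev;
   return 0;

fail_pdev_alloc:
   /* Going through pdev->instance here would follow a NULL pointer.
    * Using the instance argument directly is always valid.
    */
   toy_free(&instance->alloc, pdev);
   return -1;
}
```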
Signed-off-by: Sergio Lopez <slp@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37642>
The driver already implements the regular render pass functions in terms of the
VK_KHR_create_renderpass2 functions. However, the extension couldn't be
advertised due to missing support for VK_KHR_multiview. Now that multiview
is supported, renderpass2 can be advertised as well.
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Acked-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37512>