It looks like the fixed-func hardware is very slow to cull primitives
with zero pos.w but shader based culling helps a lot.
This fixes a massive performance gap with the FSR2 demo compared to
AMDGPU-PRO, +228% on RDNA2.
Based on my investigation, AMDGPU-PRO seems to always cull these
primitives. Note that disabling NGG culling with AMDGPU-PRO reports the
same performance as RADV without that fix. Also note that the FSR2
sample doesn't specify any cull mode (ie. VK_CULL_MODE_NONE is used),
so this is the only reason PRO was culling more than RADV.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7260
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31891>