The main difference between load_global and load_global_constant is that
the latter can be reordered arbitrarily. If the access being lowered is
already tagged as being reorderable, then we can preserve that by using
the load_global_constant intrinsics instead of load_global. This gives
us more flexibility.
On Intel, this lets us use the load_global_constant_uniform_block_intel
intrinsic for doing convergent block loads in more cases. This nets us
significant reductions in spill/fills: Borderlands 3 on Lunarlake sees
spills/fills reduced by 53%. Alchemist sees a 13% reduction.
Improves performance of Borderlands 3 DX12 on Intel Battlemage by
around 44%. Improves Hogwarts Legacy by around 14%.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31995>
We were accidentally doing a signed integer comparison here for ult32,
or a sign-extending shift for ushr.
One notable bit of fallout was that load_global_uniform_block_intel
address calculations broke on platforms that don't have native 64-bit
integer support, as the iadd64 lowering for "do I need to carry?" was
using ult32...and performing the wrong comparison. We spotted this in
Borderlands 3 on Alchemist once we turned on other optimizations.
Thanks to Lionel Landwerlin for helping spot the problem!
Fixes: c7b312ad45 ("brw: factor out source extraction for rematerialization")
Fixes: 339630ab05 ("brw: enable A64 loads source rematerialization")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31995>
This was triggering an assertion in the fs_builder::MOV helper that
the destination stride can't be 0 when dispatch_width > 1. What we
want to do is copy the single 64-bit channel of data from the UNIFORM
file to a VGRF. We can use a SIMD1 builder for that.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31995>
Previously, double-buffer mode would apply to all jobs except msaa,
but this is not smart, since not all jobs can benefit from this. In
particular, if jobs load the tile buffer and don't store tiles
double buffer mode won't be effective and we would instead pay the
cost of the smaller tile size so we only want to enable in jobs
that meet these requirements.
In order to achieve this we need to postpone the decision about
double buffering until we know the loads and stores of the job,
which means we need to do this late after we have recorded draws.
This means that by default, we assume no double-buffer mode is
used and if we find we want to enable after emitting the draws
we need to re-compute tile sizes and rewrite the
TILE_BINNING_MODE_CFG packet accordingly.
Making the decison about double-buffer late will also enable us to
add heuristics to decide about double-buffer based on the draw calls
emitted in the job, but we will do this in a separate patch.
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32111>
We want to make the decision about double-buffer enablement much later
when we have enough information to make it. That means we might need
to rewrite this packet, so we need to save a pointer to its location
in the CL.
Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32111>
By restricting these limitations up to GFX 12, CCS support
can be present on these cases that we think Xe2+ platform
should support compression.
Noticeably, CCS is allowed on depth resources without HiZ,
multi-sampled resources without CCS, and multi-sampled
stencil resources.
Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31496>
Two aux functions are updated to fix piglit test once CCS is enabled on
multi-sampled stencil resources in a following change. As reviewers
suggested, we don't see much value of the assertion.
https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31496#note_2601233
Piglit test:
fbo-depthstencil blit default_fb -samples=16 -auto
iris_resolve.c:972: iris_resource_get_aux_state: Assertion
`res->surf.samples == 1 || res->surf.msaa_layout == ISL_MSAA_LAYOUT_ARRAY' failed.
iris_resolve.c:996: iris_resource_set_aux_state: Assertion
`res->surf.samples == 1 || res->surf.msaa_layout == ISL_MSAA_LAYOUT_ARRAY' failed.
Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31496>
Using intel_needs_workaround() within a block of GFX version
checker requires extra carefulness on the road because both
of them specify a range of applicable platforms. The WA block
can be unexpectedly skipped once the GFX version checker gets
updated later.
Moving the WA implementation out of the GFX block to decouple
them for more clarity and less chance of messing up next time.
Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31496>