This allows Vulkan and GL to iterate over the full list of modifiers
instead of hard-coding in various places the "same" list as isl.
(Anvil's list has already diverged from isl's list. It omits Gen12
modifiers).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
If each format channel has the same base type (such unorm), then that
is the format's "uniform channel type".
Calculating the field at buildtime is probably better than looping over
all channels at runtime each time we wish to query it.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Return the modifier's score, which indicates the driver's preference for the
modifier relative to others. A higher score is better. Zero means
unsupported.
Intended to assist selection of a modifier from an externally provided list,
such as VkImageDrmFormatModifierListCreateInfoEXT.
v2:
- Rename anv_drm_format_mod_score to isl_drm_modifier_get_score.
- Squash all incremental changes to anv_drm_format_mod_score.
v3:
- Drop redundant 'unlikely'. (for nchery)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v2)
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> (v3)
Thanks to Felix Degrood for discovering that we missed enabling this
additional caching on Tigerlake! Felix also benchmarked the changes.
We now use MOCS 48 (HDC:L1 + L3 + LLC) for render targets, textures,
and pull constant buffers. We leave storage buffers & images, as well
as stateless messages, using the previous MOCS 2 value. We can't use
HDC:L1 with atomics, and we don't know a priori whether storage buffers
will be used with atomics or not. Similarly, the Vulkan buffer device
address feature allows atomics to be performed on buffers via stateless
messages, and we only can control MOCS at the base address level, so
we can't do much there.
This is closer to what the Windows Vulkan and OpenGL drivers do,
though it isn't quite the same - they also disable LLC in some cases,
but we observed this to have noticable performance regressions when
we tried (though a couple titles benefited). We may try experiment
with that in the future.
Improves performance in a number of titles:
- Unreal Engine 4 Shooter Demo [VK]: 11.8%
- Witcher 3 [DXVK]: 3.9%
- Rise of the Tomb Raider [VK]: 1.5%
- Shadow of the Tomb Raider [VK]: 1.0%
- Grand Theft Auto V [DXVK]: 0.8%
We did not observe any performance regressions.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7104>
On Gen12+, we can enable additional caches in certain usage situations.
This routes that decision making to a central place in ISL, based on
surface usage flags, and updates both drivers to use it. (i965 doesn't
need to change because it doesn't support Gen12.)
We continue handling the "external" decision via an anv_mocs() wrapper
for now, since we store that flag in anv_bo, which isl doesn't know
about. (We could introduce an ISL_SURF_USAGE_EXTERNAL, but I'm not
actually sure that would be cleaner.)
This patch should not have any functional nor performance effects, as
we continue selecting the exact same MOCS values for now.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7104>
Drop unnecessary check which allows enabling of lossless write through
compression (HiZ + CCS) for D16_UNORM format on Gen12+.
We had misleading HSD information previously which used to claim that
compression can not be supported for 16bpp format. Although BSpec does
not have any restriction for D16_UNORM format.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6485>
Previously, we were initializing the CCS to 0xFF for MCS+CCS due to a
misunderstanding of the following lines in the bspec:
The following are the general SW requirements for MCS buffer clear
functionality:
...
- If Software wants to enable Color Compression without Fast
clear, Software needs to initialize MCS with zeros.
- Lossless compression and CCS initialized to all F (using HW
Fast Clear or SW direct Clear) on the same surface is not
supported.
The first line does not refer to the CCS as the comment author supposed
but refers to the MCS as the comment says. It means that if you want to
use MCS compression without a fast-clear, you should initialize the MCS
to 0x00. This is because the value 0x00 in the MCS means "all data is
in plane 0" which is a perfectly valid non-fast-clear initialization.
It's also the value the MCS should be in if you do a RECTLIST slow-clear
where the primitive fully covers each pixel such that the same value is
written to all samples.
The second line in the above quote seems to imply that CCS fast-clear is
incompatible with MCS fast-clear. In particular, MCS+CCS fast-clear
uses a 0xff value in the MCS (like on Gen7-11) and leaves the CCS in
either the compressed or the pass-through state. Therefore, we should
initialize the CCS to 0x00 even for MCS+CCS surfaces.
Reviewed-by: Sagar Ghuge<sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4074>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4074>
EXPECT_DEATH works by forking the process and letting the forked process
fail with an assertion. This process is evidently incredibly expensive,
taking ~30 seconds to run the whole isl_aux_info_test on a 2.8GHz
Skylake. Annoyingly all of the (expected) assertion failures also leaves
lots of messages in dmesg and potentially generates lots of coredumps.
Instead, avoid the expense of fork/exec by redefining assert() and
unreachable() in the code we're testing to return a unit-test-only
value. With this patch, the test takes ~1ms.
Also, while modifying the EXPECT_EQ() calls, reverse the arguments so
that the expected value comes first, as is intended. Otherwise gtest
failure messages don't make much sense.
Fixes: https://gitlab.freedesktop.org/mesa/mesa/issues/2567
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4174>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4174>
In ISL, usage flags only carry intent and not semantic meaning. We
don't have a bulletproof way in ISL to specify that an image is of
depth/stencil type. The usage flags are great but blorp, for instance,
loves to disrespect them. One proposed solution to this problem is to
add explicit depth/stencil formats which are distinct from the
corresponding color formats.
Fortunately, however, empirical evidence suggests that this bit only
affects the sampler's interpretation of the CCS data. Therefore, we can
set the bit based off of the aux_usage which is now very specific and
does carry semantic meaning. In particular, aux_usage now makes a
distinction between color CCS and depth/stencil CCS which appears to be
exactly what the DepthStencilResource bit is for.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4056>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4056>
Stencil CCS is slightly different from color CCS. Using a color CCS
resolve with stencil CCS doesn't do the right thing and you can't sample
from a stencil CCS image without the DepthStencilResource bit set or you
will get the wrong data. Stencil CCS also has it's own rules such as it
doesn't support fast-clear and has no partial resolve. This seems to
indicate that it should probably be its own isl_aux_usage. Now that
adding new isl_aux_usage values is pretty cheap, let's split stencil CCS
out on its own.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4056>
We also delete the badly named isl_surf_supports_hiz_ccs_wt. The name
is misleading because it doesn't return whether or not the surface
supports HiZ+CCS in write-through mode (any single-sampled HiZ+CCS
capable surface does) but rather a heuristic decision about whether or
not we want to enable write-through mode based on the usage flags in the
isl_surf.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4056>
This is distinct from ISL_AUX_USAGE_HIZ_CCS in that the HiZ surface
operates in write-through mode which means that the HiZ surface is only
used for depth-testing acceleration and the CCS-compressed main surface
is always valid so we can texture from it.
Separating full HiZ from write-through mode at the isl_aux_usage level
has a couple of advantages:
1. It's more explicit. Instead of write-through mode depending on the
heuristic decision in isl_surf_supports_hiz_ccs_wt, it's now
something that's explicitly requested by the driver. This should be
more robust than hoping isl_surf_supports_hiz_ccs_wt always returns
the same thing every time. If someone (say BLORP) ever drops a
usage flag on the isl_surf, there's a chance it could return a
different value without us noticing leading to corruptions.
2. Because ISL_AUX_USAGE_HIZ_CCS_WT is it's own isl_aux_usage flag, we
can say inside the driver that HIZ_CCS does not support sampling but
HIZ_CCS_WT does. We can also pass HIZ_CCS_WT to isl_surf_fill_state
and it can do some validation for us beyond what we would be able to
do if we conflate HIZ_CCS_WT and CCS_E.
3. In the future, we can add new heuristics to the driver which do
things such as start all depth surfaces (regardless of usage flags)
off in HIZ_CCS and then do a full resolve and drop to HIZ_CCS_WT the
first time it gets used by the sampler. This would potentially let
us enable the faster HIZ_CCS mode even in cases where it technically
comes in through the API as a texture.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4056>
The first check is redundant because the first thing we do in the "emit
the aux surface" section is assert that we actually have an aux_surf.
The second check involves an exclusion list of things which don't have
aux surfaces on Gen12 but an inclusion list is much simpler because it's
just "does it have MCS?".
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4056>
GEN:BUG:14010455700 (lineage 1808121037):
"To avoid sporadic corruptions “Set 0x7010[9] when Depth Buffer
Surface Format is D16_UNORM , surface type is not NULL & 1X_MSAA"
Required for fixing ttps://gitlab.freedesktop.org/mesa/mesa/issues/2501.
GEN:BUG:1806527549:
"Set HIZ_CHICKEN (7018h) bit 13 = 1 when depth buffer is D16_UNORM."
This one could fix a GPU hang in some workloads.
v2: Implement WA in isl and add another similar WA (Jason).
v3: Add flushes before changing chicken registers (Jason)
v4: Depth flush and stall + end of pipe sync when changing registers
(Jason).
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3801>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3801>
Provide a generic interface which manages aux resolves in ISL. The
feature differences between this and what's in iris is:
* Support for media compression. ISL_AUX_USAGE_MC behaves differently
from many other usages of CCS, so it was useful to implement this
support upfront, while designing the interfaces.
* Optimizations for full-surface writes. For example, after a
full-surface write occurs with ISL_AUX_USAGE_CCS_E in the PARTIAL_CLEAR
state, isl_aux_state_transition_write() returns COMPRESSED_NO_CLEAR
instead of COMPRESSED_CLEAR.
A performance suggestion for main-surface-invalidating/replacing writes
is given as a comment instead of adding a boolean to
isl_aux_prepare_access(). This avoids extra validation and should be
simple enough for the caller to handle.
v2. Add assertions. (Jason)
v3. Use switches in 2 more functions. (Jason)
Store aux metadata in a static table. (Jason)
Change prepare and finish function signatures. (Jason)
Keep isl_aux_state_transition_* functions separate.
v4. (Jason)
Assert against resolving in AUX_INVALID.
Rename aux_info struct to aux_usage_info.
Drop the justification for each aux_usage_info field.
Split out the NONE case in write function.
Restructure tests to more easily confirm coverage.
Rename access_compressed field to compressed.
Make write behavior less ambiguous.
v5. (Jason)
Add more detail above WRITES_RESOLVE_AMBIGUATE.
Add ISL_AUX_USAGE_MC to WritesResolveAmbiguate.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2957>
Gen12 added CCS_E support for A8_UNORM. Intercept A8_UNORM format and
switch to R8_UNORM, as both share the same aux map format encoding so
they are compatible.
Fixes Piglit's ext_framebuffer_multisample-formats all_samples, which
was hitting an assert about A8_UNORM and R8_UINT not being CCS_E
compatible formats.
v2: Add gen check (Kenneth Graunke)
v3: Intercept A8_UNORM and set format to R8_UNORM (Jason Ekstrand)
v4:
- Remove gen check and move block little bit down (Jason Ekstrand)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3719>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3719>
This will get reused in the shader compiler once we switch it over to pipe
formats instead of GL enums. We can't easily deduplicate i965's
mesa-to-isl mapping because of cases like A32_FLOAT that are mapped
differently.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/3355>