Input attachments which read GMEM via the UCHE aperture need to
flush the blit cache on A7XX and wait for the writes to land, this
implements it as access flags and a pending flush with special
semantics.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26461>
This register is set by the proprietary driver along with other VSC state
for binning, the stale value of this register set by the prop driver was
being used by Turnip resulting in crashes that were exclusive to Android
due to only running the prop driver alongside Turnip there.
The fix is to emit this new register alongside all other VSC state inside
the `update_vsc_pipe` function.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26461>
A7XX needs the CCU blit caches to be flushed before a CP_BLIT to
ensure any writes from a CP_EVENT_WRITE::BLIT have landed, without
this the source buffer may have an incomplete load/clear when the
2D blit starts resulting in what's written out being broken.
The corruption can be seen with GMEM passes using CP_BLIT especially
when forced using `TU_DEBUG=gmem,unaligned_store`.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26461>
On A7XX, A6XX_RB_CCU_CNTL was broken into two registers, A7XX_RB_CCU_CNTL which
has static properties that can be set once, this requires a WFI to take effect.
As a result, it's now set during `tu6_hw_init` rather than being set every time.
While the newly introduced register A7XX_RB_CCU_CNTL2 has properties that may
change per-RP and don't require a WFI to take effect, only CCU inval/flush
events are required. This is now the only register set in `emit_rb_ccu_cntl`.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26461>
LRZ wasn't entirely disabled due to the register `A7XX_GRAS_LRZ_DEPTH_BUFFER_INFO`
not being set to `0` in all circumstances, this register affects rendering even
when LRZ is disabled so needs to be set to `0` until LRZ is properly implemented.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26461>
A7XX has corruption when 2D blits are performed on D24S8 images
from GMEM when the source format is FMT6_8_8_8_8_UNORM, this is
fixed by using FMT6_Z24_UNORM_S8_UINT_AS_R8G8B8A8.
Fixes VK-CTS: dEQP-VK.pipeline.monolithic.multisample.misc.*
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26461>
These were broken due to the new window offset register not being
set for every tile, even with this the 2D blit path is broken for
MSAA D24S8 resolves but since outside of FDM that should be handled
by the event blit path it's not a major concern but should be fixed.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26461>
This seemingly works on A7XX with no issues and the comment there
prior suggests that it should work on A6XX so this case is now
allowed to go through the event blit rather than the slow path.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26461>
The CCU layout logic needed to match the full `use_fast_path` case
in `tu_store_gmem_attachment`, not just unaligned but also for the
stencil storage logic.
The current code works since depth/stencil formats are forced to use the
slow path by `blit_can_resolve`. However, that will be removed since only
seperate stencil stores are unable to use the fast path while combined
stores can use it without any issues. This change prevents a regression
due to no longer choosing the sysmem CCU layout for seperate stencil
stores when fast-path resolves are allowed for DS formats.
Fixes VK-CTS cases (when fast-path stores for DS formats are enabled):
dEQP-VK.renderpass2.depth_stencil_resolve.image_2d_32_32.samples_2.d24_unorm_s8_uint.compatibility_depth_zero_stencil_zero_testing_stencil
dEQP-VK.renderpass2.depth_stencil_resolve.image_2d_32_32.samples_2.d24_unorm_s8_uint_separate_layouts.compatibility_depth_zero_stencil_zero_testing_stencil
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26461>
Only a fraction of GMEM was being used by the color CCU even in
sysmem mode where it would go unused aside from the portion used by
the depth CCU. This can help with color CCU bottlenecks on both
A6XX and A7XX.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26461>
The tile align size was incorrect resulting in certain invalid bins
being selected that would cause rendering to entirely break down. In
addition, the maximum tile size has been further increased on A7XX.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26461>
GMEM is entirely non-functional on A7XX, however, it is useful to be
able to test intermediary commits as support is added. This is still
put behind an explicit `TU_DEBUG` gmem flag to avoid regressions from
bisecting sysmem issues.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26461>
The fast-clear value is now the same for SNORM and UNORM, so our trick
of reinterpreting SNORM as UNORM when copying now works with UBWC. We
can also freely reinterpret UNORM, SNORM, and UINT formats, as tested by
dEQP-VK.image.mutable.*.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27506>
When we bind a descriptor set with dynamic descriptors, we can't ignore
dynamic descriptors in previously-bound higher descriptor sets. For
example, assume we have descriptor sets A and B, each of which has one
dynamic storage buffer, and we do:
CmdBindDescriptorSets(firstSet=1, descriptorSetCount=1, A)
CmdBindDescriptorSets(firstSet=0, descriptorSetCount=1, B)
and in the first CmdBindDescriptorSets the pipeline layout includes a
descriptor set layout compatible with B in set 0. Then, following
"Pipeline Layout Compatibility," set 0 is disturbed:
When binding a descriptor set to set number N, a previously bound
descriptor set bound with lower index M than N is disturbed if the
pipeline layouts for set M and N are not compatible for set M.
Otherwise, the bound descriptor set in M is not disturbed
When it's disturbed, it's effectively turned into a set with 1 undefined
dynamic storage buffer:
When a descriptor set is disturbed by binding descriptor sets, the
disturbed set is considered to contain undefined descriptors bound
with the same pipeline layout as the disturbing descriptor set.
This disturbed set is compatible with B, so in the second
CmdBindDescriptorSets this clause doesn't apply:
If, additionally, the previously bound descriptor set for set N was
bound using a pipeline layout not compatible for set N, then all
bindings in sets numbered greater than N are disturbed.
and A remains valid to access. The code before 88db7364 worked only if
the pipeline layout when binding B contained a descriptor layout
compatible with A in set 1, because it used the pipeline layout's total
size when allocating the internal dynamic descriptors array, but that
isn't actually a requirement, so the previous code was already broken.
After 88db7364 we only allocate as much space as required by the current
descriptors being bound, because I misread the rules here, which made it
more broken and broke 3DMark Wildlife Extreme that does something like
this.
In order to properly fix this we need to keep track of the maximum ever
seen dynamic descriptor size, similar to what we already do for
descriptor sets, and use that. We have no idea what needs to be
preserved when binding a descriptor set with dynamic descriptors, so we
have to be conservative.
Fixes: 88db7364 ("tu: Rework dynamic offset handling")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27750>
We have to write all the same regs blob is writing or we risk using
stale reg value written by blob.
I went through blob trace again and added all missing magic regs,
I hope for the last time.
This fixes screen corruption for Mobox users and in some cases
for different emulators users. The reg which caused the issue
is HLSQ_UNKNOWN_A9AC.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27721>
Use the same intializing trick as in 27d5543: first we initialize our
properties struct to { false }, then we fill the fields in one by one.
C++ does not allow assigning to an array from an initializer list, so
the properties exposed as an array in the struct are initialized either
one by one, or assigned in a chain.
As the properties are initialized at init time, move tu_get_properties
and tu_get_physical_device_properties_* before tu_physical_device_init,
so get_properties() would be callable by it.
This lets us delegate the physical device property entrypoints to
common runtime code.
Tested with drm-shim, doing a diff on vulkaninfo output. Differing
fields were pipelineCacheUUID, driverInfo and driverUUID, i.e. the
actual properties do not differ.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27723>
Some D3D11 games rely on out-of-bounds indirect UBO loads to return
real values from underlying bound descriptor. This workaround would
prevent us from lowering indirectly accessed UBOs to consts.
Later DXVK would declare dynamically indexed uniforms with upper
size bound, to make the accesses spec compliant. But for now
we need our own workaround.
Known affected games:
- Dark Souls 3
- Sekiro: Shadows Die Twice
- Final Fantasy Type-0 HD
- Ultrakill
- Dishonored 2
DXVK discussions:
- https://github.com/doitsujin/dxvk/issues/405
- https://github.com/doitsujin/dxvk/issues/3861
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27727>
A7XX introduces some changes into the CCU such as having different
amounts of memory per CCU for depth and color and dividing up CCU
control into two registers A7XX_RB_CCU_CNTL and A7XX_RB_CCU_CNTL2
where CNTL2 no longer requires a complete flush to be updated, we
currently don't take advantage of this as any CCU updates set both
registers but it's a potential optimization we can add in the future.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>
Add a separate pass which uses the analyze_ubo_ranges machinery to
construct ranges of readonly globals accessed in the shader and push
them to constants in the preamble, using ldg.k if possible. This is
enough to handle inline uniforms in turnip but also provides a base for
OpenCL, although the pass would need further work for that.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>
a750 has SS6_DIRECT path broken, we should either use UBO lowering
or SS6_INDIRECT path.
It is implemented as INDIRECT load even on a750+ because with UBO
lowering it would be tricky to get const offset for to use in multidraw,
also we would need to ensure the offset is not 0.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>
A750 expects driver params loaded through the preamble, old path
does work but has issues when the same LOAD_STATE is used between
several draw calls (it seems that LOAD_STATE is executed only for
the first draw call).
To solve this we now lower driver params to UBOs and let NIR deal with
them.
Notes:
- VS params are loaded via old path since blob do the same and there
are no issues observed.
- FDM is not supported at the moment.
- For now driver params data is emitted via CP_NOP because it's tricky
to allocate space for the data. (It is emitted when we are already in
sub_cs)
Co-Authored-By: Connor Abbott <cwabbott0@gmail.com>
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>