Commit Graph

4867 Commits

Author SHA1 Message Date
Mark Collins 1ba6ccc51a tu: Unconditionally enable GMEM on A7XX
GMEM is at parity with A7XX sysmem in terms of functionality so it's
safe to enable it without any conditions now.

Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26461>
2024-02-28 22:49:58 +00:00
Mark Collins de3dc30a29 tu: Add blit cache flushing for input attachments
Input attachments which read GMEM via the UCHE aperture need to
flush the blit cache on A7XX and wait for the writes to land. This
is implemented as access flags and a pending flush with special
semantics.

Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26461>
2024-02-28 22:49:58 +00:00
Mark Collins 0cf27a7236 tu: Clear VSC_UNKNOWN_0D08 on A7XX
The proprietary driver sets this register along with other VSC state
for binning. Turnip was using the stale value left behind by the prop
driver, resulting in crashes that were exclusive to Android, since only
there does the prop driver run alongside Turnip.

The fix is to emit this new register alongside all other VSC state inside
the `update_vsc_pipe` function.

Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26461>
2024-02-28 22:49:58 +00:00
Mark Collins 71918f7cff tu: Fix CP_BLIT sync on A7XX
A7XX needs the CCU blit caches to be flushed before a CP_BLIT to
ensure any writes from a CP_EVENT_WRITE::BLIT have landed. Without
this, the source buffer may have an incomplete load/clear when the
2D blit starts, so the data written out is corrupted.

The corruption can be seen with GMEM passes using CP_BLIT especially
when forced using `TU_DEBUG=gmem,unaligned_store`.

Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26461>
2024-02-28 22:49:58 +00:00
Mark Collins 9a67f00398 tu: Set RB_CCU_CNTL during HW init on A7XX
On A7XX, A6XX_RB_CCU_CNTL was split into two registers. A7XX_RB_CCU_CNTL
holds static properties that can be set once, and it requires a WFI to take
effect. As a result, it's now set during `tu6_hw_init` rather than every time.

The newly introduced register A7XX_RB_CCU_CNTL2 holds properties that may
change per-RP and doesn't require a WFI to take effect; only CCU inval/flush
events are required. It is now the only register set in `emit_rb_ccu_cntl`.

Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26461>
2024-02-28 22:49:58 +00:00
Mark Collins 265eb463b5 tu: Disable LRZ properly on A7XX
LRZ wasn't entirely disabled because the register `A7XX_GRAS_LRZ_DEPTH_BUFFER_INFO`
was not set to `0` in all circumstances. This register affects rendering even
when LRZ is disabled, so it needs to be set to `0` until LRZ is properly implemented.

Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26461>
2024-02-28 22:49:58 +00:00
Mark Collins 3188c1b5c7 tu: Use Z24_UNORM_S8_UINT_AS_R8G8B8A8 for A7XX GMEM D24S8 blits/clear
A7XX has corruption when 2D blits are performed on D24S8 images
from GMEM when the source format is FMT6_8_8_8_8_UNORM. This is
fixed by using FMT6_Z24_UNORM_S8_UINT_AS_R8G8B8A8.

Fixes VK-CTS: dEQP-VK.pipeline.monolithic.multisample.misc.*

Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26461>
2024-02-28 22:49:58 +00:00
Mark Collins 1714e0c240 tu: Fix 2D blit path for GMEM stores on A7XX
These were broken because the new window offset register was not
being set for every tile. Even with this fix, the 2D blit path is
still broken for MSAA D24S8 resolves. Outside of FDM those should be
handled by the event blit path, so it's not a major concern, but it
should still be fixed.

Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26461>
2024-02-28 22:49:58 +00:00
Mark Collins 9e699afa9b tu: Allow event blit to resolve depth stencil formats
This seemingly works on A7XX with no issues, and the comment that was
there previously suggests that it should work on A6XX, so this case is
now allowed to go through the event blit rather than the slow path.

Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26461>
2024-02-28 22:49:58 +00:00
Mark Collins cc6399de31 tu: Update CCU layout selection logic for separate stencil stores
The CCU layout logic needed to match the full `use_fast_path` case
in `tu_store_gmem_attachment`, not just unaligned but also for the
stencil storage logic.

The current code works since depth/stencil formats are forced to use the
slow path by `blit_can_resolve`. However, that will be removed since only
separate stencil stores are unable to use the fast path while combined
stores can use it without any issues. This change prevents a regression
due to no longer choosing the sysmem CCU layout for separate stencil
stores when fast-path resolves are allowed for DS formats.

Fixes VK-CTS cases (when fast-path stores for DS formats are enabled):
dEQP-VK.renderpass2.depth_stencil_resolve.image_2d_32_32.samples_2.d24_unorm_s8_uint.compatibility_depth_zero_stencil_zero_testing_stencil
dEQP-VK.renderpass2.depth_stencil_resolve.image_2d_32_32.samples_2.d24_unorm_s8_uint_separate_layouts.compatibility_depth_zero_stencil_zero_testing_stencil

Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26461>
2024-02-28 22:49:58 +00:00
Mark Collins b44474407d tu: Use full size color CCU in sysmem mode
Only a fraction of GMEM was being used by the color CCU, even in
sysmem mode where the rest of GMEM would go unused aside from the
portion used by the depth CCU. Using the full size can help with color
CCU bottlenecks on both A6XX and A7XX.

Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26461>
2024-02-28 22:49:58 +00:00
Mark Collins 40b3a38951 freedreno/devices: Update A7XX tile values
The tile align size was incorrect resulting in certain invalid bins
being selected that would cause rendering to entirely break down. In
addition, the maximum tile size has been further increased on A7XX.

Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26461>
2024-02-28 22:49:58 +00:00
Mark Collins 120779f009 tu: Set RB_UNKNOWN_88E4 for A7XX event blits
Event blits on A7XX are entirely broken without setting the first
bit of this register.

Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26461>
2024-02-28 22:49:58 +00:00
Danylo Piliaiev f0ae416fc1 tu/autotuner: Use CP_EVENT_WRITE7 for submission fence
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26461>
2024-02-28 22:49:58 +00:00
Mark Collins 4e6a1f8852 tu/autotune: Use CP_EVENT_WRITE7::ZPASS_DONE on A7XX
The `RB_SAMPLE_COUNT_ADDR` register no longer exists on A7XX and
the address is provided as a part of `CP_EVENT_WRITE7`.

Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26461>
2024-02-28 22:49:58 +00:00
Mark Collins 45b415a044 tu: Use CP_SET_PSEUDO_REG for A7XX VSC stream regs
VSC stream registers on A7XX are pseudo-registers rather than actual
registers and need to be set via `CP_SET_PSEUDO_REG`.

Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26461>
2024-02-28 22:49:58 +00:00
Mark Collins 2c78d104b0 tu: Only set PC/VFD PWR_CNTL regs on A6XX
These are no longer used on A7XX and should not be emitted.

Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26461>
2024-02-28 22:49:58 +00:00
Mark Collins 0b2df4ca26 tu: Set CP_THREAD_CONTROL::CONCURRENT_BIN_DISABLE in A7XX HW init
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26461>
2024-02-28 22:49:58 +00:00
Mark Collins 1af86d5a6a tu: Set A7XX registers in tu6_tile_render_begin
These are mostly copied from the sysmem registers with the values
based off prop GMEM traces.

Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26461>
2024-02-28 22:49:58 +00:00
Mark Collins 9759222282 tu: Allow GMEM on A7XX when TU_DEBUG=gmem
GMEM is entirely non-functional on A7XX. However, it is useful to be
able to test intermediary commits as support is added. This is still
put behind an explicit `TU_DEBUG` gmem flag to avoid regressions when
bisecting sysmem issues.

Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26461>
2024-02-28 22:49:58 +00:00
Danylo Piliaiev be46639974 freedreno/a7xx: Fix base_align for non-UBWC depth-stencil
A7XX appears to require alignment of 4096 for DS in both
UBWC and non-UBWC cases.

Fixes rendering with TU_DEBUG=noubwc

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27848>
2024-02-28 16:30:15 +00:00
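As a rough illustration of the alignment requirement in be46639974, the following Python sketch rounds a base address up to 4096 bytes. The helper name `align_up` and the example address are hypothetical and are not taken from the freedreno layout code; only the 4096 value comes from the commit message.

```python
DS_BASE_ALIGN = 4096  # value stated in the commit message for A7XX depth-stencil


def align_up(value: int, alignment: int) -> int:
    """Round `value` up to the next multiple of `alignment` (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (value + alignment - 1) & ~(alignment - 1)


# Hypothetical unaligned base address for a depth-stencil image.
aligned_base = align_up(0x1234, DS_BASE_ALIGN)
print(hex(aligned_base))
```

An already aligned address is returned unchanged, so the helper can be applied unconditionally to both UBWC and non-UBWC layouts.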
Zan Dobersek 25a0eadcae tu: tu_device should clean up its global bo
The global buffer object is allocated and mapped during tu_device creation.
Correspondingly it should also be cleaned up during device destruction.

Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27814>
2024-02-28 15:51:00 +00:00
Danylo Piliaiev 55e99728e0 tu: Do not emit zero-sized fs params
The comparison change accidentally slipped in.

Fixes a crash in:
  dEQP-VK.subgroups.size_control.framebuffer.fragment_allow_varying_subgroup_size

Fixes: 76e417ca59
("turnip,ir3/a750: Implement consts loading via preamble")

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27829>
2024-02-28 12:00:33 +00:00
Faith Ekstrand 6ec177b116 vulkan: Rework vk_render_pass_state::attachments
The new bitfield has a separate flag for each of the color attachments.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27024>
2024-02-27 22:17:09 +00:00
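To make the idea in 6ec177b116 concrete, here is a minimal Python sketch of a bitfield with one flag per color attachment. The constant names, the bit positions, and the limit of 8 color attachments are assumptions for illustration only; they are not copied from `vk_render_pass_state`.

```python
# Hypothetical bit layout: one bit each for depth and stencil,
# followed by one bit per color attachment.
DEPTH_BIT = 1 << 0
STENCIL_BIT = 1 << 1
COLOR_0_BIT = 1 << 2
MAX_COLOR_ATTACHMENTS = 8  # assumed limit for this sketch


def color_bit(index: int) -> int:
    """Return the flag for color attachment `index`."""
    if not 0 <= index < MAX_COLOR_ATTACHMENTS:
        raise ValueError("color attachment index out of range")
    return COLOR_0_BIT << index


def has_color(mask: int, index: int) -> bool:
    """Check whether color attachment `index` is present in `mask`."""
    return bool(mask & color_bit(index))


# A render pass with depth plus color attachments 0 and 2.
mask = DEPTH_BIT | color_bit(0) | color_bit(2)
print(has_color(mask, 0), has_color(mask, 1), has_color(mask, 2))
```

With a separate flag per attachment, a caller can distinguish "color attachment 1 is unused" from "no color attachments at all", which a single shared color flag cannot express.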
Faith Ekstrand c09c086c12 vulkan: Add a vk_render_pass_state_has_attachment_info() helper
We already have a helper like this internally.  Give it a better name
and expose it.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27024>
2024-02-27 22:17:09 +00:00
Connor Abbott a80a23dc49 tu: Enable UBWC for storage images on a7xx
I'm not sure exactly when this was introduced. It doesn't work on a650,
but does work on a7xx. I'm not sure whether it works on the a660
generation.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27506>
2024-02-27 07:34:15 -05:00
Connor Abbott b9e04f8293 tu: Enable UBWC for SNORM formats on a740+
The fast-clear value is now the same for SNORM and UNORM, so our trick
of reinterpreting SNORM as UNORM when copying now works with UBWC. We
can also freely reinterpret UNORM, SNORM, and UINT formats, as tested by
dEQP-VK.image.mutable.*.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27506>
2024-02-27 07:33:59 -05:00
Connor Abbott 4529b2ea54 tu: Reenable MSAA UBWC on a6xx gen1
This passes a full CTS run now, probably due to other fixes in the
meantime.

Closes: #7438
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27506>
2024-02-27 07:17:29 -05:00
Connor Abbott db0291c235 tu: Follow pipeline compatibility rules for dynamic descriptors
When we bind a descriptor set with dynamic descriptors, we can't ignore
dynamic descriptors in previously-bound higher descriptor sets. For
example, assume we have descriptor sets A and B, each of which has one
dynamic storage buffer, and we do:

CmdBindDescriptorSets(firstSet=1, descriptorSetCount=1, A)
CmdBindDescriptorSets(firstSet=0, descriptorSetCount=1, B)

and in the first CmdBindDescriptorSets the pipeline layout includes a
descriptor set layout compatible with B in set 0. Then, following
"Pipeline Layout Compatibility," set 0 is disturbed:

   When binding a descriptor set to set number N, a previously bound
   descriptor set bound with lower index M than N is disturbed if the
   pipeline layouts for set M and N are not compatible for set M.
   Otherwise, the bound descriptor set in M is not disturbed

When it's disturbed, it's effectively turned into a set with 1 undefined
dynamic storage buffer:

   When a descriptor set is disturbed by binding descriptor sets, the
   disturbed set is considered to contain undefined descriptors bound
   with the same pipeline layout as the disturbing descriptor set.

This disturbed set is compatible with B, so in the second
CmdBindDescriptorSets this clause doesn't apply:

   If, additionally, the previously bound descriptor set for set N was
   bound using a pipeline layout not compatible for set N, then all
   bindings in sets numbered greater than N are disturbed.

and A remains valid to access. The code before 88db7364 worked only if
the pipeline layout when binding B contained a descriptor layout
compatible with A in set 1, because it used the pipeline layout's total
size when allocating the internal dynamic descriptors array, but that
isn't actually a requirement, so the previous code was already broken.
After 88db7364 we only allocate as much space as required by the current
descriptors being bound, because I misread the rules here. That made it
more broken and broke 3DMark Wildlife Extreme, which does something like
this.

In order to properly fix this we need to keep track of the maximum ever
seen dynamic descriptor size, similar to what we already do for
descriptor sets, and use that. We have no idea what needs to be
preserved when binding a descriptor set with dynamic descriptors, so we
have to be conservative.

Fixes: 88db7364 ("tu: Rework dynamic offset handling")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27750>
2024-02-26 23:52:41 +00:00
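The fix described in db0291c235 (keep track of the maximum dynamic descriptor size ever seen) can be sketched in Python as a toy model. The class, its fields, and the flat offsets used below are hypothetical; the real driver derives offsets from the pipeline layout and stores actual descriptors rather than strings.

```python
class DynamicDescriptorState:
    """Toy model of the internal dynamic descriptor array for one bind point."""

    def __init__(self):
        self.dynamic = []          # flat array of dynamic descriptors
        self.max_dynamic_size = 0  # largest size required by any bind so far

    def bind(self, offset, descriptors):
        """Write `descriptors` into the flat array starting at `offset`."""
        needed = offset + len(descriptors)
        # Be conservative: never shrink, because sets bound earlier at
        # higher offsets may still be valid to access.
        self.max_dynamic_size = max(self.max_dynamic_size, needed)
        missing = self.max_dynamic_size - len(self.dynamic)
        if missing > 0:
            self.dynamic.extend([None] * missing)
        self.dynamic[offset:offset + len(descriptors)] = descriptors

    def emitted(self):
        """Return everything that has to be preserved and emitted."""
        return self.dynamic[:self.max_dynamic_size]


state = DynamicDescriptorState()
state.bind(1, ["A.storage_buffer"])  # firstSet=1, set A
state.bind(0, ["B.storage_buffer"])  # firstSet=0, set B
print(state.emitted())
```

Sizing the array from only the second bind would drop A's descriptor, which is the failure mode the commit message describes. Tracking the maximum keeps both entries.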
Alyssa Rosenzweig 9da77e6c97 tu: use vk_index_to_restart
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27764>
2024-02-26 14:13:08 +00:00
Konrad Dybcio 1f508a5dac freedreno/registers: Add some HWCG regs
A702 sets even more of these. Follow suit!

Signed-off-by: Konrad Dybcio <konrad.dybcio@linaro.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27700>
2024-02-24 01:15:04 +00:00
Rob Clark e7ee2c8ca5 tu: Give suballoc bo's a name
So they show up in gem debugfs with a more useful label.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27700>
2024-02-24 01:15:04 +00:00
Rob Clark bcc5ddcc3b freedreno/crashdec: Find potential fault buffers
Denote if a buffer we know about is covering the fault address (kernel
issue), or if the fault address is within the 2 * size range, indicating
that the buffer is potentially the one the GPU read past the end of.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27700>
2024-02-24 01:15:04 +00:00
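The classification described in bcc5ddcc3b can be sketched in Python as follows. This is an illustration under assumptions, not the crashdec code: the function name, the return strings, and the reading of "within the 2 * size range" as the half-open interval from the buffer start to start + 2 * size are all hypothetical.

```python
def classify_buffer(iova: int, size: int, fault_addr: int):
    """Classify a known buffer relative to a GPU fault address.

    Returns "covers" if the fault lies inside the buffer, "potential-overrun"
    if it lies past the end but within twice the buffer size from its start,
    and None otherwise.
    """
    end = iova + size
    if iova <= fault_addr < end:
        return "covers"  # buffer is mapped there, so likely a kernel issue
    if end <= fault_addr < iova + 2 * size:
        return "potential-overrun"  # GPU may have read past the end
    return None


# Hypothetical buffer at 0x10000 with a size of 0x1000 bytes.
for addr in (0x10800, 0x11800, 0x13000):
    print(hex(addr), classify_buffer(0x10000, 0x1000, addr))
```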
Danylo Piliaiev ebde7d5e87 tu/a7xx: Write even more magic regs to fix rendering issues on Android
We have to write all the same regs the blob is writing, or we risk
using a stale reg value written by the blob.

I went through the blob trace again and added all missing magic regs,
I hope for the last time.

This fixes screen corruption for Mobox users and, in some cases,
for users of other emulators. The reg which caused the issue
is HLSQ_UNKNOWN_A9AC.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27721>
2024-02-23 14:19:11 +00:00
Oskar Viljasaar 89622f5089 tu: Use common physical device properties infrastructure
Use the same initializing trick as in 27d5543: first we initialize our
properties struct to { false }, then we fill the fields in one by one.
C++ does not allow assigning to an array from an initializer list, so
the properties exposed as an array in the struct are initialized either
one by one, or assigned in a chain.
As the properties are initialized at init time, move tu_get_properties
and tu_get_physical_device_properties_* before tu_physical_device_init,
so get_properties() would be callable by it.

This lets us delegate the physical device property entrypoints to
common runtime code.

Tested with drm-shim, doing a diff on vulkaninfo output. Differing
fields were pipelineCacheUUID, driverInfo and driverUUID, i.e. the
actual properties do not differ.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27723>
2024-02-23 13:03:03 +00:00
Danylo Piliaiev f4c40fc89c tu: Add workaround for D3D11 games accessing UBO out of bounds
Some D3D11 games rely on out-of-bounds indirect UBO loads to return
real values from the underlying bound descriptor. This workaround
prevents us from lowering indirectly accessed UBOs to consts.

Later DXVK would declare dynamically indexed uniforms with an upper
size bound, to make the accesses spec compliant. But for now
we need our own workaround.

Known affected games:
- Dark Souls 3
- Sekiro: Shadows Die Twice
- Final Fantasy Type-0 HD
- Ultrakill
- Dishonored 2

DXVK discussions:
- https://github.com/doitsujin/dxvk/issues/405
- https://github.com/doitsujin/dxvk/issues/3861

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27727>
2024-02-23 12:08:53 +00:00
Danylo Piliaiev 5dd5d4c4b5 tu: Exclude more a7xx regs from stomping
Stomping these regs even for a short time leads to crashes.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>
2024-02-12 22:05:13 +00:00
Danylo Piliaiev e4631bee61 freedreno/devices: Update magic regs for a7xx
These regs are written by the blob, and for some of them the blob
could write non-zero values. So executing Turnip after the blob without
writing these regs could lead to nasty GPU crashes.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>
2024-02-12 22:05:13 +00:00
Danylo Piliaiev eb1e71e707 freedreno,tu: Move varying interp and varying repl modes to xml
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>
2024-02-12 22:05:13 +00:00
Danylo Piliaiev 78c843230c tu/a750: Consider vertex attr buff in gmem allocation
A750 added a new optimization - placement of vertex attributes
into GMEM, so part of GMEM is carved out for it and needs to
be considered during GMEM allocations.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>
2024-02-12 22:05:13 +00:00
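The accounting change in 78c843230c amounts to subtracting the vertex attribute carve-out before dividing up GMEM. A minimal Python sketch follows, in which the function names and all sizes are hypothetical placeholders rather than real A750 values.

```python
def gmem_available(gmem_size: int, vtx_attr_buf_size: int) -> int:
    """GMEM left for attachments after the vertex attribute carve-out."""
    if not 0 <= vtx_attr_buf_size <= gmem_size:
        raise ValueError("carve-out must fit inside GMEM")
    return gmem_size - vtx_attr_buf_size


def max_tile_pixels(gmem_size: int, vtx_attr_buf_size: int,
                    bytes_per_pixel: int) -> int:
    """Upper bound on pixels per tile for a given per-pixel footprint."""
    return gmem_available(gmem_size, vtx_attr_buf_size) // bytes_per_pixel


# Hypothetical numbers: 3 MiB of GMEM, 128 KiB reserved for vertex
# attributes, 8 bytes per pixel across all attachments.
print(max_tile_pixels(3 * 1024 * 1024, 128 * 1024, 8))
```

Ignoring the carve-out would overestimate the tile size, so attachment data could overlap the region reserved for vertex attributes.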
Mark Collins 5266815ca9 tu/a7xx: Update CCU layout logic for A7XX
A7XX introduces some changes to the CCU, such as having different
amounts of memory per CCU for depth and color, and dividing CCU
control into two registers, A7XX_RB_CCU_CNTL and A7XX_RB_CCU_CNTL2,
where CNTL2 no longer requires a complete flush to be updated. We
currently don't take advantage of this, as any CCU update sets both
registers, but it's a potential optimization we can add in the future.

Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>
2024-02-12 22:05:13 +00:00
Danylo Piliaiev 98d6d93a82 turnip,ir3/a750: Implement inline uniforms via ldg.k
Inline consts suffer from the same issue as driver params, so they
should also be preloaded via the preamble. There is a special
instruction to load from global memory into consts.

Co-Authored-By: Connor Abbott <cwabbott0@gmail.com>
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>
2024-02-12 22:05:13 +00:00
Connor Abbott 6a744ddebc ir3: Initial support for pushing globals with ldg.k
Add a separate pass which uses the analyze_ubo_ranges machinery to
construct ranges of readonly globals accessed in the shader and push
them to constants in the preamble, using ldg.k if possible. This is
enough to handle inline uniforms in turnip but also provides a base for
OpenCL, although the pass would need further work for that.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>
2024-02-12 22:05:13 +00:00
Connor Abbott 513fa1873c ir3/a7xx: Fix load_global_ir3 with immediate offset
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>
2024-02-12 22:05:13 +00:00
Connor Abbott 45c71803f9 tu: Add more info to ldg inline uniform path
This will let us push the ldg into the preamble.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>
2024-02-12 22:05:13 +00:00
Danylo Piliaiev b87b8fdf73 tu: Use SS6_INDIRECT for VS params
The SS6_DIRECT path is broken on a750, so we should use either UBO
lowering or the SS6_INDIRECT path.

It is implemented as an INDIRECT load even on a750+ because with UBO
lowering it would be tricky to get the const offset to use in multidraw,
and we would also need to ensure the offset is not 0.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>
2024-02-12 22:05:13 +00:00
Danylo Piliaiev 76e417ca59 turnip,ir3/a750: Implement consts loading via preamble
A750 expects driver params to be loaded through the preamble. The old
path does work, but has issues when the same LOAD_STATE is used between
several draw calls (it seems that LOAD_STATE is executed only for
the first draw call).

To solve this we now lower driver params to UBOs and let NIR deal with
them.

Notes:
- VS params are loaded via the old path since the blob does the same and
  no issues have been observed.
- FDM is not supported at the moment.
- For now driver params data is emitted via CP_NOP because it's tricky
  to allocate space for the data. (It is emitted when we are already in
  sub_cs)

Co-Authored-By: Connor Abbott <cwabbott0@gmail.com>
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>
2024-02-12 22:05:13 +00:00
Danylo Piliaiev 7429ca3115 tu: Use SS6_INDIRECT consts upload path for 3d blits
3d blits used the DIRECT consts upload path, which doesn't work
properly on a750+. However, uploading them via SS6_INDIRECT
seems to work.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>
2024-02-12 22:05:13 +00:00
Danylo Piliaiev 30597970a5 tu/a7xx: Do not preload shaders, HW does it by default
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>
2024-02-12 22:05:13 +00:00
Danylo Piliaiev ac75edb8c4 tu/a7xx: Correctly set A7XX_HLSQ_UNKNOWN_A9AE.SYSVAL_REGS_COUNT
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26934>
2024-02-12 22:05:13 +00:00