The alignment was considered only for offset, but its users
(at least ir3_nir_opt_preamble) expect the size itself to also
be aligned.
Fixes tests:
dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.struct_mixed_types.uniform_buffer_block_geom
dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.struct_mixed_types.uniform_buffer_block_tessc
dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.struct_mixed_types.uniform_buffer_block_tesse
gmem-dEQP-VK.spirv_assembly.instruction.graphics.16bit_storage.struct_mixed_types.uniform_buffer_block_tesse
Fixes: 922ef8e720
("ir3: Make allocation of consts more generic and order independent")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33161>
Previously, we just used the next block after a loop that
has a back-edge. This assumes that loop-exit blocks can
only be removed when falling through to the next block,
when in fact it can also be a jump to somewhere else,
in future even to some block before the actual loop.
12 (0.02% of 79395) affected shaders.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32477>
They might be needed as convergence point in order to
insert code (e.g. for loop alignment, wait states, etc.).
Totals from 1 (0.00% of 79395) affected shaders:
CodeSize: 12672 -> 12716 (+0.35%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32477>
TS can be valid and flushed at the same time when no compression is
used. This state is beneficial if we needed to flush TS to the base
surface (filling cleared tiles) for any reason, but still use TS
state to accelerate read requests into PE or TX caches.
The current seqno based tracking of the TS flush state has a major
drawback with the following sequence of events:
1. fast clear surface (TS is now valid)
2. flush TS (base surface tiles filled, TS still valid,
flush seqno == surface seqno)
3. render to surface (surface seqno increased)
4. flush resource
Step 4 will now execute a full TS flush as the flush and surface seqnos
are different after rendering and TS is still valid, wasting memory
bandwidth to fill already filled tiled that are still marked as clear
in the TS state. If the TS has been flushed already, step 4 should be
a no-op.
Switch from the seqno based tracking to tracking the flush state itself,
marking the TS state un-/flushed as needed. With this boolean tracking
of the flush state step4 above will correctly see that the TS has already
been flushed since the last fast clear and skip the tile fill blit.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32956>
If we determine that the amount of varyings will fit within the 8-bit
offset of LD_VAR_BUF[_IMM], instruct the compiler to use it for varyings
and skip setting up Attribute Descriptors.
This should save a bit of memory and overhead in reading varyings.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32969>
Introduce a varying load count pass to get the maximum amount of varying
loads from a fragment shader (prior to optimization passes), in order to
only allocate as many Attribute Descriptors as required. This will
generally lead to smaller buffers in SRT0 for fragment shaders.
As the amount of ADs is now dynamic based on the shader, we need to
lower varying loads early for fragment shaders in v9+, as the amount of
ADs will determine the offset for dummy_sampler, required during
nir_lower_descriptors.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32969>
The current implementation uses LD_VAR_BUF[_IMM] to look up varyings,
which limits the number of varying components to 64 due to an 8-bit
offset value.
As this does not align to maxVertexOutputComponents (128), this change
replaces the use of LD_VAR_BUF[_IMM] with LD_VAR[_IMM] + Attribute
Descriptors, which do not have this limitation.
As allocating Attribute Descriptors is potentially expensive, this can
be further optimized by falling back to LD_VAR_BUF[_IMM] in cases where
we can ensure we do not use more than 64 varying components.
This change currently does not change behavior for gallium/panfrost,
though that should be done as well.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32969>
The fields "Attribute stride" and "Packet stride" are in the wrong
order, and "Packet stride" should not be shr() modified.
This has probably not shown up as an issue before due to the use of
LD_VAR_BUF[_IMM] for varyings, which does not require us to create
Attribute Descriptors with type vertex_packet.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32969>
If line smooth, stipple, quad_prims and a collection of other things are needed, zink produces a geometry shader to create them.
This code is disabling that from happening when there is no geometry shader support.
This removes a constant barrage of validation issues when trying to draw basic triangles.
Some of this missing functionality will be added back in later.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33059>
This returns the underlying device pointer but as an opaque
uintptr_t.
This will be required because libdrm_amdgpu will return the
same device when called multiple times from the same process.
radeonsi relies on the pointer value to identify if the device
are the same and adjust the synchronisation logic based on that.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33081>
This reported support for Profile2 on all VCNs that support Profile0
and reported no supported formats if Profile2 was not supported.
Instead, we should not advertise the Profile2 at all if not supported.
Fixes: e359b3c525 ("radeonsi/vcn: support 12bit YUV420 AV1 decoding")
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33092>