The r0.5 thread payload register contains Surface State Offset bits
[27:6] as bits [31:10], so we need to shift the register right by 4 in
order to get the surface state offset expected in ExBSO mode, which is
the only extended descriptor encoding supported by the UGM shared
function for SS addressing on Xe2+.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29543>
This field encodes bits [27:6] of the scratch surface state offset
according to the hardware spec, already on XeHP platforms. However,
on previous platforms we were passing bits [25:4] instead, which was
apparently okay for two reasons:
1/ We never used more than 8 MB of scratch surface states apparently.
2/ A shift right by 2 was implicitly happening while copying the
value of r0.5 into the address register holding the extended
descriptor, which with the ExBSO addressing mode disabled
considered bits [31:12] as the surface state index within the
pool.
However on Xe2 ExBSO addressing mode is always enabled for the UGM
shared function, so we have to add an extra SHR instruction to format
the extended descriptor regardless, and there is no point in
disobeying the hardware spec passing a left-shifted offset.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29543>
This field encodes bits [27:6] of the scratch surface state offset
according to the hardware spec, already on XeHP platforms. However,
on previous platforms we were passing bits [25:4] instead, which was
apparently okay for two reasons:
1/ We never used more than 8 MB of scratch surface states apparently.
2/ A shift right by 2 was implicitly happening while copying the
value of r0.5 into the address register holding the extended
descriptor, which with the ExBSO addressing mode disabled
considered bits [31:12] as the surface state index within the
pool.
However on Xe2 ExBSO addressing mode is always enabled for the UGM
shared function, so we have to add an extra SHR instruction to format
the extended descriptor regardless, and there is no point in
disobeying the hardware spec passing a left-shifted offset.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29543>
Xe2 replaces auxiliary surface mapping by software to compress buffers
with reserving part of the memory for the compression purpose.
To enable compression in Xe2 it is necessary to bind memory with one of
the PAT indexes that has compression enabled.
We're introducing 2 new iris_heaps to allocate compressed BO's out of
on Xe2, one for integrated and another for discrete platforms.
With these new iris_heaps we gain cache and sub-allocation for free.
If the compression requirements are met
iris_resource_image_is_pat_compressible() returns true so
BO_ALLOC_COMPRESSED is set and the the BO is allocated out of
the correct heap.
At this moment iris_resource_image_is_pat_compressible()
defaults to returning false as more work needs to be done but
the foundation for the compressed allocation is here.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28833>
Compressed memory types are not CPU visible and Vulkan specification
don't have any requirement about that but some applications like
vkcube fails to run without a host visible option, so here appending
default_buffer_mem_types and compressed_mem_types.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28833>
Xe2 replaces auxiliary surface mapping by software to compress buffers,
instead it reserves part of the memory for the compression purpose.
To enable compression in Xe2 it is necessary bind memory with one of
the PAT indexes that has compression enabled.
It is still always returning false in anv_image_is_pat_compressible()
as it still needs more work before compression can be enabled but the
foundation for the compressed allocation is here.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28833>
We'll reduce pitch alignment in a following patch. However, CCS
fast-clears don't seem to work unless the pitch is 512B aligned.
Disable fast clears for unaligned pitches.
Prevents the next patch from failing the following piglit tests:
* fbo-attachments-blit-scaled-linear
* hiz-stencil-test-fbo-d24s8
* hiz
* polygon-mode-facing
* clearbuffer-mixed-format
* glsl-lod-bias (transient failure)
No failures have been observed in anv, but there are more restrictions
for fast-clears in that driver compared to iris.
Note:
* The -fbo flag is necessary to make these fail. Otherwise, they end up
with aligned render targets.
* Each of these tests allocate an image that has a pitch greater than
512B and they collectively cover all the misalignment options - 128B,
256B and 384B.
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29659>
Intel modifiers supporting compression are specified to be compatible
with the display engine, even if they won't actually be used for
scanout.
Attempting to capture a wider scope of modifiers resulted in test
errors. I chose to narrow the scope instead of digging into them.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29659>
Avoid using an isl_surf for the compression control surface on gfx12.
Instead, store the offset of this surface in the iris_resource struct.
The size of the surface is no longer stored, but it can be computed
on-demand with the aux-map helper functions.
This change enables us to get rid of the GFX12 CCS abstractions in ISL
that required all main surfaces to have a 512B-aligned pitch. This
requirement is seemingly only for display surfaces.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29659>
As each anv_device has its own address space it was necessary create
one dummy_aux_bo per anv_device.
Also this workaround requires us to disable the
buffer_length_in_aux_addr optimization, that is done in the physical
device creating because isl_dev of physical device is copied
to isl_dev in anv_device.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29619>
This workaround ask us to set a dummy aux address to all
SURFTYPE_BUFFERs with AuxiliarySurfaceMode == AUX_NONE.
It also says that the same dummy aux address can be reused acrsoss all
buffers.
So here adding dummy_aux_address to isl_device, ANV and Iris will
set a value to when running a in a GPU affected.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29619>
This job actually takes just under 5 minutes[*] without any fraction, so
there is no need for this, we can have full coverage and stay below the
10min-per-job limit.
[*] I've seen up to 10min when the CI is busy, so let's put the timeout
at 3x the normal run time.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29809>
I just saw more of these in other glsl versions, and it's likely that it
doesn't matter which version is active from our point of view, so let's
just put all of them in the same "flaky" bag.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29813>
Now that I got a hands on access to this GPU and could run deqp-vk, it uses
blob v676.0 and the values are different from v744.19. Not only they
are different, with the values from v744 there are CTS test faulures.
Fixes at least ASTC tests.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29786>
We need some manual logic to work out the size of pData, so we handroll this
one. This fixes push DUT with emulated secondaries.
Affects dEQP-VK.binding_model.shader_access.secondary_cmd_buf.*push*templ* if
emulated secondaries are used.
Neither panvk nor dozen support push DUT yet, so this isn't hurting anyone and
doesn't need to be cc'd stable. But hopefully panvk & dozen get on that :}
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28682>