If the varying descriptors will always be the same for a given shader
variant (certainly true if none of separable shaders, transform
feedback, or point sprites are used), we only need to link once. Now
that pan_pool supports both owned and unowned modes, we have the
flexibility to reuse the code path for both allocation strategies.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10954>
In general gallium shaders are all SSO and it's up to the driver to handle
lining up varying storage between stages at draw time. However, there's a
NIR option "unify_interfaces" that iris uses which applies to non-SSO (as
indicated by nir->info.separate_shader) shaders and makes the inputs_read
and outputs_written match up at GLSL-to-NIR link time, and then iris then
avoids any lowering passes that would add new varyings.
By introducing info gathering after variant creation (because all I knew
was "gallium is always SSO"), I broke the unify_interfaces link-time setup
on iris. Just skip that when the unify_interfaces flag is set, and add
some asserts to catch anyone trying to mix unify_interfaces with known
varying-adjusting lowering passes.
Closes: #4450
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10876>
Panfrost has two compilers, one for Midgard GPUs and one for Bifrost
GPUs. The respective compilers are src/panfrost/midgard and
src/panfrost/bifrost. Changes internal to just one compiler (or
disassembler) cannot affect the other hardware, so there's no need to
run extra jobs in these cases.
Also split out common vs Gallium panfrost so we can do the right thing
for panvk builds in the imminent future.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10924>
To avoid having to download the same traces again and again in each job,
use the caching proxy configured in the Collabora lab.
We can currently hardcode it like this because we don't test the same
driver in more than one lab, but when that changes we will need a more
flexible approach.
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10949>
With display on iGPU and render on dGPU, VRR is not working. To fix
this set RADEON_FLAG_GTT_WC flag when allocating memory for prime. This
allows kernel function amdgpu_display_user_framebuffer_create() to
allocate GTT memory with USWC flag making the buffer scanout for iGPU.
This helps the ddx amdgpu_present_check_flip() function to return
true. Now, xserver will flip the framebuffer instead of blit. Due
to this VRR feature will work where iGPU supports USWC flag.
v2: modify commit message with use case (Michel Dänzer)
v3: allow GTT_WC flag only if VRAM_DOMAIN and NO_CPU_ACCESS (Bas Nieuwenhuizen)
v4: add check for wsi_info is NULL
v5: use wsi_info pointer to check for prime alloc (Bas Nieuwenhuizen)
v6: set _GTT_WC flag when wsi_info pointer is not NULL (Bas Nieuwenhuizen)
Signed-off-by: Yogesh Mohanmarimuthu <yogesh.mohanmarimuthu@amd.com>
Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10657>
On XeHP there are restrictions on types of source and destinations
with float types. As shuffle is implemented using MOV we need to make
sure we lower it to supported types.
This fixes tests like :
dEQP-VK.subgroups.arithmetic.framebuffer.subgroupexclusivemax_vec4_vertex
dEQP-VK.subgroups.arithmetic.framebuffer.subgroupexclusivemul_f16vec3_vertex
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Suggested-by: Francisco Jerez <currojerez@riseup.net>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10902>
Fixes perf regression introduced from tileY LID order for CS
shaders that access both textures and buffers. Walks LIDs in
X-major fashion, but with blocks of height 4. This maps LIDs per
HW thread for SIMD8/16/32 as (2x4/4x4/8x4), which is always good
for tileY resources and usually good for linear resources.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10733>
Computer shaders that access tileY resources (textures) benefit
from Y-locality accesses. Easiest way to implement this is walk
local ids in Y-major fashion, instead of X-major fashion. Y-major
local ids will reduce partial writes and increase cache locality
for tileY accesses since tileY resources cachelines progress in
Y direction.
Improves performance on TGL:
Borderlands3.dxvk-g2 +1.5%
Y-major can introduce a performance drop on CS that use mixture
of buffers and images. This should be fixed in next commit.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10733>
I forgot that after rebasing on large_consts support that this is now
called after the first time nir_lower_wrmask is called and can generate
partial writemasks that need to be lowered. While we're here, also call
the main optimization loop if things are lowered to scratch because it
generates address arithmetic that may need to be cleaned up.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10922>
There's a bug in ATEST (and LD_TILE?) handling affecting v6, manifesting
as gpu sched timeouts in dmesg and ultimately as random dEQP flakes.
Until this can be sorted out and the G72 jobs are proven to be stable,
we can't have them in the critical merge path.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10923>
I've been meaning to make this change for a while, but I've been holding
out in case dEQP-GLES2 on T860/Bifrost catches a driver issue missed by
dEQP-GLES3 and also missed by T720's dEQP-GLES2 job. Given that hasn't
happened (indeed, dEQP-GLES3 is an effective super set of dEQP-GLES2),
let's save the cycles and skip the redundant jobs.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10923>