There were two issues:
1. The global_work_offset parameter is optional but we errored on NULL
2. We didn't return the reqd_work_group_size when set on the kernel.
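The first issue can be illustrated with a minimal C sketch (rusticl itself is written in Rust, so this is not the actual fix). It assumes the usual OpenCL convention that a NULL global_work_offset means an offset of zero in every dimension; resolve_global_offset and MAX_WORK_DIM are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_WORK_DIM 3 /* hypothetical constant for this sketch */

/* Copy the caller's offsets into out[], treating a NULL pointer as
 * "all offsets are zero" instead of reporting an error. Dimensions
 * beyond work_dim are also set to zero. */
static void
resolve_global_offset(unsigned work_dim, const size_t *global_work_offset,
                      size_t out[MAX_WORK_DIM])
{
   assert(work_dim >= 1 && work_dim <= MAX_WORK_DIM);

   for (unsigned i = 0; i < MAX_WORK_DIM; i++) {
      if (global_work_offset != NULL && i < work_dim)
         out[i] = global_work_offset[i];
      else
         out[i] = 0;
   }
}
```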
Fixes: 376d1e6667 ("rusticl: implement cl_khr_suggested_local_work_size")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38375>
There is no disable_screen_content_tools in the AV1 spec; this should
be seq_choose_screen_content_tools instead. But we don't need that
either, as we keep the effective value in force_screen_content_tools.
Same for seq_choose_integer_mv and force_integer_mv.
Also stop overriding these values and instead fix frame header coding
to work with all combinations.
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38260>
Before this commit, nested loops weren't counted correctly:
       -------------
       V           |
    -> A --> B --> C ->
             ^     |
             -------
A is both predecessor and successor of B but A isn't in B's loop.
Instead, a block B is in loop header H's loop if H is a successor
of B and H dominates B.
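The rule above amounts to a back-edge test. Below is a minimal sketch on the CFG from the diagram, assuming an extra entry and exit block and a hand-written immediate-dominator table. All names (BLK_*, idom, succ, dominates, is_back_edge) are hypothetical and not taken from the actual pass.

```c
#include <assert.h>
#include <stdbool.h>

enum { BLK_ENTRY, BLK_A, BLK_B, BLK_C, BLK_EXIT, NUM_BLOCKS };

/* Immediate dominator of each block; the entry block is its own idom. */
static const int idom[NUM_BLOCKS] = {
   [BLK_ENTRY] = BLK_ENTRY,
   [BLK_A] = BLK_ENTRY,
   [BLK_B] = BLK_A,
   [BLK_C] = BLK_B,
   [BLK_EXIT] = BLK_C,
};

/* CFG edges as an adjacency matrix: succ[from][to]. */
static const bool succ[NUM_BLOCKS][NUM_BLOCKS] = {
   [BLK_ENTRY] = { [BLK_A] = true },
   [BLK_A] = { [BLK_B] = true },
   [BLK_B] = { [BLK_C] = true },
   [BLK_C] = { [BLK_A] = true, [BLK_B] = true, [BLK_EXIT] = true },
};

/* Returns true if h dominates b; every block dominates itself. */
static bool
dominates(int h, int b)
{
   assert(h >= 0 && h < NUM_BLOCKS && b >= 0 && b < NUM_BLOCKS);

   for (;;) {
      if (b == h)
         return true;
      if (idom[b] == b) /* reached the entry block without meeting h */
         return false;
      b = idom[b];
   }
}

/* The edge b -> h closes a loop headed by h only if h is a successor
 * of b and h dominates b. */
static bool
is_back_edge(int b, int h)
{
   return succ[b][h] && dominates(h, b);
}
```

With this test, C -> A and C -> B are back edges, while A -> B is not: A is a predecessor of B, but B does not dominate A.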
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38393>
Instead of inserting the spill instruction before the instruction that
caused the spill, insert it either right after the definition or at
the end of the block that contains the definition.
This helps reduce code size and also moves STOREs outside of loops on
average.
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38238>
_math_matrix_is_dirty() should only be used to decide if we need to
run _math_matrix_analyse(). We already decided that we had a new
texture matrix when we called _mesa_update_texture_matrices(), so
we need to set _TexMatEnabled correctly; otherwise we might
incorrectly return _NEW_FF_VERT_PROGRAM | _NEW_FF_FRAG_PROGRAM in
the following if-statement.
Fixes: ec978e002f ("mesa: only update fixed-func programs on texture matrix enablement changes")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/14286
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38473>
This adds some new call operations to handle various parts of the
reductions.
cmat_reduce:
  The initial top-level operation from SPIR-V. After lowering, it is
  used for row/col operations on single hw-supported matrix sizes.
  On flexible dimensions the SPIR-V operation is lowered into multiple
  of these, but it can also be lowered into the others.
cmat_reduce_finish:
  After multiple reduction operations on a flexible-dimension matrix,
  there are often subsequent operations on the output matrices to
  finish the operation.
cmat_reduce_2x2:
  Takes 4 input matrices and 1 dst to do a 2x2 reduction op.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38389>
With coopmat2 a bunch of functions need a lot of lowering passes
to happen before they can be lowered, so mark them as to be lowered
later.
Drivers needing these should call nir_remove_non_cmat_call_entrypoints
where they currently remove entrypoints, and call the original
nir_remove_non_entrypoints after lowering coopmat2.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38389>
This extension seems to be supported on GC3000 (HALTI2) and later hardware.
While no explicit feature bit documents this capability, testing
confirms that the required vertex formats work correctly on these GPUs.
This patch adds the missing B10G10R10A2 vertex format variants
(UNORM, SNORM, USCALED, SSCALED), gates support behind the HALTI2
feature check, and updates features.txt to reflect the new capability.
All relevant piglit tests pass.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38446>
The VMA of VkDeviceMemory has to accommodate all the resources that can
be bound to it. For sparse images it's 64KiB alignment, for other
tiled images it's 4KiB. But we also have a workaround that requires a
64KiB alignment for Tile4 images.
The initial version of the slab allocator missed the 4KiB alignment.
This fix adds the workaround handling too.
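A minimal sketch of the alignment rule described above: take the largest alignment that any bindable resource could need. The struct and function names are hypothetical and do not come from the anv slab allocator. Only the 4KiB and 64KiB values are from this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ALIGN_4K  (4ull * 1024)
#define ALIGN_64K (64ull * 1024)

/* Hypothetical description of what may be bound to a memory object. */
struct bind_caps {
   bool sparse_images;    /* sparse images may be bound */
   bool tiled_images;     /* other tiled images may be bound */
   bool tile4_workaround; /* Tile4 images that need the workaround */
};

static uint64_t
max_u64(uint64_t a, uint64_t b)
{
   return a > b ? a : b;
}

/* Returns the alignment the VMA needs so that every resource that can
 * be bound to the memory object fits its own alignment requirement. */
static uint64_t
vma_alignment(uint64_t base_align, const struct bind_caps *caps)
{
   assert(base_align != 0);

   uint64_t align = base_align;
   if (caps->tiled_images)
      align = max_u64(align, ALIGN_4K);
   if (caps->sparse_images || caps->tile4_workaround)
      align = max_u64(align, ALIGN_64K);
   return align;
}
```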
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: dabb012423 ("anv: Implement anv_slab_bo and enable memory pool")
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38480>
We check fn_set_fbds_provoking_vertex_stride == 0 to determine whether a
previous function variant has already been allocated, so this value must
be initialized to zero before we start the loop. We could fix this by
explicitly initializing just that field, but I figure it's simpler and
safer to just zero-initialize the whole struct.
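The pattern can be sketched as follows. This is an illustration only, not the panvk code: the struct, the second field, and the loop body are hypothetical, and only the field name fn_set_fbds_provoking_vertex_stride comes from this message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical struct; only the stride field name is from the commit. */
struct fn_info {
   uint32_t fn_set_fbds_provoking_vertex_stride;
   uint32_t num_variants;
};

/* Records one stride for all variants. A stride of 0 means "no variant
 * has been allocated yet", so the struct must start out zeroed. */
static struct fn_info
alloc_variants(uint32_t variant_size, unsigned count)
{
   /* Zero-initialize every field, not just the stride. */
   struct fn_info info = { 0 };

   assert(variant_size != 0);

   for (unsigned i = 0; i < count; i++) {
      if (info.fn_set_fbds_provoking_vertex_stride == 0) {
         /* First variant: remember its size as the stride. */
         info.fn_set_fbds_provoking_vertex_stride = variant_size;
      } else {
         /* Later variants must match the recorded stride. */
         assert(info.fn_set_fbds_provoking_vertex_stride == variant_size);
      }
      info.num_variants++;
   }

   return info;
}
```

Without the `= { 0 }` initializer, the first comparison against zero would read an indeterminate value.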
Signed-off-by: Olivia Lee <olivia.lee@collabora.com>
Fixes: 885805560f ("panvk/csf: fix case where vk_meta is used before PROVOKING_VERTEX_MODE_LAST")
Reviewed-by: Eric R. Smith <eric.smith@collabora.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38458>
To be able to support multiple GPU architectures, we need to tread
carefully with HW defs. So let's limit the availability of the HW defs
to where they're needed. We do this by moving the HW def includes and
the helpers that query them to the end of the source files.
In the long run, we probably want something a bit more formal to get
access to HW-dependent values based on the hw-info. But there's some
work in progress to change how that works, so let's kick the can down
the road a bit on that part.
Reviewed-by: Ashish Chauhan <ashish.chauhan@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38423>