Don't rely on the current default (which is 2048 bytes) buffer size for
blocks -- which ends up being too small for most shaders. Since we
already rely on value_id_bound to allocate an array of vtn_value, use
that to estimate a better value.
In addition to space for the array, we approximate the extra size of
extra data structures with the size of vtn_ssa_value, and skip it to the
next size (double it) to cover the CFG related allocations. This
results in only single system allocation necessary to back the temporary
data for the majority of the shaders.
Parsing code was slightly reordered so we can validate and read the
value_id_bound before the temporary allocator is created.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25279>
All the vtn_* structures and arrays are used only during the lifetime of
spirv_to_nir(); we don't need to free them individually nor steal
them out; and some of them are smaller than the 5-pointer header
required for ralloc allocations.
These properties make them a good candidate for using an
arena-style allocation.
Change the code to create a linear_parent and use that for all the vtn_*
allocation. Note that NIR data structures still go through ralloc,
since we steal them (through the nir_shader) at the end, i.e. they
outlive the parsing.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25279>
Fixes the build on OpenBSD, where major() is in sys/types and
sys/sysmacros.h does not exist. Also include sys/mkdev.h if
MAJOR_IN_MKDEV is defined.
Fixes: 6d60115be7 ("zink: Fix enumerate devices when running compositor")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26735>
Yet another iteration on the same YUV surfaces.
The change from 87ecfdfbf0 has 2 odd things:
* it's using MAX2(original value, new value) but the point of updating
surf_slice_size / surf_size is to make it correct relative to the new
value of surf_pitch
* it's multiplying surf_pitch (= number of elements per row) by height (ok)
by surf->bpe (= number of bytes per element) by surf->blk_w (= number of
"horizontal" pixels in an element) so the end unit doesn't make sense.
Fix this by computing a reasonnable value based on unit: the surf_slice_size
is the number of elements per row (surf_pitch) x number of bytes per element
(bpe) x number of rows.
This makes the expected size correct and thus fixes users of eglCreateImageKHR,
like the issue #6131.
I tested a bunch of gst pipelines and ffmpeg scripts on various files I have
and didn't notice any issues (on gfx10.3 and gfx9).
Fixes: 87ecfdfbf0 ("ac/surface: adapt surf_size when modifying surf_pitch")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6131
Acked-by: Marek Olšák <marek.olsak@amd.com>
Tested-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26693>
While running tests, it is be useful to have non-sequenced dumps of
certain buffers to see their contents from changes in the decompiled
CS. This introduces a function gpu_read_into_file(...) for specifying
a file to read a specific GPU buffer into after replaying the CS.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26465>
In newer traces, in any cases where instructions need to be executed
for both cases of a predicate, such as for GMEM/sysmem. The proprietary
driver emits the TRUE and FALSE body one after another with a NOP at
the end of the TRUE condition body so the CP skips over the FALSE body.
Currently, the NOP skips over all instructions in the ELSE body which
results in them not being decoded whatsoever. This commit checks if
we encounter any NOPs while in a conditional block and appropriately
parses out them out into their own ELSE scope when we do.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26465>
Due to the larger amount of conditional execution in newer traces
the flat view makes it hard to parse what might be conditionally
executed and what might now. This makes it easier to view by adding
a scope for conditionally executed commands which is indented to the
next level.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26465>
When the source destination index is a constant, we can avoid generating
a lot of the intermediate code. At the very least, this makes initial
NIR dumps much easier to read.
v2: Simplify tracking of dst_index. Suggested by Caio.
Suggested-by: Caio
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
Gfx12.5 (DG2) will use DPAS instructions to accelerate the
implementation. Earlier platforms will use equivalent discrete
instructions (basically subgroup operations). Gfx12 (Tigerlake) will use
DP4A for 8-bit integer matrix multiplication. Older platforms, which
lack DP4A, will use a suboptimal instruction sequence. There is plenty
of room for improvement here.
On DG2 (Gfx12.5) gets the following results from the CTS:
Test run totals:
Passed: 1642/13982 (11.7%)
Failed: 0/13982 (0.0%)
Not supported: 12340/13982 (88.3%)
Warnings: 0/13982 (0.0%)
Waived: 0/13982 (0.0%)
On DG2 (Gfx12.5) with forced lowering, Raptor Lake (Gfx12) and Ice Lake
(Gfx11):
Test run totals:
Passed: 1662/13982 (11.9%)
Failed: 0/13982 (0.0%)
Not supported: 12320/13982 (88.1%)
Warnings: 0/13982 (0.0%)
Waived: 0/13982 (0.0%)
The difference in the number of tests run is due to
saturatingAccumulation not being set on DG2 when DPAS is used. There is
a comment in "intel/dev: Advertise integer configs with
saturatingAccumulation too" that explains how this could be added should
the need arise.
v2: Prefix type names with INTEL_CMAT_. Suggested by Lionel.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
VUID-RuntimeSpirv-saturatingAccumulation-08983 says:
For OpCooperativeMatrixMulAddKHR, the SaturatingAccumulation
cooperative matrix operand must be present if and only if
VkCooperativeMatrixPropertiesKHR::saturatingAccumulation is VK_TRUE.
As a result, we have to advertise integer configs both with and without
this flag set.
v2: Prefix type names with INTEL_CMAT_. Suggested by Lionel.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
The commit is a little ugly. The definition of anv_fixup_subgroup_size
is moved before the added call site. In addition, the bit starting at
the "Cooperative matrix extension requires..." comment is added.
v2: Dramatic simplification of SIMD selection. Suggested by Caio.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
This flag tracks whether or not cooperative matrices are fully enabled
on the physica device (i.e., both the configs exist and the environment
varible is set). This is mainly to support a later commit "anv: Set
PIPELINE_SELECT systolic mode enable flag."
This could be squashed into "anv: Implement VK_KHR_cooperative_matrix."
I left it separate because we might go back to the previous method.
v3: Don't hide the extension behind an environment variable
(ANV_COOPERATIVE_MATRIX) now the we have a better solution for setting
PIPELINE_SELECT.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>