If there are holes between color outputs (e.g. a shader exports MRT1,
but not MRT0), we can remove the holes by moving higher MRTs lower. The
hardware will remap the MRTs to their correct locations if we remove
holes in SPI_SHADER_COL_FORMAT but not CB_SHADER_MASK. This is good for
performance because the hardware will allocate less space for color
MRTs.
This also allows to remove even more unused color exports because we no
longer need to force previous targets to be non-zero. Only SotTR seems
affected from our fossils db.
fossils-db (NAVI21):
Totals from 859 (0.64% of 134913) affected shaders:
VGPRs: 24328 -> 24216 (-0.46%)
CodeSize: 1433276 -> 1422576 (-0.75%)
Instrs: 255275 -> 253728 (-0.61%)
Latency: 1666836 -> 1661544 (-0.32%)
InvThroughput: 346038 -> 343406 (-0.76%)
Copies: 16520 -> 16506 (-0.08%)
PreSGPRs: 25934 -> 25920 (-0.05%)
PreVGPRs: 19903 -> 19662 (-1.21%)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5786>
Jumps in the command streams, allowing us to chain ("link") command
buffers. Naming is from PowerVR, which contains an identical command.
PowerVR's has conditional jumps and function call support, it's likely
that AGX inherited this too but I haven't tested that. (Those might be
useful for conditional rendering and secondary command buffers
respectively?)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18421>
Now that we have fine grained state emit code, let's use it to reduce
driver overhead. Dirty tracking is delicate: while this seems to work,
I've also added an ASAHI_MESA_DEBUG=dirty option in debug builds
to disable the optimizations here for future debug.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18421>
Looking at PowerVR's PPP definitions in tree in Mesa
(src/imagination/csbgen/), we find that AGX's "tagged" data structures
are actually sequences of state items prefixed by a header specifying
which state follows. Rather than hardcoding the sequences in which Apple's
driver chooses to bundle state, we need the XML to be flexible enough to
encode or decode any valid combination of state. That means reworking
the XML. While doing so, we find a number of fields that are identical
between RGX and AGX, and fix the names while at it (for example, the W
Clamp floating point).
Names are from the PowerVR code in Mesa where sensible.
Once we've reworked the XML, we need to rework the decoder. Instead of
reading tags and printing the combined state packets, the decoder now
must unpack the header and print the individual state items specified by
the header, with slightly more complicated bounds checking.
Finally, state emission in the driver becomes much more flexible. To
prove the flexibility actually works, we now emit all PPP state (except for
viewport and scissor state) as a single PPP update. This works. After
this we can move onto more interesting arrangements of state for lower
driver overhead.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18421>
We found a perf regression with 9027c5df4c ("anv: remove the
LOCAL_MEM allocation bit") which seems to be that we over subscribe
local memory, leading i915 to swap things in/out too much.
This change avoid putting buffers in local memory if they are not
allocated from a DEVICE_LOCAL heap.
Maybe we can revisit this later if i915 is better able to deal with
more buffers in local memory.
v2: Remove implicit_css from anv_bo when not in lmem (Ivan)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 9027c5df4c ("anv: remove the LOCAL_MEM allocation bit")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7188
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18395>
The image is busy until xcb_put_image returns. This isn't a major worry
at the moment since we're doing the PutImage directly from
vkQueuePresent, but if we moved that to a worker thread the race window
would be a lot easier to hit.
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18214>
It is just a renamed VK_ARM_rasterization_order_attachment_access.
Zink depends on it to expose KHR_blend_equation_advanced_coherent
Passes GL tests via Zink:
dEQP-GLES31.functional.blend_equation_advanced.*
KHR-GLES31.core.blend_equation_advanced.*
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18420>
There are more magic regs which have different values between GPU
subgenerations than we specified.
The updated list and values where obtained by using libwrapfake
with v631 blob and dEQP-VK.draw.renderpass.basic_draw.draw.triangle_list.1
vk cts test.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18229>
A VK_ERROR_FORMAT_NOT_SUPPORTED error was being returned when setting up the
image view tex state for images with the VK_IMAGE_USAGE_STORAGE_BIT and/or
VK_IMAGE_USAGE_INPUT_ATTACHMENT_BIT bit(s) set due to missing handling in
pvr_pack_tex_state(). Resolve this by handling these cases, while taking the
opportunity to simplify how the tex type is determined when packing
TEXSTATE_IMAGE_WORD0.
It was also found that the depth field in TEXSTATE_IMAGE_WORD1 was being set up
incorrectly, as it was relying on the image depth being 0 for 1D and 2D images,
but the image depth will always be 1 in these cases.
Partial fix for dEQP-VK.image.qualifiers.volatile.cube.r32f. This now goes from
failing to seg faulting when VK_FORMAT_R32_SFLOAT is added to the format table,
as the test is now getting further.
Signed-off-by: Frank Binns <frank.binns@imgtec.com>
Reviewed-by: Rajnesh Kanwal <rajnesh.kanwal@imgtec.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18320>
The VC4 and Lima Piglit failures seems to mostly fall in two camps:
1. The hardware lacks sRGB support, but the drivers decide to expose it
nevertheless, with some varying level of emulation. This leads to some
failures, probably because we're missing sRGB decoding somewhere.
2. The spec@ext_texture_compression_s3tc@compressedteximage fails,
mostly due to the test not setting the mipfilter to nearest. With
that fixed, the test passes on VC4, but still fails on Lima due to an
a bit dodgy miplod bias in the driver.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Acked-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18180>