This is the most serious bug we've had in a long time due to a fundamental
misunderstanding of the hardware (due to incomplete reverse-engineering). It
caught me off guard.
The texture descriptor has "mode" bits which configure two aspects of how the
address pointer is interpreted:
* whether it is indirected, pointing to a secondary page table for sparse
* whether it writes texture access counters (for Metal's idea of sparse).
...Neither of these is a "null texture" mode.
So why did I see Apple's blob using a non-normal mode for null textures, and why
did I copy those settings?
1. Because the hardware texture access counters provide a cheap way to detect
null texture accesses after the fact, which I think their GPU debug tools
use. I'm not sure why release builds of the driver do/did that, but whatever.
2. Because I assumed Cupertino knew best and I didn't bother looking too close.
We can't use them here (without doing extra memory allocations), since then
the GPU will increment access counters. And since our null texture address used
to just be a pointer in the command buffer, that mean the GPU will trash
whatever memory happened to be 0x400 bytes after the start of the null texture
descriptor. The symptom being random faults.
This bug was caught when trying to use the zero-page instead, which raised a
permission fault when the GPU tried to write counts. Then I remembered the
sparse mechanism and had a bit of a eureka moment. Immediately followed by an
"oh, f#$&" moment as I realized how many random bugs could potentially be root
caused to this.
The fix is two-fold:
1. Use normal layout instead.
2. Set the address to the zero-page (which is a fixed VA) and detect null
textures by checking the address, instead of the mode.
The latter is a good idea anyway, but both parts needs to be done atomically to
maintain bisectability.
Backport-to: 25.1
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34703>
The src[2] value 0x1100 is set based on observed behavior of the blob driver,
though its exact meaning remains to be documented.
Passes all dEQP-GLES3.functional.shaders.texture_functions.texelfetch.*
tests on GC7000.
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34685>
Reduce unnecessary function calls, collect from iterators instead of
adding to mutable collections, and remove unnecessary trait bounds.
v2: split change to associated device check into new commit
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: @LingMan
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34167>
Shuffle some integer types to reduce the necessity of casting.
Additionally, clean up some unnecessary use of `Vec` to avoid
allocations.
v2: revert changes to `PipeContext` to avoid hiding truncation errors,
move change to fallible conversions to separate patch
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: @LingMan
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34167>
Rename `is_alligned()` util to match std naming and move it alongside
similar utilities which copy upstream functionality and annotate those
utilities with their first stable version where applicable.
Replace use of `mesa_rust_util::offset_of!()` in one case where nested
field support is not needed.
v2: revert removal of `offset_of!()` for nested field support
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: @LingMan
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34167>
Much of the kernel code implicitly depends on a maximum work dimension
of 3, and in practice, gallium will never give a grid dimension of less
than 3, constraining the value to a constant.
v2: use a const for max dimension instead of a `min` call
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: @LingMan
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34167>
Currently mesa-clc bundles OpenCL headers from Clang only if the static
LLVM is used (which means Clang / LLVM are not present on the target
system). In some cases (e.g. when building in OpenEmbedded environemnt)
it is desirable to have shared LLVM library, but skip installing the
whole Clang runtime just to compile shaders. Add an option that forces
OpenCL headers to be bundled with the mesa-clc binary.
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34551>
Note that this code is executed on the generic FDo gitlab runners, not
in our docker images. This change is merely to avoid the confusion that
lead to the code in the previous commit.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34684>
Note that if you grep, you'll find two instances of `wget`, but those are not
executed in our docker images, but on the generic FDo gitlab runners, so the
/etc/wgetrc file in our docker images cannot affect them.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34684>
Invalid for now, but used by vkd3d-proton, where the use case is to convert
a result matrix to lower precision, followed by a store.
For 16bit accumulation matrices, GFX11 only uses 16bits per 32bit register.
RADV's coop matrix code pads the unused space with undefs and uses a vector
with twice as many elements as the matrix length. Extending that to 8bit by
leaving 24 bits unused is unnecessary as these matrices as there
is no hw unit that requires it. And in wave32, it would also result in
vectors larger than NIR's limit.
So tightly pack 8bit matrices without any undef padding.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34382>
Since headless overrides create_mem, it needs to override finish_create
too. Fixes a segfault in nvk that was caused by us mixing
wsi_create_null_image_mem with wsi_finish_create_blit_context, which
would then call CmdCopyImageToBuffer with image->blit.buffer == NULL
Fixes a cts failure on nvk in:
dEQP-VK.image.swapchain_mutable.headless.2d.r8g8b8a8_unorm_b8g8r8a8_unorm_clear_copy_format_list
and several others
Fixes: 579578f10a ("vulkan/wsi/drm: Break create_prime_image in pieces")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34646>