vl_rbsp_fillbits may fill less than 32 bits if it removes emulation
prevention bytes, but will fill at least 16 bits. We need to call it
twice when reading more than 16 bits.
This fixes parsing H264 SPS packed header in va frontend when emulation
prevention bytes are at position where 32 bit values are read.
Cc: mesa-stable
Reviewed-by: Leo Liu <leo.liu@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26276>
On Steam Deck, shaders are pre-compiled for better performance (less
stuttering, less CPU usage, etc). But when a compiler fix needs to be
backported, there is currently no way to handle this properly.
This introduces 3 drirc options
radv_override_{graphics,compute,ray_tracing}_shader_version in order to
force the driver to re-compile pipelines when needed. By default, the
shader version is 0 for all pipelines.
When one drirc is set for a specific game, RADV will re-compile all
pipelines only once with the compiler fix included (because the
pipeline key would be different).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26094>
util_fast_urem32 is used in the hot path of hashmap lookups and this
asserts causes noticeable overhead. The correctness of this code should
be well exercised both from testing and mathematical proofs, so gate
this assertion behind #ifdef DEBUG.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14168>
An "augmented tree" is a tree with extra data attached which flows from
the leaves to the root. An "interval tree" is a datastructure of
(potentially-overlapping) intervals where, in addition to inserting and
removing intervals, we can quickly lookup all the intervals which
overlap a given interval.
After describing red-black trees, CLRS explains how it's possible to
implement an interval tree using an augmented red-black tree where the
nodes are ordered by interval start and each node also stores the
maximum interval end for its entire subtree.
Implement the interval tree extension described by CLRS. Iterating over
all overlapping intervals is actually an exercise, so we have to solve
the exercise. The recursive solution has been re-written to use the
parent pointers to avoid needing a stack, similarly to rb_tree_first()
and rb_node_next().
For now, we only implement unsigned intervals, but the core algorithms
are all abstracted to allow other types. There's still some boilerplate,
but it's the best that can be done in C.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22071>
Another case of a game assuming XeSS is available since an
Intel ARC GPU is discovered by the game's executable binary.
With this, a warning will appear that GPU is unstable/not supported,
but a warning is preferable over the game crashing.
No other issues observed upon starting & playing the game.
Signed-off-by: Casey Bowman <casey.g.bowman@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25965>
gcc is able to optimize away either the modulo or the logical and. This
makes no difference to gcc.
clang is only able to optimize away the logical and. This allows clang
to generate faster code for BITFIELD_MASK.
As for BITFIELD64_MASK, this also makes no difference to clang except it
fixes a compile error for BITFIELD64_MASK(64):
error: shift count >= width of type [-Werror,-Wshift-count-overflow]
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9989
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25757>
this patch calls the init and finish functions of the vk
runtime astc decoder. initializes emulate_astc flag. sets
up the additional plane to store decoded texture.
v2: fix _tex_dataformat() and _tex_numformat() (Chia-I Wu)
use correct function for bufferToImage (Chia-I Wu)
v3: add radv_is_layout_emulated() (Chia-I Wu)
avoid repeated pattern (Chia-I Wu)
v4: not create all pipelines on_demand (Chia-I Wu)
v5: current code does not support astc hdr (Chia-I Wu)
v6: keep luts in staging buffer only (Chia-I Wu)
v7: use 2DArray for both input and output
v8: document todo to use fp16 (Chia-I Wu)
not required to move meta init anymore (Chia-I Wu)
move astc_emulation_format to vk_texcompress_astc.h (Chia-I Wu)
v9: remove LAYOUT check (Chia-I Wu)
check on iview->vk.view_format
move setting tiled flags for astc (Chia-I Wu)
remove is format emulated check in radv_is_storage_image* (Chia-I Wu)
use LAYOUT_ASTC for if check (Chia-I Wu)
no 1D support (Chia-I Wu)
calculate start end offset in 2x blk size
v10: remove old wrong code (Chia-I Wu)
v11: use existing defined local format variable (Chia-I Wu)
dst image layout is always VK_IMAGE_LAYOUT_GENERAL (Chia-I Wu)
Reviewed-by: Chia-I Wu <olvaffe@gmail.com>
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24672>
We convert from doubles to half by going through float in between, but
as noted in the comment in this commit, that can give wrong results in
some cases.
Add some helpers to ensure correct results based on rounding mode that
will be used in the next commit.
v2: Use fi/di from u_math.h (Ian)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25281>
In the linear allocator, when a size larger than the minimum
buffer size is allocated, we currently create the new buffer
to fit exactly the requested size.
In that case, don't bother updating the `latest` pointer, since
this newly created buffer is already full. In the worst case,
the current `latest` is full and it would be the same; in the
best case, there's still room for small allocations, so we avoid
wasting space if the next allocations would fit.
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25517>