We were not initializing the PS invocation count to zero before
computing the sum of the per-thread results.
This fixes an issue where querying the result of the query more
than once would cause the result to grow larger each time.
Signed-off-by: Brian Paul <brianp@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22281>
there are two cases to be handled here:
* normal
* software
the latter case requires env vars based on the frontend, and if a sw
device isn't found then init should fail
the former case should (in theory) just yolo the first device and assume
that's what the user wanted based on whatever env vars and layers are
in use
fixes#7508, #7132
maybe also affects #8152
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22184>
Note: The meaning of clockwise vs counter-clockwise changes after the
yz flip, therefore the determination of winding needs to be done before
the yz flip logic. Therefore the yz flip is moved to the GS and applied
as a lowering on top of the base GS.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22277>
For instance, with "piglit/bin/arb_shader_image_load_store-host-mem-barrier --quick -auto -fbo":
==18549==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x61200000a059 at pc 0x7f65d8937b80 bp 0x7fff6ed19a00 sp 0x7fff6ed199f8
READ of size 1 at 0x61200000a059 thread T0
#0 0x7f65d8937b7f in evergreen_set_shader_images ../src/gallium/drivers/r600/evergreen_state.c:4277
#1 0x7f65d6b471b8 in st_bind_images ../src/mesa/state_tracker/st_atom_image.c:172
#2 0x7f65d6b76b26 in st_validate_state ../src/mesa/state_tracker/st_util.h:129
#3 0x7f65d6b76b26 in prepare_draw ../src/mesa/state_tracker/st_draw.c:88
#4 0x7f65d6b77c8a in st_draw_gallium ../src/mesa/state_tracker/st_draw.c:141
#5 0x7f65d72698a2 in _mesa_draw_arrays ../src/mesa/main/draw.c:1202
Fixes: a6b3792843 ("r600: add core pieces of image support.")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22273>
When using multiple binaries, we don't know the required number of VGPRs beforehand,
which means we either have to over-allocate VGPRs or avoid shared VGPRs.
As bpermute is the only instructions needing shared VGPRs, we decide for the latter.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22267>
This was the last missing feature for GPL. The main problem is that
the on-disk shaders cache size will increase a lot because we don't
deduplicate shaders but there is on-going work to improve that.
We also can't use the shaders cache for libraries created with the
RETAIN_LINK_TIME_OPTIMIZATION flag and module identifiers because we
don't know the SPIR-V and thus can't retain NIR shaders for linking.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22264>
This is to generate a different key for a library created with
FRAGMENT_SHADER_BIT and no FS (ie. it would generate a noop FS) and
a library created with FRAGMENT_OUTPUT_INTERFACE with no CB attachments.
Otherwise, the same key would be generated and this would corrupt
the cache.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22264>
With GPL, a stage can be imported from a library which means that the
binary is NULL (it's freed right after compilation) but the shader is
non-NULL. To avoid crashing, rely on non-NULL binaries because this
implies that the shader is non-NULL as well.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22264>
compute is a special case because the zink_shader itself is created
in a thread, which means it cannot be accessed directly at bind time
since it may not have finished creating itself yet
to avoid prematurely waiting on an async fence, the one value needed
at bind time can instead be stored separately
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22266>
nir_shader objects are hefty, and they really add up when there's a lot
of them. there's also not much use in keeping them around, as any time
they'll be used, they're always cloned first, and deserializing isn't
likely to be any slower than a clone
cuts driver memory usage by ~40% for tomb raider
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22266>