This patch exposes shader hashes (computes and draws) to Perfetto and
utrace. By including these hashes in traces, developers can correlate
compute and draw calls with their assoicated ASM dumps when analyzing
the traces.
To achieve this, intel_tracepoint.py has been reworked to preprocess
tracepoint arguments dynamically. Any argument containing "hash" in its
variable name is now forrmated as hexadecimal before being passed to the
tracepoint definition.
Signed-off-by: Michael <michael.cheng@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32708>
Was causing trouble in some build configurations, we don't really need
them. Unless there's a good reason, defaults to use ralloc for
consistency with the larger codebase.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Antonio Ospite <None>
Reviewed-by: Kenneth Graunke <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32916>
This message has been confusing users, especially now that
popular toolkits such as Gtk started using a Vulkan renderer.
Printing a message on non-conformant implementations is also
actually not required. So let's remove it.
We haven't fully finished the GFX12 implementation yet, but on
all other hardware, RADV should work just fine, and is definitely
not meant for "testing use only".
Cc: mesa-stable
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12314
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32930>
DONTBLOCK is sort of almost good enough except that the api frontend
can also use this and it can't use the full power of Trust Me Buddy™
that qbo maps require
this causes unnecessary ioctl syncs, which annihilates perf in games
that constantly check query results
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31823>
this allows a return without checking syncobj, avoiding overhead,
but when a query still isn't completing after multiple checks then
try checking the pool directly
this circumvents the usual qbo mechanism in specific cases (e.g., Everspace)
where an app fires off a million timestamp queries and the overhead of
checking a timeline semaphore kills perf
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31823>
If there is a 4-byte hole between 2 loads, they are vectorized. Example:
load 4 + hole 4 + load 8 -> load 16
This helps GLSL uniform loads, which are often sparse. See the code for more
info.
RADV could get better code by vectorizing later.
radeonsi+ACO - TOTALS FROM AFFECTED SHADERS (45482/58355)
Spilled SGPRs: 841 -> 747 (-11.18 %)
Code Size: 67552396 -> 65291092 (-3.35 %) bytes
Max Waves: 714439 -> 714520 (0.01 %)
This should have no effect on LLVM because ac_build_buffer_load scalarizes
SMEM, but it's improved for some reason:
radeonsi+LLVM - TOTALS FROM AFFECTED SHADERS (4673/58355)
Spilled SGPRs: 1450 -> 1282 (-11.59 %)
Spilled VGPRs: 106 -> 107 (0.94 %)
Scratch size: 101 -> 102 (0.99 %) dwords per thread
Code Size: 14994624 -> 14956316 (-0.26 %) bytes
Max Waves: 66679 -> 66735 (0.08 %)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29399>
There is nothing preventing ACO from generating loads with unused
components. This happens often with GLSL uniforms. Some of those loads
are partially re-vectorized after this.
radeonsi+ACO:
TOTALS FROM AFFECTED SHADERS (19564/58918)
VGPRs: 732900 -> 728448 (-0.61 %)
Spilled SGPRs: 429 -> 433 (0.93 %)
Code Size: 38446004 -> 38485612 (0.10 %) bytes
Max Waves: 305440 -> 305549 (0.04 %)
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29399>
Walk the whole vertex stride thanks to XFB info sorted by offset, gather
individual components from same or different outputs, and once we have
gathered 4, store them as vec4.
It also removes the memory_modes field from VMEM stores because I don't
think it's needed.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32686>
Walk the whole vertex stride thanks to XFB info sorted by offset, gather
individual components from same or different outputs, and once we have
gathered 4, store them as vec4.
It also removes the COHERENT flag from VMEM stores because NGG streamout
doesn't use it either and I don't think it's needed.
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32686>
The mesa crate should just provide the means of creating those, but the
logic of what to create shouldn't be there.
The passed in arguments also heavily vary, and this way we can be explicit
about what variants needs what inputs.
Reviewed-by: @LingMan
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32903>
This allows us to use more contextual information of the image to create
the pipe_sampler_view properly instead of passing all the required
properties via function arguments.
Reviewed-by: @LingMan
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32903>