The loop is supposed to execute exactly once, but the previous logic
inadvertently executes 0 or 1 times depending on whether dst is NULL (it never
is). Reexpress the loop to execute exactly once, eliminating the unnecessary
branch in this hot path. Noticed when reading the NIR of generated pack code.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25498>
This new entrypoint takes in a SPIR-V blob and generates a header containing
a static inline nir_builder-family function for each function in the SPIR-V
library. The generated function will look for the function in the shader and, if
not found, insert a new nir_function with the appropriate signature -- to be
linked with the library later. Then, it will call the function, with the
appropriate gymnastics to handle return values as necessary.
This makes it super convenient to wrap CL libraries for use in a NIR pass.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25498>
This has been confusing for some time, as from a xml file with the
suffix v33 (so suggesting just one version) we were generating the
headers for v33, v40, v41 and v71.
So now there is a header for the vc4 driver, and one header for the
v3d/v3dv (so v3d "platform") drivers.
FWIW, this means that now the name of the original xml and the header
files generated doesn't maintain a so similar pattern, but again the
equivalence were not there anyway.
Acked-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25851>
Fixes performance regression introduced by prior refactoring of
pipe control code that unnecessarily added CS_FLUSH to query start
and end. Issue was diagnosed by Ben L (thank you!)
Confirmed this restores performance on:
* Borderlands3 +2%
* Payday +3%
* Factorio +3%
* HogwartsLegacy +4%
* Ghostrunner +7%
Fixes: 6dc95685 (convert genX_query pipe controls to use pc helper)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25983>
A somewhat random collection of fossils:
N Min Max Median Avg Stddev
x 6 16.59 16.61 16.605 16.603333 0.0081649658
+ 6 15.99 16 16 15.998333 0.0040824829
Difference at 95.0% confidence
-0.605 +/- 0.00830327
-3.64385% +/- 0.0485573%
(Student's t, pooled s = 0.00645497)
I'm not sure if nir_opt_if and nir_opt_loop_unroll are actually idempotent
or not.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24197>
For example, in the loop:
while (more_late_algebraic) {
more_late_algebraic = false;
NIR_PASS(more_late_algebraic, nir, nir_opt_algebraic_late);
NIR_PASS(_, nir, nir_opt_constant_folding);
NIR_PASS(_, nir, nir_copy_prop);
NIR_PASS(_, nir, nir_opt_dce);
NIR_PASS(_, nir, nir_opt_cse);
}
if nir_opt_algebraic_late makes no progress, later passes might be
skippable depending on which ones made progress in the previous iteration.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24197>
When the pitch or slice pitch isn't properly aligned,
the SDMA HW is unable to copy between tiled images and buffers.
To work around this, we process the image chunk by chunk,
copying the data to a temporary buffer which uses supported
pitches, and then copy it to the intended destination.
The implementation assumes that at least one pixel row of the
image will fit into the temporary buffer, and will try to copy
as many rows at once as possible. Sadly, this still results in
a lot of packets being generated for large images.
A possibe future improvement is to copy the image slice by slice
when only the slice pitch is misaligned. However, that is out
of scope for this commit.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25831>
Some copy operations are poorly supported by the SDMA hardware,
meaning that the built-in packets don't support them, so we will
need to work around that by copying to and from a temporary BO.
The size of the temporary buffer was chosen so that it can fit
at least one full pixel row of the largest possible image.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25831>
Previously, RADV only had a simple implementation of
image to buffer copies using the SDMA for the PRIME copy.
This commit replaces that with a full-featured implementation
that includes buffer to image and image to buffer copies and
removes the assumptions that the PRIME copy had, as well as
adds new helper functions which will be shared with other copy
functions in upcoming commits.
Unaligned buffer/image copies require a workaround, which
will be implemented by a future commit.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25831>
This register seems needed to enable compute shader shader invocations
on GFX7. On GFX8+ it's working fine without emitting this register but
I think it doesn't hurt.
This fixes dEQP-VK.query_pool.statistics_query.*_cq on GFX7.
Fixes: a9945216ba ("radv: fix COMPUTE_SHADER_INVOCATIONS query on compute queue")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25957>
The following sequence is valid (although weird) but many other drivers
(including RADV) were broken:
- bind pipeline with some static state
- set state command for that static state (to a bad value)
- bind the same pipeline again
- draw
Fixes new dEQP-VK.dynamic_state.*.double_static_bind.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25954>
Another case of a game assuming XeSS is available since an
Intel ARC GPU is discovered by the game's executable binary.
With this, a warning will appear that GPU is unstable/not supported,
but a warning is preferable over the game crashing.
No other issues observed upon starting & playing the game.
Signed-off-by: Casey Bowman <casey.g.bowman@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25965>
Removes the link-time dependency on tgsi_get_gl_varying_semantic from
Gallium auxiliary.
ps_prim_id_input linkage removed due to redundancy - the SPI SID is
calculated for VARYING_SLOT_PRIMITIVE_ID on both sides.
Signed-off-by: Vitaliy Triang3l Kuzmin <triang3l@yandex.ru>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25695>