Commit Graph

65612 Commits

Author SHA1 Message Date
Ruijing Dong 8cd53d95fe radesonsi/vcn: update vcn4 tile processing logic
Vcn4 tile number calculation doesn't consider
some input case, which could result in output
bitstream corruption, this fixed the issue.

Not using the number_of_tiles directly but from
calculation. Also change to re-use some macros
from local vcn_5_0.c to ac_vcn_enc.h header file.

Updated vcn4 spec_misc_av1 ip package.

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29556>
2024-06-10 13:12:20 +00:00
Ruijing Dong 53f6cf29e9 radeonsi/vcn: remove tile_config_flag
The tile config structure will be used in vcn4 as well.

Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Signed-off-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29556>
2024-06-10 13:12:20 +00:00
David Rosca 0d21aa4a08 frontends/va: Fix crash in vaRenderPicture when decoder is NULL
Fixes: d1b794685f ("frontends/va: Send all bitstream buffers to driver at once")
Reviewed-by: Sil Vilerino <sivileri@microsoft.com>
Reviewed-by: Boyuan Zhang <Boyuan.Zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29599>
2024-06-10 12:58:18 +00:00
Eric Engestrom bbe9cf47bf nvk+zink/ci: consider all the double tests in spec@glsl-4.00@execution@built-in-functions to be flaky
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29636>
2024-06-10 01:10:46 +02:00
Eric Engestrom c2dc60751b nvk+zink/ci: add flakes seen in nightly pipeline
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29635>
2024-06-09 22:51:37 +00:00
Eric Engestrom 9f4c0d2a71 nvk+zink/ci: mark KHR-GL46.sparse_texture2_tests.SparseTexture2* as fixed
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29635>
2024-06-09 22:51:37 +00:00
Eric Engestrom 04c939113f turnip+zink/ci: mark a dEQP-GLES(2|3).functional.rasterization.(fbo|primitives).line_(strip_|)wide as fixed
Fixes: 07fa635f11 ("gallium/u_blitter: add option to override fragment shader for util_blitter_blit")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29633>
2024-06-09 21:04:18 +00:00
Pavel Ondračka ae06e018fa r300: fix RC_OMOD_DIV_2 modifier
Backend only recognizes the MUL a*0.5 pattern if there is an immediate
as one of the sources, however by then the source would we already
coverted to RC_FILE_NONE with constant half swizzles. So teach
peephole_omod to recognize this pattern as well.

RV530:
total instructions in shared programs: 128860 -> 128750 (-0.09%)
instructions in affected programs: 11942 -> 11832 (-0.92%)
helped: 106
HURT: 17
total presub in shared programs: 8739 -> 8736 (-0.03%)
presub in affected programs: 32 -> 29 (-9.38%)
helped: 3
HURT: 0
total omod in shared programs: 427 -> 1212 (183.84%)
omod in affected programs: 38 -> 823 (2065.79%)
helped: 0
HURT: 160
total temps in shared programs: 17544 -> 17554 (0.06%)
temps in affected programs: 70 -> 80 (14.29%)
helped: 0
HURT: 10
total lits in shared programs: 3153 -> 3159 (0.19%)
lits in affected programs: 9 -> 15 (66.67%)
helped: 0
HURT: 6
total cycles in shared programs: 191334 -> 191253 (-0.04%)
cycles in affected programs: 21240 -> 21159 (-0.38%)
helped: 101
HURT: 27

Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28784>
2024-06-09 09:07:32 +02:00
Pavel Ondračka d94d2a05b2 r300: fix for ouput modifier and DDX/DDX
Empirical testing shows that output modifiers are not working if the
DDX/DDY is writing directly to output.

Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28784>
2024-06-09 09:07:32 +02:00
Pavel Ondračka 472c64c90e r300: fix writemask rewrite when converting to omod
Consider the following case:
  0: MUL temp[1].y, input[0]._x__, input[1]._y__;
  1: MOV temp[1].x, input[0].x___;
  2: MOV temp[1].z, const[0].__x_;
  3: MUL temp[2].xyz, const[1].xxx_, temp[1].yxz_;
  ...

We correctly recognize that we can convert mul into omod for all three
instructions, however the mul swizzle was not handled correctly:
  0: MUL temp[2].y / 2, input[0]._x__, input[1]._y__;
  1: MOV temp[2].x / 2, input[0].x___;
  2: MOV temp[2].z / 2, const[0].__x_;
  ...

Just create the conversion swizzle from the initial mul swizzle when rewriting
the original instruction writemasks.

Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28784>
2024-06-09 09:07:31 +02:00
Pavel Ondračka 32cc2c2812 r300: fix cycles counting for KIL
We add a cycles penalty when we see a begin tex and than subtract from
it based on when first alu comes that needs the results. However if the
only instruction in the TEX block is just KIL, we don't have to add any
penalty as nothing waits for it.

Signed-off-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28784>
2024-06-09 09:07:31 +02:00
Pavel Ondračka fcc97bd6c3 r300/ci: fails list update
shaders@glsl-bug-110796 skips since https://gitlab.freedesktop.org/mesa/piglit/-/merge_requests/921

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29626>
2024-06-09 08:04:59 +02:00
Karol Herbst 5d013da038 rusticl/memory: copies might overlap for host ptrs
We can't really gurantee there is no overlap, because applications might
pass in arbitrary host pointers.

Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29604>
2024-06-08 17:03:31 +00:00
Karol Herbst e522c91d5c rusticl/spirv: do not pass a NULL pointer to slice::from_raw_parts
Fixes: e8de580998 ("rusticl/kernel: basic implementation")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29604>
2024-06-08 17:03:31 +00:00
Eric Engestrom cb30b266ca ci/deqp: uprev gl & gles cts
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29602>
2024-06-08 08:19:47 +00:00
Eric Engestrom c02329ded1 ci: set a common B2C_JOB_SUCCESS_REGEX with the message that's printed for all jobs
Simpler code, and more reliable against serial corruption because that
message is printed 4 times (vs only once for the other ones).

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29608>
2024-06-08 07:16:27 +00:00
Marek Olšák dc113c418d ac/nir: import the dispatch logic for the universal compute clear/blit shader
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:11 +00:00
Marek Olšák 6b15e45908 ac/nir: import the universal compute clear/blit shader
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:11 +00:00
Marek Olšák 1becc6953c ac/nir: import the MSAA resolving pixel shader from radeonsi
It has a lot of options for efficiency.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:11 +00:00
Marek Olšák f96bbb64d6 radeonsi: add decision code to select when to use compute blit for performance
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:11 +00:00
Marek Olšák 3424e16ece radeonsi: add decision code to select when to use CB_RESOLVE for performance
The answer is "almost never".

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:11 +00:00
Marek Olšák c5641387f3 radeonsi: add a new blit microbenchmark
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:11 +00:00
Marek Olšák 0c545e2fca radeonsi: add fail_if_slow parameter into si_msaa_resolve_blit_via_CB
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:11 +00:00
Marek Olšák 77d81fb8b0 radeonsi: add a custom MSAA resolving pixel shader
This is faster for 8 samples because it forms a VMEM clause, unlike
the default shader.

It also uses 16-bit types in the shader when possible and averages fewer
components if the format has less than 4.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:11 +00:00
Marek Olšák 21e90d9c6e radeonsi: clear color buffers via compute for special tiling cases
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:11 +00:00
Marek Olšák 2a0b9839ca radeonsi: add use_aco into CS blit shader key
it will be set in a future commit

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:11 +00:00
Marek Olšák fe7a4ed708 radeonsi: use shader_info::use_aco_amd to determine whether to use ACO
It's set by si_nir_scan_shader, so we need to use it after that.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:11 +00:00
Marek Olšák c83225cd0a radeonsi: print the compute shader blit key for AMD_DEBUG
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:11 +00:00
Marek Olšák d62ad0da5f radeonsi: use MIMG A16 (16-bit image coordinates) in compute blits
This reduces VGPR usage for MSAA blits and blitting multiple pixels per
lane.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:11 +00:00
Marek Olšák d6c96024a8 radeonsi: extend NIR compute helpers to allow returning 16-bit results
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:11 +00:00
Marek Olšák 5b3e1a0532 radeonsi: change the compute blit to clear/blit multiple pixels per lane
The target is 8-16B per lane regardless of the format and number of
samples. This is needed to fully utilize the memory bandwidth instead
of only a small fraction of it. These are optimal numbers identified by
benchmarking.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:11 +00:00
Marek Olšák d4c066abaf radeonsi: adds flags parameter into si_compute_blit to replace fail_if_slow
So that we can also specify sync flags.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:11 +00:00
Marek Olšák 30af861bff radeonsi: restructure (rewrite) the compute blit shader
This merges the separate MSAA, downsampling, upsampling, and non-MSAA blocks.
It's not meant to change behavior, but some change are necessary:
- disallow 16 samples
- loads only load the number of components that we need
- optimizations barriers are placed optimally and include the sample index
  in the same vector as the coordinates, so that LLVM is forced to form VMEM
  clauses for loads and stores
- the shader queries the descriptor for the dst image manually and passes
  it to the image store instead of the image variable (this is needed to get
  latency hiding for scalar loads in the presence of optimization barriers)

This is a prerequisite for blitting multiple pixels per lane.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:11 +00:00
Marek Olšák d2ce5fc07a radeonsi: split xy_clamp_to_edge to separate X and Y flags for the compute blit
to generate less shader code if only one of the axes needs clamping.

Use util_is_box_out_of_bounds instead of doing it manually.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:11 +00:00
Marek Olšák 7ee936bf65 radeonsi: convert the compute blit shader hash table to u64 keys
32 bits is not enough anymore. We'll add more.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:11 +00:00
Marek Olšák 40bcb588dd radeonsi: remove the old si_compute_copy_image
It's replaced by the compute blit.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:11 +00:00
Marek Olšák b0c0cca3a7 radeonsi: switch the old compute image copy to the new one using the blit
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:10 +00:00
Marek Olšák f3a59fe216 radeonsi: add a new version of si_compute_copy_image using the compute blit
It's faster and handles more stuff.

This is mostly the same code as the old version, but it calls
si_compute_blit at the end.

A later commit will remove the old version, so that there is no code
duplication.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:10 +00:00
Marek Olšák b7389615c6 radeonsi: rename si_compute_copy_image -> si_compute_copy_image_old
It will be replaced in several stages.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:10 +00:00
Marek Olšák 8b030ac588 radeonsi: rename si_compute_blit "testing" parameter to "fail_if_slow"
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:10 +00:00
Marek Olšák a4602395d2 radeonsi: switch compute image clears to the compute blit shader
The compute blit shader is faster and handles more stuff.
This removes the old clear_render_target shader.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:10 +00:00
Marek Olšák 9915289bdf radeonsi: extend the compute blit to do image clears as well
The compute blit is faster and handles more stuff than
the clear_render_target shader. We can just pass a clear value to it
to replace the source image.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:10 +00:00
Marek Olšák e41887c6a4 radeonsi: cosmetic and robustness changes for the compute blit
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:10 +00:00
Marek Olšák 0c5d727a5e radeonsi: document better how X/Y flipping in the compute blit works
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:10 +00:00
Marek Olšák bb86366fee radeonsi/gfx11: enable MSAA image stores in the compute blit
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:10 +00:00
Marek Olšák 5897dde3f7 radeonsi: don't fail due to DCC when using the compute blit on compute queues
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:10 +00:00
Marek Olšák fcd9f0069f radeonsi: don't use si_can_use_compute_blit in the compute blit
It makes supporting compute queues on all chips more complicated.
Other uses of si_can_use_compute_blit will be removed, so the function
will be removed too.

Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:10 +00:00
Marek Olšák 1b924bad5e radeonsi: reject unsupported parameters as the first thing in the compute blit
Reviewed-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:10 +00:00
Marek Olšák 993c30af06 radeonsi: fix sample0_only for the compute blit
Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:10 +00:00
Marek Olšák 0ca93e8090 radeonsi: optimize unaligned compute blits
If a blit starts on a coordinate that is not at the beginning of a tile
(e.g. 8x8), launch extra threads before 0,0,0 to make all following blocks
start at the beginning of such tiles. This makes such blits faster.

Acked-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28917>
2024-06-08 05:48:10 +00:00