351 Commits

Author SHA1 Message Date
Daniel Schürmann
9b1a635bb3 amd/common: merge radv_nir_opt_access_speculate() into ac_nir_flag_smem_for_loads()
One shader is negatively affected, but we save 2 entire iterations over every shader.
This effect is also mitigated with the next commits.

Totals from 1 (0.00% of 79839) affected shaders: (Navi48)

Instrs: 947 -> 958 (+1.16%)
CodeSize: 4728 -> 4732 (+0.08%)
Latency: 20678 -> 20723 (+0.22%)
InvThroughput: 2697 -> 2698 (+0.04%)
SClause: 26 -> 27 (+3.85%)
Copies: 139 -> 145 (+4.32%)
Branches: 46 -> 47 (+2.17%)
VALU: 460 -> 463 (+0.65%)
SALU: 201 -> 204 (+1.49%)
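
The percentages in these reports read as relative changes against the "before" value. A minimal sketch of that arithmetic, assuming the report computes (after - before) / before * 100; the helper name `percent_change` is hypothetical and not part of Mesa or its statistics tooling:

```c
#include <assert.h>

/* Hypothetical helper: relative change from `before` to `after`, in percent.
 * Assumes `before` is non-zero. */
static double
percent_change(long before, long after)
{
   assert(before != 0);
   return 100.0 * (double)(after - before) / (double)before;
}
```

For the Instrs row above, `percent_change(947, 958)` comes out at roughly 1.16.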
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37843>
2025-10-14 16:33:12 +00:00
Daniel Schürmann
9553e56c67 radv: use load_global instead of load_global_amd for load_sample_positions_amd
For consistency.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37843>
2025-10-14 16:33:10 +00:00
Samuel Pitoiset
a47952d495 radv: upload and emit dynamic descriptors separately from push constants
Dynamic descriptors are rarely used, and this will allow more
optimizations for push constants, like gathering the size from the shaders
themselves instead of using the pipeline layout.

fossils-db (GFX1201):
Totals from 21740 (27.30% of 79646) affected shaders:
Instrs: 11186407 -> 11192061 (+0.05%); split: -0.05%, +0.10%
CodeSize: 59842068 -> 59864412 (+0.04%); split: -0.04%, +0.08%
Latency: 56333136 -> 56325208 (-0.01%); split: -0.03%, +0.02%
InvThroughput: 8576452 -> 8576516 (+0.00%); split: -0.00%, +0.00%
SClause: 279186 -> 279713 (+0.19%); split: -0.06%, +0.25%
Copies: 577854 -> 581735 (+0.67%); split: -0.28%, +0.95%
PreSGPRs: 867163 -> 866409 (-0.09%)
SALU: 1391187 -> 1395055 (+0.28%); split: -0.12%, +0.39%

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37768>
2025-10-14 15:34:43 +00:00
Samuel Pitoiset
bc32286e5b radv: declare a new user SGPR for dynamic descriptors
To move them out of push constants.

fossils-db (GFX1201):
Totals from 20700 (25.99% of 79646) affected shaders:
Instrs: 14375624 -> 14370051 (-0.04%); split: -0.07%, +0.03%
CodeSize: 76746128 -> 76723772 (-0.03%); split: -0.05%, +0.02%
Latency: 74103586 -> 74113651 (+0.01%); split: -0.01%, +0.02%
InvThroughput: 11908817 -> 11908798 (-0.00%); split: -0.00%, +0.00%
VClause: 249605 -> 249607 (+0.00%); split: -0.00%, +0.00%
SClause: 337914 -> 337772 (-0.04%); split: -0.08%, +0.04%
Copies: 843585 -> 839233 (-0.52%); split: -0.62%, +0.10%
PreSGPRs: 836283 -> 837260 (+0.12%)
SALU: 1790713 -> 1786374 (-0.24%); split: -0.29%, +0.05%

Co-authored-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37768>
2025-10-14 15:34:43 +00:00
Samuel Pitoiset
876e6a3bfe radv/rt: fix memory leak in lower_rt_instructions_monolithic()
Found with ASAN.

Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37844>
2025-10-14 06:54:02 +00:00
Samuel Pitoiset
08dbab0600 radv: rename shader arg descriptor_sets to descriptors
It's more generic and descriptor heaps will use it too.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37786>
2025-10-10 13:22:03 +00:00
Samuel Pitoiset
609ae4e647 radv: rename indirect_descriptor_sets to indirect_descriptors
With descriptor heap the driver will also have to emit indirect
descriptor heaps in some cases.

Rename a couple of things to make them more generic.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37786>
2025-10-10 13:22:03 +00:00
Samuel Pitoiset
08ddf2f878 radv: lower embedded/immutable samplers earlier
Lowering them earlier right after VTN would allow us to implement
embedded samplers for descriptor heap properly for merged shaders.

Non-immediate samplers are still lowered in
radv_nir_apply_pipeline_layout because they require shader arguments.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37688>
2025-10-07 09:25:28 +00:00
Georg Lehmann
cf30742a66 radv,aco: don't end monolithic ray tracing with unconditional terminate
The terminate requires more code and blocks us from deallocating VGPRs early.

Foz-DB Navi31:
Totals from 63 (0.08% of 80273) affected shaders:
Instrs: 3372702 -> 3372467 (-0.01%)
CodeSize: 17441676 -> 17440736 (-0.01%)
Latency: 19763447 -> 19763288 (-0.00%)
InvThroughput: 3860502 -> 3860478 (-0.00%)
Branches: 96204 -> 96141 (-0.07%)
SALU: 406648 -> 406549 (-0.02%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37542>
2025-09-25 15:35:55 +00:00
Rhys Perry
591b498e1f radv: fix progress reporting in lower_rt_derefs
Only create nir_load_rt_arg_scratch_offset_amd if needed.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35069>
2025-09-24 08:20:27 +00:00
Marek Olšák
bbab69d343 radv: fix load_smem alignment
radv_cmd_buffer_upload_alloc_aligned is used with alignment=0, which
guarantees that the alignment is at least 4.

Fixes: 9e16ed7a13 - ac/nir: switch nir_load_smem_amd uses to ac_nir_load_smem wrapper

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37345>
2025-09-19 21:08:25 -04:00
Georg Lehmann
a2d3cbac2a radv: determine subgroup/wave size early
This means we can actually implement varying subgroup size correctly.
It also means that we implement the implicit SPIR-V 1.6 full subgroups
requirement in compute shaders with cswave32/rtwave32.

In the future it will also allow more optimizations that use the subgroup size.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>

The only somewhat complex case here is GFX10 geometry shaders, if gewave32 is
used. We then only know the subgroup size when is_ngg is decided, as legacy
GS doesn't support wave32.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37294>
2025-09-14 13:21:21 +00:00
Georg Lehmann
4143f0725a radv/nir/lower_cmat: clean up GFX11 ACC->B convert
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37213>
2025-09-09 06:08:55 +00:00
Georg Lehmann
5c0ebcdaef radv/nir/lower_cmat: clean up gfx12 transpose
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37213>
2025-09-09 06:08:55 +00:00
Georg Lehmann
2da7b4bd0a radv/nir/lower_cmat: add shuffle_xor_imm helper
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37213>
2025-09-09 06:08:54 +00:00
Christian Gmeiner
1492de1bc3 radv: re-format using clang-format
No manual changes here, this is simply running
$ ninja -C build/ clang-format

Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37226>
2025-09-09 05:48:56 +00:00
Samuel Pitoiset
8e4d5743d2 radv: move debug related drirc to radv_drirc::debug
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37145>
2025-09-05 05:56:17 +00:00
Georg Lehmann
83326af899 nir/builder: add nir_inverse_ballot_imm
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37178>
2025-09-04 14:03:56 +00:00
Georg Lehmann
ef8c364d3d nir: make inverse_ballot 1bit only
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37178>
2025-09-04 14:03:56 +00:00
Samuel Pitoiset
decf9af472 radv/rt: only use one user SGPR for the traversal shader addr
All shaders are allocated in the 32-bit addr space. To avoid an issue
with alignment, and also for future work, there is an unused user SGPR.
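
A minimal sketch of why one 32-bit value can be enough, assuming every shader address shares the same upper 32 bits. The constant `ADDR32_HI` and both helper names are hypothetical; the real upper half depends on the hardware and driver:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constant: upper 32 bits shared by every address in the
 * 32-bit window. The real value is hardware/driver specific. */
#define ADDR32_HI UINT32_C(0xffff8000)

/* Keep only the low half; only valid if the address lies in the window. */
static uint32_t
pack_addr32(uint64_t va)
{
   assert((uint32_t)(va >> 32) == ADDR32_HI);
   return (uint32_t)va;
}

/* Rebuild the full 64-bit address from the single 32-bit value. */
static uint64_t
unpack_addr32(uint32_t lo)
{
   return ((uint64_t)ADDR32_HI << 32) | lo;
}
```

Packing and unpacking round-trip for any address inside the window, so the second register carries no information.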

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37133>
2025-09-03 05:53:41 +00:00
Daniel Schürmann
fcf8899c9e radv/rt: use ACCESS_CAN_REORDER when loading SBT entries
Totals from 56 (0.07% of 79839) affected shaders: (Navi48)

Instrs: 2790220 -> 2790130 (-0.00%); split: -0.00%, +0.00%
CodeSize: 14704952 -> 14704292 (-0.00%)
Latency: 13994383 -> 13953444 (-0.29%); split: -0.29%, +0.00%
InvThroughput: 2717973 -> 2710748 (-0.27%); split: -0.27%, +0.00%
VClause: 68783 -> 68687 (-0.14%)
SClause: 51910 -> 52007 (+0.19%)
Copies: 223192 -> 223190 (-0.00%); split: -0.01%, +0.01%
VALU: 1557513 -> 1557451 (-0.00%); split: -0.00%, +0.00%
VMEM: 118789 -> 118692 (-0.08%)
SMEM: 66498 -> 66595 (+0.15%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36933>
2025-09-02 19:07:30 +00:00
Samuel Pitoiset
bc9a020dd3 radv: rename NGG culling user SGPRs
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37022>
2025-09-01 08:52:55 +00:00
Marek Olšák
9e16ed7a13 ac/nir: switch nir_load_smem_amd uses to ac_nir_load_smem wrapper
ac_nir_load_smem will use load_global_amd

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37101>
2025-08-30 15:04:32 -04:00
Georg Lehmann
acd879f096 radv: set ACCESS_CAN_SPECULATE for smem buffer loads with known good descriptors
Foz-DB GFX1201:
Totals from 2872 (3.59% of 80098) affected shaders:
MaxWaves: 78208 -> 78234 (+0.03%); split: +0.21%, -0.18%
Instrs: 6214171 -> 6193701 (-0.33%); split: -0.40%, +0.07%
CodeSize: 33121244 -> 33113692 (-0.02%); split: -0.18%, +0.16%
VGPRs: 151680 -> 152016 (+0.22%); split: -0.25%, +0.47%
SpillSGPRs: 775 -> 776 (+0.13%)
Latency: 46080905 -> 45955331 (-0.27%); split: -0.55%, +0.28%
InvThroughput: 6235954 -> 6250598 (+0.23%); split: -0.25%, +0.48%
VClause: 111125 -> 110955 (-0.15%); split: -0.17%, +0.02%
SClause: 221845 -> 214761 (-3.19%); split: -3.20%, +0.01%
Copies: 501387 -> 488215 (-2.63%); split: -2.96%, +0.33%
Branches: 191455 -> 178574 (-6.73%)
PreSGPRs: 146364 -> 146923 (+0.38%); split: -0.12%, +0.50%
PreVGPRs: 120813 -> 121073 (+0.22%)
VALU: 3139282 -> 3137471 (-0.06%); split: -0.11%, +0.05%
SALU: 1079863 -> 1083158 (+0.31%); split: -0.55%, +0.86%
VMEM: 182255 -> 182247 (-0.00%)
SMEM: 293409 -> 290233 (-1.08%)

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36938>
2025-08-27 09:45:19 +00:00
Georg Lehmann
5a10142a9f radv/nir/lower_cmat: split up larger nested switches
This has been annoying me for quite a while: the level of indentation
makes reviewing code changes in GitLab harder.

I think now is a good time to change this before more cmat lowering is added.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37002>
2025-08-27 08:20:47 +00:00
Samuel Pitoiset
c5a5c8818c radv/nir/lower_cmat: handle untyped pointers for load/store
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36677>
2025-08-26 13:47:07 +00:00
Samuel Pitoiset
19c712c8ef radv: rename rast_prim to vgt_outprim_type everywhere
To avoid confusion between the primitive topology and the output
rasterized primitive.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36912>
2025-08-25 12:17:38 +00:00
Samuel Pitoiset
ce83800262 radv: remove unused forwarded declarations of pipeline layout
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36792>
2025-08-18 07:25:34 +00:00
Konstantin Seurer
cc0dc4b566 radv: Store parent node IDs inside nodes on GFX12
Saves some space.

Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36691>
2025-08-15 13:00:32 +00:00
Konstantin Seurer
be4be884e1 radv: Rename radv_printf files to radv_debug_nir
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34392>
2025-08-15 10:32:34 +00:00
Samuel Pitoiset
0ac7f1888f radv: reduce the combined image/sampler desc size on GFX11+
From 96 to 64 due to the 32-byte descriptor alignment.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36762>
2025-08-14 06:47:30 +00:00
Samuel Pitoiset
297cf6f1aa radv/meta: add a pass to clear HiZ surfaces
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36739>
2025-08-12 13:48:09 +00:00
Konstantin Seurer
c4b18c689f radv: Emit compressed primitive nodes on GFX12
Emits two triangles per node whenever possible. The nir code will
revisit the triangle node to handle the second triangle only if both
triangles are intersected by the ray.

Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35734>
2025-08-07 20:23:15 +00:00
Qiang Yu
196569b1a4 all: rename gl_shader_stage to mesa_shader_stage
It's not only for GL, change to a generic name.

Use command:
  find . -type f -not -path '*/.git/*' -exec sed -i 's/\bgl_shader_stage\b/mesa_shader_stage/g' {} +

Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Acked-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36569>
2025-08-06 10:28:40 +08:00
Alyssa Rosenzweig
82ae8b1d33 treewide: simplify nir_def_rewrite_uses_after
Most of the time with nir_def_rewrite_uses_after, you want to rewrite after the
replacement. Make that the default thing to be more ergonomic and to drop
parent_instr uses.

We leave nir_def_rewrite_uses_after_instr defined if you really want the old
signature with an arbitrary after point.

Via Coccinelle patch:

    @@
    expression a, b;
    @@

    -nir_def_rewrite_uses_after(a, b, b->parent_instr)
    +nir_def_rewrite_uses_after_def(a, b)

Followed by a bunch of sed.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36489>
2025-08-01 15:34:24 +00:00
Alyssa Rosenzweig
cc6e3b84cb treewide: use nir_def_as_*
Via Coccinelle patch:

    @@
    expression definition;
    @@

    -nir_instr_as_alu(definition->parent_instr)
    +nir_def_as_alu(definition)

    @@
    expression definition;
    @@

    -nir_instr_as_intrinsic(definition->parent_instr)
    +nir_def_as_intrinsic(definition)

    @@
    expression definition;
    @@

    -nir_instr_as_phi(definition->parent_instr)
    +nir_def_as_phi(definition)

    @@
    expression definition;
    @@

    -nir_instr_as_load_const(definition->parent_instr)
    +nir_def_as_load_const(definition)

    @@
    expression definition;
    @@

    -nir_instr_as_deref(definition->parent_instr)
    +nir_def_as_deref(definition)

    @@
    expression definition;
    @@

    -nir_instr_as_tex(definition->parent_instr)
    +nir_def_as_tex(definition)

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Acked-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36489>
2025-08-01 15:34:24 +00:00
Antonio Ospite
ddf2aa3a4d build: avoid redefining unreachable() which is standard in C23
In the C23 standard unreachable() is now a predefined function-like
macro in <stddef.h>

See https://android.googlesource.com/platform/bionic/+/HEAD/docs/c23.md#is-now-a-predefined-function_like-macro-in

And this causes build errors when building for C23:

-----------------------------------------------------------------------
In file included from ../src/util/log.h:30,
                 from ../src/util/log.c:30:
../src/util/macros.h:123:9: warning: "unreachable" redefined
  123 | #define unreachable(str)    \
      |         ^~~~~~~~~~~
In file included from ../src/util/macros.h:31:
/usr/lib/gcc/x86_64-linux-gnu/14/include/stddef.h:456:9: note: this is the location of the previous definition
  456 | #define unreachable() (__builtin_unreachable ())
      |         ^~~~~~~~~~~
-----------------------------------------------------------------------

So don't redefine it with the same name, but use the name UNREACHABLE()
to also signify it's a macro.

Using a different name also makes sense because the macro extended the
behavior of __builtin_unreachable() anyway, and it also had a different
signature, accepting one argument, whereas the standard unreachable()
takes none.

This change improves the chances of building mesa with the C23 standard,
which for instance is the default in recent AOSP versions.
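
As an illustration of the naming point, here is a minimal sketch of an argument-taking macro under the upper-case name. It is not necessarily Mesa's exact definition: `abort()` is swapped in for `__builtin_unreachable()` to keep the example portable, and `enum stage` and `stage_name` are hypothetical names used only for the demonstration.

```c
#include <assert.h>
#include <stdlib.h>

/* Sketch only: takes a message string, unlike the zero-argument C23
 * unreachable(). The message shows up in the failed assert in debug
 * builds; abort() stands in for __builtin_unreachable(). */
#define UNREACHABLE(str)   \
   do {                    \
      assert(!"" str);     \
      abort();             \
   } while (0)

/* Hypothetical enum and function, for demonstration only. */
enum stage { STAGE_VERTEX, STAGE_FRAGMENT, STAGE_COMPUTE };

static const char *
stage_name(enum stage s)
{
   switch (s) {
   case STAGE_VERTEX:   return "vertex";
   case STAGE_FRAGMENT: return "fragment";
   case STAGE_COMPUTE:  return "compute";
   }
   UNREACHABLE("invalid stage");
}
```

Because the name differs from the standard macro, this compiles the same way whether or not <stddef.h> provides unreachable().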

All the instances of the macro, including the definition, were updated
with the following command line:

  git grep -l '[^_]unreachable(' -- "src/**" | sort | uniq | \
  while read file; \
  do \
    sed -e 's/\([^_]\)unreachable(/\1UNREACHABLE(/g' -i "$file"; \
  done && \
  sed -e 's/#undef unreachable/#undef UNREACHABLE/g' -i src/intel/isl/isl_aux_info.c

Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36437>
2025-07-31 17:49:42 +00:00
Georg Lehmann
4683187f49 radv/nir/lower_cmat: load gfx11 8bit ACC using the B layout to get aligned loads
This allows us to use aligned loads that can be vectorized, without any
downside as 8bit scalar loads always write 16bits of a register.

Foz-DB Navi31:
Totals from 10 out of 14 FSR4 shaders:
MaxWaves: 71 -> 68 (-4.23%)
Instrs: 60146 -> 59781 (-0.61%); split: -0.67%, +0.06%
CodeSize: 412448 -> 413428 (+0.24%); split: -0.11%, +0.35%
VGPRs: 2112 -> 2160 (+2.27%)
SpillVGPRs: 89 -> 68 (-23.60%)
Scratch: 11776 -> 8704 (-26.09%)
Latency: 196628 -> 193770 (-1.45%); split: -2.62%, +1.17%
InvThroughput: 224944 -> 226274 (+0.59%); split: -0.02%, +0.61%
VClause: 862 -> 796 (-7.66%)
Copies: 3166 -> 3342 (+5.56%); split: -6.22%, +11.78%
Branches: 37 -> 38 (+2.70%)
PreSGPRs: 311 -> 312 (+0.32%)
PreVGPRs: 2153 -> 2214 (+2.83%); split: -1.35%, +4.18%
VALU: 51073 -> 51448 (+0.73%); split: -0.03%, +0.77%
SALU: 1072 -> 1074 (+0.19%)
VMEM: 3275 -> 2765 (-15.57%)
VOPD: 1739 -> 1783 (+2.53%); split: +7.99%, -5.46%

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36117>
2025-07-30 07:25:51 +00:00
Marek Olšák
09e607c385 nir: add access to load_smem_amd (for ACCESS_CAN_SPECULATE)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36099>
2025-07-24 18:41:38 +00:00
Marek Olšák
4c8a757951 radv,radeonsi: mark VS input loads and poly stipple load speculatable
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35950>
2025-07-24 06:31:17 +00:00
Alyssa Rosenzweig
8a1a410389 treewide: use SWAP macro
Via Coccinelle patch + manual clean up:

    @@
    identifier temporary, a, b;
    type T;
    @@

    -T temporary = a;
    -a = b;
    -b = temporary;
    +SWAP(a, b);
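
For readers unfamiliar with the pattern, a generic swap macro can be sketched as follows. This assumes the GCC/Clang `__typeof__` extension and may differ from Mesa's actual SWAP definition; `sort_pair` is a hypothetical caller.

```c
#include <assert.h>

/* Sketch of a type-generic swap; the temporary takes the type of `a`. */
#define SWAP(a, b)                     \
   do {                                \
      __typeof__(a) swap_tmp = (a);    \
      (a) = (b);                       \
      (b) = swap_tmp;                  \
   } while (0)

/* Hypothetical caller: order a pair so that *lo <= *hi. */
static void
sort_pair(int *lo, int *hi)
{
   assert(lo && hi);
   if (*lo > *hi)
      SWAP(*lo, *hi);
}
```

The do/while wrapper keeps the macro usable as a single statement, for example as the body of an unbraced if.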

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36297>
2025-07-23 19:49:47 +00:00
Alyssa Rosenzweig
6b34e2174e nir: introduce ergonomic tex builder
For intrinsics, we have these really nice builders using designated initializers
+ macros to specify optional indices. Texture instrs have even more craziness
involved, but we can do the same trick. This commit takes the existing "fixed
form" deref-centric tex builders and generalizes them to work with non-deref
textures, making it useful also for GL and late VK passes, while providing an
API that strives to be ergonomic and consistent.

This series only implements a subset of possible texture operations for now, but
more generalization could be added as people need it.
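
The "designated initializers + macros" trick can be sketched in plain C as below. This is not the NIR API: `struct tex_opts`, `tex_sample_impl` and `tex_sample` are hypothetical names, and the arithmetic is only a stand-in for building an instruction.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical option struct. Fields the caller does not name are zeroed. */
struct tex_opts {
   int _unused;       /* absorbs the leading positional 0 in the macro */
   int texture_index;
   int sampler_index;
   float lod_bias;
   bool is_array;
};

/* The real entry point takes all options packed in one struct. */
static float
tex_sample_impl(float coord, struct tex_opts opts)
{
   assert(opts.texture_index >= 0);
   /* Stand-in for emitting a texture instruction. */
   return coord + opts.lod_bias + (float)opts.texture_index;
}

/* The variadic macro turns trailing ".field = value" arguments into a
 * compound literal, so callers only spell out the options they need. */
#define tex_sample(coord, ...) \
   tex_sample_impl((coord), (struct tex_opts){ 0, __VA_ARGS__ })
```

A call such as `tex_sample(1.0f, .lod_bias = 0.5f)` leaves every other option at zero, and `tex_sample(2.0f, .texture_index = 3, .is_array = true)` names two options in any order.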

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36050>
2025-07-21 12:11:41 +00:00
Konstantin Seurer
d59c22b6e1 radv/rt: Implement null acceleration structure in shader code
The previous approach is broken with descriptor buffer capture/replay
because the address of the dummy VA used can randomly change.

Totals from 78 (20.58% of 379) affected shaders:

Instrs: 3837275 -> 3839653 (+0.06%); split: -0.01%, +0.07%
CodeSize: 20235104 -> 20251744 (+0.08%); split: -0.01%, +0.09%
SpillSGPRs: 997 -> 1007 (+1.00%)
Latency: 22305937 -> 22331551 (+0.11%); split: -0.03%, +0.15%
InvThroughput: 4232313 -> 4237341 (+0.12%); split: -0.03%, +0.15%
VClause: 97043 -> 97027 (-0.02%); split: -0.02%, +0.01%
SClause: 72169 -> 72416 (+0.34%); split: -0.00%, +0.35%
Copies: 321578 -> 322126 (+0.17%); split: -0.11%, +0.28%
Branches: 110163 -> 110444 (+0.26%); split: -0.00%, +0.26%
PreSGPRs: 7879 -> 7942 (+0.80%)
VALU: 2155040 -> 2156425 (+0.06%); split: -0.02%, +0.09%
SALU: 502292 -> 503078 (+0.16%); split: -0.00%, +0.16%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36034>
2025-07-19 21:02:42 +00:00
Konstantin Seurer
d28ff8050a radv/rt: Use inv_dir for software ray-triangle tests
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Reviewed-by: Autumn Ashton <misyl@froggi.es>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36213>
2025-07-19 16:35:37 +00:00
Konstantin Seurer
5494789e89 radv/rt: Optimize emulated ray-triangle tests
The imod instructions are lowered to 4 alu instructions each. We can do
better by packing the results with the values for kz.

Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Reviewed-by: Autumn Ashton <misyl@froggi.es>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36213>
2025-07-19 16:35:37 +00:00
Konstantin Seurer
d140f2a6a2 radv: Implement watertightness for emulated RT
Instead of using fp64 (which is broken in some cases), the new approach
only uses fp32 and implements tiebreaking for edge/vertex hits. Using
fp32 is also much faster, improving performance of q2rtx by around 40%.

Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Reviewed-by: Autumn Ashton <misyl@froggi.es>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36213>
2025-07-19 16:35:36 +00:00
Konstantin Seurer
55641f9ca0 radv: Disable pointer flags and the GFX12 WA for emulated RT
Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Reviewed-by: Autumn Ashton <misyl@froggi.es>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36213>
2025-07-19 16:35:36 +00:00
Konstantin Seurer
df44b353ad radv: Optimize ray tracing position fetch
Gets rid of a lot of indirection when fetching triangle positions.
Storing the primitive address increases register pressure a bit, but
the traversal shader, which should have the highest register demand,
should not be affected when position fetch is not used.

Totals:
Instrs: 4021686 -> 4022435 (+0.02%); split: -0.01%, +0.03%
CodeSize: 21235812 -> 21235832 (+0.00%); split: -0.02%, +0.02%
Latency: 23402275 -> 23412110 (+0.04%); split: -0.04%, +0.09%
InvThroughput: 4352818 -> 4352206 (-0.01%); split: -0.04%, +0.02%
VClause: 101906 -> 102058 (+0.15%); split: -0.03%, +0.18%
Copies: 342210 -> 342368 (+0.05%); split: -0.09%, +0.14%
Branches: 114988 -> 114993 (+0.00%)
PreVGPRs: 26551 -> 27111 (+2.11%)
VALU: 2249366 -> 2249524 (+0.01%); split: -0.01%, +0.02%
SALU: 529828 -> 529808 (-0.00%); split: -0.01%, +0.00%

Reviewed-by: Natalie Vock <natalie.vock@gmx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35533>
2025-07-19 16:07:59 +00:00
Georg Lehmann
497f607c8e radv/nir/lower_cmat: vectorize GFX11 B -> ACC conversion
Foz-DB Navi31:
Totals from 7 out of 14 FSR4 shaders:
MaxWaves: 50 -> 52 (+4.00%)
Instrs: 44951 -> 44516 (-0.97%); split: -1.00%, +0.03%
CodeSize: 309176 -> 305500 (-1.19%); split: -1.23%, +0.04%
VGPRs: 1464 -> 1416 (-3.28%)
SpillVGPRs: 188 -> 92 (-51.06%)
Scratch: 24064 -> 11776 (-51.06%)
Latency: 171318 -> 163663 (-4.47%); split: -4.51%, +0.04%
InvThroughput: 178796 -> 178956 (+0.09%); split: -0.04%, +0.13%
VClause: 769 -> 730 (-5.07%); split: -6.50%, +1.43%
Copies: 3149 -> 3261 (+3.56%); split: -1.21%, +4.76%
PreVGPRs: 1607 -> 1467 (-8.71%)
VALU: 37715 -> 37744 (+0.08%); split: -0.11%, +0.18%
SALU: 754 -> 753 (-0.13%)
VMEM: 2813 -> 2621 (-6.83%)
VOPD: 1674 -> 1685 (+0.66%); split: +1.55%, -0.90%

Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36115>
2025-07-16 11:46:52 +00:00
Georg Lehmann
7546169e1c radv/nir/lower_cmat: vectorize GFX11 ACC -> B conversion
Foz-DB Navi31:
Totals from 10 out of 14 FSR4 shaders:
Instrs: 64204 -> 60749 (-5.38%)
CodeSize: 439052 -> 417668 (-4.87%)
SpillVGPRs: 186 -> 188 (+1.08%)
Scratch: 23808 -> 24064 (+1.08%)
Latency: 208878 -> 202903 (-2.86%)
InvThroughput: 232898 -> 225688 (-3.10%)
VClause: 902 -> 907 (+0.55%); split: -1.55%, +2.11%
Copies: 6418 -> 3762 (-41.38%)
Branches: 55 -> 37 (-32.73%)
PreSGPRs: 297 -> 298 (+0.34%)
PreVGPRs: 2299 -> 2303 (+0.17%)
VALU: 54762 -> 51489 (-5.98%)
SALU: 956 -> 938 (-1.88%)
VMEM: 3469 -> 3473 (+0.12%)
VOPD: 3895 -> 2126 (-45.42%)

Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36115>
2025-07-16 11:46:52 +00:00