Commit Graph

1310 Commits

Author SHA1 Message Date
Ivan Avdeev ff6504d4c0 radv: add experimental support for AMD BC-250 board
AMD BC-250 is a mining board based on an AMD APU with an integrated GPU
that kernel recognizes as Cyan Skillfish.

It is basically RDNA1/GFX10, but with added hardware ray tracing
support. LLVM calls it GFX1013, see
https://llvm.org/docs/AMDGPU/AMDGPUAsmGFX1013.html

Support for this GPU hasn't been extensively tested. Some games are
known to work, some non-trivial ray query compute and ray tracing
pipeline rendering works too. Q2RTX works.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33116>
2025-03-04 08:07:31 +00:00
Daniel Schürmann 52253da783 aco: unify get_addr_sgpr_from_waves() and get_addr_vgpr_from_waves() into one function
which returns the limit as RegisterDemand.

Also remove the unused get_extra_sgprs() from aco_ir.h.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33644>
2025-02-21 13:49:41 +00:00
Rhys Perry 539f9b4ba6 nir,aco,radv: add align_mul/offset to buffer_amd intrinsics
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29242>
2025-02-07 13:52:57 +00:00
Daniel Schürmann 1a8a643bbd aco/isel: track control flow divergence in loops more accurately
We introduce two new variables, cf_context::in_divergent_cf and
cf_context::parent_loop.has_divergent_break, in order to determine
whether there is any other invocations on a different CF path.

Totals from 1305 (1.64% of 79395) affected shaders: (Navi31)

Instrs: 659211 -> 657815 (-0.21%); split: -0.22%, +0.01%
CodeSize: 3483228 -> 3477960 (-0.15%); split: -0.16%, +0.01%
VGPRs: 68820 -> 48048 (-30.18%)
Latency: 14197750 -> 14170767 (-0.19%); split: -0.26%, +0.07%
InvThroughput: 1619103 -> 1619826 (+0.04%); split: -0.02%, +0.07%
VClause: 12384 -> 12350 (-0.27%)
SClause: 26693 -> 26844 (+0.57%); split: -0.01%, +0.57%
Copies: 44994 -> 43535 (-3.24%); split: -3.26%, +0.02%
PreSGPRs: 49007 -> 48907 (-0.20%)
PreVGPRs: 32171 -> 32121 (-0.16%)
VALU: 349984 -> 349857 (-0.04%); split: -0.04%, +0.00%
SALU: 84252 -> 83988 (-0.31%); split: -0.32%, +0.00%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33206>
2025-02-05 10:54:21 +00:00
Daniel Schürmann 583c3586fe aco/isel: remove loop nest information from exec_info
Since we never enter loops with an empty exec mask, and the
control flow is structured, we don't need to consider the
loop nest depth.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33206>
2025-02-05 10:54:21 +00:00
Daniel Schürmann a77258346c aco/isel: fix assumptions about potential empty exec mask in nested control flow
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33206>
2025-02-05 10:54:21 +00:00
Daniel Schürmann 44216e035f aco/isel: add and use exec_info::empty() helper
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33206>
2025-02-05 10:54:21 +00:00
Daniel Schürmann 8e8398832c aco/isel: use cf_context in loop_context to restore cf information
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33206>
2025-02-05 10:54:21 +00:00
Daniel Schürmann 8b9c9fb904 aco/isel: use cf_context in if_context to restore cf information
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33206>
2025-02-05 10:54:21 +00:00
Daniel Schürmann c2bfc05d71 aco/isel: rename cf_context::has_divergent_branch
Make it more consistent with cf_context::has_branch.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33206>
2025-02-05 10:54:21 +00:00
Daniel Schürmann 0c5a91b9f2 aco/isel: move cf_info into separate struct cf_context
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33206>
2025-02-05 10:54:21 +00:00
Daniel Schürmann 61fa007e48 aco/isel: fix empty exec tracking for uniform branches
Totals from 5 (0.01% of 79395) affected shaders: (Navi31)

Instrs: 54730 -> 54715 (-0.03%)
CodeSize: 276928 -> 276852 (-0.03%)
Latency: 215212 -> 214874 (-0.16%)
InvThroughput: 40154 -> 40150 (-0.01%)
Copies: 6824 -> 6821 (-0.04%); split: -0.06%, +0.01%
Branches: 1625 -> 1615 (-0.62%)
SALU: 5682 -> 5678 (-0.07%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33206>
2025-02-05 10:54:21 +00:00
Konstantin Seurer 60a20bcf3d nir: Stop using instructions for debug info
Annotating ssa defs without affecting compilation is impossible with
debug info instructions since referencing a nir_def from the debug info
instr will add uses.

The old approach also stops worrking if passes reorder instructions.

This patch proposes a solution which should not regress performance just
like the old approach. The difference is that this one allocates a bit
more space for debug info instead of adding a new instruction for it.

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33141>
2025-01-30 20:14:01 +00:00
Marek Olšák f98613d47c aco: implement replacement of sample_mask_in with helper_invocation in PS prolog
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33024>
2025-01-25 12:20:26 -05:00
Marek Olšák a842f198d7 aco: simplify how broadcast_last_cbuf is implemented in PS epilog
So PS epilogs only need a single bool flag that determines whether all
enabled color buffers should be written.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33024>
2025-01-25 12:20:26 -05:00
Marek Olšák 5c4f737b84 aco: implement replacing frag_coord with pixel_coord in PS prolog
This adds an option to replace frag_coord.xy with pixel_coord when sample
shading is disabled, which is most of the time. This reduces the number of
input VGPRs.

It's already implement in ac_nir_lower_ps_early for monolithic shaders.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33024>
2025-01-25 12:20:26 -05:00
Marek Olšák d7d4d56f5b ac,aco,radeonsi: replace SampleMaskIn with 1 << SampleID if full sample shading
Since the sample mask is always 1 << sample_id with full sample shading,
just use that instead of loading sample_mask_in. Set it to 0 if it's
a helper invocation. This removes the sample mask input VGPR.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33024>
2025-01-25 12:20:25 -05:00
Marek Olšák d160252270 ac: use Z_EXPORT_FORMAT=32_AR for Z + Alpha mrtz exports
This should be faster than 32_ABGR.

Also, stencil exports are changed from UINT16_ABGR to 32_GR,
which should have no effect on performance.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33046>
2025-01-16 02:58:03 +00:00
Timur Kristóf 50035f0316 ac/nir: Move all ac_nir_* files to a new folder.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32966>
2025-01-14 13:46:30 +01:00
Samuel Pitoiset 10e424f586 aco: always use ds_bpermute for shuffle/rotate on GFX12
ds_bpermute supports both 32 and 64 lanes now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32974>
2025-01-13 08:33:38 +00:00
Samuel Pitoiset f94bd67b82 aco: fix VS prologs on GFX12
MTBUF/MUBUF instructions must use zero for SOFFSET, use const_offset
instead.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32904>
2025-01-07 13:44:32 +00:00
Marek Olšák 0d5b03f2b9 ac/nir: split local_invocation_ids to 3 separate VGPR inputs
so that we can set the upper range per VGPR.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák ceb6f8fc32 amd: lower load_tess_rel_patch_id/primitive_id/tess_coord and overwrite.. in NIR
The overwrite instruction complicates it a little, which is why these
intrinsics are lowered together.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák 61bfb4fa06 amd: lower load_subgroup_invocation in NIR
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák e69f47faee amd: lower load_local_invocation_index in NIR
This is the last intrinsic that needed the LS VGPR bug workaround in ACO
and ac_nir_to_llvm.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák 342dcbdc8b amd: lower load_vertex_id/instance_id and overwrite_vs_arguments in NIR
2 things complicate this:
- overwrite_vs_arguments_amd
- the LS VGPR bug workaround

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák 66dd70adc5 amd: lower load_gs_wave_id_amd in NIR
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák 923f59c971 amd: lower load_barycentric_at_offset in NIR
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák 16ab05fad1 amd: lower load_barycentric_pixel/centroid/sample in NIR
radeonsi needs to preserve interp_mode in the arg load.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák 7e83f6ca8b amd: lower load_front_face in NIR
radeonsi must do this after si_lower_nir_abi, which optimizes front_face,
but doesn't lower it.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák 6ad5225b2a amd: lower load_frag_shading_rate in NIR
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák 6d2e29ff6e amd: lower load_sample_pos in NIR
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák 110e474b4f amd: lower load_sample_id in NIR
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák 684c8da553 amd: lower load_invocation_id in NIR
ACO can't look for it because it's lowered there.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák d281240c57 amd: lower load_first_vertex/base_instance/draw_id/view_index in NIR
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák 0d372b043b amd: lower load_local_invocation_id in NIR
This is based on ACO.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák 13cb5c7b72 amd: lower load_frag_coord in NIR
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Marek Olšák 58cb155068 amd: lower load_pixel_coord in NIR
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32782>
2025-01-02 17:36:55 +00:00
Georg Lehmann 43fca7fffe amd: support load_front_face_fsign
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32791>
2024-12-30 22:31:35 +00:00
Georg Lehmann aee0c7274c amd: switch to FRONT_FACE_ALL_BITS(0)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32791>
2024-12-30 22:31:34 +00:00
Georg Lehmann 33a73203b0 aco/isel: skip and(exec) for top level demote_if/terminate_if
In nested control flow this is nessecary to not demote/terminate invocations
that are part of the global but not part of the local mask.

At the top level, the masks are the same and no additional invocations
can be accidentally disabled.

Foz-DB Navi21:
Totals from 2095 (2.64% of 79395) affected shaders:
Instrs: 1058326 -> 1056839 (-0.14%)
CodeSize: 5632480 -> 5626616 (-0.10%)
Latency: 12082761 -> 12080520 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 2246677 -> 2246636 (-0.00%); split: -0.00%, +0.00%
Copies: 114446 -> 114433 (-0.01%)
SALU: 230585 -> 229098 (-0.64%)

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32755>
2024-12-26 18:34:38 +00:00
Marek Olšák de996ac481 radeonsi: kill Z and stencil PS outputs if depth or stencil is disabled
This adds kill_z and kill_stencil flags to the shader PS epilog key, which
removes those outputs if depth or stencil are disabled.

It must be implemented in:
* ACO PS epilog
* LLVM PS epilog
* ac_nir_lower_ps for monolithic shaders

Some of the samplemask code wasn't completely correct, but probably harmless.

Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32713>
2024-12-24 12:02:20 +00:00
Qiang Yu dff14d102d aco: fix voffset missing when buffer store base >=4096
Regression on test:
  dEQP-GLES31.functional.geometry_shading.basic.output_256

voffset is missing if buffer store base >=4096, we need to
re-calculate offen after resolve_excess_vmem_const_offset().

Fixes: cdaf269924 ("aco: inline store_vmem_mubuf/emit_single_mubuf_store")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32767>
2024-12-24 01:42:45 +00:00
Marek Olšák 85c20def94 ac,radv,radeonsi: enable TCS input reads from VGPRs for all compatible loads
Cross-invocation TCS input access doesn't prevent same-invocation access.
This improves shaders that use both for the same inputs.

Also, if some components of a vec4 slot only use same-invocation access and
other components only use cross-invocation access (it's possible after
compaction), this takes the VGPR path for the components with
same-invocation access, which didn't happen previously because all masks
only describe whole vec4s.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31673>
2024-12-18 11:07:59 +00:00
Samuel Pitoiset 553eb1a3fd radv: fix alpha-to-coverage with alpha-to-one when MRTZ is also exported
On AMD hardware, it's possible to export a separate alpha channel for
applying alpha-to-one after alpha-to-coverage and not before.

On GFX11+, it's already mostly supported but alpha needs to be exported
to MRTZ.a and one to MRT0.a. The hw always uses alpha for
alpha-to-coverage from MRTZ.a.

On older generations, the driver needs the same separate alpha export
but it also needs to configure the hardware with COVERAGE_TO_MASK_ENABLE
which selects alpha from MRTZ.a.

This should fix alpha-to-coverage with alpha-to-one when either
depth, stencil or samplemask are exported but it still needs a slightly
different solution without MRTZ. I will fix that later.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32523>
2024-12-11 10:50:31 +00:00
Samuel Pitoiset 70047e6bd6 aco: export alpha to MRTZ.a and one to MRT0.a for alpha-to-one on GFX11
For FS epilogs.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32523>
2024-12-11 10:50:31 +00:00
Georg Lehmann 4a977ea24f aco/gfx11+: use v_and_b32 to extract local id 0
Foz-DB Navi31:
Totals from 2561 (3.23% of 79206) affected shaders:
CodeSize: 10399004 -> 10389120 (-0.10%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32532>
2024-12-10 11:58:21 +00:00
Daniel Schürmann b64fff7731 aco: remove definition from Pseudo branch instructions
They are not needed anymore.

Totals from 7019 (8.84% of 79395) affected shaders: (Navi31)

Instrs: 14805400 -> 14824196 (+0.13%); split: -0.00%, +0.13%
CodeSize: 78079972 -> 78132932 (+0.07%); split: -0.01%, +0.08%
SpillSGPRs: 4485 -> 4515 (+0.67%); split: -0.76%, +1.43%
Latency: 165862000 -> 165836134 (-0.02%); split: -0.02%, +0.00%
InvThroughput: 30061764 -> 30057781 (-0.01%); split: -0.01%, +0.00%
SClause: 392323 -> 392286 (-0.01%); split: -0.01%, +0.00%
Copies: 1012262 -> 1012234 (-0.00%); split: -0.04%, +0.04%
Branches: 365910 -> 365909 (-0.00%); split: -0.00%, +0.00%
PreSGPRs: 360167 -> 355363 (-1.33%)
VALU: 8837197 -> 8837276 (+0.00%); split: -0.00%, +0.00%
SALU: 1402593 -> 1402621 (+0.00%); split: -0.03%, +0.03%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32037>
2024-12-06 14:34:03 +00:00
Georg Lehmann 7425e71ae0 aco/gfx12: disable vinterp ddx/ddy optimization
This only seems to work on gfx11 and gfx11.5, and it's only faster on gfx11.5.

We could continue to use vinterp, with constants copied to vgprs, but
whether that's beneficial depends on the shader.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>

Fixes: bee487df48 ("aco/gfx11.5+: use vinterp for fddx/fddy")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12250
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32495>
2024-12-06 12:01:39 +00:00
Rhys Perry 63b0692eac aco: don't use uniform continues if exec might be empty
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31143>
2024-11-25 10:32:59 +00:00