Commit Graph

1897 Commits

Author SHA1 Message Date
Dave Airlie a2701bfdb8 aco: move info pointer to a copy.
This is just setup to move this to a different struct later.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16342>
2022-05-11 19:07:11 +00:00
Timur Kristóf f553076eaf aco: Remove now-superfluous intrinsics.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13155>
2022-05-10 17:16:03 +00:00
Rhys Perry cc410dd4d1 aco: fix cmpswap global atomic definition on GFX6
Missed this one.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Fixes: 2f0bb39e16 ("aco: ensure that definitions fixed to operands have matching regclasses")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16367>
2022-05-10 01:01:13 +00:00
Rhys Perry 152092b8ea aco: skip s_barrier if TCS patches are within subgroup
fossil-db (Sienna Cichlid):
Totals from 518 (0.32% of 162293) affected shaders:
Instrs: 124943 -> 123908 (-0.83%)
CodeSize: 708764 -> 704624 (-0.58%)
Latency: 618380 -> 618279 (-0.02%)
InvThroughput: 214061 -> 214051 (-0.00%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16356>
2022-05-09 16:30:27 +00:00
Rhys Perry 359e60cf5e aco: split load_sbt_amd result
fossil-db (Sienna Cichlid):
Totals from 11 (0.01% of 162293) affected shaders:
Instrs: 47857 -> 47738 (-0.25%)
CodeSize: 261556 -> 261080 (-0.18%)
Latency: 1176822 -> 1176245 (-0.05%)
InvThroughput: 784549 -> 784165 (-0.05%)
Copies: 5959 -> 5840 (-2.00%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16203>
2022-05-06 15:15:13 +00:00
Daniel Schürmann d70688492c aco/optimizer: re-combine and copy-propagate p_create_vector(p_split_vector)
Totals from 309 (0.23% of 134913) affected shaders: (GFX10.3)
CodeSize: 1853812 -> 1857732 (+0.21%); split: -0.05%, +0.27%
Instrs: 340810 -> 341789 (+0.29%); split: -0.07%, +0.36%
Latency: 3301814 -> 3301774 (-0.00%); split: -0.02%, +0.02%
InvThroughput: 590473 -> 590914 (+0.07%); split: -0.00%, +0.08%
Copies: 28751 -> 29731 (+3.41%); split: -0.87%, +4.28%
PreSGPRs: 14010 -> 14028 (+0.13%); split: -0.01%, +0.14%

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15414>
2022-05-06 14:52:07 +00:00
Daniel Schürmann 5e6e47ecea aco/ra: improve split_vector register assignment if the operand is not killed
This allows for more coalescing when lowering the copies.

Totals from 44801 (33.21% of 134913) affected shaders: (GFX10.3)
VGPRs: 1513264 -> 1513248 (-0.00%)
CodeSize: 113354240 -> 113172872 (-0.16%); split: -0.16%, +0.00%
Instrs: 21648793 -> 21603397 (-0.21%); split: -0.21%, +0.00%
Latency: 95762290 -> 95757403 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 15427354 -> 15427341 (-0.00%); split: -0.00%, +0.00%
Copies: 2065330 -> 2019933 (-2.20%); split: -2.20%, +0.00%

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15414>
2022-05-06 14:52:07 +00:00
Daniel Schürmann 499dc20e6a aco: don't re-create vectors for load_barycentric_* intrinsics
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15414>
2022-05-06 14:52:07 +00:00
Rhys Perry 2f0bb39e16 aco: ensure that definitions fixed to operands have matching regclasses
If the operand is not killed, the definition needs to be large enough so
that the new location for the operand does not intersect with the old
location.

Fixes with zink:
KHR-GL45.shader_image_load_store.basic-allTargets-atomicCS
KHR-GL45.shader_image_load_store.basic-allTargets-atomicGS
KHR-GL45.shader_image_load_store.basic-allTargets-atomicVS

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6276
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16106>
2022-05-05 19:56:48 +01:00
Rhys Perry 1b639a0ce5 aco/ra: fix vgpr_limit
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Fixes: b98a4d4dd7 ("aco: refactor GPR limit calculation")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16297>
2022-05-04 11:12:13 +00:00
Georg Lehmann 69cceab718 aco: Remove D16 zero components from image stores.
No foz-db changes.

Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15179>
2022-05-04 09:58:03 +00:00
Georg Lehmann 7a6dbe0c77 aco: Implement image_load d16.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15179>
2022-05-04 09:58:03 +00:00
Georg Lehmann 7ffcaf9187 aco: Implement image_store d16.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15179>
2022-05-04 09:58:03 +00:00
Joshua Ashton ce7966fcb4 aco: Use movk for AddressHi bits in vertex prolog
Eliminates a useless extra dword by transforming
        s_mov_b32 s47, 0xffff8000                                   ; beaf03ff ffff8000
to
        s_movk_i32 s47, 0x8000                                      ; b02f8000

which does the same thing.

Reviewed-By: Tatsuyuki Ishi <ishitatsuyuki@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15415>
2022-05-03 10:15:36 +00:00
Daniel Schürmann 58bd9a379e aco/ra: fix live-range splits of phi definitions
It could happen that at the time of a live-range split,
a phi was not yet placed in the new instruction vector,
and thus, instead of renamed, a new phi was created.

Fixes: dEQP-VK.subgroups.ballot_broadcast.compute.subgroupbroadcast_i8vec2
Cc: mesa-stable

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16248>
2022-05-03 09:54:30 +00:00
Daniel Schürmann 12d7f911c9 aco/optimizer: prevent any overflow between SGPR and const offset on MUBUF
Apparently, if the SGPR offset + const offset overflows,
it doesn't work.

Totals from 145 (0.11% of 134913) affected shaders: (GFX10.3)
SpillSGPRs: 134 -> 104 (-22.39%)
CodeSize: 1632676 -> 1645916 (+0.81%); split: -0.03%, +0.84%
Instrs: 316920 -> 320252 (+1.05%); split: -0.01%, +1.07%
Latency: 1456285 -> 1459686 (+0.23%); split: -0.02%, +0.25%
InvThroughput: 165785 -> 166086 (+0.18%); split: -0.02%, +0.20%
VClause: 6815 -> 6875 (+0.88%); split: -0.03%, +0.91%
SClause: 19089 -> 19079 (-0.05%); split: -0.06%, +0.01%
PreSGPRs: 7302 -> 7304 (+0.03%); split: -0.01%, +0.04%

Fixes: KHR-GL45.shader_storage_buffer_object.basic-operations-case1-cs
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15866>
2022-04-29 15:57:57 +00:00
Daniel Schürmann d5dc0c0392 aco: adjust num_waves for LDS before scheduling
Totals from 67 (0.05% of 134913) affected shaders: (GFX10.3)
VGPRs: 2024 -> 2136 (+5.53%); split: -0.40%, +5.93%
CodeSize: 162364 -> 162348 (-0.01%); split: -0.08%, +0.07%
MaxWaves: 1882 -> 1816 (-3.51%); split: +0.11%, -3.61%
Instrs: 29176 -> 29162 (-0.05%); split: -0.09%, +0.04%
Latency: 329984 -> 327272 (-0.82%); split: -0.88%, +0.06%
InvThroughput: 54653 -> 54672 (+0.03%); split: -0.01%, +0.04%
VClause: 782 -> 761 (-2.69%); split: -2.81%, +0.13%
SClause: 833 -> 824 (-1.08%); split: -2.28%, +1.20%
Copies: 1872 -> 1873 (+0.05%); split: -0.37%, +0.43%

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16039>
2022-04-29 15:39:10 +00:00
Daniel Schürmann 8d8c59b4cd aco: split num_waves adjustment into separate function
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16039>
2022-04-29 15:39:10 +00:00
Daniel Schürmann 6220046ad1 aco: remove 'max_waves' and use 'num_waves' to adjust for LDS and workgroup size
Totals from 21 (0.02% of 134913) affected shaders: (GFX10.3)
VGPRs: 1024 -> 1176 (+14.84%)
CodeSize: 127824 -> 127664 (-0.13%); split: -0.17%, +0.04%
MaxWaves: 416 -> 378 (-9.13%)
Instrs: 22521 -> 22502 (-0.08%); split: -0.17%, +0.09%
Latency: 146386 -> 143154 (-2.21%); split: -2.21%, +0.00%
InvThroughput: 28379 -> 28944 (+1.99%); split: -0.23%, +2.22%
VClause: 575 -> 579 (+0.70%); split: -0.87%, +1.57%
SClause: 692 -> 645 (-6.79%)
Copies: 780 -> 747 (-4.23%); split: -4.74%, +0.51%

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16039>
2022-04-29 15:39:10 +00:00
Samuel Pitoiset 6873da0e42 aco: fix load_barycentric_at_{sample,offset} on GFX6-7
The computation was wrong.

Fixes dEQP-GLES31.functional.shaders.multisample_interpolation.interpolate_at_offset.*
with Zink on GFX6 (Pitcairn).

Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16099>
2022-04-25 07:17:39 +00:00
Jason Ekstrand 1b8a43a0ba util: Remove util_cpu_detect
util_cpu_detect is an anti-pattern: it relies on callers high up in the call
chain initializing a local implementation detail. As a real example, I added:

...a Mali compiler unit test
...that called bi_imm_f16() to construct an FP16 immediate
...that calls _mesa_float_to_half internally
...that calls util_get_cpu_caps internally, but only on x86_64!
...that relies on util_cpu_detect having been called before.

As a consequence, this unit test:

...crashes on x86_64 with USE_X86_64_ASM set
...passes on every other architecture
...works on my local arm64 workstation and on my test board
...failed CI which runs on x86_64
...needed to have a random util_cpu_detect() call sprinkled in.

This is a bad design decision. It pollutes the tree with magic, it causes
mysterious CI failures especially for non-x86_64 developers, and it is not
justified by a micro-optimization.

Instead, let's call util_cpu_detect directly from util_get_cpu_caps, avoiding
the footgun where it fails to be called.  This cleans up Mesa's design,
simplifies the tree, and avoids a class of a (possibly platform-specific)
failures. To mitigate the added overhead, wrap it all in a (fast) atomic
load check and declare the whole thing as ATTRIBUTE_CONST so the
compiler will CSE calls to util_cpu_detect.

Co-authored-by: Alyssa Rosenzweig <alyssa@collabora.com>
Reviewed-by: Marek Olšák <maraeo@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15580>
2022-04-20 18:44:35 +00:00
Georg Lehmann d12b5e7633 aco: Reuse previous -1 result in find_msb to avoid using VOP3.
Totals:
CodeSize: 388934388 -> 388933712 (-0.00%)

Totals from 208 (0.15% of 134913) affected shaders:
CodeSize: 2008016 -> 2007340 (-0.03%)

Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16011>
2022-04-19 15:18:58 +00:00
Georg Lehmann a8b29094c2 aco: Remove some old comments in aco_opcodes.py.
s_cmovk_i32 isn't GFX8_GFX9 only and s_version doesn't need a comment to say
it's GFX10+ exclusive. The encoding list is enough to provide this information,
as for other GFX10+ instructions.

Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16006>
2022-04-18 15:59:38 +00:00
Rhys Perry 63e40adf8c aco: fix disassembly of SMEM with both SGPR and constant offset
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15890>
2022-04-14 20:58:36 +00:00
Rhys Perry c883abda76 aco: implement load_shared2_amd/store_shared2_amd
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13778>
2022-04-13 23:08:07 +00:00
Rhys Perry 5aa5af7776 aco: handle read2st64/write2st64 in optimizer
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13778>
2022-04-13 23:08:07 +00:00
Rhys Perry 2135c88d9c aco: fix signedness of DS_instruction::offset0/1
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13778>
2022-04-13 23:08:07 +00:00
Daniel Schürmann d703a0e808 aco: remove register hints entirely
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15408>
2022-04-13 21:52:43 +00:00
Daniel Schürmann 2fe005a3fe aco: remove occurences of VCC hint
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15408>
2022-04-13 21:52:43 +00:00
Daniel Schürmann b10c4d7dee aco: make program->needs_vcc independent of VCC hints
Totals from 5 (0.00% of 135048) affected shaders: (GFX9)
SGPRs: 208 -> 160 (-23.08%)
CodeSize: 2700 -> 2692 (-0.30%)
Instrs: 533 -> 531 (-0.38%)
Latency: 41688 -> 41680 (-0.02%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15408>
2022-04-13 21:52:43 +00:00
Daniel Schürmann 415a3820fc aco/ra: omit VCC affinity on VOPC_SDWA for GFX9+
VOPC_SDWA can also use arbitrary SGPR pairs on GFX9+.

Totals from 5607 (4.16% of 134913) affected shaders: (GFX10.3)
CodeSize: 42470760 -> 42452988 (-0.04%)
Instrs: 7943174 -> 7942883 (-0.00%)
Latency: 102887029 -> 102886305 (-0.00%); split: -0.00%, +0.00%
InvThroughput: 20454456 -> 20454338 (-0.00%); split: -0.00%, +0.00%
Copies: 376818 -> 376865 (+0.01%); split: -0.00%, +0.01%

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15408>
2022-04-13 21:52:43 +00:00
Daniel Schürmann 6ebc61d71b aco/ra: create VCC-affinities during RA
instead of using register hints.

Totals from 88367 (65.50% of 134913) affected shaders: (GFX10.3)
CodeSize: 322492184 -> 322252912 (-0.07%); split: -0.08%, +0.01%
Instrs: 60615809 -> 60541260 (-0.12%); split: -0.12%, +0.00%
Latency: 557067980 -> 557009210 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 109676757 -> 109674804 (-0.00%); split: -0.00%, +0.00%
SClause: 1939703 -> 1939924 (+0.01%); split: -0.01%, +0.02%
Copies: 4557567 -> 4487530 (-1.54%); split: -1.54%, +0.00%
Branches: 1941123 -> 1937453 (-0.19%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15408>
2022-04-13 21:52:43 +00:00
Daniel Schürmann 44fb9ba84a aco/ra: only use VCC if program->needs_vcc == true
A future commit will make VCC register assignment independent
from register hints. Up to GFX9, VCC can alternatively be used
as regular SGPR, so prevent overlap.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15408>
2022-04-13 21:52:43 +00:00
Rhys Perry 7478b00c7c aco: remove old global access intrinsics
No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14124>
2022-04-13 16:23:35 +00:00
Rhys Perry 9d1bab3615 aco: increase global_load_params.max_const_offset_plus_one
The callback now supports this. This shouldn't have any effect yet except
on GFX6 with 12 byte loads.

fossil-db (Pitcairn):
Totals from 246 (0.18% of 135668) affected shaders:
VGPRs: 14684 -> 14768 (+0.57%); split: -0.44%, +1.01%
CodeSize: 1765792 -> 1738040 (-1.57%)
Instrs: 344605 -> 340055 (-1.32%)
Latency: 4892904 -> 4861942 (-0.63%)
InvThroughput: 2479599 -> 2446070 (-1.35%)
VClause: 8782 -> 8735 (-0.54%)
SClause: 9854 -> 9853 (-0.01%)
Copies: 47327 -> 45401 (-4.07%); split: -4.08%, +0.01%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14124>
2022-04-13 16:23:35 +00:00
Rhys Perry 3e9517c757 aco: implement _amd global access intrinsics
fossil-db (Sienna Cichlid):
Totals from 7 (0.01% of 134621) affected shaders:
VGPRs: 760 -> 776 (+2.11%)
CodeSize: 222000 -> 222044 (+0.02%); split: -0.01%, +0.03%
Instrs: 40959 -> 40987 (+0.07%); split: -0.01%, +0.08%
Latency: 874811 -> 886609 (+1.35%); split: -0.00%, +1.35%
InvThroughput: 437405 -> 443303 (+1.35%); split: -0.00%, +1.35%
VClause: 1242 -> 1240 (-0.16%)
SClause: 1050 -> 1049 (-0.10%); split: -0.19%, +0.10%
Copies: 4953 -> 4973 (+0.40%); split: -0.04%, +0.44%
Branches: 1947 -> 1957 (+0.51%); split: -0.05%, +0.56%
PreVGPRs: 741 -> 747 (+0.81%)

fossil-db changes seem to be noise.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14124>
2022-04-13 16:23:35 +00:00
Rhys Perry 391bf3ea30 aco: don't expand smem/mubuf global loads
For example, dwordx3->dwordx4 or ubyte3->dwordx2.

Global loads don't have the bounds checking that buffer loads have that
makes this safe.

The alignment checks are added to global_load_callback() in case
byte_align_loads=false, align=1 and bytes_needed=3. Without them, the
callback will create a dword load.

fossil-db (Sienna Cichlid):
Totals from 267 (0.20% of 134621) affected shaders:
CodeSize: 1603352 -> 1606568 (+0.20%)
Instrs: 294946 -> 295482 (+0.18%); split: -0.00%, +0.18%
Latency: 2997003 -> 2997052 (+0.00%); split: -0.02%, +0.02%
InvThroughput: 526645 -> 526659 (+0.00%)
SClause: 9179 -> 9185 (+0.07%); split: -0.02%, +0.09%
Copies: 25363 -> 25375 (+0.05%); split: -0.08%, +0.13%
Branches: 8298 -> 8299 (+0.01%)

fossil-db (Polaris10):
Totals from 267 (0.20% of 135668) affected shaders:
CodeSize: 1636672 -> 1638756 (+0.13%); split: -0.00%, +0.13%
Instrs: 308484 -> 308733 (+0.08%); split: -0.01%, +0.09%
Latency: 3446045 -> 3446904 (+0.02%); split: -0.00%, +0.03%
InvThroughput: 1206722 -> 1206828 (+0.01%); split: -0.00%, +0.01%
SClause: 9308 -> 9311 (+0.03%); split: -0.08%, +0.11%
Copies: 36933 -> 36921 (-0.03%); split: -0.08%, +0.05%

fossil-db (Pitcairn):
Totals from 275 (0.20% of 135668) affected shaders:
SGPRs: 17616 -> 17520 (-0.54%); split: -0.64%, +0.09%
VGPRs: 15428 -> 15540 (+0.73%); split: -0.23%, +0.96%
CodeSize: 1885792 -> 1929120 (+2.30%); split: -0.00%, +2.30%
MaxWaves: 1284 -> 1285 (+0.08%)
Instrs: 368963 -> 376095 (+1.93%); split: -0.00%, +1.94%
Latency: 5122922 -> 5168398 (+0.89%); split: -0.01%, +0.90%
InvThroughput: 2562866 -> 2604279 (+1.62%)
VClause: 9268 -> 9296 (+0.30%); split: -0.13%, +0.43%
SClause: 10702 -> 10705 (+0.03%); split: -0.05%, +0.07%
Copies: 48620 -> 50629 (+4.13%); split: -0.08%, +4.21%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14124>
2022-04-13 16:23:35 +00:00
Rhys Perry 6baad09711 aco: use saddr for global access with sgpr address
fossil-db (Sienna Cichlid):
Totals from 38 (0.03% of 134621) affected shaders:
CodeSize: 237196 -> 237060 (-0.06%); split: -0.09%, +0.03%
Instrs: 43895 -> 43894 (-0.00%); split: -0.02%, +0.01%
Latency: 914633 -> 916263 (+0.18%); split: -0.01%, +0.19%
InvThroughput: 468215 -> 468971 (+0.16%); split: -0.02%, +0.18%
SClause: 1239 -> 1242 (+0.24%)
PreSGPRs: 997 -> 1003 (+0.60%)
PreVGPRs: 936 -> 923 (-1.39%); split: -1.50%, +0.11%

Regression seems to be RA noise, creating a waitcnt.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14124>
2022-04-13 16:23:35 +00:00
Rhys Perry d957730b9b aco: use vcc for 64-bit vgpr addition
fossil-db (Sienna Cichlid):
Totals from 229 (0.17% of 134621) affected shaders:
CodeSize: 1520192 -> 1517644 (-0.17%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14124>
2022-04-13 16:23:35 +00:00
Timur Kristóf b874a2ed61 aco: Fix VOP2 instruction format in visit_tex.
There was a v_or_b32 that accidentally used SOP2.
It should use VOP2.

Issue found by looking at a gfxreconstruct trace posted by a user
in this bug: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5838

Cc: mesa-stable
Fixes: 93c8ebfa78 "aco: Initial commit of independent AMD compiler"

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15923>
2022-04-13 13:06:49 +00:00
Rhys Perry 773c7cbcbc radv,aco: implement 64-bit inline push constants
fossil-db (Sienna Cichlid):
Totals from 21 (0.02% of 134621) affected shaders:
CodeSize: 1932 -> 1560 (-19.25%)
Instrs: 357 -> 303 (-15.13%)
Latency: 6576 -> 5883 (-10.54%)
InvThroughput: 26304 -> 23532 (-10.54%)
SClause: 42 -> 24 (-42.86%)
Copies: 90 -> 105 (+16.67%); split: -10.00%, +26.67%
PreSGPRs: 144 -> 201 (+39.58%)

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12145>
2022-04-12 11:44:30 +00:00
Rhys Perry 7f6262bb85 radv: allow holes in inline push constants
Use a dword mask instead of a range to track which push constants to
inline.

fossil-db (Sienna Cichlid):
Totals from 5724 (4.25% of 134621) affected shaders:
CodeSize: 20894044 -> 20815748 (-0.37%); split: -0.39%, +0.02%
Instrs: 4002568 -> 3988385 (-0.35%); split: -0.38%, +0.02%
Latency: 29285060 -> 29224414 (-0.21%); split: -0.22%, +0.01%
InvThroughput: 5529700 -> 5526893 (-0.05%); split: -0.05%, +0.00%
VClause: 78093 -> 78240 (+0.19%); split: -0.23%, +0.41%
SClause: 135495 -> 131027 (-3.30%); split: -3.30%, +0.00%
Copies: 330856 -> 324552 (-1.91%); split: -2.37%, +0.46%
PreSGPRs: 226031 -> 224778 (-0.55%); split: -0.61%, +0.05%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12145>
2022-04-12 11:44:30 +00:00
Erik Faye-Lund 28dbabec8e aco: do not use designated initializers
Designated initializers are a C++20 feature, but we don't use C++20. In
fact, enabling C++20 for ACO triggers new compiler errors due to some
equality semantics details.

So let's instead stop using designated initializers here.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15706>
2022-04-05 16:58:56 +00:00
Rhys Perry 5b4e41e4db aco: don't use v_mad_mix on GFX9 if 16-bit denormals must be preserved
This probably effectively disables the v_mad_mix optimization on GFX9.

fossil-db (Vega):
Totals from 11545 (7.15% of 161366) affected shaders:
MaxWaves: 43025 -> 42780 (-0.57%); split: +0.06%, -0.63%
Instrs: 18571635 -> 18734201 (+0.88%); split: -0.00%, +0.88%
CodeSize: 96483568 -> 96611012 (+0.13%); split: -0.11%, +0.24%
SGPRs: 1079056 -> 1077616 (-0.13%); split: -0.14%, +0.01%
VGPRs: 819248 -> 821868 (+0.32%); split: -0.04%, +0.36%
SpillSGPRs: 13313 -> 12464 (-6.38%)
Latency: 293804093 -> 295046122 (+0.42%); split: -0.09%, +0.51%
InvThroughput: 110002239 -> 110994978 (+0.90%); split: -0.03%, +0.93%
VClause: 342458 -> 342596 (+0.04%); split: -0.12%, +0.16%
SClause: 648566 -> 648046 (-0.08%); split: -0.12%, +0.04%
Copies: 1728225 -> 1726679 (-0.09%); split: -0.66%, +0.57%
Branches: 552973 -> 552963 (-0.00%); split: -0.02%, +0.02%
PreSGPRs: 862360 -> 856820 (-0.64%); split: -0.69%, +0.05%
PreVGPRs: 773689 -> 776818 (+0.40%); split: -0.02%, +0.42%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6178
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15718>
2022-04-04 19:27:12 +00:00
Daniel Schürmann 6d383159d4 aco/optimizer: check recursively if we can eliminate s_and exec
Totals from 2860 (2.12% of 134913) affected shaders: (GFX10.3)
CodeSize: 5990728 -> 5979164 (-0.19%); split: -0.20%, +0.01%
Instrs: 1094562 -> 1091653 (-0.27%); split: -0.28%, +0.01%
Latency: 8689841 -> 8684523 (-0.06%); split: -0.07%, +0.00%
InvThroughput: 1840533 -> 1840527 (-0.00%); split: -0.00%, +0.00%
SClause: 51437 -> 51439 (+0.00%)
Copies: 82461 -> 82472 (+0.01%)
PreSGPRs: 83136 -> 83172 (+0.04%)

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15675>
2022-04-01 15:35:26 +02:00
Daniel Schürmann db8c401f71 aco/ra: fix stride check on subdword parallelcopies for create_vector
On GFX6/7, info.rc is in full dwords.

Fixes: 9476986e6f ('aco/ra: special-case get_reg_for_create_vector_copy()')
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15393>
2022-03-31 08:55:07 +00:00
Samuel Pitoiset d32656bc65 radv: lower has_multiview_view_index in NIR
This lowering is done in a new NIR pass where the layer is written
before emit_vertex_with_counter for geometry shaders and after the
position for other vertex stages.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15456>
2022-03-31 07:34:21 +00:00
Georg Lehmann 141ca78634 radv, aco: Packed iadd_sat/uadd_sat.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15421>
2022-03-28 20:02:52 +00:00
Georg Lehmann 50f585254c aco: Implement scalar iadd_sat.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15421>
2022-03-28 20:02:52 +00:00
Georg Lehmann 989f2b1785 aco: Implement 64bit uadd_sat.
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15421>
2022-03-28 20:02:52 +00:00