Commit Graph

7 Commits

Author SHA1 Message Date
Georg Lehmann
ad9c340d86 aco: insert VALU s_delay_alu for WMMA
This should avoid some SIMD stalls.

I think this special case was added to try to handle this case:

First Instruction: WMMA
Second Instruction: WMMA instruction with same VGPR of previous WMMA instruction’s Matrix D as Matrix C
Stall if the first and second instruction are not the same type of WMMA or use ABS/NEG on SRC2 of the second instruction

If I read it correctly, we shouldn't need a delay if both WMMAs have the same
type and the second one uses no ABS/NEG modifier on SRC2. That's fairly complex
to handle, so leave it for now. Not inserting any delay at all likely hurts more
than inserting one that is sometimes unnecessary.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36328>
2025-07-29 05:48:29 +00:00
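
The stall rule quoted in ad9c340d86 can be sketched as a small predicate. This is an illustrative Python model, not ACO code: the `Wmma` class, its fields, and both function names are hypothetical, and each matrix is reduced to a single starting VGPR number for simplicity. The second function models the conservative choice the message describes, assuming the pass delays every WMMA that consumes the previous WMMA's result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Wmma:
    """Hypothetical model of a WMMA instruction (not an ACO type)."""
    kind: str               # illustrative type label, e.g. "f16" or "bf16"
    dst_d: int              # first VGPR of matrix D (result)
    src_c: int              # first VGPR of matrix C (SRC2)
    src2_abs: bool = False  # ABS modifier on SRC2
    src2_neg: bool = False  # NEG modifier on SRC2


def stalls_per_quoted_rule(first: Wmma, second: Wmma) -> bool:
    """Stall condition as quoted in the commit message."""
    # The rule only concerns a second WMMA that reads the first one's D as C.
    if second.src_c != first.dst_d:
        return False
    # Stall if the types differ or SRC2 of the second WMMA uses ABS/NEG.
    return first.kind != second.kind or second.src2_abs or second.src2_neg


def needs_delay_conservative(first: Wmma, second: Wmma) -> bool:
    """Assumed conservative policy: delay on any D -> C dependency."""
    return second.src_c == first.dst_d
```

Under this model, two dependent WMMAs of the same type without modifiers get a delay from the conservative policy even though the quoted rule predicts no stall, which is the imprecision the message accepts for now.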
Georg Lehmann
ec11cfc69d aco/insert_delay_alu: do not delay lane mask fast forwarding
The delay actually hurts performance in this case.

Foz-DB Navi31:
Totals from 30340 (38.21% of 79395) affected shaders:
Instrs: 30778999 -> 30726605 (-0.17%); split: -0.17%, +0.00%
CodeSize: 162380180 -> 162170808 (-0.13%); split: -0.13%, +0.00%
Latency: 228185562 -> 228186976 (+0.00%); split: -0.00%, +0.00%
InvThroughput: 39001151 -> 39000897 (-0.00%); split: -0.00%, +0.00%

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31132>
2024-10-17 11:16:16 +00:00
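
For readers checking the Foz-DB totals above, the percentages can be reproduced with plain arithmetic. The sketch below assumes each change is reported relative to the "before" value and rounded to two decimals; the helper names are hypothetical and are not part of any fossil-db tooling.

```python
def pct_change(before: int, after: int) -> float:
    """Relative change from before to after, in percent (two decimals)."""
    return round((after - before) / before * 100, 2)


def pct_of(part: int, total: int) -> float:
    """Share of total represented by part, in percent (two decimals)."""
    return round(part / total * 100, 2)


# Figures taken from the ec11cfc69d report above.
instrs = pct_change(30778999, 30726605)      # Instrs line
code_size = pct_change(162380180, 162170808)  # CodeSize line
affected = pct_of(30340, 79395)               # affected shaders
```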
Georg Lehmann
e4889fd4b5 aco/insert_delay_alu: consider more implicit waits
Foz-DB Navi31:
Totals from 37961 (47.81% of 79395) affected shaders:
Instrs: 34175286 -> 33978599 (-0.58%)
CodeSize: 180059352 -> 179190076 (-0.48%); split: -0.48%, +0.00%
Latency: 259826196 -> 259798474 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 42792700 -> 42789298 (-0.01%); split: -0.01%, +0.00%

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31132>
2024-10-17 11:16:16 +00:00
Georg Lehmann
840b5841d3 aco: do not track ALU delay across jumps
This assumes that the best case jump latency is higher than the worst case
ALU latency.

Foz-DB Navi31:
Totals from 17720 (22.32% of 79395) affected shaders:
Instrs: 26009663 -> 25929989 (-0.31%); split: -0.31%, +0.00%
CodeSize: 136571496 -> 136254420 (-0.23%); split: -0.23%, +0.00%
Latency: 215731308 -> 215722059 (-0.00%); split: -0.01%, +0.00%
InvThroughput: 36534197 -> 36532070 (-0.01%); split: -0.01%, +0.00%

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31132>
2024-10-17 11:16:16 +00:00
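
The assumption stated in 840b5841d3 can be illustrated with a toy tracker. This is a simplified Python model, not the ACO pass: the class, its methods, the one-cycle-per-instruction timing, and the latency values are all hypothetical. The point it shows is that if every jump takes longer than the slowest ALU result, all pending ALU delays can be dropped at a jump.

```python
class AluDelayTracker:
    """Toy model of pending ALU results (hypothetical, not ACO code)."""

    def __init__(self):
        # Register name -> cycles remaining until its result is ready.
        self.pending = {}

    def _advance(self, cycles):
        # Age all pending results and drop those that have completed.
        self.pending = {
            reg: left - cycles
            for reg, left in self.pending.items()
            if left > cycles
        }

    def issue_alu(self, dst, latency):
        # Assume issuing an instruction takes one cycle.
        self._advance(1)
        self.pending[dst] = latency

    def jump(self):
        # Assumption from the commit message: the best-case jump latency
        # exceeds the worst-case ALU latency, so nothing is still pending.
        self.pending.clear()

    def delay_needed(self, src):
        # Cycles a consumer of src would still have to wait.
        return self.pending.get(src, 0)
```

For example, after `issue_alu("v0", 5)` a reader of `v0` would need a delay, but after a following `jump()` the tracker reports none, so no delay instruction would be emitted for it in this model.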
Georg Lehmann
977f435f4c aco/ir: add function to parse depctr waits
No Foz-DB changes on Navi31.

Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31132>
2024-10-17 11:16:16 +00:00
Rhys Perry
7b92e11e16 aco: forget valu delays after certain s_waitcnt_depctr/LDSDIR
fossil-db (navi31):
Totals from 55242 (69.58% of 79395) affected shaders:
Instrs: 40507666 -> 40138006 (-0.91%); split: -0.91%, +0.00%
CodeSize: 212516104 -> 211025880 (-0.70%); split: -0.70%, +0.00%
Latency: 281643258 -> 281628053 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 46370668 -> 46369637 (-0.00%); split: -0.00%, +0.00%

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23337>
2024-08-22 13:57:01 +00:00
Rhys Perry
807651561e aco: split insert_wait_states into two
No fossil-db changes.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Acked-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23337>
2024-08-22 13:57:00 +00:00