Timur Kristóf
e0bcefc3a0
aco/wave32: Use lane mask regclass for exec/vcc.
...
Currently all usages of exec and vcc are hardcoded to use s2 regclass.
This commit makes it possible to use s1 in wave32 mode and
s2 in wave64 mode.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
2019-12-04 10:36:01 +00:00
Daniel Schürmann
8861a82be7
aco: don't split live-ranges of linear VGPRs
...
Fixes: 93c8ebfa78 'aco: Initial commit of independent AMD compiler'
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
2019-11-29 21:54:27 +01:00
Daniel Schürmann
b6f5085dfe
aco: preserve kill flag on moved operands during RA
...
Fixes: 93c8ebfa78 aco: Initial commit of independent AMD compiler
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
2019-11-12 15:59:48 +00:00
Daniel Schürmann
a2a6880743
aco: fix invalid access on Pseudo_instructions
...
Fixes: 93c8ebfa78 aco: Initial commit of independent AMD compiler
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
2019-11-12 15:59:48 +00:00
Daniel Schürmann
8023dcd71e
aco: fix live-range splits of phis
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
2019-10-30 19:48:33 +00:00
Timur Kristóf
c52ebbcea4
aco: Introduce vgpr_limit to keep track of available VGPRs.
...
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
2019-10-28 23:52:50 +00:00
Rhys Perry
08d510010b
aco: increase accuracy of SGPR limits
...
SGPRs are allocated in groups of 16 on GFX8/GFX9. GFX10 allocates a fixed
number of SGPRs and has 106 addressable SGPRs.
pipeline-db (Vega):
SGPRS: 5912 -> 6232 (5.41 %)
VGPRS: 1772 -> 1780 (0.45 %)
Spilled SGPRs: 0 -> 0 (0.00 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 88228 -> 87904 (-0.37 %) bytes
LDS: 0 -> 0 (0.00 %) blocks
Max Waves: 559 -> 571 (2.15 %)
piepline-db (Navi):
SGPRS: 341256 -> 363384 (6.48 %)
VGPRS: 171536 -> 170960 (-0.34 %)
Spilled SGPRs: 832 -> 581 (-30.17 %)
Spilled VGPRs: 0 -> 0 (0.00 %)
Private memory VGPRs: 0 -> 0 (0.00 %)
Scratch size: 0 -> 0 (0.00 %) dwords per thread
Code Size: 14207332 -> 14190872 (-0.12 %) bytes
LDS: 33 -> 33 (0.00 %) blocks
Max Waves: 18072 -> 18251 (0.99 %)
v2: unconditionally count vcc as an extra sgpr on GFX10+
v3: pass SGPRs rounded to 8
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
2019-10-23 19:11:21 +01:00
Mauro Rossi
c24ad565ae
android: aco: fix undefined template 'std::__1::array' build errors
...
Fixes a few building errors similar to the following:
In file included from external/mesa/src/amd/compiler/aco_instruction_selection.cpp:26:
In file included from external/libcxx/include/algorithm:639:
external/libcxx/include/utility:321:9:
error: implicit instantiation of undefined template 'std::__1::array<aco::Temp, 4>'
_T2 second;
^
Fixes: 93c8ebf ("aco: Initial commit of independent AMD compiler")
Signed-off-by: Mauro Rossi <issor.oruam@gmail.com >
2019-09-28 15:56:23 +02:00
Daniel Schürmann
93c8ebfa78
aco: Initial commit of independent AMD compiler
...
ACO (short for AMD Compiler) is a new compiler backend with the goal to replace
LLVM for Radeon hardware for the RADV driver.
ACO currently supports only VS, PS and CS on VI and Vega.
There are some optimizations missing because of unmerged NIR changes
which may decrease performance.
Full commit history can be found at
https://github.com/daniel-schuermann/mesa/commits/backend
Co-authored-by: Daniel Schürmann <daniel@schuermann.dev >
Co-authored-by: Rhys Perry <pendingchaos02@gmail.com >
Co-authored-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
Co-authored-by: Connor Abbott <cwabbott0@gmail.com >
Co-authored-by: Michael Schellenberger Costa <mschellenbergercosta@googlemail.com >
Co-authored-by: Timur Kristóf <timur.kristof@gmail.com >
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com >
Acked-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl >
2019-09-19 12:10:00 +02:00