Samuel Pitoiset
e4c8491bdf
radv: implement VK_KHR_separate_depth_stencil_layouts
...
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-12-10 13:16:17 +01:00
Samuel Pitoiset
48ee62178f
radv: initialize HTILE for separate depth/stencil aspects
...
It either clears the whole HTILE buffer or part of it depending
on the HTILE mask parameter.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-12-10 13:09:29 +01:00
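As a rough illustration of the masked initialization described in 48ee62178f: a full clear and a partial clear can share one code path if the mask selects which bits are overwritten. The helper name, the 32-bit word layout, and the mask values below are assumptions made for this sketch, not the driver's actual code.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: clear an HTILE-like buffer of 32-bit words.
// Bits selected by `mask` take the corresponding bits of `value`;
// all other bits keep their previous contents.
// A mask of UINT32_MAX degenerates to a plain fill of the whole buffer.
inline void clear_htile_masked(std::vector<uint32_t> &htile,
                               uint32_t value, uint32_t mask)
{
    for (uint32_t &word : htile)
        word = (word & ~mask) | (value & mask);
}
```

With a full mask every word is replaced; with a narrower mask only the selected bits change, which is the "part of it" case in the commit message.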
Samuel Pitoiset
41cebfc9c1
radv: do not init HTILE as compressed state when dst layout allows it
...
I don't think this makes much difference, and a potential clear
following the initialization will overwrite HTILE anyway.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-12-10 13:09:26 +01:00
Samuel Pitoiset
b603cc8c84
radv: synchronize after performing separate depth/stencil fast clears
...
For depth+stencil images, the driver might use an optimized path
if only one aspect is cleared. It either clears the depth or the
stencil part of HTILE. Because the two separate aspects might use
the same HTILE memory we have to synchronize.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-12-10 13:09:22 +01:00
Samuel Pitoiset
008fe909ca
radv: fix possibly wrong PA_SC_AA_CONFIG value for conservative rast
...
PA_SC_AA_CONFIG might be updated when conservative rasterization is
enabled. Because the driver only re-emits the multisample state if
the number of samples is different, that register value might not
be updated correctly.
Found by inspection, doesn't fix anything known.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-12-10 11:04:43 +01:00
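One way to picture the problem described in 008fe909ca: if the "does this state need re-emitting?" check looks only at the sample count, a change in conservative rasterization goes unnoticed. The struct and function names below are hypothetical and only model the comparison; they are not taken from radv.

```cpp
#include <cstdint>

// Hypothetical snapshot of the inputs that feed the register value.
struct MsaaState {
    uint32_t num_samples;
    bool conservative_rast;
};

// Check that compares only the sample count: it misses a change in
// conservative rasterization when the sample count stays the same.
constexpr bool needs_reemit_samples_only(const MsaaState &prev,
                                         const MsaaState &next)
{
    return prev.num_samples != next.num_samples;
}

// Check that compares every input the register depends on.
constexpr bool needs_reemit(const MsaaState &prev, const MsaaState &next)
{
    return prev.num_samples != next.num_samples ||
           prev.conservative_rast != next.conservative_rast;
}
```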
Samuel Pitoiset
4f659224c8
radv: move emission of two PA_SC_* registers to the pipeline CS
...
They don't have to be updated dynamically.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-12-10 11:04:40 +01:00
Samuel Pitoiset
86dfe92bd0
radv: do not use VK_TRUE/VK_FALSE
...
For consistency.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2019-12-09 09:21:26 +01:00
Daniel Schürmann
8259c97b2d
aco: propagate temporaries into expanded vectors
...
Gives a very slight decrease in code size:
Totals from affected shaders:
Code Size: 1708488 -> 1702768 (-0.33 %) bytes
Max Waves: 2858 -> 2855 (-0.10 %)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
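The percentages in statistics like those of 8259c97b2d follow from a plain relative-change formula. The helper below is a small sketch with a made-up name, useful for sanity-checking the reported numbers.

```cpp
#include <cstdint>

// Relative change from `before` to `after`, in percent.
// Negative values mean the metric decreased.
inline double percent_change(uint64_t before, uint64_t after)
{
    return (static_cast<double>(after) - static_cast<double>(before)) *
           100.0 / static_cast<double>(before);
}
```

For example, `percent_change(1708488, 1702768)` is roughly -0.33, matching the code size line above.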
Daniel Schürmann
df3e674fb3
aco: improve readfirstlane after uniform ssbo loads on GFX7
...
pipeline-db changes for GFX7:
80310 shaders in 40472 tests
Totals:
SGPRS: 3655900 -> 3643916 (-0.33 %)
VGPRS: 2678324 -> 2686324 (0.30 %)
Spilled SGPRs: 1730 -> 1634 (-5.55 %)
Spilled VGPRs: 14 -> 21 (50.00 %)
Scratch size: 15540 -> 15536 (-0.03 %) dwords per thread
Code Size: 136106120 -> 135457616 (-0.48 %) bytes
LDS: 1259 -> 1259 (0.00 %) blocks
Max Waves: 601014 -> 600206 (-0.13 %)
Totals from affected shaders:
SGPRS: 307832 -> 295848 (-3.89 %)
VGPRS: 267864 -> 275864 (2.99 %)
Spilled SGPRs: 770 -> 674 (-12.47 %)
Spilled VGPRs: 14 -> 21 (50.00 %)
Scratch size: 16 -> 12 (-25.00 %) dwords per thread
Code Size: 22007488 -> 21358984 (-2.95 %) bytes
LDS: 65 -> 65 (0.00 %) blocks
Max Waves: 28668 -> 27860 (-2.82 %)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
0837471463
aco: use soffset for MUBUF instructions on SI/CI
...
pipeline-db changes for GFX7:
80310 shaders in 40472 tests
Totals:
SGPRS: 3655300 -> 3655900 (0.02 %)
VGPRS: 2677732 -> 2678324 (0.02 %)
Spilled SGPRs: 1730 -> 1730 (0.00 %)
Spilled VGPRs: 14 -> 14 (0.00 %)
Scratch size: 15540 -> 15540 (0.00 %) dwords per thread
Code Size: 136488364 -> 136106120 (-0.28 %) bytes
LDS: 1259 -> 1259 (0.00 %) blocks
Max Waves: 601039 -> 601014 (-0.00 %)
Totals from affected shaders:
SGPRS: 316312 -> 316912 (0.19 %)
VGPRS: 273844 -> 274436 (0.22 %)
Spilled SGPRs: 770 -> 770 (0.00 %)
Spilled VGPRs: 14 -> 14 (0.00 %)
Scratch size: 16 -> 16 (0.00 %) dwords per thread
Code Size: 22724904 -> 22342660 (-1.68 %) bytes
LDS: 114 -> 114 (0.00 %) blocks
Max Waves: 30861 -> 30836 (-0.08 %)
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
7b38d95b32
radv: Enable ACO on GFX7 (Sea Islands)
...
This patch also disables AMD_shader_ballot on GFX7 by default if ACO is used.
Note that shader_ballot works correctly, but performance seems inferior.
To enable shader_ballot use RADV_PERFTEST=shader_ballot.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
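The opt-in in 7b38d95b32 is driven by the RADV_PERFTEST environment variable. Assuming the variable holds a comma-separated list of flag names, a check for one flag could look like the sketch below. The function names are hypothetical and this is not how Mesa parses the variable internally.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string_view>

// Returns true if `name` appears as a whole entry in the comma-separated `list`.
inline bool has_flag(std::string_view list, std::string_view name)
{
    std::size_t pos = 0;
    while (pos <= list.size()) {
        std::size_t end = list.find(',', pos);
        if (end == std::string_view::npos)
            end = list.size();
        if (list.substr(pos, end - pos) == name)
            return true;
        pos = end + 1;
    }
    return false;
}

// Hypothetical wrapper: reads RADV_PERFTEST from the environment.
inline bool perftest_enabled(std::string_view name)
{
    const char *env = std::getenv("RADV_PERFTEST");
    return env != nullptr && has_flag(env, name);
}
```

Under this model, launching an application with `RADV_PERFTEST=shader_ballot` set would make `perftest_enabled("shader_ballot")` return true.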
Daniel Schürmann
28c95cc402
aco: return to loop_active mask at continue_or_break blocks
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
0f9447ccb0
radv: disable Youngblood app profile if ACO is used
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
746165e540
aco: implement exclusive scan for SI/CI
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
7ae227effd
aco: implement inclusive_scan for SI/CI
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
f895a8b1df
aco: implement (clustered) reductions for SI/CI
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
9254fb4fc7
aco: don't use a scalar temporary for reductions on GFX10
...
This patch also adds the scalar temporary for scans on SI/CI.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
8ad43d8838
aco: flush denorms after fmin/fmax on pre-GFX9
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
21f67a3bdc
radv: only flush scalar cache for SSBO writes with ACO on GFX8+
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
79ce6c1b33
aco: disable disassembly for SI/CI due to lack of support by LLVM
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
1c4afe38f2
aco: implement 64bit ine/ieq for SI/CI
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
1e1356b2ad
aco: implement 64bit i2b for SI/CI
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
da7ff58835
aco: make 1/2*PI a literal constant on SI/CI
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
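For da7ff58835, the value in question is 1/(2π) as a 32-bit float. My understanding, which the log itself does not state, is that SI/CI lack a hardware inline constant for it, so the value has to be encoded as a literal. The sketch below only computes that bit pattern; the function names are made up for illustration.

```cpp
#include <cstdint>
#include <cstring>

// Reinterpret a float as its IEEE-754 bit pattern.
inline uint32_t float_bits(float f)
{
    uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}

// 1/(2*pi) computed in double precision, then rounded to float.
inline uint32_t inv_two_pi_literal()
{
    const double pi = 3.14159265358979323846;
    return float_bits(static_cast<float>(1.0 / (2.0 * pi)));
}
```

On an IEEE-754 platform this yields `0x3e22f983`, which is about 0.15915494.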
Daniel Schürmann
90fad7360d
aco: implement 64bit VGPR shifts for SI/CI
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
6a586a6006
aco: split read/writelane opcode into VOP2/VOP3 version for SI/CI
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
23319add93
aco: fix disassembly of writelane instructions.
...
ACO writes an unused 3rd operand for internal usage
which makes LLVM recognize it as an illegal instruction.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
6fc9ddfef8
aco: recognize SI/CI SMRD hazards
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
3eed4d2be5
aco: implement quad swizzles for SI/CI
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
bde9c1e3a1
aco: move buffer_store data to VGPR if needed
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
a8195bdf2e
aco: implement nir_op_isign on SI/CI
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
b8783973cd
aco: only use scalar loads for readonly buffers on SI/CI
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
f27783a667
aco: implement nir_op_fquantize2f16 for SI/CI
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
caea4bbfdc
aco: fix SMEM offsets for SI/CI
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
8aab92b393
aco: SI/CI - fix sampler aniso
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Dave Airlie
9b533a2ca3
aco: handle gfx7 int8/10 clamping on exports
...
Co-authored-by: Daniel Schürmann <daniel@schuermann.dev>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
0d42e4d7a0
aco: Initial GFX7 Support
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Daniel Schürmann
3177346bfc
aco: refactor visit_store_fs_output() to use the Builder
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
2019-12-07 11:23:11 +01:00
Timur Kristóf
637c5a1dd9
aco/wave32: Fix reductions.
...
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-04 10:36:01 +00:00
Timur Kristóf
21db083504
aco/wave32: Allow setting the subgroup ballot size to 64-bit.
...
Previously, it would only work when the ballot size was set to the
lane mask. This patch makes it possible to set the ballot size
to either 32-bit or 64-bit for wave32 mode.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-04 10:36:01 +00:00
Timur Kristóf
ed815d503e
aco/wave32: Use wave_size for barrier intrinsic.
...
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-04 10:36:01 +00:00
Timur Kristóf
b8f2edb452
aco/wave32: Fix load_local_invocation_index to support wave32.
...
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-04 10:36:01 +00:00
Timur Kristóf
e0bcefc3a0
aco/wave32: Use lane mask regclass for exec/vcc.
...
Currently all usages of exec and vcc are hardcoded to use s2 regclass.
This commit makes it possible to use s1 in wave32 mode and
s2 in wave64 mode.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-04 10:36:01 +00:00
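The s1/s2 choice in e0bcefc3a0 comes down to how many 32-bit scalar registers a lane mask needs: one bit per lane, so one dword for wave32 and two for wave64. The helpers below are a simplified model with hypothetical names, not the ACO register class API.

```cpp
#include <cstdint>

// Number of 32-bit SGPRs needed to hold one bit per lane.
constexpr unsigned lane_mask_dwords(unsigned wave_size)
{
    return wave_size == 64 ? 2u : 1u;
}

// Mask with one bit set for every lane of the wave.
constexpr uint64_t full_lane_mask(unsigned wave_size)
{
    return wave_size == 64 ? ~0ull : 0xFFFFFFFFull;
}
```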
Timur Kristóf
b4efe179ed
aco/wave32: Add wave size specific opcodes to aco_builder.
...
In several places in ACO we use SOP1 or SOP2 instructions to operate over the
exec mask or VCC, and these need to be adapted to the new size in wave32
mode.
This commit adds a way to deal with this problem in aco_builder: the caller
can specify a wave size specific opcode and the builder will translate that
to the correct opcode based on the current wave size.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-04 10:36:01 +00:00
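The translation described in b4efe179ed can be pictured as a lookup from a wave-size-agnostic opcode to its `_b32` or `_b64` variant. The enum and function below are invented for this sketch and do not reflect the real aco_builder interface; only the resulting instruction names are actual ISA mnemonics.

```cpp
#include <string>

// Hypothetical wave-size-agnostic opcodes; the enum and its members are
// illustrative and not the names used in aco_builder.
enum class WaveOp { And, Or, AndN2 };

// Pick the 32-bit or 64-bit scalar instruction for the current wave size.
inline std::string select_opcode(WaveOp op, unsigned wave_size)
{
    const char *suffix = wave_size == 32 ? "_b32" : "_b64";
    switch (op) {
    case WaveOp::And:   return std::string("s_and") + suffix;
    case WaveOp::Or:    return std::string("s_or") + suffix;
    case WaveOp::AndN2: return std::string("s_andn2") + suffix;
    }
    return std::string();
}
```

The caller states intent once (for example "AND two lane masks") and the wave size decides the concrete instruction, so call sites do not need their own wave32/wave64 branches.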
Timur Kristóf
c44af6cbc7
aco/wave32: Introduce emit_mbcnt which takes wave size into account.
...
This is relevant because in wave32 mode the v_mbcnt_hi_u32_b32
instruction is superfluous.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-04 10:36:01 +00:00
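To see why the high half is superfluous in wave32 (c44af6cbc7), here is a software model of the masked bit count: the low instruction counts set mask bits below the current lane among lanes 0..31, and the high instruction does the same for lanes 32..63. All names are hypothetical, and this models the arithmetic only, not ACO's instruction emission.

```cpp
#include <cstdint>

// Portable population count.
inline uint32_t popcount32(uint32_t v)
{
    uint32_t n = 0;
    while (v) {
        v &= v - 1; // clear the lowest set bit
        ++n;
    }
    return n;
}

// Model of the low half: count bits of mask_lo below lane_id, plus acc.
inline uint32_t mbcnt_lo(uint32_t mask_lo, uint32_t lane_id, uint32_t acc)
{
    uint32_t below = lane_id >= 32 ? 0xFFFFFFFFu : ((1u << lane_id) - 1u);
    return acc + popcount32(mask_lo & below);
}

// Model of the high half: count bits of mask_hi (lanes 32..63) below lane_id.
inline uint32_t mbcnt_hi(uint32_t mask_hi, uint32_t lane_id, uint32_t acc)
{
    uint32_t below = lane_id <= 32 ? 0u
                   : lane_id >= 64 ? 0xFFFFFFFFu
                                   : ((1u << (lane_id - 32)) - 1u);
    return acc + popcount32(mask_hi & below);
}

// In wave32 no lane has an id above 31, so the high half would always add 0
// and can be skipped.
inline uint32_t emit_mbcnt_model(uint64_t mask, uint32_t lane_id,
                                 unsigned wave_size)
{
    uint32_t count = mbcnt_lo(static_cast<uint32_t>(mask), lane_id, 0);
    if (wave_size == 64)
        count = mbcnt_hi(static_cast<uint32_t>(mask >> 32), lane_id, count);
    return count;
}
```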
Timur Kristóf
07754a9c9e
aco/wave32: Replace hardcoded numbers in spiller with wave size.
...
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-04 10:36:01 +00:00
Timur Kristóf
c0dbf42a03
aco/wave32: Change uniform bool optimization to work with wave32.
...
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-04 10:36:01 +00:00
Timur Kristóf
dd9dad731b
aco: Optimize load_subgroup_id to one bit field extract instruction.
...
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-04 10:36:01 +00:00
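The optimization in dd9dad731b relies on a bit field extract doing the work of a shift followed by a mask. The sketch below models that equivalence in plain C++. The offsets and widths used in any example call are arbitrary, since the log does not say which bits hold the subgroup id.

```cpp
#include <cstdint>

// Two-step form: shift the field down, then mask off the upper bits.
// Assumes offset < 32 and 0 < width <= 32.
constexpr uint32_t shift_then_mask(uint32_t value, unsigned offset,
                                   unsigned width)
{
    uint32_t shifted = value >> offset;
    uint32_t mask = width >= 32 ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return shifted & mask;
}

// Model of an unsigned bit field extract: one operation that yields
// `width` bits of `value` starting at bit `offset`.
constexpr uint32_t bfe_u32(uint32_t value, unsigned offset, unsigned width)
{
    return width >= 32 ? (value >> offset)
                       : ((value >> offset) & ((1u << width) - 1u));
}
```

Both functions return the same result for any valid input; the point of the commit is that the hardware can do it in a single instruction.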
Timur Kristóf
753670e902
aco: Remove lower_linear_bool_phi, it is not needed anymore.
...
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-04 10:36:01 +00:00
Timur Kristóf
0d2d672020
aco: Remove superfluous argument from emit_boolean_logic.
...
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-04 10:36:01 +00:00
Timur Kristóf
9a43d26b74
aco: Fix operand of s_bcnt1_i32_b64 in emit_boolean_reduce.
...
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
2019-12-04 10:36:01 +00:00