Commit Graph

1961 Commits

Author SHA1 Message Date
Samuel Pitoiset 0efbede949 ac: move push_constants to the ABI
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-12 11:54:07 +01:00
Samuel Pitoiset 460d3ce726 ac: move tg_size to the ABI
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-12 11:54:04 +01:00
Samuel Pitoiset 054c92190c ac/nir: remove unused nir_to_llvm_context:{defs,phis}
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-12 11:54:02 +01:00
Timothy Arceri ef8082baf8 ac: convert nir_op_f2f32 src to a float
Fixes the following piglit test:

./bin/arb_vertex_attrib_64bit-check-explicit-location -auto -fbo

Where we would end up with the nir such as:

	vec1 64 ssa_11 = pack_64_2x32_split ssa_9, ssa_10
	vec1 32 ssa_12 = f2f32 ssa_2

And our pack_64_2x32_split nir to llvm code always produces
a 64bit integer as output.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-02-10 10:46:28 +11:00
Timothy Arceri 1b1e5f8edf ac: fix some 64bit unpack asserts
Previously the asserts did not take swizzles into account.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-02-10 10:46:28 +11:00
Samuel Pitoiset 3a2bb4db23 ac/nir: compute correct number of user SGPRs on GFX9
For merged shaders.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
2018-02-09 10:16:04 +01:00
Timothy Arceri c77078c942 ac: pass struct ac_llvm_context to emit_membar()
Fixes segfault in piglit test:

./bin/arb_shader_image_load_store-shader-mem-barrier --quick -auto -fbo

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-02-09 12:51:27 +11:00
Timothy Arceri 12a2350e6d ac: add 64bit support to ac_find_lsb()
v2: use LLVMBuildTrunc()

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-02-09 09:42:59 +11:00
Timothy Arceri a9f6b392c7 ac: move get_elem_bits() to ac_llvm_build.c
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-02-09 09:42:59 +11:00
Timothy Arceri 19f9839f0b ac: add 64bit bitCount support
v2: use LLVMBuildTrunc()

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-02-09 09:42:59 +11:00
Samuel Pitoiset bb750d265c ac/nir: clean up handle_fs_outputs_post()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-08 22:14:33 +01:00
Samuel Pitoiset 528bc14fa5 ac/nir: add radv_load_output() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-08 22:14:30 +01:00
Samuel Pitoiset 834d9845ca ac/shader: scan info about output PS declarations
NIR->LLVM should only be a translation pass, and all scan stuff
should be done before.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-08 22:14:27 +01:00
Samuel Pitoiset a8e04e91de ac/nir: add radv_export_param() helper
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-08 22:14:26 +01:00
Samuel Pitoiset e3cfd6b805 ac/nir: remove set but unused export_mask
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-08 22:14:24 +01:00
Samuel Pitoiset 724136d590 ac/nir: remove dead code in handle_vs_outputs_post()
The memcpy can't be reached because the condition is always false.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-08 22:14:22 +01:00
Samuel Pitoiset c63d8d0284 ac/nir: remove useless check in si_llvm_init_export_args()
values can't be NULL because we use ac_build_export_null() now.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-08 22:14:20 +01:00
Samuel Pitoiset 26ab5a4269 ac/nir: use ac_build_export_null()
The number of enabled channels should be 0 when exporting null.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-08 22:11:44 +01:00
Samuel Pitoiset bd9f7b7635 ac: add ac_build_export_null() helper
Imported from RadeonSI.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-02-08 22:11:42 +01:00
Fredrik Höglund 5a38d8f103 radv: implement VK_EXT_external_memory_host
Ported from the radeonsi GL_AMD_pinned_memory implementation.

Signed-off-by: Fredrik Höglund <fredrik@kde.org>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-08 00:46:07 +01:00
Samuel Pitoiset 757d36ee70 ac/nir: use new pknorm_i16/u16 and pk_i16/u16 LLVM intrinsics
Ported from RadeonSI.

Only one F1 2017 shader is affected, code size decreased
from 532 to 488 on both Polaris10 and Vega10.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-07 12:42:13 +01:00
Samuel Pitoiset 2f54d7382d ac/nir: avoid loading unused VS input components
Polaris10:
Totals from affected shaders:
SGPRS: 122840 -> 120984 (-1.51 %)
VGPRS: 78812 -> 78440 (-0.47 %)
Spilled SGPRs: 177 -> 129 (-27.12 %)
Code Size: 2950028 -> 2941276 (-0.30 %) bytes
Max Waves: 17899 -> 17976 (0.43 %)

Vega10:
Totals from affected shaders:
SGPRS: 117144 -> 115776 (-1.17 %)
VGPRS: 77580 -> 77532 (-0.06 %)
Spilled SGPRs: 0 -> 152 (0.00 %)
Code Size: 3352656 -> 3347860 (-0.14 %) bytes
Max Waves: 19756 -> 19866 (0.56 %)

This increases SGPRs spilling a bit with Talos, but I have
some other ideas that might reduce it.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-07 12:42:09 +01:00
Samuel Pitoiset 1c57a6da5e ac/shader: scan vertex inputs usage mask
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-07 12:42:07 +01:00
Samuel Pitoiset 3488a3f033 radv: run nir_opt_shrink_load
LLVM can't shrink loads.

Polaris10:
Totals from affected shaders:
SGPRS: 62528 -> 59955 (-4.11 %)
VGPRS: 44708 -> 44616 (-0.21 %)
Spilled SGPRs: 16 -> 8 (-50.00 %)
Code Size: 1355504 -> 1355172 (-0.02 %) bytes
Max Waves: 11710 -> 11670 (-0.34 %)

Vega10:
Totals from affected shaders:
SGPRS: 51448 -> 50371 (-2.09 %)
VGPRS: 39140 -> 39048 (-0.24 %)
Spilled SGPRs: 16 -> 16 (0.00 %)
Code Size: 1307188 -> 1304296 (-0.22 %) bytes
Max Waves: 11312 -> 11292 (-0.18 %)

This reduces SGPRs spilling in MadMax, and it also reduces
number of SGPRs in DOW3 and F12017. The number of waves slightly
decreases in F1 but I don't see any performance changes after
benchmarking it. Talos and Serious Sam are not affected because
they don't use any push constants.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-06 23:08:44 +01:00
Timothy Arceri 9c52902c76 ac/radeonsi: add num_work_groups to the abi
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-02-07 08:43:08 +11:00
Timothy Arceri f12e2f9c12 ac: implement nir_intrinsic_shader_clock
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-02-07 08:43:08 +11:00
Timothy Arceri b7b89bbddb ac/radeonsi: create ac_build_shader_clock() helper
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-02-07 08:43:08 +11:00
Timothy Arceri d116af383f ac/radeonsi: add load_local_group_size() to the abi
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-02-07 08:43:08 +11:00
Timothy Arceri e3ebffdbb0 ac: don't call emit_outputs() for compute
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-02-07 08:43:08 +11:00
Timothy Arceri c8066cdfa7 ac/radeonsi: add local_invocation_ids to the abi
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-02-07 08:43:08 +11:00
Timothy Arceri fa5239c153 ac/radeonsi: add workgroup_ids to the abi
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-02-07 08:43:08 +11:00
Bas Nieuwenhuizen c7d640fbbf ac/nir: fix GS load input type.
Fixes: df1d5174fc "ac/nir: replace SI.buffer.load.dword with amdgcn.buffer.load"
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-02-06 21:52:38 +01:00
Dave Airlie e7e81f362d radv: don't support tc-compat on multisample d32s8 at all.
RX550 fails
dEQP-VK.renderpass.suballocation.multisample.d32_sfloat_s8_uint.samples_2

So increase the range of the workaround.

Fixes: f4c534ef6 (radv: don't enable tc compat for d32s8 + 4/8 samples (v1.1))

Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-02-06 19:56:00 +00:00
Samuel Pitoiset 0170ae1e23 ac/nir: remove emission of nir_op_fdiv
RadeonSI and RADV lower fdiv.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-05 23:09:34 +01:00
Samuel Pitoiset a1d568c830 ac/nir: fix a crash in load_gs_input() on pre-GFX9 chips
Fixes: df1d5174fc ("ac/nir: replace SI.buffer.load.dword with amdgcn.buffer.load")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-05 11:05:52 +01:00
Marek Olšák 3bf1e036e8 amd: remove support for LLVM 3.9
Only these are supported:
- LLVM 4.0
- LLVM 5.0
- LLVM 6.0
- master (7.0)

Reviewed-by: Dylan Baker <dylan@pnwbakers.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-02 23:47:40 +01:00
Marek Olšák 847d0a393d radeonsi: use pknorm_i16/u16 and pk_i16/u16 LLVM intrinsics
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-02-02 16:46:22 +01:00
Samuel Pitoiset df1d5174fc ac/nir: replace SI.buffer.load.dword with amdgcn.buffer.load
The old one generates useless instructions in there, found while
comparing geometry shaders between RadeonSI and RADV.

This improves all Vulkan demos that use geometry shaders, +4%
for deferredshadows, +9% for viewportarray, +7% for
geometryshader on Polaris10.

This seems to also improve DOW3 a little bit (+1%).

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by:  Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-02 12:32:21 +01:00
Bas Nieuwenhuizen 2ffe395cba radv: Don't expose VK_KHX_multiview on android.
deqp does not allow any KHX extensions, and since deqp is included
in android-cts, android does not allow any khx extensions.

So disable VK_KHX_multiview on android.

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
CC: 18.0 <mesa-stable@lists.freedesktop.org>
2018-02-01 23:32:48 +01:00
Marek Olšák b0a6053a99 ac/nir: use ac_build_buffer_load_format for image buffer loads
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-02-01 16:20:19 +01:00
Marek Olšák bac9fa9f17 ac: add glc parameter to ac_build_buffer_load_format
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-02-01 16:20:19 +01:00
Marek Olšák be973ed21f radeonsi: load the right number of components for VS inputs and TBOs
The supported counts are 1, 2, 4. (3=4)

The following snippet loads float, vec2, vec3, and vec4:

Before:
    buffer_load_format_x v9, v4, s[0:3], 0 idxen          ; E0002000 80000904
    buffer_load_format_xyzw v[0:3], v5, s[8:11], 0 idxen  ; E00C2000 80020005
    s_waitcnt vmcnt(0)                                    ; BF8C0F70
    buffer_load_format_xyzw v[2:5], v6, s[12:15], 0 idxen ; E00C2000 80030206
    s_waitcnt vmcnt(0)                                    ; BF8C0F70
    buffer_load_format_xyzw v[5:8], v7, s[4:7], 0 idxen   ; E00C2000 80010507

After:
    buffer_load_format_x v10, v4, s[0:3], 0 idxen         ; E0002000 80000A04
    buffer_load_format_xy v[8:9], v5, s[8:11], 0 idxen    ; E0042000 80020805
    buffer_load_format_xyzw v[0:3], v6, s[12:15], 0 idxen ; E00C2000 80030006
    s_waitcnt vmcnt(0)                                    ; BF8C0F70
    buffer_load_format_xyzw v[3:6], v7, s[4:7], 0 idxen   ; E00C2000 80010307

Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
2018-02-01 16:20:19 +01:00
Samuel Pitoiset 2ef5ce1198 radv: do not insert shaders in cache when it's disabled
When the application doesn't provide its own pipeline cache,
the driver uses a in-memory cache but it shouldn't insert any
entries when the cache is explicitely disabled by the user.

Found while running my experimental pipeline-db tool with a
ton of shaders, the memory footprint was just huge, and sometimes
the process was even killed...

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-01 09:40:11 +01:00
Samuel Pitoiset 4922e7f25c radv: use separate bindings for graphics and compute descriptors
The Vulkan spec says:

   "pipelineBindPoint is a VkPipelineBindPoint indicating whether
    the descriptors will be used by graphics pipelines or compute
    pipelines. There is a separate set of bind points for each of
    graphics and compute, so binding one does not disturb the other."

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104732
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-01 09:37:09 +01:00
Samuel Pitoiset cf224014dd radv: store the bind point when creating descriptors with templates
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-02-01 09:37:07 +01:00
Samuel Pitoiset a097a6f519 radv: do not dump meta shader stats
That's quite useless and that pollutes the output.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-01-31 14:10:26 +01:00
Samuel Pitoiset 26cc3e74b9 ac/nir: fix emission of ffract for 64-bit
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
2018-01-31 14:10:24 +01:00
Matthew Nicholls ef272b161e radv: remove predication on cache flushes
This can lead to a situation where cache flushes could get conditionally
disabled while still clearing the flush_bits, and thus flushes due to
application pipeline barriers may never get executed.

Fixes: a6c2001ace (radv: add support for cmd predication.)
Signed-off-by: Dave Airlie <airlied@redhat.com>
2018-01-31 13:37:18 +10:00
Timothy Arceri d185190222 ac/radeonsi: add lookup_interp_param and load_sample_position to the abi
This will enable the interpolateAt builtins to work on the radeonsi
nir backend.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-31 09:14:07 +11:00
Timothy Arceri 97058168a4 radeonsi/nir: add prim_mask to the abi
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
2018-01-31 09:14:07 +11:00