Order-aware scan/reduce can trade-off LDS traffic for external atomics
memory traffic in producer/consumer compute shaders.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This is a squash of a bunch of individual changes:
nir/builder: Generate 32-bit bool opcodes transparently
nir/algebraic: Remap Boolean opcodes to the 32-bit variant
Use 32-bit opcodes in the NIR producers and optimizations
Generated with a little hand-editing and the following sed commands:
sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c
sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c
sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c
sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c
sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c
sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c
Use 32-bit opcodes in the NIR back-ends
Generated with a little hand-editing and the following sed commands:
sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c
sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c
sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c
sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c
sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c
sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Instead of a single i2b and b2i, we now have i2b32 and b2iN where N is
one if 8, 16, 32, or 64. This leads to having a few more opcodes but
now everything is consistent and booleans aren't a weird special case
anymore.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Update to the internal master as of 2018-11-15.
This has a lot of gratuitous whitespace change, but on the plus
side it's built using the same tooling that's used for AMDVLK,
which should help going forward.
Our choices here are simply redundant as long as sin.flags is set
correctly.
(v2:
- remove unused function parameter)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
We won't have a var to load from, so don't try to the processing
required if we don't need it.
This avoids crashes in:
dEQP-VK.spirv_assembly.instruction.compute.variable_pointers.compute.workgroup_two_buffers
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
For variable pointers we really don't want to case the pointers to int
without a good reason, just add a wrapper for bcsel loading and result
storing.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Fixes an assertion in SoTTR.
Fixes: dd0172e865 ("radv: Use structured intrinsics instead of indexing workaround for GFX9.")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This avoids syncing the Micro Engine. This is only supported
for VI+ currently. There is probably a way for using
LOAD_CONTEXT_REG on previous chips but that could be done later.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This implementation should work and potential bugs can be
fixed during the release candidates window anyway.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
The comment was wrong, since the loop above casts to a type with the
correct bitsize already.
Fixes: 7e7ee82698 ("ac: add support for 16bit buffer loads")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
And implement ac_bulid_expand_to_vec4() on top of it.
Fixes: 7e7ee82698 ("ac: add support for 16bit buffer loads")
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
LLVMInt*Type() return types from the global context and therefore are
not safe for use in other contexts. Use types from our own context
instead.
Fixes frequent crashes seen when doing multithreaded pipeline creation.
Fixes: 4d0b02bb5a "ac: add support for 16bit load_push_constant"
Fixes: 7e7ee82698 "ac: add support for 16bit buffer loads"
Cc: "18.2" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Alex Smith <asmith@feralinteractive.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
It's unnecessary because we can just check if the timestamp
is to different to the default value when a pool is created
or resetted. Instead of waiting for the availability bit to
be 1, we have to emit a not equal WAIT_REG_MEM for checking
if the timestamp is ready.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>