Order-aware scan/reduce can trade-off LDS traffic for external atomics
memory traffic in producer/consumer compute shaders.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
On some GPUs, especially older Intel GPUs, some math instructions are
very expensive. On those architectures, don't reduce flow control to a
csel if one of the branches contains one of these expensive math
instructions.
This prevents a bunch of cycle count regressions on pre-Gen6 platforms
with a later patch (intel/compiler: More peephole select for pre-Gen6).
v2: Remove stray #if block. Noticed by Thomas.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Thomas Helland <thomashelland90@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
That flow control may be trying to avoid invalid loads. On at least
some platforms, those loads can also be expensive.
No shader-db changes on any Intel platform (even with the later patch
"intel/compiler: More peephole select").
v2: Add a 'indirect_load_ok' flag to nir_opt_peephole_select. Suggested
by Rob. See also the big comment in src/intel/compiler/brw_nir.c.
v3: Use nir_deref_instr_has_indirect instead of deref_has_indirect (from
nir_lower_io_arrays_to_elements.c).
v4: Fix inverted condition in brw_nir.c. Noticed by Lionel.
Signed-off-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We were not using the view mask for depth clears, causing only the
first view to be cleared.
Fixes: 2e86f6b259 "radv: Add multiview clears."
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
The switch directly after the check has a default case that returns
NULL too, so the effective return value is not changed. Also this
check is wrong once we start dealing with formats introduced by an
extension (e.g. YUV formats).
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
I thought the value was correctly propagated, but actually not.
Fixes: 2ac6d55f38 ("radv: bump reported version to 1.1.90")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This is a squash of a bunch of individual changes:
nir/builder: Generate 32-bit bool opcodes transparently
nir/algebraic: Remap Boolean opcodes to the 32-bit variant
Use 32-bit opcodes in the NIR producers and optimizations
Generated with a little hand-editing and the following sed commands:
sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c
sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c
sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c
sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c
sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c
sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c
Use 32-bit opcodes in the NIR back-ends
Generated with a little hand-editing and the following sed commands:
sed -i 's/nir_op_ball_fequal/nir_op_b32all_fequal/g' **/*.c
sed -i 's/nir_op_bany_fnequal/nir_op_b32any_fnequal/g' **/*.c
sed -i 's/nir_op_ball_iequal/nir_op_b32all_iequal/g' **/*.c
sed -i 's/nir_op_bany_inequal/nir_op_b32any_inequal/g' **/*.c
sed -i 's/nir_op_\([fiu]lt\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]ge\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]ne\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fiu]eq\)/nir_op_\132/g' **/*.c
sed -i 's/nir_op_\([fi]\)ne32g/nir_op_\1neg/g' **/*.c
sed -i 's/nir_op_bcsel/nir_op_b32csel/g' **/*.c
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Tested-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
If we gave this function 0 counter buffers, we'd still try and
access pCounterBuffers[0] as this check was incorrect.
Fixes crash with ext_transform_feedback-pipeline-basic-primgen
on zink on radv.
Fixes: 677b496b6 (radv: fix begin/end transform feedback with 0 counter buffers.)
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
When hile_size is 0, we can't enable HTILE. This doesn't change
anything, except not calling radv_image_alloc_htile().
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Feral games aren't affected because they don't decompress DCC.
F1 2018 has one DCC decompression per frame, but I don't see
any performance improvements. This new predicate will be
probably more useful for DCC/MSAA.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
After going through the spec changelog, it looks like RADV
is up to date. Note that ANV also reports 1.1.90.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
The Vulkan working group recently discovered that we made a mistake in
assuming that PCI domains are 16-bit even though they can potentially be
32-bit values. To fix this, the next spec update will change the types
in the VK_EXT_pci_bus_info struct to be 32 bits which will be a
backwards-incompatible change. Normally, Khronos tries very hard to
never make backwards incompatible changes to specs. Hopefully, the
extension is new enough (2 months) that there are no shipping apps which
use the extension so this should be safe.
This commit disables the extension for both anv and radv in mesa and
should be back-ported to 18.3 ASAP so we avoid any potential issues with
new apps running on old drivers. I'll send out a commit (which we can
also back-port to 18.3 if we really care) to re-enable the extension in
both drivers once this week's spec update ships. The one known use of
this extension is internal to mesa and will continue working with the
extension disabled and will naturally update when we get a new header.
Cc: "18.3" <mesa-stable@lists.freedesktop.org>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Instead of a single i2b and b2i, we now have i2b32 and b2iN where N is
one if 8, 16, 32, or 64. This leads to having a few more opcodes but
now everything is consistent and booleans aren't a weird special case
anymore.
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>
Nothing to do, the compiler already handles that.
All new dEQP.VK.ubo.* and dEQP.VK.ssbo.* pass, except some
16-bit tests that are quite related to fdo bug #108114.
Only enable the extension on CIK+ because it might not work on SI.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
In case we are unlucky if the low part is 0xffffffff.
Fixes: 5d6a560a29 ("radv: do not use the availability bit for timestamp queries")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
If the driver used a compute shader for resetting a query pool,
it should be completed when caches are flushed.
This might reduce the number of stalls if operations are done
between vkCmdResetQueryPool() and vkCmdBeginQuery()
(or vkCmdWriteTimestamp()).
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Alex Smith <asmith@feralinteractive.com>
After investigating on this, it appears that COND_WRITE doesn't
work correctly in some situations. I don't know exactly why does
it fail to update DB_Z_INFO.ZRANGE_PRECISION, but as AMDVLK
also uses COND_EXEC I think there is a reason.
Now the driver stores a new metadata value in order to reflect
the last fast depth clear state. If a TC-compat HTILE is fast cleared
with 0.0f, we have to update ZRANGE_PRECISION to 0 in order to
work around that hardware bug.
This fixes rendering issues with The Forest and DXVK and doesn't
seem to introduce any regressions.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108914
Fixes: 68dead112e ("radv: update the ZRANGE_PRECISION value for the TC-compat bug")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This fixes some crucible 3d miptree tests I've been working on
when executed using the compute shader path.
Fixes: d08f267814 (radv/gfx9: fix 3d image to image transfers on compute queues.)
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
These days we don't always allocate scanout compatible textures anymore.
That does mean we have to fix the radv android WSI though.
Fixes: b1444c9ccb "radv: Implement VK_ANDROID_native_buffer."
Acked-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Update to the internal master as of 2018-11-15.
This has a lot of gratuitous whitespace change, but on the plus
side it's built using the same tooling that's used for AMDVLK,
which should help going forward.
Our choices here are simply redundant as long as sin.flags is set
correctly.
(v2:
- remove unused function parameter)
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
If all layers are bound we can perform a fast color or depth clear
instead of iterating over all layers. This has the advantage
to avoid trashing the framebuffer for nothing if you we end up by
doing a fast clear when calling radv_clear_image_layer(), and
clearing all layers in one shot is obviously faster.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>