in the case where a draw is triggered after a flush, zink_update_descriptor_refs
will be called to set batch tracking for descriptors. this function also
handles refs for fb attachments, and everything is usually fine there
the problem with this approach is that tracking is no longer set on view
objects at renderpass begin, which makes them susceptible to early deletion
if a rp isn't started from a draw call
instead, apply batch tracking to fb attachment resources on renderpass
begin if the BATCH_CHANGED flag is set (need to rename this at some point)
in order to guarantee that the resource (object) lifetime will match the
cmdbuf runtime [since imageviews are now only freed upon batch completion]
fixes#9059
Fixes: f6bbd7875a ("zink: remove batch tracking/usage from view types"
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23132>
Yuzu is running into a segfault because it writes the push descriptor
twice with 2 different layouts, but without a draw/dispatch in
between.
First vkCmdPushDescriptorSetKHR() writes descriptor 0 & 1 with a
uniform buffer. We toggle the 2 first bits of
anv_descriptor_set::generate_surface_states.
Second vkCmdPushDescriptorSetKHR() writes descriptor 0 with uniform
buffer and descriptor 1 with an image view. The first bit of
anv_descriptor_set::generate_surface_states stays, but the second bit
was already set before and it should now be off.
When we finally flush the push descriptor, we try to generate a
surface state for descriptor 1, but there is no valid buffer view for
it, we access an invalid pointer and segfault.
This fix resets the anv_descriptor_set::generate_surface_states when
the descriptor layout changes.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: b49b18f0b7 ("anv: reduce BT emissions & surface state writes with push descriptors")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23156>
We have an optimization for non-uniform if/else where if all channels meet the
jump condition we emit a branch to jump straight to the ELSE block. Similarly,
if at the end of the THEN block we don't have any channels that would execute
the ELSE block, we emit a branch to jump straight to the AFTER block.
This optimization has a cost though: we need to emit the condition for the
branch and a branch instruction (which also comes with a 3 delay slot), so for
very small blocks (just a couple of ALU for example) emitting the branch
instruction is typically worse. Futher, if the condition for the branch is not
met, we still pay the cost for no benefit at all.
Here is an example:
nop ; fmul.ifa rf26, 0x3e800000, rf54
xor.pushz -, rf52, 2 ; nop
bu.alla 32, r:unif (0x00000000 / 0.000000)
nop ; nop
nop ; nop
nop ; nop
xor.pushz -, rf52, 3 ; nop
nop ; mov.ifa rf52, 0
nop ; mov.pushz -, rf52
nop ; mov.ifa rf26, 0x3f800000
The bu instruction here is setup to jump over the following 4 instructions
(the last 4 instructions in there). To do this, we pay the price of the xor
to generate the condition, the bu instruction, and the 3 delay slots right
after it, so we end up paying 6 instructions to skip over 4 which we pay
always, even if the branch is not taken and we still have to execute those
4 instructions. With this change, we produce:
nop ; fmul.ifa rf56, 0x3e800000, rf28
xor.pushz -, rf9, 3 ; nop
nop ; mov.ifa rf9, 0
nop ; mov.pushz -, rf9
nop ; mov.ifa rf56, 0x3f800000
Now we don't try to skip the small block, ever. At worse, if all channels
would have met the branch condition, we only pay the cost of the 4
instructions instead of 6, at best, if any channel wouldn't take the
branch, we save ourselves 5 cycles for the branch condition, the branch
instruction and its 3 delay slots.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23161>
This shouldn't have been enabled at all. Depth-stencil formats were
accidentally disabled but not depth-only or stencil-only formats.
This doesn't seem allowed by DX12 and both AMD/NVIDIA don't enable it.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23122>
Super sampling on a 4K screen could hit this. 16k seems pretty big
but this image is only created on RDNA2 and on-demand if VRS attachments
are used without depth-stencil attachments, which should be rare
enough to care.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23105>
The idea of this pass is to promote small bit-sizes to larger, supported
bit-sizes for certain operations. It doesn't handle emulating large
bit-size operations on smaller bit-sizes; passes like nir_lower_int64
and nir_lower_doubles handle that.
So, assert that we aren't shrinking the bit-size, as this will almost
certainly produce incorrect results.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23123>
We only support 32-bit versions of ufind_msb, find_lsb, and bit_count,
so we need to lower them via nir_lower_int64.
Previously, we were failing to do so on platforms older than Icelake
and let those operations fall through to nir_lower_bit_size, which
used a callback to determine it should lower them for bit_size != 32.
However, that pass only emulates small bit-size operations by promoting
them to supported, larger bit-sizes (i.e. 16-bit using 32-bit). It
doesn't support emulating larger operations (i.e. 64-bit using 32-bit).
So nir_lower_bit_size would just u2u32 the 64-bit source, causing us to
flat ignore half of the bits.
Commit 78a195f252 (intel/compiler: Postpone most int64 lowering to
brw_postprocess_nir) provoked this bug on Icelake and later as well,
by moving the nir_lower_int64 handling for ufind_msb until late in
compilation, allowing it to reach nir_lower_bit_size which broke it.
To fix this, we always set int64 lowering for these opcodes, and also
correct the nir_lower_bit_size callback to ignore 64-bit operations.
Cc: mesa-stable
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23123>
Using nir_gather_ssa_types works much better. There's 2 differences
compared to what I was doing before:
1. Multiple passes to allow data to propagate forward and backward
through the whole shader.
2. Allowing a value to have indeterminate types due to having both
int and float usages.
So this deletes some code and gets better results. Wish I'd known
this existed last week.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23062>
In i915 if the device has local memory it can only mmap bo with
I915_MMAP_OFFSET_FIXED, so all this set of ANV_BO_ALLOC_WRITE_COMBINE
were useless.
In Xe KMD there is no way to change mmap mode for all GPUs types.
So we can nuke bo->map_wc, ANV_BO_ALLOC_WRITE_COMBINE and related
dead code.
No changes in behavior expected here.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22483>
VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT is also set in all memory types of
integrated GPUs.
This flag means that memory will be allocated in the most efficient
place for the GPU to access, which is true in integrated GPUs.
However, this was causing ANV_BO_ALLOC_WRITE_COMBINE to be set in
integrated GPUs in the block right below when allocating in the non-cached memory type.
But the comment only talks about lmem, so to still keep the write
combine behavior for iGPUs it was used VkMemoryPropertyFlags in mmap_calc_flags().
Additionally, this was causing anv_bo.has_implicit_ccs to always be
set, which could change the expected behavior of
anv_BindImageMemory2() in MTL.
Fixes: fbd32a04da ("anv: add a third memory type for LLC configuration") added a new heap
Fixes: 582bf4d9f7 ("anv: flag BO for write combine when CPU visible and potentially in lmem")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/22483>