If you're only affecting one or a couple of drivers, it would be nice if
your pipeline buttons on the web UI weren't full of manual run buttons for
all the other drivers.
This is a bunch of duplicated lines, but less than it could have been now
that we have !references.
In some of these cases (i915g, nouveau, etnaviv), we have no non-manual
jobs for those drivers, so I could have just rewritten the original
"driver-rules" to "driver-manual-rules". I decided to keep things
consistent between drivers, though, because this is all esoteric enough to
readers already without making different drivers' rules look different.
Fixes: #4891
Acked-by: David Heidelberg <david.heidelberg@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17445>
The ACP entries created by copy propagation to track the implied
copies of LOAD_PAYLOAD instructions don't model the behavior of
LOAD_PAYLOAD correctly, since (as of 41868bb682) header
moves are implicitly retyped to UD and the destination of non-header
copies implicitly uses the same type as the corresponding source, even
though the ACP entries created for such copies could incorrectly
represent a type conversion, which can lead to mis-optimization of the
program.
According to Marcin, this fixes the func.mesh.ext.workgroup_id.task.q0
crucible test.
Fixes: 41868bb682 ("i965/fs: Rework the fs_visitor LOAD_PAYLOAD instruction")
Reported-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Tested-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18980>
Here adding kmd_type parameter to
intel_gem_read_render_timestamp(), intel_gem_can_render_on_fd() and
intel_gem_supports_protected_context().
Those 3 functions will have Xe implementations, the other functions
in intel_gem.h will not be called by Xe code paths so not adding
kernel_driver_type to it.
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20773>
See the comments in emit_apply_pipe_flushes(). Flushing HDC is not
sufficient in GPGPU mode, and we need to set the untyped data port flush
bit as well.
Fixes many dEQP-VK failures with INTEL_COMPUTE_CLASS=1 on Alchemist.
Fixes: 1067ec90a5 ("anv: Update PIPELINE_CONTROL flush when switching pipeline mode in TGL+")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20774>
Sometimes you want to diff 2 runs with INTEL_DEBUG=bat, but a tiny
allocation change can mess quite badly with offsets printed in the
decoding, making it hard to look at the diff with meld.
Fortunately our decoder can avoid printing offsets. We just need a
variable to specify that.
We still use the defaults specified by the driver but you can turn
things on/off with :
INTEL_DECODE=+color,-offsets,-floats INTEL_DEBUG=bat ./my_app
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20874>
I stumbled on this when I inserted some suboptimal lowering code after all
optimizations. Adding certain subset of optimizations after my lowering code
actually avoided this bug, so I think it's not possible to hit this on upstream.
Let's fix this for the next person generating suboptimal code...
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20857>
Wa_14015360517 mentions situations where HW produces invalid
occlusion query results when "Pixel Shader Does not write to RT"
bit is set.
"When Pixel Shader Kills Pixel is set, SW must perform a dummy render
target write from the shader and not set this bit, so that Occlusion
Query is correct."
Another situation is when writing to UAV or to NULL render target.
Patch sets field as 'must be zero' to discourage possible use of it.
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20849>
I've been running into failures with tests like :
dEQP-VK.robustness.robustness2.bind.notemplate.rgba32i.unroll.nonvolatile.uniform_buffer_dynamic.no_fmt_qual.len_4.samples_1.1d.frag
With the load_global_const_block_intel NIR intrinsic, you can load a
vec8/vec16 with a predicate. The predicate is correctly uniformized to
feed into the SEND instruction's flag register.
The problem is that a series of optimization first remove the
find_live_channel and then changes the broadcast into a simple MOV
instruction, on the assumption that the first channel is always active
if there is not control flow. This is correct.
But after that the cmod optimzation will remove this instruction :
mov.nz.f0.0(16) null:D, vgrf16+0.0<0>:D NoMask
because it seems to be equivalent to :
cmp.g.f0.0(16) vgrf16:D, vgrf12:D, 63d
In this case vgrf16 is the predicate to the load block SEND
instruction. Since the execution mask is different between both, some
of the channels of the SEND instruction end up not being loaded or
loaded with the wrong predication and we end up with incorrect UBO
data.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20852>
There was a VUID-VkImageViewCreateInfo-image-04739 in the Vulkan 1.3
spec that said:
If image was created with the
VK_IMAGE_CREATE_BLOCK_TEXEL_VIEW_COMPATIBLE_BIT flag and format is a
non-compressed format, viewType must not be VK_IMAGE_VIEW_TYPE_3D
That VUID has since been removed, and when a view of a 3D image is
created, with put the depth into the array_len, so it won't be always 1.
Reviewed-by: Mark Janes <markjanes@swizzler.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20803>
These are handled identically in almost all cases. There is one place
in the legacy surface lowering that was obtaining the bitsize from the
opcode, but the LSC-based lowering uses (type_sz(inst->dst.type) * 8)
for that and works just fine. If we just do that in the legacy lowering
too, then we don't need this plethora of opcodes.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
These are basically identical save for:
- shared has surface hardcoded to SLM rather than an SSBO index
- shared has to handle adding the 'base' const_index (SSBO have none)
- the NIR source index for data is shifted by one
It's not worth copy and pasting the entire function for this.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
These are now basically identical to their non-float counterparts. The
only thing that differed was the opcode checking to determine which
operands existed. Now that we have a unified opcode enum and a helper
for the number of data operands, we can just use that.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
The only reason for the separate opcode was because of the overlapping
BRW_AOP_* enums, making it impossible to tell whether a particular AOP
was the integer or float operation. Now that we use the lsc_opcode
enums, we can just have the legacy lowering inspect the opcode and
select the right descriptor. No need for a separate opcode.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
This gets our logical atomic messages using the lsc_opcode enum rather
than the legacy BRW_AOP_* defines. We have to translate one way or
another, and using the modern set makes sense going forward.
One advantage is that the lsc_opcode encoding has opcodes for both
integer and floating point atomics in the same enum, whereas the legacy
encoding used overlapping values (BRW_AOP_AND == 1 == BRW_AOP_FMAX),
which made it impossible to handle both sensibly in common code.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20604>
This avoids a violation of the Vulkan memory model that was leading to
intermittent failures of at least 8k test-cases of the Vulkan CTS
(within the group dEQP-VK.memory_model.*) on TGL and DG2 platforms.
In theory the issue may be reproducible on earlier platforms like IVB
and ICL, but the SYNC.ALLWR instruction is not available on those
platforms so a different (likely costlier) fix will be needed.
The issue occurs within the sequence we emit for a NIR memory barrier
with acquire semantics requiring the synchronization of multiple
caches, e.g. in pseudocode for a barrier involving the TGM and UGM
caches on DG2:
x <- load.ugm // Atomic read sequenced-before the barrier
y <- fence.ugm
z <- fence.tgm
wait(y, z)
w <- load.tgm // Read sequenced-after the barrier
In the example we must provide the guarantee that the memory load for
x is completed before the one for w, however this ordering can be
reversed with the intervention of a concurrent thread, since the UGM
fence will block on the prior UGM load and potentially take a long
time, while the TGM fence may complete and invalidate the TGM cache
immediately, so a concurrent thread could pollute the TGM cache with
stale contents for the w location *before* the UGM load has completed,
leading to an inversion of the expected memory ordering.
v2: Apply the workaround regardless of whether the NIR barrier
intrinsic specifies multiple storage classes or a single one,
since an acquire barrier is required to order subsequent requests
relative to previous atomic requests of unknown storage class not
necessarily specified by the memory scope information of the
intrinsic.
Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20690>
Three reasons for that:
0. The operation we're doing here is actually a reallocation.
1. The newer code is, IMHO, easier to read.
2. Realloc has this property where sometimes, when possible, it will
expand your array without moving it somewhere else, so it doesn't
need to copy the memory contents, returning the original pointer
back to you. I did some analysis and while that case is not common,
it does happen sometimes in real world applications (I could see it
happening in Shootergame and Aztec Ruins, but not Dota 2), so we're
able to save a few CPU cycles.
v2: Rebase.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20703>
This is the only code path where we don't run anv_execbuf_finish() in
case anv_execbuf_add_bo() fails. While there is not a bug in the
current tree, I recently made an (uncommitted) modification that
started leaking memory and made me realize the lack of cleanup here.
If we had anv_execbuf_finish() being called upon error like we're
going to have after this patch my modification wouldn't have caused
the memory leak.
I think it's much safer and future-proof if we're able to operate
under the assumption that whatever is allocated and set to anv_execbuf
will be dealt with upon failure of anything else related to it, so
functions that fail should only be required to free pointers not yet
assigned to anv_execbuf.
The dEQP-VK 'alloc_callback_fail' tests should exercise this code
path. The one I was specifically using here is:
dEQP-VK.api.object_management.alloc_callback_fail.device_group
v2: Rebase.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20703>
Because anv_execbuf_add_bo_bitset() calls anv_execbuf_add_bo(), which
can fail if its memory allocations fail.
I have seen dEQP tests exercising memory allocation failures during
anv_execbuf_add_bo(), but I don't think the path coming from
add_bo_biset() was specifically exercised. Anyway, add the error check
just in case.
v2: Rebase.
Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20703>
In anv_execbuf_add_syncobj(), we try to not create or use
exec->syncobj_values if we don't need to. But when we figure we're
going to need it (i.e., when timeline_value is not zero), then we
create exec->syncobj_values with vk_zalloc, which means every previous
value is set to zero, as it should be. This is all correct.
The problem starts when we add a 16th element. In this case we double
exec->syncobj_array_length and realloc the buffer by using vk_alloc
and copying the old array to the new one. After that, we write the
timeline_value to the array only if it's not zero, and that's the
problem: since we just used vkalloc and memcpy, we don't have any
guarantees that the new array will be zero after the 16th element, and
if timeline_value is zero we write nothing to that position.
Once we start using exec->syncobj_values we have to commit to using
it, so the "if (timeline_value)" check near the end of the function
has to be changed to "if (exec->syncobj_values)", so we actually set
elements after the 16th to zero when they need to be zero. Another
approach to fix this would be to memset the new elements once we
double syncobj_array_length.
In practice, I couldn't find any application or deqp test that used
more than 3 elements in exec->syncobj_array_length, and we need more
than 16 elements in order to be able to reproduce the bug, so I'm not
aware of any real-world bug that goes away with this patch. This issue
was found while reading code.
If we craft a little Vulkan program that submits a ton of timeline and
binary semaphores on vkQueueSubmit, then waits for them, we get the
following error without this patch:
MESA: error: ../../src/intel/vulkan/anv_batch_chain.c:1910: execbuf2 failed: Invalid argument (VK_ERROR_DEVICE_LOST)
v2: Rebase.
Cc: mesa-stable
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20703>