Split anv_xe_wait_exec_queue_idle() into 2 functions, the first
function creates the syncobj and prepares it to be signaled when the
last workload in queue is completed.
And the second one that calls the first function, then waits for the
syncobj to be signaled and destroy the syncobj.
The main reason for that is that the first function can be reused in
Iris and a future patch will add another user, so lets share it.
No changes in behavior are expected here.
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30958>
We currently disable CCS if a 3D Ys/Yf surface uses miptails. However,
ISL generally configures surfaces to be compatible with compression. For
consistency, disable miptails on 3D Ys/Yf surfaces in order to allow
compression.
If drivers prefer to have a more compact layout, they can pass the
ISL_SURF_USAGE_DISABLE_AUX_BIT flag at surface creation time.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30081>
We currently disable CCS if a surface uses more than 11 slots in a
miptail. However, ISL generally configures surfaces to be compatible
with compression. For consistency, reduce the number of slots used in
miptails in order to allow compression.
If drivers prefer to have a more compact layout, they can pass the
ISL_SURF_USAGE_DISABLE_AUX_BIT flag at surface creation time.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30081>
Otherwise we can end up generating invalid assembly not following
destination/source alignments requirements.
Fixes the following tests:
dEQP-VK.spirv_assembly.instruction.graphics.float16.arithmetic_4.tan_frag
dEQP-VK.spirv_assembly.instruction.graphics.float16.arithmetic_2.tan_frag
dEQP-VK.spirv_assembly.instruction.graphics.float16.arithmetic_1.tan_frag
dEQP-VK.spirv_assembly.instruction.graphics.float16.arithmetic_3.tan_frag
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Backport-to: 24.2
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31206>
The inner loop with p is dead, because n_passes_written is no longer
updated as of 56bd81ee21, so it is always
comparing a uint32_t < 0, which is never true. Since the inner loop is
dead code, the pass array is dead code, as it simply keeps writing to
element 0, and but never reads or uses it, along with all of the pass
count information.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31213>
It was always using device->context_id what is not valid in i915 when
has_vm_control is true or when running with Xe KMD.
But anv_AcquireProfilingLockKHR() don't have the queue information so
at least for now we will only support queries in a single queue.
And for consistency doing the same in
anv_QueueSetPerformanceConfigurationINTEL() although here we have the
queue parameter but queries are only supported in render engine
so it would only expose other queues if user set some parameters.
Cc: 24.2 <mesa-stable>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30652>
We have a couple of checks where we allow this to be NULL, but later we
unconditionally and unavoidably dereference the pointer, which means
there's no way that it ever could have been NULL. Change the assert at
the top to not allow NULL, and remove checks for it being NULL
CID: 1616544
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31156>
Our array has a fixed size of 32, and we know at the start of the block
that our type_count is < 32, but in the loop we grow the block, in
theory up to 31 times. Coverity notes that, and points out we could
write off the end of the array. Add an assert in the loop to ensure we
don't, and to help Coverity out.
CID: 1615171
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31173>
As coverity points out, if the second uint64_t was greater than the
first (I don't think it actually can be), then the overflow would result
in the check succeeding when it shouldn't. We could cast this to an
integer type, but since we have uint64_t, we'd need int128_t for that.
Instead, replace the comparison to 0 with a direct comparison, since
that would give the correct result without potential to overflow.
CID: 1604833
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31175>
In the error handling path we end up creating a vk_sync and then later
we vk_sync_wait() on it. If that wait fails somehow we'll end up calling
vk_queue_set_lost(&queue->vk, ...) which would segfault if queue is
NULL.
If we end up in this situation (no queue), return directly whatever the
backend's vm_bind function returned, propagating the error up if
necessary.
Fixes: dd5362c78a ("anv/xe: try harder when the vm_bind ioctl fails")
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31048>
We introduce a new fs_nir_emit_memory_access() helper that can handle
image, bindless image, SSBO, shared, global, and scratch memory, and
handles loads, stores, atomics, and block loads. It translates each
of these NIR intrinsics into the new MEMORY_*_LOGICAL intrinsics.
As a result, we delete a lot of similar surface access emitter code.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Acked-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>
This is more complicated. We map the MEMORY_*_LOGICAL opcodes to the
older HDC messages: typed and untyped surface read/write/atomic (whether
float or integer), DWord and Byte scattered messages, OWord block, and
both A64, BTI, and stateless messages.
- MEMORY_MODE_* is used to select stateless-scratch, typed, or untyped.
- MEMORY_FLAG_TRANSPOSE is used to select block access.
- MEMORY_BINDING_TYPE = FLAT and 64-bit address size selects A64.
- Alignment and data type size select between byte/dword scattered or
surface messages.
While we may not be able to handle the full generality of message
possibilities, we can handle everything we generate currently. The plan
here is to assert/validate that we don't generate MEMORY_*_LOGICAL ops
on HDC-based platforms which can't support those particular messages.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>
The new MEMORY_*_LOGICAL intrinsics have a lot of control sources with
a bunch of LSC_* enums (opcode, memory type, address type, address and
data sizes), as well as flags, coordinate components vs. components...
they unfortunately are nigh-unreadable with the default printing since
there's just a string of unreadable UD immediates in some order.
To fix this, we add some basic pretty-printing. If a control source is
simply an enum whose value communicates the entire purpose, we print it.
If it has a numeric value (i.e. alignment, or data), we add a label.
For example:
memory_store(16) (null):UD store shared flat addr: %2:UD coord_comps:1u align:16u d32 comps:2u data0: %3:UD
memory_store(16) (null):UD store typed bti:%2+0.0<0>:UD addr: %3+0.0:D coord_comps:2u align:0u d32 comps:4u data0: %4:UD
This make them much easier to read.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>
This is a new unified set of opcodes for memory access loosely patterned
after the new LSC-style data port messages introduced on Alchemist GPUs.
Rather than creating an opcode for every type of memory access, it has
only three opcodes: load, store, and atomic. It has various sources to
indicate the rest:
- Binding type (raw pointer, pointer to surface state, or BT index)
- Address size (A64, A32, A16)
- Data size (bit size, number of components)
- Opcode (atomic opcode, or LOAD/STORE vs. LOAD_CMASK/STORE_CMASK)
- Mode (typed vs. untyped vs. shared-local vs. scratch)
- Address (and its dimensionality)
- Data (0 for loads, 1 for stores, 2 for atomics)
- Whether we want block access
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>
The intention of inst->is_partial_write() is that it should return true
when any REG_SIZE (32B) chunk of inst's destination is written but not
fully overwritten. This can be used to tell whether inst combines new
data with existing data, or screens off any previous writes, so the old
values are no longer required.
The existing (exec_size * brw_type_size_bytes(this->dst.type) < 32)
check doesn't work in a number of cases. For example, LSC block loads
have exec_size == 1 and force_writemask_all set, but may write multiple
full registers of data. (Currently, we only see them with exec_size 1
after logical-send-lowering, so our SHADER_OPCODE_SEND special case
was covering those.) We had also special cased UNDEF.
Instead, we can simply check:
1. Predication
2. !inst->dst.contiguous()
3. inst->dst.offset % REG_SIZE != 0
4. inst->size_written % REG_SIZE != 0
We had the first three already, but #4 is new. If either #3 or #4
are true, then that implies there is a REG_SIZE chunk of the destination
which is written, but not entirely written, so it's a partial write.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>
The intention here is to detect ALU hardware instructions, but not
virtual instructions that haven't been explicitly whitelisted.
For some reason we had arbitrarily hardcoded 128 here, but our virtual
opcodes don't start at 128. They start at NUM_BRW_OPCODES. So, use
that instead.
This prevents regressions later when we delete some opcodes, shifting
some virtual opcodes into the 72-128 range.
Cc: mesa-stable
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30828>
Coverity alerts that the uint32_t pointer I was passing into
isl_color_value_pack() could possibly be used as an array. The value is
being used as such, but only the first element of that array should be
accessed. That's because the depth buffer formats I'm also passing into
the function only have a single channel, R. Nonetheless, let's update
the code to avoid the warning.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31123>
In c1a7d520f3, we disabled AUX usage for imported images when they are
using an explicit modifier that doesn't support it.
We need to do the same when the modifier is picked by the driver,
otherwise the memory requirements reported for an exported image don't
match those we report for import.
Fixes: c1a7d520f3 ("anv: Disable aux if the explicit modifier lacks it")
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31051>
Although we don't want to rely on hwconfig for devinfo->verx10 == 120,
due to the dependence on closed source software, we do check to see if
hwconfig reports different values in the DEVINFO_HWCONFIG macro.
Matt was seeing this warning on 8086:a7a0:
> MESA: warning: INTEL_HWCONFIG_TOTAL_PS_THREADS (128) != devinfo->max_threads_per_psd (64)
Reported-by: Matt Turner <mattst88@gmail.com>
Fixes: 3e4f73b3a0 ("intel/dev: Update hwconfig => max_threads_per_psd for Xe2")
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31077>