All the MI_SDI currently have forced write checks (meaning the command
streamer will stall until completion) on Gfx12.0+.
Now on Gfx12.0/12.5, the read commands have implicit waits on previous
writes (BSpec ). So if we're only dealing with CS writes & reads, we
don't need forced write checks.
In the few cases where CS is writing data for other bits of HW, we
need the forced write checks. This change adds an API that will let
the driver decide when to enable forced write checks.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29571>
With this change, the engine initialization batches are build and
submitted at vkCreateDevice() but the function doesn't wait for them
to complete. Instead we wait at vkDestroyDevice() or whenever another
submission happens on the queue, we check whether the initialization
batch has completed (without waiting) and free it if completed.
Seems to be about 25% reduction time of vkCreateDevice()
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28975>
The array pool does a single allocation and then splits it out. The
downside is that the pool is not lockless, but for border colors it
likely doesn't matter much as there is a max border colors for 4k.
Seems to be a 30% time reduction for vkCreateDevice()
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28975>
According to PRMs:
"All parameters are of type IEEE_Float, except those in the The ld*,
resinfo, and the offu, offv of the gather4_po[_c] instruction message
types, which are of type signed integer."
Currently, we load parameters with the correct types, but use them as send
sources with the default float type, which may confuse passes downstream.
Fix this by actually storing the retyped sources.
Cc: mesa-stable
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29581>
This AUX-TT is only updated on the CPU since ee6e2bc4a3 ("anv: Place
images into the aux-map when safe to do so"). So the only really
important invalidation that needs to happens is on the beginning of a
primary command buffer.
We are required to idle the pipes prior invalidation the AUX-TT. This
might not be happening when the invalidation is put at the beginning
of the secondary command buffers.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29671>
GFX versions older than GFX 20 have 'Thread Preemption disable' while
GFX 20 has 'Thread Preemption' with value flipped in compute walker
instruction.
So here by default enabling thread preemption, only disabling it
when BTD mode is enabled as instructed in Wa_14017794102.
Similar for 3DSTATE_BTD, enabling preemption by default and
only disabling when platform is affected by Wa_14017794102.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29616>
We were missing the following "newer" fields:
- ex_desc
- predicate_trivial
- sdepth
- rcount
- writes_accumulator
- no_dd_clear
- no_dd_check
- check_tdr
- send_is_volatile
- send_ex_desc_scratch
- send_ex_bso
- last_rt
- keep_payload_trailing_zeroes
- has_packed_lod_ai_src
We can actually just check ex_desc and the new "bits" union to handle
most of them with fewer checks.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>
I want to be able to hash an fs_reg, including all the brw_reg fields.
It's easiest to do this if I can use the "bits" union field that
incorporates many of the other ones.
We also move the using declaration for "nr" down because that field was
moved to the second section a while back.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>
Pass in the nir_src and check if it's constant, handling it via CPU-side
arithmetic instead of emitting instructions. While we can constant fold
these via our optimization passes, we have to do opt_algebraic to fold
the binary operation with constant sources into a MOV of an immediate,
then opt_copy_propagation to put it in the next expression, and so on,
until the entire expression is folded. This can take several iterations
of the optimization loop, which is inefficient.
For example, gfxbench5/aztec-ruins/normal/7 has load/store_scratch
intrinsics with constant sources, and this patch removes a number of
optimization passes according to INTEL_DEBUG=optimizer.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>
Geometry shaders write outputs multiple times, with EmitVertex()
between them. The value of output variables becomes undefined after
calling EmitVertex(), so we don't need to preserve those. This lets
us recreate new registers after each EmitVertex(), assuming we aren't
in control flow, allowing them to have separate live ranges. It also
means that those registers are more likely to be written once, rather
than having multiple writes, which can make optimization easier.
This is pretty much a total hack, but it's helpful.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29624>
Prevents the next patch from causing the following assert failure:
Test case 'dEQP-VK.ycbcr.copy.g8_b8_r8_3plane_420_unorm.g8_b8_r8_3plane_444_unorm.linear_linear_disjoint'..
deqp-vk: ../../src/intel/vulkan/anv_private.h:4962: anv_aspect_to_plane: Assertion `!(aspect & ~all_aspects)' failed.
We still disable CCS for multiplane formats elsewhere. I've attempted
enabling CCS for those cases but end up with failures in CI that I
cannot reproduce locally. Hopefully this change gets the next person a
step closer towards enabling this feature.
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29094>