With this patch, VM binds remain synchronous in relation to vm_bind()
KMD backend calls. However, the syscalls required for VM bind is
reduce in 2(in the optimal cases), the syncobj create and destroy
syscall are replaced by he usage a timeline syncobj.
Next step will be make this completely asynchronous.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26805>
The extra assertions are just there to help validate
pack_lod_and_array_index (in nir_lower_tex.c).
v2: Split got_lod_or_bias into two variables. This simplifies some
changes that Sagar is working on. Suggested by Sagar.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27305>
Note: a future commit will expand the sampler message type to the 6 bits
used on Xe2.
v2 (Francisco Jerez): Rebase on 07b9bfacc7 ("intel/compiler: Move
logical-send lowering to a separate file").
v3: Drop XE2_SAMPLER_MESSAGE_SAMPLE_BIAS_MLOD as it does not actually
exist. This resulted in some bigger changes in brw_disasm.c. Noticed
by Sagar.
v4: Now that XE2_SAMPLER_MESSAGE_SAMPLE_MLODc conflicts with
GFX7_SAMPLER_MESSAGE_SAMPLE_GATHER4_PO_C, the determination of
min_lod_is_first must include devinfo->ver or previous platforms will
break.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27305>
Message types are expanded to 6-bit encoding now. 5 bits are still the
same field from the Sampler Message Descriptor. The most significant bit
is now bit 31 of the Sampler Message Descriptor. The messages that have
'1 in bit 6 are only to support programmable offsets and those would
require message header. If a sampler type shows only 5 bits encoding, it
is implied bit 6 equal to 0 and there is no requirement for header.
v2 (idr): Trivial formatting changes.
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27305>
This small refactor simplifies a later commit that will optionally emit
some opcodes before the switch (as is already done with the shadow
comparitor).
v2 (Francisco Jerez): Rebase on 07b9bfacc7 ("intel/compiler: Move
logical-send lowering to a separate file").
v3 (Jordan): SHADER_OPCODE_TXL => SHADER_OPCODE_TXL_LZ (was
SHADER_OPCODE_TXF_LZ).
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27305>
The Bspec also says, "The table below describes the SIMD modes which
are supported. SIMD32 and SIMD64 are used for media-type operations
only." Perhaps this commit should just add
if (devinfo->ver >= 20)
return 16;
instead.
v2: Use reg_unit in get_sampler_lowered_simd_width. Suggested by Sagar.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27305>
The optimization pass will (eventually) turn the imul into a
umul_32x16. In many cases, the multiply will be converted to something
else.
I also tried cloning a bunch of existing imul algebraic patterns for
[iu]mul_32x16. This produced the same result, but it was a lot more
churn.
All of the shaders affected were ray tracing shaders in Q2RTX. This is
the only ray tracing workload in my fossil-db.
DG2
Totals:
Instrs: 191995626 -> 191995079 (-0.00%); split: -0.00%, +0.00%
Cycles: 14003803561 -> 14003798040 (-0.00%); split: -0.00%, +0.00%
Spill count: 108320 -> 108288 (-0.03%)
Fill count: 200695 -> 200663 (-0.02%)
Scratch Memory Size: 8755200 -> 8754176 (-0.01%)
Totals from 7 (0.00% of 652118) affected shaders:
Instrs: 14998 -> 14451 (-3.65%); split: -3.94%, +0.29%
Cycles: 137222 -> 131701 (-4.02%); split: -4.10%, +0.07%
Spill count: 32 -> 0 (-inf%)
Fill count: 32 -> 0 (-inf%)
Scratch Memory Size: 19456 -> 18432 (-5.26%)
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27161>
Previously we apply implicit sync to all external memory, which is a bit
redundant since we only need it for the dedicated image scenario (media
image imported into Vulkan). This change optimizes just like that while
also excluding wsi which has its own way of synchronizing with the
compositor.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27398>
In clear_depth_stencil() stencil_surf is defined but not initiaized.
Then in the same function if stencil_mask is calculated and if != 0
stencil_surf is initialized.
But blorp_clear_stencil_as_rgba() access stencil_surf before checking
stencil_mask, what could cause a read of a uninitialized valued.
clear_depth_stencil()
struct blorp_surf stencil_surf;
...
uint8_t stencil_mask = clear_stencil && stencil_res ? 0xff : 0;
if (stencil_mask) {
...
iris_blorp_surf_for_resource(&stencil_surf);
}
...
blorp_clear_depth_stencil(stencil_mask, stencil_surf)
blorp_clear_stencil_as_rgba(stencil_mask, stencil)
if (surf->surf->format ...)
....
Just inverting the order and checking stencil_mask first in
blorp_clear_stencil_as_rgba() fixes the issue.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Dylan Baker <dylan.c.baker@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27390>