The optimization pass will (eventually) turn the imul into a
umul_32x16. In many cases, the multiply will be converted to something
else.
I also tried cloning a bunch of existing imul algebraic patterns for
[iu]mul_32x16. This produced the same result, but it was a lot more
churn.
All of the shaders affected were ray tracing shaders in Q2RTX. This is
the only ray tracing workload in my fossil-db.
DG2
Totals:
Instrs: 191995626 -> 191995079 (-0.00%); split: -0.00%, +0.00%
Cycles: 14003803561 -> 14003798040 (-0.00%); split: -0.00%, +0.00%
Spill count: 108320 -> 108288 (-0.03%)
Fill count: 200695 -> 200663 (-0.02%)
Scratch Memory Size: 8755200 -> 8754176 (-0.01%)
Totals from 7 (0.00% of 652118) affected shaders:
Instrs: 14998 -> 14451 (-3.65%); split: -3.94%, +0.29%
Cycles: 137222 -> 131701 (-4.02%); split: -4.10%, +0.07%
Spill count: 32 -> 0 (-inf%)
Fill count: 32 -> 0 (-inf%)
Scratch Memory Size: 19456 -> 18432 (-5.26%)
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27161>
Previously we apply implicit sync to all external memory, which is a bit
redundant since we only need it for the dedicated image scenario (media
image imported into Vulkan). This change optimizes just like that while
also excluding wsi which has its own way of synchronizing with the
compositor.
Signed-off-by: Yiwei Zhang <zzyiwei@chromium.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27398>
In clear_depth_stencil() stencil_surf is defined but not initiaized.
Then in the same function if stencil_mask is calculated and if != 0
stencil_surf is initialized.
But blorp_clear_stencil_as_rgba() access stencil_surf before checking
stencil_mask, what could cause a read of a uninitialized valued.
clear_depth_stencil()
struct blorp_surf stencil_surf;
...
uint8_t stencil_mask = clear_stencil && stencil_res ? 0xff : 0;
if (stencil_mask) {
...
iris_blorp_surf_for_resource(&stencil_surf);
}
...
blorp_clear_depth_stencil(stencil_mask, stencil_surf)
blorp_clear_stencil_as_rgba(stencil_mask, stencil)
if (surf->surf->format ...)
....
Just inverting the order and checking stencil_mask first in
blorp_clear_stencil_as_rgba() fixes the issue.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Dylan Baker <dylan.c.baker@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27390>
Memory can be free before images it is bound to. When unmapping the
CCS range in the AUX-TT, we cannot rely on the anv_bo::offset field
because the anv_bo might have been freed.
Just save the mapping address/size and use those values at unmapping
time.
Fixes an assert on CI with :
dEQP-VK.synchronization.internally_synchronized_objects.pipeline_cache_graphics
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: e519e06f4b ("anv: add missing alignment for AUX-TT mapping")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27304>
Buffers that are not dedicated can also be used for CCS mapped images,
so they need to be aligned to the AUX-TT requirements.
GTK+ is running into such case where it creates an image with a CCS
modifier. When requesting the alignment through
vkGetImageMemoryRequirements() the 64KB/1MB alignment is returned, but
the binding fails with an assert because the VkDeviceMemory has not
been aligned to the AUX-TT requirement and we cannot disable CCS since
the modifier requires it.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 4cdd3178fb ("anv: Meet CCS alignment reqs with dedicated allocs")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10433
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27258>
VK_ACCESS_2_SHADER_STORAGE_READ_BIT specifies read access to a
storage buffer, physical storage buffer, storage texel buffer, or
storage image in any shader pipeline stage.
Any storage buffers or images written to must be invalidated and
flushed before the shader can access them.
This fixes the following tests on LNL:
- dEQP-VK.synchronization2.op.single_queue.barrier.write\*_specialized_access_flag
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27212>
While enumerating devices on a system with multiple implementations,
unnecessary ioctls will be issued before a driver checks if it supports a
given device.
This patch makes the driver fail early based on a intel_device_info.ver
check with 2 new parameters added to intel_get_device_info_from_fd.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27166>
This adds a few new fields in the brw_cs_prog_data struct and then
uses them to fill in the relevant COMPUTE_WALKER fields.
Although the Tile Layout field theoretically has different settings for
32/64/128bpe, it appears that the recommended programming is to always
pick either TileY 32bpe or Linear. It's not very practical to look at
the surface formats involved, anyway.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27167>
With regards to implicit masking of the shift counts for 8- and 16-bit
types, the PRMs are incorrect. They falsely state that on Gen9+ only the
low bits of src1 matching the size of src0 (e.g., 4-bits for W or UW
src0) are used. The Bspec (backed by data from experimentation) state
that 0x3f is used for Q and UQ types, and 0x1f is used for **all** other
types.
To match the behavior expected for the NIR opcodes, explicit masks for
8- and 16-bit types must be added.
This fixes (the updated version, see crucible!138) of
func.shader.shift.int16_t on all Intel platforms. According to Karol,
this also fixes "integer_ops integer_rotate" tests in OpenCL CTS.
No shader-db or fossil-db changes on any Intel platform.
Tested-by: Karol Herbst <kherbst@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23001>
Launch with :
$ MESA_VK_TRACE=rmv MESA_VK_TRACE_TRIGGER=/tmp/trig ./my_app
In another terminal, trigger a capture :
$ touch /tmp/trig
The application with create a snapshot and print out :
RMV capture saved to '/tmp/my_app_2024.01.19_10.56.33.rmv'
Then just open it with RMV :
./RadeonMemoryVisualizer /tmp/my_app_2024.01.19_10.56.33.rmv
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26843>