Implement HSD 16028171704/14025112257:
LSC state cache livelock: once the state cache entries are full,
subsequent walker dispatches with two threads per thread group may
get stuck indefinitely because of a state cache livelock.
One thread is continuously stuck in a loop doing a UGM fence + evict and
a UGM read, waiting for the UGM read to return a certain value, while
the other thread is supposed to update the value that the first thread
is waiting for. But since the state cache entries are full, the second
thread never makes progress.
Closes: #12352
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37128>
Layered surfaces (array textures) with video encode/decode usage bits
will have their slices aligned to make them addressable to the media
engine. Multi-planar layered surfaces will be stored with their slices
interleaved so that a relative offset can be programmed between the
luma and chroma slices.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35651>
The Xe ioctl DRM_XE_DEVICE_QUERY_ENGINE_CYCLES provides accurate
timestamps correlated between the CPU and GPU. However, it is slow and
impacts performance while collecting Perfetto traces.
Instead, use Perfetto's GetBootTimeNs() to track when to emit the
BUILTIN_CLOCK_BOOTTIME clock sync event so it only occurs every 1
second. This reduces the impact of recording gpu.renderstages from
-8% to -4%.
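
As a rough illustration of the rate limiting described above, a minimal
sketch follows. The struct, function, and macro names are hypothetical
and are not the actual Mesa or Perfetto API; now_ns is assumed to be a
cheap CPU boot-time timestamp such as the one GetBootTimeNs() returns.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration only. */
#define CLOCK_SYNC_PERIOD_NS UINT64_C(1000000000) /* 1 second */

struct clock_sync_state {
   uint64_t next_sync_ns; /* boot-time deadline for the next sync event */
};

/* Returns true when a new clock sync event should be emitted, i.e. when
 * at least one period has elapsed since the previous one. Only in that
 * case would the caller pay for the slow correlated timestamp query.
 */
static bool
clock_sync_due(struct clock_sync_state *state, uint64_t now_ns)
{
   if (now_ns < state->next_sync_ns)
      return false;

   state->next_sync_ns = now_ns + CLOCK_SYNC_PERIOD_NS;
   return true;
}
```

With a zero-initialized state the first call always reports that a sync
is due, so the trace starts with a correlated timestamp.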
More concretely, FPS measurements when tracing Unity BoatAttack demo on
an Intel ADL device:
* gpu.renderstages disabled: 48.044293667
* gpu.renderstages enabled: 38.119778333 (-20.66%)
* gpu.renderstages enabled + this fix: 42.641818333 (-11.24%)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37095>
v2: Rebase on ac2b072312 ("brw: Add more specific brw_builder
helpers"), and fix a bug that caused the new instruction to possibly be
put in the wrong place.
No shader-db changes on any Intel platform.
fossil-db:
All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 233675305 -> 233641585 (-0.01%)
Cycle count: 32593658094 -> 32591467794 (-0.01%); split: -0.01%, +0.00%
Totals from 33513 (4.25% of 789264) affected shaders:
Instrs: 5200332 -> 5166612 (-0.65%)
Cycle count: 1499831128 -> 1497640828 (-0.15%); split: -0.15%, +0.00%
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/35444>
brw_shader::uniforms is now derived from the nir_shader. The only
exception is compute shaders on older Gfx versions, so we move the
adjustment logic for that case.
The benefit here is untangling the code for compilation variants,
which previously had to keep track of the first compiled variant just
to, in most cases, copy an integer.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33541>
As we set prog_data->nr_params, allocate the array like elsewhere.
Current code is getting by because the logic for adding a new element
will realloc it. But later changes will make the array be accessed
before this reallocation.
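
The idea of keeping the count and the array in step can be sketched as
below. This is only an illustration under stated assumptions: the struct
and helper are hypothetical stand-ins, not the real prog_data layout or
Mesa's allocator.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the relevant prog_data fields. */
struct prog_data_sketch {
   unsigned nr_params;
   uint32_t *param;
};

/* Set the parameter count and allocate the matching zeroed array in one
 * place, so the array is valid before any later code indexes it.
 * Returns 0 on success, -1 on allocation failure.
 */
static int
prog_data_set_nr_params(struct prog_data_sketch *pd, unsigned nr_params)
{
   pd->param = calloc(nr_params, sizeof(*pd->param));
   if (nr_params > 0 && pd->param == NULL) {
      pd->nr_params = 0;
      return -1;
   }

   pd->nr_params = nr_params;
   return 0;
}
```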
This will make sure later patches won't cause tests like
dEQP-VK.query_pool.statistics_query.compute_shader_invocations.32bits_cmdcopyquerypoolresults_secondary
to fail on gfxver < 125. Note the bug appears when the DRI option
that tweaks the threshold for using these shaders is set to 0. This is
done by the GitLab CI, which allowed testing of later patches to find
this issue.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33541>
The problem with the current code is that there is a disconnect between:
- the virtual register size allocated
- the dispatch size
- the size_written value
Only the last 2 are in sync and this confuses the spiller that only
looks at the destination register allocation & dispatch size to figure
out how much to spill.
The solution in this change is to make BROADCAST more like
MOV_INDIRECT, so that you can do a BROADCAST(8) that actually reads a
SIMD32 register. We put the size of the register read into src2.
Now the spiller sees correct read/write sizes just looking at the
destination register & dispatch size.
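
To make the size bookkeeping concrete, here is a simplified sketch. The
record and helpers are hypothetical and much smaller than the real
instruction representation; the only point is that the write size
follows the dispatch width while the read size comes from an explicit
value, as src2 does for BROADCAST in this change.

```c
/* Hypothetical, simplified instruction record; not the real brw_inst. */
struct inst_sketch {
   unsigned exec_size;      /* dispatch width of the instruction */
   unsigned type_bytes;     /* size of one channel, e.g. 4 for a dword */
   unsigned src_read_bytes; /* explicit source size, as carried in src2 */
};

/* Bytes written to the destination: derived from the dispatch width. */
static unsigned
inst_bytes_written(const struct inst_sketch *inst)
{
   return inst->exec_size * inst->type_bytes;
}

/* Bytes read from the source: taken from the explicit size rather than
 * the dispatch width, so a narrow instruction can read a wide register.
 */
static unsigned
inst_bytes_read(const struct inst_sketch *inst)
{
   return inst->src_read_bytes;
}
```

In this model a BROADCAST(8) of dwords that reads a SIMD32 register
writes 32 bytes and reads 128 bytes.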
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 662339a2ff ("brw/build: Use SIMD8 temporaries in emit_uniformize")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13614
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36564>
Those are for push constants, so there is no point in doing that because:
- there are no HW constant offsets in push constants (payload
delivery), it's just register offset calculation
- if we have a dynamic value it's already using MOV_INDIRECT
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: e103afe7be ("brw: run the nir_opt_offsets pass and set the maximum offset size")
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36958>
This avoids having to hardcode the proxy in the traces `download-url` or
jobs setting `PIGLIT_REPLAY_EXTRA_ARGS` and accidentally overriding the
default args when the author meant to append.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36955>
ANV currently carries a partial copy of the gralloc mapper's format
resolving code, while the ground truth solely resides inside the
gralloc. The local copy is delicate and unable to maintain compatibility
with different gralloc implementations because AHB formats like
Y8Cb8Cr8_420 and IMPLEMENTATION_DEFINED are flexible formats, and can be
resolved to different underlying drm fourcc formats depending on the
usage and media IPs.
The common impl is more correct as it relies on the info from the
gralloc mapper side, and it only sets the minimal set of explicit
formats to avoid hitting the spec corner case of allocating an AHB with
flexible formats (missing half of the media usage bits might end up
allocating something different that potentially gets resolved to a
different VkFormat as well).
Reviewed-by: Lucas Fryzek <lfryzek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36866>
vk_image::ahb_format is for drivers that support more than the
common explicit AHB formats. It is used on the AHB image memory export
allocation path; more specifically, vk_device_memory_create will
use that AHB format to allocate the AHB from gralloc. Note that the
export allocation path only deals with explicit formats, not external
formats. So even for the obsolete HAL_PIXEL_FORMAT_NV12_Y_TILED_INTEL
private format we don't need it, as multi-planar formats are
supposed to be reported as external formats.
Reviewed-by: Lucas Fryzek <lfryzek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36866>
The current impl misses the probe against gralloc mapper, which is the
required handshake before advertising support. For simplicity, just
adopt the common AHB helper. It does not rely on driver specific format
mapping, since the query doesn't allow external format at all.
Reviewed-by: Lucas Fryzek <lfryzek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36866>
AHB images are created with the right VkFormat when external format
isn't used. When external format does get used, the proper VkFormat has
already been set in the common runtime. Upon AHB props query, we
resolve the external format to a VkFormat and store it in the
externalFormat field to be used by the app. The app would then chain
that exact external format when creating the AHB image if it wants to go
down the external format code path instead of being explicit. So in the
end, the format we resolve is the format we get. Thus there is no need
to set it twice.
Reviewed-by: Lucas Fryzek <lfryzek@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36866>