L3 configurations with an ALL partition of 128 ways per bank or more
cannot be represented with the normal L3ALLOC partitioning mechanism
since the "All L3 client pool" field would overflow, instead the
L3FullWayAllocationEnable bit has to be set, which causes the whole L3
to be used in a unified cache configuration.
That's precisely the configuration we're currently using on recent
platforms, but previously we were relying on the L3 config tables
being empty and the selected L3 configuration being a NULL pointer to
detect this condition. This is about change, the L3 configuration
structure will be defined for gfx12.5+ platforms since they provide
useful information about the cache hierarchy to the drivers. Instead
of checking whether the pointer is NULL in order to apply a unified L3
cache configuration, use it when there is a single ALL partition
larger than can be represented via L3ALLOC.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25493>
Doing something like this is allowed :
vkCreateGraphicsPipeline(.., scissorState, &pipeline);
vkCmdBindPipeline(pipeline);
vkCmdSetScissor(...)
vkCmdBindPipeline(pipeline)
If we don't reapply the pipeline dynamic state, the command buffer
runtime state will keep the dynamic state set in between the 2 binds.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25915>
Use the hawkmoth c:auto* directives to incorporate isl documentation.
Convert @param style parameter descriptions to rst info field lists.
Add static stubs for generated headers. Fix a lot of references, in
particular the symbols are now in the Sphinx C domain, not C++
domain. Tweak syntax here and there.
Based on the earlier work by Erik Faye-Lund <kusmabite@gmail.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24507>
We can end up in situation where we are dispatched with a multisample
framebuffer but not at per-sample. In this case we would request the
at_sample value with the wrong message configuration.
Relying on the BRW_WM_MSAA_FLAG_MULTISAMPLE_FBO flag superseeds
BRW_WM_MSAA_FLAG_PERSAMPLE_DISPATCH.
Fixes piglit tests :
spec@arb_gpu_shader5@arb_gpu_shader5-interpolateatsample*
With Zink on Anv
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 68027bd38e ("intel/fs: implement dynamic interpolation mode for dynamic persample shaders")
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25854>
While the fs_visitor::validate() implementation is empty in release
build, we still emit calls to it since it is defined in a separate
compilation unit than its callers. To fix this, just expose an inline
empty function in the header for the release mode.
Fossil run time differences in TGL laptop (difference at 95.0% confidence):
```
Rise of The Tomb Rider (Native) [n=7]
-0.482857 +/- 0.010932
-1.60608% +/- 0.0363621%
Cyberpunk 2077 (DXVK) [n=7]
-0.987143 +/- 0.0904516
-0.82996% +/- 0.076049%
Batman Arkham City (DXVK) [n=7]
-7.74857 +/- 0.329561
-1.46298% +/- 0.0622231%
```
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25847>
This helps fix a performance regression on games such as F1 22 and RDR2.
Turning on non zero fast clears causes additional partial resolves for
these games that degrades performance. Let's turn off non zero fast
clears till we can eliminate the partial resolves.
Signed-off-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25863>
It was never actually async as it was doing a DRM_IOCTL_SYNCOBJ_WAIT
right after DRM_IOCTL_XE_VM_BIND but it was required to allow the
partial binds required by sparse.
But it is now fixed and we can switch back to sync vm bind.
In future we will switch back to async vm bind to improve performance
but this time it will be properly implemented.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25300>
Sync xe_drm.h with commit xxxxx ("drm/xe/uapi: Fix naming of XE_QUERY_CONFIG_MAX_EXEC_QUEUE_PRIORITY").
One not so straght forward change is that sync VM binds now don't
require a syncobj anymore, the uAPI will return as soon the VM bind
operations are done.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25300>
At image bind time, we require BOs to meet aux-map alignment
requirements in order to enable CCS on images. This is a heuristic
controlled by anv_bo_allows_aux_map().
To improve the chances of getting a properly aligned BO, we make use of
the dedicated allocation extension. Firstly, we report to applications a
preference for dedicated memory if an image would like to use the aux
map. Secondly, we align the VMA for dedicated allocations to meet
aux-map requirements.
To make enabling modifiers much easier on integrated gfx12, report
dedicated allocations as a requirement for modifiers which specify CCS.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v1)
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25003>
At image bind time, if an image's addresses can be placed into the
aux-map without causing conflicts with a pre-existing mapping, do so.
The code aux management code in the binding function operates on a
per-plane basis. So, use the per-plane CCS memory range from the image
rather than the CCS memory region for the entire BO.
Another way to avoid aux-map conflicts is to rely solely on having a
dedicated allocation for an image. Unfortunately, not all workloads
change their behavior when drivers report a preference for dedicated
allocations. In particular, 3DMark Wild Life Extreme does not make more
dedicated allocations and such a solution was measured to perform ~16%
worse than this solution. With this solution, I did not measure a loss
of CCS on that benchmark.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6304
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v1)
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25003>
Make intel_aux_map_add_mapping return false if a mapping is attempted
that would conflict with an existing one. If this function doesn't
return false, it will either fail to return or return true.
The Vulkan driver will make use of this feature to opportunistically
enable CCS if a BO's VMA range has not been already mapped.
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25003>
We need to shut down the runtime queue threads before tearing down
anything else.
Gets rid of helgrind errors like this :
==212772== Possible data race during write of size 4 at 0xADCBFB0 by thread #1
==212772== Locks held: 1, at address 0x6B8F260
==212772== at 0x8AC3EFF: simple_mtx_destroy (simple_mtx.h:97)
==212772== by 0x8ACB24D: intel_ds_device_fini (intel_driver_ds.cc:603)
==212772== by 0x6CBD4D4: anv_device_utrace_finish (anv_utrace.c:471)
==212772== by 0x6C71577: anv_DestroyDevice (anv_device.c:3679)
==212772== by 0x6B2F1E2: loader_layer_destroy_device (loader.c:4358)
==212772== by 0x6B3F10B: vkDestroyDevice (trampoline.c:983)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: cc5843a573 ("anv: implement u_trace support")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10010
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25805>
We have only 6 of these boards since one died in May, and 7 jobs allocated
to them. So you ended up with a 5 minute delay on each pipeline with an
otherwise-idle farm while you waited for the first batch of jobs to
complete so you could get the last one started. It turns out that piglit
was taking 3 minutes of runtime each, so we can just shard piglit 2 ways,
stay under runtime, and not over-allocate the farm
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25790>