Tigerlake PRM: Volume 2c: Command Reference: Registers Part 2 - Registers M through Z
RCU_MODE :: Compute Engine Enable
This bit indicates if Compute Engine (a.k.a Dual Context or Multi
Context) is enabled or not. This bit must be treated as global
control for enabling and disabling of compute engine. Hardware
allocates required resources for the compute engine based on this
bit.
....
HW reserves 4KB of URB space...
Right now no gen12 platform has Dual Context enabled on the kernel side,
so none of them exposes a compute engine, but that can change. So add
has_compute_engine to intel_device_info and only reserve URB space
if a compute engine is available.
While at it, also fix the error path when pb_slabs_init() fails.
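A minimal sketch of the conditional reservation, not the actual driver
code. The struct, the urb_size_kb field and the helper name are
hypothetical; only has_compute_engine and the 4KB figure come from the
text above, and applying the reservation only when the engine is enabled
is an assumption based on the PRM excerpt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant part of intel_device_info. */
struct device_info_sketch {
   bool has_compute_engine;   /* field named in the commit message */
   uint32_t urb_size_kb;      /* assumed: total URB size in KB */
};

/* Assumed from the PRM excerpt: HW reserves 4KB of URB space. */
#define COMPUTE_ENGINE_URB_RESERVE_KB 4

/* Return the URB space left to the driver after the HW reservation. */
static uint32_t
usable_urb_size_kb(const struct device_info_sketch *devinfo)
{
   assert(devinfo->urb_size_kb >= COMPUTE_ENGINE_URB_RESERVE_KB);

   /* Only give up the 4KB when a compute engine is actually exposed. */
   if (devinfo->has_compute_engine)
      return devinfo->urb_size_kb - COMPUTE_ENGINE_URB_RESERVE_KB;

   return devinfo->urb_size_kb;
}
```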
Bspec: 46034
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21031>
The script is broken and nobody noticed, so it evidently wasn't used much.
Meson has supported printing the options when pointed at the source
dir for a while (not sure since which exact version), so I think we can
just recommend that users do that.
Signed-off-by: Eric Engestrom <eric@engestrom.ch>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21469>
Fixes: 5b205ef413 ("r600: Store nir shaders serialized to save memory")
Direct leak of 4096 byte(s) in 1 object(s) allocated from:
#0 0x7faf89c3bb48 in __interceptor_realloc (/usr/lib64/libasan.so.6+0xb1b48)
#1 0x7faf7be5981d in grow_to_fit ../src/util/blob.c:67
#2 0x7faf7be5a538 in grow_to_fit ../src/util/blob.c:49
#3 0x7faf7be5a538 in blob_reserve_bytes ../src/util/blob.c:177
#4 0x7faf7be5a538 in blob_reserve_uint32 ../src/util/blob.c:190
#5 0x7faf7d248a8c in nir_serialize ../src/compiler/nir/nir_serialize.c:2109
#6 0x7faf7df4fdbb in r600_pipe_shader_create ../src/gallium/drivers/r600/r600_shader.c:401
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21443>
It's not connected up to anything at the moment, and even if I do enable
it for crocus HSW it only shaves 3 instructions off of one particular VS
in an old synthetic benchmark, not affecting anything else in shader-db.
I don't think anyone will ever care to fix this or port it to NIR, so
let's just retire it.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21353>
If the view's seqno increments, it needs to happen *before* the tex cache
key is constructed. Normally this happens when the sampler views are
bound. But if the texture backing a current sampler view is rebound we
need to handle this before the cache lookup.
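The ordering requirement can be sketched as follows. This is an
illustration only: the structs, the counter and both function names are
hypothetical, not the driver's real ones. The point is just that the
view is revalidated (possibly bumping its seqno) before the seqno is
read into the key.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins; not the driver's real structures. */
struct tex_sketch {
   uint32_t seqno;            /* bumped whenever the backing storage is rebound */
};

struct view_sketch {
   const struct tex_sketch *tex;
   uint32_t tex_seqno;        /* texture seqno seen at the last validation */
   uint32_t seqno;            /* identity of this view inside cache keys */
};

static uint32_t next_view_seqno = 100;

/* Give the view a fresh seqno if its texture changed since last validation. */
static void
view_validate(struct view_sketch *view)
{
   assert(view->tex);
   if (view->tex_seqno != view->tex->seqno) {
      view->tex_seqno = view->tex->seqno;
      view->seqno = next_view_seqno++;
   }
}

/* Validate first, then read the seqno into the key, never the reverse. */
static uint32_t
tex_cache_key(struct view_sketch *view)
{
   view_validate(view);
   return view->seqno;
}
```

Building the key before validating would return the stale seqno and hit
a cache entry that describes the old texture state.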
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21408>
Since 13cb41f666, PIPE_BIND_SHARED has been used to allocate driver
internal video buffers. These buffers are never shared; the intent was to
get non-suballocated buffers, and SHARED was used as an indirect flag.
This commit switches to PIPE_BIND_CUSTOM, which isn't used anywhere else
and is now translated as "no suballocation".
The main benefit is that these buffers can now set
use_reusable_pool to true, which reduces the CPU overhead a lot.
For instance, running the following command on my system:
ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi \
-i tears_of_steel_1080p.mov -an -c:v h264_vaapi output.mp4
takes 35 sec with this commit vs 45 sec without.
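A rough sketch of the flag translation described above. The flag names
and values are hypothetical stand-ins (not the real PIPE_BIND_* or winsys
definitions), and excluding shared buffers from the reusable pool is an
assumption made for illustration.

```c
#include <assert.h>

/* Hypothetical bind flags; the values do not match the gallium headers. */
#define SKETCH_BIND_SHARED (1u << 0)
#define SKETCH_BIND_CUSTOM (1u << 1)

/* Hypothetical allocation flags produced by the translation. */
#define SKETCH_ALLOC_NO_SUBALLOC   (1u << 0)
#define SKETCH_ALLOC_REUSABLE_POOL (1u << 1)

static unsigned
translate_bind_flags(unsigned bind)
{
   unsigned alloc = 0;

   /* This sketch only knows about the two flags above. */
   assert((bind & ~(SKETCH_BIND_SHARED | SKETCH_BIND_CUSTOM)) == 0);

   /* Both flags ask for a dedicated (non-suballocated) buffer. */
   if (bind & (SKETCH_BIND_SHARED | SKETCH_BIND_CUSTOM))
      alloc |= SKETCH_ALLOC_NO_SUBALLOC;

   /* Assumption: only buffers that are not shared may be recycled. */
   if (!(bind & SKETCH_BIND_SHARED))
      alloc |= SKETCH_ALLOC_REUSABLE_POOL;

   return alloc;
}
```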
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21416>
As we support VK_EXT_image_drm_format_modifier, we could receive
VK_IMAGE_ASPECT_MEMORY_PLANE_0/1/2_BIT_EXT flags.
Fixes several tests like this:
dEQP-VK.drm_format_modifiers.create_explicit_modifier.*
when using CTS 1.3.5.0
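For illustration, one way to accept the memory plane aspects is to map
them to a plane index alongside the regular plane aspects. This is a
sketch, not the change itself: the enum and function names are
hypothetical, the constants are local copies of the VkImageAspectFlagBits
values, and treating memory plane N like format plane N is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of VkImageAspectFlagBits values, so the sketch builds
 * without the Vulkan headers. */
enum aspect_sketch {
   ASPECT_COLOR          = 0x001,
   ASPECT_PLANE_0        = 0x010,
   ASPECT_PLANE_1        = 0x020,
   ASPECT_PLANE_2        = 0x040,
   ASPECT_MEMORY_PLANE_0 = 0x080,  /* VK_IMAGE_ASPECT_MEMORY_PLANE_0_BIT_EXT */
   ASPECT_MEMORY_PLANE_1 = 0x100,  /* VK_IMAGE_ASPECT_MEMORY_PLANE_1_BIT_EXT */
   ASPECT_MEMORY_PLANE_2 = 0x200,  /* VK_IMAGE_ASPECT_MEMORY_PLANE_2_BIT_EXT */
};

/* Return the plane index for a single aspect bit, or -1 if unsupported. */
static int
aspect_to_plane(uint32_t aspect)
{
   /* Exactly one aspect bit must be set. */
   assert(aspect != 0 && (aspect & (aspect - 1)) == 0);

   switch (aspect) {
   case ASPECT_COLOR:
   case ASPECT_PLANE_0:
   case ASPECT_MEMORY_PLANE_0:
      return 0;
   case ASPECT_PLANE_1:
   case ASPECT_MEMORY_PLANE_1:
      return 1;
   case ASPECT_PLANE_2:
   case ASPECT_MEMORY_PLANE_2:
      return 2;
   default:
      return -1;
   }
}
```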
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21463>
We have specialized lowering passes dealing with most of that already:
1. gl_nir_lower_samplers_as_deref
2. nir_lower_samplers
3. nir_lower_cl_images
If we need more than that, those passes can deal with following deref
chains as well.
We _might_ need to improve nir_lower_cl_images a bit for more complex
kernels, but CL also doesn't allow indirect images, so we are always able
to optimize the entire deref chain away.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20161>
Some of the OpenCL tests are flaky because they just take that long.
Builtins can generate really complex code, and if we are unlucky the
tests time out.
Proper support for functions would probably also solve the issue, but for
now increase the deqp-runner timeout so it's less of an annoyance.
Signed-off-by: Karol Herbst <kherbst@redhat.com>
Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20161>