Some instructions have a minimum latency before another instruction can
execute. It's a little unclear exactly what the details on these are.
For things like OpBar it's probably something to do with when stuff
actually convergres. For MemBar, maybe we only need to wait before we
do something that also touches memory? Unclear and the few docs seem to
imply that it's a straight-up stall. For now, we model it as an
execution latency where nothing is allowed to happen until then.
The blob also inserts a NOP with a delay of 2 in these cases. It's not
entirely clear why but it's probably best we do the same. Theses
instructions tend to be pretty heavy-weight anyway so 2 cycles isn't
really going to cost much compared to the chances that we're missing
some subtle HW issues somewhere.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26716>
The clear color BO is sometimes getting unnecessarily zeroed. For
example, if the resource will be fast-cleared, the app may pick a
non-zero color, causing the initial memset to be unneeded. So, skip the
memset and mark the clear color as unknown if it has not been freshly
allocated. For now, we leave the memsets on imported dmabufs alone for
simplicity.
On non-small BAR ACM systems, this also allows internal resources using
compression to be created without executing an IOCTL for memory mapping.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26675>
On small-BAR ACM systems, we don't initialize the clear color at
resource creation time. This could cause an issue with FCV_CCS_E.
Because the rendering and sampling fields may not be in agreement, FCV
may generate fast-cleared blocks during rendering and the sampler may
interpret them with the wrong values.
Thankfully, the register bit which enables FCV on ACM is disabled by
default, so there is not an actually issue today. Regardless, this patch
implements the fix for this issue now. The next patch will actually skip
explicit clear color initialization at resource creation time. So, this
fix will be useful for other systems that do have FCV enabled today.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26675>
Fresh suballocations from fresh allocations have already been zeroed by
the kernel. So, make BO_ALLOC_ZEROED a no-op for them.
This introduces a new field, iris_bo::zeroed, which will be reused for
similar optimizations in the future.
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26675>
Sync xe_drm.h with commit a8ff56e160bb ("drm/xe/uapi: Remove reset uevent for now").
This is the last xe_drm.h uAPI break.
The only relevant change for ANV and Iris is that now VM bind uAPI
is asynchronous only so I had to bring back the syncobj creation, wait
and destruction.
Is still in the Xe port TODO list to make VM binds truly asynchronous.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26699>
After all int64/double lowerings, there might still be 64b registers
left which ir3 currently doesn't handle. This only happens in a small
number of Piglit tests where those registers (or the variables they come
from) did not get DCE'd.
This patch handles 64b registers in ir3 by adding a NIR pass that does
the following:
- @decl_reg -> split in two 32b ones
- @store_reg -> unpack_64_2x32_split_x/y and two separate stores
- @load_reg -> two separate loads and pack_64_2x32_split
After this pass, the 64b vecs used for the original loads/stores are
still present and are also not handled yet by ir3. This patch removes
them by running nir_lower_alu_to_scalar and nir_copy_prop.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26175>
We only support the variablePointersStorageBuffer feature of variable pointers,
which basically ensures that pointers may only target one buffer. This means
that a particular pointer may change where it points within a given buffer but
it cannot change its value to point to some other buffer. This is a requirement
from us since we expect buffer indices on buffer loads and stores to be
constant, so we can't have a buffer load come through a pointer that may
be assigned to different buffers, since in that case the buffer index
would need to come from bcsel.
There is, however, a small complication: the spec still allows pointers to
be null, and NIR defines null pointers to use 0xffffffff for both the buffer
index and the offset, which will cause a problem in a scenario like this:
int *b = ...
if (cond) {
b = null;
discard;
}
ubo_load(b);
Here the buffer index for the ubo load may come from a bcsel choosing between
the null pointer (0xffffffff) and the valid address (let's say 0), so we don't
have a constant and we assert fail.
This change detects this scenario and upon finding it will rewrite the buffer
index on the null pointer branch of the bcsel to match that of the valid
branch so that later optimizations passes can remove the bcsel and we end up
with a constant index. This is fine because a null pointer dereference is
undefined behavior and it is not something we should see in valid applications.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26683>
Otherwise we may emit a load from an invalid offset from
a lane that was discarded.
This fixes an simulator assert from triggering when
executing:
dEQP-VK.spirv_assembly.instruction.terminate_invocation.terminate.no_null_pointer_load
That test emits a conditional kill and then a buffer load
which would have invalid offsets for the lines killed. Since
the buffer load is in uniform control flow we were incorrectly
emitting a full quad load, including disabled lanes which would
prompt the simulator to assert on invalid offsets being loaded
coming from the lanes that had been killed in the shader.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26683>
ROI (region of interest) feature implementation
in va.
It does not support ROI priority, and supports
qp delta, and the maximum number of supported
region is defined as 32, the region sequence implies
the priority, the lower the sequence number, the higher
the region priority, when region overlapping happened,
the higher priority region overwrites the lower
priority one.
And specifically for AV1, the adjust step will be
rounded by 5 when rate control is used, for example,
if qp_delta (q index) is 6, it will use 5, if
qp_delta is 8, it will use 10.
For AVC/HEVC (RC/CQP) and AV1 CQP mode, the
qp_delta granularity is 1.
Reviewed-by: Boyuan Zhang <boyuan.zhang@amd.com>
Signed-off-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26659>
Just a few implementation details that are worth mentioning:
- Panfrost doesn't support explicit VM management. This means
panfrost_kmod_vm_create() must always be created with
PAN_KMOD_VM_FLAG_AUTO_VA and panfrost_kmod_vm_bind(op=map) must
always be passed PAN_KMOD_VM_MAP_AUTO_VA. The actual VA is assigned
at BO creation time, and returned through
drm_panfrost_create_bo.offset, or can be queried through
DRM_IOCTL_PANFROST_GET_BO_OFFSET for imported BOs.
- Evictability is hooked up to DRM_IOCTL_PANFROST_MADVISE.
- BO wait is natively supported through DRM_IOCTL_PANFROST_WAIT_BO.
The rest is just a straightforward translation between the kmod API and
the existing panfrost ioctls.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26357>
We have generic BO management and device management layers that
directly call kernel driver-specific ioctls. The introduction of
Panthor (the new kernel driver supporting CSF hardware) forces us to
abstract some low-level operations. This could be done directly in
pan_{bo,device,props}.{c,h}, but having the abstraction clearly defined
and separated from the rest of the code makes for a cleaner
implementation.
This is also a good way to get a low-level KMD abstraction that
we can use without pulling all sort of gallium-related details in,
which will be important for some refactoring we plan to do in panvk.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26357>
These should probably go on the instance but everything is tangled up
too badly right now. This at least moves them to some place where we
have them without a nouveau_ws_device. It's fine to do this because
debug flags are an environment variable and won't change across a run.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26615>