The issue we're addressing here is that we have two batches that grow
at different rates. We want to keep doubling the main batch size as the
application writes more and more commands, to limit the number of GEM
BOs. But we don't want the generation batch size to be linked to the
main batch size.
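As a rough illustration of decoupling the two growth policies, here is a
minimal Python sketch (not the driver code). The class name, the sizes, and
the assumption that the generation batch simply keeps a fixed size are all
hypothetical; the text above only says the two sizes should not be linked.

```python
class BatchSizer:
    """Tracks the size to use for the next buffer of one batch (hypothetical helper)."""

    def __init__(self, initial_size, max_size, doubles):
        self.current = initial_size
        self.max_size = max_size
        self.doubles = doubles  # True: double on every growth, False: keep size

    def next_size(self):
        """Return the size for the next buffer, then apply this batch's own policy."""
        size = self.current
        if self.doubles:
            self.current = min(self.current * 2, self.max_size)
        return size


def simulate(growths):
    # Assumed sizes, chosen only for illustration.
    main = BatchSizer(initial_size=8192, max_size=65536, doubles=True)
    generation = BatchSizer(initial_size=8192, max_size=8192, doubles=False)
    main_sizes = [main.next_size() for _ in range(growths)]
    generation_sizes = [generation.next_size() for _ in range(growths)]
    return main_sizes, generation_sizes
```

Each batch owns its sizing state, so growing the main batch never changes
what the generation batch allocates.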
v2: remove gfx7 code
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15642>
I noticed a sequence like the following in a scheduled SuperTuxKart shader:
TEX_SINGLE.slot0 @r0:r1, ..
LD_VAR.wait0 @r2, ...
FMA r1, ...
Why do we stall waiting for the TEX_SINGLE instruction when it's not actually
read? Because its upper channels are *never* read, leading to a
write-after-write dependency when the register allocator puts some unrelated ALU
destination in there. By appropriately masking the texture instruction's write,
that false dependency disappears, avoiding the stall.
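The effect of the write mask can be sketched with a toy model in Python. This
is an illustration under simplifying assumptions, not the compiler's
scheduler: it assumes one staging register per channel, and all function
names are hypothetical.

```python
def mask_from_reads(channels_read):
    """Build a write mask that covers only the channels that are actually read."""
    mask = 0
    for channel in channels_read:
        mask |= 1 << channel
    return mask


def registers_written(base, write_mask):
    """Registers written by the texture op: one register per enabled channel (assumed)."""
    return {base + channel for channel in range(4) if write_mask & (1 << channel)}


def has_false_waw(base, write_mask, channels_read, later_dest):
    """True if a later destination overlaps a texture write that nobody reads."""
    written = registers_written(base, write_mask)
    read = {base + channel for channel in channels_read}
    return later_dest in written and later_dest not in read
```

With an unmasked two-register write (`0b0011`) to r0:r1 where only channel 0
is read, a later write to r1 is a false write-after-write dependency; with
the mask derived from the reads, the texture op no longer writes r1 and the
dependency is gone.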
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20426>
There are so many problems with indirect draws on v7 that we never got this
code path to the finish line, and none of us have a good plan (or reason) to fix
this. Proper indirect draws are only possible since v10 on Mali.
There was interest in using this path to implement indexed draws in PanVK, but
that MR is stalled and it's not clear how much sense it makes to do Vulkan on
anything older than v9 or v10 at this point. This code isn't *gone*, it'll still
be in git history, but I don't see a lot of reason in keeping it in tree if it's
unused and complicating e.g. the sysval upload path of the driver.
Indirect dispatch remains supported on v7, as that path *is* working and flipped
on for end users. Indirect dispatch on v7 is considerably less complicated than
indirect draws.
Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20420>
Folding a signed or unsigned i32 -> i16 conversion into a sampling
instruction causes it to behave differently with out-of-bounds
values. The conversion expects the higher bits to be masked off, whereas
the folded variant clamps the value.
A concrete example is that:
isaml.base0 (u16)(x)hr0.x
is not equivalent to this:
isaml.base0 (u32)(x)r0.w
(sy)cov.u32u16 hr0.x, r0.w
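The numeric difference between the two behaviours can be shown with a small
Python sketch. It models only the arithmetic for the unsigned case, not the
hardware; the function names are hypothetical.

```python
def convert_by_masking(value):
    """Explicit u32 -> u16 conversion: the higher bits are dropped."""
    return value & 0xFFFF


def convert_by_clamping(value):
    """Folded variant as described above: out-of-range values saturate."""
    return min(value, 0xFFFF)
```

For in-range inputs both give the same result, so the mismatch only shows up
with out-of-bounds values such as `0x12345`, which masks to `0x2345` but
clamps to `0xFFFF`.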
Fixes misrendering in "Injustice 2".
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7869
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20396>
Linear PE causes a lot of issues in the ZS stage. While some of those issues
can be worked around on newer GPU cores by doing all ZS operations in the
late stage, GC600 r4653 exhibits spurious Z fails when linear PE is active
even though this GPU does not even have an early Z stage.
Disable linear PE for now, until the issue can be analyzed further. Leave the
debug option in place so that linear PE can still be enabled for testing.
Fixes: 43eb5e777e ("etnaviv: add debug option to disable linear PE feature")
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Marek Vasut <marex@denx.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20405>
Because commit b9403b1c47 moved dispatch enable handling away from the
compiler, the drivers must ensure correct dispatch enable values. This
is handled by the intel_set_ps_dispatch_state function.
v2: Fix gfx6 build and use brw_fs_get_dispatch_enables for gfx6 in
crocus
v3: Rebase, use intel_set_ps_dispatch_state, drop gfx6 handling
Fixes: b9403b1c47 ("intel: factor out dispatch PS enabling logic")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20267>
To debug the LAVA jobs locally, we have an option in the
lava_job_submitter script to ignore the JWT token to make it possible to
retry jobs without the need to get an unexpired token.
However, this trick needs to modify the overlaid directory, so we need
to download and extract it earlier in the run.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20051>
LAVA uses XMLRPC to send job information and control messages. More
specifically, it sends device logs as YAML dumps encoded as UTF-8 bytes.
In Python, the xmlrpc.client.Binary class acts as the serializer, so we
receive the logs wrapped in this class, which holds the data as UTF-8
encoded bytes.
We were converting the encoded data to a string via the `str` function,
but this caused the loaded YAML data to use single quotes instead of
double quotes for string values, so special characters such as `\x1b`
were escaped as `\\x1b`.
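The kind of escaping described above can be reproduced with the standard
library alone. This is a minimal sketch of general Python behaviour, not the
lava_job_submitter code; the payload and function names are hypothetical.

```python
import xmlrpc.client


def make_payload():
    """Hypothetical log line with ANSI color codes, wrapped like an XML-RPC binary."""
    text = "\x1b[32mpassed\x1b[0m"
    return xmlrpc.client.Binary(text.encode("utf-8"))


def bytes_via_str(payload):
    """Calling str() on bytes yields their repr, so ESC becomes a literal backslash-x1b."""
    return str(payload.data)


def bytes_via_decode(payload):
    """Decoding the bytes as UTF-8 keeps the real ESC character."""
    return payload.data.decode("utf-8")
```

The `data` attribute of `xmlrpc.client.Binary` holds the raw bytes; decoding
them explicitly avoids any repr-style escaping before the YAML is loaded.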
With this fix, we can now drop one of the hacks that fixed the bash
colors.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20051>
Only enable mesh+task shaders when IBs and gang submit are enabled.
We won't support gang submit with noibs.
Also remove the RADV_PERFTEST=ext_ms option.
Side note, GFX11 task/mesh support is still a TODO.
Don't skip the CTS tests which require GFX->ACE synchronization.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20010>
Add new preambles and postambles for synchronizing gang members in a
gang submission using semaphores.
These semaphores are both located in a small BO.
Gang wait preambles:
- gang leader writes 1 to a semaphore
- gang member waits for it to be written
When task shaders are used, make sure ACE waits until GFX starts to execute.
Userspace is required to emit this wait to make sure it behaves correctly
in a multi-process environment, because task shader dispatches are not
meant to be executed on multiple compute engines at the same time.
Gang wait postambles:
- gang member writes 1 to a semaphore
- gang leader waits for it to be written
This ensures that the gang leader waits for the whole gang,
which is necessary because the kernel signals the userspace fence
as soon as the gang leader is done, which may lead to bugs because the
same command buffers could be submitted again while still being executed.
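The ordering that the two semaphores enforce can be modelled on the CPU with
a short Python sketch. It is only an analogy: threads stand in for the GFX
(leader) and ACE (member) queues, `threading.Event` stands in for the
semaphores in the BO, and all names are hypothetical.

```python
import threading


def run_gang():
    """Simulate one gang submission and return the observed order of events."""
    leader_started = threading.Event()   # stands in for the preamble semaphore
    member_finished = threading.Event()  # stands in for the postamble semaphore
    trace = []
    trace_lock = threading.Lock()

    def record(event):
        with trace_lock:
            trace.append(event)

    def leader():
        record("leader starts")
        leader_started.set()        # preamble: leader writes 1
        member_finished.wait()      # postamble: leader waits for the member
        record("leader done")

    def member():
        leader_started.wait()       # preamble: member waits for the leader
        record("member works")
        member_finished.set()       # postamble: member writes 1

    threads = [threading.Thread(target=member), threading.Thread(target=leader)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return trace
```

Whichever thread is scheduled first, the member cannot start before the
leader, and the leader cannot finish before the member.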
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20010>