As each anv_device has its own address space, it was necessary to create
one dummy_aux_bo per anv_device.
This workaround also requires us to disable the
buffer_length_in_aux_addr optimization. That is done at physical
device creation because the isl_dev of the physical device is copied
to the isl_dev in anv_device.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29619>
This workaround asks us to set a dummy aux address on all
SURFTYPE_BUFFERs with AuxiliarySurfaceMode == AUX_NONE.
It also says that the same dummy aux address can be reused across all
buffers.
So add dummy_aux_address to isl_device; ANV and Iris will set it
when running on an affected GPU.
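
As a rough illustration of the rule described above, the following C sketch
applies a shared dummy aux address to buffer surfaces that have no aux
surface. Only dummy_aux_address, SURFTYPE_BUFFER and AUX_NONE come from this
message; the struct and function names are hypothetical and are not the
isl API, and treating an address of 0 as "GPU not affected" is an assumption.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-ins for the real surface state enums. */
enum sketch_surf_type { SKETCH_SURFTYPE_2D, SKETCH_SURFTYPE_BUFFER };
enum sketch_aux_mode { SKETCH_AUX_NONE, SKETCH_AUX_CCS };

struct sketch_device {
   /* Assumption: 0 means the workaround is not needed on this GPU. */
   uint64_t dummy_aux_address;
};

struct sketch_surface_state {
   enum sketch_surf_type type;
   enum sketch_aux_mode aux_mode;
   uint64_t aux_address;
};

/* Point every aux-less buffer surface at the one shared dummy address. */
static void
sketch_apply_dummy_aux(const struct sketch_device *dev,
                       struct sketch_surface_state *state)
{
   if (dev->dummy_aux_address != 0 &&
       state->type == SKETCH_SURFTYPE_BUFFER &&
       state->aux_mode == SKETCH_AUX_NONE)
      state->aux_address = dev->dummy_aux_address;
}
```

Because the same address is valid for every buffer, the device only needs to
hold a single value rather than one per surface.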
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29619>
This job actually takes just under 5 minutes[*] without any fraction, so
there is no need for this, we can have full coverage and stay below the
10min-per-job limit.
[*] I've seen up to 10min when the CI is busy, so let's put the timeout
at 3x the normal run time.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29809>
I just saw more of these in other glsl versions, and it's likely that it
doesn't matter which version is active from our point of view, so let's
just put all of them in the same "flaky" bag.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29813>
Now that I have hands-on access to this GPU and could run deqp-vk, it uses
blob v676.0 and the values are different from v744.19. Not only are they
different, but with the values from v744 there are CTS test failures.
Fixes at least ASTC tests.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29786>
We need some manual logic to work out the size of pData, so we handroll this
one. This fixes push DUT with emulated secondaries.
Affects dEQP-VK.binding_model.shader_access.secondary_cmd_buf.*push*templ* if
emulated secondaries are used.
Neither panvk nor dozen support push DUT yet, so this isn't hurting anyone and
doesn't need to be cc'd stable. But hopefully panvk & dozen get on that :}
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28682>
On a740, TPL1_DBG_ECO_CNTL1.TP_UBWC_FLAG_HINT must be the same across
all drivers in the system; somehow, having different values affects
BLIT_OP_SCALE. We cannot automatically match the blob's value, so the
best we can do is provide a toggle.
Example:
FD_DEV_FEATURES=enable_tp_ubwc_flag_hint=0
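
To illustrate how a toggle like this can override a per-device default, here
is a minimal C sketch. The helper name is hypothetical and this is not the
driver's actual parser; it only assumes a single "name=value" pair with an
integer value, as in the example above.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: if `features` is exactly of the form "<name>=<int>",
 * return whether the integer is non-zero; otherwise keep the default.
 */
static bool
sketch_feature_override(const char *features, const char *name,
                        bool default_value)
{
   if (features == NULL)
      return default_value;

   size_t len = strlen(name);
   if (strncmp(features, name, len) != 0 || features[len] != '=')
      return default_value;

   return strtol(features + len + 1, NULL, 10) != 0;
}
```

A caller could pass getenv("FD_DEV_FEATURES") as the first argument, so the
default is kept whenever the variable is unset.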
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29754>
si_get_shader_prefetch_size is called each time a shader is changed.
Since the size of a given variant never changes, we can compute the
value once and store the result.
This has to be done in 2 places:
* si_create_shader_variant for all types of shaders
* si_create_compute_state_async for compute shaders, when a shader
is loaded from the cache.
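
The idea can be sketched in C as follows. All names here are hypothetical
stand-ins, not the radeonsi code, and the size formula (number of 64-byte
lines) is only an assumption chosen to make the example concrete.

```c
#include <stdint.h>

/* Counts how often the size is computed, to show it happens only once. */
static unsigned sketch_compute_calls;

/* Assumed formula: the binary size rounded up to whole 64-byte lines. */
static uint32_t
sketch_compute_prefetch_size(uint32_t binary_size)
{
   sketch_compute_calls++;
   return (binary_size + 63) / 64;
}

struct sketch_shader {
   uint32_t binary_size;
   uint32_t prefetch_size; /* computed once, when the variant is created */
};

/* Called wherever a variant is created or loaded from the cache. */
static void
sketch_shader_init(struct sketch_shader *shader, uint32_t binary_size)
{
   shader->binary_size = binary_size;
   shader->prefetch_size = sketch_compute_prefetch_size(binary_size);
}

/* Called on every shader change: a plain load, no recomputation. */
static uint32_t
sketch_get_prefetch_size(const struct sketch_shader *shader)
{
   return shader->prefetch_size;
}
```

Every path that produces a usable shader has to fill in the stored value,
which is why the change touches both creation sites listed above.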
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29304>
We no longer need to reserve registers for constructing spill/fill
messages. We have split sends and construct message headers in new
temporary registers with a very short lifespan which are simply added
to the existing interference graph as new nodes and allocated via the
normal mechanism.
This means that when we need to spill for the first time, we can avoid
discarding and recomputing the entire interference graph. We also avoid
needing to recreate all spill candidate information once ra_allocate()
fails, because the graph remains valid, and none of the existing nodes
had any changes to their interference. The existing spill candidates
remain valid.
This should slightly improve compile time when spilling is needed.
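
A minimal C sketch of the approach, assuming a simple adjacency-matrix graph:
a short-lived temporary is appended as a new node, and only the new node's
edges are added. The types and functions are hypothetical and much simpler
than the real register allocator.

```c
#include <stdbool.h>
#include <string.h>

#define SKETCH_MAX_NODES 64

struct sketch_graph {
   unsigned node_count;
   bool adjacent[SKETCH_MAX_NODES][SKETCH_MAX_NODES];
};

static void
sketch_graph_init(struct sketch_graph *g, unsigned node_count)
{
   memset(g, 0, sizeof(*g));
   g->node_count = node_count;
}

static void
sketch_graph_add_interference(struct sketch_graph *g, unsigned a, unsigned b)
{
   g->adjacent[a][b] = true;
   g->adjacent[b][a] = true;
}

/*
 * Append a node for a temporary that interferes with the `live_count` nodes
 * listed in `live`. Existing edges are left untouched, so the rest of the
 * graph stays valid. Returns the new node index, or -1 if the graph is full.
 */
static int
sketch_graph_add_temp_node(struct sketch_graph *g, const unsigned *live,
                           unsigned live_count)
{
   if (g->node_count >= SKETCH_MAX_NODES)
      return -1;

   unsigned n = g->node_count++;
   for (unsigned i = 0; i < live_count; i++)
      sketch_graph_add_interference(g, n, live[i]);

   return (int)n;
}
```

Since nothing about the pre-existing nodes changes, any per-node data derived
from them, such as spill candidate information, can be reused as is.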
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25811>
Instead of reserving a register to contain the spill header, which
gets marked live for the entire program, we can just emit the ALU
instructions to build it on the fly. (This is similar to the way
we handle scratch on Alchemist with the newer LSC data port.)
There are a couple of downsides that make this not obviously a win.
First, in order to construct the scratch header on Gfx9-12, we have
to use fields from g0, which will have to remain live anywhere that
scratch access is required. This could negate the register pressure
benefits of creating the header on the fly. However, g0 is oft used
in other places anyway, so it may already be there. Second, constructing
the value takes a non-trivial number of ALU instructions.
Still, trading lower pressure (so fewer spills, less memory access
and stalls) for more cheap ALU seems like it ought to be a win.
There is another valuable benefit: by not reserving a register, we
eliminate the need to reconstruct the interference graph. (The next
patch will actually do so.)
shader-db on Icelake shows spills/fills at 54/53 helped, 4/10 hurt,
and an 8% increase in ALU on affected shaders. Synmark's OglCSDof
(a benchmark that spills) performance remains the same on Alderlake.
fossil-db on Icelake shows a 5.6%/5.1% reduction in spills/fills and a
4% reduction in scratch memory size on affected shaders. Instruction
counts go up by 11.07%, but cycle estimates only increase by 0.57%.
Assassin's Creed Odyssey and Wolfenstein Youngblood both see 20-30%
reductions in spills/fills, a significant improvement.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25811>