this should enable compression on more intermediate fb attachments
it also means that VK_IMAGE_CREATE_MUTABLE_FORMAT_BIT can now be set
on images where ZINK_BIND_MUTABLE is not set, so non-resource APIs need
to check ZINK_BIND_MUTABLE
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23514>
There's no reason to differentiate between primitive types and structs here. `cl_prop_for_struct`
can handle primitive types just fine.
Drop `cl_prop_for_type` and rename the existing `cl_prop_for_struct` to `cl_prop_for_type`.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23652>
It's a layering violation and really the wrong tool for the job. Add a new fn to view a given slice
as a &[u8] instead of going though the clprop machinery which creates a new Vec.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23652>
This new approach handles things as follows:
1. Fences won't be attached to events anymore, applications only wait on
the cv attached to the event.
2. Only the queue is allowed to update event status for non user events.
This will eliminate all remaining status updating races between the
queue and applications waiting on events.
3. Queue minimized flushing by bundling events
4. Increase cv wait timeout as there is really no point in waking up too
often.
Reduces amount of emited fences on radeonsi in luxmark 3.1 luxball by 90%
Signed-off-by: Karol Herbst <git@karolherbst.de>
Reviewed by Nora Allen <blackcatgames@protonmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23612>
Currently SAMPLER_CONFIG0 is always emitted to either update the active
configuration or disable the sampler. With NTE this always emits 32 state
dwords, while there are a lot of cases that only use a small number of
samplers and never change the other samplers from their disabled state.
Track the active samplers from the last emit, so we can skip the state
emission when the sampler is already disabled. Only emit the full state
after a context flush where we don't know the previous sampler state of
the GPU.
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Christian Gmeiner <cgmeiner@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23579>
In addition to bringing us one backend closer to the scoped-only future, this
improves the generated code in cases like:
memoryBarrierBuffer();
memoryBarrierShared();
controlBarrier();
With scoped_barriers + nir_opt_combine_barriers, we now emit only one MEMBAR
instruction (and a BARRIER) rather than two MEMBARs.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23191>
Specifically we would bail out previously when encountering any
control flow, now we would optimize it even when the second ARL/ARR
is inside a lower level if/else branch.
shader-db
RV530:
total instructions in shared programs: 132020 -> 131924 (-0.07%)
instructions in affected programs: 3374 -> 3278 (-2.85%)
helped: 4
HURT: 0
RV370:
no change (no control flow there)
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Filip Gawin <filip.gawin@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23560>
We already do this for ARL, so just generalize the pass.
shader-db
RV530:
total instructions in shared programs: 132235 -> 132020 (-0.16%)
instructions in affected programs: 8492 -> 8277 (-2.53%)
helped: 41
HURT: 1
total temps in shared programs: 16900 -> 16887 (-0.08%)
temps in affected programs: 83 -> 70 (-15.66%)
helped: 13
HURT: 0
RV370:
total instructions in shared programs: 82395 -> 82320 (-0.09%)
instructions in affected programs: 4715 -> 4640 (-1.59%)
helped: 33
HURT: 1
total temps in shared programs: 12316 -> 12305 (-0.09%)
temps in affected programs: 75 -> 64 (-14.67%)
helped: 11
HURT: 0
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Filip Gawin <filip.gawin@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23560>
Its particularly important to have the copy-propagate pass run first.
So that when the round is vectorized, we don't have to follow the MOVs
to find out if it leads to ARL or not (we don't vectorize ARR/ARL at the
moment).
No shader-db change.
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Filip Gawin <filip.gawin@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23560>
Specifically after the first copy propagate run but before the
second one. Removal of ARLs will enable the copy propagate to be more
aggresive, as it is very carefull in such cases.
shader-db
RV530:
total instructions in shared programs: 131861 -> 131503 (-0.27%)
instructions in affected programs: 23949 -> 23591 (-1.49%)
helped: 199
HURT: 15
total temps in shared programs: 16997 -> 16903 (-0.55%)
temps in affected programs: 767 -> 673 (-12.26%)
helped: 69
HURT: 9
RV370:
total instructions in shared programs: 82360 -> 82027 (-0.40%)
instructions in affected programs: 19516 -> 19183 (-1.71%)
helped: 183
HURT: 15
total temps in shared programs: 12370 -> 12262 (-0.87%)
temps in affected programs: 664 -> 556 (-16.27%)
helped: 73
HURT: 0
The hurt programs are due to some constant load being copy propagated
which leads to bad interaction with source conflict resolve pass later.
v2: add missing shader type initialized to the tests. Previously we were
checking for has_omod which also practically means we have a fragment
shader, however its less readable.
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Filip Gawin <filip.gawin@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23560>