With descriptor buffers, we have no way to know how big the descriptor
set actually is, so we have no idea how many constants we can safely
push. If we use a UBO then it will still get pushed, because we normally
assume that we can freely access UBOs without any fear of faults due to
the range checking. This does the easiest thing of using raw pointer
loads, although performance will fall off a cliff, because we don't have
many better options.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19849>
While running VK-CTS with valgrind, the application hit the max
thread count of 500. After further investigation, this was due to
multiple instances being created with the disk cache spinning up
worker threads which wouldn't be cleaned as disk_cache_destroy
wasn't being called.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20178>
Otherwise, making a CS using the memory will use the uninitialized .map
value (when checking the size of the CS in in begin's tu_cs_is_empty()
check), causing valgrind noise in
dEQP-VK.binding_model.descriptorset_random.sets4.dynindexed.ubolimitlow.sbolimitlow.sampledimghigh.lowimgsingletex.iublimitlow.nouab.vert.noia.0
(thanks to vi_info->vertexBindingDescriptionCount==0).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20173>
When a pipeline has dynamic logicop state or blend state, we defer lrz
write decision to tu6_calculate_lrz_state. As such,
tu6_calculate_lrz_state should look at both states when either of them
is dynamic.
Fixes dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.21 on
angle, which uses dynamic logicop state and static blend state with
blending enabled.
Fixes: c8c7154c2e ("tu: Implement extendedDynamicState3ColorBlendEnable")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20136>
This could result in hangs if the entire descriptor set was inline
uniforms. Fixes
dEQP-VK.binding_model.descriptorset_random.sets4.dynindexed.ubolimitlow.nosbo.nosampledimg.outimgonly.iublimitlow.nouab.comp.noia.0
after 0a0a04bd made us prefetch descriptors again and uncovered this.
Fixes: 37cde2c6 ("tu: Rewrite inline uniform implementation")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20101>
This seems to have been triggered by some recent CTS changes which
changed the random number generation. I'm seeing context faults in
dEQP-VK.binding_model.descriptorset_random.sets4.dynindexed.ubolimitlow.sbolimitlow.sampledimghigh.lowimgnotex.iublimitlow.nouab.comp.noia.0
that are fixed by this.
Fixes: 37cde2c634 ("tu: Rewrite inline uniform implementation")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20039>
Now we always push the inline uniforms, using an indirect CP_LOAD_STATE.
There is just enough space to be able to always push them if we expose
the minimum possible per-stage limits. This aligns us with Qualcomm and
removes the need to setup a uniform descriptor which will be
problematic.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18703>
Set ycbcr feature bits only for ycbcr formats. Besides, we can control
chroma locations only for planar formats and we support
VK_FORMAT_FEATURE_SAMPLED_IMAGE_YCBCR_CONVERSION_SEPARATE_RECONSTRUCTION_FILTER_BIT
on newer gens.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19609>
All environment variables involved in utrace usage were very
fragmented and convoluted to decode the meaning of, this commit has
simplified them down into easier to understand flags which directly
indicate the resulting behavior (such as `perfetto` enabling queued
logs rather than needing to set a `queued` flag) while combining
them into a single envvar `GPU_TRACES` and updating existing
terminology in utrace to match up with the new options.
Signed-off-by: Mark Collins <mark@igalia.com>
Reviewed-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Ack-by: Chia-I Wu <olvaffe@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18271>
All arguments in Turnip code are fit to be moved to the start
event where they fit better as any sequential logging should print
the arguments with the scope start as it makes more sense than
printing arguments with the end of a scope.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18271>
aztec_ruins under ANGLE was getting LRZ writes disabled because 0xf out of
the 0x3 mask was enabled. The goal was to see if there are partial writes
being done, though. This caused a 2-3% performance regression.
Fixes: 85d0205db1 ("tu: Implement extendedDynamicState3ColorWriteMask")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19635>
It turns out that this actually is supported. GMEM can hold multiple
layers which are cleared, loaded, and resolved separately. The stride
between layers seems to be implicitly calculated based on the tile size,
and we have to match it when blitting to/from GMEM. One tricky thing is
that now we may realize that we don't have enough space for GMEM only
when computing the tiling config, because we may not know the number of
framebuffer layers until we have the framebuffer and too many
framebuffer layers will exhaust GMEM.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19505>