Extending the lines by half a pixel in each direction without doing
anything about the varyings makes the varyings interpolate over a
distance than intended. While this can be negligeble for long lines,
it can lead to big error for short lines.
Let's instead add extra geometry for each of the line-caps, so we can
make sure the varyings stay constant for the whole cap, and interpolate
over the intended distance instead.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19847>
This implements line-smoothing the same way as the draw-module does,
except using a geometry shader instead of a CPU pass.
Ideally, this should be enabled either by checking for the various
smooth-line caps, or by a DRIconf setting.
Unfortunately, RADV doesn't support he smooth-lines features, and we
don't want to force it down a pessimistic shader-key code-path. So that
plan is out the window for now.
While DRIconf is also neat, it's a bit of work to wire up, and we don't
really know of any real-world applications who would need this yet. So,
for now, let's just unconditionally enable is on the IMG proprietary
driver, which is going to need this for sure.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19847>
This was really derpy. There's two things wrong; first of all, we should
pick at LEAST VARYING_SLOT_VAR0, second, util_last_bit64 returns one
more than the index of the bit already, so we don't want to add twice
here.
Fixes: 4b17c099ca ("zink: add line-stippling lowering passes")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19847>
So far we have been only restoring dirty dynamic states used by meta
pipelines however, static state from meta pipelines will also clear
dirty flags, preventing follow-up draw calls in the command buffer
to honor these if they are flagged as dynamic states in their
pipelines. Fix this by always resetting all dirty state flags after
a meta operation so we re-emit all the state we need with the next draw
call.
Fixes:
dEQP-VK.dynamic_state.monolithic.image.clear
cc: mesa-stable
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20356>
Nothing in the spec seems to require that the number of stages for
which creation feedback is requested must match the number of stages
available in the pipeline. In fact, the spec explicitly mentions
that this number could be 0:
"If pipelineStageCreationFeedbackCount is not 0,
pPipelineStageCreationFeedbacks must be a valid pointer to an
array of pipelineStageCreationFeedbackCount
VkPipelineCreationFeedback structures"
Fixes an assert crash in:
dEQP-VK.pipeline.monolithic.creation_feedback.graphics_tests.vertex_stage_fragment_stage_no_cache_zero_out_feedback_cout
cc: mesa-stable
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20352>
This may allow applications making use of buffer age to save some effort
in some cases.
v2: (Simon Ser)
* Add space between struct member and "<" operator.
* Remove break statement which prevented the change from working as
intended in swrast_update_buffers.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18269>
If we can't use the TLB to do a subpass resolve we have a fallaback
that emits separate image resolves, but this fallback was only
handling color resolves. This adds depth/stencil as well.
Fixes some of the issues we have with CTS 1.3.4 in:
dEQP-VK.pipeline.monolithic.multisample.misc.*
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20331>
For these we always want to use sample_0, averaging is reserved for
color formats. We were already doing this correctly for depth/stencil
resolved in render passes, but not for those happening through
vkCmdResolveImage.
Fixes some of the issues we have with CTS 1.3.4 in:
dEQP-VK.pipeline.monolithic.multisample.misc.*
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20331>
attachment state is only relevant during render passes, however,
there is a corner case: if we can't resolve an attachment in a
subpass using the hardware, we emit a manual image resolve in the
driver which can trigger a meta operation via blit. In this case,
we pretend we are not in a render pass (since vulkan disallows
blits/resolves in a render pass) but we really want to keep the
attachment state after the meta operation.
Fixes some of the issues we have with CTS 1.3.4 in:
dEQP-VK.pipeline.monolithic.multisample.misc.*
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20331>
Pre-patch, anv_descriptor_pool used a free list for host allocations
that never merged adjacent free blocks. If the pool only allocated
fixed-sized blocks, then this would not be a problem. But the pool
allocations are variable-sized, and this caused over half of the pool's
memory to be consumed by unusable free blocks in some workloads, causing
unnecessary memory footprint.
Replacing the free list with util_vma_heap, which does merge adjacent
free blocks, fixes the memory explosion in the target workload.
Disdavantges of util_vma_heap compared to the free list:
- The heap calls malloc() when a new hole is created.
- The heap calls free() when a hole disappears or is merged with an
adjacent hole.
- The Vulkan spec expects descriptor set creation/destruction to be
thread-local lockless in the common case. For workloads that
create/destroy with high frequency, malloc/free may cause overhead.
Profiling is needed.
Tested with a ChromeOS internal TensorFlow benchmark, provided by
package 'tensorflow', running with its OpenCL backend on clvk.
cmdline: benchmark_model --graph=mn2.tflite --use_gpu=true --min_secs=60
gpu: adl
memory footprint from start of benchmark:
before: init=132.691MB max=227.684MB
after: init=134.988MB max=134.988MB
Reported-by: Romaric Jodin <rjodin@google.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20289>
This reverts commit b1126abb38.
This breaks all hell at least on DG2, as there are several cases left
where current_pipeline gets checked against GPGPU to decide what to do,
and the value doesn't match that of ANV_HW_PIPELINE_STATE_COMPUTE.
On top of that, it also misses checking for
ANV_HW_PIPELINE_STATE_RAYTRACING.
Then there's the fact that in some cases, current_pipeline will be
UINT32_MAX, because it's the original undefined state and also used
after executing a secondary command buffer because we are not tracking
on which pipeline did the secondary left us.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7910
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20349>
Buffer maps that don't invalidate their destination range work better
as direct CPU maps than staging blits. The application may write only
part of the range, effectively combining the new data with existing
data. So even if the map would stall, the staging blit path won't help
us, as we have to read the existing data to populate the staging buffer
before returning it. This incurs a stall anyway - plus a read and copy.
In contrast, a direct map doesn't need to read any data - it can just
write the destination and the existing data will still be there.
Fixes excessive blits for stalling buffer writes that don't invalidate
the buffer since my recent map heuristic rework.
Fixes: bec68a85a2 ("iris: Improve direct CPU map heuristics")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7895
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20330>
Based on Jianxun's ("iris: don't get format bits in AUX tables").
With gfx12.5+, the compression format is once again coming from the
surface state programming. MTL once again uses an aux-map, but it
ignores the format bits within the the aux-map metadata.
Ref: Bspec 44930: "Compression format from AUX page walk is ignored.
Instead compression format from Surface State is used."
gfx12.5+ also uses tile-4 rather than y-tiling, so if we don't see
y-tiling, we can return 0 from intel_aux_map_format_bits() for the
ignored format bits.
Rework:
* Just return 0 if not using y-tiling as suggested by Nanley.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20322>
Direct leak of 3912 byte(s) in 2 object(s) allocated from:
#0 0x7fbd4641b0 in __interceptor_malloc (/usr/lib64/libasan.so.6+0xa41b0)
#1 0x7f74413518 in parse_and_validate_cache_item ../src/util/disk_cache_os.c:549
#2 0x7f74414b84 in disk_cache_load_item ../src/util/disk_cache_os.c:599
#3 0x7f74410364 in disk_cache_get ../src/util/disk_cache.c:551
#4 0x7f775695ac in panfrost_disk_cache_retrieve ../src/gallium/drivers/panfrost/pan_disk_cache.c:125
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20336>
A bit count of es_accepted works for both when ngg is and isn't
dynamically enabled. Unlike the other sequence, this should only be a
single SALU instruction.
fossil-db (gfx1100, nggc):
Totals from 41388 (30.75% of 134574) affected shaders:
Instrs: 25783544 -> 25432959 (-1.36%); split: -1.36%, +0.00%
CodeSize: 127281160 -> 125878820 (-1.10%); split: -1.10%, +0.00%
Latency: 92849566 -> 92723047 (-0.14%); split: -0.14%, +0.00%
InvThroughput: 9542194 -> 9485012 (-0.60%); split: -0.60%, +0.00%
Copies: 2031074 -> 1928796 (-5.04%); split: -5.04%, +0.00%
Branches: 642407 -> 642409 (+0.00%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20321>
This replaces brw_fs_get_dispatch_enables(), which was added in
b9403b1c47 ("intel: factor out dispatch PS enabling logic"), but this
function will not work well for future changes to 3DSTATE_PS.
So, instead, this moves the related code into a "genX" file which can
directly update 3DSTATE_PS for the given platform.
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20329>