Dmitry Osipenko
7b40d32187
util/mesa-db: Open DB files during access time
...
Open DB files when DB is accessed and close them afterwards to reduce
number of FDs used by multi-part DB cache.
Fixes: fd9f7b748e ("util/mesa-db: Introduce multipart mesa-db cache")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11776
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11810
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30988>
2024-10-25 18:06:14 +00:00
Dmitry Osipenko
2a9378a0f9
util/mesa-db-multipart: Open one cache part at a time
...
Open one cache DB part at a time for a multi-part cache to reduce number
of FDs used by the cache. Previously multi-part DB cache instance was
consuming 100 FDs, now it's 2 and cache files are opened when cache
is read or written instead of opening them at the init time.
Fixes: fd9f7b748e ("util/mesa-db: Introduce multipart mesa-db cache")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11776
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30988>
2024-10-25 18:06:14 +00:00
Dmitry Osipenko
6a2f5cb556
util/mesa-db: Fix missing O_CLOEXEC
...
Use O_CLOEXEC flag for opened cache DB files to not leak cache FDs when
process forks.
Fixes: 32211788d0 ("util/disk_cache: Add new mesa-db cache type")
Suggested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11810
Signed-off-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30988>
2024-10-25 18:06:14 +00:00
Michel Dänzer
92893309bc
util/mesa-db: Further simplify mesa_db_compact
...
Taking advantage of the persistent array of index entries. In
particular, it's no longer necessary to read from the index file during
compaction.
Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30988>
2024-10-25 18:06:14 +00:00
Michel Dänzer
031f2c2a69
util: Use persistent array of index entries
...
Instead of allocating separate memory for each index entry in the hash
table, use a single array (backed by a mapping of anonymous memory
pages, which allows efficient array resizes) which holds a copy of the
index file contents.
The hash table now references each entry via its offset in the index
file, so that the array address can change on resize.
This eliminates some index file reads and reduces memory management
overhead for the hash table entries. It should be more efficient in
general.
Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30988>
2024-10-25 18:06:14 +00:00
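The idea in 031f2c2a69 of referencing entries by offset rather than by pointer can be sketched as follows. This is an illustration only: it uses realloc() instead of the anonymous memory mapping the commit describes, and the `struct entry` layout and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical entry layout, for illustration only. */
struct entry {
   uint64_t hash;
   uint32_t size;
   uint32_t pad;
};

/* Growable byte array holding entries back to back. */
struct entry_array {
   uint8_t *base;
   size_t used;     /* bytes in use */
   size_t capacity; /* bytes allocated */
};

/* Appends an entry and returns its byte offset, or SIZE_MAX on failure.
 * The base address may change here, but earlier offsets stay valid. */
size_t entry_array_append(struct entry_array *a, const struct entry *e)
{
   if (a->used + sizeof(*e) > a->capacity) {
      size_t new_cap = a->capacity ? a->capacity * 2 : 4 * sizeof(*e);
      uint8_t *p = realloc(a->base, new_cap);
      if (!p)
         return SIZE_MAX;
      a->base = p;
      a->capacity = new_cap;
   }
   size_t off = a->used;
   memcpy(a->base + off, e, sizeof(*e));
   a->used += sizeof(*e);
   return off;
}

/* Resolves an offset against the current base address. */
struct entry *entry_array_at(const struct entry_array *a, size_t offset)
{
   if (offset + sizeof(struct entry) > a->used)
      return NULL;
   return (struct entry *)(a->base + offset);
}
```

A hash table that stores the returned offsets never holds a dangling pointer after a resize, because every lookup goes through `entry_array_at`.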
Michel Dänzer
feef4bf828
util/mesa-db: Use single read for whole index
...
Instead of separate reads per index entry. Should be more efficient.
Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30988>
2024-10-25 18:06:14 +00:00
Michel Dänzer
1ba3996fd5
util/mesa-db: Reserve hash table for total number of index entries
...
Without this, the hash table needed to be rehashed about
log2(<total number of entries>) times as it grew.
Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30988>
2024-10-25 18:06:14 +00:00
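The log2 estimate in 1ba3996fd5 can be reproduced with a toy model of a table that doubles its capacity whenever it fills up. The function name is hypothetical, and a real hash table resizes at a load-factor threshold rather than at full capacity, so this only shows the order of magnitude.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model: count how many capacity doublings (each one a full rehash)
 * are needed to grow from initial_capacity to hold total_entries. */
size_t count_doublings(size_t initial_capacity, size_t total_entries)
{
   size_t capacity = initial_capacity ? initial_capacity : 1;
   size_t rehashes = 0;

   while (capacity < total_entries) {
      capacity *= 2;
      rehashes++;
   }
   return rehashes;
}
```

In this model, reserving the final size up front makes the count zero, since the starting capacity already covers all entries.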
Michel Dänzer
e596882dd1
util/mesa-db: Recreate files if header load or index update fails
...
The previous behaviour had these issues:
1. It meant that this part of the cache couldn't be used
this time.
2. It left the corrupted index/cache files unchanged, so the same failure
might happen again next time.
Recreating the index & cache files for this part means it can be used,
it just loses any previously cached contents.
Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30988>
2024-10-25 18:06:14 +00:00
Michel Dänzer
13c44abaac
util/mesa-db: Make mesa_db_lock robust against signals
...
flock may be interrupted by a signal, in which case it returns with
EINTR error. In this case we need to retry until it returns success
or another error.
Fixes: 32211788d0 ("util/disk_cache: Add new mesa-db cache type")
Reviewed-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Tested-by: Dmitry Osipenko <dmitry.osipenko@collabora.com>
Acked-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30988>
2024-10-25 18:06:14 +00:00
Georg Lehmann
d01c1ba939
aco: move exec copy out of waterfall loops
...
Foz-DB Navi21:
Totals from 348 (0.44% of 79395) affected shaders:
CodeSize: 17944800 -> 17946268 (+0.01%); split: -0.02%, +0.03%
Latency: 29775973 -> 29774369 (-0.01%); split: -0.01%, +0.00%
InvThroughput: 10233380 -> 10232801 (-0.01%); split: -0.01%, +0.00%
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19070>
2024-10-25 16:47:32 +00:00
Georg Lehmann
6c73a8a7f2
aco: optimize conditional divergent breaks at the end of loops
...
Removes one branch and one s_mov.
Foz-DB Navi21:
Totals from 1483 (1.87% of 79395) affected shaders:
Instrs: 6424114 -> 6373084 (-0.79%)
CodeSize: 35309320 -> 35091084 (-0.62%); split: -0.63%, +0.01%
Latency: 87950935 -> 88030841 (+0.09%); split: -0.03%, +0.12%
InvThroughput: 24784756 -> 24799536 (+0.06%); split: -0.02%, +0.08%
Copies: 588743 -> 561805 (-4.58%)
Branches: 242521 -> 215578 (-11.11%)
SALU: 877856 -> 850918 (-3.07%)
Signed-off-by: Georg Lehmann <dadschoorse@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19070>
2024-10-25 16:47:32 +00:00
Georg Lehmann
075c5818cb
aco/ssa_elimination: don't assume exec writes can be removed based on block kind
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19070>
2024-10-25 16:47:32 +00:00
Georg Lehmann
61ab33c883
aco/ssa_elimination: add instr_accesses helper
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19070>
2024-10-25 16:47:32 +00:00
Valentine Burley
7ec0b62341
ir3: Don't lower to LCSSA before calling nir_divergence_analysis()
...
NIR can now calculate divergence without converting to LCSSA beforehand.
However, removing this particular instance of nir_convert_to_lcssa was
missed in commit 87cb42f953 ("treewide: don't lower to LCSSA before calling nir_divergence_analysis()")
Signed-off-by: Valentine Burley <valentine.burley@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31821>
2024-10-25 16:12:51 +00:00
Valentine Burley
5bb0296e08
freedreno/devices: Establish a7xx sub-generations
...
We can distinguish three distinct sub-generations on a7xx.
This reduces the number of copy-pasted quirks.
Signed-off-by: Valentine Burley <valentine.burley@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31821>
2024-10-25 16:12:51 +00:00
Valentine Burley
0981f983ee
freedreno/devices: Enable 64-bit atomics on a735 and a740v3
...
The blob exposes VK_KHR_shader_atomic_int64 on these devices too,
but this was missed during initial enablement.
Signed-off-by: Valentine Burley <valentine.burley@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31821>
2024-10-25 16:12:51 +00:00
Valentine Burley
da989edde8
freedreno/devices: Document common name for a635 speedbins
...
Signed-off-by: Valentine Burley <valentine.burley@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31821>
2024-10-25 16:12:51 +00:00
Valentine Burley
45bb8002df
freedreno/devices: Inline a690 quirk
...
As on FD621, we only have one GPU-specific quirk, so there is no
need to use a separate dictionary for it.
Signed-off-by: Valentine Burley <valentine.burley@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31821>
2024-10-25 16:12:51 +00:00
Rob Clark
7f63fa34da
nir/lower_amul: Fix ASAN error
...
We shouldn't assume the bindings are sparse when we allocate an array
indexed on the binding. See, for example:
dEQP-GLES31.functional.program_interface_query.buffer_variable.random.55
Fixes: 2e833b16bc ("nir/lower_amul: Use num_ubos/ssbos instead of recomputing it.")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31611>
2024-10-25 15:38:51 +00:00
Rob Clark
e548f90edb
freedreno/ir3: Create UBO variables for driver-UBOs
...
Some nir passes, like lower_amul, expect to have variables declared for
things that are accessed via load_ubo().
Fixes: 76e417ca59 ("turnip,ir3/a750: Implement consts loading via preamble")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31611>
2024-10-25 15:38:51 +00:00
Jocelyn Falempe
b24d4f0c86
gbm/dri: Fix color format for big endian.
...
Using wayland on s390x has all the colors wrong.
Mesa reports using GBM_FORMAT_XRGB8888 but inside the buffer, the
colors are in GBM_FORMAT_BGRX8888 order.
This patch fixes it for common formats, and also introduces BGRX8888,
which is the default on big endian.
Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31707>
2024-10-25 14:18:24 +00:00
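One plausible reading of the mismatch described in b24d4f0c86 is sketched below. It assumes that GBM formats follow the DRM fourcc convention of being defined as little-endian 32-bit words; under that assumption, an XRGB word stored in native big-endian order has the same bytes in memory as a BGRX word stored in little-endian order. All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Pack a pixel as an XRGB8888 word: X in bits 31:24, then R, G, B. */
uint32_t pack_xrgb8888(uint8_t r, uint8_t g, uint8_t b)
{
   return ((uint32_t)r << 16) | ((uint32_t)g << 8) | b;
}

/* Pack a pixel as a BGRX8888 word: B in bits 31:24, then G, R, X. */
uint32_t pack_bgrx8888(uint8_t r, uint8_t g, uint8_t b)
{
   return ((uint32_t)b << 24) | ((uint32_t)g << 16) | ((uint32_t)r << 8);
}

/* Store a 32-bit word in little-endian byte order. */
void store_le32(uint8_t out[4], uint32_t v)
{
   out[0] = v & 0xff;
   out[1] = (v >> 8) & 0xff;
   out[2] = (v >> 16) & 0xff;
   out[3] = (v >> 24) & 0xff;
}

/* Store a 32-bit word in big-endian byte order, as a big-endian CPU
 * such as s390x does for a native 32-bit store. */
void store_be32(uint8_t out[4], uint32_t v)
{
   out[0] = (v >> 24) & 0xff;
   out[1] = (v >> 16) & 0xff;
   out[2] = (v >> 8) & 0xff;
   out[3] = v & 0xff;
}
```

Comparing `store_be32(pack_xrgb8888(...))` with `store_le32(pack_bgrx8888(...))` yields identical byte sequences, which matches the symptom of a buffer labelled XRGB8888 actually holding BGRX8888 data.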
Jocelyn Falempe
3814dee11a
gbm/dri: Use PIPE_FORMAT_* instead of using __DRI_IMAGE_*
...
__DRI_IMAGE formats are not well defined for big endian.
This patch has no functional change and prepares the work to better support
big endian.
Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31707>
2024-10-25 14:18:24 +00:00
Jocelyn Falempe
c6d7ab7c1f
loader: Fix typo in __DRI_IMAGE_FORMAT_XBGR16161616 definition
...
The X and A formats were swapped by mistake.
Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com>
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31707>
2024-10-25 14:18:24 +00:00
Pierre-Eric Pelloux-Prayer
60f7b2fc9f
radeonsi/ci: mark *.tessellation_shader_tessellation.max_in_out_attributes as fixed
...
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31684>
2024-10-25 13:36:54 +00:00
Pierre-Eric Pelloux-Prayer
9434ac65f4
glsl: use nir_io_add_const_offset_to_base in gl_nir_opts
...
This fixes:
KHR-GLES32.core.tessellation_shader.tessellation_shader_tessellation.max_in_out_attributes
Without this change the assert in gather_output is hit:
assert(!nir_src_is_const(offset) || nir_src_as_uint(offset) == 0)
Because nir_opt_algebraic determines that some ssa values are constant,
but the nir_io_add_const_offset_to_base wasn't run afterwards.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31684>
2024-10-25 13:36:54 +00:00
Pierre-Eric Pelloux-Prayer
60578df33a
nir: skip offset=0 in nir_io_add_const_offset_to_base
...
When offset=0, the pass was a no-op but still set the progress
flag, which could cause infinite loops once this pass is
added to gl_nir_opts.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31684>
2024-10-25 13:36:54 +00:00
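The progress-flag problem fixed in 60578df33a can be shown with a toy pass run inside a loop that repeats until nothing changes. The struct and function names are hypothetical and do not correspond to NIR data structures.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an I/O instruction with a base and a
 * constant offset. */
struct io_instr {
   int base;
   int const_offset;
};

/* Folds the constant offset into the base. Reports progress only when
 * something actually changed, so a zero offset is not counted. */
bool fold_const_offset(struct io_instr *io)
{
   if (io->const_offset == 0)
      return false;

   io->base += io->const_offset;
   io->const_offset = 0;
   return true;
}

/* Runs the pass until it reports no progress and returns the number of
 * iterations. If the pass returned true for a zero offset, this loop
 * would never terminate. */
unsigned run_until_no_progress(struct io_instr *io)
{
   unsigned iterations = 0;
   bool progress;

   do {
      progress = fold_const_offset(io);
      iterations++;
   } while (progress);

   return iterations;
}
```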
David Rosca
f24c799c67
radeonsi/vcn: Only enable skip mode with matching references
...
Skip mode frames must match the reference frames otherwise skip
mode needs to be disabled.
Fixes: 1e1f078099 ("radeonsi/vcn: Add support for VCN5 AV1 compound")
Reviewed-by: Ruijing Dong <ruijing.dong@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31805>
2024-10-25 13:09:15 +00:00
Samuel Pitoiset
38d7492391
ci: uprev VKCTS to 1.3.10.0
...
This tag contains tests for DGC EXT.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31789>
2024-10-25 14:03:37 +02:00
Joshua Ashton
c66fd95d92
radv: Fix sample locations at 0 for X/Y
...
We cannot set the {X,Y}MAX_RIGHT_EXCLUSION bits
if we have a sample location at a pixel boundary.
CTS does not seem to be catching this.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Co-authored-by: Vitaliy Triang3l Kuzmin <triang3l@yandex.ru>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31839>
2024-10-25 11:24:12 +00:00
Joshua Ashton
130a423118
radv: Enable variableSampleLocations
...
This should come for free now that we are based on dynamic
rendering.
This passes CTS on RX 7900XTX.
Signed-off-by: Joshua Ashton <joshua@froggi.es>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31839>
2024-10-25 11:24:12 +00:00
Rhys Perry
8efc765a3d
nir/algebraic: fix shfr optimization with zero src2
...
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Fixes: 08903bbe89 ("nir: add mqsad_4x8, shfr and nir_opt_mqsad")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31808>
2024-10-25 09:59:40 +00:00
Rhys Perry
b2abd3bdba
nir: fix shfr constant folding with zero src2
...
No fossil-db changes.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Fixes: 08903bbe89 ("nir: add mqsad_4x8, shfr and nir_opt_mqsad")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31808>
2024-10-25 09:59:40 +00:00
Eric Engestrom
03f056ea71
ci: skip slow tests on all non-"full" jobs
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31828>
2024-10-25 08:26:31 +00:00
Eric Engestrom
bedb2f8a86
ci: rename "merge-skips" to "slow-skips" as they're about to be used outside of merge pipelines
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31828>
2024-10-25 08:26:31 +00:00
Samuel Pitoiset
927a17f30a
amd: do not emit PA_SU_PRIM_FILTER_CNTL in the common GFX preamble
...
RADV needs to adjust this register for user sample locations because
it seems possible to have a sample on the -8 coordinate.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31815>
2024-10-25 07:41:22 +00:00
Samuel Pitoiset
3d172d08b0
radv: do not emit PA_SC_CONSERVATIVE_RASTERIZATION_CNTL in the preamble on GFX12
...
It's already emitted as part of the cmdbuf.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31815>
2024-10-25 07:41:22 +00:00
Samuel Pitoiset
56cffd4b9b
radv: simplify determining if a graphics pipeline uses NGG culling
...
has_ngg_culling can only be TRUE if the last VGT shader also uses NGG.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31829>
2024-10-25 07:10:28 +00:00
Samuel Pitoiset
62efebfd70
radv: fix emitting NGG culling state for ESO
...
It's possible to enable NGG culling with ESO if shaders are linked, or
if the VS doesn't need a prolog or if TES is used. This wasn't
supposed to be enabled but I think it worked just by luck because the
user SGPR value was probably zero and NGGC was disabled at draw time.
Found by inspection.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31829>
2024-10-25 07:10:27 +00:00
Samuel Pitoiset
982af1a2bc
radv: capture shader statistics when RGP is enabled
...
This is useful in order to correlate shader hashes between RGP and
Fossilize. This is because Fossilize needs to pass the capture
statistics flag for getting shader hashes and the pipeline key won't
match otherwise.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31820>
2024-10-25 06:29:02 +00:00
Eric Engestrom
460c2eb967
ci: move shellcheck options to .shellcheckrc
...
That way, IDEs get to have the same behaviour as the CI
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31826>
2024-10-24 22:43:03 +00:00
Francisco Jerez
e2eba3c7da
intel/brw/xe2+: Adjust performance analysis divergence weight due to EU fusion removal.
...
This reduces the penalty the heuristic gives to SIMD32 shaders
relative to SIMD16 in presence of discard control flow on Xe2+. The
penalty was meant to account for the inefficient divergence behavior
of SIMD32 shaders on Gfx12.x platforms, since Gfx12 hardware had EUs
bundled in groups of two, and each pair shared control flow logic so
both EUs could only execute instructions in lockstep, which meant that
SIMD32 shaders had an effective warp size of 64 on Gfx12.x.
This change switches back to more optimistic modelling of discard
divergence. With it we gain about 6% performance in a Shadow of the
Tomb Raider trace (tested on BMG).
One may wonder if there are still workloads that would suffer
materially from enabling SIMD32 for all pixel shaders on Xe2 instead
of using this heuristic, since Xe2 EUs have twice the GRF space, twice
the FPU throughput and better divergence behavior than Xe, but the
answer seems to be yes unfortunately: E.g. Superposition has some
pixel shaders where SIMD32 has substantially worse scheduling due to
the increased number of false dependencies due to higher register
pressure, and using SIMD32 for them reduces performance significantly.
The heuristic seems to model this correctly so it doesn't look like we
can do without it at least right now on Xe2.
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31697>
2024-10-24 22:06:52 +00:00
Kenneth Graunke
7bed11fbde
intel/brw: Allow immediates in the BFE instruction on Gfx12+
...
We weren't allowing immediates in BFE at all. Gfx12+ supports
immediates in src0 (value) and src2 (width), but not src1 (offset).
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31437>
2024-10-24 21:31:28 +00:00
Patrick Lerda
d19e2597ce
r600: fix spec ext_packed_depth_stencil getteximage
...
This test was working until commit 4da147a02b
("mesa: remove fallback for GL_DEPTH_STENCIL"). That
commit lets the driver handle this path, which was
failing on evergreen r600.
The test was processed through r600_blit() which loads the
fragment shader util_make_fs_blit_zs(). This fragment shader
loads two textures, the stencil and the depth. The depth texture
was processed properly but the stencil texture was generating
incorrect values. This issue, which seems to be related to
the hardware configuration, disappears when the underlying
surface is allocated using a width that is a multiple of 32.
This change was tested on cayman and palm with the normal test:
"piglit/bin/ext_packed_depth_stencil-getteximage -auto -fb" and
the test was modified to test all the relevant width and height
values. The gpu rv770 was not affected by this issue. Here is
the result:
spec/ext_packed_depth_stencil/getteximage: fail pass
Cc: mesa-stable
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Acked-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31757>
2024-10-24 21:06:36 +00:00
Aditya Swarup
e98759c7f4
anv: Use RCS engine for copying stencil resource for gfx125
...
HSD 14021541470 lists a HW bug on blitter engine where the compression pairing bit is
not programmed correctly for stencil resources.
Use RCS Engine to perform copy instead.
Signed-off-by: Aditya Swarup <aditya.swarup@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31792>
2024-10-24 20:14:13 +00:00
Chia-I Wu
5fea98c4a1
panvk: fix scissor box
...
Fix a typo in prepare_vp which causes incorrect scissor box with
non-zero X in viewport/scissor.
Fixes: 5544d39f44 ("panvk: Add a CSF backend for panvk_queue/cmd_buffer")
Signed-off-by: Chia-I Wu <olvaffe@gmail.com>
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31832>
2024-10-24 12:49:02 -07:00
Chia-I Wu
029b8b11a0
panvk: fix gl_VertexIndex
...
According to pandecode, r32 is global attribute offset and r36 is vertex
offset. Follow panfrost to use r36 instead of r32 for both non-indexed
firstVertex and indexed vertexOffset.
With this, gl_VertexIndex is no longer zero-based; the zero-based value was incorrect.
Fixes: 5544d39f44 ("panvk: Add a CSF backend for panvk_queue/cmd_buffer")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31810>
2024-10-24 18:19:48 +00:00
Georg Lehmann
b79950fc1f
aco: remove heuristic that restricts VOP2/C with 2 sgprs
...
Looking at the stats, the slightly increased code size isn't a problem
compared to the benefits. This also only affects gfx10+, and those generations
aren't throughput limited by 64bit instructions like early gcn.
Foz-DB Navi21:
Totals from 12377 (15.59% of 79395) affected shaders:
MaxWaves: 269323 -> 269857 (+0.20%); split: +0.23%, -0.03%
Instrs: 16505304 -> 16472552 (-0.20%); split: -0.21%, +0.01%
CodeSize: 89815804 -> 90130344 (+0.35%); split: -0.02%, +0.37%
VGPRs: 661160 -> 658640 (-0.38%); split: -0.40%, +0.02%
SpillSGPRs: 3032 -> 3049 (+0.56%)
SpillVGPRs: 826 -> 796 (-3.63%)
Latency: 145800231 -> 145818568 (+0.01%); split: -0.14%, +0.15%
InvThroughput: 39026010 -> 38892467 (-0.34%); split: -0.36%, +0.02%
VClause: 325693 -> 325992 (+0.09%); split: -0.12%, +0.21%
SClause: 497938 -> 497208 (-0.15%); split: -0.23%, +0.08%
Copies: 1239036 -> 1204045 (-2.82%); split: -2.90%, +0.07%
Branches: 462952 -> 462934 (-0.00%); split: -0.01%, +0.00%
PreSGPRs: 586066 -> 587558 (+0.25%)
PreVGPRs: 550024 -> 547736 (-0.42%)
VALU: 11147608 -> 11114528 (-0.30%); split: -0.31%, +0.01%
SALU: 2105546 -> 2105131 (-0.02%); split: -0.03%, +0.01%
VMEM: 575983 -> 575923 (-0.01%)
Foz-DB Navi31:
Totals from 11544 (14.54% of 79395) affected shaders:
MaxWaves: 319612 -> 319804 (+0.06%)
Instrs: 17563158 -> 17527341 (-0.20%); split: -0.22%, +0.02%
CodeSize: 92366832 -> 92626280 (+0.28%); split: -0.03%, +0.31%
VGPRs: 667620 -> 665484 (-0.32%); split: -0.33%, +0.01%
SpillSGPRs: 3418 -> 3434 (+0.47%)
SpillVGPRs: 896 -> 858 (-4.24%)
Scratch: 4738048 -> 4736512 (-0.03%)
Latency: 141366653 -> 141399756 (+0.02%); split: -0.10%, +0.12%
InvThroughput: 26213994 -> 26165751 (-0.18%); split: -0.21%, +0.03%
VClause: 307956 -> 308124 (+0.05%); split: -0.12%, +0.18%
SClause: 477816 -> 477326 (-0.10%); split: -0.18%, +0.08%
Copies: 1161148 -> 1129386 (-2.74%); split: -2.81%, +0.08%
Branches: 411509 -> 411506 (-0.00%); split: -0.00%, +0.00%
PreSGPRs: 531354 -> 535027 (+0.69%)
PreVGPRs: 525201 -> 521861 (-0.64%)
VALU: 10360363 -> 10330274 (-0.29%); split: -0.30%, +0.01%
SALU: 1778044 -> 1777585 (-0.03%); split: -0.04%, +0.01%
VMEM: 551379 -> 551303 (-0.01%)
VOPD: 3539 -> 3471 (-1.92%); split: +0.14%, -2.06%
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31804>
2024-10-24 17:44:13 +00:00
Georg Lehmann
54fa55a3f7
radv: don't use v_mqsad_u32_u8 on gfx7
...
According to tests on hawaii, v_mqsad_u32_u8 always uses saturating accumulation
while v_msad_u8 truncates. GFX8+ can control this with the VOP3 clamp bit,
on older hardware that's not supported.
We want truncation for the NIR opcode.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12062
Fixes: c3c138b10f ("radv: optimize msad_4x8 to mqsad_4x8")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31809>
2024-10-24 17:20:56 +00:00
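The difference between saturating and truncating accumulation mentioned in 54fa55a3f7 only shows up when the 32-bit accumulator overflows. The sketch below illustrates the arithmetic alone; the function names are hypothetical and this is not a model of the hardware instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Truncating accumulation: the sum wraps around modulo 2^32. */
uint32_t accumulate_truncating(uint32_t acc, uint32_t sad)
{
   return acc + sad;
}

/* Saturating accumulation: the sum is clamped to UINT32_MAX. */
uint32_t accumulate_saturating(uint32_t acc, uint32_t sad)
{
   uint64_t sum = (uint64_t)acc + sad;
   return sum > UINT32_MAX ? UINT32_MAX : (uint32_t)sum;
}
```

For small accumulator values the two functions agree, which may explain why such a mismatch is easy to miss in testing.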
Eric Engestrom
a85ed2a28f
lavapipe/ci: document regression in the commit range 765d1c47...366f63fd
...
There's a cts uprev in one of these commits, so it's possible they're all just new tests.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31825>
2024-10-24 16:50:44 +00:00
Eric Engestrom
150fd992b6
lavapipe/ci: skip builtin ray query tests that take too long and time out
...
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31825>
2024-10-24 16:50:44 +00:00