Add the string length parameter to the set_name(),
set_value() function to remove the conversion from
char* to std::string which takes extra work like
calling strlen() to compute the string length.
From the callback sampling in the perfetto tracing,
the ratio of trace_payload_as_extra_intel_end_draw_indexed
to intel_ds_end_draw_indexed drops from 63.80% to 59.65%
with this change.
v2: Add the data of the callback sampling to the description.
Signed-off-by: Andy Hsu <hwandy@google.com>
Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38073>
Per ARB_vertex_program spec result registers are 4-component and initially
undefined, and the FF fragment program expects its intputs to be
4-component too. So, if the client's vertex program does not write the
whole vector it will cause misrenderings unless the same client also
supplies fragment program that expects less than 4 componens.
This commit adds a workaround that initializes results to vec4(0, 0, 0, 1)
which seems to be an expected behavior for such clients.
Cc: mesa-stable
Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38295>
NIR is actually pretty good at optimizing UBO, SSBO, and shared memory
access but in order to do so, we actually have to run the optimizations
before we lower it all. Same for I/O. By doing all our lowering in
panvk before we ever run the optimization loop, we risk hampering it
significantly.
Ignoring loop changes (several get unrolled now), fossil-db on Sascha
Willems demos and a few others looks lik
Instrs: 189054 -> 187802 (-0.66%); split: -0.67%, +0.01%
CodeSize: 1756160 -> 1747072 (-0.52%); split: -0.52%, +0.01%
Estimated normalized CVT cycles: 771.367106999997 -> 766.0311719999971 (-0.69%); split: -1.05%, +0.36%
Estimated normalized SFU cycles: 1407.21875 -> 1406.9375 (-0.02%); split: -0.03%, +0.01%
Estimated normalized Load/Store cycles: 17477.0 -> 16917.0 (-3.20%)
Maximum number of threads: 1257 -> 1213 (-3.50%); split: +0.08%, -3.58%
Number of hardware loops: 283 -> 278 (-1.77%)
Totals from 186 (19.81% of 939) affected shaders:
Instrs: 102588 -> 101336 (-1.22%); split: -1.23%, +0.01%
CodeSize: 834432 -> 825344 (-1.09%); split: -1.10%, +0.02%
Estimated normalized CVT cycles: 463.226562 -> 457.890627 (-1.15%); split: -1.74%, +0.59%
Estimated normalized SFU cycles: 1021.84375 -> 1021.5625 (-0.03%); split: -0.05%, +0.02%
Estimated normalized Load/Store cycles: 8425.0 -> 7865.0 (-6.65%)
Maximum number of threads: 334 -> 290 (-13.17%); split: +0.30%, -13.47%
Number of hardware loops: 63 -> 58 (-7.94%)
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayern@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38334>
We need to lower outputs to get rid of output reads and so that we can
fix up layer writes on Bifrost. However, there's really no point in
lowering reads besides moving them to the top. Even then, NIR can
probably copy propagate the copies and we'll end up reading straight
from the input variable anyway.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayern@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38334>
Neither nir_lower_io() nor nir_lower_indirect_derefs() know what to do
with copy_deref so we need to get rid of those first. Also, there are
some NIR passes which can insert more copy_deref or propagate an
indirect load to the I/O variable so we want to lower those away right
before lowering I/O.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Christoph Pillmayer <christoph.pillmayern@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38334>
Xe2 uses byte offsets rather than OWord offsets. We've been storing the
per-slot offsets in bytes on Xe2 for a while, but kept the global offset
immediate in OWords for some reason, choosing to lower it during logical
send lowering.
This patch makes both offsets (global immediate, per-slot) in the same
units, so they could be added together if necessary without scaling.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38343>
I noticed that our backend was completely ignoring writemasks, despite
them appearing on many of the intrinsics we're implementing.
Rhys Perry pointed out that nir_lower_mem_access_bitsizes is removing
all non-trivial writemasking today, so ssbo/global/shared/scratch/etc.
stores should only ever see all components enabled. Which means what
we're doing is legitimate, if non-obvious. Add an assert to make it
obvious.
Thanks a lot to Rhys for helping me rediscover what made this work.
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38343>
The backend has been fully ignoring all writemasks for a long time,
so it really doesn't make sense to have them on our custom intrinsics.
I'm not sure they even make sense for some of the block intrinsics.
Also, the store_ssbo -> store_ssbo_intel pass was not setting writemask
at all, leaving it at the default value of 0 (aka write nothing, if it
had been respected...)
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38343>
Just compare against the size that was declared.
This is probably overkill. I couldn't figure out what vulkan says wrt
OOB access of shared memory. D3D however (which is very strict about
these things) says that for TGSM writes the entire contents of the TGSM
becomes undefined, for reads the result is undefined. Hence, rather
than masking out such accesses, to avoid the segfaults it would be
enough to just clamp the offsets to valid values.
nir doesn't seem easily able to tell us if an access is guaranteed
in-bound (unlike for ssbo access), so assume always potentially OOB.
v2: fix rusticl - for cl we don't know the shared size at compilation
time, this is only provided at launch_grid() time, the nir shader info
shared_size might be zero. Hence pass through the size via cs jit
context, there already actually was a member in there which looks
like it was intended for that (interestingly enough, the cs jit context
was actually unused, since resources are passed elsewhere nowadays).
Reviewed-by: Brian Paul <brian.paul@broadcom.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38307>