If the expression tree that is being replaced has a unary operation at
its root, set the cursor (location where new instructions are inserted)
at the source instruction instead.
This doesn't do much now because there are very few patterns that have a
unary operation as the root. Almost all of the patterns that do have a
unary operation as the root have inot. All of the shaders that are
affected by this commit have expression trees with an inot at the root.
This change prevents some significant, spurious caused by the next
commit. There is further explanation in the large comment added in
the code.
I also considered a couple other options that may still be worth exploring.
1. Add some mark-up to the search pattern to denote where new
instructions should be added. I considered using "@" to denote the
cursor location. For example,
(('fneg', ('fadd@', a, b)), ...)
2. To prevent other kinds of unintended code motion, add the ability to
name expressions in the search pattern so that they can be reused in
the replacement. For example,
(('bcsel', ('ige', ('find_lsb=b', a), 0), ('find_lsb', a), -1), b),
An alternative would be to add some kind of CSE at the time of
inserting the replacements. Create a new instruction, then check to
see if it already exists. That option might be better overall.
Over the years I know Matt has heard me complain, "I added a pattern
that just deleted an instruction, but it added a bunch of spills!" This
was always in large, complex shaders that are very hard to analyze. I
always blamed these cases on the scheduler being dumb. I am now very
suspicious that unintended code motion was the real problem.
All Gen4+ Intel platforms had similar results. (Tiger Lake shown)
total instructions in shared programs: 17611405 -> 17611333 (<.01%)
instructions in affected programs: 18613 -> 18541 (-0.39%)
helped: 41
HURT: 13
helped stats (abs) min: 1 max: 18 x̄: 4.46 x̃: 4
helped stats (rel) min: 0.27% max: 5.68% x̄: 1.29% x̃: 1.34%
HURT stats (abs) min: 1 max: 20 x̄: 8.54 x̃: 7
HURT stats (rel) min: 0.30% max: 4.20% x̄: 2.15% x̃: 2.38%
95% mean confidence interval for instructions value: -3.29 0.63
95% mean confidence interval for instructions %-change: -0.95% 0.02%
Inconclusive result (value mean confidence interval includes 0).
total cycles in shared programs: 338366118 -> 338365223 (<.01%)
cycles in affected programs: 257889 -> 256994 (-0.35%)
helped: 42
HURT: 15
helped stats (abs) min: 2 max: 120 x̄: 39.38 x̃: 34
helped stats (rel) min: 0.04% max: 2.55% x̄: 0.86% x̃: 0.76%
HURT stats (abs) min: 6 max: 204 x̄: 50.60 x̃: 34
HURT stats (rel) min: 0.11% max: 4.75% x̄: 1.12% x̃: 0.56%
95% mean confidence interval for cycles value: -30.39 -1.02
95% mean confidence interval for cycles %-change: -0.66% -0.02%
Cycles are helped.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1359>
This change incurs a small amount of hurt now, but it enables a lot of
benefit on vec4 shaders on the next commit. nir_opt_algebraic_late
converts dph, dot3, etc. to dhp_replicated, dot_replicated3, etc. In
the process, it introduces extra moves. If the original NIR contained
vec1 32 ssa_45 = fdot4 ssa_51, ssa_44
vec1 32 ssa_46 = fneg ssa_45
nir_opt_algebraic_late will produce
vec4 32 ssa_18 = fdot_replicated4 ssa_1, ssa_15
vec1 32 ssa_19 = mov ssa_18.x
vec1 32 ssa_17 = fneg ssa_19
The algebraic pass added in the next commit can't see through the move
to know that the fneg applies to a fdot_replicated4.
Haswell, Ivy Bridge, and Sandybridge had similar results. (Haswell shown)
total cycles in shared programs: 187077604 -> 187079858 (<.01%)
cycles in affected programs: 350132 -> 352386 (0.64%)
helped: 174
HURT: 194
helped stats (abs) min: 2 max: 124 x̄: 23.60 x̃: 16
helped stats (rel) min: 0.12% max: 15.88% x̄: 4.98% x̃: 3.86%
HURT stats (abs) min: 2 max: 164 x̄: 32.78 x̃: 16
HURT stats (rel) min: 0.17% max: 22.82% x̄: 6.46% x̃: 0.86%
95% mean confidence interval for cycles value: 2.04 10.21
95% mean confidence interval for cycles %-change: 0.17% 1.93%
Cycles are HURT.
No shader-db changes on any other Intel platform.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/1359>
Tell users what the base address of the image needs to be aligned to.
These values are based on experimentation via passing an offset to
vkBindImageMemory with turnip and seeing if tests still pass. Note that
r8g8 is also special in this regard, however it actually has an
increased alignment (in bytes).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4357>
When we originally wrote a bunch of the allocation data structures, we
re-used the GPU memory for CPU-side data structures. It's a bit more
memory efficient and usually ok. However, this has a couple of
problems:
1. It makes it MUCH more likely that the GPU will accidentlly stomp
CPU-side data structures and cause nearly impossible to debug
crashes.
2. With discrete GPUs, the memory will be mapped somehow and that map
may be across the BAR so it could have horribly slow CPU access.
This is bad for our CPU-side data structures.
In the case of anv_state_stream, it also made the data structure
massively more complex than it needed to be.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Tested-by: Marge Bot <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4336>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4336>
If we have an allocation that's exactly the block size, we end up
computing a new block size to allocate that's exactly the block size,
add in the header, and then assert fail. When computing the block size,
we need to account for the header.
Fixes: 955127db93 "anv/allocator: Add support for large stream..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4336>
With EGL_EXT_image_dma_buf_import, user can import dma_buf
with offset.
This is also used by AOSP GLConsumer::updateTexImage
with HAL_PIXEL_FORMAT_YV12 buffer which store YUV planes in
the same buffer with offset. Render sample from it using
GL_OES_EGL_image_external. This should fix some video
display problem when using MediaCodec soft decoding which
generates HAL_PIXEL_FORMAT_YV12 buffer and render it on
screen.
Test program:
https://github.com/yuq/gfx/tree/master/yuv2rgb/dma-buf
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Signed-off-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4362>
Now that its Gallium dependencies have been resolved, we can move this
all out to root. The only nontrivial change here is keeping the
pandecode calls in Gallium-panfrost to avoid creating a circular
dependency between encoder/decoder. This could be solved with a third
drm folder but this seems less intrusive for now and Roman would
probably appreciate if I went longer than 8 hours without breaking the
Android build.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4382>