Before, we would prefer to schedule inputs before kills, which works
assuming that the live range of the bary_ij system value don't get
split and therefore all bary.f are ready at the start of the block.
However live range splitting can mess up that assumption and cause a
kill to get scheduled before a move that leads to a bary.f.
This fixes even e.g. dEQP-GLES2.functional.shaders.discard.basic_always
on a3xx before introducing CSE of collect instructions, but even after
that it could be a problem theoretically as the register allocator
doesn't guarantee that any live ranges aren't split.
Reviewed-by: Eric Anholt <eric@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10143>
Prior to this commit
(nop3) mad.f32 r0.y, c0.x, r1.w, c0.y
was counted as 4 cat3 instructions (and still 3 cat0/nops) in shader-db
results. With this change, it is counted as only 1 cat3 instruction.
Probably never going to have better shader-db results than this in my
career:
total cat2 in shared programs: 1214667 -> 732058 (-39.73%)
cat2 in affected programs: 1194729 -> 712120 (-40.39%)
helped: 8551
HURT: 0
total cat3 in shared programs: 376448 -> 274745 (-27.02%)
cat3 in affected programs: 344918 -> 243215 (-29.49%)
helped: 7222
HURT: 0
Reviewed-by: Rob Clark <robdclark@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10116>
This implements a NIR pass that groups together constant UBO loads
for the same UBO index in order of increasing offset when the distance
between them is small enough that it enables the "skip unifa write"
optimization.
This may increase register pressure because it can move UBO loads
earlier, so we also add a compiler strategy fallback to disable the
optimization if we need to drop thread count to compile the shader
with this optimization enabled.
total instructions in shared programs: 13557555 -> 13550300 (-0.05%)
instructions in affected programs: 814684 -> 807429 (-0.89%)
helped: 4485
HURT: 2377
Instructions are helped.
total uniforms in shared programs: 3777243 -> 3760990 (-0.43%)
uniforms in affected programs: 112554 -> 96301 (-14.44%)
helped: 7226
HURT: 36
Uniforms are helped.
total max-temps in shared programs: 2318133 -> 2333761 (0.67%)
max-temps in affected programs: 63230 -> 78858 (24.72%)
helped: 23
HURT: 3044
Max-temps are HURT.
total sfu-stalls in shared programs: 32245 -> 32567 (1.00%)
sfu-stalls in affected programs: 389 -> 711 (82.78%)
helped: 139
HURT: 451
Inconclusive result.
total inst-and-stalls in shared programs: 13589800 -> 13582867 (-0.05%)
inst-and-stalls in affected programs: 817738 -> 810805 (-0.85%)
helped: 4478
HURT: 2395
Inst-and-stalls are helped.
total nops in shared programs: 354365 -> 342202 (-3.43%)
nops in affected programs: 31000 -> 18837 (-39.24%)
helped: 4405
HURT: 265
Nops are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10100>
This adds a minimum thread count parameter to each compilation strategy with
the intention to limit the minimum allowed thread count that can be used to
register allocate with that strategy.
For now all strategies allow the minimum thread count supported by the
hardware, but we will be using this infrastructure to impose a more
strict limit in an upcoming optimization.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10100>
This can be called outside a render pass so we should not expect to have
a job available. Also, we should not be emitting state here, instead we
should do in the pre-draw handler with all the other draw call state.
Fixes cases of crashes in RenderDoc when selecting elements in the
Event Browser.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10130>
first_component is an uint, and thus if it takes value 0 we can't know
if it is because writemask has its first bit to 1, or all bits to 0.
As we want to ensure that at least one bit is set, apply the assertion
in writemask.
Fixes CID#1472829 "Macro compares unsigned to 0 (NO_EFFECT)".
v2:
- Restore "first_component <= last_component" assertion (Iago)
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10103>
SB si known to be buggy and the ultimate aim is to make it go away. To
test workloads with better optimizations it makes sense to be able to
enable SB, but for the NIR backend it should not be enabled together
with NIR the default. Therefore an a specific debug option "nirsb" that
enables NIR with SB.
Fixes: 3b27243b01
r600: Enable sb also for NIR
Signed-off-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10108>
This format-string seems to have been incorrect since it's inception.
But there's also been commits that have both forgotten to add and remove
flags as appropriate as well.
Let's correct the format-list. This was done by counting by hand. A
better solution for the long-term is coming in a future commit.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9896>
Whenever we have an ALU op that's operating on a double, we'll unpack
it as an integer, then repack it as a float. When we have an ALU op that
returns a double, we'll unpack it as a double, then repack it as an integer.
Then, simple algebraic opts will remove any redundant unpack/repack ops,
so we should be left with constructing and deconstructing doubles using
the right operations.
Reviewed-by: Enrico Galli <enrico.galli@intel.com>
Reviewed-by: Michael Tang <tangm@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10063>
HLSL doesn't support bitcasting a 64bit integer to a double. DXIL
doesn't have generic pack/unpack instructions, so we lower those to
integer bitwise ops. As a result, NIR generic double pack/unpack would
require our backend to emit a bitcast to get a double, but we want
to match HLSL semantics and emit MakeDouble/SplitDouble.
Adding a dedicated opcode for double pack/unpack allows us to add a
pass to emit that instead, which lets our backend emit the right
instruction to pack and unpack doubles.
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10063>
unsetting zink from GALLIUM_DRIVER is required in order for lavapipe to
work, but setting it back is totally broken in the case where an app
creates a ton of screens simultaneously
instead, just leave it set to llvmpipe, and if a race condition occurs,
at least llvmpipe isn't going to fail a test that zink passes
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10120>