The old pass had a few bugs:
- It tried to avoid folding f2f32 into f2f16, but didn't consider
  conversions that were already folded in.
- It didn't prevent folding an f2f16 or f2f32 into a non-floating-point
  op.
In addition, it wasn't written in a way that made handling integer
conversions practical. This rewrites the pass to instead calculate the
"type" of the conversion source and then check whether folding the
conversion is allowed. This allows us to cleanly separate the
declarative part where we describe how the HW works from the policy part
where we decide whether the transform is allowed, and makes it simple to
add support for folding integer conversions.
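
The split between the declarative part and the policy part can be
sketched roughly as follows. This is a minimal illustration, not the
code in the pass: every name here (val_type, instr, conv_src_type,
can_fold) is hypothetical, and the model of the hardware is reduced to
a result type plus a flag for an already-folded conversion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy IR; none of these names come from the driver. */
enum val_type { TYPE_F16, TYPE_F32, TYPE_INT };
enum opcode { OP_FADD, OP_IADD, OP_F2F16, OP_F2F32 };

struct instr {
   enum opcode op;
   enum val_type type;      /* type of the value the instruction produces */
   bool has_folded_conv;    /* a conversion was already folded into it */
};

/* Declarative part: which type does each conversion read? */
static enum val_type
conv_src_type(enum opcode conv)
{
   assert(conv == OP_F2F16 || conv == OP_F2F32);
   return conv == OP_F2F16 ? TYPE_F32 : TYPE_F16;
}

/* Policy part: may the conversion be folded into the instruction
 * that produces its source? */
static bool
can_fold(enum opcode conv, const struct instr *src)
{
   assert(src != NULL);

   /* Never fold one conversion into another. */
   if (src->op == OP_F2F16 || src->op == OP_F2F32)
      return false;

   /* A conversion that was already folded in counts as well. */
   if (src->has_folded_conv)
      return false;

   /* Only floating-point ops producing the expected type qualify;
    * this also rejects integer ops. */
   return src->type == conv_src_type(conv);
}
```

With this shape, supporting integer conversions would mostly mean
extending the declarative function, while the policy check stays in
one place.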
Closes: #3208
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10859>
Many fragment shaders do a discard using relatively little information
but still put the discard fairly far down in the shader for no good
reason. If the discard is moved higher up, we can possibly avoid doing
some or almost all of the work in the shader. When this lets us skip
texturing operations, it's an especially big win.
One of the biggest offenders here is DXVK. The D3D APIs have different
rules for discards than OpenGL and Vulkan. One effective way (which is
what DXVK uses) to implement DX behavior on top of GL or Vulkan is to
wait until the very end of the shader to discard. This ends up in the
pessimal case where we always do all of the work before discarding.
This pass helps some DXVK shaders significantly.
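
The idea can be sketched on a toy, straight-line instruction list.
This is a simplified illustration and not the NIR pass: the instr
struct, the kind enum and hoist_discards() are hypothetical, and the
sketch only moves a discard to just after the instruction defining its
condition. It does mirror one detail from the notes below: all movable
discards are collected before anything is moved.

```c
#include <assert.h>

/* Hypothetical toy IR: instructions refer to each other by id, so
 * moving one within the array does not invalidate references. */
enum kind { K_INPUT, K_ALU, K_TEX, K_DISCARD };

struct instr {
   int id;
   enum kind kind;
   int src_id;   /* id of the value this instruction reads, or -1 */
};

#define MAX_DISCARDS 16

static int
find_pos(const struct instr *prog, int n, int id)
{
   for (int i = 0; i < n; i++) {
      if (prog[i].id == id)
         return i;
   }
   return -1;
}

/* Hoists each discard to just after the definition of its condition.
 * Returns the number of discards that were moved. */
static int
hoist_discards(struct instr *prog, int n)
{
   int movable[MAX_DISCARDS];
   int count = 0;

   /* Pass 1: find every movable discard before touching anything. */
   for (int i = 0; i < n && count < MAX_DISCARDS; i++) {
      if (prog[i].kind != K_DISCARD)
         continue;
      int def = find_pos(prog, n, prog[i].src_id);
      if (def >= 0 && def + 1 < i)
         movable[count++] = prog[i].id;
   }

   /* Pass 2: move each one up, shifting the skipped work down. */
   for (int m = 0; m < count; m++) {
      int from = find_pos(prog, n, movable[m]);
      int to = find_pos(prog, n, prog[from].src_id) + 1;
      assert(to <= from);
      struct instr d = prog[from];
      for (int i = from; i > to; i--)
         prog[i] = prog[i - 1];
      prog[to] = d;
   }

   return count;
}
```

In a shader shaped like "input, compare, texture, ALU, discard", the
discard ends up right after the compare, so the texture fetch comes
after it instead of before it.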
v2 (Jason Ekstrand):
- Fix a couple of typos (Grazvydas, Ian)
- Use the new nir_instr_move helper
- Find all movable discards before moving anything so we don't
  accidentally re-order anything and break dependencies
v3 (Pierre-Eric): remove the call to nir_opt_conditional_discard based
on Daniel Schürmann's comment.
v4 (Pierre-Eric):
- handle demote intrinsics and drop derivatives_safe_after_discard
- add early return if discards/demotes aren't used
v5 (Pierre-Eric):
- use pass_flags instead of instr set (Daniel Schürmann)
v6 (Daniel Schürmann):
- cleanup and fix pass_flags handling
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10522>
With the addition of createImageWithModifiers, usage flags were
dropped, as it was believed at the time that modifiers would be a
full replacement for the usage flags. This has turned out to be
untrue, as modifiers are not able to describe buffer placement.
Add a new version of the interface that allows specifying
use flags in addition to the modifier.
Signed-off-by: Simon Ser <contact@emersion.fr>
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8106>
The DRI image extension already has two different ways to allocate an
image (with and without a modifier) and will soon grow a third one.
Add a helper, which handles calling the appropriate implementation to
get rid of code duplication in the winsys.
This converts the two obvious call sites (GBM dri and EGL wayland)
that profit from the code dedup.
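
A helper of this kind could look roughly like the sketch below. It is
only an illustration of the dispatch: the image_iface struct, its
version numbers, the stub back-ends and create_image() are all
hypothetical and do not reproduce the DRI image extension or the
actual helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a versioned image interface. */
struct image_iface {
   int version;
   int (*create)(int w, int h, unsigned use);
   /* Assumed available from version 2: modifiers, no use flags. */
   int (*create_with_modifiers)(int w, int h, const uint64_t *mods,
                                unsigned count);
   /* Assumed available from version 3: modifiers and use flags. */
   int (*create_with_modifiers2)(int w, int h, const uint64_t *mods,
                                 unsigned count, unsigned use);
};

/* Stub back-ends that only report which entry point was reached. */
static int
stub_create(int w, int h, unsigned use)
{
   (void)w; (void)h; (void)use;
   return 1;
}

static int
stub_create_mods(int w, int h, const uint64_t *mods, unsigned count)
{
   (void)w; (void)h; (void)mods; (void)count;
   return 2;
}

static int
stub_create_mods2(int w, int h, const uint64_t *mods, unsigned count,
                  unsigned use)
{
   (void)w; (void)h; (void)mods; (void)count; (void)use;
   return 3;
}

/* Calls the most capable implementation the interface provides.
 * Returns -1 if modifiers were requested but are not supported. */
static int
create_image(const struct image_iface *iface, int w, int h, unsigned use,
             const uint64_t *mods, unsigned count)
{
   assert(iface != NULL && iface->create != NULL);

   if (mods != NULL && count > 0) {
      if (iface->version >= 3 && iface->create_with_modifiers2)
         return iface->create_with_modifiers2(w, h, mods, count, use);
      /* Older interface: the use flags cannot be passed along. */
      if (iface->version >= 2 && iface->create_with_modifiers)
         return iface->create_with_modifiers(w, h, mods, count);
      return -1;
   }

   return iface->create(w, h, use);
}
```

Call sites then pass whatever they have (use flags, an optional
modifier list) and no longer need to check the interface version
themselves.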
Signed-off-by: Lucas Stach <l.stach@pengutronix.de>
Reviewed-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8106>
This is the vec4 equivalent of d0d039a4d3, required for proper UBO
pushing in vertex stages for Vulkan on HSW. Sadly, the implementation
requires us to do everything in ALIGN1 mode and the vec4 instruction
scheduler doesn't understand HW_GRF <-> UNIFORM interference so it's
easier to do the whole thing in the generator. We add an instruction
to the top of the program which just means "emit the blob" and all the
magic happens in codegen.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10571>
In order to avoid switching pull constants to push constants and then
having to go back to pull, compute the push ranges up-front. This way we
know by the time we emit code exactly what ranges are pushable. This is
a bit inefficient in the case where the "normal" push constants get
compacted. However, most apps don't use giant piles of dead uniforms
combined with substantial UBO use so this should be ok.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10571>
The way we handle spilling for fp64 in vec4 is to emit a series of MOVs
which swizzles the data around and then a pair of 32-bit spills. This
works great except that the next time we go to pick a spill reg, the
compiler isn't smart enough to figure out that the register has already
been spilled. Normally we do this by looking at the sources of spill
instructions (or destinations of fills) but, because it's separated from
the actual value by a MOV, we can't see it. This commit adds a new
opcode VEC4_OPCODE_MOV_FOR_SCRATCH which is identical to MOV in
semantics except that it lets RA know not to spill again.
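
As a rough sketch of why the new opcode helps, consider how a register
allocator might decide which registers must not be picked as spill
candidates. Everything below is hypothetical (the opcode enum, the
instr struct and mark_no_spill()); it only assumes that registers
touched by scratch instructions, and by a MOV tagged for scratch use,
are excluded.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy opcodes and instruction layout. */
enum opcode {
   OP_ADD,
   OP_MOV,
   OP_MOV_FOR_SCRATCH,   /* behaves like MOV, but tagged for RA */
   OP_SCRATCH_WRITE,     /* spill: reads src */
   OP_SCRATCH_READ,      /* fill: writes dst */
};

struct instr {
   enum opcode op;
   int dst;   /* register number, or -1 if unused */
   int src;   /* register number, or -1 if unused */
};

static void
set_no_spill(bool *no_spill, int num_regs, int reg)
{
   if (reg < 0)
      return;
   assert(reg < num_regs);
   no_spill[reg] = true;
}

/* Marks registers that have already been through a spill or fill. */
static void
mark_no_spill(const struct instr *prog, int n, bool *no_spill, int num_regs)
{
   for (int i = 0; i < n; i++) {
      switch (prog[i].op) {
      case OP_SCRATCH_WRITE:
         set_no_spill(no_spill, num_regs, prog[i].src);
         break;
      case OP_SCRATCH_READ:
         set_no_spill(no_spill, num_regs, prog[i].dst);
         break;
      case OP_MOV_FOR_SCRATCH:
         /* The tag lets us see through the shuffle to the real value. */
         set_no_spill(no_spill, num_regs, prog[i].src);
         set_no_spill(no_spill, num_regs, prog[i].dst);
         break;
      default:
         break;
      }
   }
}
```

With a plain MOV between the value and the scratch write, only the
temporary would be marked and the original register would still look
like a valid spill candidate.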
Fixes: 82c69426a5 "i965/vec4: support basic spilling of 64-bit registers"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10571>
The vec4 back-end can't push UBOs just yet but it soon will be able to.
When it starts pushing UBOs, it will have a lower limit than scalar due
to a crummy register allocator. Mirror that limit in ANV so we don't
run into asserts due to ANV and the back-end making different choices.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10571>
This change adds a gallium D3D10 state tracker that works as a WDDM UMD
software driver, similar to Microsoft WARP, but using llvmpipe/softpipe.
The final deliverable is a d3d10sw.dll, which is similar to WARP's
d3d10warp.dll.
This has been used to run Microsoft Windows HCK wgf11* tests with
llvmpipe, and they were at one point passing 100%.
Known limitations:
- TGSI (no NIR)
- D3D10 only (no D3D11 support yet)
- no WINE integration (WINE doesn't implement WDDM DDI.)
For further details see:
- src/gallium/frontends/d3d10umd/README.md
- src/gallium/targets/d3d10sw/README.md
v2: Drop the DXBC-based disassembly. Add missing break statements.
v3: Incorporate Jesse's feedback.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Acked-by: Jesse Natalie <jenatali@microsoft.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10687>
Another lavapipe leak found with LeakSanitizer.
This happens when using tessellation without a geometry shader but with a
fragment shader that consumes the primitive ID, therefore requiring the
primitive assembler stage.
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10835>