There are a few cross-stage states that we absolutely have to wait to
emit until we know more than one stage. Pull these into the program
config draw state, so that we can split up the program draw state into a
per-stage draw state. For VK_EXT_shader_object, these will have to
emitted at draw time.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25076>
Emit the registers solely based on whether FS reads PrimID, and assume
the HW will do the right thing and disable PrimID passthru when GS is
enabled. This untangles these registers so we can set them from the FS
draw state in the future.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25076>
It turns out that the hardware automatically selects whether PrimID
passthrough needs to happen based on whether GS is enabled, which means
that it's safe to always set these registers based whether PrimID is
read by the FS and the hardware will ignore them when GS is enabled. Use
the real names for these registers to make it less confusing when we
start to do that.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25076>
This removes more draw states that are commonly set together. We still
have a separate draw state for RB_DEPTH_CNTL, because it depends on
other things like the attachment state and depth clamp and it would be
more difficult for layers like zink to use a combined depth/stencil
state.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25076>
There's no need to separate them except that it was easier before, no
one will enable the second without also enabling the first. Now that
mesa will merge the states for us we can go ahead and merge them.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25076>
We only need to emit MSAA state once per subpass at most, unless the
pipeline switches primitive types or for framebuffer-less subpasses
(which always use sysmem anyway). Therefore it seems like draw state
skipping isn't going to bring much benefit here, and having it as a draw
state in the first place is a remnant of how this used to be part of the
pipeline state.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25076>
While we don't need to emit all of the unused mesh/task states when mesh
is disabled, if we don't have them we fail some assertions in the
difference checks due to the corresponding state being empty.
This may happen when going from a mesh pipeline to a non-mesh one, or
one that uses task shaders to one that doesn't.
It may be possible to avoid having to do this, but I'd rather start from
a working state and optimize it later.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25109>
For some specific texture sizes, notably some texture sizes with width
4096, block stride calculation could end up calculating stride 256 which
is an invalid value.
In those specific cases, this could cause rendering artifacts or
application/driver crashes.
Cc: mesa-stable
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25084>
this has a lot of caveats:
* extension must be supported
* resource must have usage bit set
* resource must not have any pending batch usage
* resource must be in supported layout
if all of these conditionals pass, then HIC can be used for direct image subdata
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24775>
Flushing and invalidating caches isn't necessary for workgroup scope
fences. In fact, the DP_FLUSH_TYPE docs (BSpec 54041) say:
"If the fence scope is Local or Threadgroup, HW ignores the flush
type and operates as if it was set to None(no flush)"
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24842>
With the new nir_opt_barrier_modes() pass, we may encounter control
barriers with no memory modes set, such as:
@barrier () (execution_scope=WORKGROUP, memory_scope=WORKGROUP, mem_semantics=ACQ|REL, mem_modes=0)
The DXIL validator documentation [1] mentions an
INSTR.BARRIERMODENOMEMORY validation rule:
"sync must include some form of memory barrier - _u (UAV) and/or
_g (Thread Group Shared Memory). Only _t (thread group sync) is
optional."
We were generating a dx.op.barrier instruction with only one flag,
DXIL_BARRIER_MODE_SYNC_THREAD_GROUP. This seems to run afoul of the
above validator rule. So, this patch adjusts the code generator to
set DXIL_BARRIER_MODE_UAV_FENCE_THREAD_GROUP too, whenever
UAV_FENCE_GLOBAL isn't required.
[1] https://github.com/microsoft/DirectXShaderCompiler/blob/main/docs/DXIL.rst
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24842>
Most drivers will want nir_opt_barrier_modes() to optimize out
unnecessary memory barrier modes. However, virgl has to translate
back to GLSL, which means it can really only handle partial memory
barriers in compute shaders today, because there isn't a proper
way to express them otherwise. Just ask nir_to_tgsi to promote
these back to full barriers as a workaround.
See KHR-GL43.shader_storage_buffer_object.advanced-readWrite-case1
on virpipe-on-gl as a case where this hack is needed.
Acked-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/24842>