This changes the trace rays logic to always use
VkTraceRaysIndirectCommand2KHR and implements
vkCmdTraceRaysIndirect2KHR. I renamed the
load_sbt_amd to sbt_base_amd and moved the SBT
load lowering from ACO to NIR.
Note that we can not just upload one pointer to
all the trace parameters because that would
be incompatible with traceRaysIndirect.
Signed-off-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16430>
according to the spec, atomic counters can be bound at any offset divisible by 4,
which means that any driver that uses the ssbo lowering pass and doesn't have
a min offset align of 4 is potentially broken
to handle this, use a statevar to inject the misaligned remainder of the offset
into the shader as a uniform. for well-aligned counter binds, the uniform offset
will be 0
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16749>
Doing this in NIR should give better results, but also allows us to
stop calling more GLSL IR optimisations passes.
v2: Skip 8bit and 16bit type that would require further processing
I believe this is an existing bug in the GLSL IR pass also.
v3: rebuild constant initialisers as we want to call this pass
after nir has already lowered them and performed optimisations.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> (v1)
Acked-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16770>
For the NIR XFB gathering as well as all the Vulkan drivers, buffer
strides in nir_xfb_info are in bytes. When Marek started using
nir_xfb_info for GLSL on radeonsi, he copied directly from the GLSL
struct which has strides in dwords. This inconsistency didn't show up
until I went through and started us using the NIR passes for GL drivers
directly without going through the GLSL structs. We could change the
nir_xfb_buffer_info field to be in dwords to be consistent with
shader_info but that would mean changing all the Vulkan drivers but, for
now, it's easier to always use bytes in nir_xfb_info.
Fixes: 2a22885a45 ("st,nir: Use nir_shader::xfb_info in nir_lower_io_passes")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16819>
A 1D texture operation may need to do a mov to turn a reference to a
channel of an SSA value into a scalar value to be passed as the texture
coordinate (since texture srcs can't do swizzles). Seen in
amnesia-the-dark-descent/low/46.shader_test() for example, where a 1D
texture is used to remap each of r,g,b from a previous texture result.
Besides, the nir_op_is_vec() case will (perhaps surprisingly) look through
a mov, anyway.
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16616>
This function allows to only scalarize instructions down to a desired
vectorization width. nir_lower_alu_to_scalar() was changed to use the
new function with a width of 1.
Swizzles outside vectorization width are considered and reduce
the target width.
This prevents ending up with code like
vec2 16 ssa_2 = iadd ssa_0.xz, ssa_1.xz
which requires to emit shuffle code in backends
and usually is not beneficial.
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13080>
The callback allows to request different vectorization factors
per instruction depending on e.g. bitsize or opcode.
This patch also removes using the vectorize_vec2_16bit option
from nir_opt_vectorize().
Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13080>
During certain control-flow manipulation passes, we go out-of-SSA
temporarily in certain areas of the code to make control-flow
manipulation easier. This can result in registers being in phi sources
temporarily. If two sub-passes run before we get a chance to do
clean-up, we can end up doing some out-of-SSA and then a bit more
out-of-SSA and trigger this case. It's easy enough to handle.
Fixes: a620f66872 ("nir: Add a couple quick-and-dirty out-of-SSA helpers")
Fixes: 79a987ad2a ("nir/opt_if: also merge break statements with ones after the branch")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/6370
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16111>
The previous logic would just set the block to the instructions
original location if we couldn't evict it from a loop.
For now we only push const loads to a later block inside ifs
but we can add more heuristics later. This change helps a
hand full of shaders but also stops a CTS regression caused
by excess spilling after a series I'm working on to disable
more of the GLSL IR optimisation passes.
Shader-db results iris (BDW):
total instructions in shared programs: 17529759 -> 17529749 (<.01%)
instructions in affected programs: 15929 -> 15919 (-0.06%)
helped: 5
HURT: 2
helped stats (abs) min: 1 max: 5 x̄: 2.40 x̃: 2
helped stats (rel) min: 0.06% max: 0.15% x̄: 0.11% x̃: 0.12%
HURT stats (abs) min: 1 max: 1 x̄: 1.00 x̃: 1
HURT stats (rel) min: 0.06% max: 0.06% x̄: 0.06% x̃: 0.06%
95% mean confidence interval for instructions value: -3.34 0.49
95% mean confidence interval for instructions %-change: -0.14% 0.02%
Inconclusive result (value mean confidence interval includes 0).
total cycles in shared programs: 861109994 -> 861099681 (<.01%)
cycles in affected programs: 7027698 -> 7017385 (-0.15%)
helped: 95
HURT: 72
helped stats (abs) min: 1 max: 7995 x̄: 138.54 x̃: 9
helped stats (rel) min: <.01% max: 15.96% x̄: 0.54% x̃: 0.11%
HURT stats (abs) min: 1 max: 474 x̄: 39.56 x̃: 12
HURT stats (rel) min: <.01% max: 1.17% x̄: 0.20% x̃: 0.11%
95% mean confidence interval for cycles value: -159.05 35.54
95% mean confidence interval for cycles %-change: -0.45% 0.01%
Inconclusive result (value mean confidence interval includes 0).
total spills in shared programs: 17606 -> 17605 (<.01%)
spills in affected programs: 323 -> 322 (-0.31%)
helped: 1
HURT: 0
total fills in shared programs: 22599 -> 22598 (<.01%)
fills in affected programs: 1348 -> 1347 (-0.07%)
helped: 1
HURT: 0
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14940>
1. Lowers NV_mesh_shader TASK_COUNT output to launch_mesh_workgroups.
2. Removes all code after launch_mesh_workgroups, enforcing the
fact that it's a terminating instruction.
3. Ensures that task shaders always have at least one
launch_mesh_workgroups instruction, so the backend doesn't
need to implement a special case when the shader doesn't have it.
4. Optionally, implements task_payload using shared memory when
task_payload atomics are used.
This is useful when the backend is otherwise not capable of
handling the same atomic features as it can for shared memory.
If this is used, the backend only has to implement the basic
load/store operations for task_payload.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16720>