I don't know of any GPUs doing 16-bit atomic accesses, nor do I know of
anybody wanting that in shaders. But deqp has GLES CTS cases that set
mediump on shared variables, so just skip lowering for those vars.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18452>
Optimizes patterns which are created by recent versions of vkd3d-proton,
when constant folding doesn't eliminate it entirely:
- ubitfield_extract(value, offset, umin(bits, 32-(offset&0x1f)))
- ibitfield_extract(value, offset, umin(bits, 32-(offset&0x1f)))
- bitfield_insert(base, insert, offset, umin(bits, 32-(offset&0x1f)))
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13225>
For non-bindless textures, get the base address of the texture
descriptor array, so we can crawl descriptors in the shader. For
bindless, this isn't needed (since the bindless handle will be the
address itself).
jekstrand suggested the idea of the descriptor crawl. It worked out
pretty well, all considered.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18525>
The option struct passed to nir_lower_blend doesn't have a "blending
disabled" flag. Unless blending is skipped due to logic ops or
framebuffer formats, nir_lower_blend always blends, even if the blend
mode is "replace" (corresponding to the API level blend disable).
That's mostly okay, since NIR can optimize out the code, at the expense
of a little compile time. However, there's a catch: nir_lower_blend
emits fsat at the start of the shader (for UNORM framebuffers, or
fsat_signed for SNORM). We can expect hardware to saturate the input to
store_output itself, so these operations are redundant, but it's tricky
to optimize these instructions out otherwise. Don't even try: detect the
replace blend mode and don't call nir_blend in that case. Colour masking
is still applied as usual.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Emma Anholt <emma@anholt.net>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18535>
When the workgroup is 1 dimensional, simply use a vec3
filled with zeroes and the local invocation index.
This is is better than lower_id_to_index + constant folding,
because this way we don't leave behind extra ALU instrs.
Note, this is relevant to mesh shaders on RDNA2 because
it enables us to better detect cross-invocation output
access.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18464>
This also prevents some small regressions in "glsl: remove GLSL IR
inverse comparison optimisations".
shader-db results:
All Sandy Bridge and newer Intel platforms had similar results. (Ice Lake shown)
total instructions in shared programs: 19941025 -> 19940805 (<.01%)
instructions in affected programs: 52431 -> 52211 (-0.42%)
helped: 188 / HURT: 6
total cycles in shared programs: 858451784 -> 858431633 (<.01%)
cycles in affected programs: 2119134 -> 2098983 (-0.95%)
helped: 183 / HURT: 12
LOST: 2
GAINED: 0
Iron Lake and GM45 had similar results. (Iron Lake shown)
total instructions in shared programs: 8364668 -> 8364670 (<.01%)
instructions in affected programs: 753 -> 755 (0.27%)
helped: 2 / HURT: 4
total cycles in shared programs: 248752572 -> 248752238 (<.01%)
cycles in affected programs: 87290 -> 86956 (-0.38%)
helped: 2 / HURT: 4
fossil-db results:
Skylake, Ice Lake, and Tiger Lake had similar results. (Ice Lake shown)
Instructions in all programs: 144909184 -> 144909130 (-0.0%)
Instructions helped: 6
Cycles in all programs: 9138641740 -> 9138640984 (-0.0%)
Cycles helped: 8
Reviewed-by: Timothy Arceri <tarceri@itsqueeze.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18006>
Ever since 4246c2869c and 7d85dc4f35 loop unrolling can no
longer depend on inot being eliminated from the loop
terminator condition so we need to be able to handle it.
Here we simply check to see if the inot contains a simple
terminator condition we previously handled. We also update
the previous users of this function to use a newly name
copy of the previous behaviour
nir_is_terminator_condition_with_two_inputs().
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18006>
A task shader must use this instruction to specify the dimensions
of the launched mesh shader workgroups.
It is a terminating instruction.
When the task shader doesn't have the optional payload, use the
pre-existing launch_mesh_workgroups intrinsics.
When the task shader has a payload, use a new
launch_mesh_workgroups_with_payload_deref intrinsics which has
a deref that refers to the payload variable.
We also add this new intrinsic to nir_lower_io which lowers this
to the pre-existing explicit intrinsic.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18366>
This reverts commit 6f25d45877, replacing it
with GLSL_PRECISION_MEDIUM. The previous commit ended up not being the
right approach, as it affected only nir vars for spirv phis and not other
nir vars, and we want a tool that does both. The new
nir_lower_mediump_vars pass can do that for you.
No fossil-db change for my angle fossils run on radv.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18259>
SPIRV and GLSL are reasonable at converting ALU ops to mediump, but
variable storage would be wrapped in a 2f32/2mp on store/load, and if
nir_vars_to_ssa doesn't make that storage go away then you'd have extra
conversions. For compute shader shared mem, you'd waste memory too.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18259>
Take this real-world (trimmed) shader:
precision highp float;
in lowp vec4 var_varVertexColor;
layout(location = 0) out vec4 out_FragColor0;
void main() {
vec4 textureColor0 = vec4(1.000000e+00, 0.000000e+00, 0.000000e+00, 1.000000e+00);
vec3 color = vec3(1.000000e+00, 1.000000e+00, 1.000000e+00);
vec4 outColor = vec4(vec3((color).rgb), 1.000000e+00);
(outColor *= vec4(var_varVertexColor));
(out_FragColor0 = outColor);
}
After opts, it's just a store from input to output. If we decide to lower
the input to 16-bit, then as long as the driver can handle 16-bit outputs,
it would be a good idea to demote the output and save the conversions.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18003>