This is not safe when we have conditional spills since we could be
spilling disabled lanes with undefined values that could overwrite
valid data for those lanes from a previous spill of the same temp
that was unconditional (or that condionally enabled those same
lanes).
Fixes some Piglit OpenCL tests as well as the following OpenCL tests:
integer_divideAssign
integer_moduloAssign
integer_mad_sat
integer_ops integer_divideAssign
integer_ops integer_mad_sat
integer_ops integer_moduloAssign
integer_ops quick_char_math
integer_ops quick_short_math
math_brute_force half_powr
math_brute_force pow
math_brute_force pown
math_brute_force powr
math_brute_force rootn
Fixes: 597560e27c ('broadcom/compiler: always enable per-quad on spill operations')
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29909>
The multop instruction implicitly writes rtop which is not preserved
acrosss thread switches. We can spill the sources of the multop
(since these would happen before multop) and the destination of
umul24 (since that would happen after umul24).
Fixes some OpenCL tests when V3D_DEBUG=opt_compile_time is used to
choose a different compile configuration.
cc: mesa-stable
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29909>
The semantics of discard differ between GLSL and HLSL and
their various implementations. Subsequently, numerous application
bugs occurred and SPV_EXT_demote_to_helper_invocation was written
in order to clarify the behavior. In NIR, we now have 3 different
intrinsics for 2 things, and while demote and terminate have clear
semantics, discard still doesn't and can mean either of the two.
This patch entirely removes nir_intrinsic_discard and
nir_intrinsic_discard_if and replaces all occurences either with
nir_intrinsic_terminate{_if} or nir_intrinsic_demote{_if} in the
case that the NIR option 'discard_is_demote' is being set.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27617>
We can emit spill setup before RA if we use scratch. In that case
we have the same situation as during spilling, with the caveat that
we have already emitted the instructions so we need to find them
(they should be the only instructions ones before the instructions
accessing payload registers) and flag them as such.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29343>
We read our payload registers first in the shader so we generally don't have
to care about temps being allocated to them and stomping their value before
we can read them. Hoewer, spilling setup instructions are an exception since
these will be inserted first when there is any spilling in the program.
To fix this, we flag RA nodes involved with these instructions so we can
then try to avoid assiging these registers to them.
Fixes CTS failures with V3D_DEBUG=opt_compile_time, particularly:
dEQP-VK.binding_model.buffer_device_address.set0.depth2.basessbo.convertcheckuv2.nostore.single.std140.comp_offset_nonzero
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29343>
In the past where backends had to deal with nir_register we needed
a specific path for them here because nir_def_components_read only
worked on ssa defs, but now that we got rid of nir_register we
only have nir_def and we don't need to go out of our way to do this
and we can just always use nir_def_components_read.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28978>
This is similar to what we do for textures where we program a number
of channels matching the number of componentes actually read by the
shader.
Makes tests like dEQP-VK.image.load_store.with_format.2d.r32_uint
drop from 18 instructions to 14 by emitting a single ldtmu instead
of 4.
From dEQP-VK.image.load_store.*:
total instructions in shared programs: 12681 -> 12093 (-4.64%)
instructions in affected programs: 4866 -> 4278 (-12.08%)
helped: 256
HURT: 0
helped stats (abs) min: 1 max: 4 x̄: 2.30 x̃: 2
helped stats (rel) min: 4.76% max: 25.00% x̄: 12.35% x̃: 11.11%
95% mean confidence interval for instructions value: -2.40 -2.19
95% mean confidence interval for instructions %-change: -12.99% -11.71%
Instructions are helped.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28978>
We take advantage of the needs_quad_helper_invocation information to
only enable the PER_QUAD TMU access on Fragment Shaders when it is needed.
PER_QUAD access is also disabled on stages different to fragment shader.
Being enabled was causing MMU errors when TMU was doing indexed by vertexid
reads on disabled lanes on vertex stage. This problem was exercised by some
shaders from the GTK new GSK_RENDERER=ngl that were accessing a constant buffer
offset[6], but having PER_QUAD enabled on the TMU access by VertexID was
doing hidden incorrect access to not existing vertex 6 and 7 as TMU was
accessing the full quad.
cc: mesa-stable
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28740>
This is achieved by the following steps:
#ifndef DEBUG => #if !MESA_DEBUG
defined(DEBUG) => MESA_DEBUG
#ifdef DEBUG => #if MESA_DEBUG
This is done by replace in vscode
excludes
docs,*.rs,addrlib,src/imgui,*.sh,src/intel/vulkan/grl/gpu
These are safe because those files should keep DEBUG macro is already excluded;
and not directly replace DEBUG, as we have some symbols around it.
Use debug or NDEBUG instead of DEBUG in comments when proper
This for reduce the usage of DEBUG,
so it's easier migrating to MESA_DEBUG
These are found when migrating DEBUG to MESA_DEBUG,
these are all comment update, so it's safe
Replace comment /* DEBUG */ and /* !DEBUG */ with proper /* MESA_DEBUG */ or /* !MESA_DEBUG */ manually
DEBUG || !NDEBUG -> MESA_DEBUG || !NDEBUG
!DEBUG && NDEBUG -> !(MESA_DEBUG || !NDEBUG)
Replace the DEBUG present in comment with proper new MESA_DEBUG manually
Signed-off-by: Yonggang Luo <luoyonggang@gmail.com>
Acked-by: David Heidelberg <david.heidelberg@collabora.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28092>
The algorithm used to rendering smooth lines worked under the assumption
that line coords were in the [0, 1] range. This was correct when using
an orthogonal projection, but not when using a perspective projection.
With a perspective projection (where the value for 1/Wc set in the VPM
is not 1.0), line coords values are also affected by this projection, so
the values are not in this range.
To deal with this, we normalize the line coords using the Wc value so
the range becomes [0, 1], and the smooth line rendering works as
expected.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10496
Fixes: ee4d51f8b2 ("v3d: Add a lowering pass for line smoothing")
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28072>
It seems the hardware behavior for this is as per-spec and we are
supposed to identify as active entire quads. Particularly, there
are some derivative tests with dynamic control flow that use
subgroup ballot and require this.
However, we still need to exclude terminted lanes (OpTerminate). For
that, we keep track of the sample mask at the start of a fagment
shader start and compare it with the current sample mask.
Fixes: ('broadcom/compiler: support subgroup reduction operations from fragment shaders')
Fixes: dEQP-VK.glsl.derivate.dynamic_loop.*
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27409>
In fragment shaders these instructions consider a lane active when
any lane in the same quad is active, which is not what we want, so
we need to include the current sample mask in the condition mask
used with these instructions to limit lane selection to those that
are really active.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/27211>