Letting the caller zero-initialize the program object is error prone,
not to mention that resources attached to the program might not be freed
by the caller. Let's simplify that by letting the compiler allocate the
panfrost_program object. Those objects should be freed with ralloc_free().
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7206>
Clip & cull distances, which are compact arrays, exposed a lot of holes
because they can take up multiple slots and partially overlap.
I wanted to eliminate our dependence on knowing the layout of the
variables, as this can get complicated with things like partially
overlapping arrays, which can happen with ARB_enhanced_layouts or with
clip/cull distance arrays. This means no longer changing the layout
based on whether the i/o is part of an array or not, and no longer
matching producer <-> consumer based on the variables. At the end of the
day we have to match things based on the user-specified location, so for
simplicity this switches the entire i/o handling to be based off the
user location rather than the driver location. This means that the
primitive map may be a little bigger, but it reduces the complexity
because we never have to build a table mapping user location to driver
location, and it reduces the amount of work done at link time in the SSO
case. It also brings us closer to what the other drivers do.
While here, I also fixed the handling of component qualifiers, which was
another thing broken with clip/cull distances.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6959>
I think the rationale for not setting the size for inputs is that
when passed between geometry stages the clip and cull distances are
supposed to be treated like any other varying. However, this isn't 100%
the case for the FS, since when it's read by the FS it's also used by
the fixed-function stage. In freedreno we setup varying locations when
compiling the FS, and then tack on VS-only outputs like gl_Position at
the end. Furthermore there's code to compact input locations based on
what's actually read. But this compaction can't happen for clip and cull
distances, because then we won't have space for components that are only
read by the clipper. So, we need to know the original number of
components for both arrays. Modify this pass so that we don't have to go
digging around for it ourselves.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6959>
This patch fixes mesa compiler crash in i965 on shaders like the following one:
```
in VS_OUTPUT {
mat4 data;
} vs_output;
out vec4 fs_output;
vec4 convert(in float val) {
return vec4(val);
}
void main()
{
fs_output = vec4(0.0);
for (int a = -1; a < 5; a++) {
for (int b = -1; b < 5; b++) {
fs_output += convert(vs_output.data[b][a]);
}
}
}
```
Section 5.11 (Out-of-Bounds Accesses) of the GLSL 4.60 spec says:
In the subsections described above for array, vector, matrix and
structure accesses, any out-of-bounds access produced undefined
behavior....
Out-of-bounds reads return undefined values, which
include values from other variables of the active program or zero.
Out-of-bounds writes may be discarded or overwrite
other variables of the active program.
GL_KHR_robustness and GL_ARB_robustness encourage us to return zero
for reads.
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Signed-off-by: Andrii Simiklit <andrii.simiklit@globallogic.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/6560>
This creates a directory and save various logs (dmesg, umr,
pipeline, gpu info, etc) instead of printing stuff to stdout/stderr.
This dumps the following files when a GPU hang is detected:
- dmesg.log
- gpu_info.lo
- options.log
- pipeline.log (shaders including SPIR-V if spirv-dis found)
- registers.log
- trace.log
- vm_fault (if a VM fault is detected)
- umr_ring.log (if UMR found)
- umr_waves.log (if UMR found)
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7233>
We don't natively support BGRA format, instead we handle these
as RGBA and we lower the loads to swap components R and B.
However, the driver emits VPM loads based on the size of the
input variables so when we have a vec2 or float BGRA input,
it would only emit VPM loads for components 0 and 1, which is
not correct since we emit a load of component 2 to swap with
component 0.
v2: handle GL legacy vertex inputs gracefully.
Fixes:
dEQP-VK.draw.output_location.array.b8g8r8a8-unorm-highp-output-vec2
dEQP-VK.draw.output_location.array.b8g8r8a8-unorm-mediump-output-vec2
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7271>
There are two bits for the vector size of a vertex input, with the
value 0 meaning 4 components. The CLE decoder seems to try and dump
"4" instead of "0" is to be a bit more user friendly, but it had an
off-by-one error that would cause it to dump "2" instead.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7271>
Fix defects reported by Coverity Scan.
Uninitialized scalar field (UNINIT_CTOR)
uninit_member: Non-static class member buffer_access_type is not
initialized in this constructor nor in any functions that it
calls.
uninit_member: Non-static class member progress is not initialized
in this constructor nor in any functions that it calls.
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7243>
Handle compressed stencil buffer transition from one layout to another
on gen12+.
When stencil compression is enabled, we have to initialize buffer via
stencil clear (HZ_OP) before any renderpass.
v2:
- Pass predicate bit false to anv_image_ccs_op (Nanley Chery)
v3:
- update aspect assertion (Nanley Chery)
v4:
- Make state decision based on anv_layout_to_aux_state instated of
anv_layout_to_aux_usage (Sagar Ghuge)
v5:
- No need to handle stencil CCS resolve case (Jason Ekstrand)
- Initialize buffer using HZ_OP (Nanley Chery)
v6: (Nanley Chery)
- Pass correct layer/level count.
- Remove local variable.
v7:
- Skip stencil initialization with HZ_OP packet if followed by fast
clear. (Nanley Chery)
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2942>
All these instructions replicate the result of a N-component dot-product
to a vec4. Naming them fdot_replicatedN gives the impression that are
some sort of abstract dot-product that replicates the result to a vecN.
They also deviate from fdph_replicated... which nobody would reasonably
consider naming fdot_replicatedh.
Naming these opcodes fdotN_replicated more closely matches what they
are, and it matches the pattern of fdph_replicated.
I believe that the only reason these opcodes were named this way was
because it simplified the implementation of the binop_reduce function in
nir_opcodes.py. I made some fairly simple changes to that function, and
I think the end result is ok.
The bulk of the changes come from the sed rename:
sed --in-place -e 's/fdot_replicated\([234]\)/fdot\1_replicated/g' \
$(grep -r 'fdot_replicated[234]' src/)
v2: Use a named parameter to binop_reduce instead of using
isinstance(name, str). Suggested by Jason.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5725>
With swap interval 0, i.e. sync-to-vblank disabled.
This can be necessary for unthrottled drawing with Xwayland:
1) One buffer can be scanned out
2) One buffer can be pending in the kernel for a page flip
3) One buffer can be pending in the Wayland compositor
Therefore, with 3 buffers, the frame-rate could be capped much lower
than the throughput the GPU is capable of, in the worst case at the
Wayland compositor refresh rate.
(The native Wayland EGL backend always uses up to 4 buffers)
Leave the maximum number of buffers at 3 for swap interval != 0, it's
sufficient in that case to always be able to queue one frame ahead of
time.
https://gitlab.gnome.org/GNOME/mutter/-/issues/1455https://gitlab.gnome.org/GNOME/mutter/-/issues/1462
Cc: mesa-stable
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7033>
We'd previously take the copy path. If we were actually flipping (in
which case skipped frames are more likely to occur), we'd ping-pong
between a smaller and larger number of back buffers, and frame-rate
could vary / take a dip due to the buffer management overhead.
While I'm not sure this is actually possible to hit at this point, it
definitely will be with the next change.
Cc: mesa-stable
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7033>
Previously, we would always allocate 3 buffers for page flipping. But 2
buffers can suffice for clients which always wait for buffer swaps to
complete before starting a new frame.
Therefore, keep track of the maximum number of buffers separately from
the current number, and only bump the latter if both current buffers are
busy.
Cc: mesa-stable
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/7033>