radv_shader_nir_to_asm actually did three things: compiling the NIR to
asm, uploading the shaders, and generating debug info for them.
This reduces the functionality of radv_shader_nir_to_asm to only compile
NIR to asm. Uploading the shader and generating debug info is split into
separate functions.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23516>
The dumb buffer backing the renderonly_scanout is only destroyed if the
refcount reaches zero. If a driver does not correctly initialize the
refcount, the refcount may be negative and the buffer will never be
freed.
Add an assert to ensure that drivers correctly initialize the refcount.
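A minimal sketch of the failure mode and the guard. The struct and the helper name below are hypothetical stand-ins, not the actual renderonly code:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted scanout object. */
struct scanout {
   int refcount;
   bool destroyed;
};

/* Drop one reference; destroy the object when the count reaches zero. */
static bool
scanout_unref(struct scanout *s)
{
   /* A driver that never initialized the refcount would arrive here with
    * refcount <= 0, decrement past zero, and never destroy the buffer.
    * The assert catches that case in debug builds. */
   assert(s->refcount > 0);

   if (--s->refcount == 0) {
      s->destroyed = true; /* the real code would free the dumb buffer here */
      return true;
   }
   return false;
}
```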
Signed-off-by: Michael Tretter <m.tretter@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23743>
If the GEM is closed before the BO in the sparse array is set to zero,
a newly allocated GEM may be associated with the stale BO that is still
in the cache, reusing the old BO.
Zero the BO before closing the GEM to make sure that the BO is removed
from the cache and won't be associated with a different GEM.
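The ordering can be sketched with a toy handle-indexed table. All names here (bo_table, fake_gem_close, bo_free) are made up for illustration, and the close function only stands in for the real GEM close ioctl:

```c
#include <stdbool.h>
#include <string.h>

#define MAX_HANDLES 8

/* Hypothetical BO record stored in a handle-indexed table. */
struct bo {
   unsigned handle; /* 0 means "slot is empty" */
   unsigned size;
};

static struct bo bo_table[MAX_HANDLES];

/* Stand-in for closing the GEM handle: once this returns, the kernel may
 * hand the same handle number to the next allocation. */
static void
fake_gem_close(unsigned handle)
{
   (void)handle;
}

static void
bo_free(struct bo *bo)
{
   unsigned handle = bo->handle;

   /* Clear the table slot first, so no lookup can find stale data... */
   memset(bo, 0, sizeof(*bo));

   /* ...and only then release the handle for reuse. */
   fake_gem_close(handle);
}

/* Returns true if the slot for this handle still holds a live BO. */
static bool
bo_lookup_is_live(unsigned handle)
{
   return bo_table[handle].handle != 0;
}
```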
Signed-off-by: Michael Tretter <m.tretter@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23744>
Account for the possibility that the scissor is outside the render area. Fixes
the usual assertion fail:
   glcts: ../src/gallium/drivers/asahi/agx_state.c:1015:
   agx_upload_viewport_scissor: Assertion `maxx > minx && maxy > miny' failed.
on the following dEQP tests with my conformance build:
   dEQP-GLES3.functional.fragment_ops.scissor.outside_render_line
   dEQP-GLES3.functional.fragment_ops.scissor.outside_render_point
   dEQP-GLES3.functional.fragment_ops.scissor.outside_render_tri
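To illustrate why the assertion trips, here is a sketch of intersecting a scissor with the render area. The rect type and intersect_scissor are hypothetical and this is not the actual change to agx_upload_viewport_scissor; it only shows that a scissor fully outside the render area yields an empty rectangle that the caller has to handle explicitly:

```c
#include <stdbool.h>

/* Hypothetical rectangle type; max coordinates are exclusive. */
struct rect {
   int minx, miny, maxx, maxy;
};

static int imin(int a, int b) { return a < b ? a : b; }
static int imax(int a, int b) { return a > b ? a : b; }

/* Intersect the scissor with the render area. Returns false when the
 * intersection is empty, i.e. when maxx > minx && maxy > miny would fail. */
static bool
intersect_scissor(struct rect scissor, struct rect area, struct rect *out)
{
   out->minx = imax(scissor.minx, area.minx);
   out->miny = imax(scissor.miny, area.miny);
   out->maxx = imin(scissor.maxx, area.maxx);
   out->maxy = imin(scissor.maxy, area.maxy);

   return out->maxx > out->minx && out->maxy > out->miny;
}
```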
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23998>
The GPU ABI requires varyings to be grouped as follows:
- Position
- Smooth shaded fp32
- Flat shaded fp32
- Linear shaded fp32
- Smooth shaded fp16
- Flat shaded fp16
- Linear shaded fp16
- Point size
Use the flat shaded mask info we now have in the vertex shader key to
sort things properly, and pass the counts to the hardware.
FP16 is still TODO.
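The grouping above amounts to a stable sort by group. A sketch, assuming one slot per varying and using hypothetical names (the enum and assign_varying_slots are not the driver's actual data structures):

```c
#include <stddef.h>

/* Groups in the order listed above. */
enum varying_group {
   GROUP_POSITION,
   GROUP_SMOOTH_FP32,
   GROUP_FLAT_FP32,
   GROUP_LINEAR_FP32,
   GROUP_SMOOTH_FP16,
   GROUP_FLAT_FP16,
   GROUP_LINEAR_FP16,
   GROUP_POINT_SIZE,
   GROUP_COUNT,
};

/* Stable counting sort: assign each varying a slot so that all varyings of
 * one group are contiguous and groups appear in enum order. Also returns
 * the per-group counts, which is what would be passed to the hardware. */
static void
assign_varying_slots(const enum varying_group *groups, size_t n,
                     unsigned *slots, unsigned counts[GROUP_COUNT])
{
   unsigned base[GROUP_COUNT];
   unsigned next = 0;

   for (unsigned g = 0; g < GROUP_COUNT; ++g)
      counts[g] = 0;

   for (size_t i = 0; i < n; ++i)
      counts[groups[i]]++;

   /* First slot of each group is the running total of earlier groups. */
   for (unsigned g = 0; g < GROUP_COUNT; ++g) {
      base[g] = next;
      next += counts[g];
   }

   for (size_t i = 0; i < n; ++i)
      slots[i] = base[groups[i]]++;
}
```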
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23998>
We need to propagate shading model metadata from the FS to the VS in
order to lay out the uniforms in the right order. This means
we need VS variants depending on this data.
We could use the existing shader info structure, but that applies to
compiled shaders, which would make the VS compile depend on the FS
compile. This information does not change with FS variants, so
we can introduce an agx_uncompiled_shader_info structure and gather it
early at precompilation time.
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23998>
An offset may be negative, indexing backwards from the array base.
When we right shift an offset by the format shift, we need to use a
signed shift to ensure that the resulting offset is still negative.
Fixes Nautilus faults/pink crashes.
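The difference between the two shifts, as a standalone sketch (the helper names are illustrative and not taken from the driver). Right-shifting a negative signed value is implementation-defined in C, so the signed variant is spelled out portably here; mainstream compilers produce the same result from a plain >> on a signed type:

```c
#include <stdint.h>

/* Logical shift: zeros are shifted in from the top, so a negative offset
 * turns into a large positive one. */
static int32_t
shift_logical(int32_t offset, unsigned shift)
{
   return (int32_t)((uint32_t)offset >> shift);
}

/* Arithmetic shift: the sign is preserved, so a negative offset stays
 * negative (rounding toward negative infinity). */
static int32_t
shift_arithmetic(int32_t offset, unsigned shift)
{
   return offset < 0 ? ~(~offset >> shift) : offset >> shift;
}
```

With a format shift of 2, an offset of -16 becomes -4 under the arithmetic shift but 0x3FFFFFFC under the logical one, which points far outside the array.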
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23998>
In 97a1bbeaf26 ("agx: Fix discards"), we made our discard lowering very simple,
since we had just discovered the underlying instruction behaviour and needed a
hotfix for misrendering in the wild. Now that we understand the behaviour, we
can do better. There are two potential performance issues with the lowering in
that commit:
1. It generates extra sample_mask instructions. For a shader that has a single
   discard_if at root level, it would generate two instructions
      sample_mask foo, 0
      sample_mask ~0, ~0
   rather than a single
      sample_mask ~0, ~foo
2. It runs depth/stencil testing/updates at the end of the shader, even when it
   could be run immediately after the discard. This might cause pipeline stalls.
The solution is to insert the "trigger testing" sample_mask instruction as soon
after the "discard" instruction as possible, fusing them if they would be next
to each other. There are two cases:
1. The last discard is executed unconditionally. In this case, we can test
   immediately after, unconditionally, and fuse together.
2. The last discard is executed conditionally. In this case, we test in the
   first unconditional block after the discard. Example shader:
      ...
      loop {
         if .. {
            loop {
               discard_if <-- discard here
               ...
            }
            ..
         }
         ...
      }
      <---- we test here
      ...
      store_output
Together this covers all the usual patterns for single-sampled discard. We could
still do better with multisampling, but whatever.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23998>
When lowering discards, it will be convenient to generate the pattern:
   (cond ? 255 : 0) ^ 255
Add rules to optimize that to
   (cond ? 0 : 255)
This is not part of the main algebraic optimizer since this lowering happens
late.
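The equivalence behind the rule can be checked directly. This is a plain C sketch with illustrative function names, not the NIR algebraic rule itself:

```c
#include <stdbool.h>
#include <stdint.h>

/* The pattern the discard lowering would emit. */
static uint8_t
mask_before(bool cond)
{
   return (uint8_t)((cond ? 255 : 0) ^ 255);
}

/* The optimized form: the XOR with 255 is folded into the select by
 * swapping its two constant arms. */
static uint8_t
mask_after(bool cond)
{
   return cond ? 0 : 255;
}
```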
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/23998>