Most of the relevant work happens in the v3d_nir_lower_io. Since
geometry shaders can write any number of output vertices, this pass
injects a few variables into the shader code to keep track of things
like the number of vertices emitted or the offsets into the VPM
of the current vertex output, etc. This is also where we handle
EmitVertex() and EmitPrimitive() intrinsics.
The geometry shader VPM output layout has a specific structure
with a 32-bit general header, then another 32-bit header slot for
each output vertex, and finally the actual vertex data.
When vertex shaders are paired with geometry shaders we also need
to consider the following:
- Only geometry shaders emit fixed function outputs.
- The coordinate shader used for the vertex stage during binning must
not drop varyings other than those used by transform feedback, since
these may be read by the binning GS.
v2:
- Use MAX3 instead of a chain of MAX2 (Alejandro).
- Make all loop variables unsigned in ntq_setup_gs_inputs (Alejandro)
- Update comment in IO owering so it includes the GS stage (Alejandro)
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
While lowering vpm outputs we look for the NIR variables matching
particular store output instructions and we expect to find a match,
so assert on that.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
BEGIN_RING() could decide we can't fit the next packet in the current
cmdstream segment, and grow a new segment. So we need to grab ring->cur
*after* BEGIN_RING(), otherwise we are writing cmdstream past the end of
the previous segment.
Fixes: bdd98b892f ("freedreno: New struct packing macros")
Signed-off-by: Rob Clark <robdclark@chromium.org>
The current strategy using the suballocator with fixed size doesn't
scale and causes some programs with large number of vertices (like some
glmark2 scenes) to crash.
Change it to dynamically allocate a separate bo to accomodate for
arbitrary number of vertices.
This also fixes the buffer read/write flags for gp.
Signed-off-by: Erico Nunes <nunes.erico@gmail.com>
Reviewed-by: Vasily Khoruzhick <anarsoul@gmail.com>
Reviewed-by: Andreas Baierl <ichgeh@imkreisrum.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/merge_requests/2445>
I feel really bad about this but this one test is flaking. I don't want
to do a mass revert (and bisection is extremely difficult with
nondeterministic/Heisenbugs), but it's Friday night and master needs to
pass. This commit should be reverted asap (once the flake is solved)
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This can be used to start/stop statistics capturing from the command
line.
v3:
- Install script (Lionel)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
By default, if an output_file is specified, the overlay layer will start
capturing data immediately. After this commit, when a control socket is
used, the capture starts disabled by default, and is only enabled when a
command ":capture=1;" is received.
when the capture is enabled, we might have already accumulated some
stats. To avoid capturing such noise, we discard and reset the fps and
stats, updating the display and capturing only data from that point on.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Add support for socket from which the overlay layer can receive
commands. This control socket can be useful to allow setting options
once the application is already running. For instance, triggering the
capture of fps data at a certain point.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Add some infrastructure to trace scheduler decisions. The next patch
will add some more traces, just splitting this out to reduce clutter.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Sometimes sched changes that are a win in terms of instruction count
and/or register pressure, are worse in real life, due to keeping varying
storage locked for too long. Add a shader-db stat to give this more
visibility.
Signed-off-by: Rob Clark <robdclark@chromium.org>
We'll need this so we can allocate a stack for the batch large enough
for all the jobs within it.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
The size of the scratchpad (as well as some tiler details) depend on the
contents of the batch, so we need to wait to defer filling out the FBD
until after all draws are queued.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>