tu: Add documentation for VK_EXT_fragment_density_map
This has gotten complicated enough that we need somewhere outside of the driver itself to give an overall flow of how the feature is implemented. This includes a few things that are enabled in the subsequent commits, specifically the LRZ parts. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36475>
This commit is contained in:
187
docs/drivers/freedreno/fdm.rst
Normal file
187
docs/drivers/freedreno/fdm.rst
Normal file
@@ -0,0 +1,187 @@
|
||||
Fragment Density Map
|
||||
====================
|
||||
|
||||
``VK_EXT_fragment_density_map`` is an extension which is intended to allow
|
||||
users to render parts of the screen at a lower resolution. It is designed to be
|
||||
implemented on tiled rendering GPU architectures such as Adreno, and the
|
||||
intention is that it is implemented by rendering some of the tiles at a lower
|
||||
resolution and scaling them up when resolving to system memory or when sampling
|
||||
the resulting image. This inherently means that it is "all or nothing," that
|
||||
is, it must be enabled or disabled for the entire render pass. While the idea is
|
||||
simple, the implementation in turnip is very subtle with lots of
|
||||
interactions with various different features. This page attempts to document
|
||||
the main principles behind the implementation.
|
||||
|
||||
Coordinate Space Soup
|
||||
---------------------
|
||||
|
||||
In order to render a tile at lower resolution, we have to override the user's
|
||||
viewport and scissor for each tile depending on the scaling factor provided by
|
||||
the user. This becomes complicated fast, so let's start by defining a few
|
||||
coordinate spaces that we'll have to work with.
|
||||
|
||||
Framebuffer space
|
||||
^^^^^^^^^^^^^^^^^
|
||||
|
||||
This is the space of the final rendered image. From the user's perspective
|
||||
everything is specified in this space, and fragments created by the rasterizer
|
||||
appear to be larger than 1 pixel. But this is not what actually happens in the
|
||||
hardware, it is a fiction created by the driver. The other spaces below are
|
||||
what the hardware actually "sees".
|
||||
|
||||
GMEM Space
|
||||
^^^^^^^^^^
|
||||
|
||||
This space exists whenever tiled rendering/GMEM is used, even without FDM. It
|
||||
is the space used to access GMEM, with the origin at the upper left of the
|
||||
tile. The hardware automatically transforms rendering space into GMEM space
|
||||
whenever GMEM is accessed using the various ``*_WINDOW_OFFSET`` registers. The
|
||||
origin of this space will be called :math:`b_{cs}`, the common bin start, for
|
||||
reasons that are explained below. When using FDM, coordinates in this space
|
||||
must be multiplied by the scaling factor :math:`s` derived from the fragment
|
||||
density map, or equivalently divided by the fragment area (as defined by the
|
||||
Vulkan specification), with the origin still at the upper left of the tile. For
|
||||
example, if :math:`s_x = 1/2`, then the bin is half as wide as it would've been
|
||||
without FDM and all coordinates in this space must be divided by 2.
|
||||
|
||||
Rendering space
|
||||
^^^^^^^^^^^^^^^
|
||||
|
||||
This is the space in which the hardware rasterizer operates and produces
|
||||
fragments. Normally this is the same as framebuffer space, but with FDM it is
|
||||
not. We transform the viewport and scissor from framebuffer space to
|
||||
rendering space by patching them per-tile in the driver and then when we
|
||||
resolve the tile we scale the resulting tile back to the correct resolution by
|
||||
blitting from the rendering space source to the framebuffer space destination.
|
||||
|
||||
In order to come up with the correct transform from framebuffer space to
|
||||
rendering space, it has to shrink the coordinates by :math:`s` while
|
||||
mapping the original bin start in framebuffer space :math:`b_s` to
|
||||
:math:`b_{cs}`. Since :math:`b_{cs}` is entirely defined by the driver when
|
||||
programming ``*_WINDOW_OFFSET``, one tempting way to do this is to just
|
||||
multiply by :math:`s` and define :math:`b_{cs} = b_s * s`. It turns out,
|
||||
however, that this doesn't work. A key requirement is to handle cases where the
|
||||
same scene is rendered in multiple different views at the same time using
|
||||
``VK_KHR_multiview``, as in VR use-cases, and in this case we want :math:`s` to
|
||||
vary per view, but :math:`b_{cs}` is always the same for every view because
|
||||
there is only one ``*_WINDOW_OFFSET`` register for all layers (hence the name).
|
||||
|
||||
We follow the blob by leaving :math:`b_{cs}` the same regardless of whether FDM
|
||||
is enabled or not. This means that normally :math:`b_s = b_{cs}`, although this
|
||||
is not the case if ``VK_EXT_fragment_density_map_offset`` is in use and the
|
||||
bins are shifted per-view. Since the coordinates need to be scaled by :math:`s`,
|
||||
we know that the transform needs to look like :math:`x' = s * x + o`, where
|
||||
only the offset :math:`o` is free. Plugging in the constraint that :math:`b_s`
|
||||
maps to :math:`b_{cs}`, we get that :math:`b_{cs} = s * b_s + o` or
|
||||
:math:`o = b_{cs} - s * b_s`. This is the function computed by
|
||||
``tu_fdm_per_bin_offset`` and used to calculate the transform for the viewport,
|
||||
scissor, and ``gl_FragCoord``. One critical thing is that the offset must be an
|
||||
integer, or in other words the framebuffer space bin start :math:`b_s` must be
|
||||
a multiple of :math:`1 / s`. This is a natural constraint anyway, because if
|
||||
it wasn't the case then the bin would start in the middle of a fragment which
|
||||
isn't possible to handle correctly.
|
||||
|
||||
Viewport and Scissor Patching
|
||||
-----------------------------
|
||||
|
||||
In order to have :math:`s` differ per view, we have to be able to override the
|
||||
viewport per view. That is, we need to transform the viewport for each view
|
||||
differently. If there is only one viewport, then we duplicate the user's
|
||||
viewport for each view and transform it using the :math:`b_s` and :math:`s` for
|
||||
that view, and we set a "per-view viewport" bit to select the viewport per view
|
||||
instead of using the default viewport 0. When
|
||||
``VK_VALVE_fragment_density_map_layered`` is in use, we instead have to insert
|
||||
shader code to achieve the same thing.
|
||||
|
||||
If the user specifies multiple viewports but they are per-view because
|
||||
``VK_QCOM_multiview_per_view_viewport`` is enabled, then we can just set the
|
||||
per-view viewport bit and transform each user viewport individually by the
|
||||
corresponding scale. But if the user explicitly writes ``gl_ViewportIndex``,
|
||||
then there is nothing we can do and we have to make :math:`s` the same for all
|
||||
views by conservatively taking the minimum. Then we apply :math:`s` to all of
|
||||
the user-specified viewports.
|
||||
|
||||
Because the bin size is now per-view, the usual mechanism of
|
||||
``*_WINDOW_SCISSOR`` for clipping fragments outside the bin doesn't work.
|
||||
Instead the driver needs to intersect the transformed user-specified scissor
|
||||
with the transformed rendering-space bin coordinates, effectively replacing
|
||||
``*_WINDOW_SCISSOR``.
|
||||
|
||||
Fragment density map offset
|
||||
---------------------------
|
||||
|
||||
In order to "properly" implement ``VK_EXT_fragment_density_map_offset``, we
|
||||
need to add an extra row/column of bins at the end and then shift the binning
|
||||
grid up and to the left by an offset :math:`b_o`. This offset is based on the
|
||||
user's offset but has the opposite sign, i.e. when shifting the FDM to the left
|
||||
we have to shift the binning grid to the right, and once the user's offset
|
||||
becomes large enough then we "wrap around" and shift over the scaling factor
|
||||
:math:`s` to the next bin. This has to happen per-view. In turnip the function
|
||||
that computes :math:`b_o` is called ``tu_bin_offset``. Each tile then gets an
|
||||
offseted start :math:`b_s = b_{cs} - b_o` except for the first row/column which
|
||||
only shrink in height/width respectively.
|
||||
|
||||
If we cannot make :math:`s` per-view, then we also cannot make :math:`b_s`
|
||||
per-view and so we cannot shift the bins over. Therefore we fall back to only
|
||||
shifting where :math:`s` is sampled from, which produces jittery and jarring
|
||||
transitions when a bin suddenly changes resolution.
|
||||
|
||||
Bin merging
|
||||
-----------
|
||||
|
||||
FDM shrinks the size of the bin in GMEM, which results in a lot of wasteful
|
||||
unused extra space in GMEM. a7xx mitigates this by introducing "bin merging".
|
||||
If two tiles next to each other have the same scaling for each view, then we
|
||||
combine them into one tile, as long as the combined size in rendering space
|
||||
isn't larger than the original size of an unscaled bin in framebuffer space. We
|
||||
can even merge larger groups of tiles. The only hardware feature needed for
|
||||
this to work is the ability to merge the visibility streams for the tiles,
|
||||
which was added on a7xx by a new bitmask in ``CP_SET_BIN_DATA5`` and variants.
|
||||
Only bins within the same visibility stream/VSC pipe can be merged.
|
||||
|
||||
Hardware scaling registers and LRZ
|
||||
----------------------------------
|
||||
|
||||
One disadvantage of FDM on a6xx is that low-resolution tiles cannot use
|
||||
LRZ, because the LRZ hardware is not aware of the transform between framebuffer
|
||||
space and rendering space and applies the framebuffer-space LRZ values to the
|
||||
rendering-space fragments. In order to fix this, a740 adds new offset and scale
|
||||
registers. The offset :math:`o'` is applied to fragment coordinates during
|
||||
rasterization *after* LRZ, so that viewport, scissor, and LRZ are in a
|
||||
new "LRZ space" while the other operations (resolves and unresolves, and
|
||||
attachment writes) still happen in the rendering space which is now offset.
|
||||
:math:`o'` is specified for each layer. The scale :math:`s` is the same as
|
||||
before, and it is used to multiply the fragment area covered by each LRZ pixel.
|
||||
|
||||
Without ``VK_EXT_fragment_density_map_offset``, we can simply make LRZ space
|
||||
equal to framebuffer space scaled down by :math:`s`. That is, we can set
|
||||
:math:`o'` to what :math:`o` was before and then set :math:`o` to 0, only
|
||||
scaling down the viewport but not shifting it and letting the hardware handle
|
||||
the shift. Then LRZ pixels will be scaled up appropriately and everything will
|
||||
work. However, this doesn't work if there is a bin offset :math:`b_o`. In order
|
||||
to make binning work, we shift the viewport and scissor by :math:`b_o` when
|
||||
binning. Unfortunately the offset registers do not have any effect when
|
||||
binning, so rendering space and LRZ space have to be the same when binning, and
|
||||
the visibility stream is generated from rendering space. This means that LRZ
|
||||
space also has to be shifted over compared to framebuffer space, and the LRZ
|
||||
buffer must be overallocated when FDM offset might be used with it (which is
|
||||
signalled by ``VK_IMAGE_CREATE_FRAGMENT_DENSITY_MAP_OFFSET_BIT_EXT``) because
|
||||
the LRZ image will be shifted by :math:`b_o`.
|
||||
|
||||
In order for LRZ to work, LRZ space when rendering must be equal to LRZ space
|
||||
when binning scaled down by :math:`s`. The origin of LRZ space when binning is
|
||||
:math:`-b_o`, and this must be mapped to 0. The transform from
|
||||
framebuffer space to LRZ space is :math:`x' = x * s + o`, and the transform
|
||||
from framebuffer space to rendering space is :math:`x'' = x * s + o + o'`.
|
||||
We get that :math:`o + o' = b_{cs} - b_s * s`, similar to before, and
|
||||
:math:`0 = -b_o * s + o` so that :math:`o = b_o * s` and finally
|
||||
:math:`o' = b_{cs} - b_s * s - b_o * s`, or after rearranging
|
||||
:math:`o' = b_{cs} - (b_s + b_o) * s`. For all tiles except those in the first
|
||||
row or column, this simplifies to :math:`o' = b_{cs} - b_{cs} * s` because
|
||||
:math:`b_{cs} = b_s + b_o`. For tiles in the first row or column, :math:`b_s`
|
||||
and :math:`b_{cs}` are both 0 in one of the coordinates, so it becomes
|
||||
:math:`o' = -b_o * s` in that coordinate. This isn't representable in hardware,
|
||||
both because it is negative (which can be worked around by artifically
|
||||
shifting :math:`b_{cs}`) but more importantly because it may not meet the
|
||||
alignment requirements for the hardware register (which is currently 8 pixels).
|
||||
We have to just disable LRZ in this case.
|
||||
@@ -2583,13 +2583,13 @@ struct apply_viewport_state {
|
||||
};
|
||||
|
||||
/* It's a hardware restriction that the window offset (i.e. common_bin_offset)
|
||||
* must be the same for all views. This means that GMEM coordinates cannot be
|
||||
* a simple scaling of framebuffer coordinates, because this would require us
|
||||
* to scale the window offset and the scale may be different per view. Instead
|
||||
* we have to apply a per-bin offset to the GMEM coordinate transform to make
|
||||
* sure that the window offset maps to the per-view bin coordinate, which will
|
||||
* be the same if there is no offset. Specifically we need an offset o to the
|
||||
* transform:
|
||||
* must be the same for all views. This means that rendering coordinates
|
||||
* cannot be a simple scaling of framebuffer coordinates, because this would
|
||||
* require us to scale the window offset and the scale may be different per
|
||||
* view. Instead we have to apply a per-bin offset to the rendering coordinate
|
||||
* transform to make sure that the window offset maps to the per-view bin
|
||||
* coordinate, which will be the same if there is no offset. Specifically we
|
||||
* need an offset o to the transform:
|
||||
*
|
||||
* x' = s * x + o
|
||||
*
|
||||
|
||||
Reference in New Issue
Block a user