tu: Add documentation for VK_EXT_fragment_density_map

This has gotten complicated enough that we need somewhere outside of the driver itself to give an overall flow of how the feature is implemented. This includes a few things that are enabled in the subsequent commits, specifically the LRZ parts.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36475>
2025-07-28 14:23:29 -04:00
parent cf7a52d2a6
commit 10e7f63734
2 changed files with 194 additions and 7 deletions
--- a/docs/drivers/freedreno/fdm.rst
+++ b/docs/drivers/freedreno/fdm.rst
@@ -0,0 +1,187 @@
+Fragment Density Map
+====================
+
+``VK_EXT_fragment_density_map`` is an extension which is intended to allow
+users to render parts of the screen at a lower resolution. It is designed to be
+implemented on tiled rendering GPU architectures such as Adreno, and the
+intention is that it is implemented by rendering some of the tiles at a lower
+resolution and scaling them up when resolving to system memory or when sampling
+the resulting image. This inherently means that it is "all or nothing," that
+is, it must be enabled or disabled for the entire render pass. While the idea is
+simple, the implementation in turnip is subtle and interacts with many other
+features. This page attempts to document the main principles behind the
+implementation.
+
+Coordinate Space Soup
+---------------------
+
+In order to render a tile at lower resolution, we have to override the user's
+viewport and scissor for each tile depending on the scaling factor provided by
+the user. This becomes complicated fast, so let's start by defining a few
+coordinate spaces that we'll have to work with.
+
+Framebuffer space
+^^^^^^^^^^^^^^^^^
+
+This is the space of the final rendered image. From the user's perspective
+everything is specified in this space, and fragments created by the rasterizer
+appear to be larger than 1 pixel. However, this is not what actually happens in
+the hardware; it is a fiction created by the driver. The other spaces below are
+what the hardware actually "sees".
+
+GMEM space
+^^^^^^^^^^
+
+This space exists whenever tiled rendering/GMEM is used, even without FDM. It
+is the space used to access GMEM, with the origin at the upper left of the
+tile. Whenever GMEM is accessed, the hardware automatically transforms
+rendering space into GMEM space using the various ``*_WINDOW_OFFSET``
+registers. The origin of this space will be called :math:`b_{cs}`, the common
+bin start, for reasons that are explained below. When using FDM, coordinates in
+this space
+must be multiplied by the scaling factor :math:`s` derived from the fragment
+density map, or equivalently divided by the fragment area (as defined by the
+Vulkan specification), with the origin still at the upper left of the tile. For
+example, if :math:`s_x = 1/2`, then the bin is half as wide as it would've been
+without FDM and all coordinates in this space must be divided by 2.
+
+Rendering space
+^^^^^^^^^^^^^^^
+
+This is the space in which the hardware rasterizer operates and produces
+fragments. Normally this is the same as framebuffer space, but with FDM it is
+not. We transform the viewport and scissor from framebuffer space to
+rendering space by patching them per-tile in the driver. Then, when we resolve
+the tile, we scale it back to the correct resolution by blitting from the
+rendering space source to the framebuffer space destination.
+
+The correct transform from framebuffer space to rendering space has to shrink
+the coordinates by :math:`s` while mapping the original bin start in
+framebuffer space :math:`b_s` to
+:math:`b_{cs}`. Since :math:`b_{cs}` is entirely defined by the driver when
+programming ``*_WINDOW_OFFSET``, one tempting way to do this is to just
+multiply by :math:`s` and define :math:`b_{cs} = b_s * s`. It turns out,
+however, that this doesn't work. A key requirement is to handle cases where the
+same scene is rendered in multiple different views at the same time using
+``VK_KHR_multiview``, as in VR use-cases, and in this case we want :math:`s` to
+vary per view, but :math:`b_{cs}` is always the same for every view because
+there is only one ``*_WINDOW_OFFSET`` register for all layers (hence the name).
+
+We follow the blob by leaving :math:`b_{cs}` the same regardless of whether FDM
+is enabled or not. This means that normally :math:`b_s = b_{cs}`, although this
+is not the case if ``VK_EXT_fragment_density_map_offset`` is in use and the
+bins are shifted per-view. Since the coordinates need to be scaled by :math:`s`,
+we know that the transform needs to look like :math:`x' = s * x + o`, where
+only the offset :math:`o` is free. Plugging in the constraint that :math:`b_s`
+maps to :math:`b_{cs}`, we get that :math:`b_{cs} = s * b_s + o` or
+:math:`o = b_{cs} - s * b_s`. This is the function computed by
+``tu_fdm_per_bin_offset`` and used to calculate the transform for the viewport,
+scissor, and ``gl_FragCoord``. One critical thing is that the offset must be an
+integer, or in other words the framebuffer space bin start :math:`b_s` must be
+a multiple of :math:`1 / s`. This is a natural constraint anyway: if it were
+not the case, the bin would start in the middle of a fragment, which cannot be
+handled correctly.
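+
+As a worked example with assumed numbers (not taken from the driver), consider
+one axis with :math:`b_{cs} = b_s = 96`, an unscaled bin width of 96, and
+:math:`s = 1/2`:
+
+.. math::
+
+   o = b_{cs} - s * b_s = 96 - \tfrac{1}{2} * 96 = 48
+
+   x = 96 \mapsto \tfrac{1}{2} * 96 + 48 = 96 = b_{cs}
+
+   x = 192 \mapsto \tfrac{1}{2} * 192 + 48 = 144
+
+The bin start lands on :math:`b_{cs}` as required, and the bin covers
+:math:`[96, 144)` in rendering space, half as wide as in framebuffer space.
+Here :math:`b_s = 96` is a multiple of :math:`1 / s = 2`, so :math:`o` is an
+integer.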
+
+Viewport and Scissor Patching
+-----------------------------
+
+In order to have :math:`s` differ per view, we have to be able to override the
+viewport per view. That is, we need to transform the viewport for each view
+differently. If there is only one viewport, then we duplicate the user's
+viewport for each view and transform it using the :math:`b_s` and :math:`s` for
+that view, and we set a "per-view viewport" bit to select the viewport per view
+instead of using the default viewport 0. When
+``VK_VALVE_fragment_density_map_layered`` is in use, we instead have to insert
+shader code to achieve the same thing.
+
+If the user specifies multiple viewports but they are per-view because
+``VK_QCOM_multiview_per_view_viewport`` is enabled, then we can just set the
+per-view viewport bit and transform each user viewport individually by the
+corresponding scale. But if the user explicitly writes ``gl_ViewportIndex``,
+then there is nothing we can do and we have to make :math:`s` the same for all
+views by conservatively taking the minimum. Then we apply :math:`s` to all of
+the user-specified viewports.
+
+Because the bin size is now per-view, the usual mechanism of
+``*_WINDOW_SCISSOR`` for clipping fragments outside the bin doesn't work.
+Instead the driver needs to intersect the transformed user-specified scissor
+with the transformed rendering-space bin coordinates, effectively replacing
+``*_WINDOW_SCISSOR``.
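+
+Along one axis, and ignoring rounding, this intersection can be sketched as
+follows. The names :math:`x_{min}` and :math:`x_{max}` (the user's scissor in
+framebuffer space) and :math:`b_e` (the end of the bin in framebuffer space)
+are introduced here only for illustration, and :math:`o` is the per-bin offset
+of the framebuffer-to-rendering transform :math:`x' = s * x + o`:
+
+.. math::
+
+   x'_{min} = \max(s * x_{min} + o,\; b_{cs})
+
+   x'_{max} = \min(s * x_{max} + o,\; s * b_e + o)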
+
+Fragment density map offset
+---------------------------
+
+In order to "properly" implement ``VK_EXT_fragment_density_map_offset``, we
+need to add an extra row/column of bins at the end and then shift the binning
+grid up and to the left by an offset :math:`b_o`. This offset is based on the
+user's offset but has the opposite sign, i.e. when shifting the FDM to the left
+we have to shift the binning grid to the right, and once the user's offset
+becomes large enough then we "wrap around" and shift over the scaling factor
+:math:`s` to the next bin.  This has to happen per-view. In turnip the function
+that computes :math:`b_o` is called ``tu_bin_offset``. Each tile then gets an
+offset start :math:`b_s = b_{cs} - b_o`, except for the first row/column, which
+only shrinks in height/width respectively.
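+
+As a worked example with assumed numbers (not taken from the driver), take one
+axis with :math:`b_{cs} = 96`, :math:`b_o = 32`, and :math:`s = 1/2`. With the
+framebuffer-to-rendering transform :math:`x' = s * x + o`, this gives:
+
+.. math::
+
+   b_s = b_{cs} - b_o = 96 - 32 = 64
+
+   o = b_{cs} - s * b_s = 96 - \tfrac{1}{2} * 64 = 64
+
+   x = 64 \mapsto \tfrac{1}{2} * 64 + 64 = 96 = b_{cs}
+
+The shifted bin start still maps to the common bin start, which is unchanged.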
+
+If we cannot make :math:`s` per-view, then we also cannot make :math:`b_s`
+per-view and so we cannot shift the bins over. Therefore we fall back to only
+shifting where :math:`s` is sampled from, which produces jittery and jarring
+transitions when a bin suddenly changes resolution.
+
+Bin merging
+-----------
+
+FDM shrinks the size of the bin in GMEM, which results in a lot of wasteful
+unused extra space in GMEM. a7xx mitigates this by introducing "bin merging".
+If two tiles next to each other have the same scaling for each view, then we
+combine them into one tile, as long as the combined size in rendering space
+isn't larger than the original size of an unscaled bin in framebuffer space. We
+can even merge larger groups of tiles. The only hardware feature needed for
+this to work is the ability to merge the visibility streams for the tiles,
+which was added on a7xx by a new bitmask in ``CP_SET_BIN_DATA5`` and variants.
+Only bins within the same visibility stream/VSC pipe can be merged.
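+
+As a rough illustration along one axis, assuming the size limit is applied per
+axis (the symbols :math:`n` and :math:`w` are introduced here and are not
+driver names), merging :math:`n` adjacent tiles of unscaled width :math:`w`
+that share a scale :math:`s` requires:
+
+.. math::
+
+   n * s * w \le w \quad\Longleftrightarrow\quad n \le 1 / s
+
+So with :math:`s = 1/2`, up to two neighboring tiles could share one bin along
+that axis, subject to the visibility stream restriction above.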
+
+Hardware scaling registers and LRZ
+----------------------------------
+
+One disadvantage of FDM on a6xx is that low-resolution tiles cannot use
+LRZ, because the LRZ hardware is not aware of the transform between framebuffer
+space and rendering space and applies the framebuffer-space LRZ values to the
+rendering-space fragments. In order to fix this, a740 adds new offset and scale
+registers. The offset :math:`o'` is applied to fragment coordinates during
+rasterization *after* LRZ, so that viewport, scissor, and LRZ are in a
+new "LRZ space" while the other operations (resolves and unresolves, and
+attachment writes) still happen in the rendering space which is now offset.
+:math:`o'` is specified for each layer. The scale :math:`s` is the same as
+before, and it is used to multiply the fragment area covered by each LRZ pixel.
+
+Without ``VK_EXT_fragment_density_map_offset``, we can simply make LRZ space
+equal to framebuffer space scaled down by :math:`s`. That is, we can set
+:math:`o'` to what :math:`o` was before and then set :math:`o` to 0, only
+scaling down the viewport but not shifting it and letting the hardware handle
+the shift. Then LRZ pixels will be scaled up appropriately and everything will
+work. However, this doesn't work if there is a bin offset :math:`b_o`. In order
+to make binning work, we shift the viewport and scissor by :math:`b_o` when
+binning. Unfortunately the offset registers do not have any effect when
+binning, so rendering space and LRZ space have to be the same when binning, and
+the visibility stream is generated from rendering space. This means that LRZ
+space also has to be shifted over compared to framebuffer space, and the LRZ
+buffer must be overallocated when FDM offset might be used with it (which is
+signalled by ``VK_IMAGE_CREATE_FRAGMENT_DENSITY_MAP_OFFSET_BIT_EXT``) because
+the LRZ image will be shifted by :math:`b_o`.
+
+In order for LRZ to work, LRZ space when rendering must be equal to LRZ space
+when binning scaled down by :math:`s`. The origin of LRZ space when binning is
+:math:`-b_o`, and this must be mapped to 0. The transform from
+framebuffer space to LRZ space is :math:`x' = x * s + o`, and the transform
+from framebuffer space to rendering space is :math:`x'' = x * s + o + o'`.
+We get that :math:`o + o' = b_{cs} - b_s * s`, similar to before, and
+:math:`0 = -b_o * s + o` so that :math:`o = b_o * s` and finally
+:math:`o' = b_{cs} - b_s * s - b_o * s`, or after rearranging
+:math:`o' = b_{cs} - (b_s + b_o) * s`. For all tiles except those in the first
+row or column, this simplifies to :math:`o' = b_{cs} - b_{cs} * s` because
+:math:`b_{cs} = b_s + b_o`. For tiles in the first row or column, :math:`b_s`
+and :math:`b_{cs}` are both 0 in one of the coordinates, so it becomes
+:math:`o' = -b_o * s` in that coordinate. This isn't representable in hardware,
+both because it is negative (which can be worked around by artificially
+shifting :math:`b_{cs}`) and, more importantly, because it may not meet the
+alignment requirements for the hardware register (which is currently 8 pixels).
+We have to just disable LRZ in this case.
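+
+As a worked example with assumed numbers (not taken from the driver), take one
+axis with :math:`s = 1/2`, :math:`b_o = 32`, and a tile outside the first
+column with :math:`b_{cs} = 96`, so that :math:`b_s = b_{cs} - b_o = 64`:
+
+.. math::
+
+   o = b_o * s = 16
+
+   o' = b_{cs} - (b_s + b_o) * s = 96 - 96 * \tfrac{1}{2} = 48
+
+   o + o' = 64 = b_{cs} - b_s * s
+
+Here :math:`o'` is non-negative and a multiple of 8. For a tile in the first
+column, :math:`b_s = b_{cs} = 0` gives :math:`o' = -b_o * s = -16`, which is
+the negative case described above.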
--- a/src/freedreno/vulkan/tu_pipeline.cc
+++ b/src/freedreno/vulkan/tu_pipeline.cc
@@ -2583,13 +2583,13 @@ struct apply_viewport_state {
 };

 /* It's a hardware restriction that the window offset (i.e. common_bin_offset)
- * must be the same for all views. This means that GMEM coordinates cannot be
- * a simple scaling of framebuffer coordinates, because this would require us
- * to scale the window offset and the scale may be different per view. Instead
- * we have to apply a per-bin offset to the GMEM coordinate transform to make
- * sure that the window offset maps to the per-view bin coordinate, which will
- * be the same if there is no offset. Specifically we need an offset o to the
- * transform:
+ * must be the same for all views. This means that rendering coordinates
+ * cannot be a simple scaling of framebuffer coordinates, because this would
+ * require us to scale the window offset and the scale may be different per
+ * view. Instead we have to apply a per-bin offset to the rendering coordinate
+ * transform to make sure that the window offset maps to the per-view bin
+ * coordinate, which will be the same if there is no offset. Specifically we
+ * need an offset o to the transform:
 *
 * x' = s * x + o
 *