diff --git a/docs/drivers/freedreno/fdm.rst b/docs/drivers/freedreno/fdm.rst new file mode 100644 index 00000000000..a3b8d879e94 --- /dev/null +++ b/docs/drivers/freedreno/fdm.rst @@ -0,0 +1,187 @@ +Fragment Density Map +==================== + +``VK_EXT_fragment_density_map`` is an extension which is intended to allow +users to render parts of the screen at a lower resolution. It is designed to be +implemented on tiled rendering GPU architectures such as Adreno, and the +intention is that it is implemented by rendering some of the tiles at a lower +resolution and scaling them up when resolving to system memory or when sampling +the resulting image. This inherently means that it is "all or nothing," that +is, it must be enabled or disabled for the entire render pass. While the idea is +simple, the implementation in turnip is very subtle with lots of +interactions with various different features. This page attempts to document +the main principles behind the implementation. + +Coordinate Space Soup +--------------------- + +In order to render a tile at lower resolution, we have to override the user's +viewport and scissor for each tile depending on the scaling factor provided by +the user. This becomes complicated fast, so let's start by defining a few +coordinate spaces that we'll have to work with. + +Framebuffer space +^^^^^^^^^^^^^^^^^ + +This is the space of the final rendered image. From the user's perspective +everything is specified in this space, and fragments created by the rasterizer +appear to be larger than 1 pixel. But this is not what actually happens in the +hardware, it is a fiction created by the driver. The other spaces below are +what the hardware actually "sees". + +GMEM Space +^^^^^^^^^^ + +This space exists whenever tiled rendering/GMEM is used, even without FDM. It +is the space used to access GMEM, with the origin at the upper left of the +tile. The hardware automatically transforms rendering space into GMEM space +whenever GMEM is accessed using the various ``*_WINDOW_OFFSET`` registers. The +origin of this space will be called :math:`b_{cs}`, the common bin start, for +reasons that are explained below. When using FDM, coordinates in this space +must be multiplied by the scaling factor :math:`s` derived from the fragment +density map, or equivalently divided by the fragment area (as defined by the +Vulkan specification), with the origin still at the upper left of the tile. For +example, if :math:`s_x = 1/2`, then the bin is half as wide as it would've been +without FDM and all coordinates in this space must be divided by 2. + +Rendering space +^^^^^^^^^^^^^^^ + +This is the space in which the hardware rasterizer operates and produces +fragments. Normally this is the same as framebuffer space, but with FDM it is +not. We transform the viewport and scissor from framebuffer space to +rendering space by patching them per-tile in the driver and then when we +resolve the tile we scale the resulting tile back to the correct resolution by +blitting from the rendering space source to the framebuffer space destination. + +In order to come up with the correct transform from framebuffer space to +rendering space, it has to shrink the coordinates by :math:`s` while +mapping the original bin start in framebuffer space :math:`b_s` to +:math:`b_{cs}`. Since :math:`b_{cs}` is entirely defined by the driver when +programming ``*_WINDOW_OFFSET``, one tempting way to do this is to just +multiply by :math:`s` and define :math:`b_{cs} = b_s * s`. It turns out, +however, that this doesn't work. A key requirement is to handle cases where the +same scene is rendered in multiple different views at the same time using +``VK_KHR_multiview``, as in VR use-cases, and in this case we want :math:`s` to +vary per view, but :math:`b_{cs}` is always the same for every view because +there is only one ``*_WINDOW_OFFSET`` register for all layers (hence the name). + +We follow the blob by leaving :math:`b_{cs}` the same regardless of whether FDM +is enabled or not. This means that normally :math:`b_s = b_{cs}`, although this +is not the case if ``VK_EXT_fragment_density_map_offset`` is in use and the +bins are shifted per-view. Since the coordinates need to be scaled by :math:`s`, +we know that the transform needs to look like :math:`x' = s * x + o`, where +only the offset :math:`o` is free. Plugging in the constraint that :math:`b_s` +maps to :math:`b_{cs}`, we get that :math:`b_{cs} = s * b_s + o` or +:math:`o = b_{cs} - s * b_s`. This is the function computed by +``tu_fdm_per_bin_offset`` and used to calculate the transform for the viewport, +scissor, and ``gl_FragCoord``. One critical thing is that the offset must be an +integer, or in other words the framebuffer space bin start :math:`b_s` must be +a multiple of :math:`1 / s`. This is a natural constraint anyway, because if +it wasn't the case then the bin would start in the middle of a fragment which +isn't possible to handle correctly. + +Viewport and Scissor Patching +----------------------------- + +In order to have :math:`s` differ per view, we have to be able to override the +viewport per view. That is, we need to transform the viewport for each view +differently. If there is only one viewport, then we duplicate the user's +viewport for each view and transform it using the :math:`b_s` and :math:`s` for +that view, and we set a "per-view viewport" bit to select the viewport per view +instead of using the default viewport 0. When +``VK_VALVE_fragment_density_map_layered`` is in use, we instead have to insert +shader code to achieve the same thing. + +If the user specifies multiple viewports but they are per-view because +``VK_QCOM_multiview_per_view_viewport`` is enabled, then we can just set the +per-view viewport bit and transform each user viewport individually by the +corresponding scale. But if the user explicitly writes ``gl_ViewportIndex``, +then there is nothing we can do and we have to make :math:`s` the same for all +views by conservatively taking the minimum. Then we apply :math:`s` to all of +the user-specified viewports. + +Because the bin size is now per-view, the usual mechanism of +``*_WINDOW_SCISSOR`` for clipping fragments outside the bin doesn't work. +Instead the driver needs to intersect the transformed user-specified scissor +with the transformed rendering-space bin coordinates, effectively replacing +``*_WINDOW_SCISSOR``. + +Fragment density map offset +--------------------------- + +In order to "properly" implement ``VK_EXT_fragment_density_map_offset``, we +need to add an extra row/column of bins at the end and then shift the binning +grid up and to the left by an offset :math:`b_o`. This offset is based on the +user's offset but has the opposite sign, i.e. when shifting the FDM to the left +we have to shift the binning grid to the right, and once the user's offset +becomes large enough then we "wrap around" and shift over the scaling factor +:math:`s` to the next bin. This has to happen per-view. In turnip the function +that computes :math:`b_o` is called ``tu_bin_offset``. Each tile then gets an +offseted start :math:`b_s = b_{cs} - b_o` except for the first row/column which +only shrink in height/width respectively. + +If we cannot make :math:`s` per-view, then we also cannot make :math:`b_s` +per-view and so we cannot shift the bins over. Therefore we fall back to only +shifting where :math:`s` is sampled from, which produces jittery and jarring +transitions when a bin suddenly changes resolution. + +Bin merging +----------- + +FDM shrinks the size of the bin in GMEM, which results in a lot of wasteful +unused extra space in GMEM. a7xx mitigates this by introducing "bin merging". +If two tiles next to each other have the same scaling for each view, then we +combine them into one tile, as long as the combined size in rendering space +isn't larger than the original size of an unscaled bin in framebuffer space. We +can even merge larger groups of tiles. The only hardware feature needed for +this to work is the ability to merge the visibility streams for the tiles, +which was added on a7xx by a new bitmask in ``CP_SET_BIN_DATA5`` and variants. +Only bins within the same visibility stream/VSC pipe can be merged. + +Hardware scaling registers and LRZ +---------------------------------- + +One disadvantage of FDM on a6xx is that low-resolution tiles cannot use +LRZ, because the LRZ hardware is not aware of the transform between framebuffer +space and rendering space and applies the framebuffer-space LRZ values to the +rendering-space fragments. In order to fix this, a740 adds new offset and scale +registers. The offset :math:`o'` is applied to fragment coordinates during +rasterization *after* LRZ, so that viewport, scissor, and LRZ are in a +new "LRZ space" while the other operations (resolves and unresolves, and +attachment writes) still happen in the rendering space which is now offset. +:math:`o'` is specified for each layer. The scale :math:`s` is the same as +before, and it is used to multiply the fragment area covered by each LRZ pixel. + +Without ``VK_EXT_fragment_density_map_offset``, we can simply make LRZ space +equal to framebuffer space scaled down by :math:`s`. That is, we can set +:math:`o'` to what :math:`o` was before and then set :math:`o` to 0, only +scaling down the viewport but not shifting it and letting the hardware handle +the shift. Then LRZ pixels will be scaled up appropriately and everything will +work. However, this doesn't work if there is a bin offset :math:`b_o`. In order +to make binning work, we shift the viewport and scissor by :math:`b_o` when +binning. Unfortunately the offset registers do not have any effect when +binning, so rendering space and LRZ space have to be the same when binning, and +the visibility stream is generated from rendering space. This means that LRZ +space also has to be shifted over compared to framebuffer space, and the LRZ +buffer must be overallocated when FDM offset might be used with it (which is +signalled by ``VK_IMAGE_CREATE_FRAGMENT_DENSITY_MAP_OFFSET_BIT_EXT``) because +the LRZ image will be shifted by :math:`b_o`. + +In order for LRZ to work, LRZ space when rendering must be equal to LRZ space +when binning scaled down by :math:`s`. The origin of LRZ space when binning is +:math:`-b_o`, and this must be mapped to 0. The transform from +framebuffer space to LRZ space is :math:`x' = x * s + o`, and the transform +from framebuffer space to rendering space is :math:`x'' = x * s + o + o'`. +We get that :math:`o + o' = b_{cs} - b_s * s`, similar to before, and +:math:`0 = -b_o * s + o` so that :math:`o = b_o * s` and finally +:math:`o' = b_{cs} - b_s * s - b_o * s`, or after rearranging +:math:`o' = b_{cs} - (b_s + b_o) * s`. For all tiles except those in the first +row or column, this simplifies to :math:`o' = b_{cs} - b_{cs} * s` because +:math:`b_{cs} = b_s + b_o`. For tiles in the first row or column, :math:`b_s` +and :math:`b_{cs}` are both 0 in one of the coordinates, so it becomes +:math:`o' = -b_o * s` in that coordinate. This isn't representable in hardware, +both because it is negative (which can be worked around by artifically +shifting :math:`b_{cs}`) but more importantly because it may not meet the +alignment requirements for the hardware register (which is currently 8 pixels). +We have to just disable LRZ in this case. diff --git a/src/freedreno/vulkan/tu_pipeline.cc b/src/freedreno/vulkan/tu_pipeline.cc index 5985cedd546..9dab846838a 100644 --- a/src/freedreno/vulkan/tu_pipeline.cc +++ b/src/freedreno/vulkan/tu_pipeline.cc @@ -2583,13 +2583,13 @@ struct apply_viewport_state { }; /* It's a hardware restriction that the window offset (i.e. common_bin_offset) - * must be the same for all views. This means that GMEM coordinates cannot be - * a simple scaling of framebuffer coordinates, because this would require us - * to scale the window offset and the scale may be different per view. Instead - * we have to apply a per-bin offset to the GMEM coordinate transform to make - * sure that the window offset maps to the per-view bin coordinate, which will - * be the same if there is no offset. Specifically we need an offset o to the - * transform: + * must be the same for all views. This means that rendering coordinates + * cannot be a simple scaling of framebuffer coordinates, because this would + * require us to scale the window offset and the scale may be different per + * view. Instead we have to apply a per-bin offset to the rendering coordinate + * transform to make sure that the window offset maps to the per-view bin + * coordinate, which will be the same if there is no offset. Specifically we + * need an offset o to the transform: * * x' = s * x + o *