b2972cf410a9ce23a73aa6620b3eecc971a01367
The current stack size is a significant limiter for occupancy, so we need smaller stacks in LDS. Rhys earlier had a patch that put just the N entries closest to the root in LDS and the rest in scratch. However, this is not ideal for performance, as most of the activity happens away from the root, near the leaves. Of course we can't simply switch it around, as the leaf activity likely isn't happening all the way at the end of the stack.

So instead we make the LDS stack a kind of ring buffer by always accessing it with the stack index modulo the buffer size (always a power of two, so the modulo is a cheap mask). If a push finds no free space in this buffer, we evict the entries closest to the root to scratch, and if a pop hits the "bottom" of the LDS space, we load entries back from scratch.

Some rough perf numbers for indication with Q2RTX:

| evicting | LDS entries | perf |
|----------|-------------|------|
| no       | 76          | 55%  |
| no       | 32          | 100% |
| no       | 24          | 105% |
| yes      | 32          | 95%  |
| yes      | 16          | 100% |
| yes      | 8           | 90%  |
| yes      | 4           | 75%  |

(For the case with 4 entries we need to do some extra accounting, as a full batch may not be available to evict.)

So an obvious choice is a stack of 16 entries.

One might wonder whether Q2RTX perf is good mainly because its BVHs contain very little geometry and hence have low depth, so I also did some profiling with Control. This was done with RGP instruction timing, so the numbers count instructions executed, not weighted by enabled masks, i.e. divergence effects are included.

| game    | LDS entries | scratch action | fraction of iterations |
|---------|-------------|----------------|------------------------|
| Control | 8           | store          | 10.3%                  |
| Control | 8           | load           | 34.8%                  |
| Control | 16          | store          | 0.58%                  |
| Control | 16          | load           | 2.62%                  |
| Q2RTX   | 16          | store          | 1.00%                  |
| Q2RTX   | 16          | load           | 3.07%                  |

So Q2RTX doesn't seem to be an unreasonably good case for this algorithm.
On the implementation side, we can always place the scratch stack at address 0 by reserving that scratch space up front and, in the case of a fixed call-stack size, moving the call stack up past it. In the dynamic case, the dynamic stack base already takes any reserved scratch space into account.

Reviewed-by: Konstantin Seurer <konstantin.seurer@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18541>
`Mesa <https://mesa3d.org>`_ - The 3D Graphics Library
======================================================

Source
------

This repository lives at https://gitlab.freedesktop.org/mesa/mesa.
Other repositories are likely forks, and code found there is not supported.


Build & install
---------------

You can find more information in our documentation (`docs/install.rst <https://mesa3d.org/install.html>`_), but the recommended way is to use Meson (`docs/meson.rst <https://mesa3d.org/meson.html>`_):

.. code-block:: sh

  $ mkdir build
  $ cd build
  $ meson ..
  $ sudo ninja install


Support
-------

Many Mesa devs hang out on IRC; if you're not sure which channel is appropriate, you should ask your question on `OFTC's #dri-devel <irc://irc.oftc.net/dri-devel>`_; someone will redirect you if necessary.
Remember that not everyone is in the same timezone as you, so it might take a while before someone qualified sees your question.
To figure out who you're talking to, or which nick to ping for your question, check out `Who's Who on IRC <https://dri.freedesktop.org/wiki/WhosWho/>`_.

The next best option is to ask your question in an email to the mailing lists: `mesa-dev\@lists.freedesktop.org <https://lists.freedesktop.org/mailman/listinfo/mesa-dev>`_


Bug reports
-----------

If you think something isn't working properly, please file a bug report (`docs/bugs.rst <https://mesa3d.org/bugs.html>`_).


Contributing
------------

Contributions are welcome, and step-by-step instructions can be found in our documentation (`docs/submittingpatches.rst <https://mesa3d.org/submittingpatches.html>`_).

Note that Mesa uses GitLab for patch submission, review, and discussion.