2e0f6e5705
Each supergroup executes a number of batches. Each batch has 16 elements (one per QPU lane), except possibly the last batch, which may be incomplete. Until now, we packed a single workgroup into each supergroup, which can lead to more incomplete batches and less efficient use of the QPUs, depending on the configuration of the workgroups being dispatched.

This patch computes a number of workgroups per supergroup so that we reduce or, if possible, completely eliminate incomplete batches.

Note, however, that TSY barriers act on supergroups, so larger supergroups also lead to larger syncpoints on barriers. A follow-up patch will try to find a good balance for compute shaders that use such barriers.

This improves performance of Sascha Willems' computecloth demo by ~13%.

Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10541>
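
Illustrative note: the packing idea described above can be sketched roughly as follows. This is a minimal sketch and not the code in this patch. The function name `choose_wgs_per_supergroup`, the `MAX_WGS_PER_SG` cap, and the "fewest idle lanes per workgroup" heuristic are all assumptions made for illustration. Only the 16-lane batch size comes from the description above.

```c
#include <assert.h>
#include <stdint.h>

#define QPU_LANES 16u       /* elements per batch (from the commit text) */
#define MAX_WGS_PER_SG 16u  /* hypothetical cap on workgroups per supergroup */

/* Idle lanes in the last batch of a supergroup holding n workgroups. */
static uint32_t
idle_lanes(uint32_t wg_size, uint32_t n)
{
   return (QPU_LANES - (n * wg_size) % QPU_LANES) % QPU_LANES;
}

/* Hypothetical helper: pick how many workgroups to pack per supergroup
 * so that the idle lanes per workgroup are minimized. Stops early when
 * a count is found that leaves no incomplete batch at all.
 */
static uint32_t
choose_wgs_per_supergroup(uint32_t wg_size, uint32_t num_wgs)
{
   assert(wg_size > 0 && num_wgs > 0);

   uint32_t limit = num_wgs < MAX_WGS_PER_SG ? num_wgs : MAX_WGS_PER_SG;
   uint32_t best = 1;
   uint32_t best_idle = idle_lanes(wg_size, 1);

   for (uint32_t n = 2; n <= limit && best_idle != 0; n++) {
      uint32_t idle = idle_lanes(wg_size, n);
      /* Compare idle/n against best_idle/best without using floats. */
      if (idle * best < best_idle * n) {
         best = n;
         best_idle = idle;
      }
   }

   return best;
}
```

Under these assumptions, a dispatch of 100 workgroups with 8 invocations each would pack 2 workgroups per supergroup (one full 16-lane batch), while a workgroup size of 16 would keep 1 workgroup per supergroup. The sketch ignores the barrier cost mentioned above, which grows with the supergroup size.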