anv,iris: Advertise a max 3D workgroup size of 1024^3

On GFX version 12.5+ with COMPUTE_WALKER, this is the limit based on the
size of the HW packet.  On older HW, we can technically go a bit bigger
but there's not much point.  Technically, some hardware can support a
scalar workgroup size up to 2048 but most apps don't go any bigger than
1024.

As discussed on the merge request page, the current limit assumes
SIMD32, but it is unclear if we want to encourage applications to use
SIMD32 if it may lead to additional register spilling in shader
programs. Many applications have likely tuned for a limit of 1024
based on the OpenGL minimum limit, so it might not gain much by
advertising more than 1024.

Reworks:
 * Jordan: Use MIN2 and limit total invocations as well.
 * Jordan: Add second paragraph to commit message based on merge
   request discussion.

Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13538>
This commit is contained in:
Jason Ekstrand
2021-10-27 01:40:36 -05:00
committed by Marge Bot
parent 8626949f07
commit 419b02c90c
2 changed files with 4 additions and 2 deletions
+2 -1
View File
@@ -527,7 +527,8 @@ iris_get_compute_param(struct pipe_screen *pscreen,
struct iris_screen *screen = (struct iris_screen *)pscreen;
const struct intel_device_info *devinfo = &screen->devinfo;
const uint32_t max_invocations = 32 * devinfo->max_cs_workgroup_threads;
const uint32_t max_invocations =
MIN2(1024, 32 * devinfo->max_cs_workgroup_threads);
#define RET(x) do { \
if (ret) \
+2 -1
View File
@@ -1817,7 +1817,8 @@ void anv_GetPhysicalDeviceProperties(
pdevice->has_bindless_images && pdevice->has_a64_buffer_access
? UINT32_MAX : MAX_BINDING_TABLE_SIZE - MAX_RTS - 1;
const uint32_t max_workgroup_size = 32 * devinfo->max_cs_workgroup_threads;
const uint32_t max_workgroup_size =
MIN2(1024, 32 * devinfo->max_cs_workgroup_threads);
VkSampleCountFlags sample_counts =
isl_device_get_sample_counts(&pdevice->isl_dev);