AlexIndustrial/mesa

Author	SHA1	Message	Date
Chia-I Wu	8615653c0e	v3dv: use vk_default_allocator This also fixes the allocator used in v3dv_DestroyDevice. v2: fix two more occurrences of default_alloc (Roman Stratiienko) Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11117>	2021-06-03 08:13:26 +00:00
Chia-I Wu	447e80ac9b	vulkan/wsi: provide more info in wsi_image_create_info Always chain wsi_image_create_info to VkImageCreateInfo, which indicates that the image is a wsi image and can be transitioned to/from VK_IMAGE_LAYOUT_PRESENT_SRC_KHR. Add prime_blit_buffer to the struct as well. When set, it indicates the prime blit destination and implies that the image is a prime blit source. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Yiwei Zhang <zzyiwei@chromium.org> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10789>	2021-06-03 04:24:55 +00:00
Iago Toral Quiroga	1f7d2b4994	v3dv: implement external semaphore/fence extensions This provides most of the implementation, but there are some things we cannot enable until we improve our kernel submit interface, namely: We don't expose capacity to export SYNC_FD, although we do have the implementation in place. This requires that we improve our kernel interface and event wait implementation first so we can cover the corner case where the application submits a command buffer that includes a VkCmdWaitForEvents and tries to export a SYNC_FD from its signal semaphores or fence before the event is signaled and the command buffer is sent to the kernel for execution in full. Likewise, we can't currently import semaphores. This is because our current kernel submit interface can only take one syncobj. We have been working around this so far by waiting on the last syncobj produced from the device whenever we had to wait on any semaphores (which is obviously suboptimal already), but this won't work as soon as we allow importing external semaphores, as those could (and would typically) be produced from a different device. Once we address the kernel bits, we should come back and enable SYNC_FD exports as well as semaphore imports. Relevant CTS tests: dEQP-VK.api.external.fence.* dEQP-VK.api.external.semaphore.* dEQP-VK.synchronization.cross_instance.* Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11105>	2021-06-02 09:58:47 +00:00
Iago Toral Quiroga	cfb4d109a7	v3dv: don't keep an open file descriptor for imported fences/semaphores We can (and should) close the descriptor immediately after the import. Gets the following CTS test to pass without requiring to increase limits for open file descriptors: dEQP-VK.synchronization.basic.binary_semaphore.chain Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11105>	2021-06-02 09:58:47 +00:00
Juan A. Suarez Romero	1341e2a547	ci/v3dv: update expected results Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11103>	2021-06-01 18:21:21 +00:00
Alejandro Piñeiro	53d937c2e8	v3d/simulator: use BFC/RFC registers to wait for bin/render to complete We were using the CT0CA (Control List Executor Current Address) and CT0EA (Control List Executor End Address) registers, but that would only wait for the CLE to reach the end of the list, but there could still be things in the rest of the pipeline. Even if that seems to work with the current simulator, the correct way to do that is using the BFC (Binning Mode Flush Count) and RFC (Rendering Mode Frame Count) registers instead. In fact, this would be needed with a newer simulator snapshot, in order to get the following CTS tests working: dEQP-VK.api.copy_and_blit.core.resolve_image.whole_array_image.4_bit dEQP-VK.api.copy_and_blit.core.resolve_image.whole_array_image_one_region.4_bit dEQP-VK.api.copy_and_blit.core.resolve_image.whole_copy_before_resolving.4_bit dEQP-VK.api.device_init.create_instance_device_intentional_alloc_fail dEQP-VK.api.image_clearing.core.clear_color_image.1d.optimal.multiple_layers.r32g32_uint dEQP-VK.api.image_clearing.core.clear_color_image.1d.optimal.remaining_array_layers_twostep.r16_sint Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11039>	2021-06-01 12:22:28 +02:00
Alejandro Piñeiro	ec85862d76	v3d/simulator: use the proper register when waiting on a CSD submit Until now we were waiting until having a dispatch current and/or queued. But that would only wait for all shaders to have started, it won't wait for them to have finished. With this commit we wait until the NUM_COMPLETED_JOBS (that in spite of that name, it is about dispatches) field got increased. This is in general safest, and it is needed after the latest simulator update to get CTS tests like the following ones working: dEQP-VK.compute.basic.copy_ssbo_multiple_invocations dEQP-VK.compute.basic.copy_ssbo_single_invocation dEQP-VK.compute.basic.ssbo_rw_single_invocation dEQP-VK.compute.basic.ssbo_unsized_arr_single_invocation dEQP-VK.compute.basic.ubo_to_ssbo_multiple_invocations dEQP-VK.compute.basic.ubo_to_ssbo_single_invocation v2 (from Juan feedback): * Clarify JOBS vs DISPATCHES Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11039>	2021-06-01 12:22:28 +02:00
Alejandro Piñeiro	7f3e34bcb4	v3d/simulator: wait for cache flushes Current code just assumes that flushes are instant, as simulator doesn't really model the caches. So right now we have just an assert that the flush has been done. But that can change in the future, so let's change the assert for a wait. Note that for the l1t case we are writing on the field TMUWCF. So I understand that then we need to wait for TMUWCF_SET, even if the previous code was using L2TFLS_SET. This also happens on the kernel side. We need to check if this was a typo on the kernel side. v2 (from Juan feedback) * Add comment about the TMUWCF vs L2TFLS difference between this commit and the kernel. Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11039>	2021-06-01 12:22:28 +02:00
Alejandro Piñeiro	9bd8d26969	v3d/simulator: add a cache flush mode enum Makes the write to the l2t cache control more readable (without magic numbers). Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11039>	2021-06-01 12:22:28 +02:00
Alejandro Piñeiro	123c7d7277	v3d/simulator: capture hub interrupts So far we were not capturing any HUB interrupt, just core. This could be a problem if any is fired, as we could enter an infinite loop. With this commit we start to capture them. So we split v3d_isr into core and hub interrupt handling. As reference we capture the same HUB interrupts that we capture on the v3d kernel support. It is worth noting that all of those are mostly untested. Now with both opengl/vulkan driver being stable we were not able to raise those interrupts. v2 (Juan feedback): * Just one V3D_VERSION >= 41 block, more readable * Assert that the core is 0 at v3d_isr_core (we don't handle multi-core right now). Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11039>	2021-06-01 12:22:28 +02:00
Alejandro Piñeiro	4e9f1261ee	broadcom/compiler: use proper type field for atomic operations We were using the num_components to infer it, but in the end it is VEC2 for CMPXCHG and 32BIT for anything else. This doesn't affect any test with the real hw, but fixes an assert with the last version of the simulator. Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11039>	2021-06-01 12:22:22 +02:00
Georg Lehmann	9d66a2d986	v3dv: use VKAPI_ATTR and VKAPI_CALL. Closes #4852 Signed-off-by: Georg Lehmann <dadschoorse@gmail.com> Tested-by: Roman Stratiienko <r.stratiienko@gmail.com> Acked-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11062>	2021-05-31 17:08:27 +00:00
Iago Toral Quiroga	234e1b7356	v3dv: implement VK_KHR_device_group We only support one device group with a single device, so the implementation is trivial. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11037>	2021-05-31 09:06:18 +00:00
Iago Toral Quiroga	c672b23857	v3dv: implement interactions of VK_KHR_device_group with VK_KHR_swapchain There are some interactions between these two extensions that need to be implemented when both are supported. Particularly: 1. Applications can create images that will be bound to swapchain memory by passing a VkImageSwapchainCreateInfoKHR in the pNext chain of VkImageCreateInfo. In this case we need to make sure that the created image takes some of its parameters from the underlying swapchain. 2. Applications can bind memory from a swapchain image to a VkImage by passing a VkBindImageMemorySwapchainInfoKHR in the pNext chain of VkBindImageMemoryInfo. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11037>	2021-05-31 09:06:18 +00:00
Iago Toral Quiroga	bf60ba6e7f	v3dv: create a helper for image creation Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11037>	2021-05-31 09:06:18 +00:00
Iago Toral Quiroga	f07c797e93	v3dv: implement vkCmdDispatchBase This was added with VK_KHR_device_group and allows users to specify a base offset that will be automatically added to gl_WorkGroupID. Unfortunately, V3D doesn't support this natively, so we need to add the base to the workgroup id generated by hardware manually. For this, we inject add instructions that source from a QUNIFORM that will retrieve the actual dispatch base from the compute job when it is dispatched. Since a compute shader can be dispatched with CmdDispatch and/or CmdDispatchBase, we always need to add these additional add instructions and use a base of (0,0,0) for regular dispatches. Since we don't support any version of OpenGL with this dispatch base functionality we can avoid the extra instructions there. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11037>	2021-05-31 09:06:18 +00:00
Alejandro Piñeiro	0d2d26a68c	v3dv: remove unused v3dv_zs_buffer_from_vk_format Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11050>	2021-05-28 09:00:35 +00:00
Iago Toral Quiroga	3179daf613	v3dv: add v3dv_GetImageSparseMemoryRequirements back This one is not implemented in the common dispatch handler in terms of its KHR_get_memory_requirements2 version, so the driver needs to implement it. Fixes: `d87afc1acc` ('v3dv: implement VK_KHR_get_memory_requirements2') Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11038>	2021-05-27 13:01:18 +02:00
Iago Toral Quiroga	e531755451	v3dv: trivially handle VK_STRUCTURE_TYPE_EXPORT_MEMORY_ALLOCATE_INFO_KHR Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11002>	2021-05-27 08:23:55 +02:00
Iago Toral Quiroga	597b448967	v3dv: implement VK_KHR_dedicated_allocation Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11002>	2021-05-27 08:23:55 +02:00
Iago Toral Quiroga	e60b009271	v3dv: keep track of whether an image may be backed by external memory Such images will always require dedicated allocations. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11002>	2021-05-27 08:21:15 +02:00
Iago Toral Quiroga	d87afc1acc	v3dv: implement VK_KHR_get_memory_requirements2 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11002>	2021-05-27 08:21:15 +02:00
Iago Toral Quiroga	5283c6d47b	v3dv: implement VK_KHR_bind_memory2 Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11001>	2021-05-26 10:17:53 +00:00
Iago Toral Quiroga	6a847cbe1d	v3dv: implement VK_KHR_maintenance3 We don't have any special restrictions associated with the number of descriptors in a set other than maybe not exceeding what we can put in a single memory allocation, so in practice, applications will be limited by the per-stage constraints defined by other Vulkan limits. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10970>	2021-05-26 07:18:19 +00:00
Iago Toral Quiroga	f7ce44b6e5	v3dv: define V3D_MAX_BUFFER_RANGE Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10970>	2021-05-26 07:18:19 +00:00
Iago Toral Quiroga	de75f43aef	v3dv: expose VK_KHR_maintenance2 We don't do anything for input attachment aspects read by a subpass since it doesn't have performance implications for us. We also ignore the new depth stencil layouts because they don't have practical implications for our implementation. We also ignore the new usage info for views since we are not currently making decisions about views based on their usage. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10951>	2021-05-25 09:12:35 +00:00
Iago Toral Quiroga	b32a48c7e2	v3dv: allow creating uncompressed views from compressed images and vice versa Relevant CTS tests (requires VK_KHR_maintenance2): dEQP-VK.image.texel_view_compatible.* Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10951>	2021-05-25 09:12:35 +00:00
Iago Toral Quiroga	8e3179545e	v3dv: fix texture_size() The uniform data for the texture size as produced by the compiler contains the texture index directly and is not packed with v3d_unit_data_create, so using v3d_unit_data_get_unit is not correct. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10951>	2021-05-25 09:12:35 +00:00
Iago Toral Quiroga	32abeac8a8	v3dv: implement VK_STRUCTURE_TYPE_PHYSICAL_DEVICE_POINT_CLIPPING_PROPERTIES Relevant CTS test (requires VK_KHR_maintenance2); dEQP-VK.clipping.clip_volume.clipped.large_points Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10951>	2021-05-25 09:12:35 +00:00
Alejandro Piñeiro	77edb2d40d	v3dv: don't use typedef enum with broadcom stages This is the only place on the broadcom stack where we use "typedef enum", so for consistency let's avoid it. Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10947>	2021-05-24 15:22:29 +00:00
Ian Romanick	2fdd9b8604	gallium/dri: Add Y21x formats v2: Add all the Y21x tests to the A530 expected fail list. All of the YUV image import tests fail on this platform, and nobody has been able to investigate why. v3: Update the comment describing the zeroed bits in Y212. Suggested by Emma. v4: Add all Y21x test to the rpi3 expected fail list. Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9610>	2021-05-21 01:40:22 +00:00
Ian Romanick	3c4c03cd13	gallium/dri: Add Y41x formats v2: Don't leak __DRI_IMAGE_FOURCC_RGBA16161616 to applications. v3: Fix typo in __DRI_IMAGE_FOURCC_RGBA16161616 table entry. v4: Add the Y412 and Y416 tests to the A530 expected fail list. Many YUV image import tests fail on this platform, and nobody has been able to investigate why. v5: Update the comment describing the zeroed bits in Y412. Suggested by Emma. v6: Add all Y41x test to the rpi3 expected fail list. Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9610>	2021-05-21 01:40:22 +00:00
Juan A. Suarez Romero	53ef2a7e69	ci/broadcom: update expected results Fix also some typos in the expected failed results. Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10865>	2021-05-18 15:52:57 +00:00
Iago Toral Quiroga	b06b24191f	broadcom/ci: update fail list for v3dv Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10801>	2021-05-18 11:28:17 +00:00
Iago Toral Quiroga	39c41169ba	broadcom/compiler: consider RT component size when lowering logic ops in Vulkan In Vulkan we configure our integer RTs to clamp automatically, so with logic operations we need to be careful and avoid overflows by discarding any bits that won't fit in the RT component size. Fixes remaining CTS test failures in: dEQP-VK.pipeline.logic_op.* Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10801>	2021-05-18 11:28:17 +00:00
Iago Toral Quiroga	df7185d0d1	broadcom/compiler: don't emit TLB loads for components that don't exist This avoids debug builds to assert crash. Components that don't exist won't be used and will be eventually DCEd, so simply lower them to 0. Fixes CTS tests like these in debug builds: dEQP-VK.pipeline.logic_op.r8_uint.clear dEQP-VK.pipeline.logic_op.r8_uint.and dEQP-VK.pipeline.logic_op.r8_uint.and_reverse dEQP-VK.pipeline.logic_op.r8_uint.copy dEQP-VK.pipeline.logic_op.r8_uint.and_inverted dEQP-VK.pipeline.logic_op.r8_uint.no_op dEQP-VK.pipeline.logic_op.r8_uint.xor dEQP-VK.pipeline.logic_op.r8_uint.or Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10801>	2021-05-18 11:28:17 +00:00
Connor Abbott	a40714abf7	nir/lower_phis_to_scalar: Add "lower_all" option We don't want to have to deal with vector phis in freedreno, because vectors are always split/unsplit around vectorized instructions anyways, and the stated reason for not scalarising them (it hurting coalescing) won't apply to us because we won't be using nir_from_ssa. Add this option so that we don't have to do the equivalent thing while translating from NIR. Reviewed-by: Rob Clark <robdclark@gmail.com> Reviewed-by: Daniel Schürmann <daniel@schuermann.dev> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10809>	2021-05-17 09:59:45 +00:00
Juan A. Suarez Romero	629e8347ad	ci: Update VK-GL-CTS to 1.2.6.1 Reviewed-by: Emma Anholt <emma@anholt.net> Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10800>	2021-05-14 20:35:24 +00:00
Iago Toral Quiroga	6f93354bae	broadcom/compiler: clarify PIPE_SHADER_CAP_INDIRECT_INPUT_ADDR setting We enabled this in the past to fix some register allocation issues we faced with geometry shaders but we didn't document why it is safe for us to do this, which is not immediately obvious. Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10745>	2021-05-11 12:26:19 +02:00
Iago Toral Quiroga	370495abd1	v3d: disable GLSL loop unrolling again We had re-enabled this because of some test regressions: KHR-GLES31.core.geometry_shader.limits.max_input_components and ext_transform_feedback-max-varyings failed to register allocate, but now that we support indirect indexing on vertex shader outputs natively this is no longer an issue. Piglit's max-samplers tests failed. These tests use indirect indexing on samplers which is not supported and fail to link with this error message: "Failed to link: error: sampler arrays indexed with non-constant expressions is forbidden in GLSL 110". This is expected. The reason these were passing before is that loop unrolling was able to turn indirect indexing into direct indexing. We add them to the expected fail list. Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10723>	2021-05-11 09:31:31 +00:00
Iago Toral Quiroga	f0fef41917	broadcom/compiler: don't unroll due to indirect indexing of outputs We can handle this natively now, so there is no point. Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10723>	2021-05-11 09:31:31 +00:00
Iago Toral Quiroga	9f5481cf78	v3dv: don't lower indirect derefs on output variables Our backend compiler can handle this for all supported shader stages now. Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10723>	2021-05-11 09:31:31 +00:00
Iago Toral Quiroga	0235ed18a7	broadcom/compiler: don't use nir_src_is_dynamically_uniform Now that we have divergence analysis we should use that. Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10723>	2021-05-11 09:31:31 +00:00
Iago Toral Quiroga	cb39dca2d3	broadcom/compiler: make vir_VPM_WRITE_indirect handle non-uniform offsets Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10723>	2021-05-11 09:31:31 +00:00
Iago Toral Quiroga	f71893a942	broadcom/compiler: implement non-uniform offset on vertex outputs Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10723>	2021-05-11 09:31:31 +00:00
Iago Toral Quiroga	067ad7eccc	broadcom/compiler: move vertex shader output handling to its own function Reviewed-by: Juan A. Suarez <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10723>	2021-05-11 09:31:31 +00:00
Juan A. Suarez Romero	54ec9c95cf	broadcom/compiler: fix dynamic-stack-buffer-overflow error When spilling a register, the number of temps can be increased when introducing a temporary variable. Those nodes are not eligible to be spilled, but we need to take care not to access out of bounds the arrays defined with a size equal to the original number of temps. Fixes address sanitizer error on KHR-GLES3.shaders.uniform_block.random.all_shared_buffer.14 (and many others). v2 (Iago): - Add clarification in assertion. - Use `vir_get_temp` to increase num_temps. v3 (Iago): - Update clarification Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10643>	2021-05-11 07:46:17 +00:00
Juan A. Suarez Romero	ea463f9bff	ci/broadcom: update expected results Reviewed-by: Jose Maria Casanova Crespo <jmcasanova@igalia.com> Signed-off-by: Juan A. Suarez Romero <jasuarez@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10694>	2021-05-10 10:11:09 +00:00
Iago Toral Quiroga	d81a6e5f1d	broadcom/compiler: change register allocation policy for accumulators The current policy is to always favor accumulators if possible, however, this is not always optimal. Particularly, accumulators play a crucial role in enabling QPU instruction merges, since these are limited to both the ADD and the ALU instructions addressing at most 2 physical registers. For 2-src instructions, this means that to be able to merge we need them to address at least 2 accumulators. While favoring accumulators does help the case for instruction merges in general, it is risky to assign accumulators to variables that have long life spans. Doing so will make the accumulator unavailable for any other instructions during that life span, and since we only have a few accumulators, we can quickly run out and lose our capacity to merge instructions for large parts of the qpu program. On the other hand, we also want to avoid the extreme case where we keep allocating physical registers to the point we run out, even if we have accumulators available, since accumulators have additional restrictions and may not be suitable for everything. This change continues the policy of favoring accumulators, but it only does so if the life span of the temps is short, to ensure that we can recycle accumulators often across instructions and avoid running out for sections of the QPU code, unless we are already running out of physical registers. total instructions in shared programs: 13654647 -> 13336921 (-2.33%) instructions in affected programs: 11015919 -> 10698193 (-2.88%) helped: 39758 HURT: 17325 Instructions are helped. total threads in shared programs: 412046 -> 412038 (<.01%) threads in affected programs: 16 -> 8 (-50.00%) helped: 0 HURT: 4 Threads are HURT. total uniforms in shared programs: 3745726 -> 3746003 (<.01%) uniforms in affected programs: 17296 -> 17573 (1.60%) helped: 76 HURT: 99 Uniforms are HURT.
total max-temps in shared programs: 2364430 -> 2359942 (-0.19%) max-temps in affected programs: 109117 -> 104629 (-4.11%) helped: 2893 HURT: 772 Max-temps are helped. total spills in shared programs: 5727 -> 5746 (0.33%) spills in affected programs: 221 -> 240 (8.60%) helped: 1 HURT: 2 total fills in shared programs: 13121 -> 13139 (0.14%) fills in affected programs: 466 -> 484 (3.86%) helped: 1 HURT: 2 total sfu-stalls in shared programs: 33432 -> 34491 (3.17%) sfu-stalls in affected programs: 18219 -> 19278 (5.81%) helped: 4459 HURT: 5087 Inconclusive result total inst-and-stalls in shared programs: 13688079 -> 13371412 (-2.31%) inst-and-stalls in affected programs: 11030017 -> 10713350 (-2.87%) helped: 39630 HURT: 17429 Inst-and-stalls are helped. total nops in shared programs: 335753 -> 333708 (-0.61%) nops in affected programs: 112659 -> 110614 (-1.82%) helped: 8726 HURT: 7383 Inconclusive result Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10686>	2021-05-08 13:15:42 +02:00
Iago Toral Quiroga	c11e479852	broadcom/compiler: specify maximum thread count in compile strategies Once we have exhausted compile strategies at 4 threads and we start enabling lower thread counts, there is no point in starting compiles with 4 threads for them, we know these will fail, so let's start at 2 in these cases. This also has another nice implication: if the driver compiles at 4 threads and fails to register allocate, we were allowing it to try with 2 threads, but this would only retry the register allocation process and would not really recompile the shader with 2 threads. This is not optimal, because at 2 threads we have more TMU fifo space for each thread and we can do more TMU pipelining, so we were missing that opportunity. This improves performance in Sponza by ~1.5% and also seems to help UE4 slightly. Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10647>	2021-05-06 12:27:06 +02:00

1518 Commits