Using stripes to deal with the different packet layout variants resulted
in redefining "register" offsets with different values, so use "prefix"
to add a suffix to disambiguate.
drivers/gpu/drm/msm/adreno/adreno_pm4.xml.h:1066: warning: "REG_A6XX_CP_DRAW_INDIRECT_MULTI_INDIRECT" redefined
1066 | #define REG_A6XX_CP_DRAW_INDIRECT_MULTI_INDIRECT 0x00000006
|
drivers/gpu/drm/msm/adreno/adreno_pm4.xml.h:1057: note: this is the location of the previous definition
1057 | #define REG_A6XX_CP_DRAW_INDIRECT_MULTI_INDIRECT 0x00000003
|
(Admittedly it isn't really a "prefix" but that was the field in the
schema available to use, and REG_INDEXED_CP_DRAW_INDIRECT_MULTI_STRIDE
sounds somewhat more funny.)
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10944>
This runs through the SQE bootstrap code to extract the packet-table,
rather than relying on heuristics. As a bonus, it can detect the start
of the LPAC fw in a660+ fw so that we can properly decode the LPAC fw
and packet-table.
Note that this decodes the jmptable as normal instructions, which is a
change in behavior from the previous heuristic based jmptbl extraction.
Not sure if that is a good or bad thing.
For a5xx, for now the legacy heuristic based jmptable decoding is
preserved, at least until enough control regs are figured out.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10944>
This is an (at least somewhat complete) logical emulator of the a6xx SQE
that lets us step through firmware execution (bootstrap, cmdstream pkt
handling, etc). It lets us poke at various fw visible state and run
through pm4 packet(s) to better understand what the fw is doing when it
handles various packets.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10944>
Going beyond 0x100000 results in hangs, however I found that the
last 0x100000 packet just doesn't get executed. Thus the real limit is
0x0FFFFF. At least this is true for a6xx.
This could be tested by appending nops to the cmdstream and placing
e.g. CP_INTERRUPT at the end, at any position other than being
0x100000 packet it results in a hang.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10786>
There is a limit on IB size, which on freedreno is set to 0x100000.
Going beyond it results in hangs, however I found that the last
0x100000 packet just doesn't get executed. Thus the real limit is
0x0FFFFF.
This could be tested by appending nops to the cmdstream and placing
e.g. CP_INTERRUPT at the end, at any position other than being
0x100000 packet it results in a hang.
Fixes:
dEQP-VK.api.command_buffers.record_many_draws_secondary_2
dEQP-VK.api.command_buffers.record_many_draws_primary_2
However these tests could trigger hangcheck timeouts.
Also this fixes hangs when opening captures of games in RenderDoc.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10786>
There are some interactions between these two extensions that need to be
implemented when both are supported. Particularly:
1. Applications can create images that will be bound to swapchain memory
by passing a VkImageSwapchainCreateInfoKHR in the pNext chain
of VkImageCreateInfo. In this case we need to make sure that the
created image takes some of its parameters from the underlying
swapchain.
2. Applications can bind memory from a swapchain image to a VkImage
by passing a VkBindImageMemorySwapchainInfoKHR in the pNext chain
of VkBindImageMemoryInfo.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11037>
This was added with VK_KHR_device_group and allows users to specify
a base offset that will be automatically added to gl_WorkGroupID.
Unfortunately, V3D doesn't support this natively, so we need to add
the base to the workgroup id generated by hardware manually. For this,
we inject add instructions that source from a QUNIFORM that will
retrieve the actual dispatch base from the compute job when it is
dispatched.
Since a compute shader can be dispatched with CmdDispatch and/or
CmdDispatchBase, we always need to add these additional add
instructions and use a base of (0,0,0) for regular dispatches.
Since we don't support any version of OpenGL with this dispatch
base functionality we can avoid the extra instructions there.
Reviewed-by: Alejandro Piñeiro <apinheiro@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11037>
Construct a scissor descriptor correpsonding to the intersection of the
framebuffer, the viewport, and the region selected for scissoring by the app.
Use the intersected scissor in the "clip tile" fields in the viewport. Select
this scissor descriptor from the command stream.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11084>