Same enum as PowerVR CDM, annoyingly different from the VDM block types.
Split out the stream link / terminate structs (both observed with Metal
for copious amounts of compute), in preparation for decoding "properly".
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18623>
Jumps in the command streams, allowing us to chain ("link") command
buffers. Naming is from PowerVR, which contains an identical command.
PowerVR's has conditional jumps and function call support, it's likely
that AGX inherited this too but I haven't tested that. (Those might be
useful for conditional rendering and secondary command buffers
respectively?)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18421>
Looking at PowerVR's PPP definitions in tree in Mesa
(src/imagination/csbgen/), we find that AGX's "tagged" data structures
are actually sequences of state items prefixed by a header specifying
which state follows. Rather than hardcoding the sequences in which Apple's
driver chooses to bundle state, we need the XML to be flexible enough to
encode or decode any valid combination of state. That means reworking
the XML. While doing so, we find a number of fields that are identical
between RGX and AGX, and fix the names while at it (for example, the W
Clamp floating point).
Names are from the PowerVR code in Mesa where sensible.
Once we've reworked the XML, we need to rework the decoder. Instead of
reading tags and printing the combined state packets, the decoder now
must unpack the header and print the individual state items specified by
the header, with slightly more complicated bounds checking.
Finally, state emission in the driver becomes much more flexible. To
prove the flexibility actually works, we now emit all PPP state (except for
viewport and scissor state) as a single PPP update. This works. After
this we can move onto more interesting arrangements of state for lower
driver overhead.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18421>
There are a pair of flags controlling the stencil test. One enables
stencil testing in general, the other enables two-sided stencil. Compare
the identical "twosided" flag in src/imagination/csbgen/rogue_ppp.xml's
STATE_ISPCTL structure, at the samebit offset even. Evidently this word of
the "Rasterizer" is, in fact, a derivative of STATE_ISPCTL.
Fixes
dEQP-GLES2.functional.fragment_ops.depth_stencil.*
dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.*
dEQP-GLES2.functional.fragment_ops.random.*
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18380>
This is still a guess, but a considerably firmer one as it now corrects
handles the clear pipelines emitted by Metal as well as the regular
vertex/fragment shader, and gets rid of the preshader special cases
seen there. Fixes decode of clear pipeline's preshaders.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18380>
As we recently discovered, the layout of level L of a mipmapped 2D image
of size WxH is /not/ the same as the layout of a non-mipmapped 2D image
of size minify(W, L) x minify(H, L). The difference occurs due to
subtleties of the "power of two" miptrees which can force a level to use
a larger tile size than it would have required at root level. To handle
this quirk correctly, the driver must not implement texture views with
address arithmetic -- it must supply instead the base width/height of a
texture and use first/last level fields on the texture descriptor to map
it. Similar issues occur when writing a particular level of a mipmapped
texture, which was handled correctly in the colour case but not the Z/S
case.
Fixes
dEQP-GLES2.functional.texture.mipmap.cube.generate.*
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18380>
For cube maps, depth=1 in the hardware (but 6 in Gallium). Likewise for
cube map arrays, depth=n in the hardware (but 6n in Gallium). We need to
divide to compensate. This will be relevant for cube map arrays in the
future -- add the dimension XML for cube map arrays too.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18380>
I'm not sure why we need to set this magic bit, but this fixes the
non-depth_component portion of
dEQP-GLES3.functional.texture.format.sized.*, e.g
dEQP-GLES3.functional.texture.format.sized.cube.srgb_rg8_pot
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18167>
Setting this bit (at the batch level, not the draw level!) switches to
[-1, 1] clipping instead of Metal's preferred [0, 1] clipping. Using
this bit allows us to drop the clip_halfz lowering we had before, saving
2 instructions in every vertex shader.
Fixes dEQP-GLES2.functional.depth_range.*
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17948>
Instead of using driver_location magic and hoping things work, make the
linkage between vertex and fragment shaders explicit. Thanks to the
coefficient register mechanism reverse-engineered and documented earlier
in this series, this does not require any shader keys to support
separable shaders. It just requires that we regenerate the coefficient
register binding tables at draw time, based on the varying layouts
decided by the compiler independently for the VS and FS. This is more
robust in the face of separate shaders.
This also gets us glProvokingVertex() support without shader keys.
After that, we don't need any of the remapping prepasses. For fragment
shaders, any old mapping will do, so we can assign coefficient registers
as we go (based on what the program actually uses, not nir_variable
information that might be stale by this point). We do want to cache
coefficient registers, particularly for fragcoord.w which is used for
perspective interpolation everywhere.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17198>
Lots of changes from reverse-engineering harder the interactions with
fp16 and noperspective and such, and comparing against the PowerVR
driver code in Mesa that's been released since this XML was
originally written.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17198>
The counts for textures/samplers are specified in the bind
texture/sampler packets. What's in the bind pipeline appear to be...
hints? of some kind? It's a direct function of the numbers of textures
and samplers, but much more coarse. Unknown purpose.
This should be correct for up to 48 textures and at least 8 samplers.
For more than 48 textures, Metal switches to a "bindless" mode, where
the textures are instead bound with a bind uniform packet, ts* is no
longer read in the shader, and instead registers and immediates are used
to index the texture with a substantial preshader. Details TBD. We don't
need to worry about that for a long while, though.
Fixes a number of dEQPs.
dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.array_in_struct.sampler2D_samplerCube_both,Crash
dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.array_in_struct.sampler2D_samplerCube_fragment,Crash
dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.array_in_struct.sampler2D_samplerCube_vertex,Crash
dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.nested_structs_arrays.sampler2D_samplerCube_both,Crash
dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.nested_structs_arrays.sampler2D_samplerCube_fragment,Crash
dEQP-GLES2.functional.uniform_api.value.assigned.by_pointer.render.nested_structs_arrays.sampler2D_samplerCube_vertex,Crash
dEQP-GLES2.functional.uniform_api.value.assigned.by_value.render.array_in_struct.sampler2D_samplerCube_both,Crash
dEQP-GLES2.functional.uniform_api.value.assigned.by_value.render.array_in_struct.sampler2D_samplerCube_fragment,Crash
dEQP-GLES2.functional.uniform_api.value.assigned.by_value.render.array_in_struct.sampler2D_samplerCube_vertex,Crash
dEQP-GLES2.functional.uniform_api.value.assigned.by_value.render.nested_structs_arrays.sampler2D_samplerCube_both,Crash
dEQP-GLES2.functional.uniform_api.value.assigned.by_value.render.nested_structs_arrays.sampler2D_samplerCube_fragment,Crash
dEQP-GLES2.functional.uniform_api.value.assigned.by_value.render.nested_structs_arrays.sampler2D_samplerCube_vertex,Crash
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17198>
Aka occlusion queries. There is an annoying limitation in the hardware
(reflected in Metal) that only a single buffer may be bound per render
pass, with the per-draw settings merely specifying an offset.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16512>
This uses a subset of the depth/stencil infrastructure we built out to
support writing back tiled, uncompressed Z32F depth buffers to memory.
Texturing from this format is already supported.
This gets glmark2 -bshadow working.
v2: Fix partial renders
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/16512>