For geometry shaders, we're going to need to compile various graphics
shaders down to compute shaders. This means that they'll look like
compute shaders to much of the compile pipeline but ultimately get
executed as graphics shaders. Most of the time, the compiler will just
happily take whatever offset you give and try to load the sysval from
there so you can load a graphics sysval from a compute shader just fine.
However, for the common ones, we switch on the shader stage and load
from a different offset for 3D vs. compute. This breaks the moment you
have a compute shader that's going to actually load from a 3D sysval
space.
The solution here is to ensure that any common sysvals (currently just
the push uniforms address and the printf buffer) are at exactly the same
offset in both. This is done by adding a panvk_common_sysvals struct,
some static asserts, and a bit of macro magic to keep things eurgonamic.
This also changes push uniform upload to just swap in the push uniform
address instead of writing it to the command buffer on every iteration.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38508>
Our backend compiler explains the limits as :
32 bytes for the patch header (tessellation factors)
480 bytes for per-patch varyings (a varying component is 4 bytes and
gl_MaxTessPatchComponents = 120)
16384 bytes for per-vertex varyings (a varying component is 4 bytes,
gl_MaxPatchVertices = 32 and
gl_MaxTessControlOutputComponents = 128)
In all that's :
* 32 patches * 128 components (counting tessellation factors)
* 32 vertices * 128 components
8192 total components.
I'm not sure why the limit was set so low, maybe leftover from older platforms?
Bump the limit to something like competition.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38523>
We were overflowing an array during bifrost disassembly. This was
only a problem if the user explicitly set an environment variable,
so unlikely to occur in casual use, and also only could be triggered
in very specific, dense code. But we still should get this right!
The specific CTS test that caused the assert is:
'dEQP-VK.graphicsfuzz.stable-quicksort-for-loop-with-injection'
with environment variable `BIFROST_MESA_DEBUG=shaders`. One of the
shaders has a clause with 6 constants (the maximum) and this overflowed
the array because we assume we always have an extra slot (used for
modifier processing).
Cc: mesa-stable
Reviewed-by: Christoph Pillmayer <christoph.pillmayer@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38501>
Skip TU_CMD_FLAG_WAIT_FOR_BR wait whenever concurrent binning is disabled.
Without CB there is nothing to wait for, so the sync only adds overhead,
and in workloads with thousands of tiny renderpasses the cumulative overhead
becomes too big.
In one real-world workload I saw the following timings:
- 99.20 ms without disabling TU_CMD_FLAG_WAIT_FOR_BR
- 65.15 ms with TU_CMD_FLAG_WAIT_FOR_BR disabled
- 64.92 ms with TU_DEBUG=nocb
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38378>
Disable concurrent binning by default so regular renderpasses have access
to all vertex fetch resources. When a renderpass can actually enable CB,
walk back to the CB barrier at submission time and re-enable CB for all
patchpoints between CB barrier and the renderpass.
Because we expect at most one or two renderpasses with CB per frame,
the number of patches stays small.
The reduced vertex fetch resources resulted in up to 10% performance loss
seen in targeted benchmark and in a few game captures.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38378>
The sync emitted on TU_CMD_FLAG_WAIT_FOR_BR didn't disable CB
when CB was previously disabled for the renderpass, this resulted
in less resources vertex processing resources available for BR.
We can just not emit the sync instead, since next time CB is enabled
it will force the sync.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38378>
Unlike textures, we can't easily do format-conversion of the data
before, because the source is a buffer object and not a texture object.
But we already have a hammer for this in Mesa, which means we'll drop
the ARB_texture_buffer_object extension support, but only for the OpenGL
compatibility profile. We still get GL 4.6, both core and compatibility.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38162>
Unlike textures, we can't easily do format-conversion of the data
before, because the source is a buffer object and not a texture object.
But we already have a hammer for this in Mesa, which means we'll drop
the ARB_texture_buffer_object extension support, but only for the OpenGL
compatibility profile. We still get GL 3.1 exposed.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38162>
Unlike textures, we can't easily do format-conversion of the data
before, because the source is a buffer object and not a texture object.
But we already have a hammer for this in Mesa, which means we'll drop
the ARB_texture_buffer_object extension support, but only for the OpenGL
compatibility profile. We still get GL 3.2 exposed.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38162>
While OpenGL 3.1 does require texture buffer objects, the ARB spec for
this requires support for texture buffers with alpha, luminance,
luminance-alpha and intensity formats in addition to RGBA formats. The
version of texture buffer objects that ended up in the OpenGL spec (even
in the compatibility spec) does not require these formats.
But, we don't even need to check this, because this is already included
in the GLSL 1.40 requirement that's also checked. So this shouldn't make
us expose GL 3.1 in cases where it isn't supported in the first place.
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38162>
Most of the time, we remember to check for both extensions. But in one
case, it seems we forgot the GLES extension. Whoops.
Let's switch to a helper here, so we don't have to repeat the logic over
and over again.
Fixes: b4c0c514b1 ("mesa: add OES_texture_buffer and EXT_texture_buffer support")
Reviewed-by: Lars-Ivar Hesselberg Simonsen <lars-ivar.simonsen@arm.com>
Reviewed-by: Iago Toral Quiroga <itoral@igalia.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38162>
There's a single underlying bo mapping shared by the initial alloc here
and the later import of the same. The mapping size has to be initialized
with the real size of the created blob resource, since the app can query
the exported native handle size for re-import. e.g. lseek dma-buf size
Similar to virtgpu_bo_create_from_device_memory, the app can do multiple
imports with different sizes for suballocation. So on the initial
import, the mapping size has to be initialized with the real size of the
backing blob resource.
Backport-to: 25.3
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38443>
If the allocation originates from the same instance, the tracker map
size follows the allocationSize. After export and re-import, mapping the
whole dma-buf can exceed the original map size. This change backs out
the offending changes.
Test: dEQP-VK.api.external.memory.*.suballocated.host_visible.*
Fixes: 442f242a49 ("venus: requests whole blob mem size for non-dedicated import")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38443>
Found when running glxgears with vblank enabled and modesetting DDX.
glxgears will send many present requests at the beginning, but most
of them get complete event with skip mode. This problem causes
glxgears report ~75fps on a 60Hz monitor at the first record.
This change reduces it to 60fps.
Vulkan side X11 WSI does not have this problem as it will wait first
present request's complete event before send second present request.
How the problem happens:
1. client send present request 1 with target msc = 1
2. server side current msc is 100, so it find request 1 is
outdated and queue it for vblank with target msc = 101
3. client send present request 2 with target msc = 2
4. server side current msc is still 100, so it find request 2
is outdated and queue it with target msc = 101, and find
request 1 will be overridden, so mark it as skipped and
send idle notify for it.
5. client get the idle notify for request 1, and reuse the
request 1 buffer for new back buffer to send present
request 3.
6. this keeps going until client send present request N, and
server finally process the vblank queue before 101 msc
arrive and send complete event for all these requests back
to client.
Reviewed-by: Michel Dänzer <michel@daenzer.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38178>