This will allow gallium drivers to send messages to KHR_debug endpoints
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Do it in the visitor, like we do for other opcodes.
v2: use const, get rid of useless surf_index temporary (Curro)
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Do it in the visitor, like we do for other opcodes.
v2: use const, get rid of useless surf_index temporary (Curro)
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Right now the generator marks direct surfaces as used but leaves marking of
indirect surfaces to the caller. Just make the callers handle marking in both
cases for consistency.
v2: Use const, do not add unnecessary temporary (Curro)
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Right now the generator marks direct surfaces as used but leaves marking of
indirect surfaces to the caller. Just make the callers handle marking in both
cases for consistency.
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Right now the generator marks direct surfaces as used but leaves marking of
indirect surfaces to the caller. Just make the callers handle marking in both
cases for consistency.
v2: Use const and remove useless surf_index temporary (Curro)
Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Previously there was a problem in i965 where if 16x MSAA is used then
some of the sample positions are exactly on the 0 x or y axis. When
the MSAA copy blit shader interpolates the texture coordinates at
these sample positions it was possible that it would jump to a
neighboring texel due to rounding errors. It is likely that these
positions would be used on 16x MSAA because that is where they are
defined to be in D3D.
To fix that this patch makes it use interpolateAtOffset in the blit
shader whenever 16x MSAA is used and the GL_ARB_gpu_shader5 extension
is available. This forces it to interpolate the texture coordinates at
the pixel center to avoid these problematic positions.
This fixes ext_framebuffer_multisample-unaligned-blit and
ext_framebuffer_multisample-clip-and-scissor-blit with 16x MSAA on
SKL+.
v2: Use interpolateAtOffset instead of interpolateAtSample
v3: Always try to enable GL_ARB_gpu_shader5 in the shader
[Ian Romanick]
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
Previously this extension was only enabled when blitting between two
multisampled buffers. However I don't think it does any harm to just
enable it all the time. The ‘enable’ option is used instead of
‘require’ so that the shader will still compile if the extension isn't
available in the cases where it isn't used. This will make the next
patch simpler because it wants to add another optional extension.
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The destination rectangle is now drawn at 4x4 the size and the shader
code to calculate the sample number is adjusted accordingly.
Acked-by: Ben Widawsky <ben@bwidawsk.net>
In order to accomodate 16x MSAA, the starting sample pair index is now
3 bits rather than 2 on SKL+.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
The gen7_surface_msaa_bits function already returns the right values
for 16 samples but it just needs its assert to be relaxed.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
When 16x MSAA is used for sampling with texelFetch the compiler needs
to use a different instruction which passes more arguments for the MCS
data. Previously on skl+ it was unconditionally using this new
instruction. However since 16x MSAA is probably going to be pretty
rare, it is probably worthwhile to avoid using this instruction for
the other sample counts. In order to do that this patch adds a new
member to brw_sampler_prog_key_data to track when a sampler refers to
a buffer with 16 samples.
Note that this isn't done for the vec4 backend because it wouldn't
change how many registers it uses.
Acked-by: Ben Widawsky <ben@bwidawsk.net>
In order to support 16x MSAA, skl+ has a wider version of ld2dms that
takes two parameters for the MCS data. The MCS data in the response
still fits in a single register so we just need to ensure we copy both
values rather than just the lower one.
Acked-by: Ben Widawsky <ben@bwidawsk.net>
In order to support 16x MSAA, skl+ has a wider version of ld2dms that
takes two parameters for the MCS data. The MCS data retrieved from the
ld_mcs instruction already returns 4 or 8 registers and is documented
to return zeroes for the mcsh value when the sample count is less than
16.
v2: Use get_lowered_simd_width to fall back to SIMD8 instructions when
the message length would be too long in SIMD16.
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
This is the standard pattern used by the other 3D graphics API.
BDW has slots for these values, but they aren't actually used until
SKL. Even though the documentation for BDW says they must be zero, it
doesn't seem to cause any harm to program them anyway.
The comment above for the 8x sample positions says that the hardware
implements centroid interpolation by picking the centre-most sample
that is inside the primitive. That implies that it might be worthwhile
to pick a pattern that includes 0.5,0.5. However by experimentation
this doesn't seem to actually be the case. With the sample positions
in this patch, if I modify the piglit test below so that it instead
reports the centroid position, it reports 0.492188,0.421875 which
doesn't match any of the positions. If I modify the sample positions
so that they include one at exactly 0.5,0.5 it doesn't help and it
reports another position which is even further from the center for
some reason.
arb_gpu_shader5-interpolateAtSample-different
Kenneth Graunke experimented with some other patterns that have a
higher standard deviation but I think after some discussion it was
decided that it would be better to pick the same pattern as the other
graphics API in case there are games that rely on this pattern.
(Based on a patch by Kenneth Graunke)
Cc: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Ben Widawsky <ben at bwidawsk.net>
Equivalent to commit 8ac3b525c but with sel operations. In this case
we select the PredCtrl based on the writemask.
This patch helps on cases like this:
1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F
2: cmp.nz.f0.0 null:D, vgrf40.xxxx:D, 0D
3: (+f0.0) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD
In this case, cmod propagation can't optimize instruction #2, because
instructions #1 and #2 have different writemasks, and we can't update
directly instruction #2 writemask because our code thinks that sel at
instruction #3 reads all four channels of the flag, when it actually
only reads .x.
So, with this patch, the previous case becames this:
1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F
2: cmp.nz.f0.0 null:D, vgrf40.xxxx:D, 0D
3: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD
Now only the x channel of the flag is used, allowing dead code
eliminate to update the writemask at the second instruction:
1: cmp.l.f0.0 vgrf40.0.x:F, vgrf0.zzzz:F, vgrf7.xxxx:F
2: cmp.nz.f0.0 null.x:D, vgrf40.xxxx:D, 0D
3: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD
So now cmod propagation can simplify out #2:
1: cmp.l.f0.0 vgrf40.0.x:F, attr18.wwww:F, vgrf7.xxxx:F
2: (+f0.0.x) sel vgrf41.0.x:UD, vgrf6.xxxx:UD, vgrf5.xxxx:UD
Shader-db numbers:
total instructions in shared programs: 6235835 -> 6228008 (-0.13%)
instructions in affected programs: 219850 -> 212023 (-3.56%)
total loops in shared programs: 1979 -> 1979 (0.00%)
helped: 1192
HURT: 0
The anv_state is supposed to be a flyweight so we're not really saving
anything by using a pointer. Also, we were creating one, setting a pointer
to it, and then having it go out-of-scope which is bad.
We also have the "reserved for kick" space available. Some of my earlier
changes can probably be removed, but this is a quick fix for some of the
rarer fallout.
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Cc: <mesa-stable@lists.freedesktop.org>
Remove members
num_color_clear_attachments
has_depth_clear_attachment
has_stencil_clear_attachment
The new clear code in anv_meta_clear.c does not use them.
Fixes Crucible test "func.clear.load-clear.attachments-8".
The old clear code, when clearing attachments for
VK_ATTACHMENT_LOAD_OP_CLEAR, suffered from some fundamental bugs. The
bugs were not fixable with the old code's approach.
- It assumed that a VkRenderPass contained at most one depthstencil
attachment.
- It tried to clear all attachments (color and the sole
depthstencil) with a single instanced draw call, using the VUE
header's RenderTargetArrayIndex to specify the instance's target
color attachment. But the RenderTargetArrayIndex does not select
entries in the binding table; it only selects an array index of
a singled layered surface.
- If at least one attachment of VkRenderPass had
VK_ATTACHMENT_LOAD_OP_CLEAR,
then the old code cleared *all* attachments. This was
a consequence of using a single draw call and single pipeline for
the clear.
The new clear code fixes those bugs by making a separate draw call for
each attachment, and using one pipeline when clearing color attachments
and a different pipeline for depth attachments.
The new code, like the old code, does not clear stencil attachments. It
is left as a FINISHME.
Consistently rename bitmasks of Vulkan dynamic state to 'dynamic_mask'.
anv_meta_saved_state::dynamic_flags -> dynamic_mask
anv_meta_save(dynamic_state) -> dynamic_mask
As the functions are now exposed in anv_meta.h, let's rename them
to clarify that they are meta functions.
anv_cmd_buffer_save -> anv_meta_save
anv_cmd_buffer_restore -> anv_meta_restore
This greatly increases the pressure you can put on the driver before
create fails. Ultimately we need to let the kernel take control of
our cached BOs and just take them from us (and other clients)
directly, but this is a very easy patch for the moment.
Cc: "11.0" <mesa-stable@lists.freedesktop.org>
I thought that aliased functions didn't need to be added, but that might
only be if the function aliases something in the same {desktop,ES}
space. Resolves the dispatch sanity test failure.
Fixes: 13b19aa81 (mesa: expose support for GL_EXT_buffer_storage)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92824
Signed-off-by: Ilia Mirkin <imirkin@alum.mit.edu>
Very long line loops which spanned 3 or more vertex buffers were not
handled correctly and could result in stray lines.
The piglit lineloop test draws 10000 vertices by default, and is not
long enough to trigger this. Even 'lineloop -count 100000' doesn't
trigger the bug.
For future reference, the issue can be reproduced by changing Mesa's
VBO_VERT_BUFFER_SIZE to 4096 and changing the piglit lineloop test to
use glVertex2f(), draw 3 loops instead of 1, and specifying -count
1023.
Acked-by: Sinclair Yeh <syeh@vmware.com>
Reviewed-by: Roland Scheidegger <sroland@vmware.com>
When we emulate XOR logicop mode with blend-subtract, we need to ensure
that the fragment shader always emits white. We had this implemented
for VGPU9, but not VGPU10.
VMware bug 1545492.
Reviewed-by: Charmaine Lee <charmainel@vmware.com>
This actually stored the values as 8bit linear values in the cache,
then did another srgb->linear conversion...
We don't want to do the former (decoding 8bit srgb values to 8bit linear
completely defeats the purpose of srgb in the first place), so just decode
to 8bit srgb.
Fixes piglit.spec.ext_texture_srgb.texwrap formats-s3tc tests.