The simulator complains about using byte operands, we also have
documentation telling us.
Note that add operations on bytes seems to work fine on HW (like ADD).
Using dwords operands with CMP & SEL fixes the following tests :
dEQP-VK.spirv_assembly.type.vec*.i8.*
v2: Drop the GLK changes (Matt)
Add validator tests (Matt)
v3: Drop GLK ref (Matt)
Don't mix float/integer in MAD (Matt)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Rafael Antognolli <rafael.antognolli@intel.com> (v1)
Reviewed-by: Matt Turner <mattst88@gmail.com>
BSpec: 3017
Cc: <mesa-stable@lists.freedesktop.org>
SLICE_COMMON_CHICKEN3 is a privileged register not accesible from userspace.
This patch silences a simulator warning about it.
We don't need to add this workaround in linux kernel as the WA description
says it's fixed on latest stepping.
This reverts commit 2be60e0c73.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Aligning phys_level0_sa by the compression block dimension prior to
mipmap layout causes the layout of compressed surfaces to differ from
the sampler's expectations in certain cases. The hardware docs agree:
From the BDW PRM, Vol. 5, Compressed Mipmap Layout,
The compressed mipmaps are stored in a similar fashion to
uncompressed mipmaps [...]
The following exceptions apply to the layout of compressed (vs.
uncompressed) mipmaps:
* [...]
* The dimensions of the mip maps are first determined by applying
the sizing algorithm presented in Non-Power-of-Two Mipmaps
above. Then, if necessary, they are padded out to compression
block boundaries.
The last bullet indicates that alignment should not be done for
calculating a miplevel's dimensions, but rather for determining miplevel
placement/padding. Comply with this text by removing the extra
alignment.
Fixes some fbo-generatemipmap-formats piglit failures on all tested
platforms (SNB-KBL).
v2:
- Note fixed platforms.
- Update some consumers via a helper function.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Prepare for a bug fix by adding and using helpers which convert
isl_surf::logical_level0_px and isl_surf::phys_level0_sa to units of
surface elements.
v2:
- Update iris (Ken).
- Update anv.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
This rewrites the ddy in EXECUTE_4 mode with a loop to make it more
obvious what is going on and also sets the group each of the 4 threads
in the groups are supposed to execute.
Fixes the following CTS tests :
dEQP-VK.glsl.derivate.dfdyfine.dynamic_*
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Co-Authored-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Fixes: 2134ea3800 ("intel/compiler/fs: Implement ddy without using align16 for Gen11+")
Sampler state prefetching is broken on Gen11, and WA_160668216 says
to disable it. Apparently sampler state prefetching also has basically
zero impact on performance, so we don't need to worry there.
i965, anv, and iris already handle this correctly, but we missed BLORP.
Ideally the kernel should globally disable this by writing SARCHKMD, at
which point we wouldn't have to worry about it. But let's be defensive
and handle it ourselves too.
v2: separate out from BTP workaround in case we change that eventually
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com> [v1]
When immutable samplers are set we call write_image_view with a NULL
image view. This causes issues on IVB where we have to fake texture
swizzling.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110999
Fixes: d2aa65eb18 "anv: Emulate texture swizzle in the shader when..."
When HAL_PIXEL_FORMAT_IMPLEMENTATION_DEFINED is used, then the platform
gralloc module will select a format based on the usage flags provided by
the camera device and the other endpoint of the stream.
The patch fixes crash in vulkan when the test is run with camera stream
set to HAL_PIXEL_FORMAT_IMPLEMENTATION_DEFINED.
Test: android.graphics.cts.CameraVulkanGpuTest#testCameraImportAndRendering
on chromebook with camera HAL3.
v2: use AHARDWAREBUFFER_FORMAT_IMPLEMENTATION_DEFINED and take
AHARDWAREBUFFER_USAGE_CAMERA_MASK in to account (Gurchetan)
Fixes: f1654fa7e3 "anv/android: support creating images from external format"
Signed-off-by: Nataraj Deshpande <nataraj.deshpande@intel.com>
Signed-off-by: Gurchetan Singh <gurchetansingh@chromium.org>
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Gurchetan Singh <gurchetansingh@chromium.org>
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
This is the preferred clipping mode since it doesn't mean your points
disappear the moment part of the point crosses over the edge of the
viewport and that lines have weird endpoints at viewport edges. We've
just never bothered to hook it up until now.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
In workloads where there is a lot of geometry drawn that crosses over
the edge of the viewport, this should substantially improve clipper
performance. Not really sure why it's taken 3 years to turn it on but
we never got around to it.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This prints a log of every PIPE_CONTROL flush we emit, noting which bits
were set, and also the reason for the flush. That way we can see which
are caused by hardware workarounds, render-to-texture, buffer updates,
and so on. It should make it easier to determine whether we're doing
too many flushes and why.
We can rely on only one kind of synchronization object (drm-syncobj)
when it is available. This reduces the number of file descriptors we
use in our implementation.
This will be required later for timeline semaphores implementation, at
this point we won't ever want to use anything else but syncobjs.
v2: Only use has_syncobj for semaphores (Jason)
v3: Only has_syncobj in assert on semaphores in QueueSubmit (Jason)
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
For the block BLOCK_TEXEL_VIEW_COMPATIBLE case, this didn't matter
because the flags were already more-or-less what we wanted. However,
for gen7 stencil shadow images, it still had ISL_SURF_USAGE_STENCIL_BIT
so we were getting W-tiled which isn't what we want for the shadow. By
passing just ISL_SURF_USAGE_TEXTURE_BIT (and CUBE if we care), we now
get something that's actually texturable.
Fixes: f3ea0cf828 "anv: Add stencil texturing support for gen7"
Copies to a shadow image happen during a VkCmdPipelineBarrier or at
subpass transitions. We could potentially be a bit more conservative
but these transitions shouldn't happen often and it's better to have our
bases covered.
Fixes: f3ea0cf828 "anv: Add stencil texturing support for gen7"
This should fix floating-point border color on all gen7 HW. Integer is
still thoroughly busted on gen7 because it doesn't exist on IVB and it's
crazy on HSW.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Whenever stencil texturing is not required (most of the time), we can
use VK_EXT_separate_stencil_usage to only create the shadow image when
VK_IMAGE_USAGE_SAMPLED_BIT is required for stencil. Of course, this
depends on applications to use the extension but hopefully DXVK and
similar translators are doing so and that covers most of the apps.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Intel hardware didn't get support for sampling from W-tiled (required
for stencil) images until Broadwell so we can't directly sample from
stencil. Instead, if we want to support stencil texturing on gen7
hardware, we have to keep a texture-capable shadow copy around and use
BLORP to update when stencil changes. The one thing this commit does
not implement is self-dependencies with stencil input attachments.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99493
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The main motivation for this change is API ergonomics: most operations
on dynarrays are really on elements, not on bytes, so it's weird to have
grow and resize as the odd operations out.
The secondary motivation is memory safety. Users of the old byte-oriented
functions would often multiply a number of elements with the element size,
which could overflow, and checking for overflow is tedious.
With this change, we only need to implement the overflow checks once.
The checks are cheap: since eltsize is a compile-time constant and the
functions should be inlined, they only add a single comparison and an
unlikely branch.
v2:
- ensure operations are no-op when allocation fails
- in util_dynarray_clone, call resize_bytes with a compile-time constant element size
v3:
- fix iris, lima, panfrost
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
The other sources of the bcsel behave like the sources of an and or
other logical operation. However, source zero behaves differently.
It is evaluated as a Boolean, so it needs to be resolved.
No shader-db changes, but the tests mentioned in the bug get a couple
instructions added back.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110857
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
According to the Vulkan spec, inline uniform blocks are not allowed
to be updated through vkCmdPushDescriptorSetKHR().
These are the spec quotes from "13.2.1. Descriptor Set Layout"
that are relevant for this case:
"VK_DESCRIPTOR_SET_LAYOUT_CREATE_PUSH_DESCRIPTOR_BIT_KHR specifies
that descriptor sets must not be allocated using this layout, and
descriptors are instead pushed by vkCmdPushDescriptorSetKHR."
"If flags contains
VK_DESCRIPTOR_SET_LAYOUT_CREATE_PUSH_DESCRIPTOR_BIT_KHR, then all
elements of pBindings must not have a descriptorType of
VK_DESCRIPTOR_TYPE_INLINE_UNIFORM_BLOCK_EXT".
There is no explicit mention in vkCmdPushDescriptorSetKHR() to forbid
this case but it is implied in the creation of the descriptor set
layout as aforementioned.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We were dropping "/' around arguments grouped together.
This was triggering failures with :
$ ./framemetrics -g "Memory Writes Distribution Gen9" -o /tmp/output.csv -f ./my.trace 10 11
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
This restriction was accidentally added to the BSpec/PRM as an
unrestricted restriction starting with the HSW docs and it was never
removed. However, it only ever applied to HSW and actually potentially
causes problems on BDW and above where we have mipmapped fast-clears.
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
On CNL+, the clear color struct is composed of RGBA channel values and
fields which are either reserved by the HW or used to control
fast-clears. Currently anv initializes the channel values to zero and
allows the other fields to be undefined.
Satisfy the MBZ field requirements by removing an optimization that
doesn't hold true for CNL+ and pulling in the number of dwords to
initialize from ISL.
Cc: <mesa-stable@lists.freedesktop.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
I recently discovered that the following code lead to valgrind errors:
struct isl_swizzle swizzle = ISL_SWIZZLE_IDENTITY;
VALGRIND_CHECK_MEM_IS_DEFINED(&swizzle, sizeof(swizzle));
which is surprising, because struct isl_swizzle is simply:
struct isl_swizzle {
enum isl_channel_select r:4;
enum isl_channel_select g:4;
enum isl_channel_select b:4;
enum isl_channel_select a:4;
};
and the above code initializes all of them with a C99 initializer.
Iván Briano reminded me that C99 initializers don't necessarily zero
padding. A quick inspection revealed that sizeof(struct isl_swizzle)
was 4 (rather than the expected 2). Ian Romanick suggested changing
it to uint16_t, since this is essentially dicing up an unsigned, and
that worked.
This patch marks enum isl_channel_select packed, changing its size
from 4 bytes to 1 byte. This then makes struct isl_swizzle 2 bytes,
with no bogus padding fields. This eliminates valgrind undefined
memory warnings.
These isl_swizzle values become part of our BLORP blit program keys,
which are then hashed. This undefined padding was being included in
the hashing, possibly leading to issues. I originally saw this error
when running KHR-GL45.texture_size_promotion.functional in iris under
valgrind.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
AHARDWAREBUFFER_FORMAT_Y8Cb8Cr8_420 is an implementation defined
flexible YUV format. Most of the times, it's NV12 or YV12.
On Intel, NV12 is preferred since it can be used by the display
engine.
This API adds a dependency between gralloc and buffer consumers,
unfortunately. Right now, the code seems to work for i915 gralloc,
but not cros_gralloc. Add a preprocessor flag to fix this.
TEST=android.graphics.cts.MediaVulkanGpuTest#testMediaImportAndRendering
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Checking isl_fmt returned value in assert seems appropriate
instead of format variable.
Fixes: f1654fa7e3 "anv/android: support creating images from external format"
Signed-off-by: Nataraj Deshpande <nataraj.deshpande@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
This is the same as the need_dest parameter to
prepare_alu_destination_and_sources. This allows us to not change the
register that is expected to hold an result if an instruction is
re-emitted. This is particularly a problem if the re-emitted
instruction is a partial write. A later patch will use this feature.
No shader-db changes on any Intel platform.
v2: Don't do the Boolean resolve when there is no destination. If the
ALU instruction didn't write a register, there's nothing to resolve.
This replaces an earlier patch "intel/fs: Allocate dummy destination
register when need_dest is false".
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
There were two errors. First, the pass could propagate conditional
modifiers from an instruction that writes on flag register to an
instruction that writes a different flag register. For example,
cmp.nz.f0.0(16) null:F, vgrf6:F, vgrf5:F
cmp.nz.f0.1(16) null:F, vgrf6:F, vgrf5:F
could be come
cmp.nz.f0.0(16) null:F, vgrf6:F, vgrf5:F
Second, if an instruction writes f0.1 has it's condition propagated, the
modified instruction will incorrectly write flag f0.0. For example,
linterp(16) vgrf6:F, g2:F, attr0:F
cmp.z.f0.1(16) null:F, vgrf6:F, vgrf5:F
(-f0.1) discard_jump(16) (null):UD
could become
linterp.z.f0.0(16) vgrf6:F, g2:F, attr0:F
(-f0.1) discard_jump(16) (null):UD
None of these cases will occur currently. The only time we use f0.1 is
for generating discard intrinsics. In all those cases, we generate a
squence like:
cmp.nz.f0.0(16) vgrf7:F, vgrf6:F, vgrf5:F
(+f0.1) cmp.z(16) null:D, vgrf7:D, 0d
(-f0.1) discard_jump(16) (null):UD
Due to the mixed types and incompatible conditions, this sequence would
never see any cmod propagation. The next patch will change this.
No shader-db changes on any Intel platform.
v2: Fix typo in comment in test case subtract_delete_compare_other_flag.
Noticed by Caio.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>