The issue here was discovered by a set of Vulkan CTS tests:
dEQP-VK.glsl.derivate.*.dynamic_*
These tests use ballot ops to construct a branch condition that takes
the same path for each 2x2 quad but may not be uniform across the whole
subgroup. They then tests that derivatives work and give the correct
value even when executed inside such a branch. Because the derivative
isn't executed in uniform control-flow and the values coming into the
derivative aren't smooth (or worse, linear), they nicely catch bugs that
aren't uncovered by simpler derivative tests.
Unfortunately, these tests require Vulkan and the equivalent GL test
would require the GL_ARB_shader_ballot extension which requires int64.
Because the requirements for these tests are so high, it's not easy to
test on older hardware and the bug is only proven to exist on gen7;
gen4-6 are a conjecture.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Matt Turner <mattst88@gmail.com>
It'll grow further, and we'd like to avoid adding an additional
parameter to fs_generator() for each new piece of data.
v2 (idr): Rebase on 17 months. Track a visitor instead of a cfg.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Matt Turner <mattst88@gmail.com>
It's kind-of an anomaly that the Intel drivers are still treating
gl_FragCoord as an input. It also makes zero sense because we have to
special-case it in the back-end.
Because ANV is the only user of nir_lower_wpos_center, we go ahead and
just update it to look for nir_intrinsic_load_frag_coord as part of this
patch.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
On Haswell, the format works but it doesn't properly do an sRGB decode.
It appears to act identically to R8G8B8_UNORM. Only Vulkan uses this
format so this only affects Vulkan on HSW.
Cc: mesa-stable@lists.freedesktop.org
Reviewed-by: Eric Engestrom <eric@engestrom.ch>
As per previous commit, Meson doesn't support using uninstalled libs,
they're simply not ready until `ninja install` is ran, so delete them.
Suggested-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> # for anv
Reviewed-by: Eric Anholt <eric@anholt.net> # for tu
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> # for radv
It's totally implementable, it's just that the plumbing is a bit
different and we never hooked it up. Don't advertise a broken feature.
Fixes: 36ee2fd61c "anv: Implement the basic form of VK_EXT_transform_feedback"
This performs predicated MI_STORE_REGISTER_MEM commands, assuming that
the condition is already loaded into MI_PREDICATE_DATA.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
We might want to resolve something to be in a particular register,
so we can access it outside of the gen_mi framework...but it may already
be in that register, at which point there's no work to do.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Report VK_EXTERNAL_MEMORY_HANDLE_TYPE_HOST_ALLOCATION_BIT_EXT as
supported for images. It was being shown supported for buffers, but not
images.
Fixes: 69cc6272fb ("anv: Implement VK_EXT_external_memory_host")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This will effectively enable the optimization in anv.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This allows us to drop legacy userclip plane handling in both the vec4
and FS backends, and simplifies a few interfaces.
v2 (Jason Ekstrand):
- Move brw_nir_lower_legacy_clipping to brw_nir_uniforms.cpp because
it's i965-specific.
- Handle adding the params in brw_nir_lower_legacy_clipping
- Call brw_nir_lower_legacy_clipping from brw_codegen_vs_prog
Co-authored-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The rules for gl_SubgroupSize in Vulkan require that it be a constant
that can be queried through the API. However, all GL requires is that
it's a uniform. Instead of always claiming that the subgroup size in
the shader is 32 in GL like we have to do for Vulkan, claim 8 for
geometry stages, the maximum for fragment shaders, and the actual size
for compute.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Instead of lowering the subgroup size so early, wait until we have more
information. In particular, we're going to want different subgroup
sizes from different stages depending on the API. We also defer
lowering of subgroup masks because the ge/gt masks require the subgroup
size to generate a subgroup mask.
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>
Make sure that a <group> tag within another <group> tag work just fine.
v2: rename 'halfbyte' to 'byte' to match the size (Lionel).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Now we can decode a <group> tag inside another <group> tag, and properly
print its indices and content.
v2: Use push/pop stack to fields, groups and iters (Lionel).
v3: Add assert(iter->level < DECODE_MAX_ARRAY_DEPTH) (Lionel).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
We currently use the group->next pointer to iterate through the <group>
tags. This change them to be a type of field, so we can descend into
them while iterating, and then go back to the original position. Will be
useful when we want to decode <group>'s inside <group>'s, and when there
are more <field>'s after a <group> tag.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
A gen_group (group in most of the code) can be of several types:
- instruction
- struct
- register
- group (?!?)
The <group> tag actually represents an array of elements. So at least
in our code, lets call it an array to avoid confusion with gen_group.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Refactor the code from gen_spec_load_from_path() into a separate
function, that can be used with a xml file that doesn't fit the genX.xml
filename format.
Will be used soon for implementing unit tests for gen_decoder.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
When using gen_spec_load_from path, only abort decoding if the read
length is 0. Previously, we were aborting if finding an EOF, even if
something was read from the file.
Also only kill the decoded file if no commands or structs were found,
and print a message in such case.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
This doesn't fix any bug at the moment because the next statement is
'true' which happens to be APIMODE_D3D, but if that changes it could.
The fixes tags is as far I could go but the error predates it (2016 is
probably far enough).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 8db6f2e6eb ("anv/pipeline: Roll genX_pipeline_util.h into genX_pipeline.c")
Reviewed-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Normally, we haven't worried too much about stack sizes as Linux tends
to be fairly friendly towards large stacks. However, when running DXVK
apps under wine, we're suddenly subject to Windows' more stringent stack
limitations and can run out of space more easily. In particular, some
of the shaders in Elite Dangerous: Horizons have quite a few registers
and the arrays in split_virtual_grfs are large enough to blow a 1 MiB
stack leading to crashes during shader compilation.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108662
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Cc: mesa-stable@lists.freedesktop.org
anv_format is supposed to have a pointer back to the associated
VkFormat, we were missed this for depth/stencil formats.
This doesn't fix anything afaict, but will be needed for future
changes.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 465de47bad ("anv: associate vulkan formats with aspects")
Acked-by: Jason Ekstrand <jason@jlekstrand.net>
Spec says :
"timestampComputeAndGraphics specifies support for timestamps on all
graphics and compute queues. If this limit is set to VK_TRUE, all
queues that advertise the VK_QUEUE_GRAPHICS_BIT or
VK_QUEUE_COMPUTE_BIT in the VkQueueFamilyProperties::queueFlags
support VkQueueFamilyProperties::timestampValidBits of at least 36."
On gen7+ this should be true (we only have 32bits of timestamp on
gen6 and below).
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 802f00219a ("anv/device: Update features and limits")
Reported-by: Timothy Strelchun <timothy.strelchun@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
In many cases, the compiler can just copy-prop the strided MOV whereas
the conversion is a bit trickier. This cuts 5% of the instructions off
of one particular Vulkan CTS test which does lots of load_ssbo.
Reviewed-by: Matt Turner <mattst88@gmail.com>
This fixes some validation errors generated by certain D->W conversions
but is likely not a full solution. Calculating an actual register
stride is a far more complex problem in general and should probably be
handled by the brw_fs_generator.
Reviewed-by: Matt Turner <mattst88@gmail.com>
When running on ICL the
dEQP-VK.ssbo.phys.layout.random.16bit.scalar.13 needs more than 1M for
the shader, so bump it.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Now that the 64-bit lowering passes do a complete lowering in one go, we
don't need to loop anymore. We do, however, have to ensure that int64
lowering happens after double lowering because double lowering can
produce int64 ops.
Reviewed-by: Eric Anholt <eric@anholt.net>
In 6ce8592836 we started looking at the dynamic stencil state and
disabling stencil writes when the stencil mask is zero. Unfortunately,
we never updated the PMA fix code accordingly so 3DSTATE_WM_DEPTH_STENCIL
and the PMA fix were getting out-of-sync causing hangs.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109203
Fixes: 6ce8592836 "anv: Disable stencil writes when both write..."
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>