v2 (Jason):
- Use nir_spirv_supported_extensions to check if the feature is enabled.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
So far, input_reads was a bitmap tracking which vertex input locations
were being used.
In OpenGL, an attribute bigger than a vec4 (like a dvec3 or dvec4)
consumes just one location, any other small attribute. So we mark the
proper bit in inputs_read, and also the same bit in double_inputs_read
if the attribute is a dvec3/dvec4.
But in Vulkan, this is slightly different: a dvec3/dvec4 attribute
consumes two locations, not just one. And hence two bits would be marked
in inputs_read for the same vertex input attribute.
To avoid handling two different situations in NIR, we just choose the
latest one: in OpenGL, when creating NIR from GLSL/IR, any dvec3/dvec4
vertex input attribute is marked with two bits in the inputs_read bitmap
(and also in the double_inputs_read), and following attributes are
adjusted accordingly.
As example, if in our GLSL/IR shader we have three attributes:
layout(location = 0) vec3 attr0;
layout(location = 1) dvec4 attr1;
layout(location = 2) dvec3 attr2;
then in our NIR shader we put attr0 in location 0, attr1 in locations 1
and 2, and attr2 in location 3 and 4.
Checking carefully, basically we are using slots rather than locations
in NIR.
When emitting the vertices, we do a inverse map to know the
corresponding location for each slot.
v2 (Jason):
- use two slots from inputs_read for dvec3/dvec4 NIR from GLSL/IR.
v3 (Jason):
- Fix commit log error.
- Use ladder ifs and fix braces.
- elements_double is divisible by 2, don't need DIV_ROUND_UP().
- Use if ladder instead of a switch.
- Add comment about hardware restriction in 64bit vertex attributes.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We use *64*_PASSTHRU formats to upload vertex attributes of 64 bits
to avoid conversions. From the BDW PRM, Volume 2d, page 586
(VERTEX_ELEMENT_STATE):
"When SourceElementFormat is set to one of the *64*_PASSTHRU
formats, 64-bit components are stored in the URB without any
conversion. In this case, vertex elements must be written as 128
or 256 bits, with VFCOMP_STORE_0 being used to pad the output
as required. E.g., if R64_PASSTHRU is used to copy a 64-bit Red
component into the URB, Component 1 must be specified as
VFCOMP_STORE_0 (with Components 2,3 set to VFCOMP_NOSTORE)
in order to output a 128-bit vertex element, or Components 1-3 must
be specified as VFCOMP_STORE_0 in order to output a 256-bit vertex
element. Likewise, use of R64G64B64_PASSTHRU requires Component 3
to be specified as VFCOMP_STORE_0 in order to output a 256-bit vertex
element."
v2,v3 (Jason):
- Don't delete unused formats.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
SPIR-V does not have special opcodes for DF conversions. We need to identify
them by checking the bit size of the operand and the result.
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
This function returns the nir_op corresponding to the conversion between
the given nir_alu_type arguments.
This function lacks support for integer-based types with bit_size != 32
and for float16 conversion ops.
v2:
- Improve readiness of the code and delete cases that don't happen now (Jason)
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
We need to pick two 32-bit values per component to perform the right shuffle operation.
v2 (Jason):
- Add assert to check matching bit sizes (Jason)
- Simplify the code to pick components (Jason)
v3:
- Switch on bit_size once (Jason)
- Add comment to explain the constant value for unused components (Erik)
Signed-off-by: Samuel Iglesias Gonsálvez <siglesias@igalia.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
In Vulkan, we'll compile the TCS and TES at the same time, so I can just
pass the TCS output VUE map to brw_compile_tes as the TES input VUE map.
So, we only need to do this in GL. Move it to the GL-specific layer.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
The vertex order is either clockwise or counterclockwise. We can just
store a "ccw" boolean rather than GLenum values. I don't want to use
GLenums in a Vulkan driver, and even in GL a simple boolean works fine.
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
I apparently broke mark_whole_variable in ir_set_program_inouts.
It was passing a type that wasn't var->type, so the wrapper didn't
work out. It's all broken, revert it and start over.
Fixes all kinds of things on other drivers.
Revert "glsl: Make is_fixed_function_array actually check for varyings."
This reverts commit 42699e1271.
Revert "glsl: Mark whole variable used for ClipDistance and TessLevel*."
This reverts commit 5c580e64cc.
Revert "glsl: Override the # of varying slots for ClipDistance and TessLevel*."
This reverts commit 8b5749f65a.
Revert "glsl: Create and use a new ir_variable::count_attribute_slots() wrapper."
This reverts commit 6aa5cb34d0.
We can't check VARYING_SLOT_* locations until we've determined that
the variable is actually a varying.
Fixes assert failures in drivers which actually use this path,
such as radeonsi and i915.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99314
Signed-off-by: Kenneth Graunke <kenneth@whitecape.org>
Marking operations as redundant if they are equal to the base
range is fine when the tree structure is something like this:
max
/ \
max b
/ \
3 max
/ \
3 a
But the opt falls apart with a tree like this:
max
/ \
max max
/ \ / \
3 a b 3
The problem is that both branches are treated the same: descending in
the left branch will prune the constant, and then descending the right
branch will prune the constant there as well, because limits[0] wasn't
updated to take the change on the left branch into account, and so we
still get [3,\infty) as baserange.
In order to fix the bug we just disable the marking of redundant expressions
when they match the baserange.
NIR algebraic opt will clean up the first tree for anyway, hopefully
other backends are smart enough to do this also.
Cc: "13.0" <mesa-stable@lists.freedesktop.org>
Reviewed-by: Nicolai Hähnle <nicolai.haehnle@amd.com>
We run this after nir_lower_vars_to_ssa so that as many load/store_var
intrinsics as possible before copy_prop_vars executes. This is because the
pass isn't particularly efficient (it does a lot of linear walks of a
linked list) so we'd like as much of the work as possible to be done before
copy_prop_vars runs.
Shader DB results on Sky Lake:
total instructions in shared programs: 12020290 -> 12013627 (-0.06%)
instructions in affected programs: 26033 -> 19370 (-25.59%)
helped: 16
HURT: 13
total cycles in shared programs: 137772848 -> 137549012 (-0.16%)
cycles in affected programs: 6955660 -> 6731824 (-3.22%)
helped: 217
HURT: 237
total loops in shared programs: 3208 -> 3208 (0.00%)
loops in affected programs: 0 -> 0
helped: 0
HURT: 0
total spills in shared programs: 4112 -> 4057 (-1.34%)
spills in affected programs: 483 -> 428 (-11.39%)
helped: 2
HURT: 0
total fills in shared programs: 5519 -> 5102 (-7.56%)
fills in affected programs: 993 -> 576 (-41.99%)
helped: 2
HURT: 0
LOST: 0
GAINED: 0
Broadwell had similar results. On older hardware, the impact isn't as
large because they don't advertise GL 4.5. Of the hurt programs, all but
one are hurt by a single instruction and the one is hurt by 3 instructions.
All of the helped programs, on the other hand, are helped by at least 3
instructions and one kerbal space program shader is helped by 44.59%.
The real star of the show, however, is the Gl43CSDof synmark2 benchmark
which has two shaders which are cut by 28% and 40% and the over-all runtime
performance of the benchmark on my Sky Lake laptop is improved by around
25-30% (it's a bit hard to be exact due to thermal throttling).
Reviewed-by: Timothy Arceri <timothy.arceri@collabora.com>
Because border color is handled pre-swizzle, when we move the alpha
channel around in the format, the OPAQUE_BLACK border colors don't work
correctly on B4G4R4A4_UNORM_PACK16 with the hack. This fixes the
following Vulkan CTS tests on Broadwell:
dEQP-VK.pipeline.sampler.view_type.2d_array.format.b4g4r4a4_unorm_pack16.address_modes.all_mode_clamp_to_border_opaque_black
dEQP-VK.pipeline.sampler.view_type.1d_array.format.b4g4r4a4_unorm_pack16.address_modes.all_mode_clamp_to_border_opaque_black
dEQP-VK.pipeline.sampler.view_type.2d.format.b4g4r4a4_unorm_pack16.address_modes.all_mode_clamp_to_border_opaque_black
dEQP-VK.pipeline.sampler.view_type.1d.format.b4g4r4a4_unorm_pack16.address_modes.all_mode_clamp_to_border_opaque_black
dEQP-VK.pipeline.sampler.view_type.3d.format.b4g4r4a4_unorm_pack16.address_modes.all_mode_clamp_to_border_opaque_black
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Cc: "13.0" <mesa-stable@lists.freedesktop.org>