There are a lot of optimizations in opt_algebraic that match ('ine', a,
0), but there are almost none that match i2b. Instead of adding a huge
pile of additional patterns (including variations that include both ine
and i2b), always lower i2b to a != 0.
At this point in the series, it should be impossible for anything to
generate i2b, so there /should not/ be any changes.
The failing test on d3d12 is a pre-existing bug that is triggered by
this change. I talked to Jesse about it, and, after some analysis, he
suggested just adding it to the list of known failures.
v2: Don't rematerialize i2b instructions in dxil_nir_lower_x2b.
v3: Don't rematerialize i2b instructions in zink_nir_algebraic.py.
v4: Fix zink-on-TGL CI failures by calling nir_opt_algebraic after
nir_lower_doubles makes progress. The latter can generate b2i
instructions, but nir_lower_int64 can't handle them (anymore).
v5: Add back most of the hunk at line 2125 of nir_opt_algebraic.py. I
had accidentally removed the f2b(bf2(x)) optimization.
v6: Just eliminate the i2b instruction.
v7: Remove missed i2b32 in midgard_compile.c. Remove (now unused)
emit_alu_i2orf2_b1 function from sfn_instr_alu.cpp. Previously this
function was still used. 🤷
No shader-db changes on any Intel platform.
All Intel platforms had similar results. (Ice Lake shown)
Instructions in all programs: 141165875 -> 141165873 (-0.0%)
Instructions helped: 2
Cycles in all programs: 9098956382 -> 9098956350 (-0.0%)
Cycles helped: 2
The two Vulkan shaders are helped because of the "new" (('b2i32',
('ine', ('ubfe', a, b, 1), 0)), ('ubfe', a, b, 1)) algebraic pattern.
Acked-by: Jesse Natalie <jenatali@microsoft.com> [earlier version]
Acked-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Tested-by: Daniel Schürmann <daniel@schuermann.dev> [earlier version]
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/15121>
With descriptor buffers, we have no way to know how big the descriptor
set actually is, so we have no idea how many constants we can safely
push. If we use a UBO then it will still get pushed, because we normally
assume that we can freely access UBOs without any fear of faults due to
the range checking. This does the easiest thing of using raw pointer
loads, although performance will fall off a cliff, because we don't have
many better options.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19849>
The immediate offset is in units of bytes, whereas the register offset
is in dwords. We need to compensate for that.
Also, fix an off-by-one when checking the range - the offset field is 13
bits, but the sign bit means we can only represent up to 1 << 12 in
bytes or 1 << 10 in dwords.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19849>
FS inputs are not directly loaded into regs, but require additional
instruction to do so. So in order to print in which reg the input
is loaded we have to scan the shader for the instruction
which loads the input.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20247>
While running VK-CTS with valgrind, the application hit the max
thread count of 500. After further investigation, this was due to
multiple instances being created with the disk cache spinning up
worker threads which wouldn't be cleaned as disk_cache_destroy
wasn't being called.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20178>
The `coord_offset` pass is responsible for upgrading any eligible
texture loads into prefetches, but a texture prefetch's capabilities
are limited and cannot handle any interpolation modes aside from
`smooth`.
An exception is carved out for `flat` interpolation modes, but this
doesn't exclude upgrading `noperspective` texture loads and results
in perspective-corrected samples being provided that can severely
break applications depending on this behaviour.
Fixes incorrect lighting projection on Super Mario Odyssey on
Skyline Emulator.
Fixes incorrect dirt texture mapping on Portal 2 trace on Turnip and
Zink on Turnip.
Fixes incorrect lighter shadowing on Half Life 2 trace on Turnip.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19842>
`coord_offset` is called on the source of `alu` instructions and
it returns -1 for failures, this not explicitly checked for and
as a result the fetch can incorrectly be upgraded to a prefetch
when it isn't appropriate to do so.
Signed-off-by: Mark Collins <mark@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19842>
Otherwise, making a CS using the memory will use the uninitialized .map
value (when checking the size of the CS in in begin's tu_cs_is_empty()
check), causing valgrind noise in
dEQP-VK.binding_model.descriptorset_random.sets4.dynindexed.ubolimitlow.sbolimitlow.sampledimghigh.lowimgsingletex.iublimitlow.nouab.vert.noia.0
(thanks to vi_info->vertexBindingDescriptionCount==0).
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20173>
When a pipeline has dynamic logicop state or blend state, we defer lrz
write decision to tu6_calculate_lrz_state. As such,
tu6_calculate_lrz_state should look at both states when either of them
is dynamic.
Fixes dEQP-GLES2.functional.fragment_ops.interaction.basic_shader.21 on
angle, which uses dynamic logicop state and static blend state with
blending enabled.
Fixes: c8c7154c2e ("tu: Implement extendedDynamicState3ColorBlendEnable")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20136>
STL/LDL have 13 bits to store imm offset. However the most significant
bit in the offset is a sign bit, so the positive offset is limited by
12 bits.
nir_opt_offsets only has the upper limit and doesn't deal with
negative offsets, so shared_max should be changed to `(1 << 12) - 1`.
The issue was found in "Monster Hunter: World".
Fixes: 0b2da9d795
("ir3: Limit the maximum imm offset in nir_opt_offset for shared vars")
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20100>
This could result in hangs if the entire descriptor set was inline
uniforms. Fixes
dEQP-VK.binding_model.descriptorset_random.sets4.dynindexed.ubolimitlow.nosbo.nosampledimg.outimgonly.iublimitlow.nouab.comp.noia.0
after 0a0a04bd made us prefetch descriptors again and uncovered this.
Fixes: 37cde2c6 ("tu: Rewrite inline uniform implementation")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20101>
This probably wasn't noticed earlier because tests using sRGB storage
images didn't exist, and we didn't know whether this works, but this
fixes dEQP-VK.image.store.without_format.2d.*_srgb which also proves
that the bit works.
Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20060>
It's not in the key, so it randomly may or may not be present, and if it
is present then we don't actually save/restore the contents, so we will
save/restore random pointer values from the last run. Turnip already
disables searching the shader cache when assembly is requested, but
still wrote the final ir3_shader_variant which resulted in trying to
save random stale pointers when saving off the executable if a
subsequent compile hit that cache entry.
This fixes flakes in
dEQP-VK.pipeline.pipeline_library.shader_module_identifier.pipeline_from_id.*
for me.
Fixes: 56909868cd ("turnip: implement VK_KHR_pipeline_executable_properties")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20056>
We're reporting 2048 for VkPhysicalDeviceLimits::maxImageArrayLayers on
Turnip, so we should be able to use 2048 for OpenGL as well. And that's
the minimum required value for OpenGL 4.1 support.
According to http://vulkan.gpuinfo.org/, it seems like values of 2048
should be possible for at least as low as some Adreno 4xx GPUs. But
since we don't support recent GL versions on those, we this won't make a
big difference. So let's leave that up to someone who actually knows
what they're doing!
Acked-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19780>
This seems to have been triggered by some recent CTS changes which
changed the random number generation. I'm seeing context faults in
dEQP-VK.binding_model.descriptorset_random.sets4.dynindexed.ubolimitlow.sbolimitlow.sampledimghigh.lowimgnotex.iublimitlow.nouab.comp.noia.0
that are fixed by this.
Fixes: 37cde2c634 ("tu: Rewrite inline uniform implementation")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20039>
The source code for rddecompiler includes adreno_common.xml.h, which is
a generated header. In order to ensure that the header has been written
when compiling rddecompiler.c, we need a dependency here.
Fixes: 03d80e0a6d ("freedreno/decode: Add 'rddecompiler' tool")
Acked-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20003>
Decompiles a single cmdstream from .rd into compilable C source.
Given the address space bounds the generated program creates
a new .rd which could be used to override cmdstream with 'replay'.
Generated .rd is not replayable on its own and depends on buffers
provided by the source .rd.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19444>
This adds an option to override a single cmdstream while replaying
.rd capture. Cmdstream for override is stored in the same .rd format,
where there is only one RD_CMDSTREAM_ADDR section and any amount
of buffers.
Instead of using provided .rd file, 'replay' calls external program
to generate it first, in order to be able to pass the range of
GPU addresses available for the new buffers.
Usage example:
./replay --override=13 --generator=~/cmdstream_gen
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19444>