On ICL we have the src1 restriction which is applied through
fix_byte_src() and potentially changes the type of the operands from 8
to 32 bits. When this change happens, we fall into the "else if
(bit_size < 32)" case and miscompute src_type because it takes into
consideration bit_size (8) instead of the adjusted size of temp_op
(32). This results in the shader reading unused memory, giving us
mostly failures, but occasional passes due to whatever was already in
the registers we were reading.
This commit fixes a lot of dEQP subgroup i8vec2 tests on ICL, such as:
dEQP-VK.subgroups.arithmetic.compute.subgroupadd_i8vec2
This can also be verified by simply changing fix_byte_src() to apply
on all platforms.
Fixes: 5847de6e9a ("intel/compiler: don't use byte operands for src1 on ICL")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
This was causing uninitialized value to end up propagated to the
3DSTATE_DEPTH_BOUNDS packet, leading to asserts on packet
building due to the value being greater than 1.
Fixes: 939ddccb7a ("anv: Add support for depth bounds testing.")
Reviewed-by: Plamena Manolova <plamena.manolova@intel.com>
Pretty routine, we do have a hack to force swizzle alignment for !32-bit
for until we implement !32-bit the right way.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
This is less complicated than previously thought. Note we have no way of
specifying the work register count for blend shaders; it must be
strictly less than the work register count of the corresponding fragment
shader (which is fine since we force the fragment shader to report a
count of 16 with a blend shader as a major hack until we get register
pressure down for blend shaders).
TODO: pandecode the flags.
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@collabora.com>
Close input end of the pipe after data was written. Without this
fix I have seen a hang in sysfs_uevent_get(.., "OF_FULLNAME")
when key was not found.
Signed-off-by: Christian Gmeiner <christian.gmeiner@gmail.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We can clear the "needs" flags once we emit a flag. And also, don't
open-code the opcode name.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
For pre-fs-dispatch texture fetch, we need to assign bary_ij to r0.x,
even if it is not used in the shader (ie. only varying use is for tex
coords). But if, for example, gl_FragCoord is used, it could get
assigned on top of bary_ij, resulting in a GPU hang.
The solution to this is two-fold: (1) the inputs/outputs rework has the
benefit of making RA realize bary_ij is a vec2, even if there are no
split/collect instructions (due to no varying fetches in the shader
itself). And (2) extend the live ranges of meta:input instructions to
the first non-input, to prevent RA from assigning the same register to
multiple inputs.
Backport note: because of (1) above, a better solution for 19.3 would be
to revert f30c256ec0.
Fixes: f30c256ec0 ("freedreno/ir3: enable pre-fs texture fetch for a6xx")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
At the ir3 level, we would assume that we could use wrmask to mask
off other components of an instruction returning a vecN when they are
not used. Which would let RA use components not written for other live
values. But this is only true for tex instructions.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Allow inputs/outputs to be vecN (ie. whatever their actual size is), and
use split to get scalar components of inputs, and collect to gather up
scalar components of outputs.
The main motivation is to simplify RA, by only having to consider split/
collect to figure out where values need to land in consecutive scalar
registers, rather than having to also deal with left/right neighbors.
Because of varying packing, and the resulting fractional location
(location_frac), to implement load_input/store_output, it is still
convenient to have a table of scalar inputs/outputs. We move this to
the compile ctx (since it is only needed for nir->ir3).
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
In almost all places, the add_sysval_input() is paired directly with a
create_input(). (The one exception is frag shader ij bary coord, and
this exception will go away in a later patch.)
So go ahead and clean this up before reworking input/output handling.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This is a driver-param (loaded from uniform), not a sysval (populated by
hw into a register). So it has no value to having a sysval slot.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We keep kill's alive w/ keeps these days, rather than a fake output.
This condition was left over from prior to that change.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
If I'm going to refactor a bit to use these meta instructions to also
handle input/output, then might as well cleanup the names first.
Nouveau also uses collect/split for names of these meta instructions,
and I like those names better.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This doesn't really work, we can't necessarily just change the outputs
to half-precision like this in anything but simple cases.
Keep the shader key entry around though, eventually with proper mediump
support we could use this with a nir pass to use lower precision frag
shader outputs when the render target format has <= 16b/component.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
The instruction has 3 src regs, so `instr->regs[0..3]` are valid, but
`instr->regs[4]` is not.
```
Test case 'dEQP-GLES31.functional.shaders.linkage.es31.tessellation.varying.rules.output_superfluous_declaration'..
==29239== Invalid read of size 8
==29239== at 0x5BE9CDC: emit_cat6 (ir3.c:841)
==29239== by 0x5BEA1BF: ir3_assemble (ir3.c:921)
==29239== by 0x5BDF0A7: ir3_shader_assemble (ir3_shader.c:133)
==29239== by 0x5BDF193: assemble_variant (ir3_shader.c:162)
==29239== by 0x5BDF407: create_variant (ir3_shader.c:215)
==29239== by 0x5BDF4DB: shader_variant (ir3_shader.c:241)
==29239== by 0x5BDF553: ir3_shader_get_variant (ir3_shader.c:257)
==29239== by 0x5BA85F7: ir3_shader_variant (ir3_gallium.c:80)
==29239== by 0x5BA7703: ir3_cache_lookup (ir3_cache.c:96)
==29239== by 0x5B8B8B3: fd6_emit_get_prog (fd6_emit.h:119)
==29239== by 0x5B8C137: fd6_draw_vbo (fd6_draw.c:186)
==29239== by 0x5BB1FBB: fd_draw_vbo (freedreno_draw.c:290)
==29239== Address 0xb97f2d0 is 0 bytes after a block of size 240 alloc'd
==29239== at 0x4848D54: malloc (in /usr/lib/aarch64-linux-gnu/valgrind/vgpreload_memcheck-arm64-linux.so)
==29239== by 0x61BD35B: ralloc_size (ralloc.c:119)
==29239== by 0x61BD41B: rzalloc_size (ralloc.c:151)
==29239== by 0x5BE599B: ir3_alloc (ir3.c:45)
==29239== by 0x5BEA583: instr_create (ir3.c:984)
==29239== by 0x5BEA5DF: ir3_instr_create2 (ir3.c:1000)
==29239== by 0x5BEE317: ir3_STLW (ir3.h:1431)
==29239== by 0x5BF12D3: emit_intrinsic_store_shared_ir3 (ir3_compiler_nir.c:903)
==29239== by 0x5BF418B: emit_intrinsic (ir3_compiler_nir.c:1802)
==29239== by 0x5BF5D07: emit_instr (ir3_compiler_nir.c:2339)
==29239== by 0x5BF603F: emit_block (ir3_compiler_nir.c:2426)
==29239== by 0x5BF624B: emit_cf_list (ir3_compiler_nir.c:2474)
==29239==
```
Probably this only triggers in non-optimized builds?
Fixes: 1f3b52ce50 ("freedreno/a6xx: Add register offset for STG/LDG")
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
We can end up with scenarios where last_fence is associated with a batch
that is flushed through some other path before needs_out_fence_fd gets
set. Resulting in returning a fence that has no backing fd.
The simplest thing is to just skip the optimization to try and avoid
no-op batches when a fence-fd is requested. This should normally be
just once a frame anyways.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Kristian H. Kristensen <hoegsberg@google.com>
If the data is uniform, then it's really a uniform copy. If the index is
uniform, then it's really a read_invocation.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Seems we can use DPP's row_mask field to get an effect similar to
modifying exec.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
It seems whatever was causing this is no longer an issue. So let's get
rid of the hack here.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
The automatically generated padding in structs contains
undefined values, force pack the structs to eliminate the
padding. Otherwise structs with the same values may generate
different hashes.
Valgrind output:
Conditional jump or move depends on uninitialised value(s)
util_fast_urem32 (fast_urem_by_const.h:71)
hash_table_search (hash_table.c:262)
_mesa_hash_table_search (hash_table.c:296)
anv_pipeline_cache_search_locked (anv_pipeline_cache.c:318)
anv_pipeline_cache_search (anv_pipeline_cache.c:335)
lookup_blorp_shader (anv_blorp.c:38)
blorp_params_get_mcs_partial_resolve_kernel (blorp_clear.c:1112)
blorp_mcs_partial_resolve (blorp_clear.c:1205)
anv_image_mcs_op (anv_blorp.c:1742)
anv_cmd_predicated_mcs_resolve (genX_cmd_buffer.c:774)
transition_color_buffer (genX_cmd_buffer.c:1159)
cmd_buffer_end_subpass (genX_cmd_buffer.c:4840)
Uninitialised value was created by a stack allocation
blorp_params_get_mcs_partial_resolve_kernel (blorp_clear.c:1103)
Signed-off-by: Danylo Piliaiev <danylo.piliaiev@globallogic.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Otherwise, we fail validation and potentially generate invalid code.
Let's fix up the mode of the accesses to the variable.
Signed-off-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Reviewed-by: Caio Marcelo de Oliveira Filho <caio.oliveira@intel.com>