Indirect addressing (vx1 and vxh) is not supported with the UB/B data
types for src0, so we need to change the data type of both dest and
src0.
This fixes the following test cases on Xe2+:
- dEQP-VK.spirv_assembly.instruction.compute.8bit_storage.push_constant_8_to_16*
- dEQP-VK.spirv_assembly.instruction.compute.8bit_storage.push_constant_8_to_32*
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29316>
We can do CSEL on F, HF, *W, and *D on Gfx11+. Gfx9 can only do F.
We can lower unsupported types to CMP+CSEL, allowing us to use CSEL
in the IR and not worry about the limitations.
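A minimal sketch of the type check this implies. The enum and function
names are hypothetical (not the real compiler API), and treating
everything before Gfx11 like Gfx9 is an assumption:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical register types, for illustration only. */
enum reg_type {
   TYPE_F, TYPE_HF, TYPE_W, TYPE_UW, TYPE_D, TYPE_UD,
   TYPE_B, TYPE_UB, TYPE_Q,
};

/* True if CSEL can operate on `type` directly on graphics version `ver`. */
bool csel_type_supported(int ver, enum reg_type type)
{
   assert(ver >= 9); /* assumption: the sketch only covers Gfx9 and later */

   if (ver < 11)
      return type == TYPE_F; /* assumption: pre-Gfx11 behaves like Gfx9 */

   switch (type) {
   case TYPE_F:
   case TYPE_HF:
   case TYPE_W:
   case TYPE_UW:
   case TYPE_D:
   case TYPE_UD:
      return true;
   default:
      return false;
   }
}

/* True if a CSEL of this type would have to be lowered to CMP+CSEL. */
bool csel_needs_lowering(int ver, enum reg_type type)
{
   return !csel_type_supported(ver, type);
}
```

With a check like this, IR producers can emit CSEL unconditionally and
leave the decision to a later lowering step.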
Rework: (Sagar)
- Update validation pass for CSEL
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29316>
Coverity has spotted a place where we could in theory overflow. In
reality it won't happen as the potential overflow is a bitfield with a
maximum of two values. Add an `assume()` statement to help out the
compiler and document our assumption.
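The pattern looks roughly like the sketch below. `ASSUME` is a local
stand-in macro built on GCC/Clang builtins, and the struct and function
are hypothetical; this is not the code touched by the patch:

```c
#include <assert.h>

/* Stand-in for an assume()-style macro (assumption: GCC/Clang). In debug
 * builds it asserts, in release builds it only informs the optimizer. */
#define ASSUME(expr) \
   ((expr) ? (void)0 : (assert(!"assumption failed"), __builtin_unreachable()))

/* Hypothetical state with a one-bit field, so it holds only 0 or 1. */
struct state {
   unsigned slot : 1;
};

int pick(const struct state *s, const int table[2])
{
   unsigned idx = s->slot;

   /* Document that idx can never index past the two-entry table. */
   ASSUME(idx < 2);
   return table[idx];
}
```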
Fixes: dc1aedef2b
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29825>
Reorganize the code to make all the lowering cases clearer:
(a) Single invocation workgroup. Index and IDs are all zero.
(b) Local ID provided by hardware.
(c) Local Index provided by the hardware. Depending on the case this
might not be the final local index, e.g. heuristics for tile.
(d) Neither provided by the hardware.
Case (c) is new and supported by Mesh/Task shaders. At the moment
nir_lower_compute_system_values handles the lowering of LocalID for
Task/Mesh, but a later patch will flip that on ANV.
This will make the Task/Mesh use the same lowering as Compute shaders.
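The four cases can be sketched as a simple classification. All names
here are hypothetical, and the index-to-ID helper assumes a plain linear
layout (it ignores any tiling heuristics mentioned for case (c)):

```c
#include <assert.h>
#include <stdbool.h>

enum lowering_case {
   CASE_SINGLE_INVOCATION, /* (a) index and IDs are all zero */
   CASE_HW_LOCAL_ID,       /* (b) local ID provided by hardware */
   CASE_HW_LOCAL_INDEX,    /* (c) local index provided by hardware */
   CASE_NEITHER,           /* (d) neither provided by hardware */
};

/* Hypothetical description of what the hardware hands to the shader. */
struct wg_info {
   unsigned size[3];
   bool hw_local_id;
   bool hw_local_index;
};

enum lowering_case classify(const struct wg_info *wg)
{
   if (wg->size[0] * wg->size[1] * wg->size[2] == 1)
      return CASE_SINGLE_INVOCATION;
   if (wg->hw_local_id)
      return CASE_HW_LOCAL_ID;
   if (wg->hw_local_index)
      return CASE_HW_LOCAL_INDEX;
   return CASE_NEITHER;
}

/* Assumption: linear layout, X varies fastest, then Y, then Z. */
void linear_index_to_id(unsigned index, const unsigned size[3], unsigned id[3])
{
   assert(size[0] > 0 && size[1] > 0);
   id[0] = index % size[0];
   id[1] = (index / size[0]) % size[1];
   id[2] = index / (size[0] * size[1]);
}
```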
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29828>
It looks like some of the tests use a bias that does not fit into a
half-float parameter, so it's better to use a float parameter for
sample_b. If we have cube arrays, we combine BIAS and array index
properly anyway, so we don't have to worry about the first parameter.
This fixes: GTF-GL46.gtf21.GL3Tests.texture_lod_bias.texture_lod_bias_clamp_m_g_M
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29533>
Function anv_physical_device_try_create() creates the devinfo variable
and then at some point it copies its contents to device->info:
device->info = devinfo;
Much much later we're calling:
intel_common_update_device_info(fd, &devinfo);
... which is updating devinfo but not device->info. As a consequence,
we're only creating one queue, as engine_class_supported_count[klass]
is zero for everybody.
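The problem reduces to updating a struct after it has already been
copied by value. A self-contained sketch with hypothetical stand-in
types and a made-up engine count; the "fixed" variant shows one possible
ordering, not necessarily the one used by the patch:

```c
#include <assert.h>

/* Hypothetical stand-ins for the device info and device structures. */
struct dev_info {
   int engine_count;
};

struct device {
   struct dev_info info;
};

/* Stand-in for the late update call; the value 4 is made up. */
void update_info(struct dev_info *info)
{
   info->engine_count = 4;
}

int create_buggy(struct device *dev)
{
   assert(dev);
   struct dev_info devinfo = { 0 };

   dev->info = devinfo;   /* copy by value */
   update_info(&devinfo); /* only the local variable is updated */
   return dev->info.engine_count; /* still 0 */
}

int create_fixed(struct device *dev)
{
   assert(dev);
   struct dev_info devinfo = { 0 };

   dev->info = devinfo;
   update_info(&dev->info); /* update the copy the device actually uses */
   return dev->info.engine_count;
}
```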
Fixes: 5b8b4f7878 ("intel/dev: Add engine_class_supported_count to intel_device_info")
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29927>
When the destination of both instructions is NULL and the conditional
modifier matches, operands_match (by way of instructions_match) will
only test the first two operands. This can result in bad CSE
happening.
This is a very, very narrow edge case.
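A sketch of the difference, using a hypothetical instruction struct
where sources are plain integers and the opcode is assumed to be fully
commutative in all three sources:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical 3-source instruction; sources are identified by an int. */
struct inst {
   int src[3];
};

/* Buggy variant: only the first two sources are compared (in either
 * order), so a differing third source goes unnoticed. */
bool match_two_only(const struct inst *a, const struct inst *b)
{
   assert(a && b);
   return (a->src[0] == b->src[0] && a->src[1] == b->src[1]) ||
          (a->src[0] == b->src[1] && a->src[1] == b->src[0]);
}

/* Compare all three sources under every permutation. */
bool match_commutative_3src(const struct inst *a, const struct inst *b)
{
   static const int perm[6][3] = {
      { 0, 1, 2 }, { 0, 2, 1 }, { 1, 0, 2 },
      { 1, 2, 0 }, { 2, 0, 1 }, { 2, 1, 0 },
   };

   assert(a && b);
   for (int p = 0; p < 6; p++) {
      if (a->src[0] == b->src[perm[p][0]] &&
          a->src[1] == b->src[perm[p][1]] &&
          a->src[2] == b->src[perm[p][2]])
         return true;
   }
   return false;
}
```

When the destination is not NULL, a differing destination would
normally keep such instructions apart, which is why the two-source
comparison only misfires in the narrow case described above.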
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29848>
This one is a bit more complex in that we need to handle 3-source
commutative opcodes. But it's also quite useful:
fossil-db results on Alchemist (A770):
Instrs: 151659750 -> 150164959 (-0.99%); split: -0.99%, +0.01%
Cycles: 12822686329 -> 12574996669 (-1.93%); split: -2.05%, +0.12%
Subgroup size: 7589608 -> 7589592 (-0.00%)
Send messages: 7375047 -> 7375053 (+0.00%); split: -0.00%, +0.00%
Loop count: 46313 -> 46315 (+0.00%); split: -0.01%, +0.01%
Spill count: 110184 -> 54670 (-50.38%); split: -50.79%, +0.41%
Fill count: 213724 -> 104802 (-50.96%); split: -51.43%, +0.47%
Scratch Memory Size: 9406464 -> 3375104 (-64.12%); split: -64.35%, +0.23%
Our older Shadow of the Tomb Raider fossil is particularly helped with
over a 90% reduction in scratch access (spills, fills, and scratch
size). However, benchmarking in the actual game shows no change in
performance. We're thinking the game's shaders have been updated since
our capture.
Ian noted that there was a bug here where we'd accidentally CSE two ADD3
instructions with null destinations and different src[2] that couldn't
be dead code eliminated due to conditional mods. However, this is only
a bug in the new cse_defs pass so we don't need to nominate this for
stable branches.
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29848>
This fixes corruption of push constants on Xe2 due to a mismatch in
the uniform layout implemented by the compiler and assumed by the
driver. To fix it we need to align the push constant ranges computed
by the Vulkan driver to a multiple of the GRF size of the platform.
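A minimal sketch of such an alignment, assuming ranges are expressed in
bytes and that both ends of the range are widened to GRF boundaries. The
struct and function names are hypothetical, and the GRF size is passed
in rather than looked up from the device:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical push constant range, in bytes. */
struct range {
   uint32_t start;
   uint32_t length;
};

/* Widen `r` so that it starts and ends on a multiple of `grf_size`. */
struct range align_range_to_grf(struct range r, uint32_t grf_size)
{
   /* The mask arithmetic below requires a power-of-two size. */
   assert(grf_size != 0 && (grf_size & (grf_size - 1)) == 0);

   uint32_t start = r.start & ~(grf_size - 1);
   uint32_t end = (r.start + r.length + grf_size - 1) & ~(grf_size - 1);

   return (struct range){ start, end - start };
}
```

For example, a range starting at byte 32 with length 40 becomes
(0, 128) for a 64-byte register and (32, 64) for a 32-byte register.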
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29926>
This implements a replacement for the previous implementation of
nir_intrinsic_load_barycentric_at_sample that relied on the Pixel
Interpolator shared function, since it's going to be removed from the
hardware from Xe2 onwards.
This implementation simply looks up the X/Y offsets of each sample
index on the table provided in the PS thread payload by using indirect
addressing, then does the actual interpolation by recursing into
emit_pixel_interpolater_alu_at_offset() introduced in the previous
commit.
Note that even though this is only immediately useful on Xe2+
platforms, there's no reason why it shouldn't work on earlier
platforms, as long as we have the sample X/Y offsets available in the
thread payload.
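In scalar form, the approach amounts to a table lookup followed by a
call into the at_offset path. The table contents, the four-sample count
and the stand-in interpolation function below are all made up for
illustration; the real table comes from the PS thread payload:

```c
#include <assert.h>

struct offset2d {
   float x, y;
};

/* Hypothetical per-sample X/Y offsets; the values are made up. */
static const struct offset2d sample_offsets[4] = {
   { -0.125f, -0.375f },
   {  0.375f, -0.125f },
   { -0.375f,  0.125f },
   {  0.125f,  0.375f },
};

/* Stand-in for the at_offset interpolation: an arbitrary plane. */
float interpolate_at_offset(struct offset2d off)
{
   return 2.0f * off.x + 4.0f * off.y + 1.0f;
}

/* at_sample: look up the sample's offset, then reuse at_offset. */
float interpolate_at_sample(unsigned sample_id)
{
   assert(sample_id < 4);
   return interpolate_at_offset(sample_offsets[sample_id]);
}
```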
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29847>
This implements a replacement for the previous implementation of
nir_intrinsic_load_barycentric_at_offset that relied on the Pixel
Interpolator shared function, since it's going to be removed from the
hardware from Xe2 onwards.
That's okay since we can get all the primitive setup information
needed for interpolation at an arbitrary coordinate: We use the X/Y
offset relative to the "X/Y Start" coordinates from the thread payload
in order to evaluate the plane equations also provided in the thread
payload for each barycentric coordinate of each polygon. The
evaluation of the barycentric plane equations (and the RHW plane
equation for perspective-correct interpolation) uses the accumulator
and MAD/MAC for ALU efficiency, but that means we need to manually
split instructions to fit the width of the accumulator. The division
and scaling for perspective-correct interpolation are also now done in
the shader if necessary.
Note that even though this is only immediately useful on Xe2+, the
thread payload numbers are filled out for older platforms, and the EU
restrictions of previous Xe platforms are taken into account, mostly
for the purposes of testing and performance evaluation.
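A scalar sketch of the plane evaluation described above. The plane
layout (a, b, c) and all names are assumptions, and the perspective
step uses the textbook identity (divide the interpolated value by the
interpolated 1/W); the exact scaling used by the hardware planes is not
spelled out here:

```c
#include <assert.h>

/* Hypothetical plane equation: value(dx, dy) = a * dx + b * dy + c. */
struct plane {
   float a, b, c;
};

float eval_plane(struct plane p, float dx, float dy)
{
   return p.a * dx + p.b * dy + p.c;
}

/* Evaluate a barycentric plane at (x, y), relative to the start
 * coordinates the plane is defined against. */
float bary_at(struct plane p, float start_x, float start_y, float x, float y)
{
   return eval_plane(p, x - start_x, y - start_y);
}

/* Assumption: perspective correction divides by the interpolated RHW. */
float perspective_divide(float bary_over_w, float rhw)
{
   assert(rhw != 0.0f);
   return bary_over_w / rhw;
}
```

The real implementation evaluates these expressions per channel with
MAD/MAC on the accumulator; the sketch only shows the arithmetic.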
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29847>
Floating-point offsets work fine in combination with the
floating-point arithmetic we're about to lower these intrinsics into,
and they require fewer instructions than converting to fixed-point and
then back. No reason to take the precision/range hit nor the extra
instructions.
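To illustrate the precision/range hit, here is a round trip through a
signed fixed-point format with 4 fractional bits clamped to [-8, 7].
That particular format is an assumption made for the example, and the
function names are hypothetical:

```c
#include <assert.h>

/* Assumed format: 4 fractional bits, clamped to [-8, 7] (in 1/16ths). */
int offset_to_fixed(float v)
{
   assert(v == v); /* NaN is not a meaningful offset */

   float scaled = v * 16.0f;
   if (scaled < -8.0f)
      scaled = -8.0f;
   if (scaled > 7.0f)
      scaled = 7.0f;

   /* Round to nearest, ties away from zero. */
   return (int)(scaled + (scaled >= 0.0f ? 0.5f : -0.5f));
}

float fixed_to_offset(int q)
{
   return (float)q / 16.0f;
}
```

Under these assumptions 0.25 survives the round trip, 0.3 comes back as
0.3125 (precision loss), and 0.49 comes back as 0.4375 (range clamp).
Keeping the offset in floating point avoids both effects.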
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29847>
Plumb the prog_data bits recently introduced for ALU-based
interpolation down to 3DSTATE_PS_EXTRA emission in the Vulkan driver.
Even though this is only going to be used on Xe2+ for now, there seems
to be no reason not to plumb the bits on all platforms back to gfx11,
since the 3DSTATE_PS_EXTRA enables already existed on ICL.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29847>
The ALU-based implementation of the barycentric interpolation
intrinsics introduced by a subsequent commit will require some
primitive setup information not delivered in the PS thread payload
unless explicitly requested:
- "Source Depth and/or W Attribute Vertex Deltas" if a
perspective-correct interpolation mode is used -- Note that this is
already requested for CPS interpolation, we just need to enable it
in more cases.
- "Perspective Bary Planes" if a perspective-correct interpolation
mode is used.
- "Non-Perspective Bary Planes" if a non-perspective-corrected
interpolation mode is used.
- "Sample offsets" if any at_sample interpolation is used so the
coordinate offsets of the sample can be calculated.
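The conditions in the list above can be sketched as follows. The struct
and field names are hypothetical (they are not the real prog_data
fields), and the usage flags are assumed to describe only the
interpolation handled by the ALU path:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the interpolation a shader uses. */
struct interp_usage {
   bool uses_persp;     /* any perspective-correct mode */
   bool uses_nonpersp;  /* any non-perspective mode */
   bool uses_at_sample; /* any at_sample interpolation */
};

/* Hypothetical set of optional PS thread payload items to request. */
struct payload_requests {
   bool depth_w_deltas;
   bool persp_bary_planes;
   bool nonpersp_bary_planes;
   bool sample_offsets;
};

struct payload_requests compute_payload_requests(struct interp_usage u)
{
   struct payload_requests r = {
      .depth_w_deltas = u.uses_persp,
      .persp_bary_planes = u.uses_persp,
      .nonpersp_bary_planes = u.uses_nonpersp,
      .sample_offsets = u.uses_at_sample,
   };

   /* In this sketch both perspective items are requested together. */
   assert(r.depth_w_deltas == r.persp_bary_planes);
   return r;
}
```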
This ALU implementation of barycentric interpolation will only be
needed for *_at_offset and *_at_sample interpolation. The fixed
function hardware still computes barycentrics for us at the current
sample coordinates, so only the cases that previously relied on the
Pixel Interpolator shared function need to be re-implemented with ALU
instructions, since that shared function will no longer exist on Xe2
hardware.
Thanks to Rohan for a bugfix of the uses_sample_offsets calculation;
this patch includes his fix squashed in.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29847>
This has been required from the kernel for quite some time, but it
wasn't (and technically still isn't) explicitly checked. Commit
7da5b1caef changed the code paths such that an assertion is hit when
I915_PARAM_HAS_EXEC_TIMELINE_FENCES is not available.
Fixes: 7da5b1caef ("anv: move trtt submissions over to the anv_async_submit")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29920>