Extended math instructions are now synchronized as in-order
instructions like other ALU operations, which is more efficient than
the out-of-order tracking we had to do in previous generations, and
avoids false dependencies introduced due to SBID aliasing.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25514>
Xe2 hardware has a "long" EU pipeline specifically for FP64
instructions, so these are handled as in-order instructions which
require RegDist synchronization. 64-bit integer instructions are now
handled by the normal integer pipeline, so the existing special-casing
inherited from ATS needs to be disabled.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25514>
This implements the extended 10-bit encoding of the software
scoreboard information used by Xe2 platforms. The new encoding is
different enough that there are few opportunities for sharing code
during translation to machine code, but the high-level tgl_swsb
representation remains roughly the same.
Among other changes the 10-bit SWSB format provides 5 bits worth of
SBID tokens (though they're only usable in large GRF mode) instead of
4 bits, the extended math pipeline is handled as an in-order (RegDist)
pipeline instead of as an out-of-order one, and the dual-argument
encodings support additional combinations of RegDist and SBID
synchronization modes. A new encoding is introduced for preventing
the accumulator hardware scoreboard from being updated, but this is
currently not needed.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25514>
The workaround applies specifically to Cube and Cube Arrays, so we can
still apply the optimization for the others.
Ideally we would like to pull the opt_zero_samples logic into the
lowering of sends, to avoid adding a bit to communicate between passes.
However, the texture coordinates for the LOGICAL backend instructions,
which are a common target for the optimization, are combined into
offsets over a single VGRF, so we can't easily identify the constant
cases. The copy-prop pass makes this more visible for opt_zero_samples.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25742>
Inadvertently, because of a sequence of changes elsewhere, this pass
ended up not having any effect:
- Before Gfx5 the optimization is not applicable.
- On Gfx5-6 it doesn't apply because sampler operations don't
currently use LOAD_PAYLOAD, but write the MOVs directly. Not clear to
me whether they ever did.
- On Gfx7+ it doesn't apply anymore because the logical sampler
operations are now lowered directly to SENDs, and the is_tex() check
would skip them.
Since the LOAD_PAYLOAD implementation applies for Gfx7+ only, rework the
pass to work again by handling SEND instructions. To keep the pass
simple, the optimization will happen before opt_split_sends() so only
one LOAD_PAYLOAD needs to be handled.
Update the code to accept BAD_FILE sources in addition to zeros. These
are added in some cases as padding and are effectively don't-care
values, so we can treat them as zeros.
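As a rough illustration of the trimming rule described above, here is a
minimal Python sketch. It is not the Mesa implementation: `Src`,
`trim_trailing_zero_params()` and the list-based payload are
hypothetical stand-ins, and the sketch assumes that the sampler reads
omitted trailing parameters as zero and that at least one parameter
must stay in the message.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical register-file tags, standing in for the backend's own.
BAD_FILE = "BAD_FILE"  # padding source, effectively a don't-care value
IMM = "IMM"            # immediate constant
VGRF = "VGRF"          # virtual register


@dataclass
class Src:
    file: str
    value: Optional[float] = None

    def is_droppable(self) -> bool:
        """True for padding sources and for immediate zeros."""
        if self.file == BAD_FILE:
            return True
        return self.file == IMM and self.value == 0


def trim_trailing_zero_params(params: List[Src]) -> List[Src]:
    """Drop trailing zero/padding parameters, keeping at least one."""
    end = len(params)
    while end > 1 and params[end - 1].is_droppable():
        end -= 1
    return params[:end]
```

Only trailing sources are dropped; a zero in the middle of the payload
has to stay so that the later parameters keep their positions.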
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25742>
This is preparation to (re-)enable opt_zero_samples(), which will reduce
a SEND mlen before we split it. When that happens, opt_split_sends()
won't be able to rely on the fact that mlen covers the entire
LOAD_PAYLOAD.
Since we are changing that, take the opportunity to also not modify the
existing LOAD_PAYLOAD, just create two new ones with the exact set of
sources. This allows the pass to be further simplified by iterating
forward and not require live_variables analysis.
The helper function was added so it can be used later for
opt_zero_samples().
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25742>
This app is generating viewports with scale[0]==0, so that is not a good
condition for testing viewport validity. It would result in skipping
the only viewport and ending up with gb x/y being ~0, triggering an
assert in the register builder.
The main reason this was done previously was to avoid an assert in
fd_calc_guardband(). Let's just flip it around and return 0x1ff on
errors instead of asserting. This also makes it more consistent with
the other error cases.
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7628
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26086>
util_fast_urem32 is used in the hot path of hashmap lookups and this
assert causes noticeable overhead. The correctness of this code should
be well established by both testing and mathematical proofs, so gate
this assertion behind #ifdef DEBUG.
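For readers unfamiliar with the technique, the following is a minimal
Python sketch of a Lemire-style fast remainder by a precomputed 64-bit
magic number, which is the kind of computation the assertion
cross-checks against `n % d`. The function names and the `DEBUG`
constant are hypothetical stand-ins (the latter for the C preprocessor
guard), and the masking emulates 64-bit wraparound; it is not a copy of
the Mesa code.

```python
MASK64 = (1 << 64) - 1

# Hypothetical stand-in for the C "#ifdef DEBUG" guard.
DEBUG = False


def compute_magic(d: int) -> int:
    """Precompute the magic for divisor d (1 <= d < 2**32)."""
    # floor((2**64 - 1) / d) + 1, wrapped to 64 bits (d == 1 wraps to 0).
    return (MASK64 // d + 1) & MASK64


def fast_urem32(n: int, d: int, magic: int) -> int:
    """Return n % d for 32-bit n and d using two multiplies."""
    lowbits = (magic * n) & MASK64   # low 64 bits of magic * n
    result = (d * lowbits) >> 64     # high 64 bits of the 96-bit product
    if DEBUG:
        # The expensive cross-check runs only in debug configurations.
        assert result == n % d
    return result
```

The magic depends only on the divisor, so a hash table can compute it
once per resize and reuse it for every lookup.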
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/14168>
These are needed by RADV to enable mesh/task shader queries.
My last attempt was broken: for obscure reasons I used invalid hashes
and the dEQP build script didn't reject them. Hopefully it should now
fail if a hash is invalid.
The dEQP list changes introduced even more failures with some drivers.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26079>
We already print all the target jobs detected from the regex and their
dependencies. But for more complex regexes the list can be cumbersome,
and an aggregate list of dependencies and targets can be more valuable,
so add these prints as well.
This is what it looks like:
```
Running 10 dependency jobs:
alpine/x86_64_lava_ssh_client, clang-format, debian-arm64,
debian-testing, debian/arm64_build, debian/x86_64_build,
debian/x86_64_build-base, kernel+rootfs_arm64, kernel+rootfs_x86_64,
rustfmt
Running 15 target jobs:
a618_gl 1/4, a660_gl 1/2, intel-tgl-skqp, iris-amly-egl, iris-apl-deqp
1/3, iris-cml-deqp 1/4, iris-glk-deqp 1/2, iris-kbl-deqp 1/3,
lima-mali450-deqp:arm64, lima-mali450-piglit:arm64 1/2,
panfrost-g52-gl:arm64 1/3, panfrost-g72-gl:arm64 1/3,
panfrost-t720-gles2:arm64, panfrost-t860-egl:arm64, zink-anv-tgl
```
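A summary in this shape could be produced along the following lines.
This is only a sketch: `format_job_list()` and its parameters are
hypothetical names, and the 72-column wrap width is an assumption.

```python
import textwrap
from typing import Iterable


def format_job_list(kind: str, jobs: Iterable[str], width: int = 72) -> str:
    """Return a count header followed by a wrapped, sorted job list."""
    names = sorted(jobs)
    header = f"Running {len(names)} {kind} jobs:"
    # Do not split job names such as "clang-format" at their hyphens.
    body = textwrap.fill(", ".join(names), width=width,
                         break_on_hyphens=False)
    return f"{header}\n{body}"
```

Calling `print(format_job_list("dependency", deps))` and then the same
for the target jobs gives one compact block per category.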
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25940>
Modify the GraphQL query used to fetch all jobs within a pipeline,
transitioning from fetching data via stage nodes to a direct job
retrieval approach.
The prior method was not paginated, potentially overloading the server
and complicating result parsing due to the structure of stage nodes. The
new approach simplifies data interpretation and handles job lists
exceeding 100 elements by implementing pagination with helper functions
to concatenate paginated results.
- Transitioned from extracting jobs from stage nodes to a direct query
for all jobs in the pipeline, improving data readability and server
performance.
- With the enhanced data clarity from the updated query, removed the
Dag+JobMetadata tuple as it's now redundant. The refined query
provides a more comprehensive job data including job name, stage, and
dependencies.
- The previous graph query relied on a graph node that will (or should)
be paginated anyway.
Closes: #10050
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25940>
Make query support pagination by supplying the paginated key.
In the following toy example, the paginated key is:
["levels", "cars"]
```graphql
query vehicle_store($location: ID!) {
  levels {
    cars {
      pageInfo {
        hasNextPage
        endCursor
      }
      ...
    }
  }
}
```
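To show how a paginated key such as ["levels", "cars"] can drive result
concatenation, here is a minimal sketch. `fetch_page` is a hypothetical
callable that runs the query with a cursor and returns the decoded
response, and the sketch assumes the elided part of the connection
exposes its items under `nodes`.

```python
from typing import Any, Callable, Dict, List, Optional


def get_nested(data: Dict[str, Any], keys: List[str]) -> Any:
    """Follow the paginated key path into a decoded response."""
    for key in keys:
        data = data[key]
    return data


def fetch_all_pages(
    fetch_page: Callable[[Optional[str]], Dict[str, Any]],
    paginated_key: List[str],
) -> List[Any]:
    """Request pages until hasNextPage is false, concatenating nodes."""
    nodes: List[Any] = []
    cursor: Optional[str] = None  # None requests the first page
    while True:
        result = fetch_page(cursor)
        connection = get_nested(result, paginated_key)
        nodes.extend(connection["nodes"])  # "nodes" is an assumption
        page_info = connection["pageInfo"]
        if not page_info["hasNextPage"]:
            return nodes
        cursor = page_info["endCursor"]
```

Keeping the key path as a parameter lets the same loop serve any query
whose connection lives at a different depth.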
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25940>
For some reason AIOHTTPTransport started to use MultiDict after some
adjustments to the GraphQL query, which made `filecache` fail because
MultiDict objects are not picklable.
Change the transport from AIOHTTPTransport to RequestsHTTPTransport,
which also drops one requirement. We aren't doing async anyway; all
the calls were already sync.
Signed-off-by: Guilherme Gallo <guilherme.gallo@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25940>
The next patches will make use of intel_device_info_pat_entry
parameters other than index, so add a function that returns the entry.
While at it, also rename iris_pat_index_for_bo_flags() and adjust its
parameter to match the other functions in iris_bufmgr.c/h.
No changes in behavior expected here.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26099>
When anv_device_map_bo() is called from anv_device_alloc_bo() it gets
VkMemoryPropertyFlags set to 0, so it ends up with write-combine
caching on integrated platforms with LLC, see 'if (!(property_flags &
VK_MEMORY_PROPERTY_HOST_CACHED_BIT))'.
The current approach also has issues when mapping with
anv_MapMemory2KHR(), as it has no way to know that the BO is a scanout.
It was also not properly calculating the mmap mode for platforms with
PAT uAPI before "anv: Change default PAT entry to WC".
So store alloc_flags in anv_bo so that there are no mismatches
between different code paths, then use it to properly calculate the
mmap mode.
alloc_flags in anv_bo will also be used to calculate PAT index in
future patches.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26099>
i915 mmap_calc_flags() is calculating WC caching for all MTL memory
types.
It will be fixed in the next patch but doing so causes tests to
fail due to incoherency in BOs not allocated with
VK_MEMORY_PROPERTY_HOST_COHERENT_BIT.
So switch the default/non-coherent BO allocation to a WC PAT entry.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26099>
Integrated GPUs almost always work with write-back caching (only
scanout and external BOs work in write-combine), but on platforms
without LLC the coherency is broken unless explicitly requested from
the KMD.
vkFlushMappedMemoryRanges() and vkInvalidateMappedMemoryRanges()
don't do any flushing or invalidation for memory allocated with
VK_MEMORY_PROPERTY_HOST_COHERENT_BIT.
So if an application asks for coherent memory, the
ANV_BO_ALLOC_SNOOPED flag needs to be set in alloc_flags, which is
then passed to the KMD backends so they can properly ask the KMD for a
coherent buffer.
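The flag propagation described above amounts to something like the
following sketch. The Vulkan bit value comes from the Vulkan headers,
while the ANV_BO_ALLOC_* bit positions and the function name are
hypothetical and chosen only for illustration.

```python
# Value of VkMemoryPropertyFlagBits::HOST_COHERENT in the Vulkan headers.
VK_MEMORY_PROPERTY_HOST_COHERENT_BIT = 0x4

# Hypothetical bit positions; the real driver defines its own.
ANV_BO_ALLOC_MAPPED = 1 << 0
ANV_BO_ALLOC_SNOOPED = 1 << 1


def alloc_flags_for_memory(property_flags: int, alloc_flags: int = 0) -> int:
    """Request a snooped (coherent) BO when the app asked for coherency."""
    if property_flags & VK_MEMORY_PROPERTY_HOST_COHERENT_BIT:
        alloc_flags |= ANV_BO_ALLOC_SNOOPED
    return alloc_flags
```

Other allocation flags, such as a mapped request, pass through
unchanged, so coherent and mapped can be combined.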
The other chunk here removes the assert(alloc_flags & ANV_BO_ALLOC_MAPPED);
that is needed, otherwise an application can't ask for coherent and
mapped memory.
I tried to find a reason for that assert in the git history but did
not find the reasoning behind it.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26099>
mmap mode information will be used to properly calculate the mmap flags
in the i915 mmap uAPI and also will be used for BO creation when the
PAT uAPI lands in Xe KMD.
Xe KMD will also require the coherency mode during the BO creation.
So to avoid information duplication, add this information to the
intel_device_info platform entries.
No changes in behavior here.
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26099>