Once NIR code is lowered and a few optimization passes have run, there
might be flag register interactions between instructions quite far
away from one another.
In the following case :
f0 = and r0, r1
...
fs_interpolate r2, r3
...
if f0
...
endif
If we lower fs_inteporlate while using the f0 register, we completely
garble the value meant for the if block.
To fix this, emit the predication for fs_interpolate in brw_fs_nir.cpp
when doing the NIR translation to the backend IR. This will guarantee
that the flag register interactions are visible to the optimization
passes, avoiding the problem above.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 68027bd38e ("intel/fs: implement dynamic interpolation mode for dynamic persample shaders")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9757
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26306>
We don't care about patterns like
loop {
...
if (...) {
break;
} else {
...
}
...
}
In that case, we don't need to sync after the if because there's nothing
to re-converge. Every path except one will end up breaking out of it
anyway.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26382>
`\$` is interpreted before being passed to `re.search()`, but luckily
for us the escape is also invalid and because of that, python 3.12+
warns us about it.
Use a raw string instead, so that the `\` is passed untouched to
`re.search()`.
Fixes: aa04b47c6e ("intel/perf: add support for GtSlice/GtSliceXDualsubsliceY variables")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26355>
This is required in order to return the correct value for
`gbm_dri_bo_get_offset()` for e.g. the second plane of a NV12 image.
Use the newly introduced `util_resource` helper and, while on it, also
add support for `gbm_bo_get_plane_count()`.
Cc: mesa-stable
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26283>
This is required in order to return the correct value for
`gbm_dri_bo_get_offset()` for e.g. the second plane of a NV12 image.
Use the newly introduced `util_resource` helper and, while on it, also
add support for `gbm_bo_get_plane_count()`.
Cc: mesa-stable
Reviewed-by: Juan A. Suarez <jasuarez@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/26283>
When allocating 2MB chunks of memory, the kernel can use 64K pages if
both the virtual and physical addresses are aligned to 2MB. While we
can't control the physical allocation, we can ensure the virtual address
we use with softpin meets the requirements.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25447>
16K was apparently a little unrealistic - Unigine Superposition has
individual shaders that are larger than 16K. Yikes. Moving to 64K
also puts shaders into the same cache bucket as other allocations.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25447>
On non-LLC systems, most system memory isn't coherent between the CPU
and GPU by default. However, we can enable snooping or 1-way coherency
at a performance cost. In the old days, we maintained one set of cache
buckets and toggled coherency as needed via I915_GEM_SET_CACHING. But
in the modern uAPI (GEM_CREATE_EXT_SET_PAT) we have to select coherency
up front at BO creation time. So this doesn't work out well anymore.
This patch splits system memory into two distinct heaps:
- IRIS_HEAP_SYSTEM_MEMORY_CACHED_COHERENT
- IRIS_HEAP_SYSTEM_MEMORY_UNCACHED
The flags_to_heap() function will select a heap for a given allocation
based on the coherency/scanout requirements, as well as the hardware
configuration (LLC integrated, non-LLC integrated, or discrete card).
Once a heap is selected, it completely determines the cacheability and
coherency settings. A given heap will always have the same mmap mode
and PAT index. This enables us to simplify a lot of code.
Because each heap also has its own bucket cache, we no longer have to
disable bucket caching based on flags, cacheability, coherency, mmap
modes, or so on, as all BOs in each cache have matching settings.
This effectively enables bucket-caching for non-LLC systems.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25447>
Originally we only had a single bucket cache, and it was the main
allocation mechanism. Later on, we added one for device-local memory,
and then a third one for device-local-preferred. Each time, we just
copy and pasted the fields, keeping them all as direct members of the
bufmgr struct. This is getting a bit unwieldy.
This patch introduces an iris_bucket_cache structure to contain the
list of buckets and the number of buckets. It then replaces the three
inline copies with a bufmgr->bucket_cache[heap] array. This lets us
drop a bunch of copy and pasted code in favor of a loop over heaps.
This will also make it easier to add more heaps.
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25447>