In the C23 standard unreachable() is now a predefined function-like
macro in <stddef.h>
See https://android.googlesource.com/platform/bionic/+/HEAD/docs/c23.md#is-now-a-predefined-function_like-macro-in
And this causes build errors when building for C23:
-----------------------------------------------------------------------
In file included from ../src/util/log.h:30,
from ../src/util/log.c:30:
../src/util/macros.h:123:9: warning: "unreachable" redefined
123 | #define unreachable(str) \
| ^~~~~~~~~~~
In file included from ../src/util/macros.h:31:
/usr/lib/gcc/x86_64-linux-gnu/14/include/stddef.h:456:9: note: this is the location of the previous definition
456 | #define unreachable() (__builtin_unreachable ())
| ^~~~~~~~~~~
-----------------------------------------------------------------------
So don't redefine it with the same name, but use the name UNREACHABLE()
to also signify it's a macro.
Using a different name also makes sense because the behavior of the
macro was extending the one of __builtin_unreachable() anyway, and it
also had a different signature, accepting one argument, compared to the
standard unreachable() with no arguments.
This change improves the chances of building mesa with the C23 standard,
which for instance is the default in recent AOSP versions.
All the instances of the macro, including the definition, were updated
with the following command line:
git grep -l '[^_]unreachable(' -- "src/**" | sort | uniq | \
while read file; \
do \
sed -e 's/\([^_]\)unreachable(/\1UNREACHABLE(/g' -i "$file"; \
done && \
sed -e 's/#undef unreachable/#undef UNREACHABLE/g' -i src/intel/isl/isl_aux_info.c
Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36437>
Tested with:
commit 3a252ff9d8b6dc22b20463bfcb31a4e8992b0e8f
Merge: 9800bf6fae3b 11895f375939
Author: Simona Vetter <simona.vetter@ffwll.ch>
Date: Fri Jul 11 11:25:34 2025 +0200
Note that the kernel treats WCL similar to PTL, so 94de1dfd4729
("drm/xe/ptl: Drop force_probe requirement") also removed the
force_probe for WCL.
Backport-to: 25.1
Ref: 3c0f211bc8fc ("drm/xe: Add Wildcat Lake device IDs to PTL list")
Ref: 94de1dfd4729 ("drm/xe/ptl: Drop force_probe requirement")
Ref: drm/drm-next 3a252ff9d8b6dc22b20463bfcb31a4e8992b0e8f
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36148>
We will move this check into the `intel_dev_info` tool. Unfortunately,
this means we will be much less likely to notice inconsistencies, but
the current strategy has proven to be far too noisy.
For example, if the driver was built in debug mode, then when test
suites are running thousands of tests, the current approach can lead
to thousands of messages being printed.
Closes: mesa/mesa#12141
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/34243>
This was only needed on Sandybridge. We can delete the brw code,
and replace the generic devinfo bit with a helper inside the elk
compiler itself.
Thanks to Iván Briano for noticing we still had dead brw code for this.
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33764>
We've been programming our minimum number of URB entries for geometry
shaders to 2, but it appears that we should have been setting 8 on
Broadwell and later. Additionally, there's a workaround on Skylake
and later that requires us to add flushing (which we haven't) or use
a minimum of 16 URB entries.
This alone will not fix anything, as nothing reads this devinfo field
presently (will be fixed in the next commit).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33764>
As we added new platforms, the device info macros evolved over time
Most platforms had a "FEATURES" macro, some had a "HW_INFO" macro,
a few had macros for URB entries - some with min entries only, some
with min and max, some including the .urb = { ... } braces, others not.
Thread counts or subslice info was sometimes considered FEATURES,
sometimes HW_INFO, sometimes inserted only in the final structure.
FEATURES macros often inherited from an ancestor platform, but not
necessarily the prior platform - many were based on GFX8_FEATURES.
Many redundantly set the same feature bits as prior platforms.
This patch aims to clean up the situation, so it's a little more
organized, especially if you look at multiple generations. Macros
are now split into several separate pieces:
1. The FEATURES macro only has architectural features, such as LSC,
ray tracing support, 64-bit integers, flat CCS, and so on. Thread
counts, subslice info, and URB sizes that may vary by SKU are not
included here. This makes it easy for one platform to inherit the
features from the previous, while not pulling in that extra data.
2. THREAD_COUNTS macros contain maximum thread counts from the
3DSTATE_VS documentation and so on.
3. URB_MIN_MAX_ENTRIES macros contain the entire URB configuration,
including .urb = { ... }.
4. PAT_ENTRIES macros (on modern platforms) contains our choice of which
PAT entries to use for various types of resources.
5. CONFIG macros combine all of the above into a tidy bundle for use
in defining various structures, and may also include the platform
macro or simulator ID for convenience.
On recent platforms where hwconfig tables exist, items #2-3 could
potentially be dropped and filled in from there instead. For XEHP+
where we require hwconfig, we instead have a PLACEHOLDER_THREADS_AND_URB
macro that makes it clear that these values are updated from hwconfig.
One nice thing is that the bits that could (or do) come from hwconfig
tables are now cleanly separate from those that do not (i.e. platform
feature support, PAT entry selection, and so on).
This patch does not touch GFX7 or earlier macros. We could probably
offer a similar treatment there, but they're generally working and not
quite as complex.
To verify that this commit does not have unintentional changes, I
recommend running
objdump -s build/src/intel/dev/libintel_dev.a.p/intel_device_info.c.o
before and after this commit, and diffing the output. The devinfo
structures produced are identical.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33764>
intel_device_info_init_common calculates this for Gfx9+ based on
max_threads_per_psd and slice information. Mark it as zero in the
structures to make clear that the value there isn't useful, and make
it easier to diff binaries for the next commit's refactors.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33764>
The documentation for 3DSTATE_URB_HS has 0 as the minimum number of HS
URB entries for all platforms. See BSpecs 32162, 47137, 56271 for
Gfx6-11, Xe, and Xe2-3, respectively.
This should silence warnings about our device info field not matching
the hwconfig tables.
Notably, nothing in our drivers currently uses this value so it cannot
have a functional impact.
Fixes: 4064b5546b ("intel/dev: reduce warning noise from urb settings")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33764>
Rather than checking hwconfig items when using them, wait until after
devinfo has been fully initialized. This includes having workarounds
implemented.
We can then check if the hwconfig data and final Mesa initialization
agree. If the match fails, we need to investigate if Mesa or the
hwconfig data is wrong.
This code becomes a no-op when not on a release build.
Fixes: a4c5bfd34c ("intel/dev: Use hwconfig for urb min/max entry values")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12141
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32359>
Before the patch, intel_device_info_get_max_preferred_slm_size()
returns values in kilobytes, but then
intel_device_info_get_max_slm_size() is multiplying it by 1024.
As a result, LNL is reporting maxComputeSharedMemorySize to be
134217728, which is 128mb.
Fix this by making intel_device_info_get_max_slm_size() not multiply
it by 1024.
This should fix at least the following dEQP tests:
dEQP-VK.compute.pipeline.zero_initialize_workgroup_memory.max_workgroup_memory.1
dEQP-VK.compute.pipeline.zero_initialize_workgroup_memory.max_workgroup_memory.128
dEQP-VK.compute.pipeline.zero_initialize_workgroup_memory.max_workgroup_memory.16
dEQP-VK.compute.pipeline.zero_initialize_workgroup_memory.max_workgroup_memory.2
dEQP-VK.compute.pipeline.zero_initialize_workgroup_memory.max_workgroup_memory.4
dEQP-VK.compute.pipeline.zero_initialize_workgroup_memory.max_workgroup_memory.64
Some tests were failing with:
deqp-vk: ../../src/intel/common/intel_compute_slm.c:24: slm_encode_lookup: Assertion `kbytes <= table[table_len - 1].size_in_kb' failed.
while other tests were triggering the OOM.
v2:
- Make everybody return sizes in bytes (José).
v3:
- Rename variable to bytes (José, Jordan).
Fixes: fd368f5521 ("anv: Set maxComputeSharedMemorySize value for Xe2 platforms")
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30541>
Gfx 12.5 moved scratch to a surface and SURFTYPE_SCRATCH has this pitch
restriction:
RENDER_SURFACE_STATE::Surface Pitch
For surfaces of type SURFTYPE_SCRATCH, valid range of pitch is:
[63,262143] -> [64B, 256KB]
The pitch of the surface is the scratch size per thread and the surface
should be large enough to accommodate every physical thread.
So here adding a new field to intel_device_info, setting it in
intel_device_info_init_common() so even offline tools can have it set.
And finally adding a check to fail shader compilation if needed
scratch is larger than supported.
This issue can be reproduced in debug builds when running
dEQP-VK.protected_memory.stack.stacksize_1024 on Gfx 12.5 or newer
platforms.
Ref: BSpec 43862 (r52666)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30271>
So for this entry we want the CPU mapping to be WC but GPU caches
can be WB.
This way GPU don't need to snoop to CPU caches and at the end of
workloads L3 cache is flushed, so CPU access is coherent after get
the signal that workload was finished.
With this the transient(XD) L3 flushes will only affect displayable
buffers.
Ref: Bspec 71582 (r59285)
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Jianxun Zhang <jianxun.zhang@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29950>
Even though the hardware does not naively support these configurations,
there are many potential benefits to advertising them. These
configurations can theoretically use half the memory bandwidth for loads
and stores. For large matrices, that can be the limiting in performance.
The current implementation, however, has a number of significant
problems.
The conversion from float16 to float32 is performed in the driver during
conversion from NIR. As a result, many common usage patterns end up
doing back-to-back conversions to and from float16 between matrix
multiplications (when the result of one multiplication is used as the
accumulator for the next).
The float16 version of the matrix waste half the possible register
space. Each float16 value sits alone in a dword. This is done so that
the per-invocation slice of an 8x8 float16 result matrix and an 8x8
float32 result matrix will have the same number of elements. This makes
it possible to do straightforward implementations of all the unary_op
type conversions in NIR.
It would be possible to perform N:M element type conversions in the
backend using specialized NIR intrinsics. However, per #10961, this
would be very, very painful. My hope is that, once a suitable resolution
for that issue can be found, support for these configs can be restored.
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/28834>