We will later use the devinfo from iris_bufmgr, where we don't have
access to the screen pointer. And since we are moving it, we can reuse
it in Anv and i965.
v2: return error code and check for it on Anv (Lionel).
v3: Remove anv_gem_get_aperture() from anv_private.h and stubs (Lionel).
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/5043>
On all Gen7+ platforms except DG1, the URB is a subsection of the
configurable L3 cache, and so the size can vary. The size listed
in the documentation on those platforms is an "example size", picked
by calculating it based on an arbitrarily chosen L3 config.
Hardcoding a value for those platforms provides no value and only
confuses people trying to fill out these tables when doing hardware
enabling. anv and iris never use this field. i965 uses it to
initialize brw->urb.size, but then updates that in update_urb_size()
to be the correct value, so the initial value doesn't matter.
Delete the values for Gen7+ and update the comment accordingly.
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4969>
Avoids surprises where you set an OVERRIDE but it gets ignored and the
system PCI ID is used.
Also fixes the bug that the error of invalid platform name being
printed too early, even when the passed platform was a PCI ID (which
is also supported).
For the case where euid != uid, a warning was added but the behavior
wasn't changed: it is still going to fallback to system PCI ID.
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/4841>
This is useful in order to identify codegen issues caused by SIMD32.
It doesn't currently have any effect on compute shaders since SIMD32
dispatch is only enabled for CS when it's strictly necessary to do so
in order to support the workgroup size requested for the shader --
That might change in the future though when we hook up the SIMD32
heuristic to CS compilation.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Also removing the FIXME comments after matching the numbers with
updated documentation.
Signed-off-by: Anuj Phogat <anuj.phogat@gmail.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
The existing "fallback" code didn't actually do anything, so this
removes it, and instead we just always fallback to `iris` for future
PCI IDs.
Suggested-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Eric Engestrom <eric.engestrom@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
We're missing the offset of the slice in the subslice mask...
This worked for most platforms that don't have first slice fused off
because we would reread the same mask from slice0 again and again...
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c1900f5b0f ("intel: devinfo: add helper functions to fill fusing masks values")
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/1869
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
i915 will report ENODEV on generations prior to Haswell because there
is no point in reporting values on those. This is prior any fusing
could happen on parts with identical PCI ids.
This query call was previously only triggered on generations that
support performance queries, which happens to match generation for
which i915 reports topology, but the commit pointed below started
using it on all generations.
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Gitlab: https://gitlab.freedesktop.org/mesa/mesa/issues/1860
Cc: <mesa-stable@lists.freedesktop.org>
Fixes: 96e1c945f2 ("i965: Move device info initialization to common code")
Reviewed-by: Mark Janes <mark.a.janes@intel.com>
The device info initializer makes several fuctions internal:
- handling of device override
- updating topology from kernel information
The implementation file is slightly reordered due to the renamed
functions being static.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Rename the original device info initialization routine so callers
don't mistakenly call the wrong one:
gen_get_device_info_from_fd:
Queries kernel for full device info, including topology
details.
gen_get_device_info_from_pci_id:
Partially initializes device info based on PCI ID lookup, when
the kernel is not available.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
With perf queries, initializing the device info is much more complex
than just getting a PCI ID and calling gen_get_device_info. This commit
adds a new gen_get_device_info_from_fd helper in common code which does
all of the requisite kernel queries to get device info including all of
the topology information.
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
add a new attribute, 'disable_ccs_repack' to gen_device info, which
indicates whether repacking of components in certain pixel formats
before compression needs to be disabled to keep the compatibility
with decompression capability of display controller (gen11+)
Signed-off-by: Dongwon Kim <dongwon.kim@intel.com>
Reviewed-by: Anuj Phogat <anuj.phogat@gmail.com>
This prints a log of every PIPE_CONTROL flush we emit, noting which bits
were set, and also the reason for the flush. That way we can see which
are caused by hardware workarounds, render-to-texture, buffer updates,
and so on. It should make it easier to determine whether we're doing
too many flushes and why.
Our tessellation control shaders can be dispatched in several modes.
- SINGLE_PATCH (Gen7+) processes a single patch per thread, with each
channel corresponding to a different patch vertex. PATCHLIST_N will
launch (N / 8) threads. If N is less than 8, some channels will be
disabled, leaving some untapped hardware capabilities. Conditionals
based on gl_InvocationID are non-uniform, which means that they'll
often have to execute both paths. However, if there are fewer than
8 vertices, all invocations will happen within a single thread, so
barriers can become no-ops, which is nice. We also burn a maximum
of 4 registers for ICP handles, so we can compile without regard for
the value of N. It also works in all cases.
- DUAL_PATCH mode processes up to two patches at a time, where the first
four channels come from patch 1, and the second group of four come
from patch 2. This tries to provide better EU utilization for small
patches (N <= 4). It cannot be used in all cases.
- 8_PATCH mode processes 8 patches at a time, with a thread launched per
vertex in the patch. Each channel corresponds to the same vertex, but
in each of the 8 patches. This utilizes all channels even for small
patches. It also makes conditions on gl_InvocationID uniform, leading
to proper jumps. Barriers, unfortunately, become real. Worse, for
PATCHLIST_N, the thread payload burns N registers for ICP handles.
This can burn up to 32 registers, or 1/4 of our register file, for
URB handles. For Vulkan (and DX), we know the number of vertices at
compile time, so we can limit the amount of waste. In GL, the patch
dimension is dynamic state, so we either would have to waste all 32
(not reasonable) or guess (badly) and recompile. This is unfortunate.
Because we can only spawn 16 thread instances, we can only use this
mode for PATCHLIST_16 and smaller. The rest must use SINGLE_PATCH.
This patch implements the new 8_PATCH TCS mode, but leaves us using
SINGLE_PATCH by default. A new INTEL_DEBUG=tcs8 flag will switch to
using 8_PATCH mode for testing and benchmarking purposes. We may
want to consider using 8_PATCH mode in Vulkan in some cases.
The data I've seen shows that 8_PATCH mode can be more efficient in
some cases, but SINGLE_PATCH mode (the one we use today) is faster
in other cases. Ultimately, the TES matters much more than the TCS
for performance, so the decision may not matter much.
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>