AlexIndustrial/mesa

Author	SHA1	Message	Date
Mario Kleiner	24094ee03d	vulkan/wsi/display: Reset connector state in vkReleaseDisplay(). If an application was transitioning out of fullscreen exclusive display mode, the wsi_display_connector->active state was not reset in vkReleaseDisplay() from fullscreen. When the app then later tried to go to fullscreen display mode again on the same display output with the same video mode, this caused _wsi_display_queue_next() to skip a required drmModeSetCrtc() during the first vkQueuePresent() after entering direct display mode. While this often worked by pure luck on a single-display setup, it goes sideways on a multi-display setup where the viewport of the associated crtc does not have a (x,y) offset of (0,0). E.g., XOrg/X11 RandR output leasing of an output whose viewport starts at x = 1920: 1. X-Server has RandR outputs viewport at x = 1920, in a shared framebuffer, shared across all crtc's on an X-Screen. 2. Application leases that output for direct display mode, 1st vkQueuePresent() triggers drmModeSetCrtc() of output to (x,y) = 0,0, as required for Vulkan/wsi/direct framebuffer setup. 3. Application does rendering and presenting. 4. Application calls vkReleaseDisplay() on the output, terminates the RandR lease. X-Server takes over again. 5. X-Server modesets to reconfigure output back to viewport with (x,y) = 1920, 0. 6. Application leases same output again later on, and tries vkQueuePresent() again. Because of the bug fixed in this commit, the required drmModeSetCrtc() to (x,y) = 0,0 is erroneously skipped due to the stale cached connector state. 7. drmModePageflip() fails due to the wrong crtc viewport (x,y) = 1920, 0, mismatched for the need of the Vulkan framebuffer of (x,y) = 0,0. Kernel returns -ENOSPC, Swapchain goes into permanent VK_ERROR_SURFACE_LOST state. Destroying and recreating the swapchain, as recommended by the Vulkan spec for error handling won't help. Game over! 
Resetting wsi_display_connector->active = false; fixes the problem of wrong / stale connector state and Vulkan/wsi/display clients are happy on multi-display setups again, as tested in various single- and multi-display configurations. This bug affects all Mesa releases with Vulkan/WSI/Display support and should therefore be backported. Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com> Fixes: `352d320a07` ("vulkan: Add EXT_direct_mode_display [v2]") Cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19484>	2022-11-09 17:13:19 +00:00
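The failure sequence described in the commit above can be modelled with a toy state machine. This is only a sketch under stated assumptions: `toy_connector`, `toy_present`, and `toy_release` are hypothetical names, not Mesa or libdrm API, and the integer `crtc_x` stands in for the crtc viewport that the real drmModeSetCrtc() call would program.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the cached per-connector state. */
struct toy_connector {
   bool active;   /* cached "mode already set" flag */
   int crtc_x;    /* current x offset of the crtc viewport */
};

/* First present after a lease: modeset to x = 0 unless the cache claims it
 * was already done. Returns true if the simulated page flip succeeds. */
static bool toy_present(struct toy_connector *c)
{
   if (!c->active) {
      c->crtc_x = 0;       /* stands in for the required modeset */
      c->active = true;
   }
   return c->crtc_x == 0;  /* the flip only works with the viewport at 0 */
}

/* Release the display; the X server then restores its own viewport. */
static void toy_release(struct toy_connector *c, bool reset_active)
{
   if (reset_active)
      c->active = false;   /* models the reset added by the fix */
   c->crtc_x = 1920;       /* X server modesets back to x = 1920 */
}
```

With `reset_active` false, a second `toy_present()` after `toy_release()` skips the modeset and the simulated flip fails; with `reset_active` true, the modeset runs again and the flip succeeds.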
Karol Herbst	4ca61b5420	rusticl/nir: copy alignment info when lowering kernel input loads Signed-off-by: Karol Herbst <kherbst@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19614>	2022-11-09 16:39:26 +00:00
Alyssa Rosenzweig	fd0af2bb4d	panfrost: DRY buffer range special case Pattern from iris. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19576>	2022-11-09 15:56:20 +00:00
Alyssa Rosenzweig	f8553ef44c	panfrost: Remove out-of-band CRC support Without additional signalling of modifiers, CRCs cannot possibly work correctly across process boundaries. Since we don't do that signalling, we should not be allocating private CRCs for imported resources, and we should not be using our own private CRCs for internal resources. The entire out-of-band CRC infrastructure is a hack to let us do CRCs even for imported/exported BOs, but that can't possibly work. Remove it, and remove a pile of special cases across the driver. Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19576>	2022-11-09 15:56:20 +00:00
Alyssa Rosenzweig	cf7a3906b0	panfrost: Copy resources when necessary If the map doesn't set MAP_DISCARD_RANGE, we do have to copy the existing contents over. MAP_WRITE on its own only gives permission to replace the contents; unfortunately it does not require that the application actually do so. Closes: #7640 Fixes: `0b26a9f773` ("panfrost: Don't copy resources if replaced") Signed-off-by: Alyssa Rosenzweig <alyssa@collabora.com> Reported-by: Roman Elshin Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19576>	2022-11-09 15:56:20 +00:00
Samuel Pitoiset	59cc628c06	radv: use radv_max_descriptor_set_size() for Vulkan 1.2 properties Instead of copying this limit entirely. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19598>	2022-11-09 15:16:01 +00:00
Martin Roukala (né Peres)	560b327696	radv/ci: add more subtests to VanGogh's flakes list Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19591>	2022-11-09 12:18:04 +00:00
Konstantin Seurer	35d0d30a0e	radv/rra: Fix node type validation Silly mistake... Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19584>	2022-11-09 09:16:15 +00:00
Caio Oliveira	8ab628ab2e	nir: Don't reorder volatile intrinsics Fixes an issue with "is helper invocation", which in recent SPIR-V is mapped to a volatile Load. The CSE was catching the loads before they were transformed into the new is_helper_invocation intrinsic (that is not reorderable). Fixes: `729df14e45` ("nir: Handle volatile semantics for loading HelperInvocation builtin") Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: M Henning <drawoc@darkrefraction.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19432>	2022-11-09 06:02:18 +00:00
Chia-I Wu	10b0a5dc34	freedreno/a6xx: set chroma offsets to MIDPOINT Vulkan has VkChromaLocation and all drivers suggest VK_CHROMA_LOCATION_MIDPOINT on Android. The blob also uses MIDPOINT. Based on my limited tests, the image quality is higher with MIDPOINT. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19514>	2022-11-09 05:15:38 +00:00
Chia-I Wu	cbf68450f8	freedreno/a6xx: set CHROMA_LINEAR This seems to have no effect on a618, but restores linear filtering on a635 when the texture is yuv. The blob sets it on a635 as well (but not on a618). Fixed android.media.cts.DecodeAccuracyTest#* on a635. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19514>	2022-11-09 05:15:38 +00:00
Yonggang Luo	d61ac94658	c11: Remove _MTX_INITIALIZER_NP for windows Signed-off-by: Yonggang Luo <luoyonggang@gmail.com> Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Acked-by: David Heidelberg <david.heidelberg@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18493>	2022-11-09 04:38:28 +00:00
Yonggang Luo	37d79e38e9	egl: Remove the need of _MTX_INITIALIZER_NP by using simple_mtx_t/SIMPLE_MTX_INITIALIZER in egllog.c Signed-off-by: Yonggang Luo <luoyonggang@gmail.com> Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Acked-by: David Heidelberg <david.heidelberg@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18493>	2022-11-09 04:38:28 +00:00
Yonggang Luo	23e6a4ccda	nir: Remove the need of _MTX_INITIALIZER_NP by using simple_mtx_t/SIMPLE_MTX_INITIALIZER in nir/nir_validate.c Signed-off-by: Yonggang Luo <luoyonggang@gmail.com> Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Acked-by: David Heidelberg <david.heidelberg@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18493>	2022-11-09 04:38:28 +00:00
Yonggang Luo	e518ff4fd5	glsl: Remove the need of _MTX_INITIALIZER_NP by using simple_mtx_t/SIMPLE_MTX_INITIALIZER Signed-off-by: Yonggang Luo <luoyonggang@gmail.com> Acked-by: Jesse Natalie <jenatali@microsoft.com> Acked-by: David Heidelberg <david.heidelberg@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18493>	2022-11-09 04:38:28 +00:00
Yonggang Luo	db708b7e9c	llvmpipe: Remove the need of _MTX_INITIALIZER_NP by using simple_mtx_t/SIMPLE_MTX_INITIALIZER in lp_texture.c Signed-off-by: Yonggang Luo <luoyonggang@gmail.com> Acked-by: Jesse Natalie <jenatali@microsoft.com> Acked-by: David Heidelberg <david.heidelberg@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18493>	2022-11-09 04:38:28 +00:00
Yonggang Luo	fb979a19b0	vulkan/device-select-layer: Remove the need of call_once by using simple_mtx_t instead of mtx_t Function device_select_once_init is removed in favor of SIMPLE_MTX_INITIALIZER Signed-off-by: Yonggang Luo <luoyonggang@gmail.com> Acked-by: Jesse Natalie <jenatali@microsoft.com> Acked-by: David Heidelberg <david.heidelberg@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/18493>	2022-11-09 04:38:28 +00:00
Rob Clark	c0fc8d5046	freedreno/a6xx: Switch to global bcolor buffer Since we expect a limited # of unique border-color entry states, we can use a global table of border-color entries, rather than constructing the state at draw time. This shifts all the border-color overhead from draw time to sampler state CSO creation time. And it's less code! A hashtable is used to map unique border-color table value to entry so multiple usages of what maps to the same table entry all re-use a single slot in the table. This puts an upper bound on the # of unique border-color plus format value. In practice this shouldn't be a problem, we'll just size the table to be large enough to not run into problems with CTS. Note that the border-color table entry is not completely format dependent (mostly just integer vs float dependent), so for example a single color with different float formats can map to a single table entry. This also fixes the problem that we completely ignored border-color for GS/tess stages. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7518 Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19561>	2022-11-09 02:51:17 +00:00
Rob Clark	27b2496bae	freedreno/a6xx: Rename tex cache key/equals fxns We'll need different functions for border-color cache. Prep for next patch. Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19561>	2022-11-09 02:51:17 +00:00
Rob Clark	c8cf786976	freedreno/a6xx: Move bcolor entry setup Just code motion, in prep for a following patch. Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19561>	2022-11-09 02:51:17 +00:00
Rob Clark	755e3ff0ee	freedreno/ci: Update a5xx expectations These seem to have not been updated in a while. Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19561>	2022-11-09 02:51:17 +00:00
Rob Clark	ed9152e2c1	freedreno: Use our border-color quirk This will let us remove our assumption that samplers and views map 1:1, and generally simplify our border-color handling. Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19561>	2022-11-09 02:51:17 +00:00
David Heidelberg	26e742c661	ci/bare-metal: remove consolidations leftovers All defined in the baremetal-test-arm* Reviewed-by: Eric Engestrom <eric@igalia.com> Reviewed-by: Emma Anholt <emma@anholt.net> Signed-off-by: David Heidelberg <david.heidelberg@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19548>	2022-11-09 02:23:37 +00:00
Rob Clark	e090e313fa	freedreno/ir3: Reduce compiler thread pool size With the current scheme, looking at game startup, which should be the worst-case (most heavily loaded) time for the compiler threads, they seem to be ~10% busy. Furthermore, we typically have a mix of "big" and "LITTLE" cores, with about half being "big". So sizing the thread pool to half the # of CPU cores seems reasonable. Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19387>	2022-11-08 23:36:51 +00:00
Rob Clark	a6e4f8d03f	util/disk_cache: Add some blob cache traces Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19387>	2022-11-08 23:36:51 +00:00
Rob Clark	d831fd40c8	util/disk_cache: Add compression in blob cb path Android's implementation of the blob-cache get/put funcs does not implement any compression. And the default cache size is rather small, at 2MB (!!) per app (although I assume everyone patches android to increase the size limit). We don't bother compressing the has_key/put_key path, since that path is only storing a uint32_t. Signed-off-by: Rob Clark <robdclark@chromium.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19387>	2022-11-08 23:36:51 +00:00
Simon Ser	2fdc3846e7	vulkan/wsi/wayland: return VK_ERROR_NATIVE_WINDOW_IN_USE_KHR If the surface is already in use by another swapchain, return VK_ERROR_NATIVE_WINDOW_IN_USE_KHR. The spec states: > If pCreateInfo->oldSwapchain is VK_NULL_HANDLE, and the native > window referred to by pCreateInfo->surface is already associated > with a Vulkan swapchain, VK_ERROR_NATIVE_WINDOW_IN_USE_KHR must > be returned. Signed-off-by: Simon Ser <contact@emersion.fr> Reviewed-by: Leandro Ribeiro <leandro.ribeiro@collabora.com> Acked-by: Daniel Stone <daniels@collabora.com> References: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7467 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19229>	2022-11-08 22:52:41 +00:00
Eric Engestrom	83b1cb936e	vc4: add DRM_VC4_CREATE_SHADER_BO support to drm-shim Signed-off-by: Eric Engestrom <eric@igalia.com> Reviewed-by: Iago Toral Quiroga <itoral@igalia.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19305>	2022-11-08 21:23:27 +00:00
Yusuf Khan	2c5b1d0e3b	nv50/ir: Support fmulz and ffmaz Signed-off-by: Yusuf Khan <yusisamerican@gmail.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19333>	2022-11-08 21:10:08 +00:00
Yusuf Khan	47251d2852	nv50/ir: add prefer_nir flag for getting compiler options So that we don't expose certain options for nir_to_tgsi Signed-off-by: Yusuf Khan <yusiamerican@gmail.com> Reviewed-by: Karol Herbst <kherbst@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19333>	2022-11-08 21:10:08 +00:00
Connor Abbott	def56b531c	tu: Support GMEM with layered rendering and multiview It turns out that this actually is supported. GMEM can hold multiple layers which are cleared, loaded, and resolved separately. The stride between layers seems to be implicitly calculated based on the tile size, and we have to match it when blitting to/from GMEM. One tricky thing is that now we may realize that we don't have enough space for GMEM only when computing the tiling config, because we may not know the number of framebuffer layers until we have the framebuffer and too many framebuffer layers will exhaust GMEM. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19505>	2022-11-08 16:35:02 +00:00
Samuel Pitoiset	a9ab53fbe2	radv: stop emulating number of generated primitives by GS on GFX11 According to RadeonSI, only GFX10 and GFX10.3 need to emulate. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19319>	2022-11-08 16:15:16 +00:00
Lionel Landwerlin	97b3dd34c1	anv: fix missing VkPhysicalDeviceExtendedDynamicState3PropertiesEXT handling Fixes: `13c422e1b2` ("anv: toggle on EXT_extended_dynamic_state3") Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Jason Ekstrand <jason@jlekstrand.net> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19573>	2022-11-08 15:28:57 +00:00
Tapani Pälli	2a60037523	crocus: enable NV_alpha_to_coverage_dither_control Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Adam Jackson <ajax@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19463>	2022-11-08 11:45:46 +00:00
Tapani Pälli	3c84809ca6	iris: enable NV_alpha_to_coverage_dither_control Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Reviewed-by: Adam Jackson <ajax@redhat.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19463>	2022-11-08 11:45:46 +00:00
Samuel Pitoiset	bff6a38ed9	radv: advertise extendedDynamicState3ColorWriteMask Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19589>	2022-11-08 11:04:54 +00:00
Samuel Pitoiset	a92d1d13c5	radv: add support for dynamic color write mask Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19589>	2022-11-08 11:04:54 +00:00
Caio Oliveira	22d8ed84b8	intel/compiler: Remove unused fs_visitor::emit_percomp() Since `7ef7738a61` ("i965: Write gl_FragCoord directly to the destination.") this is not used. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19586>	2022-11-08 07:33:09 +00:00
Caio Oliveira	90861e6fea	intel/compiler: Remove various unused function declarations Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19586>	2022-11-08 07:33:08 +00:00
Caio Oliveira	48506a9029	intel/compiler: Remove unused data members Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19586>	2022-11-08 07:33:08 +00:00
Yonggang Luo	7fe5fec747	util: Remove os/os_thread.h and replace #include "os/os_thread.h" with #include "util/u_thread.h" Signed-off-by: Yonggang Luo <luoyonggang@gmail.com> Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19557>	2022-11-08 05:21:42 +00:00
Yonggang Luo	a72d57fe26	util: cleanup os_thread.h __pipe_mutex_assert_locked is not used anymore, so remove it from os_thread.h. Also fix the compile failure caused by the removal of "pipe/p_compiler.h". Signed-off-by: Yonggang Luo <luoyonggang@gmail.com> Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19557>	2022-11-08 05:21:42 +00:00
Yonggang Luo	1129537e4c	util: Move pipe_semaphore to u_thread.h and rename it to util_semaphore Signed-off-by: Yonggang Luo <luoyonggang@gmail.com> Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19557>	2022-11-08 05:21:42 +00:00
Yonggang Luo	b732064f9e	gallium/util: Remove the EMBEDDED_DEVICE macro because nobody uses it Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/7641 Signed-off-by: Yonggang Luo <luoyonggang@gmail.com> Acked-by: Jose Fonseca <jfonseca@vmware.com> Acked-by: Roland Scheidegger <sroland@vmware.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19552>	2022-11-08 02:37:20 +00:00
Ian Romanick	9abeb3d739	intel/fs: Optimize integer multiplication of large constants by factoring Many Intel platforms can only perform 32x16 bit multiplication. The straightforward way to implement 32x32 bit multiplications is by splitting one of the operands into high and low parts called H and L, respectively. The full multiplication can be implemented as: ((A * H) << 16) + (A * L) On Intel platforms, special register accesses can be used to eliminate the shift operation. This results in three instructions and a temporary register for most values. If H or L is 1, then one (or both) of the multiplications will later be eliminated. On some platforms it may be possible to eliminate the multiplication when H is 256. If L is zero (note that H cannot be zero), one of the multiplications will also be eliminated. Instead of splitting the operand into high and low parts, it may be possible to factor the operand into two 16-bit factors X and Y. The original multiplication can be replaced with (A * (X * Y)) = ((A * X) * Y). This requires two instructions without a temporary register. I may have gone a bit overboard with optimizing the factorization routine. It was a fun brainteaser, and I couldn't put it down. :) On my 1.3GHz Ice Lake, a standalone test could chug through 1,000,000 randomly selected values in about 5.7 seconds. This is about 9x the performance of the obvious, straightforward implementation that I started with. v2: Drop an unnecessary return. Rearrange logic slightly and rename variables in factor_uint32 to better match the names used in the large comment. Both suggested by Caio. Rearrange logic to avoid possibly using `a` uninitialized. Noticed by Marcin. v3: Use DIV_ROUND_UP instead of open coding it. Noticed by Caio. Tiger Lake, Ice Lake, Haswell, and Ivy Bridge had similar results. 
(Ice Lake shown) total instructions in shared programs: 19912558 -> 19912526 (<.01%) instructions in affected programs: 3432 -> 3400 (-0.93%) helped: 10 / HURT: 0 total cycles in shared programs: 856413218 -> 856412810 (<.01%) cycles in affected programs: 122032 -> 121624 (-0.33%) helped: 9 / HURT: 0 No shader-db changes on any other Intel platforms. Tiger Lake and Ice Lake had similar results. (Ice Lake shown) Instructions in all programs: 141997227 -> 141996923 (-0.0%) Instructions helped: 71 Cycles in all programs: 9162524757 -> 9162523886 (-0.0%) Cycles helped: 63 Cycles hurt: 5 No fossil-db changes on any other Intel platforms. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17718>	2022-11-08 00:02:16 +00:00
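The two strategies in the commit message above (splitting the constant into H and L, or factoring it into two 16-bit values X and Y) can be checked with a small standalone sketch. The names `mul_split`, `find_16bit_factors`, and `mul_factored` are hypothetical; the factor search is a naive downward trial division in the spirit of the "obvious, straightforward implementation" the message mentions, not the optimized routine from the commit.

```c
#include <stdbool.h>
#include <stdint.h>

/* 32x32 multiply built from two 32x16 multiplies:
 * ((A * H) << 16) + (A * L), with all arithmetic modulo 2^32. */
static uint32_t mul_split(uint32_t a, uint32_t c)
{
   uint32_t h = c >> 16;     /* high 16 bits of the constant */
   uint32_t l = c & 0xffff;  /* low 16 bits of the constant */
   return ((a * h) << 16) + (a * l);
}

/* Naive search for x, y <= 0xffff with x * y == c.
 * Returns false when no such pair exists (e.g. c is a large prime). */
static bool find_16bit_factors(uint32_t c, uint32_t *x, uint32_t *y)
{
   for (uint32_t d = 0xffff; d >= 2; d--) {
      if (c % d == 0 && c / d <= 0xffff) {
         *x = d;
         *y = c / d;
         return true;
      }
   }
   return false;
}

/* A * (X * Y) rewritten as (A * X) * Y: two 32x16 multiplies, no shift. */
static uint32_t mul_factored(uint32_t a, uint32_t x, uint32_t y)
{
   return (a * x) * y;
}
```

For a constant such as 100000, which does not fit in 16 bits, `find_16bit_factors()` succeeds and `mul_factored()` agrees with both `mul_split()` and the plain 32-bit product; for the prime 65537 no factor pair exists, so only the split form applies.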
Ian Romanick	5ec75ca10d	intel/compiler: Teach signed integer range analysis about imax and imin This is especially helpful for a*isign(a) generated by idiv_by_const optimization. On many GPUs, isign(a) is lowered to imax(imin(a, 1), -1). There are no changes on fossil-db because ANV uses a different optimization path for idiv with a constant denominator. A future MR will change this. NOTE: This commit used to help a few hundred shader-db shaders, but now none are affected. I suspect this is due to some change in the idiv_by_const optimization. This could possibly be dropped. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17718>	2022-11-08 00:02:16 +00:00
Ian Romanick	1b0da3a765	intel/compiler: Signed integer range analysis for imul_32x16 generation Only iabs and ineg are treated specially. Everything else just uses nir_unsigned_upper_bound. The special treatment of source modifiers is because they cause problems for nir_unsigned_upper_bound. Once those are peeled off, nir_unsigned_upper_bound can generally produce a tighter bound. Future commits will add more opcodes. This mostly introduces the basic framework. v2: Add a bunch of comments to signed_integer_range_analysis. Re-arrange the code a little to reduce duplication. Both suggested by Caio. Rearrange some logic to simplify things. Suggested by Marcin. Tiger Lake, Ice Lake, Haswell, and Ivy Bridge had similar results. (Ice Lake shown) total instructions in shared programs: 19912894 -> 19912558 (<.01%) instructions in affected programs: 109275 -> 108939 (-0.31%) helped: 74 / HURT: 0 total cycles in shared programs: 856422769 -> 856413218 (<.01%) cycles in affected programs: 15268102 -> 15258551 (-0.06%) helped: 65 / HURT: 4 total fills in shared programs: 8218 -> 8217 (-0.01%) fills in affected programs: 1171 -> 1170 (-0.09%) helped: 1 / HURT: 0 Skylake and Broadwell had similar results. 
(Skylake shown) total cycles in shared programs: 845145547 -> 845142263 (<.01%) cycles in affected programs: 15261465 -> 15258181 (-0.02%) helped: 65 / HURT: 0 Tiger Lake Tiger Lake Instructions in all programs: 157580768 -> 157579730 (-0.0%) Instructions helped: 312 Instructions hurt: 28 Cycles in all programs: 7566977172 -> 7566967746 (-0.0%) Cycles helped: 288 Cycles hurt: 53 Spills in all programs: 19701 -> 19700 (-0.0%) Spills helped: 2 Spills hurt: 4 Fills in all programs: 33311 -> 33335 (+0.1%) Fills helped: 5 Fills hurt: 4 Ice Lake Instructions in all programs: 141998667 -> 141997227 (-0.0%) Instructions helped: 420 Instructions hurt: 3 Cycles in all programs: 9162565297 -> 9162524757 (-0.0%) Cycles helped: 389 Cycles hurt: 29 Spills in all programs: 19918 -> 19916 (-0.0%) Spills helped: 2 Spills hurt: 3 Fills in all programs: 32795 -> 32814 (+0.1%) Fills helped: 6 Fills hurt: 3 Skylake Instructions in all programs: 132567691 -> 132567745 (+0.0%) Instructions hurt: 24 Cycles in all programs: 8828897462 -> 8828889517 (-0.0%) Cycles helped: 405 Cycles hurt: 6 Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17718>	2022-11-08 00:02:16 +00:00
Ian Romanick	f90d71055b	intel/compiler: Add and use a pass to generate imul_32x16 instructions Gfx8 and Gfx9 platforms are helped for cycles because now many instructions like mul(8) g12<1>D g10<8,8,1>D 6D become mul(8) g12<1>D g10<8,8,1>D 6W It is the same number of instructions, but the 32x16 multiply is a little faster. v2: Fix transposed hi and lo in "(hi >= INT16_MIN && lo <= INT16_MAX)". Noticed by Caio. Use nir_src_is_const instead of open coding it. Suggested by Caio. Broadwell and Skylake had similar results. (Skylake shown) total cycles in shared programs: 845748380 -> 845145547 (-0.07%) cycles in affected programs: 446346348 -> 445743515 (-0.14%) helped: 6017 HURT: 0 helped stats (abs) min: 2 max: 7380 x̄: 100.19 x̃: 8 helped stats (rel) min: <.01% max: 3.72% x̄: 0.41% x̃: 0.39% 95% mean confidence interval for cycles value: -113.37 -87.00 95% mean confidence interval for cycles %-change: -0.42% -0.41% Cycles are helped. Skylake Cycles in all programs: 8844820715 -> 8828897462 (-0.2%) Cycles helped: 47914 Cycles hurt: 1 No shader-db or fossil-db changes on any other Intel platform. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17718>	2022-11-08 00:02:16 +00:00
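The v2 note in the commit above concerns a range check of the form "lo >= INT16_MIN && hi <= INT16_MAX". A minimal sketch of that kind of decision follows, assuming the analysis yields an inclusive range [lo, hi] for one multiplicand. `toy_mul_kind` and `toy_pick_mul` are hypothetical names, and the unsigned 16-bit case is an assumption added for illustration rather than something stated in the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical classification of a multiply by an operand known to lie
 * in the inclusive range [lo, hi]. */
enum toy_mul_kind {
   TOY_MUL_32X32,           /* operand needs the full 32 bits */
   TOY_MUL_32X16_SIGNED,    /* operand fits a signed 16-bit word */
   TOY_MUL_32X16_UNSIGNED,  /* operand fits an unsigned 16-bit word */
};

static enum toy_mul_kind toy_pick_mul(int64_t lo, int64_t hi)
{
   assert(lo <= hi);

   /* Note the orientation: the lower bound is compared against the minimum
    * and the upper bound against the maximum. */
   if (lo >= INT16_MIN && hi <= INT16_MAX)
      return TOY_MUL_32X16_SIGNED;

   if (lo >= 0 && hi <= UINT16_MAX)
      return TOY_MUL_32X16_UNSIGNED;

   return TOY_MUL_32X32;
}
```

A constant operand such as 6 has the degenerate range [6, 6] and is classified as a signed 16-bit multiply, which corresponds to the 6D to 6W change shown in the commit message.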
Ian Romanick	9479e3a19b	intel/fs: Allow constant copy prop from DW to W This enables copy propagation of mov(8) g5<1>UD 0x00000180UD mul(8) g10<1>D g2.3<0,1,0>D g5<16,8,2>W into mul(8) g10<1>D g2.3<0,1,0>D 180W This is necessary for any optimization passes that generate imul_32x16 instructions. No fossil-db or shader-db changes on any Intel platform. v2: Fix type size check to (src size != 2) || (dest size != 4). It was previously &&. :( This allowed copying constants into UB sources, and that is invalid. v3: Fix incorrect extraction of upper 16-bits of immediate value when subnr=2. Noticed by Caio. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17718>	2022-11-08 00:02:16 +00:00
Ian Romanick	90d267b2d1	intel/fs: Fix bounds checking for integer multiplication lowering The previous bounds checking would cause mul(8) g121<1>D g120<8,8,1>D 0xec4dD to be lowered to mul(8) g121<1>D g120<8,8,1>D 0xec4dUW mul(8) g41<1>D g120<8,8,1>D 0x0000UW add(8) g121.1<2>UW g121.1<16,8,2>UW g41<16,8,2>UW Instead of picking the bounds (and the new type) based on the old type, pick the new type based on the value only. This helps a few fossil-db shaders in Witcher 3 and Geekbench5. No changes on any other Intel platforms. Tiger Lake Instructions in all programs: 157581069 -> 157580768 (-0.0%) Instructions helped: 24 Cycles in all programs: 7566979620 -> 7566977172 (-0.0%) Cycles helped: 22 Cycles hurt: 4 Ice Lake Instructions in all programs: 141998965 -> 141998667 (-0.0%) Instructions helped: 26 Cycles in all programs: 9162568666 -> 9162565297 (-0.0%) Cycles helped: 24 Cycles hurt: 2 Skylake No changes. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17718>	2022-11-08 00:02:16 +00:00


150403 Commits