AlexIndustrial/mesa

Author	SHA1	Message	Date
Samuel Pitoiset	ec87f1338f	radv: emit more push shader registers on GFX12 They are supposed to be slightly faster. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37256>	2025-09-11 06:47:40 +00:00
Samuel Pitoiset	9039f33a8d	Revert "radv: handle fbfetch output after binding graphics shaders" This is actually wrong because if radv_handle_fbfetch_output() triggers a decompression pass and graphics shaders (ESO) are saved/restored they won't be updated because radv_bind_graphics_shaders() was called before. This fixes a very recent regression that I noticed while implementing a new extension. This reverts commit `9b912f00c7`. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37194>	2025-09-11 06:22:20 +00:00
Samuel Pitoiset	b69b953973	radv: add RADV_DEBUG=bo_history This dumps the BO history to /tmp/radv_bo_history.log after each BO operation. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37062>	2025-09-11 06:03:15 +00:00
Rob Clark	0fe652971e	freedreno/a6xx: Add missing format Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37279>	2025-09-11 03:08:54 +00:00
Rob Clark	6ab682e5f2	freedreno/blitter: Don't ignore blit swizzle Noticed by inspection. Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37279>	2025-09-11 03:08:54 +00:00
Rob Clark	250dba1dce	freedreno/a6xx: Fallback to original blit in the snorm_copy path Unlike z/s blits, where we want the fallback to use the re-written blit, we don't want this in the handle_snorm_copy_blit() path. Signed-off-by: Rob Clark <rob.clark@oss.qualcomm.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37279>	2025-09-11 03:08:54 +00:00
Caio Oliveira	03e9c01f0c	brw: Add and use more brw_validate.cpp macros Add and use more comparison variants (which provide more detailed print out of the values), remove old references to "fsv" and "scalar", use assertion names more similar to GoogleTest that we already use elsewhere. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37267>	2025-09-10 17:44:38 -07:00
Mel Henning	a9ea4630d4	nak: Make BindlessSSA store [SSAValue; 2] This reduces the size of ir::Src from 40 bytes down to 32 bytes. This makes the size of ir::Op fall from 272 bytes down to 232 bytes, meaning we save 40 bytes per instruction. Reviewed-by: Mary Guillemard <mary@mary.zone> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37130>	2025-09-10 22:25:13 +00:00
Mel Henning	8ac9b077b1	nak/assign_regs: Make src_ssa_ref return a slice Reviewed-by: Mary Guillemard <mary@mary.zone> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37130>	2025-09-10 22:25:13 +00:00
Mel Henning	d21a4d9e50	nak: impl HasRegFile for SSARef and &[SSAValue] Reviewed-by: Mary Guillemard <mary@mary.zone> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37130>	2025-09-10 22:25:13 +00:00
Mel Henning	603d7f9413	nak: Remove Option<> from SSARef::file() return Nothing actually wants to mix register files in a SSARef so in practice no callers really handled the None return case. Panic on that case instead. Reviewed-by: Mary Guillemard <mary@mary.zone> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37130>	2025-09-10 22:25:12 +00:00
Dylan Baker	08a3497223	anv: add assertion that tes and tcs data is non-null It doesn't make any sense to have TCS but not TES (or vice versa), but Coverity doesn't realize that. Add an assertion that they are both non-null before we start reading them. Fixes: `50fd669294` ("anv: prep work for separate tessellation shaders") CID: 1665360 CID: 1665327 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37266>	2025-09-10 18:18:42 +00:00
Dylan Baker	ecfce9f9ad	blorp: Fix potential read of uninitialized elk fields in debug paths The intel_vue_map is only partially initialized before being used. All used fields are initialized, but in debug paths the uninitialized fields will also be read. To fix this, initialize the struct to 0. In the brw path this struct is part of the prog_data, and is rzalloc'd. CID: 1665308 Reviewed-by: Iván Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37261>	2025-09-10 17:51:34 +00:00
Dylan Baker	6fe4b7344d	isl: prevent potential overflow before widen Fixes: `73608eb8b7` ("isl: Add support for creating layered surfaces for video encode/decode") CID: 1665354 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37260>	2025-09-10 17:01:40 +00:00
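The commit message above does not show the affected code. As a rough illustration of the "overflow before widen" pattern that Coverity reports, the sketch below contrasts a 32-bit multiply that is widened too late with one that widens an operand first. The function and parameter names are hypothetical and are not taken from isl.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not taken from isl: byte offset of a layer.
// Problematic form: the multiply is done in 32 bits and may wrap
// before the result is widened to 64 bits on return.
inline uint64_t layer_offset_narrow(uint32_t layer, uint32_t layer_size_B)
{
   return layer * layer_size_B;
}

// Fixed form: widen one operand first so the multiply is done in 64 bits.
inline uint64_t layer_offset_wide(uint32_t layer, uint32_t layer_size_B)
{
   return static_cast<uint64_t>(layer) * layer_size_B;
}
```

With 4 layers of 1 GiB each, the narrow variant wraps to 0 on platforms where `unsigned int` is 32 bits, while the wide variant yields the expected 4 GiB offset.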
Dylan Baker	f18aca8689	intel/brw: Fix implementation of |= operator for enum The current implementation does nothing, since it has no side effects, only a return value. By passing `x` as a reference we can mutate the value before returning. Fixes: `df37c7ca74` ("brw: fix analysis dirtying with pulled constants") CID: 1665293 Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37263>	2025-09-10 16:30:19 +00:00
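A minimal sketch of the class of bug described above, assuming a scoped flag enum. The enum and its values are hypothetical stand-ins, not the actual brw types. Taking the left operand by value means the caller's variable is never updated; taking it by reference makes the assignment visible.

```cpp
#include <cassert>

// Hypothetical flag enum for illustration; not the enum from brw.
enum class dirty_flags : unsigned {
   none = 0,
   instructions = 1u << 0,
   variables = 1u << 1,
};

constexpr dirty_flags operator|(dirty_flags a, dirty_flags b)
{
   return static_cast<dirty_flags>(static_cast<unsigned>(a) |
                                   static_cast<unsigned>(b));
}

// Broken shape: x is a copy, so the caller's variable is never modified.
inline dirty_flags or_assign_by_value(dirty_flags x, dirty_flags y)
{
   x = x | y;
   return x;
}

// Fixed shape: x is a reference, so the update reaches the caller.
inline dirty_flags &operator|=(dirty_flags &x, dirty_flags y)
{
   x = x | y;
   return x;
}
```

A caller that writes `flags |= dirty_flags::variables;` and ignores the return value only sees the change with the by-reference version.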
Dylan Baker	70ebc14de9	anv: avoid potential integer overflow in video address calculation Coverity caught one instance of this, by visual inspection I found another case. Fixes: `3fb25cc78a` ("anv: Add support for creating layered surfaces for video encode/decode") CID: 1665326 Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37262>	2025-09-10 16:06:37 +00:00
Anna Maniscalco	011ba1842e	freedreno/registers: add CP_ALWAYS_ON_CONTEXT Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37237>	2025-09-10 15:10:14 +00:00
Samuel Pitoiset	1da270fb35	radv/amdgpu: add more helpers for managing virtual BOs All these new helpers will make the SMEM PRT workaround better organized. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37193>	2025-09-10 14:50:25 +00:00
Samuel Pitoiset	3c4168a3cc	radv/amdgpu: return OOM device when BO mapping fails It's more appropriate than VK_ERROR_UNKNOWN. Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37193>	2025-09-10 14:50:24 +00:00
Valentine Burley	fd6d285417	zink/ci: Add a new Minecraft restricted trace From @zmike, it exposes a very niche corner case bug in zink. Signed-off-by: Valentine Burley <valentine.burley@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37270>	2025-09-10 14:35:23 +00:00
David Rosca	9d9fc1fe72	radeonsi/vcn: Get rid of PIPE_ALIGN_IN_BLOCK_SIZE Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37143>	2025-09-10 13:27:54 +00:00
David Rosca	8eb84f8854	radeonsi/vcn: Fix calculating QP map region dimensions It needs to be aligned to block size otherwise it would skip last row/column on resolutions like 1080p. Cc: mesa-stable Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37143>	2025-09-10 13:27:54 +00:00
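The arithmetic behind the fix above can be sketched as follows. The block size of 64 pixels is an assumption chosen for illustration, and the helper names are hypothetical; the commit message does not state the actual block size or show the radeonsi code.

```cpp
#include <cassert>
#include <cstdint>

// Round v up to the next multiple of a (a must be a power of two).
constexpr uint32_t align_pot(uint32_t v, uint32_t a)
{
   return (v + a - 1) & ~(a - 1);
}

// Truncating division drops a partial last row/column of blocks.
constexpr uint32_t blocks_truncated(uint32_t pixels, uint32_t block)
{
   return pixels / block;
}

// Aligning first counts the partial block as well.
constexpr uint32_t blocks_aligned(uint32_t pixels, uint32_t block)
{
   return align_pot(pixels, block) / block;
}

static_assert(align_pot(1080, 64) == 1088, "1080 rounds up to 17 * 64");
```

For a height of 1080 and the assumed 64-pixel block, truncation gives 16 block rows while alignment gives 17, which is the missing last row the commit message refers to. A width of 1920 is already a multiple of 64, so both forms agree there.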
Mike Blumenkrantz	5fefb9e795	zink: flag vertex element state for rebind after vstate draws vstate draws bind their own elements unrelated to the bound gallium elements, so any draw occurring after a vstate draw must rebind to ensure the correct ones are bound Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13570 cc: mesa-stable Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37274>	2025-09-10 13:06:07 +00:00
David Rosca	a03a79aa9d	pipe: Remove PIPE_VIDEO_CAP_PREFERS/SUPPORTS_INTERLACED Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36632>	2025-09-10 12:33:57 +00:00
David Rosca	6954460899	radeonsi/video: Remove support for interlaced buffers This is not used anymore with VDPAU removed. Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36632>	2025-09-10 12:33:57 +00:00
David Rosca	223d3ec433	gallium/vl: Remove now unused filters Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36632>	2025-09-10 12:33:57 +00:00
David Rosca	4b54277d2e	Remove VDPAU VDPAU only supports X11 and GL interop. There is no Wayland or Vulkan interop support. The API has limitations that make it impossible to correctly decode certain streams. Application support is also very limited, and VAAPI is always a better choice over VDPAU. Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Acked-by: Adam Jackson <ajax@redhat.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36632>	2025-09-10 12:33:57 +00:00
David Rosca	e7ea1233b1	mesa: Remove NV_vdpau_interop Acked-by: Mike Blumenkrantz <michael.blumenkrantz@gmail.com> Acked-by: Adam Jackson <ajax@redhat.com> Reviewed-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36632>	2025-09-10 12:33:57 +00:00
David Rosca	272bde24a3	ci: Stop building VDPAU driver Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36632>	2025-09-10 12:33:57 +00:00
Mary Guillemard	497005dc18	panvk: Enable SNORM rendering Blending should work properly these days. Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37271>	2025-09-10 12:15:06 +00:00
Mary Guillemard	f707f093ec	panvk: Do not clamp blend constants in command buffer This is wrong for SNORM and this is handled by nir_lower_blend. Signed-off-by: Mary Guillemard <mary.guillemard@collabora.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37271>	2025-09-10 12:15:06 +00:00
Lionel Landwerlin	1646e7d311	anv: run nir_opt_acquire_release_barriers In the middle of writing all this new shader object compile code, this pass got added and I missed adding it to the shader object path. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `d39e443ef8` ("anv: add infrastructure for common vk_pipeline") Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37269>	2025-09-10 11:47:05 +00:00
Konstantin Seurer	7c9e945460	radv,vulkan: Avoid a useless barrier in radv_update_bind_pipeline Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36982>	2025-09-10 08:35:50 +00:00
Konstantin Seurer	a35dfab281	radv: Use vk_barrier_compute_w_to_compute_r more vk_barrier_compute_w_to_compute_r shows up in rgp captures and is less code. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36982>	2025-09-10 08:35:50 +00:00
Konstantin Seurer	850f339b89	vulkan: Add more detail to encode debug markers Useful for radv because radv has quite a few different configurations. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36982>	2025-09-10 08:35:50 +00:00
Konstantin Seurer	5c94e20abe	vulkan: Use a struct for debug markers Improves u_trace integration with anv. Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36982>	2025-09-10 08:35:50 +00:00
Ella Stanforth	01c7c97ef7	util/tests: Add list iterator tests Reviewed-by: Rob Clark <rob.clark@oss.qualcomm.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37061>	2025-09-10 07:38:25 +00:00
Ella Stanforth	d943a91b71	util/list: Add iterator debug to more routines. Reviewed-by: Rob Clark <rob.clark@oss.qualcomm.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37061>	2025-09-10 07:38:24 +00:00
Ella Stanforth	6863223033	util/list: Fix next instruction removal usecase for non safe iterators Introducing this iterator debug information breaks the usecase of removing elements in the list other than the current element. Fixes: `372e83b95f` Reviewed-by: Rob Clark <rob.clark@oss.qualcomm.com> Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37061>	2025-09-10 07:38:24 +00:00
Samuel Pitoiset	c739d836f7	radv: exclude dynamic vertex input stride for the late scissor workaround RADV_DYNAMIC_VERTEX_INPUT_BINDING_STRIDE doesn't emit any context registers, so it can be excluded for the late scissor workaround to avoid re-emitting scissors all the time it's dirty. This fixes a performance regression noticed with Cyberpunk on Vega10, but other games are likely affected too. The late scissor workaround is only applied on Raven/Vega10. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13828 Fixes: `d7f401c2bb` ("radv: bind the vertex binding strides like a normal dynamic state") Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37252>	2025-09-10 07:09:48 +00:00
abdelhadi	3a41644165	aco, radv: remove line duplicate Signed-off-by: abdelhadi <abdelhadims@icloud.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37243>	2025-09-10 06:34:43 +00:00
Lionel Landwerlin	33d2c31d7a	brw: don't use brw_null_reg() for unused SEND sources Just avoiding the validation assert. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `47fe9d28e7` ("brw: Enumerate SHADER_OPCODE_SEND sources and standardize how many") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13777 Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37112>	2025-09-10 09:08:27 +03:00
Timothy Arceri	11a434f3df	glsl: remove now unused NumUniformRemapTable Acked-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36997>	2025-09-10 05:11:47 +00:00
Timothy Arceri	e052254066	glsl: make use of u_range_remap for uniform remapping This will allow ubo buffers to have arrays containing millions of elements without excessive memory use on a remap table. Before this change using the max sized array on radeonsi would result in 1.3GB of memory being used for a remap table in a single shader. There is also a small functional change here, previously if the shader used more than GL_MAX_UNIFORM_BLOCK_SIZE mesa would ignore and allow this as the original ARB_uniform_buffer_object spec stated: "If the amount of storage required for a uniform block exceeds this limit, a program may fail to link." However in OpenGL 4.3 the text was clarified and the "may" was removed so with this change we enforce the max limit. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9953 Acked-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36997>	2025-09-10 05:11:47 +00:00
Timothy Arceri	bf946bccf2	util: add range remap util This util allows a range of values to be remapped to a single pointer. Acked-by: Marek Olšák <marek.olsak@amd.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36997>	2025-09-10 05:11:47 +00:00
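The idea of mapping a whole range of values to a single pointer can be sketched as below. This is a hypothetical illustration built on `std::map` and is not the `u_range_remap` API added by the commit; the class name, method names, and storage layout are all assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>

// Hypothetical sketch; this is not the u_range_remap API from Mesa.
// Each entry maps an inclusive [start, end] range to one pointer, so a
// million-element array costs one entry instead of a million slots.
class range_remap {
public:
   // Caller must not insert overlapping ranges.
   void insert(uint32_t start, uint32_t end, void *ptr)
   {
      assert(start <= end);
      ranges_[start] = entry{end, ptr};
   }

   // Returns the pointer whose range contains key, or nullptr.
   void *lookup(uint32_t key) const
   {
      auto it = ranges_.upper_bound(key); // first range starting after key
      if (it == ranges_.begin())
         return nullptr;
      it = std::prev(it);
      return key <= it->second.end ? it->second.ptr : nullptr;
   }

private:
   struct entry {
      uint32_t end;
      void *ptr;
   };
   std::map<uint32_t, entry> ranges_;
};
```

Lookup cost in this sketch is logarithmic in the number of ranges rather than constant, which is the trade made for not allocating one table slot per location.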
Francisco Jerez	5bf7bb5cf9	intel/brw/xe3+: Re-enable static analysis-based SIMD32 FS heuristic for the moment. This disables for now the "optimistic" SIMD heuristic that was implemented for xe3+ and makes it dependent on a debugging option, instead use the static analysis-based codepath that was used in previous generations and was extended by previous commits in this MR to model the xe3 trade-off between register use and thread parallelism. The reason is that the main assumption of the optimistic SIMD heuristic didn't hold up with reality: Real-world testing on PTL shows that there are many cases where SIMD32 shows performance degradation relative to SIMD16 despite the ability of xe3 hardware to scale the GRF file of a thread on demand, unfortunately that scenario seems to be more pervasive than hoped when the optimistic SIMD heuristic was implemented pre-silicon. In many cases what seems to be going on is that even when the register file is able to scale with the increased register use of SIMD32, the thread parallelism of the EU is scaled down by a similar factor, so at the bottom line SIMD32 (depending on the actual ratio of register use between both variants) may not buy us anything, and it frequently encounters constraints (like SIMD lowering and less effective scheduling) that lead to worse codegen than SIMD16, easily tipping the balance in favor of SIMD16. 
The extension of the performance analysis pass that was done in a previous commit allows the original SIMD32 heuristic to take into account quantitatively this effect, and that seems pretty effective at disabling SIMD32 shaders that underperform judging from the statistically significant improvement of most Traci test-cases that run on my PTL system (4 iterations, 5% significance), no statistically significant regressions were observed: Nba2K23-trace-dx11-2160p-ultra: 10.16% ±0.34% Superposition-trace-dx11-2160p-extreme: 4.06% ±0.50% TotalWarWarhammer3-trace-dx11-1080p-high: 3.52% ±0.76% Payday3-trace-dx11-1440p-ultra: 2.41% ±0.81% MetroExodus-trace-dx11-2160p-ultra: 2.28% ±0.78% Borderlands3-trace-dx11-2160p-ultra: 1.89% ±0.65% MountAndBlade2-trace-dx11-1440p-veryhigh: 1.81% ±0.40% Blackops3-trace-dx11-1080p-high: 1.66% ±0.29% HogwartsLegacy-trace-dx12-1080p-ultra: 1.53% ±0.22% TotalWarPharaoh-trace-dx11-1440p-ultra: 1.44% ±0.31% Fortnite-trace-dx11-2160p-epix: 1.44% ±0.27% Naraka-trace-dx11-1440p-highest: 1.39% ±0.27% PubG-trace-dx11-1440p-ultra: 1.30% ±0.49% Destiny2-trace-dx11-1440p-highest: 1.10% ±0.23% Factorio-trace-1080p-high: 1.10% ±1.77% TerminatorResistance-trace-dx11-2160p-ultra: 1.08% ±0.31% Ghostrunner2-trace-dx11-1440p-ultra: 1.05% ±0.15% ShadowTombRaider-trace-dx11-2160p-ultra: 0.98% ±0.19% CitiesSkylines2-trace-dx11-1440p-high: 0.67% ±0.19% Palworld-trace-dx11-1080p-med: 0.44% ±0.22% The downside is that this will reverse the large reduction in compile-time we gained from the optimistic SIMD heuristic -- The run-time of both shader-db and fossil-db jump back up by nearly 20% with this change. 
I'm working on a better compromise based on run-time feedback that will hopefully allow us to preserve the compile-time benefit of the optimistic heuristic without the reduction in run-time performance, but in the meantime it seems like the run-time performance gap from SIMD32 is the more urgent issue to address since it has an impact on titles across the board. Despite the reversal of that compile-time improvement xe3 still achieves slightly lower compile time on the average than previous generations as a result of VRT, so this doesn't seem terribly tragic. v2: Add bit to brw_get_compiler_config_value() (Lionel). Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36618>	2025-09-10 02:15:58 +00:00
Francisco Jerez	a7969b5d42	intel/brw: Apply `7e1362e9c0` to pre-xe3 codepath of brw_compile_fs(). This applies the same workaround as `7e1362e9c0` to the pre-xe3 codepath of brw_compile_fs(), since ray queries appear to be unsupported from SIMD32 fragment shaders. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36618>	2025-09-10 02:15:58 +00:00
Francisco Jerez	531a34c7dd	intel/brw/xe3+: Select scheduler heuristic with best trade-off between register pressure and latency. The current register allocation loop attempts to use a sequence of pre-RA scheduling heuristics until register allocation is successful. The sequence of scheduling heuristics is expected to be increasingly aggressive at reducing the register pressure of the program (at a performance cost), so that the instruction ordering chosen gives the lowest latency achievable with the register space available. Unfortunately that approach doesn't consistently give the best performance on xe3+, since on recent platforms a schedule with higher latency may actually give better performance if its lower register pressure allows the use of a lower number of VRT register blocks which allows the EU to run more threads in parallel. This means that on xe3+ the scheduling mode with highest performance is fundamentally dependent on the specific scenario (in particular where in the thread count-register use curve the program is at, and how effective the scheduler heuristics are at reducing latency for each additional block of GRFs used), so it isn't possible to construct a fixed sequence of the existing heuristics guaranteed to be ordered by decreasing performance. In order to find the scheduling heuristic with better performance we have to run multiple of them prior to register allocation and do some arithmetic to account for the effect on parallelism of the register pressure estimated in each case, in order to decide which heuristic will give the best performance. This sounds costly but it is similar to the approach taken by brw_allocate_registers() when unable to allocate without spills in order to decide which scheduling heuristic to use in order to minimize the number of spills. 
In cases where that happens on xe3+ the scheduling runs introduced here don't add to the scheduling runs done to find the heuristic with minimum register pressure, we attempt to determine the heuristic with lowest pressure and best performance in the same loop, and then use one or the other depending on whether register allocation succeeds without spills. Significantly improves performance on PTL of the following Traci test cases (4 iterations, 5% significance): Nba2K23-trace-dx11-2160p-ultra: 4.48% ±0.38% Fortnite-trace-dx11-2160p-epix: 1.61% ±0.28% Superposition-trace-dx11-2160p-extreme: 1.37% ±0.26% PubG-trace-dx11-1440p-ultra: 1.15% ±0.29% GtaV-trace-dx11-2160p-ultra: 0.80% ±0.24% CitiesSkylines2-trace-dx11-1440p-high: 0.68% ±0.19% SpaceEngineers-trace-dx11-2160p-high: 0.65% ±0.34% The compile-time cost of shader-db increases significantly by 3.7% after this commit (15 iterations, 5% significance), the compile-time of fossil-db doesn't change significantly in my setup. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36618>	2025-09-10 02:15:57 +00:00
Francisco Jerez	0e802cecba	intel/brw: Make sure we don't use stale analysis after inst. order restore in brw_allocate_registers(). Do invalidate_analysis() from restore_instruction_order() to make sure we don't re-use stale analysis pass results if the user forgets to call invalidate_analysis() explicitly. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36618>	2025-09-10 02:15:57 +00:00
Francisco Jerez	dfc2a89d96	intel/brw: Allow using performance analysis pass pre-register allocation. Mainly this involves changing 'struct state' so that the dep_ready array is allocated with a dynamic size based on the number of VGRFs of the program instead of assuming a fixed XE3_MAX_GRF count of GRF dependencies. VGRF register dependencies are then handled by using one dep_ready entry per VGRF allocation instead of one per hardware register. The ability to use the performance analysis pass pre-regalloc will mostly be useful on xe3+, but this also has the side effect of saving some memory on xe2 and earlier platforms since we no longer need to allocate XE3_MAX_GRF dep_ready entries for them. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36618>	2025-09-10 02:15:57 +00:00

211815 Commits