AlexIndustrial/mesa

Author	SHA1	Message	Date
Lionel Landwerlin	ed3c2f73db	intel/fs: fixup sources number from opt_algebraic Fixes issues with register_coalesce : fossilize-replay: brw_fs_register_coalesce.cpp:297: bool fs_visitor::register_coalesce(): Assertion `mov[i]->sources == 1' failed. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21782>	2023-03-14 10:38:50 +00:00
Lionel Landwerlin	18bdc71459	intel/fs: fix nir_opt_peephole_ffma max vec assumption There can be larger vec than vec4. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21782>	2023-03-14 10:38:50 +00:00
Lionel Landwerlin	efde1917c9	intel/fs: don't SEND messages as partial writes For instance, to load uniform data with the LSC we usually rely on transpose messages which have to execute in SIMD1. Those end up being considered as partial writes so within loops their life span spreads to the whole loop, increasing register pressure. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: mesa-stable Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21867>	2023-03-14 10:10:32 +00:00
Ian Romanick	28311f9d02	nir: intel/compiler: Move ufind_msb lowering to NIR Fossil-db results: All Intel platforms had similar results. (Ice Lake shown) Cycles in all programs: 9098346105 -> 9098333765 (-0.0%) Cycles helped: 6 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>	2023-03-10 15:27:17 +00:00
Ian Romanick	08ca862ef8	intel/compiler: Tighter src and dest size bounds checking for some opcodes Enforce the sizes listed in the Skylake PRM: BFREV: source types: D destination types: D CBIT: source types: UB, UW, UD destination types: UD FBH: source types: D, UD destination types: UD FBL: source types: UD destination types: UD LZD: source types: D, UD destination types: UD v2: Update BFREV commit message documentation. Suggested by Ken. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>	2023-03-10 15:27:17 +00:00
Ian Romanick	0cc7bf63b7	nir: intel/compiler: Move ifind_msb lowering to NIR Unlike ufind_msb, ifind_msb is only defined in NIR for 32-bit values, so no @32 annotation is required. No shader-db or fossil-db changes on any Intel platform. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>	2023-03-10 15:27:17 +00:00
Ian Romanick	15c6c859cf	intel/compiler: Lower find_lsb in NIR No shader-db or fossil-db changes on any Intel platform. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19042>	2023-03-10 15:27:17 +00:00
Eric Engestrom	f5d3d1e7ed	meson: inline gtest_test_protocol now that it's always 'gtest' Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21485>	2023-03-10 07:20:29 +00:00
Sagar Ghuge	9a34b2ab0e	intel/compiler: Add swsb_stall debug option When enabled, on gfx12 plus, we will add the sync nop instruction after each instruction to make sure that current instruction depends on the previous instruction explicitly. This option will help us to get a hint if something is missing or broken in software scoreboard pass. Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21797>	2023-03-10 06:55:39 +00:00
Kenneth Graunke	dfe652fb03	intel/eu: Simplify brw_F32TO16 and brw_F16TO32 Now that we aren't using them on Gfx8+ we can drop a lot of cruft. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>	2023-03-09 23:26:17 +00:00
Kenneth Graunke	c590a3eadf	intel/fs: Move packHalf2x16 handling to lower_pack() This mainly lets the software scoreboarding pass correctly mark the instructions, without needing to resort to fragile manual handling in the generator. We can also make small improvements. On Gfx 8LP-12.0, we no longer have the restrictions about DWord alignment, so we can simply write each half into its intended location, rather than writing it to the low DWord and then shifting it in place. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>	2023-03-09 23:26:17 +00:00
Kenneth Graunke	f5e5705c91	intel/fs: Use F32TO16/F16TO32 helpers in fquantize16 handling I originally thought that we were intentionally emitting the legacy opcodes here to make them opaque to the optimizer, so that it wouldn't eliminate the explicit type conversions, as they're actually required to do the quantization. But...we don't actually optimize those away currently anyway. So...go ahead and use the helpers for consistency. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>	2023-03-09 23:26:17 +00:00
Kenneth Graunke	44c6ccb197	Revert "intel/fs: Fix inferred_sync_pipe for F16TO32 opcodes" With the previous patch, we no longer need to special case this, as we emit a MOV with an HF source, rather than F16TO32 with an UW source, on all platforms that need scoreboarding. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>	2023-03-09 23:26:17 +00:00
Kenneth Graunke	309ec3725a	intel/fs: Use new F16TO32 helpers for unpack_half_split_* opcodes This gets us a MOV at the IR level on Gfx8+ which should be more optimizable than F16TO32. It also removes confusion about which pipe the instruction will run on. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>	2023-03-09 23:26:17 +00:00
Kenneth Graunke	78bf53904e	intel/fs: Delete a TODO about using brw_F32TO16. We can just use the new builder helpers to get the optimization advantages of a MOV on Gfx8+ while also getting the necessary F32TO16 on Gfx7.x and yet not worry too hard about it. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>	2023-03-09 23:26:17 +00:00
Kenneth Graunke	966995d911	intel/fs: Add builder helpers for F32TO16/F16TO32 that work on Gfx7.x These take care of emitting the F32TO16/F16TO32 instructions on Gfx7.x but otherwise just emit a type converting MOV on Gfx8+. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>	2023-03-09 23:26:17 +00:00
Kenneth Graunke	3864049184	intel/fs: Fix inferred_sync_pipe for F16TO32 opcodes For converting half-float to float, we currently emit BRW_OPCODE_F16TO32 with a UW source, to match legacy Gfx7 behavior. In the generator, this becomes a MOV with a HF source on Gfx8+. Unfortunately, this UW source confuses the scoreboarding pass into thinking it's an integer source, leading to incorrect SWSB annotations on Alchemist. We should ultimately fix the IR to stop being so...legacy...here, but this is the simplest fix for stable branches. Fixes misrendering in Elden Ring and likely Sekiro: Shadows Die Twice. Cc: mesa-stable Tested-by: Chuansheng Liu <chuansheng.liu@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> References: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8018 References: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8375 Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21783>	2023-03-09 23:26:17 +00:00
Lionel Landwerlin	09cdb77a92	intel/fs: report max register pressure in shader stats Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ivan Briano <ivan.briano@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21756>	2023-03-08 13:37:07 +00:00
Mark Janes	08649e3673	intel/fs: use generated workaround helpers for Wa_14017989577 Wa_14017989577 is a clone of Wa_14015360517, which applies to several platforms beyond INTEL_PLATFORM_DG2_G10. Update references to Wa_14017989577, and use the generated workaround helper to ensure application to the proper platforms. Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21744>	2023-03-07 21:43:11 +00:00
Mark Janes	bc04e2daca	intel/fs: use generated helpers for Wa_1209978020 / Wa_18012201914 Wa_1209978020 is a clone of Wa_18012201914. Update references to refer to the originating bug, and use generated helpers to ensure it is applied to future platforms as needed. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21741>	2023-03-07 01:41:53 +00:00
Caio Oliveira	c92d589597	intel/compiler: Drop non-scoped barrier handling Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21634>	2023-03-07 00:41:13 +00:00
Caio Oliveira	07de034791	intel/compiler: Drop brw_nir_lower_scoped_barriers Now that we handle scoped barriers with execution scope during NIR -> Backend IR translation, this lowering is not needed anymore. Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21634>	2023-03-07 00:41:13 +00:00
Caio Oliveira	dfc34b1a65	intel/vec4: Handle scoped barriers with execution scope Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21634>	2023-03-07 00:41:13 +00:00
Caio Oliveira	db0a09c9e2	intel/fs: Handle scoped barriers with execution scope Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21634>	2023-03-07 00:41:13 +00:00
Mark Janes	b96019f82b	intel/fs: use generated workaround helpers for Wa_14010017096 This workaround does not apply beyond gen 12.0. Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21746>	2023-03-07 00:10:33 +00:00
Alyssa Rosenzweig	952bd63d6d	nir/opt_barrier: Generalize to control barriers For GLSL, we want to optimize code like memoryBarrierBuffer(); controlBarrier(); into a single scoped_barrier intrinsic for the backend to consume. Now that backends can get scoped_barriers everywhere, what's left is enabling backends to combine these barriers together. We already have an Intel-specific pass for combining memory barriers; it just needs a teensy bit of generalization to allow combining all sorts of barriers together. This avoids code quality regression on Asahi when switching to purely scoped barriers. It's probably useful for other backends too. Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io> Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21661>	2023-03-06 22:09:27 +00:00
Faith Ekstrand	83fd7a5ed1	intel: Use nir_lower_tex_options::lower_index_to_offset Reviewed-by: Alyssa Rosenzweig <alyssa@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21546>	2023-03-06 21:38:32 +00:00
Faith Ekstrand	9a4641cf6b	intel/nir: Limit unaligned loads to vec4 This probably doesn't affect Vulkan or GL because they can't have anything bigger than a vec4 anyway unless it's a u64vec4 and those have to be at least 8B aligned. This may affect CL apps if they use __attribute__((packed)) on something with big vectors, depending on how LLVM decides to translate that. Fixes: `f8aa83f0c8` ("intel/nir: Use nir_lower_mem_access_bit_sizes()") Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21524>	2023-03-03 02:00:39 +00:00
Faith Ekstrand	eb9a56b6ca	nir: Rename nir_mem_access_size_align::align_mul to align It's a simple alignment so calling it align_mul is a bit misleading. Suggested-by: M Henning <drawoc@darkrefraction.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21524>	2023-03-03 02:00:39 +00:00
Faith Ekstrand	ca4d73ba36	nir: Add a combined alignment helper Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@colllabora.com> Reviewed-by: M Henning <drawoc@darkrefraction.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21524>	2023-03-03 02:00:39 +00:00
Faith Ekstrand	116a851264	nir: Add mode filtering to lower_mem_access_bit_sizes Reviewed-by: Jesse Natalie <jenatali@microsoft.com> Reviewed-by: M Henning <drawoc@darkrefraction.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21524>	2023-03-03 02:00:39 +00:00
Tapani Pälli	207eb94445	intel/compiler: add comment about workaround on simd width Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21619>	2023-03-02 14:06:36 +00:00
Dylan Baker	a0fa31bcdd	intel/dev: create a helper dependency for libintel_dev This ensures that users of libintel_dev.a won't be compiled until include files are generated, and that they are recompiled when the header changes. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Mark Janes <markjanes@swizzler.org> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20825>	2023-03-02 00:01:27 +00:00
Caio Oliveira	c80268a20d	intel/compiler: Mark various memory barriers intrinsics unreachable Now that both SPIR-V and GLSL are using scoped barriers, we can stop handling the specialized ones. Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/3339>	2023-02-27 20:24:01 +00:00
Francisco Jerez	4420251947	intel/rt: Fix L3 bank performance bottlenecks due to SW stack stride alignment. Power-of-two SW stack sizes are prone to causing collisions in the hashing function used by the L3 to map memory addresses to banks, which can cause stack accesses from most DSSes to bottleneck on a single L3 bank. Fix it by padding the SW stack stride by a single cacheline if it was a power of two. This has been reported by Felix DeGrood to improve Quake2 RTX performance by ~30% on DG2-512 in combination with other RT patches Lionel Landwerlin has been working on. Many thanks to Felix DeGrood for doing much of the legwork and providing several iterations of Q2RTX performance counter dumps which eventually prompted me to consider the hash collision theory and motivated this patch, and for providing additional performance counter dumps confirming that there is no longer an appreciable imbalance in traffic across L3 banks after this change. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21461>	2023-02-26 11:48:33 -08:00
Marcin Ślusarz	512126b26d	intel/compiler: remove unused field from fs_thread_payload Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20957>	2023-02-23 08:04:24 +00:00
Marcin Ślusarz	e29a964d02	intel/compiler/mesh: follow the type of offset variable This allows copy propagation to kick in, decreasing the overall number of generated instructions. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21098>	2023-02-21 11:10:24 +00:00
Marcin Ślusarz	15afb8dcc6	intel/compiler/mesh: apply URB payload mask once per program Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21098>	2023-02-21 11:10:23 +00:00
Daniel Schürmann	2bb369dd8d	nir: add assertions that loops don't have a Continue Construct Hoping that I didn't miss any, this should add assertions to all functions and passes which explicitly handle 'nir_loop'. Acked-by: Faith Ekstrand <faith.ekstrand@collabora.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/13962>	2023-02-21 10:41:11 +00:00
Kenneth Graunke	96ba0344db	intel: Use common helpers for TCS passthrough shaders Rob added these new helpers a while back, which freedreno and radeonsi both share. We should use them too. The new helpers use variables and system value intrinsics, so we can drop the explicit binding table creation and just use the normal paths. Because we have to rewrite the system value uploading anyway, we drop the scrambling of the default tessellation levels on upload, and instead let the compiler go ahead and remap components like any normal shader. In theory, this results in more shuffling in the shader. In practice, we already do MOVs for message setup. In the passthrough shaders I looked at, this resulted in no extra instructions on Icelake (SIMD8 SINGLE_PATCH) and Tigerlake (8_PATCH). On Haswell, one shader grew by a single instruction for a pittance of cycles in a stage that isn't a performance bottleneck anyway. Avoiding remapping wasn't so much of an optimization as just the way that I originally wrote it. Not worth it. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20809>	2023-02-20 03:54:24 +00:00
Faith Ekstrand	f8aa83f0c8	intel/nir: Use nir_lower_mem_access_bit_sizes() This drops the Intel-specific pass in favor of the new generic one. No shader-db changes on Skylake or DG2. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21232>	2023-02-17 00:55:54 +00:00
Lionel Landwerlin	9ac192d79d	intel/fs: bound subgroup invocation read to dispatch size This is to avoid out of bound register accesses (potentially leading to hangs) when the dispatch size is smaller than what is reported in the NIR subgroup_size. v2: Implement bounding with a mask (since workgroup sizes are powers of 2) (Faith) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `530de844ef` ("intel,anv,iris,crocus: Drop subgroup size from the shader key") Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21282>	2023-02-14 21:29:42 +00:00
Marcin Ślusarz	dd9bf86725	intel/compiler/mesh: use slice id of task urb handles in mesh shaders When mesh shader is spawned on a different slice than the originating task shader, then input task urb handle can come from a different slice, so masking this information off will load data from the current slice, instead of the one where real data are. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21007>	2023-02-14 09:36:53 +00:00
Marcin Ślusarz	9d3e3c15f3	intel/compiler: replace gl_Layer & gl_ViewportIndex by 0 in fs if ms doesn't write it Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17620>	2023-02-14 08:24:51 +00:00
Marcin Ślusarz	465c241266	intel/compiler/mesh: use U888X packed index format Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20910>	2023-02-10 21:03:33 +00:00
Väinö Mäkelä	56667002fd	intel/vec4: Don't optimize multiply by 1.0 away The SPIR-V compiler's implementation of tanh generates a multiply by 1.0 to flush denorms to zero. Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20232>	2023-02-10 16:34:01 +00:00
Väinö Mäkelä	dcad4a2cd1	intel/vec4: Set the rounding mode The rounding mode only needs to be set once, because 16-bit floats or preserving denorms aren't supported for the platforms where vec4 is used. Reviewed-by: Emma Anholt <emma@anholt.net> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20232>	2023-02-10 16:34:00 +00:00
Lionel Landwerlin	ebc4893947	intel/fs: fix mesh indirect movs The size in src[2] is in byte and needs to cover any possible data accessed in src[0] by the indirection. That way the register allocation is aware of what cannot be spilled for the instruction to execute on valid data. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `70ace2bbcd` ("intel/compiler: Implement Task Output and Mesh Input") Reviewed-by: Marcin Ślusarz <marcin.slusarz@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21188>	2023-02-09 15:35:55 +00:00
Francisco Jerez	f0b6348ad0	intel/eu/gfx8-9: Fix execution with all channels disabled due to HW bug #220160235 . This hardware bug is the result of a control flow optimization present in Gfx8-9 meant to prevent the ELSE instruction from disabling all channels and update the control flow stack only to have them re-enabled at the ENDIF instruction executed immediately after it. Instead, on Gfx8-9 an ELSE instruction that would normally have ended up with all channels disabled would pop off the last element of the stack and jump directly to JIP+1 instead of to the ENDIF at JIP, skipping over the ENDIF instruction. In simple cases this would work okay (though its actual performance benefit is questionable), but in cases where a branch instruction within the IF block (e.g. BREAK or CONTINUE) caused all active channels to jump outside the IF conditional, the optimization would break the JIP chain of "join" instructions by skipping the ENDIF, causing the block of instructions immediately after the ENDIF to execute with all channels disabled until execution reaches the reconvergence point. This issue was observed on SKL in the dEQP-VK.reconvergence.subgroup_uniform_control_flow_elect.compute.nesting4.0.38 test in combination with some Vulkan binding model changes Lionel is working on. In such cases the execution with all channels disabled was leading to corruption of an indirect message descriptor, causing a hang. Unfortunately the hardware bug doesn't provide a recommended workaround. 
In order to fix the problem we point the JIP of an ELSE instruction to the instruction immediately before the ENDIF -- However that's not expected to work due to the restriction that JIP and UIP must be equal if and only if BranchCtrl is disabled -- So this patch also enables BranchCtrl, which is intended to support join instructions within the "ELSE" block, which in turn disables the optimization described above, which in turn causes us to execute the instruction immediately before the ENDIF with all channels disabled -- So in order to avoid further fallout from executing code with all channels disabled we need to insert a NOP before ENDIF instructions that have a matching ELSE instruction. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20921>	2023-02-07 21:37:12 +00:00
Alejandro Piñeiro	ba0bc7182d	anv: use shader_info->var_copies_lowered Instead of passing allow_copies as a parameter for brw_nir_optimize (so manually doing that tracking). Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/19338>	2023-02-06 22:11:34 +00:00


2440 Commits