AlexIndustrial/mesa

Author	SHA1	Message	Date
Lionel Landwerlin	06cf911ab4	brw: lower shader opcode into tex_instr Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37527>	2025-09-23 15:37:40 +00:00
Lionel Landwerlin	bddfbe7fb1	brw/blorp: lower MCS fetching in NIR One advantage here of moving a bunch of stuff to NIR is that we can now have consistent payload types straight from the NIR conversion to BRW. This massively simplifies the BRW lowering code and avoids type errors that are quite common to make in the backend. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37527>	2025-09-23 15:37:40 +00:00
Lionel Landwerlin	d4ab2087cf	brw: lower non coherent FS load_output in NIR Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37527>	2025-09-23 15:37:39 +00:00
Ian Romanick	3e04990c68	elk: Increase the size of some structure fields in combine_constants In very large shaders, first_use_ip, last_use_ip, and even (register) nr can overflow 16 bits. Increase the size of these fields. Some structure components are rearranged to promote better packing. Fixes: `2dad1e3abd` ("i965/fs: Add pass to combine immediates.") Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37482>	2025-09-22 20:02:25 +00:00
Ian Romanick	b7e1ac8309	brw: Increase the size of some structure fields in combine_constants In very large shaders, first_use_ip, last_use_ip, and even (register) nr can overflow 16 bits. Increase the size of these fields. used_in_single_block is moved earlier in the structure to promote better packing. Fixes: `2dad1e3abd` ("i965/fs: Add pass to combine immediates.") Closes: #9489 Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Tested-by: Tapani Pälli <tapani.palli@intel.com> Tested-by: @joostruis Tested-by: @Snoucher Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37482>	2025-09-22 20:02:25 +00:00
Caio Oliveira	f65fbb23e2	brw: Fix encoding of 3-src dst in Xe2+ Use FD20 macro that will account for the implicit LSB zero value and is already used for sources. For the new macro we need to use the entire bit-range of the field (55-51), so remove the adjustments we used to do prior to encoding and decoding. Fixes assertion in vkpeak (https://github.com/nihui/vkpeak) when running bf16 tests on BMG. And the code now will correctly apply the subreg_nr to the destination, e.g. when a mad(32) gets split into two pieces, the generation would previously not fill out the upper part of the register ``` mad(16) g13<1>BF g10<8,8,1>BF g12<8,8,1>BF g56<1,1,1>F { align1 1H A@5 }; -mad(16) g13<1>BF g10.16<8,8,1>BF g12.16<8,8,1>BF g57<1,1,1>F { align1 2H A@5 }; +mad(16) g13.16<1>BF g10.16<8,8,1>BF g12.16<8,8,1>BF g57<1,1,1>F { align1 2H A@5 }; ``` Reviewed-by: Matt Turner <mattst88@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37236>	2025-09-18 18:21:25 +00:00
Alyssa Rosenzweig	804ced9047	intel: drop legacy flatshade handling Let mesa/st do the keying instead. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37447>	2025-09-18 14:14:11 +00:00
Alyssa Rosenzweig	36bd06ebab	intel: drop clamp_fragment_color handling This is all dead code since we weren't even setting the cap in iris/crocus! Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37447>	2025-09-18 14:14:11 +00:00
Alyssa Rosenzweig	957f326a10	brw: drop printf info plumbing unused since printf hashing. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37447>	2025-09-18 14:14:10 +00:00
Alyssa Rosenzweig	bbf5bc8632	brw: cleanup int64 option set Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37447>	2025-09-18 14:14:09 +00:00
Alyssa Rosenzweig	168704c2fe	brw: hoist shared options out of the stage loop ideally we'd have no stage switching, but this is just a cleanup for now. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37447>	2025-09-18 14:14:09 +00:00
Alyssa Rosenzweig	0d7083d5bc	brw: drop indirection on compiler options I see no point, we allocate for every shader stage anyway. This is a bit simpler. I'm not a fan of the brw_compiler singleton at all but torching that is not on today's agenda. Flattening it a little bit very much is. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37447>	2025-09-18 14:14:08 +00:00
Alyssa Rosenzweig	2c161cc35d	brw: drop unused brw_kernel code unused since we dropped GRL. Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37447>	2025-09-18 14:14:07 +00:00
Georg Lehmann	714a149396	nir: remove unsigned upper bound config All config information is now either in nir->info or nir->options. Reviewed-by: Rhys Perry <pendingchaos02@gmail.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37361>	2025-09-16 09:24:04 +00:00
Lionel Landwerlin	a69853ce5e	brw: improve eot_reg computation in register allocate Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `c4c7ff3f8f` ("brw: enable register allocation to deal with multiple EOTs") Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37326>	2025-09-16 07:49:07 +00:00
Lionel Landwerlin	1f86a4ee37	brw: remove unused RT write code With `4fda724fd4` ("brw: Avoid invalid access when compacting out-of-bounds JIP/UIP") this stuff isn't needed anymore. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `fe38fb858c` ("brw: workaround broken indirect RT messages on Gfx11") Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37326>	2025-09-16 07:49:07 +00:00
Francisco Jerez	5c68b351fe	intel/brw: Fix regression in brw_allocate_registers() compiling large shaders with throughput==0. The following Vulkan CTS tests that emit massive shaders were regressing after "intel/brw/xe3+: Select scheduler heuristic with best trade-off between register pressure and latency.": dEQP-VK.graphicsfuzz.cov-nested-loops-set-struct-data-verify-in-function dEQP-VK.graphicsfuzz.cov-dfdx-dfdy-after-nested-loops The reason is that they have so many nested loops that they cause the performance analysis utilization estimates to overflow the 32-bit floating-point variables used to calculate them, which causes our throughput estimate to underflow and equal zero for those shaders, which breaks the logic introduced in brw_allocate_registers() to select the scheduling variant with highest throughput, since none of the scheduling modes tried has better throughput than the initial value equal to zero of "best_perf". Instead use -INFINITY as initial value for "best_perf" so we always select a scheduling mode. This should have been caught by CI but oddly the tests above are showing up as "not run" on my last baseline runs, so this wasn't flagged as a regression for me. v2: Use -INFINITY instead of previous approach that used NaN (Ian). Fixes: `531a34c7dd` ("intel/brw/xe3+: Select scheduler heuristic with best trade-off between register pressure and latency.") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13884 Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13885 Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v1) Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37322>	2025-09-15 21:10:47 +00:00
Sushma Venkatesh Reddy	5f10c1a8fb	intel/compiler: generalize workaround script name for broader applicability Renamed brw_nir_trig_workarounds.py to brw_nir_workarounds.py to reflect its expanded scope beyond just trigonometric workarounds. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36990>	2025-09-12 22:32:46 +00:00
Sushma Venkatesh Reddy	fe1d84e083	intel/compiler: apply sqrt workaround for Horizon Forbidden West shader Added a workaround for a known shader in Horizon Forbidden West that causes visual corruption on Intel anv driver. The fix clamps fsqrt inputs using fmax(x, 1e-12) to avoid invalid values. Integrated the workaround via brw_nir_apply_sqrt_workarounds() and applied it conditionally in the Vulkan pipeline based on the shader's BLAKE3 hash. Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12555 Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Reviewed-by: Tapani Pälli <tapani.palli@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36990>	2025-09-12 22:32:46 +00:00
Georg Lehmann	79d02047b8	intel: switch to new subgroup size info Reviewed-by: Iván Briano <ivan.briano@intel.com> Acked-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37258>	2025-09-12 21:05:17 +00:00
Georg Lehmann	95c2a65662	nir: remove unused shader_info param in nir_create_shader Reviewed-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37258>	2025-09-12 21:05:17 +00:00
Caio Oliveira	c358842c1d	brw: Don't use individual rallocs for each instruction Move from a single ralloc allocation per instruction to contiguous blocks of allocations. Still use ralloc for those large blocks. Each ralloc allocation has at least 5 pointers of overhead, which would be about a third of the current brw_inst, and get worse as we try to pack brw_inst better. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>	2025-09-12 00:25:05 +00:00
Caio Oliveira	2506540566	brw: Repack brw_inst fields In Release build, goes from 72 to 64 bytes, and now fits in a single cacheline. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>	2025-09-12 00:25:05 +00:00
Caio Oliveira	8ded571ef4	brw: Allocate only brw_inst for BASE instructions Now that all the other kinds were added, all transforms to SEND will come from non-BASE kinds, so we don't need overallocate for BASE instructions. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>	2025-09-12 00:25:05 +00:00
Caio Oliveira	08c0f33874	brw: Add a generic LOGICAL instruction kind This kind of instruction doesn't have a special struct but will still be always allocated so that it can be lowered to SEND. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>	2025-09-12 00:25:05 +00:00
Caio Oliveira	df2b5fb03f	brw: Add brw_fb_write_inst Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>	2025-09-12 00:25:04 +00:00
Caio Oliveira	d06c0a370e	brw: Add brw_urb_inst Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>	2025-09-12 00:25:04 +00:00
Caio Oliveira	90967e7b16	brw: Add brw_load_payload_inst Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>	2025-09-12 00:25:03 +00:00
Caio Oliveira	388bac06ce	brw: Add brw_dpas_inst Fixed the types in brw_inst::bits so the struct is packed correctly. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>	2025-09-12 00:25:03 +00:00
Caio Oliveira	09a26526cc	brw: Add brw_mem_inst Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>	2025-09-12 00:25:02 +00:00
Caio Oliveira	f0f1e63f99	brw: Add brw_tex_inst Incorporate some "control sources" directly into the instruction. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>	2025-09-12 00:25:02 +00:00
Caio Oliveira	0fcce2722f	brw: Add brw_send_inst Move all the SEND specific fields from brw_inst into brw_send_inst. This new instruction kind will contain all variants of SENDs plus the virtual opcodes that were already relying on those SEND fields. Use the `as_send()` helper to go from a brw_inst into the brw_send_inst when applicable. Some of the code was changed to use the brw_send_inst type directly. Until other kinds are added, all the instructions are allocated the same amount of space as brw_send_inst. This ensures that all brw_transform_inst() calls are still valid. This will change after a few patches so that BASE instructions can use less memory. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>	2025-09-12 00:25:01 +00:00
Caio Oliveira	b27f6621ae	brw: Add initial support for different instruction kinds Prepare code for supporting subclasses of brw_inst for certain specialized kinds of instructions. This will allow - Move certain fields from brw_inst to the specialized one, reducing its size and making it easy to understand what applies to which instruction; - Move certain control sources into the specialized inst type, which currently take a full brw_reg to encode small integers. Reducing the overall sources we walk and care also might help the code in general. Next commits will add the new instruction kinds. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>	2025-09-12 00:25:01 +00:00
Caio Oliveira	339a4e8680	brw: Remove the extra function call when lowering samplers Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>	2025-09-12 00:25:00 +00:00
Caio Oliveira	71c23c6722	brw: Add brw_builder::URB_READ and URB_WRITE helpers Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>	2025-09-12 00:25:00 +00:00
Caio Oliveira	f92116832f	brw: Add brw_builder::SEND() helper Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>	2025-09-12 00:24:59 +00:00
Caio Oliveira	e194909b3f	brw: Add and use brw_transform_inst() The new function takes care of changing an instruction opcode and sources, which will allow later patches to tweak how allocations are done in those cases. Like the instruction allocation, this also takes a shader (or a builder, for it to get a shader). Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>	2025-09-12 00:24:59 +00:00
Caio Oliveira	5d0160a87f	brw: Pass brw_shader in fold_instruction Will be used later for the general instruction transforming function. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>	2025-09-12 00:24:58 +00:00
Caio Oliveira	8f16cac492	brw: Allow emit instruction with only number of sources The emit will allocate the necessary number of sources but will let the caller fill them in. Change a couple of places to take advantage of that. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>	2025-09-12 00:24:58 +00:00
Caio Oliveira	3ef86a8d00	brw: Let the builder fill the sources of brw_inst Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>	2025-09-12 00:24:58 +00:00
Caio Oliveira	506fce20f0	brw: Bundle the allocation of brw_inst and its sources Flatten all the work being done into brw_new_inst() and brw_clone_inst() and allocate both the instruction and the sources in one swoop. For now we still keep a pointer to the array instead of declaring an array as last element to still allow growing the array -- which is used by the compiler in a few places. This commit removes the constructors for brw_inst, the idea is that the instructions are managed by the brw_shader, so we always go through it for new ones. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>	2025-09-12 00:24:57 +00:00
Caio Oliveira	c81c8c917f	brw: Remove builtin sources from brw_inst A later patch will add a different mechanism to achieve the same goal. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>	2025-09-12 00:24:57 +00:00
Caio Oliveira	858162a2fc	brw: Allocate brw_inst::src with ralloc In the few cases we have to _increase_ the number of sources, the new code will not attempt to recollect the memory, i.e. it delays freeing the old smaller one source array. For the instructions that may need this (when making a SEND into a SEND_GATHER), this is not expected to happen more than once. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>	2025-09-12 00:24:56 +00:00
Caio Oliveira	29c12bbebf	brw: Centralize brw_inst allocation Add and use brw_new_inst() and brw_clone_inst() and do not use stack allocated brw_insts. The builder was changed to not use the temporary ones either. Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>	2025-09-12 00:24:56 +00:00
Caio Oliveira	c90ec6d7e7	brw: Use uint16_t for size_written UINT16_MAX is larger than the maximum number of bytes in the general register file: 256 GRFs * 16 slots * 4 bytes = 16384. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>	2025-09-12 00:24:55 +00:00
Kenneth Graunke	6281a12822	brw: Remove brw_inst::no_dd_check/no_dd_clear These dependency hints were primarily useful for the vec4 backend, where it was common to write subsets of a vec4's components across multiple instructions. In the scalar backend, we rarely used them. They also no longer exist on Tigerlake and later in favor of software scoreboarding. Dropping this allows us to clean up the IR a bit. We still use the hardware hints in the generator in a couple places: - Gfx9-12.0 scratch headers - Quad swizzles - Indirect MOV lowering In theory we might want them back if we moved that lowering to the IR. For scratch at least, I suspect it won't have a huge impact, as we're already incurring the cost of spills/fills. The others are fairly rare as well, so it may not be worth keeping. Reviewed-by: Caio Oliveira <caio.oliveira@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>	2025-09-12 00:24:55 +00:00
Caio Oliveira	03e9c01f0c	brw: Add and use more brw_validate.cpp macros Add and use more comparison variants (which provide more detailed print out of the values), remove old references to "fsv" and "scalar", use assertion names more similar to GoogleTest that we already use elsewhere. Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37267>	2025-09-10 17:44:38 -07:00
Dylan Baker	f18aca8689	intel/brw: Fix implementation of |= operator for enum The current implementation does nothing, since it has no side effects, only a return value. By passing `x` as a reference we can mutate the value before returning. Fixes: `df37c7ca74` ("brw: fix analysis dirtying with pulled constants") CID: 1665293 Reviewed-by: Ian Romanick <ian.d.romanick@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37263>	2025-09-10 16:30:19 +00:00
Lionel Landwerlin	33d2c31d7a	brw: don't use brw_null_reg() for unused SEND sources Just avoiding the validation assert. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `47fe9d28e7` ("brw: Enumerate SHADER_OPCODE_SEND sources and standardize how many") Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13777 Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37112>	2025-09-10 09:08:27 +03:00
Francisco Jerez	5bf7bb5cf9	intel/brw/xe3+: Re-enable static analysis-based SIMD32 FS heuristic for the moment. This disables for now the "optimistic" SIMD heuristic that was implemented for xe3+ and makes it dependent on a debugging option, instead use the static analysis-based codepath that was used in previous generations and was extended by previous commits in this MR to model the xe3 trade-off between register use and thread parallelism. The reason is that the main assumption of the optimistic SIMD heuristic didn't hold up with reality: Real-world testing on PTL shows that there are many cases where SIMD32 shows performance degradation relative to SIMD16 despite the ability of xe3 hardware to scale the GRF file of a thread on demand, unfortunately that scenario seems to be more pervasive than hoped when the optimistic SIMD heuristic was implemented pre-silicon. In many cases what seems to be going on is that even when the register file is able to scale with the increased register use of SIMD32, the thread parallelism of the EU is scaled down by a similar factor, so at the bottom line SIMD32 (depending on the actual ratio of register use between both variants) may not buy us anything, and it frequently encounters constraints (like SIMD lowering and less effective scheduling) that lead to worse codegen than SIMD16, easily tipping the balance in favor of SIMD16. 
The extension of the performance analysis pass that was done in a previous commit allows the original SIMD32 heuristic to take into account quantitatively this effect, and that seems pretty effective at disabling SIMD32 shaders that underperform judging from the statistically significant improvement of most Traci test-cases that run on my PTL system (4 iterations, 5% significance), no statistically significant regressions were observed: Nba2K23-trace-dx11-2160p-ultra: 10.16% ±0.34% Superposition-trace-dx11-2160p-extreme: 4.06% ±0.50% TotalWarWarhammer3-trace-dx11-1080p-high: 3.52% ±0.76% Payday3-trace-dx11-1440p-ultra: 2.41% ±0.81% MetroExodus-trace-dx11-2160p-ultra: 2.28% ±0.78% Borderlands3-trace-dx11-2160p-ultra: 1.89% ±0.65% MountAndBlade2-trace-dx11-1440p-veryhigh: 1.81% ±0.40% Blackops3-trace-dx11-1080p-high: 1.66% ±0.29% HogwartsLegacy-trace-dx12-1080p-ultra: 1.53% ±0.22% TotalWarPharaoh-trace-dx11-1440p-ultra: 1.44% ±0.31% Fortnite-trace-dx11-2160p-epix: 1.44% ±0.27% Naraka-trace-dx11-1440p-highest: 1.39% ±0.27% PubG-trace-dx11-1440p-ultra: 1.30% ±0.49% Destiny2-trace-dx11-1440p-highest: 1.10% ±0.23% Factorio-trace-1080p-high: 1.10% ±1.77% TerminatorResistance-trace-dx11-2160p-ultra: 1.08% ±0.31% Ghostrunner2-trace-dx11-1440p-ultra: 1.05% ±0.15% ShadowTombRaider-trace-dx11-2160p-ultra: 0.98% ±0.19% CitiesSkylines2-trace-dx11-1440p-high: 0.67% ±0.19% Palworld-trace-dx11-1080p-med: 0.44% ±0.22% The downside is that this will reverse the large reduction in compile-time we gained from the optimistic SIMD heuristic -- The run-time of both shader-db and fossil-db jump back up by nearly 20% with this change. 
I'm working on a better compromise based on run-time feedback that will hopefully allow us to preserve the compile-time benefit of the optimistic heuristic without the reduction in run-time performance, but in the meantime it seems like the run-time performance gap from SIMD32 is the more urgent issue to address since it has an impact on titles across the board. Despite the reversal of that compile-time improvement xe3 still achieves slightly lower compile time on the average than previous generations as a result of VRT, so this doesn't seem terribly tragic. v2: Add bit to brw_get_compiler_config_value() (Lionel). Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36618>	2025-09-10 02:15:58 +00:00
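
Note on 5c68b351fe: the message describes a selection loop in which no scheduling mode is chosen when every throughput estimate is zero and "best_perf" also starts at zero. The following is a minimal sketch of that failure mode and of the -INFINITY fix. It is not the Mesa code: `pick_best_mode`, its parameters, and the strict `>` comparison are assumptions inferred from the wording of the commit message.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the per-scheduling-mode throughput estimates;
// none of these names come from the Mesa source.
// Returns the index of the candidate with the highest throughput, or -1 if
// no candidate is strictly better than the initial value of best_perf.
int pick_best_mode(const std::vector<float> &throughput, float initial_best)
{
   int best = -1;
   float best_perf = initial_best;
   for (std::size_t i = 0; i < throughput.size(); i++) {
      // Assumed strict comparison: a candidate equal to best_perf is skipped.
      if (throughput[i] > best_perf) {
         best_perf = throughput[i];
         best = static_cast<int>(i);
      }
   }
   return best;
}
```

With an initial value of 0.0f, a shader whose estimates all equal zero gets no mode selected; with -INFINITY, the first candidate always wins the initial comparison, so some mode is always selected.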


4589 Commits
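
Note on f18aca8689: the message says the old operator had "no side effects, only a return value" and that taking `x` by reference fixes it. The sketch below illustrates that difference under stated assumptions: `dep_flags`, its enumerators, and `or_assign_by_value` are hypothetical names chosen for the example, not identifiers from the Mesa source.

```cpp
// Hypothetical flag enum; the names are illustrative, not Mesa's.
enum class dep_flags : unsigned {
   none = 0,
   instructions = 1u << 0,
   variables = 1u << 1,
};

constexpr dep_flags operator|(dep_flags a, dep_flags b)
{
   return static_cast<dep_flags>(static_cast<unsigned>(a) |
                                 static_cast<unsigned>(b));
}

// Broken shape, written as a named function so both variants can coexist:
// x is a copy, so the caller's variable is never modified.
inline dep_flags or_assign_by_value(dep_flags x, dep_flags y)
{
   x = x | y;
   return x;
}

// Fixed shape: x is a reference, so the caller's variable is updated
// before the result is returned.
inline dep_flags &operator|=(dep_flags &x, dep_flags y)
{
   x = x | y;
   return x;
}
```

A caller that discards the return value, as `a |= b;` statements normally do, only observes a change with the by-reference version.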