Commit Graph

4589 Commits

Author SHA1 Message Date
Lionel Landwerlin 06cf911ab4 brw: lower shader opcode into tex_instr
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37527>
2025-09-23 15:37:40 +00:00
Lionel Landwerlin bddfbe7fb1 brw/blorp: lower MCS fetching in NIR
One advantage here of moving a bunch of stuff to NIR is that we can
now have consistent payload types straight from the NIR conversion to
BRW.

This massively simplifies the BRW lowering code and avoids type errors
that are quite common to make in the backend.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37527>
2025-09-23 15:37:40 +00:00
Lionel Landwerlin d4ab2087cf brw: lower non coherent FS load_output in NIR
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37527>
2025-09-23 15:37:39 +00:00
Ian Romanick 3e04990c68 elk: Increase the size of some structure fields in combine_constants
In very large shaders, first_use_ip, last_use_ip, and even (register) nr
can overflow 16 bits. Increase the size of these fields.  Some structure
components are rearranged to promote better packing.

Fixes: 2dad1e3abd ("i965/fs: Add pass to combine immediates.")
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37482>
2025-09-22 20:02:25 +00:00
Ian Romanick b7e1ac8309 brw: Increase the size of some structure fields in combine_constants
In very large shaders, first_use_ip, last_use_ip, and even (register) nr
can overflow 16 bits. Increase the size of these fields.
used_in_single_block is moved earlier in the structure to promote better
packing.

Fixes: 2dad1e3abd ("i965/fs: Add pass to combine immediates.")
Closes: #9489
Reviewed-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Tested-by: Tapani Pälli <tapani.palli@intel.com>
Tested-by: @joostruis
Tested-by: @Snoucher
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37482>
2025-09-22 20:02:25 +00:00
Caio Oliveira f65fbb23e2 brw: Fix encoding of 3-src dst in Xe2+
Use FD20 macro that will account for the implicit LSB zero value and is
already used for sources.  For the new macro we need to use the entire
bit-range of the field (55-51), so remove the adjustments we used to
do prior to encoding and decoding.

Fixes assertion in vkpeak (https://github.com/nihui/vkpeak) when running
bf16 tests on BMG.  And the code now will correctly apply the subreg_nr
to the destination, e.g. a mad(32) gets splitted into two pieces, the
generation would not fill out the upper-part of the register

```
 mad(16)         g13<1>BF        g10<8,8,1>BF    g12<8,8,1>BF    g56<1,1,1>F { align1 1H A@5 };
-mad(16)         g13<1>BF        g10.16<8,8,1>BF g12.16<8,8,1>BF g57<1,1,1>F { align1 2H A@5 };
+mad(16)         g13.16<1>BF     g10.16<8,8,1>BF g12.16<8,8,1>BF g57<1,1,1>F { align1 2H A@5 };
```

Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37236>
2025-09-18 18:21:25 +00:00
Alyssa Rosenzweig 804ced9047 intel: drop legacy flatshade handling
Let mesa/st do the keying instead.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37447>
2025-09-18 14:14:11 +00:00
Alyssa Rosenzweig 36bd06ebab intel: drop clamp_fragment_color handling
This is all dead code since we weren't even seting the cap in iris/crocus!

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37447>
2025-09-18 14:14:11 +00:00
Alyssa Rosenzweig 957f326a10 brw: drop printf info plumbing
unused since printf hashing.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37447>
2025-09-18 14:14:10 +00:00
Alyssa Rosenzweig bbf5bc8632 brw: cleanup int64 option set
Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37447>
2025-09-18 14:14:09 +00:00
Alyssa Rosenzweig 168704c2fe brw: hoist shared options out of the stage loop
ideally we'd have no stage switching, but this is just a cleanup for now.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37447>
2025-09-18 14:14:09 +00:00
Alyssa Rosenzweig 0d7083d5bc brw: drop indirection on compiler options
I see no point, we allocate for every shader stage anyway. This is a bit
simpler.

I'm not a fan of the brw_compiler singleton at all but torching that is not on
today's agenda. Flattening it a little bit very much is.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37447>
2025-09-18 14:14:08 +00:00
Alyssa Rosenzweig 2c161cc35d brw: drop unused brw_kernel code
unused since we dropped GRL.

Signed-off-by: Alyssa Rosenzweig <alyssa.rosenzweig@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37447>
2025-09-18 14:14:07 +00:00
Georg Lehmann 714a149396 nir: remove unsigned upper bound config
All config information is now either in nir->info or nir->options.

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37361>
2025-09-16 09:24:04 +00:00
Lionel Landwerlin a69853ce5e brw: improve eot_reg computation in register allocate
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: c4c7ff3f8f ("brw: enable register allocation to deal with multiple EOTs")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37326>
2025-09-16 07:49:07 +00:00
Lionel Landwerlin 1f86a4ee37 brw: remove unused RT write code
With 4fda724fd4 ("brw: Avoid invalid access when compacting
out-of-bounds JIP/UIP") this stuff isn't needed anymore.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: fe38fb858c ("brw: workaround broken indirect RT messages on Gfx11")
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37326>
2025-09-16 07:49:07 +00:00
Francisco Jerez 5c68b351fe intel/brw: Fix regression in brw_allocate_registers() compiling large shaders with throughput==0.
The following Vulkan CTS tests that emit massive shaders were
regressing after "intel/brw/xe3+: Select scheduler heuristic with best
trade-off between register pressure and latency.":

 dEQP-VK.graphicsfuzz.cov-nested-loops-set-struct-data-verify-in-function
 dEQP-VK.graphicsfuzz.cov-dfdx-dfdy-after-nested-loops

The reason is that they have so many nested loops that they cause the
performance analysis utilization estimates to overflow the 32-bit
floating-point variables used to calculate them, which causes our
throughput estimate to underflow and equal zero for those shaders,
which breaks the logic introduced in brw_allocate_registers() to
select the scheduling variant with highest throughput, since none of
the scheduling modes tried has better throughput than the initial
value equal to zero of "best_perf".  Instead use -INFINITY as initial
value for "best_perf" so we always select a scheduling mode.

This should have been caught by CI but oddly the tests above are
showing up as "not run" on my last baseline runs, so this wasn't
flagged as a regression for me.

v2: Use -INFINITY instead of previous approach that used NaN (Ian).

Fixes: 531a34c7dd ("intel/brw/xe3+: Select scheduler heuristic with best trade-off between register pressure and latency.")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13884
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13885
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> (v1)
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37322>
2025-09-15 21:10:47 +00:00
Sushma Venkatesh Reddy 5f10c1a8fb intel/compiler: generalize workaround script name for broader applicability
Renamed brw_nir_trig_workarounds.py to brw_nir_workarounds.py to reflect
its expanded scope beyond just trignometric workarounds.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36990>
2025-09-12 22:32:46 +00:00
Sushma Venkatesh Reddy fe1d84e083 intel/compiler: apply sqrt workaround for Horizon Forbidden West shader
Added a workaround for a known shader in Horizon Forbidden West that causes
visual corruption on Intel anv driver. The fix clamps fsqrt inputs using
fmax(x, 1e-12) to avoid invalid values. Integrated the workaround via
brw_nir_apply_sqrt_workarounds() and applied it conditionally in the Vulkan
pipeline based on the shader's BLAKE3 hash.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12555

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36990>
2025-09-12 22:32:46 +00:00
Georg Lehmann 79d02047b8 intel: switch to new subgroup size info
Reviewed-by: Iván Briano <ivan.briano@intel.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37258>
2025-09-12 21:05:17 +00:00
Georg Lehmann 95c2a65662 nir: remove unused shader_info param in nir_create_shader
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37258>
2025-09-12 21:05:17 +00:00
Caio Oliveira c358842c1d brw: Don't use individual rallocs for each instruction
Move from a single ralloc allocation per instruction to contiguous
blocks of allocations.  Still use ralloc for those large blocks.

Each ralloc allocation has at least 5 pointers of overhead, which would
be about a third of the current brw_inst, and get worse as we try to
pack brw_inst better.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>
2025-09-12 00:25:05 +00:00
Caio Oliveira 2506540566 brw: Repack brw_inst fields
In Release build, goes from 72 to 64 bytes, and now fits
in a single cacheline.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>
2025-09-12 00:25:05 +00:00
Caio Oliveira 8ded571ef4 brw: Allocate only brw_inst for BASE instructions
Now that all the other kinds were added, all transforms to SEND will
come from non-BASE kinds, so we don't need overallocate for BASE
instructions.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>
2025-09-12 00:25:05 +00:00
Caio Oliveira 08c0f33874 brw: Add a generic LOGICAL instruction kind
This kind of instruction doesn't have a special struct but
will still be always allocated so that it can be lowered to SEND.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>
2025-09-12 00:25:05 +00:00
Caio Oliveira df2b5fb03f brw: Add brw_fb_write_inst
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>
2025-09-12 00:25:04 +00:00
Caio Oliveira d06c0a370e brw: Add brw_urb_inst
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>
2025-09-12 00:25:04 +00:00
Caio Oliveira 90967e7b16 brw: Add brw_load_payload_inst
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>
2025-09-12 00:25:03 +00:00
Caio Oliveira 388bac06ce brw: Add brw_dpas_inst
Fixed the types in brw_inst::bits so the struct is packed correctly.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>
2025-09-12 00:25:03 +00:00
Caio Oliveira 09a26526cc brw: Add brw_mem_inst
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>
2025-09-12 00:25:02 +00:00
Caio Oliveira f0f1e63f99 brw: Add brw_tex_inst
Incorporate some "control sources" directly into the instruction.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>
2025-09-12 00:25:02 +00:00
Caio Oliveira 0fcce2722f brw: Add brw_send_inst
Move all the SEND specific fields from brw_inst into brw_send_inst.
This new instruction kind will contain all variants of SENDs plus the
virtual opcodes that were already relying on those SEND fields.

Use the `as_send()` helper to go from a brw_inst into the brw_send_inst
when applicable.  Some of the code was changed to use the brw_send_inst
type directly.

Until other kinds are added, all the instructions are allocated the same
amount of space as brw_send_inst.  This ensures that all
brw_transform_inst() calls are still valid.  This will change after
a few patches so that BASE instructions can use less memory.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>
2025-09-12 00:25:01 +00:00
Caio Oliveira b27f6621ae brw: Add initial support for different instruction kinds
Prepare code for supporting subclasses of brw_inst for certain
specialized kinds of instructions.  This will allow

- Move certain fields from brw_inst to the specialized one,
  reducing its size and making it easy to understand what applies
  to which instruction;
- Move certain control sources into the specialized inst type, which
  currently take a full brw_reg to encode small integers.  Reducing
  the overall sources we walk and care also might help the code
  in general.

Next commits will add the new instruction kinds.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>
2025-09-12 00:25:01 +00:00
Caio Oliveira 339a4e8680 brw: Remove the extra function call when lowering samplers
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>
2025-09-12 00:25:00 +00:00
Caio Oliveira 71c23c6722 brw: Add brw_builder::URB_READ and URB_WRITE helpers
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>
2025-09-12 00:25:00 +00:00
Caio Oliveira f92116832f brw: Add brw_builder::SEND() helper
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>
2025-09-12 00:24:59 +00:00
Caio Oliveira e194909b3f brw: Add and use brw_transform_inst()
The new function takes care of changing an instruction opcode and sources,
which will allow later patches to tweak how allocations are done in
those cases.  Like the instruction allocation, this also takes a shader
(or a builder, for it to get a shader).

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>
2025-09-12 00:24:59 +00:00
Caio Oliveira 5d0160a87f brw: Pass brw_shader in fold_instruction
Will be used later for the general instruction transforming
function.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>
2025-09-12 00:24:58 +00:00
Caio Oliveira 8f16cac492 brw: Allow emit instruction with only number of sources
The emit will allocate the necessary number of sources but will
let the caller fill them in.

Change a couple of places to take advantage of that.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>
2025-09-12 00:24:58 +00:00
Caio Oliveira 3ef86a8d00 brw: Let the builder fill the sources of brw_inst
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>
2025-09-12 00:24:58 +00:00
Caio Oliveira 506fce20f0 brw: Bundle the allocation of brw_inst and its sources
Flatten all the work being done into brw_new_inst() and
brw_clone_inst() and allocate both the instruction and
the sources in one swoop.

For now we still keep a pointer to the array instead of
declaring an array as last element to still allow growing
the array -- which is used by the compiler in a few places.

This commit removes the constructors for brw_inst, the idea
is that the instructions are managed by the brw_shader, so
we always go through it for new ones.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>
2025-09-12 00:24:57 +00:00
Caio Oliveira c81c8c917f brw: Remove builtin sources from brw_inst
A later patch will add a different mechanism to achieve the same
goal.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>
2025-09-12 00:24:57 +00:00
Caio Oliveira 858162a2fc brw: Allocate brw_inst::src with ralloc
In the few cases we have to _increase_ the number of sources, the new
code will not attempt to recollect the memory, i.e. it delays freeing
the old smaller one source array.  For the instructions that may need
this (when making a SEND into a SEND_GATHER), this is not expected to
happen more than once.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>
2025-09-12 00:24:56 +00:00
Caio Oliveira 29c12bbebf brw: Centralize brw_inst allocation
Add and use brw_new_inst() and brw_clone_inst() and do not use stack
allocated brw_insts.  The builder was changed to not use the temporary
ones either.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>
2025-09-12 00:24:56 +00:00
Caio Oliveira c90ec6d7e7 brw: Use uint16_t for size_written
UINT16_MAX is larger than the maximum number of bytes in the
general register file: 256 GRFs * 16 slots * 4 bytes = 16384.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>
2025-09-12 00:24:55 +00:00
Kenneth Graunke 6281a12822 brw: Remove brw_inst::no_dd_check/no_dd_clear
These dependency hints were primarily useful for the vec4 backend, where
it was common to write subsets of a vec4's components across multiple
instructions.  In the scalar backend, we rarely used them.  They also no
longer exist on Tigerlake and later in favor of software scoreboarding.

Dropping this allows us to clean up the IR a bit.

We still use the hardware hints in the generator in a couple places:

   - Gfx9-12.0 scratch headers
   - Quad swizzles
   - Indirect MOV lowering

In theory we might want them back if we moved that lowering to the IR.
For scratch at least, I suspect it won't have a huge impact, as we're
already incurring the cost of spills/fills.  The others are fairly rare
as well, so it may not be worth keeping.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36730>
2025-09-12 00:24:55 +00:00
Caio Oliveira 03e9c01f0c brw: Add and use more brw_validate.cpp macros
Add and use more comparison variants (which provide more detailed print
out of the values), remove old references to "fsv" and "scalar", use
assertion names more similar to GoogleTest that we already use
elsewhere.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37267>
2025-09-10 17:44:38 -07:00
Dylan Baker f18aca8689 intel/brw: Fix implementaiton of |= operator for enum
The current implementation does nothing, since it has no side effects,
only a return value. By passing `x` as a reference we can mutate the
value before returning.

Fixes: df37c7ca74 ("brw: fix analysis dirtying with pulled constants")
CID: 1665293
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37263>
2025-09-10 16:30:19 +00:00
Lionel Landwerlin 33d2c31d7a brw: don't use brw_null_reg() for unused SEND sources
Just avoiding the validation assert.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 47fe9d28e7 ("brw: Enumerate SHADER_OPCODE_SEND sources and standardize how many")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/13777
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37112>
2025-09-10 09:08:27 +03:00
Francisco Jerez 5bf7bb5cf9 intel/brw/xe3+: Re-enable static analysis-based SIMD32 FS heuristic for the moment.
This disables for now the "optimistic" SIMD heuristic that was
implemented for xe3+ and makes it dependent on a debugging option,
instead use the static analysis-based codepath that was used in
previous generations and was extended by previous commits in this MR
to model the xe3 trade-off between register use and thread
parallelism.

The reason is that the main assumption of the optimistic SIMD
heuristic didn't hold up with reality: Real-world testing on PTL shows
that there are many cases where SIMD32 shows performance degradation
relative to SIMD16 despite the ability of xe3 hardware to scale the
GRF file of a thread on demand, unfortunately that scenario seems to
be more pervasive than hoped when the optimistic SIMD heuristic was
implemented pre-silicon.

In many cases what seems to be going on is that even when the register
file is able to scale with the increased register use of SIMD32, the
thread parallelism of the EU is scaled down by a similar factor, so at
the bottom line SIMD32 (depending on the actual ratio of register use
between both variants) may not buy us anything, and it frequently
encounters constraints (like SIMD lowering and less effective
scheduling) that lead to worse codegen than SIMD16, easily tipping the
balance in favor of SIMD16.  The extension of the performance analysis
pass that was done in a previous commit allows the original SIMD32
heuristic to take into account quantitatively this effect, and that
seems pretty effective at disabling SIMD32 shaders that underperform
judging from the statistically significant improvement of most Traci
test-cases that run on my PTL system (4 iterations, 5% significance),
no statistically significant regressions were observed:

Nba2K23-trace-dx11-2160p-ultra:                    10.16% ±0.34%
Superposition-trace-dx11-2160p-extreme:             4.06% ±0.50%
TotalWarWarhammer3-trace-dx11-1080p-high:           3.52% ±0.76%
Payday3-trace-dx11-1440p-ultra:                     2.41% ±0.81%
MetroExodus-trace-dx11-2160p-ultra:                 2.28% ±0.78%
Borderlands3-trace-dx11-2160p-ultra:                1.89% ±0.65%
MountAndBlade2-trace-dx11-1440p-veryhigh:           1.81% ±0.40%
Blackops3-trace-dx11-1080p-high:                    1.66% ±0.29%
HogwartsLegacy-trace-dx12-1080p-ultra:              1.53% ±0.22%
TotalWarPharaoh-trace-dx11-1440p-ultra:             1.44% ±0.31%
Fortnite-trace-dx11-2160p-epix:                     1.44% ±0.27%
Naraka-trace-dx11-1440p-highest:                    1.39% ±0.27%
PubG-trace-dx11-1440p-ultra:                        1.30% ±0.49%
Destiny2-trace-dx11-1440p-highest:                  1.10% ±0.23%
Factorio-trace-1080p-high:                          1.10% ±1.77%
TerminatorResistance-trace-dx11-2160p-ultra:        1.08% ±0.31%
Ghostrunner2-trace-dx11-1440p-ultra:                1.05% ±0.15%
ShadowTombRaider-trace-dx11-2160p-ultra:            0.98% ±0.19%
CitiesSkylines2-trace-dx11-1440p-high:              0.67% ±0.19%
Palworld-trace-dx11-1080p-med:                      0.44% ±0.22%

The downside is that this will reverse the large reduction in
compile-time we gained from the optimistic SIMD heuristic -- The
run-time of both shader-db and fossil-db jump back up by nearly 20%
with this change.  I'm working on a better compromise based on
run-time feedback that will hopefully allow us to preserve the
compile-time benefit of the optimistic heuristic without the reduction
in run-time performance, but in the meantime it seems like the
run-time performance gap from SIMD32 is the more urgent issue to
address since it has an impact on titles across the board.  Despite
the reversal of that compile-time improvement xe3 still achieves
slightly lower compile time on the average than previous generations
as a result of VRT, so this doesn't seem terribly tragic.

v2: Add bit to brw_get_compiler_config_value() (Lionel).

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36618>
2025-09-10 02:15:58 +00:00