mesa/src/intel/compiler at 35ac517780caff8546bbd4fcff6f6398e231148a - mesa

Francisco Jerez 35ac517780 intel/brw/xe3+: Define BRW_SCHEDULE_PRE_LATENCY scheduling mode.

This defines a new pre-RA scheduling mode similar to BRW_SCHEDULE_PRE
but more aggressive at optimizing for minimum latency rather than
minimum register usage.  The main motivation is that on recent xe3
platforms we use a register allocation heuristic that packs variables
more tightly at the bottom of the register file instead of the
round-robin heuristic we used on previous platforms, since as a result
of VRT there is a parallelism penalty when a program uses more GRF
registers than necessary.  Unfortunately the xe3 tight-packing
heuristic severely constrains the work of the post-RA scheduler due to
the false dependencies introduced during register allocation, so we
can do a better job by making the scheduler aware of instruction
latencies before the register allocator introduces any false
dependencies.
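
The effect of false dependencies described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyInst`, `count_dependencies`, `virtual_program` and `packed_program` are hypothetical names invented for illustration and are not part of the brw compiler. It counts pairwise ordering constraints conservatively, for the same four instructions before and after a tightly packed register assignment.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical toy instruction: one destination register and some sources.
struct ToyInst {
    int dst;
    std::vector<int> srcs;
};

// Conservatively count pairs (i, j) with i < j where j touches a register
// that i also touches in a conflicting way (RAW, WAR or WAW).
std::size_t count_dependencies(const std::vector<ToyInst>& prog) {
    std::size_t n = 0;
    for (std::size_t j = 0; j < prog.size(); j++) {
        for (std::size_t i = 0; i < j; i++) {
            bool dep = prog[i].dst == prog[j].dst;                 // WAW
            for (int s : prog[j].srcs) dep = dep || s == prog[i].dst; // RAW
            for (int s : prog[i].srcs) dep = dep || s == prog[j].dst; // WAR
            if (dep) n++;
        }
    }
    return n;
}

// Two independent load/add chains on distinct virtual registers.
std::vector<ToyInst> virtual_program() {
    return { {0, {}}, {1, {0}}, {2, {}}, {3, {2}} };
}

// The same program after a packing allocator reuses register 0 for the
// second load, which ties the two chains together.
std::vector<ToyInst> packed_program() {
    return { {0, {}}, {1, {0}}, {0, {}}, {2, {0}} };
}

// Small usage check: packing raises the constraint count from 2 to 5.
inline void check_example() {
    assert(count_dependencies(virtual_program()) == 2);
    assert(count_dependencies(packed_program()) == 5);
}
```

In the virtual-register form only the two true RAW edges constrain the order, so a scheduler is free to hoist the second load above the first add. After packing, the reuse of register 0 adds WAW and WAR constraints that pin the second load behind the first chain.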

This can lead to higher register pressure, but only when the scheduler
decides it could save cycles by extending a live range.  It makes
sense to preserve the preexisting BRW_SCHEDULE_PRE as a separate mode
since some workloads can still benefit from neglecting latencies
pre-RA due to the trade-off mentioned between parallelism and GRF use.
A future commit will introduce a more accurate estimate of the
expected relative performance of BRW_SCHEDULE_PRE
vs. BRW_SCHEDULE_PRE_LATENCY taking into account this trade-off.

In theory this could also be helpful on pre-xe3 platforms, but the
benefit should be significantly smaller due to the different RA
heuristic, so it hasn't been tested extensively there.

The following Traci tests are improved significantly by this change on
PTL (nearly all tests that run on my system are affected positively):

Ghostrunner2-trace-dx11-1440p-ultra:                7.12% ±0.36%
SpaceEngineers-trace-dx11-2160p-high:               5.77% ±0.43%
HogwartsLegacy-trace-dx12-1080p-ultra:              4.40% ±0.03%
Naraka-trace-dx11-1440p-highest:                    3.06% ±0.43%
MetroExodus-trace-dx11-2160p-ultra:                 2.26% ±0.60%
Fortnite-trace-dx11-2160p-epix:                     2.12% ±0.53%
Nba2K23-trace-dx11-2160p-ultra:                     1.98% ±0.30%
Control-trace-dx11-1440p-high:                      1.93% ±0.36%
GodOfWar-trace-dx11-2160p-ultra:                    1.62% ±0.47%
TotalWarPharaoh-trace-dx11-1440p-ultra:             1.55% ±0.18%
MountAndBlade2-trace-dx11-1440p-veryhigh:           1.51% ±0.37%
Destiny2-trace-dx11-1440p-highest:                  1.44% ±0.34%
GtaV-trace-dx11-2160p-ultra:                        1.26% ±0.27%
ShadowTombRaider-trace-dx11-2160p-ultra:            1.10% ±0.58%
Borderlands3-trace-dx11-2160p-ultra:                0.95% ±0.43%
TerminatorResistance-trace-dx11-2160p-ultra:        0.87% ±0.22%
BaldursGate3-trace-dx11-1440p-ultra:                0.84% ±0.28%
CitiesSkylines2-trace-dx11-1440p-high:              0.82% ±0.22%
PubG-trace-dx11-1440p-ultra:                        0.72% ±0.37%
Palworld-trace-dx11-1080p-med:                      0.71% ±0.26%
Superposition-trace-dx11-2160p-extreme:             0.69% ±0.19%

The compile time of shader-db increases by a statistically significant
1.85% after this commit (14 iterations, 5% significance level); the
compile time of fossil-db doesn't change significantly in my setup.

v2: Addressed interaction with 81594d0db1,
    since the code that calculates deps, delays and exits is no longer
    mode-independent after this change.  Instead of reverting that
    commit (which is non-trivial and would have a greater compile-time
    hit) simply reconstruct the scheduler object during the transition
    between BRW_SCHEDULE_PRE_LATENCY and any other PRE mode that
    doesn't require instruction latencies.
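
The v2 approach of rebuilding the scheduler only when the latency requirement changes could look roughly like the following. This is a minimal sketch, not the code from this commit: `ToyMode`, `ToyScheduler` and `ToySchedulerCache` are hypothetical names, and the real scheduler constructor does far more work (computing deps, delays and exits) than this placeholder.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-ins for the pre-RA scheduling modes.
enum class ToyMode { Pre, PreNonLifo, PreLifo, PreLatency };

// Hypothetical scheduler whose precomputed data depends on whether
// instruction latencies were taken into account at construction time.
struct ToyScheduler {
    bool latency_aware;
    explicit ToyScheduler(bool latency_aware) : latency_aware(latency_aware) {}
};

struct ToySchedulerCache {
    std::unique_ptr<ToyScheduler> sched;
    int constructions = 0;

    // Reuse the cached scheduler while consecutive modes agree on latency
    // awareness; otherwise throw it away and construct a fresh one.
    ToyScheduler& get(ToyMode mode) {
        const bool want = (mode == ToyMode::PreLatency);
        if (!sched || sched->latency_aware != want) {
            sched = std::make_unique<ToyScheduler>(want);
            ++constructions;
        }
        return *sched;
    }
};

// Usage check: trying four modes in a row costs only two constructions.
inline void check_cache_example() {
    ToySchedulerCache cache;
    cache.get(ToyMode::PreLatency);
    cache.get(ToyMode::Pre);
    cache.get(ToyMode::PreNonLifo);
    cache.get(ToyMode::PreLifo);
    assert(cache.constructions == 2);
}
```

The design choice this illustrates is that the expensive setup is paid once per group of modes sharing the same latency setting, instead of once per mode.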

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36618>

2025-09-10 02:15:55 +00:00

elk

brw: add support for separate tessellation shader compilation

2025-09-05 07:46:17 +00:00

tests

intel/compiler tests: fix path-to-string conversion

2025-06-23 08:26:29 +00:00

brw_analysis_def.cpp

brw: consider LOAD_PAYLOAD fully defined

2025-07-30 07:57:19 +00:00

brw_analysis_liveness.cpp

intel: fork exec_node/list -> brw_exec_node/list as a private Intel utility

2025-07-31 20:23:02 +00:00

brw_analysis_performance.cpp

brw: Rename is_send_from_grf to is_send, replace other is_send() helper

2025-08-08 22:12:05 +00:00

brw_analysis.cpp

intel: fork exec_node/list -> brw_exec_node/list as a private Intel utility

2025-07-31 20:23:02 +00:00

brw_analysis.h

brw: fix analysis dirtying with pulled constants

2025-08-21 09:04:53 +00:00

brw_asm_internal.h

brw: Rework label tracking in assembler

2025-03-06 17:06:20 -08:00

brw_asm_tool.c

intel/compiler tests: fix variable type for getopt_long() return value

2025-06-23 08:26:29 +00:00

brw_asm.c

brw: Add FILE * parameter to dump_assembly

2025-09-09 10:40:42 -07:00

brw_asm.h

brw: Fix size in assembler when compacting

2025-03-03 20:43:56 +00:00

brw_builder.cpp

brw: Add brw_builder::uniform()

2025-04-04 23:07:21 +00:00

brw_builder.h

brw: fix broadcast opcode

2025-08-28 00:23:44 +03:00

brw_cfg.cpp

intel: fork exec_node/list -> brw_exec_node/list as a private Intel utility

2025-07-31 20:23:02 +00:00

brw_cfg.h

intel: fork exec_node/list -> brw_exec_node/list as a private Intel utility

2025-07-31 20:23:02 +00:00

brw_compile_bs.cpp

intel/brw: Take shader in the brw_generator::generate_code() parameters

2025-08-28 00:06:20 +00:00

brw_compile_cs.cpp

intel/brw: Take shader in the brw_generator::generate_code() parameters

2025-08-28 00:06:20 +00:00

brw_compile_fs.cpp

intel/brw: Take shader in the brw_generator::generate_code() parameters

2025-08-28 00:06:20 +00:00

brw_compile_gs.cpp

anv/brw/iris: move VS VUE computation to backend

2025-09-05 07:46:16 +00:00

brw_compile_mesh.cpp

intel/brw: Take shader in the brw_generator::generate_code() parameters

2025-08-28 00:06:20 +00:00

brw_compile_tcs.cpp

brw: add support for separate tessellation shader compilation

2025-09-05 07:46:17 +00:00

brw_compile_tes.cpp

brw: add support for separate tessellation shader compilation

2025-09-05 07:46:17 +00:00

brw_compile_vs.cpp

anv/brw/iris: move VS VUE computation to backend

2025-09-05 07:46:16 +00:00

brw_compiler.c

all: rename gl_shader_stage to mesa_shader_stage

2025-08-06 10:28:40 +08:00

brw_compiler.h

brw: add support for separate tessellation shader compilation

2025-09-05 07:46:17 +00:00

brw_debug_recompile.c

all: rename gl_shader_stage to mesa_shader_stage

2025-08-06 10:28:40 +08:00

brw_device_sha1_gen_c.py

intel/compiler: drop unused ray-tracing fields from cache hash

2024-03-22 00:01:28 +00:00

brw_disasm_info.cpp

brw: Add FILE * parameter to dump_assembly

2025-09-09 10:40:42 -07:00

brw_disasm_info.h

brw: Add FILE * parameter to dump_assembly

2025-09-09 10:40:42 -07:00

brw_disasm_tool.c

intel/brw: Remove Gfx8- code from disassembler

2024-02-28 05:45:38 +00:00

brw_disasm.c

intel: fork exec_node/list -> brw_exec_node/list as a private Intel utility

2025-07-31 20:23:02 +00:00

brw_disasm.h

intel/brw: support for dumping shader line numbers

2025-04-08 19:39:53 +00:00

brw_eu_compact.c

brw: Avoid invalid access when compacting out-of-bounds JIP/UIP

2025-08-20 00:54:41 +00:00

brw_eu_defines.h

Revert "brw: move texture offset packing to NIR"

2025-08-29 06:29:14 +00:00

brw_eu_emit.c

build: avoid redefining unreachable() which is standard in C23

2025-07-31 17:49:42 +00:00

brw_eu_inst.h

brw: Add BRW_TYPE_BF for bfloat16

2025-03-25 05:23:37 +00:00

brw_eu_validate.c

build: avoid redefining unreachable() which is standard in C23

2025-07-31 17:49:42 +00:00

brw_eu.c

build: avoid redefining unreachable() which is standard in C23

2025-07-31 17:49:42 +00:00

brw_eu.h

build: avoid redefining unreachable() which is standard in C23

2025-07-31 17:49:42 +00:00

brw_from_nir.cpp

brw: add support for separate tessellation shader compilation

2025-09-05 07:46:17 +00:00

brw_generator.cpp

brw: Add FILE * parameter to dump_assembly

2025-09-09 10:40:42 -07:00

brw_generator.h

intel/brw: Take shader in the brw_generator::generate_code() parameters

2025-08-28 00:06:20 +00:00

brw_gram.y

brw: Add EU assembler support for bfloat16

2025-03-25 05:23:37 +00:00

brw_inst.cpp

brw: fix broadcast opcode

2025-08-28 00:23:44 +03:00

brw_inst.h

brw: workaround broken indirect RT messages on Gfx11

2025-08-20 15:01:50 +00:00

brw_isa_info.h

intel/compiler: Use #pragma once instead of header guards

2024-12-11 19:47:44 +00:00

brw_kernel.c

intel: Update all NIR_PASS_V to NIR_PASS

2025-07-14 19:25:52 +00:00

brw_kernel.h

intel: rework CL pre-compile

2025-01-25 03:28:07 +00:00

brw_lex.l

brw: Add EU assembler support for bfloat16

2025-03-25 05:23:37 +00:00

brw_list.h

intel: fork exec_node/list -> brw_exec_node/list as a private Intel utility

2025-07-31 20:23:02 +00:00

brw_load_reg.cpp

brw: Add and use brw_reg_is_arf to test for a specific ARF

2025-07-24 23:08:07 +00:00

brw_lower_dpas.cpp

brw: Simplify brw_builder "insert before inst" constructor

2025-03-06 23:33:38 +00:00

brw_lower_integer_multiplication.cpp

brw: Remove bblock_t parameters from various passes

2025-03-06 23:33:38 +00:00

brw_lower_logical_sends.cpp

Revert "brw: move texture offset packing to NIR"

2025-08-29 06:29:14 +00:00

brw_lower_pack.cpp

build: avoid redefining unreachable() which is standard in C23

2025-07-31 17:49:42 +00:00

brw_lower_regioning.cpp

brw: Rename is_send_from_grf to is_send, replace other is_send() helper

2025-08-08 22:12:05 +00:00

brw_lower_scoreboard.cpp

brw: Rename is_send_from_grf to is_send, replace other is_send() helper

2025-08-08 22:12:05 +00:00

brw_lower_simd_width.cpp

brw: Use a builder to track position in lower_simd

2025-07-19 17:49:48 +00:00

brw_lower_subgroup_ops.cpp

brw: Strategically place flags initialization to help cmod prop

2025-08-28 22:08:20 +00:00

brw_lower.cpp

brw: Enumerate SHADER_OPCODE_SEND sources and standardize how many

2025-08-08 22:12:08 +00:00

brw_nir_analyze_ubo_ranges.c

intel/compiler: take reg_unit size into account with ubo ranges

2025-01-07 21:38:06 +00:00

brw_nir_lower_alpha_to_coverage.c

nir: rename nir_lower_io_to_temporaries -> nir_lower_io_vars_to_temporaries

2025-06-26 18:20:54 +00:00

brw_nir_lower_cooperative_matrix.c

build: avoid redefining unreachable() which is standard in C23

2025-07-31 17:49:42 +00:00

brw_nir_lower_cs_intrinsics.c

all: rename gl_shader_stage_uses_workgroup to mesa_shader_stage_uses_workgroup

2025-08-06 10:28:41 +08:00

brw_nir_lower_fs_barycentrics.c

treewide: simplify nir_def_rewrite_uses_after

2025-08-01 15:34:24 +00:00

brw_nir_lower_fsign.py

intel/brw: Use range analysis to optimize fsign

2024-05-14 01:28:21 +00:00

brw_nir_lower_immediate_offsets.c

treewide: use nir_def_as_*

2025-08-01 15:34:24 +00:00

brw_nir_lower_intersection_shader.c

nir: make nir_block::predecessors & dom_frontier sets non-malloc'd

2025-08-21 06:13:48 +00:00

brw_nir_lower_ray_queries.c

intel/compiler: Fix ray geometry index

2025-08-19 09:32:55 +00:00

brw_nir_lower_rt_intrinsics_pre_trace.c

nir: Add a faster lowest common ancestor algorithm

2025-09-08 23:03:13 +00:00

brw_nir_lower_rt_intrinsics.c

intel/compiler: Fix ray geometry index

2025-08-19 09:32:55 +00:00

brw_nir_lower_sample_index_in_coord.c

intel/compiler: Lower sample index into coord for MSRT messages

2025-03-07 23:06:14 +00:00

brw_nir_lower_shader_calls.c

nir: make nir_block::predecessors & dom_frontier sets non-malloc'd

2025-08-21 06:13:48 +00:00

brw_nir_lower_storage_image.c

build: avoid redefining unreachable() which is standard in C23

2025-07-31 17:49:42 +00:00

brw_nir_lower_texel_address.c

build: avoid redefining unreachable() which is standard in C23

2025-07-31 17:49:42 +00:00

brw_nir_lower_texture.c

Revert "brw: move texture offset packing to NIR"

2025-08-29 06:29:14 +00:00

brw_nir_opt_fsat.c

nir: convert nir_instr_worklist to init/fini semantics w/out allocation

2025-08-21 06:13:49 +00:00

brw_nir_rt_builder.h

intel/rt: Update BVH instance leaf load for Xe3+

2025-04-21 20:10:45 +00:00

brw_nir_rt.c

all: rename gl_shader_stage to mesa_shader_stage

2025-08-06 10:28:40 +08:00

brw_nir_rt.h

intel: Update all NIR_PASS_V to NIR_PASS

2025-07-14 19:25:52 +00:00

brw_nir_trig_workarounds.py

…

brw_nir_wa_18019110168.c

treewide: use nir_def_as_*

2025-08-01 15:34:24 +00:00

brw_nir.c

brw: add support for separate tessellation shader compilation

2025-09-05 07:46:17 +00:00

brw_nir.h

brw: add support for separate tessellation shader compilation

2025-09-05 07:46:17 +00:00

brw_opt_address_reg_load.cpp

brw: Fix checking sources of wrong instruction in opt_address_reg_load

2025-08-27 22:50:23 +00:00

brw_opt_algebraic.cpp

brw: Fix folding case for MAD instruction with all immediates

2025-08-21 17:19:18 +00:00

brw_opt_bank_conflicts.cpp

util: crib SWAP macro from freedreno

2025-07-21 11:42:18 +00:00

brw_opt_cmod_propagation.cpp

build: avoid redefining unreachable() which is standard in C23

2025-07-31 17:49:42 +00:00

brw_opt_combine_constants.cpp

intel: fork exec_node/list -> brw_exec_node/list as a private Intel utility

2025-07-31 20:23:02 +00:00

brw_opt_copy_propagation.cpp

brw: Enumerate SHADER_OPCODE_SEND sources and standardize how many

2025-08-08 22:12:08 +00:00

brw_opt_cse.cpp

brw: Stop using is_send_from_grf() in CSE pass

2025-08-08 22:12:05 +00:00

brw_opt_dead_code_eliminate.cpp

intel: fork exec_node/list -> brw_exec_node/list as a private Intel utility

2025-07-31 20:23:02 +00:00

brw_opt_register_coalesce.cpp

brw: enable opt_register_coalesce to work with multiple EOT blocks

2025-08-20 15:01:50 +00:00

brw_opt_saturate_propagation.cpp

brw: Clean up saturate propagation after non-defs version removal

2025-04-09 19:06:48 +00:00

brw_opt_txf_combiner.cpp

brw: Add more specific brw_builder helpers

2025-07-19 17:49:47 +00:00

brw_opt_virtual_grfs.cpp

brw: Don't assert about MAX_VGRF_SIZE in brw_opt_split_virtual_grfs()

2025-04-11 20:34:51 +00:00

brw_opt.cpp

brw: Do cmod prop again after brw_lower_subgroup_ops

2025-08-28 22:08:20 +00:00

brw_packed_float.c

…

brw_prim.h

intel/compiler: Use #pragma once instead of header guards

2024-12-11 19:47:44 +00:00

brw_print.cpp

intel: fork exec_node/list -> brw_exec_node/list as a private Intel utility

2025-07-31 20:23:02 +00:00

brw_private.h

intel/debug: shader dump filter

2025-05-23 19:57:02 +00:00

brw_reg_allocate.cpp

brw: fix INTEL_DEBUG=spill_fs

2025-08-27 15:08:35 +00:00

brw_reg_type.c

brw: Add BRW_TYPE_BF for bfloat16

2025-03-25 05:23:37 +00:00

brw_reg_type.h

brw: Add BRW_TYPE_BF for bfloat16

2025-03-25 05:23:37 +00:00

brw_reg.cpp

build: avoid redefining unreachable() which is standard in C23

2025-07-31 17:49:42 +00:00

brw_reg.h

build: avoid redefining unreachable() which is standard in C23

2025-07-31 17:49:42 +00:00

brw_rt.h

intel/compiler: Use #pragma once instead of header guards

2024-12-11 19:47:44 +00:00

brw_schedule_instructions.cpp

intel/brw/xe3+: Define BRW_SCHEDULE_PRE_LATENCY scheduling mode.

2025-09-10 02:15:55 +00:00

brw_shader.cpp

intel/brw/xe3+: Define BRW_SCHEDULE_PRE_LATENCY scheduling mode.

2025-09-10 02:15:55 +00:00

brw_shader.h

intel/brw/xe3+: Define BRW_SCHEDULE_PRE_LATENCY scheduling mode.

2025-09-10 02:15:55 +00:00

brw_simd_selection.cpp

build: avoid redefining unreachable() which is standard in C23

2025-07-31 17:49:42 +00:00

brw_spirv.c

nir: add nir_vectorize_cb callback parameter to nir_lower_phis_to_scalar()

2025-07-08 15:33:59 +00:00

brw_thread_payload.cpp

all: rename gl_shader_stage_is_compute to mesa_shader_stage_is_compute

2025-08-06 10:28:41 +08:00

brw_thread_payload.h

intel/brw: Rename fs_visitor to brw_shader

2025-02-11 09:13:28 +00:00

brw_validate.cpp

brw: Run validation as soon as we have the CFG around

2025-09-03 20:42:05 +00:00

brw_vue_map.c

brw: add support for separate tessellation shader compilation

2025-09-05 07:46:17 +00:00

brw_workaround.cpp

brw: Rename is_send_from_grf to is_send, replace other is_send() helper

2025-08-08 22:12:05 +00:00

intel_gfx_ver_enum.h

build: avoid redefining unreachable() which is standard in C23

2025-07-31 17:49:42 +00:00

intel_nir_blockify_uniform_loads.c

treewide: simplify nir_def_rewrite_uses_after

2025-08-01 15:34:24 +00:00

intel_nir_clamp_image_1d_2d_array_sizes.c

treewide: simplify nir_def_rewrite_uses_after

2025-08-01 15:34:24 +00:00

intel_nir_clamp_per_vertex_loads.c

brw: add support for separate tessellation shader compilation

2025-09-05 07:46:17 +00:00

intel_nir_lower_non_uniform_barycentric_at_sample.c

treewide: use nir_def_as_*

2025-08-01 15:34:24 +00:00

intel_nir_lower_non_uniform_resource_intel.c

treewide: use nir_metadata_control_flow

2024-06-17 16:28:14 -04:00

intel_nir_lower_printf.c

nir: drop printf_base_identifier

2025-02-05 20:33:15 +00:00

intel_nir_lower_shading_rate_output.c

treewide: simplify nir_def_rewrite_uses_after

2025-08-01 15:34:24 +00:00

intel_nir_lower_sparse.c

treewide: simplify nir_def_rewrite_uses_after

2025-08-01 15:34:24 +00:00

intel_nir_opt_peephole_ffma.c

treewide: use nir_def_as_*

2025-08-01 15:34:24 +00:00

intel_nir_opt_peephole_imul32x16.c

treewide: use nir_metadata_control_flow

2024-06-17 16:28:14 -04:00

intel_nir_tcs_workarounds.c

nir: make nir_block::predecessors & dom_frontier sets non-malloc'd

2025-08-21 06:13:48 +00:00

intel_nir.c

intel/compiler: Use nir_split_conversions()

2025-04-07 17:45:21 -05:00

intel_nir.h

brw: add support for separate tessellation shader compilation

2025-09-05 07:46:17 +00:00

intel_shader_enums.h

brw: add support for separate tessellation shader compilation

2025-09-05 07:46:17 +00:00

meson.build

brw: replace lower_fs_msaa with nir_inline_sysval

2025-08-03 21:27:47 +00:00

test_eu_compact.cpp

build: avoid redefining unreachable() which is standard in C23

2025-07-31 17:49:42 +00:00

test_eu_validate.cpp

brw: Add FILE * parameter to dump_assembly

2025-09-09 10:40:42 -07:00

test_helpers.cpp

brw: Simplify the test code for brw passes

2025-03-13 17:43:17 +00:00

test_helpers.h

brw: Add brw_shader_params

2025-08-28 00:06:18 +00:00

test_insert_load_reg.cpp

brw: Add passes to generate and lower load_reg

2025-04-04 06:45:02 +00:00

test_lower_scoreboard.cpp

brw: Enumerate SHADER_OPCODE_SEND sources and standardize how many

2025-08-08 22:12:08 +00:00

test_opt_algebraic.cpp

brw: Fix folding case for MAD instruction with all immediates

2025-08-21 17:19:18 +00:00

test_opt_cmod_propagation.cpp

brw/cmod: Don't propagate from CMP to possible Inf + (-Inf)

2025-04-28 19:44:23 +00:00

test_opt_combine_constants.cpp

brw: Add brw_builder::uniform()

2025-04-04 23:07:21 +00:00

test_opt_copy_propagation.cpp

brw: Simplify the test code for brw passes

2025-03-13 17:43:17 +00:00

test_opt_cse.cpp

brw: Simplify the test code for brw passes

2025-03-13 17:43:17 +00:00

test_opt_register_coalesce.cpp

brw: don't generate invalid instructions

2025-06-04 06:08:26 +00:00

test_opt_saturate_propagation.cpp

brw/sat: Eliminate non-defs saturate propagation

2025-04-04 06:45:02 +00:00

test_simd_selection.cpp

intel: Switch uint64_t intel_debug to a bitset

2025-04-22 23:09:26 +00:00

test_vf_float_conversions.cpp

…