Ian Romanick 0f3a350087 brw/nir: Don't generate scalar byte to float conversions on DG2+ in optimize_extract_to_float
The lowering code for scalar byte-to-float conversions does not generate
efficient code. It is better not to emit those conversions in the first
place. The shaders that I examined had blocks of NIR like:

    con 32     %527 = extract_u8 %456.o, %5 (0x0)
    con 32     %528 = extract_u8 %456.o, %35 (0x1)
    con 32     %529 = extract_u8 %456.o, %14 (0x2)
    con 32     %530 = extract_u8 %456.o, %11 (0x3)
    con 32     %531 = u2f32 %527
    con 32     %532 = u2f32 %528
    con 32     %533 = u2f32 %529
    con 32     %534 = u2f32 %530
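
For reference, the block above unpacks the four bytes of one 32-bit value and converts each to float. The C below is a scalar model of that computation, not Mesa code: `extract_u8`, `u2f32`, and `unpack_bytes_to_floats` are hypothetical C stand-ins for the NIR opcodes, and the optional `normalize` flag models the 1/255 scaling that some shaders apply afterwards.

```c
#include <stdint.h>

/* Hypothetical C model of NIR extract_u8: zero-extend byte `idx` (0..3) of x. */
static uint32_t extract_u8(uint32_t x, unsigned idx)
{
    return (x >> (8u * idx)) & 0xffu;
}

/* Hypothetical C model of NIR u2f32: unsigned integer to 32-bit float. */
static float u2f32(uint32_t x)
{
    return (float)x;
}

/* Unpack all four bytes, as the %527..%534 sequence does. When `normalize`
 * is nonzero, also multiply by 1/255, as some of the shaders do next. */
static void unpack_bytes_to_floats(uint32_t packed, int normalize, float out[4])
{
    for (unsigned i = 0; i < 4; i++) {
        float f = u2f32(extract_u8(packed, i));
        out[i] = normalize ? f * (1.0f / 255.0f) : f;
    }
}
```

For example, unpacking 0xff804001 without normalization yields 1, 64, 128, 255 (byte 0 first).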

In some cases the u2f32 results are then multiplied by 1/255. There may be
a slightly more efficient way to do this, something like:

    mov(8)    g40<1>UW        g12.1<32,8,4>UB
    mov(8)    g41<1>UW        g12.2<32,8,4>UB
    mov(8)    g42<1>UW        g12.3<32,8,4>UB
    mov(8)    g60<1>F         g12<32,8,4>UB
    mov(8)    g61<1>F         g40<1,1,0>UW
    mov(8)    g62<1>F         g41<1,1,0>UW
    mov(8)    g63<1>F         g42<1,1,0>UW

In SIMD16 and SIMD32 that would save temporary register space. It could
save a register in SIMD8 by using g40.8 instead of g42. Making that
happen might be tricky. Maybe we should just add a special NIR opcode
that converts a packed uint32 to a vec4?
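
The sequence above relies on region addressing to select one byte lane per dword. As a sketch, assuming the usual `<VertStride;Width,HorzStride>` rule (channel i reads element `subreg + (i / Width) * VertStride + (i % Width) * HorzStride`, counted in units of the operand type), the hypothetical helper `region_byte_offset` below computes which byte of the base register each channel reads.

```c
/* Byte offset, from the start of the base register, read by channel `chan`
 * of a source region <vstride,width,hstride> with `type_size`-byte elements
 * starting at sub-register `subreg` (in type units, as in g12.1).
 * Hypothetical helper for illustration only. */
static unsigned region_byte_offset(unsigned subreg, unsigned vstride,
                                   unsigned width, unsigned hstride,
                                   unsigned type_size, unsigned chan)
{
    unsigned row = chan / width;   /* which row of the region */
    unsigned col = chan % width;   /* position within that row */
    return (subreg + row * vstride + col * hstride) * type_size;
}
```

In SIMD8 with a width of 8, `<32,8,4>` never leaves the first row, so channels 0..7 of `g12<32,8,4>UB` read bytes 0, 4, ..., 28 and those of `g12.1<32,8,4>UB` read bytes 1, 5, ..., 29: the same byte of each of the eight packed dwords. A `<1,1,0>UW` region, by contrast, reads contiguous words at bytes 0, 2, ..., 14.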

v2: Add a bunch of documentation explaining what's going on. Suggested
by Ken.

shader-db:

Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
total instructions in shared programs: 18228689 -> 18228720 (<.01%)
instructions in affected programs: 43091 -> 43122 (0.07%)
helped: 0 / HURT: 30

total cycles in shared programs: 932542994 -> 932544290 (<.01%)
cycles in affected programs: 8150758 -> 8152054 (0.02%)
helped: 15 / HURT: 17

fossil-db:

Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
Totals:
Instrs: 142890605 -> 142890392 (-0.00%); split: -0.00%, +0.00%
Cycle count: 21655049536 -> 21654693720 (-0.00%); split: -0.00%, +0.00%

Totals from 181 (0.03% of 553251) affected shaders:
Instrs: 188022 -> 187809 (-0.11%); split: -0.12%, +0.01%
Cycle count: 85291658 -> 84935842 (-0.42%); split: -0.47%, +0.05%

Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown)
Totals:
Instrs: 154438050 -> 154436980 (-0.00%)
Cycle count: 15334650326 -> 15334644375 (-0.00%); split: -0.00%, +0.00%
Spill count: 56754 -> 56706 (-0.08%)
Fill count: 95919 -> 95808 (-0.12%)
Scratch Memory Size: 2306048 -> 2304000 (-0.09%)
Max live registers: 32469924 -> 32469899 (-0.00%)

Totals from 112 (0.02% of 642922) affected shaders:
Instrs: 156186 -> 155116 (-0.69%)
Cycle count: 11111478 -> 11105527 (-0.05%); split: -0.62%, +0.56%
Spill count: 1766 -> 1718 (-2.72%)
Fill count: 2815 -> 2704 (-3.94%)
Scratch Memory Size: 78848 -> 76800 (-2.60%)
Max live registers: 11526 -> 11501 (-0.22%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:59 -08:00