Commit Graph

12992 Commits

Author SHA1 Message Date
Matt Turner
a2c4a34303 anv: Align anv_descriptor_pool::host_mem
Otherwise anv_descriptor_set is accessed through an unaligned pointer,
which is undefined behavior in C.

```
anv_descriptor_set.c:1620:17: runtime error: member access within misaligned address 0x61900002c2b5
               for type 'struct anv_descriptor_set', which requires 8 byte alignment 0x61900002c2b5
```
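
The fix comes down to rounding the start of the host memory up to the alignment the set structure requires. A minimal sketch of that arithmetic, assuming the 8-byte requirement quoted in the sanitizer report; `align_up` is a hypothetical helper written for illustration, not the driver's code:

```python
def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (value + alignment - 1) & ~(alignment - 1)


# Address taken from the sanitizer report above.
misaligned = 0x61900002C2B5
aligned = align_up(misaligned, 8)
print(hex(misaligned % 8), hex(aligned))
```

An address that is already a multiple of 8 is returned unchanged, so applying the helper unconditionally costs at most 7 bytes of padding.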

Fixes: 2570a58bcd ("anv: Implement descriptor pools")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32070>
2024-11-11 19:45:14 +00:00
Jianxun Zhang
8906816f49 anv,hasvk,genxml: Rename genxml files using verx10
It can be confusing that a newer platform is named with a smaller
number than a half-generation of an older platform, as with 'gfx20' and
'gfx75' in the xml files.

Down the road, it can be a little worse once we pass something like
'gfx40' when there is already a gfx45.xml for the oldest platform.

Unify the naming of xml files with verx10 numbers to resolve the issue.
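
To see why verx10 numbers sort the way the generations do, here is a rough sketch. It assumes verx10 is simply the graphics version multiplied by ten (7.5 becomes 75, 20 becomes 200); `verx10()` is a hypothetical helper, and no actual file names are implied:

```python
from fractions import Fraction


def verx10(version: str) -> int:
    """Convert a generation string such as "12.5" to its verx10 number (125)."""
    value = Fraction(version) * 10
    if value.denominator != 1:
        raise ValueError(f"{version!r} is not a multiple of 0.1")
    return int(value)


# Generations mentioned in the commit message, newest first.
versions = ["20", "7.5", "4.5"]
by_age = sorted(versions, key=verx10)
print([(v, verx10(v)) for v in by_age])
```

With the old naming the newest platform would carry the number 20, which is smaller than the 75 and 45 used for the older half-generations; with verx10 numbers it sorts last.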

Signed-off-by: Jianxun Zhang <jianxun.zhang@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Jordan Justen <jordan.l.justen@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31943>
2024-11-09 00:04:47 +00:00
Iván Briano
aee04bf4fb intel/rt: fix ray_query stack address calculation
While the documentation says to use NUM_SIMD_LANES_PER_DSS for the stack
address calculation, what the HW actually uses is
NUM_SYNC_STACKID_PER_DSS. The former may vary depending on the platform,
while the latter is fixed to 2048 for all current platforms.
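
A rough illustration of why the stride matters. It assumes each DSS owns a contiguous run of stack slots and ignores the base address and stack size; the function name and the value 1024 are made up for the example, and only the 2048 comes from the message above:

```python
NUM_SYNC_STACKID_PER_DSS = 2048  # fixed on all current platforms, per the message


def sync_stack_index(dss_id: int, stack_id: int,
                     stacks_per_dss: int = NUM_SYNC_STACKID_PER_DSS) -> int:
    """Hypothetical flat slot index, assuming one contiguous run per DSS."""
    if not 0 <= stack_id < stacks_per_dss:
        raise ValueError("stack_id out of range for this DSS")
    return dss_id * stacks_per_dss + stack_id


# With a platform-dependent stride (1024 is invented here), the same
# (dss_id, stack_id) pair lands on a different slot than the fixed stride.
hw_slot = sync_stack_index(dss_id=3, stack_id=5)
wrong_slot = sync_stack_index(dss_id=3, stack_id=5, stacks_per_dss=1024)
print(hw_slot, wrong_slot)
```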

Fixes: 6c84cbd8c9 ("intel/dev/xe: Set max_eus_per_subslice using topology query")

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32049>
2024-11-08 18:31:52 +00:00
Ian Romanick
7aad19ccd2 brw/lower: Lower invalid source conversion to better code
There are two fragment shaders from RDR2 that are hurt on spills and
fills on Lunar Lake.

    Totals from 2 (0.00% of 551413) affected shaders:
    Spill count: 1252 -> 1317 (+5.19%)
    Fill count: 2518 -> 2642 (+4.92%)
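
The percentages in these reports are plain relative changes. A small sketch that reproduces the two figures above; `percent_change` is a name chosen for illustration, not part of any reporting tool:

```python
def percent_change(before: int, after: int) -> float:
    """Relative change from before to after, in percent."""
    if before == 0:
        raise ValueError("relative change is undefined for before == 0")
    return (after - before) / before * 100.0


for name, before, after in [("Spill count", 1252, 1317),
                            ("Fill count", 2518, 2642)]:
    print(f"{name}: {before} -> {after} ({percent_change(before, after):+.2f}%)")
```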

Those shaders... have a lot of room for improvement. There are some
patterns in those shaders that we handle very, very poorly. Improving
those patterns would likely improve the spills and fills in these
shaders quite dramatically.

Given how much other platforms are helped, I don't think this should block
this commit.

No shader-db or fossil-db changes on any pre-Gfx12.5 Intel platforms.

v2: Add some comments and an additional assertion. Suggested by Ken.

shader-db:

Lunar Lake
total instructions in shared programs: 18094517 -> 18094511 (<.01%)
instructions in affected programs: 809 -> 803 (-0.74%)
helped: 6 / HURT: 0

total cycles in shared programs: 921532158 -> 921532168 (<.01%)
cycles in affected programs: 2266 -> 2276 (0.44%)
helped: 0 / HURT: 3

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
total instructions in shared programs: 19820845 -> 19820839 (<.01%)
instructions in affected programs: 803 -> 797 (-0.75%)
helped: 6 / HURT: 0

total cycles in shared programs: 906372999 -> 906372949 (<.01%)
cycles in affected programs: 3216 -> 3166 (-1.55%)
helped: 6 / HURT: 0

fossil-db:

Lunar Lake
Totals:
Instrs: 141887377 -> 141884465 (-0.00%); split: -0.00%, +0.00%
Cycle count: 21990301498 -> 21990267232 (-0.00%); split: -0.00%, +0.00%
Spill count: 69732 -> 69797 (+0.09%)
Fill count: 128521 -> 128645 (+0.10%)

Totals from 349 (0.06% of 551413) affected shaders:
Instrs: 506117 -> 503205 (-0.58%); split: -0.79%, +0.21%
Cycle count: 32362996 -> 32328730 (-0.11%); split: -0.52%, +0.41%
Spill count: 1951 -> 2016 (+3.33%)
Fill count: 4899 -> 5023 (+2.53%)

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 152773732 -> 152761383 (-0.01%); split: -0.01%, +0.00%
Cycle count: 17187529968 -> 17187450663 (-0.00%); split: -0.00%, +0.00%
Spill count: 79279 -> 79003 (-0.35%)
Fill count: 148803 -> 147942 (-0.58%)
Scratch Memory Size: 3949568 -> 3946496 (-0.08%)
Max live registers: 31879325 -> 31879230 (-0.00%)

Totals from 366 (0.06% of 633185) affected shaders:
Instrs: 557377 -> 545028 (-2.22%); split: -2.22%, +0.01%
Cycle count: 26171205 -> 26091900 (-0.30%); split: -0.54%, +0.24%
Spill count: 3238 -> 2962 (-8.52%)
Fill count: 10018 -> 9157 (-8.59%)
Scratch Memory Size: 257024 -> 253952 (-1.20%)
Max live registers: 28187 -> 28092 (-0.34%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32041>
2024-11-08 17:46:45 +00:00
Ian Romanick
2a57568ebd brw/build: Add scalar_group() helper
Some uses of the old pattern still exist. The use in brw_fs_nir.cpp is
deleted by commits !29884. The use in brw_lower_logical_sends.cpp seems
different, so I decided to keep it.

The next commit wants to use this.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32041>
2024-11-08 17:46:45 +00:00
Ian Romanick
5dfea87623 brw/opt: Always do both kinds of copy propagation before lower_load_payload
shader-db:

All Intel platforms except Skylake had similar results. (Lunar Lake shown)
total instructions in shared programs: 18092932 -> 18092713 (<.01%)
instructions in affected programs: 139290 -> 139071 (-0.16%)
helped: 103
HURT: 18
helped stats (abs) min: 1 max: 8 x̄: 2.43 x̃: 2
helped stats (rel) min: 0.02% max: 9.09% x̄: 0.73% x̃: 0.29%
HURT stats (abs)   min: 1 max: 5 x̄: 1.72 x̃: 1
HURT stats (rel)   min: 0.02% max: 0.55% x̄: 0.10% x̃: 0.08%
95% mean confidence interval for instructions value: -2.17 -1.45
95% mean confidence interval for instructions %-change: -0.83% -0.38%
Instructions are helped.

total cycles in shared programs: 922792268 -> 921495900 (-0.14%)
cycles in affected programs: 400296984 -> 399000616 (-0.32%)
helped: 765
HURT: 635
helped stats (abs) min: 2 max: 77018 x̄: 6739.33 x̃: 60
helped stats (rel) min: <.01% max: 35.59% x̄: 1.98% x̃: 0.32%
HURT stats (abs)   min: 2 max: 88658 x̄: 6077.51 x̃: 152
HURT stats (rel)   min: <.01% max: 51.33% x̄: 2.75% x̃: 0.63%
95% mean confidence interval for cycles value: -1620.41 -231.54
95% mean confidence interval for cycles %-change: -0.10% 0.44%
Inconclusive result (%-change mean confidence interval includes 0).
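
One way to read the verdict lines in these reports: a result counts as helped or HURT only when both 95% confidence intervals exclude zero. The sketch below is an interpretation that matches the verdicts printed in this log; it is not the reporting tool's actual code, and `verdict` is a hypothetical name:

```python
def verdict(value_ci: tuple[float, float], pct_ci: tuple[float, float]) -> str:
    """Classify a result from its absolute and %-change confidence intervals."""
    for low, high in (value_ci, pct_ci):
        if low <= 0.0 <= high:
            return "inconclusive"  # an interval that includes 0 decides nothing
    # Both intervals exclude zero; lower numbers are better.
    if value_ci[1] < 0.0 and pct_ci[1] < 0.0:
        return "helped"
    if value_ci[0] > 0.0 and pct_ci[0] > 0.0:
        return "hurt"
    return "inconclusive"  # the two intervals disagree on the sign


# Lunar Lake cycles from the report above.
print(verdict((-1620.41, -231.54), (-0.10, 0.44)))
```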

LOST:   4
GAINED: 3

Skylake
total instructions in shared programs: 18658324 -> 18579715 (-0.42%)
instructions in affected programs: 2089957 -> 2011348 (-3.76%)
helped: 9842
HURT: 23
helped stats (abs) min: 1 max: 24 x̄: 7.99 x̃: 8
helped stats (rel) min: 0.05% max: 40.00% x̄: 5.37% x̃: 4.52%
HURT stats (abs)   min: 1 max: 5 x̄: 1.57 x̃: 1
HURT stats (rel)   min: 0.02% max: 1.28% x̄: 0.36% x̃: 0.24%
95% mean confidence interval for instructions value: -7.98 -7.95
95% mean confidence interval for instructions %-change: -5.43% -5.29%
Instructions are helped.

total cycles in shared programs: 860031654 -> 860237548 (0.02%)
cycles in affected programs: 449175235 -> 449381129 (0.05%)
helped: 7895
HURT: 4416
helped stats (abs) min: 1 max: 14129 x̄: 113.70 x̃: 22
helped stats (rel) min: <.01% max: 40.95% x̄: 1.31% x̃: 0.56%
HURT stats (abs)   min: 1 max: 33397 x̄: 249.89 x̃: 34
HURT stats (rel)   min: <.01% max: 67.47% x̄: 2.65% x̃: 0.65%
95% mean confidence interval for cycles value: 1.46 31.98
95% mean confidence interval for cycles %-change: 0.02% 0.19%
Cycles are HURT.

LOST:   557
GAINED: 900

fossil-db:

Lunar Lake
Totals:
Instrs: 141933621 -> 141884681 (-0.03%); split: -0.03%, +0.00%
Cycle count: 21990657282 -> 21990200212 (-0.00%); split: -0.14%, +0.14%
Spill count: 69754 -> 69732 (-0.03%); split: -0.05%, +0.02%
Fill count: 128559 -> 128521 (-0.03%); split: -0.05%, +0.02%
Scratch Memory Size: 5934080 -> 5925888 (-0.14%)
Max live registers: 48021653 -> 48051253 (+0.06%); split: -0.00%, +0.06%

Totals from 13510 (2.45% of 551410) affected shaders:
Instrs: 19497180 -> 19448240 (-0.25%); split: -0.25%, +0.00%
Cycle count: 2455370202 -> 2454913132 (-0.02%); split: -1.25%, +1.23%
Spill count: 10975 -> 10953 (-0.20%); split: -0.32%, +0.12%
Fill count: 21709 -> 21671 (-0.18%); split: -0.28%, +0.10%
Scratch Memory Size: 674816 -> 666624 (-1.21%)
Max live registers: 2502653 -> 2532253 (+1.18%); split: -0.01%, +1.19%

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 152763523 -> 152772716 (+0.01%); split: -0.00%, +0.01%
Cycle count: 17188701887 -> 17187510768 (-0.01%); split: -0.10%, +0.09%
Spill count: 79280 -> 79279 (-0.00%); split: -0.00%, +0.00%
Fill count: 148809 -> 148803 (-0.00%)
Max live registers: 31879240 -> 31879093 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 5559984 -> 5559712 (-0.00%); split: +0.00%, -0.01%

Totals from 20524 (3.24% of 633183) affected shaders:
Instrs: 20366964 -> 20376157 (+0.05%); split: -0.01%, +0.05%
Cycle count: 2406162382 -> 2404971263 (-0.05%); split: -0.68%, +0.63%
Spill count: 19935 -> 19934 (-0.01%); split: -0.02%, +0.01%
Fill count: 34487 -> 34481 (-0.02%)
Max live registers: 1745598 -> 1745451 (-0.01%); split: -0.01%, +0.01%
Max dispatch width: 117992 -> 117720 (-0.23%); split: +0.03%, -0.26%

Tiger Lake and Ice Lake had similar results. (Tiger Lake shown)
Totals:
Instrs: 150694108 -> 150683859 (-0.01%); split: -0.01%, +0.00%
Cycle count: 15526754059 -> 15529031079 (+0.01%); split: -0.10%, +0.12%
Max live registers: 31791599 -> 31791441 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 5569488 -> 5569296 (-0.00%); split: +0.00%, -0.01%

Totals from 15000 (2.37% of 632406) affected shaders:
Instrs: 10965577 -> 10955328 (-0.09%); split: -0.11%, +0.02%
Cycle count: 2025347115 -> 2027624135 (+0.11%); split: -0.80%, +0.91%
Max live registers: 983373 -> 983215 (-0.02%); split: -0.02%, +0.00%
Max dispatch width: 83064 -> 82872 (-0.23%); split: +0.12%, -0.35%

Skylake
Totals:
Instrs: 140588784 -> 140413758 (-0.12%); split: -0.13%, +0.00%
Cycle count: 14724286265 -> 14723402393 (-0.01%); split: -0.04%, +0.04%
Fill count: 100130 -> 100129 (-0.00%)
Max live registers: 31418029 -> 31417146 (-0.00%); split: -0.00%, +0.00%
Max dispatch width: 5513400 -> 5535192 (+0.40%); split: +0.89%, -0.49%

Totals from 39733 (6.35% of 625986) affected shaders:
Instrs: 17240737 -> 17065711 (-1.02%); split: -1.02%, +0.01%
Cycle count: 1994668203 -> 1993784331 (-0.04%); split: -0.31%, +0.27%
Fill count: 44481 -> 44480 (-0.00%)
Max live registers: 2766781 -> 2765898 (-0.03%); split: -0.03%, +0.00%
Max dispatch width: 210600 -> 232392 (+10.35%); split: +23.23%, -12.89%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32041>
2024-11-08 17:46:45 +00:00
Ian Romanick
be26012f1d brw/opt: Always do copy prop, DCE, and register coalesce after lower_regioning
shader-db:

Lunar Lake
total instructions in shared programs: 18100289 -> 18083853 (-0.09%)
instructions in affected programs: 790048 -> 773612 (-2.08%)
helped: 3058 / HURT: 1

total cycles in shared programs: 921691992 -> 921293816 (-0.04%)
cycles in affected programs: 37210762 -> 36812586 (-1.07%)
helped: 2329 / HURT: 624

LOST:   27
GAINED: 26

Meteor Lake, DG2, Tiger Lake, and Ice Lake had similar results. (Meteor Lake shown)
total instructions in shared programs: 19825635 -> 19821391 (-0.02%)
instructions in affected programs: 138675 -> 134431 (-3.06%)
helped: 877 / HURT: 0

total cycles in shared programs: 907900598 -> 907885713 (<.01%)
cycles in affected programs: 7127161 -> 7112276 (-0.21%)
helped: 318 / HURT: 242

total spills in shared programs: 5790 -> 5758 (-0.55%)
spills in affected programs: 660 -> 628 (-4.85%)
helped: 8 / HURT: 0

total fills in shared programs: 6744 -> 6712 (-0.47%)
fills in affected programs: 708 -> 676 (-4.52%)
helped: 8 / HURT: 0

LOST:   10
GAINED: 0

Skylake
total instructions in shared programs: 18722197 -> 18637637 (-0.45%)
instructions in affected programs: 2757553 -> 2672993 (-3.07%)
helped: 12290 / HURT: 1

total cycles in shared programs: 859716039 -> 859432560 (-0.03%)
cycles in affected programs: 113731837 -> 113448358 (-0.25%)
helped: 9555 / HURT: 2422

LOST:   265
GAINED: 714

fossil-db:

Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
Totals:
Instrs: 142000618 -> 141928331 (-0.05%); split: -0.05%, +0.00%
Subgroup size: 10995136 -> 10995072 (-0.00%)
Cycle count: 21994723230 -> 21990481140 (-0.02%); split: -0.08%, +0.06%
Spill count: 69911 -> 69754 (-0.22%); split: -0.23%, +0.00%
Fill count: 128723 -> 128559 (-0.13%); split: -0.15%, +0.02%
Scratch Memory Size: 5936128 -> 5934080 (-0.03%)
Max live registers: 48006880 -> 48020936 (+0.03%); split: -0.01%, +0.04%

Totals from 17450 (3.16% of 551410) affected shaders:
Instrs: 14984149 -> 14911862 (-0.48%); split: -0.48%, +0.00%
Subgroup size: 365744 -> 365680 (-0.02%)
Cycle count: 2585095128 -> 2580853038 (-0.16%); split: -0.71%, +0.54%
Spill count: 20893 -> 20736 (-0.75%); split: -0.76%, +0.00%
Fill count: 44181 -> 44017 (-0.37%); split: -0.44%, +0.07%
Scratch Memory Size: 995328 -> 993280 (-0.21%)
Max live registers: 2378069 -> 2392125 (+0.59%); split: -0.20%, +0.79%

Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown)
Totals:
Instrs: 150719758 -> 150676269 (-0.03%); split: -0.04%, +0.01%
Subgroup size: 7764560 -> 7764632 (+0.00%)
Cycle count: 15526689814 -> 15525687740 (-0.01%); split: -0.03%, +0.02%
Spill count: 60120 -> 59472 (-1.08%); split: -1.17%, +0.10%
Fill count: 105973 -> 104675 (-1.22%); split: -1.40%, +0.17%
Scratch Memory Size: 2396160 -> 2381824 (-0.60%); split: -0.73%, +0.13%
Max live registers: 31782879 -> 31788857 (+0.02%); split: -0.01%, +0.03%
Max dispatch width: 5569200 -> 5569344 (+0.00%); split: +0.00%, -0.00%

Totals from 10089 (1.60% of 632405) affected shaders:
Instrs: 6389866 -> 6346377 (-0.68%); split: -0.87%, +0.19%
Subgroup size: 102912 -> 102984 (+0.07%)
Cycle count: 681310278 -> 680308204 (-0.15%); split: -0.65%, +0.51%
Spill count: 19571 -> 18923 (-3.31%); split: -3.61%, +0.30%
Fill count: 38229 -> 36931 (-3.40%); split: -3.88%, +0.48%
Scratch Memory Size: 808960 -> 794624 (-1.77%); split: -2.15%, +0.38%
Max live registers: 677473 -> 683451 (+0.88%); split: -0.45%, +1.33%
Max dispatch width: 88672 -> 88816 (+0.16%); split: +0.27%, -0.11%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32041>
2024-11-08 17:46:45 +00:00
Ian Romanick
b2d7a823be brw/lower: Don't emit spurious moves to or from NULL register
Previously an instruction like

    cmp.l.f0.0(16) null:F, v359:F, 0f

would get lowered to

    undef(16) v13703:UD
    cmp.l.f0.0(16) v13703:F, v359:F, 0f
    mov(16) null:UD, v13703:UD

After copy propagation and dead-code elimination are run again, the
original CMP gets turned back into its original form!

Some cases can also emit MOVs from the original NULL register.

It should be possible to not do any lowering here, but there are some
interactions with source lowering passes for things like

    cmp.l.f0.0(16) null:HF, g89.1<16,16,1>:HF, 0hf

What inspired this was... diff'ing step-by-step dumps from
INTEL_DEBUG=optimizer had a lot of useless changes due to these MOVs
and undefs. It was very annoying.  This low-effort change gets the
majority of the possible benefit.

No shader-db or fossil-db changes on any Intel platform.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32041>
2024-11-08 17:46:45 +00:00
Ian Romanick
9aba731d03 brw/cse: Don't eliminate instructions that write flags
With other changes in my tree, I observed this code from
dEQP-VK.subgroups.vote.compute.subgroupallequal_float have the second
cmp.z removed.

    undef(8) %69:UD
    cmp.z.f0.0(8) %69:F, %37:F, %57+0.0<0>:F
    mov(1) v58+0.0:D, 0d NoMask group0
    (+f0.0) mov(1) v58+0.0:D, -1d NoMask group0
    cmp.nz.f0.0(8) null:D, v58+0.0<0>:D, 0d
    ...
    undef(8) %72:UD
    cmp.z.f0.0(8) %72:F, %37:F, %57+0.0<0>:F
    mov(1) v63+0.0:D, 0d NoMask group0
    (+f0.0) mov(1) v63+0.0:D, -1d NoMask group0

This was also fixed by running dead-code elimination before CSE. That
seems more like avoiding the problem than fixing it, though.

I believe this affects shader-db results because leaving the second
CMP in the shader can give more opportunities for cmod propagation.
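
A toy model of the rule, not the brw pass itself: a value-numbering style CSE over a flat instruction list that never removes an instruction whose flag write may still be needed. The `Inst` type and its fields are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Inst:
    dst: str
    op: str
    srcs: tuple
    writes_flag: bool = False


def cse(insts):
    """Return (kept instructions, renames of removed dsts to earlier dsts)."""
    seen = {}
    kept = []
    renames = {}
    for inst in insts:
        key = (inst.op, inst.srcs)
        if not inst.writes_flag and key in seen:
            renames[inst.dst] = seen[key]  # later uses would read the earlier dst
            continue
        seen.setdefault(key, inst.dst)
        kept.append(inst)
    return kept, renames


prog = [
    Inst("a", "add", ("x", "y")),
    Inst("b", "add", ("x", "y")),          # removable duplicate
    Inst("c", "cmp.z", ("x", "y"), True),
    Inst("d", "cmp.z", ("x", "y"), True),  # kept: its flag write is needed
]
kept, renames = cse(prog)
print([i.dst for i in kept], renames)
```

The second `cmp.z` computes the same value as the first, but the flag register may have been overwritten in between, so dropping it would change what a later predicated instruction sees.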

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes: 234c45c929 ("intel/brw: Write a new global CSE pass that works on defs")

shader-db:

All Intel platforms had similar results. (Lunar Lake shown)
total cycles in shared programs: 922097690 -> 922260862 (0.02%)
cycles in affected programs: 3178926 -> 3342098 (5.13%)
helped: 130
HURT: 88
helped stats (abs) min: 2 max: 2194 x̄: 296.71 x̃: 16
helped stats (rel) min: <.01% max: 16.56% x̄: 1.86% x̃: 0.18%
HURT stats (abs)   min: 4 max: 11992 x̄: 2292.55 x̃: 47
HURT stats (rel)   min: 0.04% max: 57.32% x̄: 11.82% x̃: 0.61%
95% mean confidence interval for cycles value: 320.36 1176.63
95% mean confidence interval for cycles %-change: 1.59% 5.73%
Cycles are HURT.

LOST:   2
GAINED: 1

fossil-db:

Lunar Lake, Meteor Lake, and Tiger Lake had similar results. (Lunar Lake shown)
Totals:
Instrs: 142022960 -> 142022928 (-0.00%); split: -0.00%, +0.00%
Cycle count: 21995242782 -> 21995384040 (+0.00%); split: -0.00%, +0.00%
Max live registers: 48013385 -> 48013343 (-0.00%)

Totals from 507 (0.09% of 551441) affected shaders:
Instrs: 886191 -> 886159 (-0.00%); split: -0.01%, +0.01%
Cycle count: 69302492 -> 69443750 (+0.20%); split: -0.66%, +0.86%
Max live registers: 94413 -> 94371 (-0.04%)

DG2
Totals:
Instrs: 152856370 -> 152856093 (-0.00%); split: -0.00%, +0.00%
Cycle count: 17237159885 -> 17236804052 (-0.00%); split: -0.00%, +0.00%
Fill count: 150673 -> 150631 (-0.03%)
Max live registers: 31871520 -> 31871476 (-0.00%)

Totals from 506 (0.08% of 633197) affected shaders:
Instrs: 831795 -> 831518 (-0.03%); split: -0.04%, +0.01%
Cycle count: 55578509 -> 55222676 (-0.64%); split: -1.38%, +0.74%
Fill count: 2779 -> 2737 (-1.51%)
Max live registers: 51383 -> 51339 (-0.09%)

Ice Lake and Skylake had similar results. (Ice Lake shown)
Totals:
Instrs: 152017826 -> 152017793 (-0.00%); split: -0.00%, +0.00%
Cycle count: 15180773451 -> 15180761166 (-0.00%); split: -0.00%, +0.00%
Fill count: 106610 -> 106614 (+0.00%)
Max live registers: 32195006 -> 32194966 (-0.00%)

Totals from 411 (0.06% of 637268) affected shaders:
Instrs: 705935 -> 705902 (-0.00%); split: -0.01%, +0.01%
Cycle count: 47830019 -> 47817734 (-0.03%); split: -0.05%, +0.02%
Fill count: 2865 -> 2869 (+0.14%)
Max live registers: 42883 -> 42843 (-0.09%)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32041>
2024-11-08 17:46:45 +00:00
Ian Romanick
80a5d158ae brw/copy: Don't copy propagate through smaller entry dest size
Copy propagation would incorrectly occur in this code

    mov(16) v4+2.0:UW, u0<0>:UW NoMask
    ...
    mov(8) v6+2.0:UD, v4+2.0:UD NoMask group0

to create

    mov(16) v4+2.0:UW, u0<0>:UW NoMask
    ...
    mov(8) v6+2.0:UD, u0<0>:UD NoMask group0

This has different behavior. I think I just made a mistake when I
changed this condition in e3f502e007.
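
To make the difference concrete, a small byte-level model of the two sequences. It assumes little-endian packing, that `u0<0>:UW` reads the low 16 bits of the uniform, and it ignores the `+2.0` offsets; the function names are invented for the example:

```python
import struct


def read_ud_after_uw_broadcast(u0: int) -> int:
    """First dword of v4 after mov(16) v4:UW, u0<0>:UW."""
    low16 = u0 & 0xFFFF
    v4 = struct.pack("<16H", *([low16] * 16))  # sixteen packed 16-bit copies
    return struct.unpack_from("<I", v4, 0)[0]


def read_ud_propagated(u0: int) -> int:
    """What mov(8) v6:UD, u0<0>:UD reads: the whole 32-bit uniform."""
    return u0 & 0xFFFFFFFF


u0 = 0x12345678
print(hex(read_ud_after_uw_broadcast(u0)), hex(read_ud_propagated(u0)))
```

The original code sees two copies of the low half packed into each dword, while the propagated code sees the uniform's own upper half, so the two only agree by coincidence.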

It seems like this condition could be relaxed to cover cases like (note
the change of destination stride)

    mov(16) v4+2.0<2>:UW, u0<0>:UW NoMask
    ...
    mov(8) v6+2.0:UD, v4+2.0:UD NoMask group0

I'm not sure it's worth it.

No shader-db or fossil-db changes on any Intel platform. Even the code
for the test case mentioned in the original commit did not change.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes: e3f502e007 ("intel/fs: Allow copy propagation between MOVs of mixed sizes")
Closes: #12116
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32041>
2024-11-08 17:46:45 +00:00
Ian Romanick
c1c09e3c4a brw/emit: Add correct 3-source instruction assertions for each platform
Specifically, allow two immediate sources for BFE on Gfx12+. I stumbled
on this while trying some stuff with !31852.

v2: Don't be lazy. Add proper assertions for all the things on all the
platforms. Based on a suggestion by Ken.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Fixes: 7bed11fbde ("intel/brw: Allow immediates in the BFE instruction on Gfx12+")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31858>
2024-11-08 16:48:57 +00:00
Lionel Landwerlin
3ecf2a0518 anv: fix extent computation in image->image host copies
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 0317c44872 ("anv: add VK_EXT_host_image_copy support")
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32027>
2024-11-07 22:44:41 +00:00
Felix DeGrood
bf96702985 intel/measure: increase size of filename malloc to account for \0
Corrects a regression caused by a prior commit that introduced a memory
overwrite by not mallocing enough space for the filename string.
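
The off-by-one is the usual one for C strings: the buffer needs one byte more than the string length. A small demonstration through Python's `ctypes`, with a made-up file name:

```python
import ctypes

name = b"intel_measure.csv"  # hypothetical file name

# Passing the bytes lets ctypes size the buffer as len(name) + 1 so the
# terminating NUL fits; a buffer of only len(name) bytes would not hold it.
buf = ctypes.create_string_buffer(name)
print(len(name), ctypes.sizeof(buf))
```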

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32013>
2024-11-06 22:12:29 +00:00
Lionel Landwerlin
0ab2849597 anv: move pipe control debug to anv_util.c
We're going to add more printing.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31928>
2024-11-06 12:20:23 +00:00
Lionel Landwerlin
b5403a4e40 anv: fix indentation
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31928>
2024-11-06 12:20:23 +00:00
Lionel Landwerlin
f9e76e8ca6 anv: add texture cache inval after binding pool update
Cc: mesa-stable
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31928>
2024-11-06 12:20:22 +00:00
Lionel Landwerlin
b3f487bd0d anv: fix event set/reset on blitter engine
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Cc: mesa-stable
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31928>
2024-11-06 12:20:22 +00:00
Matt Turner
5068a6b4ce anv: Set shader_spilling_rate=11
This has the best fossil-db results in a sweep from 0..15.

fossil-db results on Alderlake:

Instructions in all programs: 152849904 -> 152824116 (-0.0%)
SENDs in all programs: 7677830 -> 7677830 (+0.0%)
Loops in all programs: 48470 -> 48470 (+0.0%)
Cycles in all programs: 11988670382 -> 11987530942 (-0.0%)
Spills in all programs: 42863 -> 41777 (-2.5%)
Fills in all programs: 77114 -> 73044 (-5.3%)

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31990>
2024-11-06 02:47:26 +00:00
Kenneth Graunke
22b511ef02 intel: Set shader_spilling_rate=11 in intel_clc
A while back Matt enabled shader_spilling_rate by default for anv.
But intel_clc doesn't use the driconf mechanism that we use there.

The GRL shaders spill a lot, and with us now compiling additional
generations of the shaders, Mesa build time is getting prohibitively
expensive.  By setting this, we drop the time taken for a clean debug
build by approximately 35% on my current laptop.

Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31993>
2024-11-06 01:57:10 +00:00
José Roberto de Souza
a991935088 anv: Enable perf metrics id set synchronization
Now actually making use of the new Xe KMD OA synchronization uAPI.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31283>
2024-11-05 19:25:53 +00:00
José Roberto de Souza
953abc7d1e intel/perf: Add INTEL_PERF_FEATURE_METRIC_SYNC and check if KMD supports it
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31283>
2024-11-05 19:25:53 +00:00
José Roberto de Souza
a38a98c4cb intel/perf: Extend intel_perf_stream_set_metrics_id() to synchronize metrics id changes
Xe KMD added a uAPI to synchronize metrics id changes, so we can make
it wait for all previous workloads in exec_queue and all previous
metrics id changes to finish before changing it again.
This should make Vulkan queries more robust.

So this makes use of intel_bind_timeline to synchronize the metrics id
changes and xe_queue_get_syncobj_for_idle() to synchronize with
exec_queue.

As i915 and some versions of Xe KMD will not support it, this feature
will only be used when the intel_bind_timeline parameter is not NULL and
the timeline has a valid syncobj id.
At this patch level all callers will set it to NULL; the next patch will
add and initialize the timeline in ANV when supported by Xe KMD.
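
The opt-in condition described above can be sketched as a simple guard. `BindTimeline` is a hypothetical stand-in for intel_bind_timeline, and treating 0 as "no syncobj" is an assumption made for the example:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BindTimeline:
    """Hypothetical stand-in for intel_bind_timeline."""
    syncobj: int = 0  # assumption: 0 means "no syncobj"


def use_metrics_sync(timeline: Optional[BindTimeline]) -> bool:
    """True only when the caller passed a timeline with a valid syncobj id."""
    return timeline is not None and timeline.syncobj != 0


print(use_metrics_sync(None), use_metrics_sync(BindTimeline(syncobj=7)))
```

Callers that pass no timeline keep the old unsynchronized behavior, which is how i915 and older Xe KMD versions stay unaffected.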

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31283>
2024-11-05 19:25:53 +00:00
José Roberto de Souza
27fef94851 intel/perf: Add OA support to ARL
ARL has enough differences in OA files to have its own set of files.

Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31685>
2024-11-05 14:56:49 +00:00
itycodes
10c92cbd39 intel: Fix a typo in intel_device_info.c:has_get_tiling
The structs are of equal size and both ioctls were added at the same
time, so the functionality is equivalent, but it's nonetheless the
incorrect type being passed.

Signed-off-by: tranquillitycodes@proton.me
Fixes: 762e601f77
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31974>
2024-11-05 04:31:50 +01:00
Felix DeGrood
99e8502013 intel/measure: defer file open until first write
Fixes abort on Steam.
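
The general pattern, sketched outside of Mesa: keep only the path at setup time and open the file when the first record is written. The class and file name are hypothetical:

```python
import os
import tempfile


class LazyFile:
    """Open the output file on the first write instead of at construction."""

    def __init__(self, path: str):
        self.path = path
        self._file = None

    def write(self, text: str) -> None:
        if self._file is None:
            self._file = open(self.path, "w")
        self._file.write(text)

    def close(self) -> None:
        if self._file is not None:
            self._file.close()
            self._file = None


path = os.path.join(tempfile.mkdtemp(), "measure.csv")  # hypothetical name
out = LazyFile(path)
created_before_write = os.path.exists(path)
out.write("frame,duration\n")
out.close()
print(created_before_write, os.path.exists(path))
```

A process that never writes anything never touches the file system, so a path that cannot be opened only matters to processes that actually produce output.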

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31938>
2024-11-04 20:25:14 +00:00
Felix DeGrood
f345019830 intel/measure: add nogl feature
Do not trigger INTEL_MEASURE for ogl apps with INTEL_MEASURE=nogl

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31938>
2024-11-04 20:25:14 +00:00
Sviatoslav Peleshko
3a962a28e7 intel/elk_asm: Add BranchCtrl support
We emit it for gfx8, so the assembler should support it too.

Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31747>
2024-11-02 18:01:20 +00:00
Sviatoslav Peleshko
cd4c328408 intel/elk: List all instructions that have BranchCtrl bit
Previously this bit was not clearly documented in PRMs, but gfx12 PRMs
finally list all the instructions where it is present.

Although it's unclear if it's functional for anything other than "if",
"else", and "goto", we probably still should acknowledge its existence
in other instructions.

Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31747>
2024-11-02 18:01:20 +00:00
Sviatoslav Peleshko
445df8d611 intel/brw_asm: Add BranchCtrl support
We emit it for gfx9, so the assembler should support it too.

Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31747>
2024-11-02 18:01:19 +00:00
Sviatoslav Peleshko
aea7366613 intel/brw: List all instructions that have BranchCtrl bit
Previously this bit was not clearly documented in PRMs, but gfx12 PRMs
finally list all the instructions where it is present.

Although it's unclear if it's functional for anything other than "if",
"else", and "goto", we probably still should acknowledge its existence
in other instructions.

Signed-off-by: Sviatoslav Peleshko <sviatoslav.peleshko@globallogic.com>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31747>
2024-11-02 18:01:19 +00:00
Paulo Zanoni
5ca883505e brw: add a NOP in between WHILE instructions on LNL
This is a workaround that is still in progress, see HSD 22020521218.
If we don't have these NOPs we may see rendering corruption or even
GPU hangs.

While we still don't fully understand the issue from the hardware
point of view, let's have this workaround so we can pass CTS and move
things forward. If we need to change this later, we can. Besides, the
impact is minimal. Shaderdb/fossilize report no changes for this
patch.

On our Blackops trace, the lack of this patch causes corruption in fog
rendering (rectangles where fog was supposed to be shown don't show
the fog).

On dEQP-VK.graphicsfuzz.cov-array-copies-loops-with-limiters, without
this patch we get a GPU hang.
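
The shape of the workaround, modeled on a flat list of opcode names rather than real brw instructions. It assumes "in between" means two WHILEs that are directly adjacent; `separate_whiles` is a hypothetical name:

```python
def separate_whiles(opcodes: list[str]) -> list[str]:
    """Insert a "nop" between any two back-to-back "while" opcodes."""
    out: list[str] = []
    for op in opcodes:
        if op == "while" and out and out[-1] == "while":
            out.append("nop")
        out.append(op)
    return out


print(separate_whiles(["do", "do", "add", "while", "while", "mov"]))
```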

Backport-to: 24.2
Testcase: dEQP-VK.graphicsfuzz.cov-array-copies-loops-with-limiters
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11813
Reviewed-by: Ivan Briano <ivan.briano@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31331>
2024-10-31 23:57:10 +00:00
Jordan Justen
39fab9b240 intel/dev: Set L3 bank count for Xe2+ from Xe KMD
Rather than updating intel_device_info_update_l3_banks(), the Xe KMD
provides this info via the DRM_XE_DEVICE_QUERY_GT_TOPOLOGY query item.

Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31894>
2024-10-31 18:40:27 +00:00
Lionel Landwerlin
1485b5659a anv: update some of the indirect invalidations
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31915>
2024-10-30 20:39:31 +00:00
Lionel Landwerlin
cb224370b6 anv: avoid L3 fabric flush in pipeline barriers
This bit is not needed for barriers and appears to trigger a
performance regression. So leave it just for AUX-TT
flushing/invalidation.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: e3814dee1a ("anv: add plumbing/support for L3 fabric flush")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12090
Reviewed-by: Nanley Chery <nanley.g.chery@intel.com>
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31915>
2024-10-30 20:39:31 +00:00
Sagar Ghuge
17096f87c1 intel: Switch to COMPUTE_WALKER_BODY
Stuff COMPUTE_WALKER_BODY in COMPUTE_WALKER in both iris and anv.

This also fixes the tracepoint for ray dispatches. Stuffing
COMPUTE_WALKER_BODY allows us to set
cmd_buffer->state.last_compute_walker.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31822>
2024-10-29 15:54:43 +00:00
José Roberto de Souza
6a0f2dd44b intel/dev: Fix max_cs_threads value on simulator
intel_device_info_update_after_hwconfig() updates max_cs_threads
based on max_eus_per_subslice and num_thread_per_eu, but on the
simulator for some platforms the hwconfig doesn't have the
INTEL_HWCONFIG_MAX_NUM_EU_PER_DSS value, causing max_cs_threads to
be set to a wrong value and then causing issues when programming
CFE_STATE with an invalid value.

Fortunately we can also get max_eus_per_subslice from the topology
query, so move the hwconfig query and the
intel_device_info_update_after_hwconfig() call to after the topology
query.
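
A minimal sketch of the ordering dependency described above. It assumes
max_cs_threads is simply the product of the two values the message names.
The struct and function are hypothetical stand-ins, not the real
intel_device_info code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the few device-info fields named above. */
struct sketch_devinfo {
   uint32_t max_eus_per_subslice;
   uint32_t num_thread_per_eu;
   uint32_t max_cs_threads;
};

/* Assumption: max_cs_threads is the product of the two inputs. */
static void
sketch_update_max_cs_threads(struct sketch_devinfo *devinfo)
{
   assert(devinfo->num_thread_per_eu > 0);

   /* If this runs before max_eus_per_subslice has been filled in (for
    * example because the hwconfig table lacks the entry), the result is
    * derived from a stale or zero value, hence the reordering.
    */
   devinfo->max_cs_threads =
      devinfo->max_eus_per_subslice * devinfo->num_thread_per_eu;
}
```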

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31850>
2024-10-28 21:24:09 +00:00
José Roberto de Souza
6c84cbd8c9 intel/dev/xe: Set max_eus_per_subslice using topology query
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31850>
2024-10-28 21:24:09 +00:00
Nanley Chery
334b368fc9 anv: Allow more fast clear colors for layouts
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/9983
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31743>
2024-10-28 17:43:21 +00:00
Nanley Chery
4e17452387 anv: Load fast clear colors more often
If a render area covers an area that is smaller than an attachment's
extent and is not aligned to the CCS block size, we must load the clear
color so that the pixels outside of that area are decompressed with the
right clear color.
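
The condition above can be sketched roughly as follows. All names here are
hypothetical, and the block dimensions are passed in by the caller rather
than taken from any real ISL query:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical render-area rectangle in pixels. */
struct sketch_rect {
   uint32_t x, y, w, h;
};

/* Returns true when the render area is smaller than the attachment and
 * its edges do not all land on CCS block boundaries, i.e. the case in
 * which the clear color has to be loaded.
 */
static bool
sketch_must_load_clear_color(struct sketch_rect area,
                             uint32_t img_w, uint32_t img_h,
                             uint32_t block_w, uint32_t block_h)
{
   assert(block_w > 0 && block_h > 0);

   const bool covers_attachment =
      area.x == 0 && area.y == 0 && area.w == img_w && area.h == img_h;

   const bool block_aligned =
      area.x % block_w == 0 && area.y % block_h == 0 &&
      (area.x + area.w) % block_w == 0 &&
      (area.y + area.h) % block_h == 0;

   return !covers_attachment && !block_aligned;
}
```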

Prevents the next patch from causing the following test failure on gfx9:

dEQP-VK.renderpass.suballocation.load_store_op_none.color_load_op_none_store_op_none

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31743>
2024-10-28 17:43:21 +00:00
Nanley Chery
0e6b132a75 anv: Access more colors in fast_clear_memory_range
Store an array of clear values, one for each view format of the image.
Load the clear value based on the view format.

anv_image_msaa_resolve() may override the source or destination with
ISL_FORMAT_UNSUPPORTED, so make anv_image_get_clear_color_addr() handle
that format.
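
A rough sketch of a per-view-format clear value lookup. The types,
constants, and the fallback to the first entry for an unsupported format
are all assumptions made for illustration; the real
anv_image_get_clear_color_addr() may handle that case differently:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_VIEW_FORMATS   4
#define SKETCH_FORMAT_UNSUPPORTED UINT32_MAX

/* Hypothetical image with one clear-color slot per view format. */
struct sketch_image {
   uint32_t num_view_formats;
   uint32_t view_formats[SKETCH_MAX_VIEW_FORMATS];
   uint32_t clear_color_stride; /* bytes per clear-color entry */
};

/* Returns the byte offset of the clear color for view_format within the
 * image's clear-color array.
 */
static uint64_t
sketch_clear_color_offset(const struct sketch_image *image,
                          uint32_t view_format)
{
   assert(image->num_view_formats > 0);
   assert(image->num_view_formats <= SKETCH_MAX_VIEW_FORMATS);

   /* Assumption: an unsupported format falls back to the first entry. */
   if (view_format == SKETCH_FORMAT_UNSUPPORTED)
      return 0;

   for (uint32_t i = 0; i < image->num_view_formats; i++) {
      if (image->view_formats[i] == view_format)
         return (uint64_t)i * image->clear_color_stride;
   }

   assert(!"view format is not in the image's format list");
   return 0;
}
```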

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31743>
2024-10-28 17:43:21 +00:00
Nanley Chery
43bc4f4576 anv: Refactor clear color loading functions
Rename the functions and update the parameters in preparation for the
next patch.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31743>
2024-10-28 17:43:21 +00:00
Nanley Chery
0d4f2a2db1 anv: Move code out of loop in anv_CmdClearColorImage
According to the spec, the clear range's aspect will always be the color
aspect.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31743>
2024-10-28 17:43:21 +00:00
Nanley Chery
8f9ed7e932 anv: Prepare dmabufs for clear color arrays
In later commits, we'll rely on the number of view formats used by an
image to determine the size allocated for an array of clear colors in
the aux-state tracking buffer. Having a single view format for dmabufs
with clear color support allows anv to transparently handle this case.

Restrict the number of view formats by explicitly setting the image
format list to incomplete. Secondly, loosen the non-zero clear color
restriction on clear color supporting dmabufs. Those images can support
any clear color even with an incomplete list because we restrict
problematic accesses for the clear color during the negotiation phase.
Lastly, update add_all_surfaces_explicit_layout() to assert that the
sizing of the imported clear color struct meets expectations.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31743>
2024-10-28 17:43:21 +00:00
Nanley Chery
f5f0354447 anv: Add an array of view formats to anv_image
Stores the format list for the image in terms of ISL formats.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31743>
2024-10-28 17:43:20 +00:00
Valentine Burley
e18733300e anv/ci: Remove additive blending fails on ADL
This was a VKCTS bug in an earlier version of the CTS.

These tests have actually been passing since the VKCTS was uprevved to
1.3.9.0, which landed a bit before ADL testing in CI was turned on.

Signed-off-by: Valentine Burley <valentine.burley@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31862>
2024-10-27 21:43:18 +00:00
Valentine Burley
3b5e49a7f8 intel/ci: Fix Alder Lake's configuration
There's currently no GL or GLES testing on the iris gallium driver,
and the VKCTS expectations were erroneously listed under iris-*.txt.

Fix the rules set for anv-adl-full, change the GPU_VERSION to anv-adl
and move the expectations around accordingly.

Signed-off-by: Valentine Burley <valentine.burley@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31862>
2024-10-27 21:43:18 +00:00
Iván Briano
13db5fad27 brw: fix task/mesh push constant loading
The InlineData passed to the shader is a fixed size unrelated to the
register size. It happens to match pre-Xe2, but by considering it the
same in Xe2, we ended up reading pushed constants from the wrong place
when they didn't fit in the InlineData.
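
A minimal sketch of the distinction, assuming a 32-byte InlineData block and
register sizes of 32 bytes before Xe2 and 64 bytes on Xe2. These sizes and
all names below are assumptions for illustration, not values taken from the
patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: InlineData is a fixed 32 bytes on every platform. */
#define SKETCH_INLINE_DATA_BYTES 32u

/* Assumption: 32-byte registers before Xe2, 64-byte registers on Xe2. */
static uint32_t
sketch_reg_size_bytes(bool is_xe2)
{
   return is_xe2 ? 64u : 32u;
}

/* Bytes of push constants that do not fit in InlineData and therefore
 * have to be read from elsewhere. The split point is the InlineData size,
 * never the register size.
 */
static uint32_t
sketch_push_overflow_bytes(uint32_t push_bytes)
{
   assert(SKETCH_INLINE_DATA_BYTES > 0);
   return push_bytes > SKETCH_INLINE_DATA_BYTES ?
          push_bytes - SKETCH_INLINE_DATA_BYTES : 0;
}
```

Under these assumptions, splitting at sketch_reg_size_bytes() gives the same
answer before Xe2 and the wrong one on Xe2, which is the failure the message
describes.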

Fixes: 97b17aa0b1 ("brw/nir: rework inline_data_intel to work with compute")

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31856>
2024-10-26 18:12:41 +00:00
Jordan Justen
b7560fa048 anv: Build for Xe3
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31838>
2024-10-26 07:39:30 +00:00
Jordan Justen
35ace9d4e2 intel/compiler: Xe2 and Xe3 use the same compaction tables
Ref: bspec 56709
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31838>
2024-10-26 07:39:30 +00:00
Jordan Justen
688a673c5a intel/brw: Allow Xe3 in brw_stage_has_packed_dispatch()
Signed-off-by: Jordan Justen <jordan.l.justen@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31838>
2024-10-26 07:39:30 +00:00