Commit Graph

13275 Commits

Author SHA1 Message Date
Vinson Lee 83809f06a7 intel/elk: Fix assert with side effect
Fix defect reported by Coverity Scan.

Side effect in assertion (ASSERT_SIDE_EFFECT)
assert_side_effect: Argument ++eot_count of assert() has a side effect.
The containing function might work differently in a non-debug build.

Fixes: ebd6738260 ("intel/elk/chv: Implement WaClearArfDependenciesBeforeEot")
Signed-off-by: Vinson Lee <vlee@freedesktop.org>
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32884>
2025-01-09 04:07:42 +00:00
Matt Turner 89da5a9626 intel/decoder: Avoid duplicate symbols when expat is not available
Fixes: 0669210ef4 ("intel/decoder: Add ELK support")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12335
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32732>
2025-01-08 18:58:35 +00:00
Kenneth Graunke 35f175301d brw: Fix vectorizer hole_size condition after signedness change
Marek recently changed hole_size to be signed, rather than unsigned.
A negative hole_size means that the two loads overlap - and thus are
prime candidates to be combined.

My original hole_size handling was:

   if hole_size > 4 * (8 - low->num_components) then don't vectorize

For non-overlapping loads, this worked: NIR's largest vector is vec16,
and if low was already a vec16, combining it with anything would exceed
that, so it'd never be considered.  That meant low would always be a
vec8 or less, so (8 - low->num_components) was a positive number.

Now that we see overlapping loads, we can see a vec16 low, vec4 high,
and also a negative hole size, giving us fun comparisons like:

   -16 > 4 * (8 - 16)   =>   -16 > -32   => true, don't vectorize

Which is absolutely the wrong thing to do, because the high load's data
is entirely included within the former load's data.

The idea here was to make sure the second load would be able to pack at
least one component into the first's V8 result.  But even this isn't the
best, because...even if it's simply adjacent, doing one V16 load is more
efficient than requesting two back to back V8 loads.

So, we just simplify down to a static check: if there's an entire V8 of
hole, don't vectorize.  This already won't happen because the core pass
has max_hole set to 28 bytes (7 32-bit components), but that could
change based on the needs of other drivers, so let's be defensive.

fossil-db results on Alchemist:

   Instrs: 161533978 -> 161295137 (-0.15%); split: -0.20%, +0.05%
   Subgroup size: 8092544 -> 8092568 (+0.00%)
   Send messages: 7915233 -> 7844503 (-0.89%); split: -0.94%, +0.05%
   Cycle count: 16577700697 -> 16702609256 (+0.75%); split: -0.59%, +1.35%
   Spill count: 72338 -> 67226 (-7.07%); split: -7.36%, +0.29%
   Fill count: 134058 -> 125980 (-6.03%); split: -6.83%, +0.80%
   Scratch Memory Size: 4092928 -> 3786752 (-7.48%); split: -7.53%, +0.05%
   Max live registers: 33031460 -> 32945994 (-0.26%); split: -0.27%, +0.01%
   Max dispatch width: 5778384 -> 5778536 (+0.00%); split: +0.26%, -0.26%
   Non SSA regs after NIR: 179809505 -> 152735471 (-15.06%); split: -15.08%, +0.03%

Fixes: c21bc65ba7 ("nir/opt_load_store_vectorize: make hole_size signed to indicate overlapping loads")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32932>
2025-01-08 00:19:54 +00:00
Sagar Ghuge 33d9a685a5 anv: Add pipelined coarse pixel state
3DSTATE_CPS_POINTERS is deprecated on PTL, so let's switch to
3DSTATE_COARSE_PIXEL to deliver CPS state as pipelined state.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32737>
2025-01-07 23:53:44 +00:00
Sagar Ghuge 9d33443d7b intel/genxml: Add coarse pixel related changes
This change adds CPS related new state instruction, structure and
enum.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32737>
2025-01-07 23:53:44 +00:00
Caio Oliveira 868016d92c intel/brw/xe2+: Do not use $.dst or $.src SWSB annotations in SENDs
When a SEND instruction is a EOT, the scoreboard lowering will not
allocate a new SBID for it, since nothing needs to wait for it.  In
Gfx12 this allowed the SEND to get out-of-order $.dst or $.src
dependencies.

Starting on Xe2+ this is not supported anymore, in favor of supporting
more combined modes.

Reviewed-by: Francisco Jerez <currojerez@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32712>
2025-01-07 22:23:59 +00:00
Tapani Pälli 1cc17e9ce9 intel/compiler: take reg_unit size into account with ubo ranges
Fixes: 1ab4fe2dd6 ("brw: Don't shrink UBO push ranges in the backend")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12423
Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32925>
2025-01-07 21:38:06 +00:00
Sagar Ghuge 385977955b intel: Set correct maxComputeSharedMemorySize for Xe3+
For Xe3+, set preferred SLM and SLM per threadgroup size.

Bspec: 73211
Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32872>
2025-01-07 07:06:09 +00:00
Chia-I Wu dd0f8cc7de hasvk: use common calibrated timestamp support
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32689>
2025-01-07 03:39:29 +00:00
Chia-I Wu 83dec767da anv: use common calibrated timestamp support partially
Use the common GetPhysicalDeviceCalibrateableTimeDomainsKHR.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32689>
2025-01-07 03:39:29 +00:00
José Roberto de Souza 7ac9ac0f93 anv: Allow larger SLM sizes for task and mesh shader
It was hard-coded to 64k but Xe2 platforms and newer supports
larger SLM sizes.

Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Dylan Baker <dylan.c.baker@intel.com>
Cc: mesa-stable
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32874>
2025-01-06 18:31:20 +00:00
Kenneth Graunke 4ab04799ee brw: Delete assign_constant_locations and push_constant_loc[]
The push_constant_loc[] array is always an identity mapping these days,
so it's kind of pointless.  Just use the original uniform number and
skip the unnecessary "remap" step.  With that gone, and shrinking UBO
ranges gone, assign_constant_locations() is now empty and can be removed
as well.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32841>
2025-01-06 12:45:47 +00:00
Kenneth Graunke 93e186e1a4 brw: Delete pull constant lowering
Now that we never shrink ranges in the backend, we never lower push
constants to pull constants late in the backend either.  get_pull_loc
will never return true, and so all of brw_lower_constant_loads becomes
a noop.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32841>
2025-01-06 12:45:47 +00:00
Kenneth Graunke 1ab4fe2dd6 brw: Don't shrink UBO push ranges in the backend
Back in the bad old days (vec4?) we had a bunch of smarts in the backend
to dead code eliminate unused vector components and re-pack regular
uniforms, so we really couldn't decide how much data we were pushing
until very late in the backend.  Nowadays we have none of that - we do
all of our elimination and packing in NIR.  anv shrinks ranges to deal
with Vulkan API push constants, and iris treats everything as a UBO and
as of the previous commit will also shrink appropriately.

So we don't need to do this anymore...which will let us simplify quite
a bit of code.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32841>
2025-01-06 12:45:47 +00:00
Kenneth Graunke 583ad35455 brw: Limit maximum push UBO ranges to 64 registers in the NIR pass.
anv already does this limiting, since it needs to handle non-UBO push
constants as well.  iris treats everything as a UBO, but doesn't have
a limiter and was relying on the backend to handle it.

Do this in the NIR pass so that we can eliminate the backend code.
It's not necessary for anv, but handling it here is simple and less
error prone for iris, which calls this in a number of places.  We know
we need to limit things to this much; anv can limit more if needed.

Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32841>
2025-01-06 12:45:47 +00:00
Tapani Pälli 72351afe24 anv: handle mesh in sbe_primitive_id_override
This prevents crashes seen in some upcoming cts tests.

Signed-off-by: Tapani Pälli <tapani.palli@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32861>
2025-01-06 08:41:18 +00:00
Hyunjun Ko 5ecea6ec4a anv: handle negative value of slot index for h265 decoding.
Fixes: 8d519eb5 ("anv: add initial video decode support for h265")
Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32823>
2025-01-06 01:02:14 +00:00
Hyunjun Ko 168298b891 anv: Enable remapping picture ID
Fix to handle 16 refs.

v1. handle the case where a slot index is negative.
(Lionel Landwerlin <lionel.g.landwerlin@intel.com>)

Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32823>
2025-01-06 01:02:14 +00:00
Hyunjun Ko 9221feaf79 anv: define ANV_VIDEO_H264_MAX_DPB_SLOTS
prep work for remapping slot ids for h264 decoding.

Signed-off-by: Hyunjun Ko <zzoon@igalia.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32823>
2025-01-06 01:02:13 +00:00
Lionel Landwerlin 98cdb9349a anv: ensure null-rt bit in compiler isn't used when there is ds attachment
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: 15987f49bb ("anv: avoid setting up a null RT unless needed")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12396
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32867>
2025-01-03 23:12:22 +00:00
Caio Oliveira 6968794c50 intel/brw: Add missing bits in 3-src SWSB encoding for Xe2+
Fix invalid SWSB annotation in dEQP-VK.glsl.builtin.precision.mix.mediump.vec4 for LNL.

Fixes: 4a24f49b57 ("intel/compiler/xe2: Implement codegen of three-source instructions.")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32846>
2025-01-03 21:19:26 +00:00
Lionel Landwerlin 1448778385 anv: rework tbimr push constant workaround
We'll want to know about the empty push constant for device generated
commands. It's easier if the information is stored in
anv_pipeline_bind_map::push_ranges[].

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32828>
2025-01-03 11:48:42 +00:00
Lionel Landwerlin 6281b207db anv: add tracepoints timestamp mode for empty dispatches
When the runtime is going to potentially emit no dispatch, we need to
have a way to capture a timestamp. Add a new flag for this to tell
whether we don't have a HW instruction to capture the timestamp and
rely on MI_STORE_REGISTER_MEM instead.

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Fixes: de00fe3f66 ("anv: add BVH building tracking through u_trace")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12382
Reviewed-by: Sagar Ghuge <sagar.ghuge@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32835>
2025-01-03 10:36:49 +00:00
Dylan Baker d9429229cf intel/tests: Fix missing assignment of error condition
Coverity notices that `err` might be used uninitialized, which is true
as we don't assign the value we want to check! Fix that assignment so
the EXPECT_EQ macro does what we expect.

CID: 1635272
Fixes: 6b931a68c7 ("intel/common: Implement Xe KMD in mi_builder tests")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32849>
2025-01-03 00:53:49 +00:00
Dylan Baker 5420fc16d6 intel/tests: Fix coverity warning about possibly leaked memory
If the assert were to fail the memory would leak, which is pretty
harmless in a unit test, but the fix is trivial.

CID: 1635429
Fixes: 6b931a68c7 ("intel/common: Implement Xe KMD in mi_builder tests")
Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32849>
2025-01-03 00:53:49 +00:00
Caio Oliveira e1aebf8a0c intel/brw: Remove 'fs' prefix from passes and related functions
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32813>
2025-01-02 18:11:05 +00:00
Caio Oliveira 25384dccc0 intel/brw: Remove 'fs' prefix from passes filenames
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32813>
2025-01-02 18:11:05 +00:00
Lionel Landwerlin 6fb2d3b163 anv: limit the memcpy data for push constants
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32824>
2025-01-02 16:48:04 +00:00
Marek Olšák c21bc65ba7 nir/opt_load_store_vectorize: make hole_size signed to indicate overlapping loads
A negative hole size means the loads overlap. This will be used by drivers
to handle overlapping loads in the callback easily.

Reviewed-by: Mel Henning <drawoc@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32699>
2025-01-01 00:03:55 +00:00
Sagar Ghuge 76e85df2d2 anv: Switch to ANISOTROPIC_FAST filter mode
Same thing as ANISOTROPIC including all restrictions except HW is
allowed to take liberties with precision to speed things up, Currently
only has an affect on formats of type *_sRGB.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32738>
2024-12-31 21:49:41 +00:00
Sagar Ghuge 15063d79d3 intel/genxml: Update SAMPLER_STATE structure
Add new ANISOTROPIC_FAST filter mode value to the Min/MagModeFilter
field.

Signed-off-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32738>
2024-12-31 21:49:41 +00:00
Valentine Burley 6b0d551e8b angle/ci: Update expectations
Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32749>
2024-12-31 08:03:46 +00:00
Caio Oliveira 056b14b882 intel/brw: Move two NIR passes to brw_nir.c
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32799>
2024-12-30 20:18:23 +00:00
Caio Oliveira 1154b07d09 intel/brw: Add missing call to invalidate analysis
Reviewed-by: Rohan Garg <rohan.garg@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32798>
2024-12-30 19:01:40 +00:00
Caio Oliveira 3ca6fa7487 intel/brw: Gather brw_reg related implementations in brw_reg.cpp
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32800>
2024-12-30 18:26:59 +00:00
Caio Oliveira 5860e07f92 intel/brw: Rename brw_compact_inst_* helpers to brw_eu_compact_inst_*
Acked-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32643>
2024-12-30 17:16:15 +00:00
Caio Oliveira 228aba779f intel/brw: Rename brw_inst_* helpers to brw_eu_inst_*
Acked-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32643>
2024-12-30 17:16:15 +00:00
Caio Oliveira 3031b22a8a intel/brw: Rename brw_inst_bits/set_bits to brw_eu_inst_bits/set_bits
Acked-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32643>
2024-12-30 17:16:15 +00:00
Caio Oliveira 06ccaad5f1 intel/brw: Rename brw_compact_inst to brw_eu_compact_inst
Consistent with brw_eu_inst.

Acked-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32643>
2024-12-30 17:16:15 +00:00
Caio Oliveira 3c3f4a1235 intel/brw: Rename brw_inst to brw_eu_inst
Free the old name for the BRW IR instruction.

Acked-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32643>
2024-12-30 17:16:15 +00:00
Caio Oliveira 9caa845e0f intel/brw: Rename brw_inst.h to brw_eu_inst.h
Acked-by: Sagar Ghuge <sagar.ghuge@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32643>
2024-12-30 17:16:15 +00:00
Lionel Landwerlin 5e4aeb3ad7 anv: fix index buffer size changes
With vkCmdBindIndexBuffer2KHR only the provided size can change which
currently fails to reprogram the index buffer properly.

Signed-off-by: Lionel Landwerlin <llandwerlin@gmail.com>
Fixes: 5c2aca456e ("anv: implement vkCmdBindIndexBuffer2KHR")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12376
Reviewed-by: Tapani Pälli <tapani.palli@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32785>
2024-12-27 13:20:49 +00:00
Ian Romanick 0f3a350087 brw/nir: Don't generate scalar byte to float conversions on DG2+ in optimize_extract_to_float
The lowering code does not generate efficient code. It is better to
just not emit the bad thing in the first place. The shaders that I
examined had blocks of NIR like:

    con 32     %527 = extract_u8 %456.o, %5 (0x0)
    con 32     %528 = extract_u8 %456.o, %35 (0x1)
    con 32     %529 = extract_u8 %456.o, %14 (0x2)
    con 32     %530 = extract_u8 %456.o, %11 (0x3)
    con 32     %531 = u2f32 %527
    con 32     %532 = u2f32 %528
    con 32     %533 = u2f32 %529
    con 32     %534 = u2f32 %530

In some cases the u2f results are multiplied with 1/255. There may be
a slightly more efficient way to do this by doing something like

    mov(8)    g40<1>UW        g12.1<32,8,4>UB
    mov(8)    g41<1>UW        g12.2<32,8,4>UB
    mov(8)    g42<1>UW        g12.3<32,8,4>UB
    mov(8)    g60<1>F         g12<32,8,4>UB
    mov(8)    g61<1>F         g40<1,1,0>UW
    mov(8)    g62<1>F         g41<1,1,0>UW
    mov(8)    g63<1>F         g42<1,1,0>UW

In SIMD16 and SIMD32 that would save temporary register space. It could
save a register in SIMD8 by using g40.8 instead of g42. Making that
happen might be tricky. Maybe we should just add a special NIR opcode
that converts a packed uint32 to a vec4?

v2: Add a bunch of documentation explaining what's going on. Suggested
by Ken.

shader-db:

Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
total instructions in shared programs: 18228689 -> 18228720 (<.01%)
instructions in affected programs: 43091 -> 43122 (0.07%)
helped: 0 / HURT: 30

total cycles in shared programs: 932542994 -> 932544290 (<.01%)
cycles in affected programs: 8150758 -> 8152054 (0.02%)
helped: 15 / HURT: 17

fossil-db:

Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
Totals:
Instrs: 142890605 -> 142890392 (-0.00%); split: -0.00%, +0.00%
Cycle count: 21655049536 -> 21654693720 (-0.00%); split: -0.00%, +0.00%

Totals from 181 (0.03% of 553251) affected shaders:
Instrs: 188022 -> 187809 (-0.11%); split: -0.12%, +0.01%
Cycle count: 85291658 -> 84935842 (-0.42%); split: -0.47%, +0.05%

Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown)
Totals:
Instrs: 154438050 -> 154436980 (-0.00%)
Cycle count: 15334650326 -> 15334644375 (-0.00%); split: -0.00%, +0.00%
Spill count: 56754 -> 56706 (-0.08%)
Fill count: 95919 -> 95808 (-0.12%)
Scratch Memory Size: 2306048 -> 2304000 (-0.09%)
Max live registers: 32469924 -> 32469899 (-0.00%)

Totals from 112 (0.02% of 642922) affected shaders:
Instrs: 156186 -> 155116 (-0.69%)
Cycle count: 11111478 -> 11105527 (-0.05%); split: -0.62%, +0.56%
Spill count: 1766 -> 1718 (-2.72%)
Fill count: 2815 -> 2704 (-3.94%)
Scratch Memory Size: 78848 -> 76800 (-2.60%)
Max live registers: 11526 -> 11501 (-0.22%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:59 -08:00
Ian Romanick 1a7593ed36 brw/nir: Treat some ballot as convergent
v2: Fix for Xe2.

v3: Add a comment explaining the use of bld instead of xbld. Suggested
by Ken. Fix a bug in handing is_scalar source. Noticed by me while
applying Ken's review feedback.

shader-db:

Lunar Lake, Meteor Lake, DG2, and Tiger Lake had similar results. (Lunar Lake shown)
total instructions in shared programs: 18228657 -> 18228689 (<.01%)
instructions in affected programs: 9333 -> 9365 (0.34%)
helped: 2 / HURT: 26

total cycles in shared programs: 932511560 -> 932542994 (<.01%)
cycles in affected programs: 2263040 -> 2294474 (1.39%)
helped: 7 / HURT: 27

Ice Lake and Skylake had similar results. (Ice Lake shown)
total instructions in shared programs: 20700370 -> 20700392 (<.01%)
instructions in affected programs: 18579 -> 18601 (0.12%)
helped: 1 / HURT: 28

total cycles in shared programs: 888385851 -> 888386325 (<.01%)
cycles in affected programs: 2571368 -> 2571842 (0.02%)
helped: 14 / HURT: 6

total spills in shared programs: 4373 -> 4371 (-0.05%)
spills in affected programs: 71 -> 69 (-2.82%)
helped: 1 / HURT: 0

total fills in shared programs: 4657 -> 4653 (-0.09%)
fills in affected programs: 196 -> 192 (-2.04%)
helped: 1 / HURT: 0

fossil-db:

Lunar Lake
Totals:
Instrs: 142887258 -> 142890605 (+0.00%); split: -0.00%, +0.00%
Cycle count: 21653599282 -> 21655049536 (+0.01%); split: -0.00%, +0.01%
Max live registers: 47942973 -> 47942837 (-0.00%)

Totals from 22209 (4.01% of 553251) affected shaders:
Instrs: 4337679 -> 4341026 (+0.08%); split: -0.00%, +0.08%
Cycle count: 261852040 -> 263302294 (+0.55%); split: -0.38%, +0.93%
Max live registers: 1299670 -> 1299534 (-0.01%)

Meteor Lake, DG2, Tiger Lake, and Skylake had similar results. (Meteor Lake shown)
Totals:
Instrs: 156599915 -> 156590882 (-0.01%); split: -0.01%, +0.00%
Cycle count: 16940072009 -> 16940902317 (+0.00%); split: -0.01%, +0.01%
Max live registers: 32610801 -> 32610488 (-0.00%)
Max dispatch width: 5730736 -> 5731744 (+0.02%); split: +0.12%, -0.11%

Totals from 35528 (5.52% of 643617) affected shaders:
Instrs: 6175409 -> 6166376 (-0.15%); split: -0.21%, +0.06%
Cycle count: 230679923 -> 231510231 (+0.36%); split: -0.46%, +0.82%
Max live registers: 1354716 -> 1354403 (-0.02%)
Max dispatch width: 167648 -> 168656 (+0.60%); split: +4.26%, -3.66%

Ice Lake
Totals:
Instrs: 155330276 -> 155318037 (-0.01%); split: -0.01%, +0.00%
Cycle count: 15019092327 -> 15019637026 (+0.00%); split: -0.00%, +0.01%
Max live registers: 32640341 -> 32637305 (-0.01%)
Max dispatch width: 5780720 -> 5780688 (-0.00%); split: +0.02%, -0.02%

Totals from 37773 (5.85% of 645641) affected shaders:
Instrs: 6643030 -> 6630791 (-0.18%); split: -0.24%, +0.05%
Cycle count: 223589025 -> 224133724 (+0.24%); split: -0.29%, +0.53%
Max live registers: 1491781 -> 1488745 (-0.20%)
Max dispatch width: 167600 -> 167568 (-0.02%); split: +0.75%, -0.77%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:59 -08:00
Ian Romanick f2d2014636 brw/nir: Simplify get_nir_image_intrinsic_image and get_nir_buffer_intrinsic_index
shader-db:

All Intel platforms had similar results. (Meteor Lake shown)
total instructions in shared programs: 20041625 -> 20041634 (<.01%)
instructions in affected programs: 1206 -> 1215 (0.75%)
helped: 0 / HURT: 5

total cycles in shared programs: 929993812 -> 929993816 (<.01%)
cycles in affected programs: 10930 -> 10934 (0.04%)
helped: 1 / HURT: 2

fossil-db:

Lunar Lake
Totals:
Instrs: 142892951 -> 142893049 (+0.00%)
Send messages: 6591165 -> 6591186 (+0.00%)
Cycle count: 21653727624 -> 21653732470 (+0.00%); split: -0.00%, +0.00%
Scratch Memory Size: 5664768 -> 5660672 (-0.07%)
Max live registers: 47944999 -> 47944983 (-0.00%)

Totals from 19 (0.00% of 553292) affected shaders:
Instrs: 10671 -> 10769 (+0.92%)
Send messages: 697 -> 718 (+3.01%)
Cycle count: 234508 -> 239354 (+2.07%); split: -0.01%, +2.08%
Scratch Memory Size: 38912 -> 34816 (-10.53%)
Max live registers: 2203 -> 2187 (-0.73%)

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 156744203 -> 156743428 (-0.00%); split: -0.00%, +0.00%
Send messages: 7654787 -> 7654808 (+0.00%)
Cycle count: 16942341318 -> 16942329195 (-0.00%); split: -0.00%, +0.00%
Spill count: 75549 -> 75499 (-0.07%)
Fill count: 140094 -> 140012 (-0.06%)
Scratch Memory Size: 3945472 -> 3944448 (-0.03%)
Max live registers: 32642020 -> 32642009 (-0.00%)

Totals from 19 (0.00% of 644000) affected shaders:
Instrs: 12489 -> 11714 (-6.21%); split: -7.00%, +0.79%
Send messages: 697 -> 718 (+3.01%)
Cycle count: 203873 -> 191750 (-5.95%); split: -6.77%, +0.82%
Spill count: 50 -> 0 (-inf%)
Fill count: 82 -> 0 (-inf%)
Scratch Memory Size: 25600 -> 24576 (-4.00%)
Max live registers: 1150 -> 1139 (-0.96%)

No fossil-db changes on Tiger Lake, Ice Lake, or Skylake.

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:59 -08:00
Ian Romanick 9a967c5ec4 brw/nir: Don't try optimize around emit_uniformize
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:59 -08:00
Ian Romanick 63e395fa87 brw/nir: Eliminate nir_to_brw_state::uniform_values
No shader-db changes on any Intel platform. No fossil-db changes on
Tiger Lake, Ice Lake, or Skylake.

fossil-db:

Lunar Lake
Totals:
Cycle count: 21653230858 -> 21653230518 (-0.00%); split: -0.00%, +0.00%
Max live registers: 47941741 -> 47941737 (-0.00%)

Totals from 17 (0.00% of 553202) affected shaders:
Cycle count: 201232 -> 200892 (-0.17%); split: -0.19%, +0.02%
Max live registers: 1354 -> 1350 (-0.30%)

Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown)
Totals:
Instrs: 156455123 -> 156453396 (-0.00%); split: -0.00%, +0.00%
Cycle count: 16904545026 -> 16904393943 (-0.00%); split: -0.00%, +0.00%
Max live registers: 32638039 -> 32638035 (-0.00%)

Totals from 1201 (0.19% of 643905) affected shaders:
Instrs: 509360 -> 507633 (-0.34%); split: -0.34%, +0.00%
Cycle count: 1579931758 -> 1579780675 (-0.01%); split: -0.01%, +0.00%
Max live registers: 59633 -> 59629 (-0.01%)

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:59 -08:00
Ian Romanick a13244e57b brw/nir: Treat some resource_intel as convergent
No shader-db changes on any Intel platform. No fossil-db changes on Ice
Lake or Skylake.

fossil-db:

Lunar Lake
Totals:
Cycle count: 21653232202 -> 21653230858 (-0.00%); split: -0.00%, +0.00%

Totals from 4 (0.00% of 553202) affected shaders:
Cycle count: 14276568 -> 14275224 (-0.01%); split: -0.01%, +0.00%

Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown)
Totals:
Instrs: 156453398 -> 156455123 (+0.00%); split: -0.00%, +0.00%
Cycle count: 16904394153 -> 16904545026 (+0.00%); split: -0.00%, +0.00%

Totals from 1189 (0.18% of 643905) affected shaders:
Instrs: 502891 -> 504616 (+0.34%); split: -0.00%, +0.34%
Cycle count: 1579688485 -> 1579839358 (+0.01%); split: -0.00%, +0.01%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:59 -08:00
Ian Romanick 1b24612c57 brw/nir: Treat load_*_uniform_block_intel as convergent
Between 5 and 10 shaders (depending on the platform) from Blender are
massively helped for spills and fills (e.g., from 45 spills to 0, and
180 fills to 0).

Previously this commit cause a lot of spill and fill damage to
Wolfenstein Youngblood and Red Dead Redemption 2. I believe due to
!32041 and !32097, this is no longer the case. RDR2 is helped, and
Wolfenstein Youngblood has no changes.

However, q2rtx/q2rtx-rt-pipeline is hurt:

    Spill count: 126 -> 175 (+38.89%); split: -0.79%, +39.68%
    Fill count: 156 -> 235 (+50.64%); split: -1.92%, +52.56%

By the end of this series this damage is fixed, and q2rtx is helped
overall by -0.79% spills and -1.92% fills.

v2: Fix for Xe2.

v3: Just keep using bld for the group(1, 0) call. Suggested by Ken.

v4: Major re-write. Pass bld and xbld to fs_emit_memory_access. The big
fix is changing the way srcs[MEMORY_LOGICAL_ADDRESS] is calculated
(around line 7180). In previous versions of the commit, the address
would be calculated using bld (which is now xbld) even if the address
source was not is_scalar. This could cause the emit_uniformize (later in
the function) to fetch garbage. This also drops the special case
handling of constant offset. Constant propagation and algebraic will
handle this.

v5: Fix a subtle bug that was ultimately caused by the removal of
offset_to_component. The MEMORY_LOGICAL_ADDRESS for
load_shared_uniform_block_intel was being calculated as SIMD16 on LNL,
but the later emit_uniformize would treat it as SIMD32. This caused GPU
hangs in Assassin's Creed Valhalla.

v6: Fix a bug in D16 to D16U32 expansion. Noticed by Ken. Add a comment
explaining bld vs xbld vs ubld in fs_nir_emit_memory_access. Suggested
by Ken.

v7: Revert some of the v6 changes related to D16 to D16U32
expansion. This code was mostly correct. xbld is correct because DATA0
needs to be generated in size of the eventual SEND instruction. Using
offset(nir_src, xbld, c) will cause offset() to correctly added
component(..., 0) if nir_src.is_scalar but xbld is not scalar_group().

v8: nir_intrinsic_load_shared_uniform_block_intel was removed. This
caused reproducible hangs in Assassin's Creed: Valhalla. There are some
other compiler issues related to this game, and we're not yet sure
exactly what the cause of any of it is.

shader-db:

Lunar Lake
total instructions in shared programs: 18058270 -> 18068886 (0.06%)
instructions in affected programs: 5196846 -> 5207462 (0.20%)
helped: 4442 / HURT: 11416

total cycles in shared programs: 921324492 -> 919819398 (-0.16%)
cycles in affected programs: 733274162 -> 731769068 (-0.21%)
helped: 11312 / HURT: 31788

total spills in shared programs: 3633 -> 3585 (-1.32%)
spills in affected programs: 48 -> 0
helped: 5 / HURT: 0

total fills in shared programs: 2277 -> 2198 (-3.47%)
fills in affected programs: 79 -> 0
helped: 5 / HURT: 0

LOST:   123
GAINED: 377

Meteor Lake, DG2, and Tiger Lake had similar results. (Meteor Lake shown)
total instructions in shared programs: 19703458 -> 19699173 (-0.02%)
instructions in affected programs: 5885251 -> 5880966 (-0.07%)
helped: 4545 / HURT: 14971

total cycles in shared programs: 903497253 -> 902054570 (-0.16%)
cycles in affected programs: 691762248 -> 690319565 (-0.21%)
helped: 16412 / HURT: 28080

total spills in shared programs: 4894 -> 4646 (-5.07%)
spills in affected programs: 248 -> 0
helped: 7 / HURT: 0

total fills in shared programs: 6638 -> 5581 (-15.92%)
fills in affected programs: 1057 -> 0
helped: 7 / HURT: 0

LOST:   427
GAINED: 978

Ice Lake and Skylake had similar results. (Ice Lake shonw)
total instructions in shared programs: 20384200 -> 20384889 (<.01%)
instructions in affected programs: 5295084 -> 5295773 (0.01%)
helped: 5309 / HURT: 12564

total cycles in shared programs: 873002832 -> 872515246 (-0.06%)
cycles in affected programs: 463413458 -> 462925872 (-0.11%)
helped: 16079 / HURT: 13339

total spills in shared programs: 4552 -> 4373 (-3.93%)
spills in affected programs: 546 -> 367 (-32.78%)
helped: 11 / HURT: 0

total fills in shared programs: 5298 -> 4657 (-12.10%)
fills in affected programs: 1798 -> 1157 (-35.65%)
helped: 10 / HURT: 0

LOST:   380
GAINED: 925

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 141528822 -> 141728392 (+0.14%); split: -0.21%, +0.35%
Subgroup size: 10968048 -> 10968144 (+0.00%)
Send messages: 6567930 -> 6567909 (-0.00%)
Cycle count: 22165780202 -> 21624534624 (-2.44%); split: -3.09%, +0.65%
Spill count: 69890 -> 66665 (-4.61%); split: -5.06%, +0.44%
Fill count: 128331 -> 120189 (-6.34%); split: -7.44%, +1.09%
Scratch Memory Size: 5829632 -> 5664768 (-2.83%); split: -2.86%, +0.04%
Max live registers: 47928290 -> 47611371 (-0.66%); split: -0.71%, +0.05%

Totals from 364369 (66.18% of 550563) affected shaders:
Instrs: 113448842 -> 113648412 (+0.18%); split: -0.26%, +0.44%
Subgroup size: 7694080 -> 7694176 (+0.00%)
Send messages: 5308287 -> 5308266 (-0.00%)
Cycle count: 21885237842 -> 21343992264 (-2.47%); split: -3.13%, +0.65%
Spill count: 65152 -> 61927 (-4.95%); split: -5.42%, +0.47%
Fill count: 122811 -> 114669 (-6.63%); split: -7.77%, +1.14%
Scratch Memory Size: 5438464 -> 5273600 (-3.03%); split: -3.07%, +0.04%
Max live registers: 34355310 -> 34038391 (-0.92%); split: -1.00%, +0.07%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:59 -08:00
Ian Romanick db2b1e4d76 brw/nir: Treat load_btd_{global,local}_arg_addr_intel and load_btd_shader_type_intel as convergent
No shader-db changes on any Intel platform. No fossil-db changes on
Tiger Lake, Ice Lake, or Skylake.

fossil-db:

Lunar Lake
Totals:
Instrs: 141808714 -> 141808513 (-0.00%); split: -0.00%, +0.00%
Cycle count: 22177889310 -> 22181410192 (+0.02%); split: -0.00%, +0.02%
Spill count: 69892 -> 69890 (-0.00%); split: -0.01%, +0.01%
Fill count: 128313 -> 128331 (+0.01%)
Max live registers: 48052083 -> 48052742 (+0.00%); split: -0.00%, +0.00%

Totals from 549 (0.10% of 551446) affected shaders:
Instrs: 911251 -> 911050 (-0.02%); split: -0.10%, +0.07%
Cycle count: 1244153266 -> 1247674148 (+0.28%); split: -0.04%, +0.32%
Spill count: 15849 -> 15847 (-0.01%); split: -0.04%, +0.03%
Fill count: 35087 -> 35105 (+0.05%)
Max live registers: 68047 -> 68706 (+0.97%); split: -0.25%, +1.22%

Meteor Lake
Totals:
Instrs: 152744298 -> 152741241 (-0.00%); split: -0.00%, +0.00%
Cycle count: 17410258529 -> 17405949054 (-0.02%); split: -0.04%, +0.01%
Spill count: 78528 -> 78598 (+0.09%); split: -0.01%, +0.09%
Fill count: 147893 -> 147978 (+0.06%); split: -0.00%, +0.06%
Scratch Memory Size: 3962880 -> 3969024 (+0.16%)
Max live registers: 31887206 -> 31887413 (+0.00%); split: -0.00%, +0.00%

Totals from 552 (0.09% of 633315) affected shaders:
Instrs: 907279 -> 904222 (-0.34%); split: -0.48%, +0.15%
Cycle count: 1152358569 -> 1148049094 (-0.37%); split: -0.56%, +0.19%
Spill count: 15290 -> 15360 (+0.46%); split: -0.03%, +0.48%
Fill count: 35313 -> 35398 (+0.24%); split: -0.02%, +0.26%
Scratch Memory Size: 1313792 -> 1319936 (+0.47%)
Max live registers: 34218 -> 34425 (+0.60%); split: -0.47%, +1.08%

DG2
Totals:
Instrs: 152766492 -> 152763061 (-0.00%); split: -0.00%, +0.00%
Cycle count: 17406058608 -> 17406396943 (+0.00%); split: -0.02%, +0.02%
Spill count: 78626 -> 78624 (-0.00%); split: -0.01%, +0.01%
Fill count: 147956 -> 148007 (+0.03%); split: -0.01%, +0.04%
Scratch Memory Size: 3962880 -> 3969024 (+0.16%)
Max live registers: 31887158 -> 31887365 (+0.00%); split: -0.00%, +0.00%

Totals from 552 (0.09% of 633315) affected shaders:
Instrs: 908513 -> 905082 (-0.38%); split: -0.47%, +0.09%
Cycle count: 1148162185 -> 1148500520 (+0.03%); split: -0.23%, +0.26%
Spill count: 15364 -> 15362 (-0.01%); split: -0.07%, +0.06%
Fill count: 35343 -> 35394 (+0.14%); split: -0.03%, +0.17%
Scratch Memory Size: 1313792 -> 1319936 (+0.47%)
Max live registers: 34218 -> 34425 (+0.60%); split: -0.47%, +1.08%

Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29884>
2024-12-24 18:09:59 -08:00