Commit Graph

186467 Commits

Author SHA1 Message Date
Alyssa Rosenzweig f8b69ebdc2 hk: drop assert
works fine without.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32081>
2024-11-11 14:33:02 +00:00
Alyssa Rosenzweig ece3bd74db agx: make imad+ishl rules actually work
total instructions in shared programs: 2750211 -> 2750184 (<.01%)
instructions in affected programs: 50499 -> 50472 (-0.05%)
helped: 27
HURT: 0
Instructions are helped.

total alu in shared programs: 2273669 -> 2273642 (<.01%)
alu in affected programs: 29874 -> 29847 (-0.09%)
helped: 27
HURT: 0
Alu are helped.

total fscib in shared programs: 2271986 -> 2271959 (<.01%)
fscib in affected programs: 29874 -> 29847 (-0.09%)
helped: 27
HURT: 0
Fscib are helped.

total bytes in shared programs: 21475184 -> 21474968 (<.01%)
bytes in affected programs: 371574 -> 371358 (-0.06%)
helped: 27
HURT: 0
Bytes are helped.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32081>
2024-11-11 14:33:02 +00:00
Alyssa Rosenzweig f737470736 agx: fuse iadd+large shift into imad
total instructions in shared programs: 2750352 -> 2750211 (<.01%)
instructions in affected programs: 86944 -> 86803 (-0.16%)
helped: 32
HURT: 18
Instructions are helped.

total alu in shared programs: 2273810 -> 2273669 (<.01%)
alu in affected programs: 76720 -> 76579 (-0.18%)
helped: 32
HURT: 18
Alu are helped.

total fscib in shared programs: 2272127 -> 2271986 (<.01%)
fscib in affected programs: 76720 -> 76579 (-0.18%)
helped: 32
HURT: 18
Fscib are helped.

total bytes in shared programs: 21476424 -> 21475184 (<.01%)
bytes in affected programs: 649884 -> 648644 (-0.19%)
helped: 33
HURT: 18
Bytes are helped.

total regs in shared programs: 865114 -> 865090 (<.01%)
regs in affected programs: 525 -> 501 (-4.57%)
helped: 3
HURT: 0

total uniforms in shared programs: 2120792 -> 2120848 (<.01%)
uniforms in affected programs: 414 -> 470 (13.53%)
helped: 0
HURT: 8
Uniforms are HURT.
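A rough illustration of why this fusion is legal (not the actual NIR/AGX rule): in wrapping 32-bit arithmetic, a left shift by a constant n is a multiply by 2^n, so `a + (b << n)` can be issued as one multiply-add. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical: the unfused sequence iadd(a, ishl(b, n)). */
static uint32_t iadd_ishl(uint32_t a, uint32_t b, unsigned n)
{
   return a + (b << n);
}

/* Hypothetical: a multiply-add x * y + z with 32-bit wraparound. */
static uint32_t imad(uint32_t x, uint32_t y, uint32_t z)
{
   return x * y + z;
}

/* Fused form: imad(b, 1 << n, a). Unsigned math wraps mod 2^32, so this
 * matches iadd_ishl() for every a, b and every n in [0, 31]. */
static uint32_t fused(uint32_t a, uint32_t b, unsigned n)
{
   return imad(b, UINT32_C(1) << n, a);
}
```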

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32081>
2024-11-11 14:33:02 +00:00
Alyssa Rosenzweig c9e42073a1 agx: optimize signext imad
improves clpeak short.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32081>
2024-11-11 14:33:02 +00:00
Asahi Lina cf0261980a hk: Enable missing swapchainMaintenance1 support
This was inconsistent with claiming the extension is supported, and that
trips up GTK4.

Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32081>
2024-11-11 14:33:02 +00:00
Alyssa Rosenzweig d449800e46 hk: don't advertise impossible modifiers
fixes dEQP-VK.drm_format_modifiers.bound_to_dma_buf.a2b10g10r10_sint_pack32,Crash

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32081>
2024-11-11 14:33:02 +00:00
Asahi Lina e5d61631fe hk: Fix DRM modifier selection for compressed surfaces
We have to reject DRM_FORMAT_MOD_APPLE_TWIDDLED_COMPRESSED for surfaces
which are too small. Since the modifier is for all planes, that means
that for multiplane images we need to test all planes for compression
support.
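A minimal sketch of the "all planes must qualify" rule, assuming a simple per-plane size check. The struct, the functions and especially the minimum dimension are hypothetical placeholders; the real threshold is hardware-specific and is not stated in the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-plane extent; not the hk data structure. */
struct plane_extent {
   uint32_t width, height;
};

/* Hypothetical placeholder for the hardware's minimum compressible size. */
#define HYPOTHETICAL_MIN_COMPRESS_DIM 16u

static bool plane_can_compress(struct plane_extent p)
{
   return p.width >= HYPOTHETICAL_MIN_COMPRESS_DIM &&
          p.height >= HYPOTHETICAL_MIN_COMPRESS_DIM;
}

/* The compressed modifier applies to every plane, so a single plane that is
 * too small rules it out for the whole image. */
static bool can_use_compressed_modifier(const struct plane_extent *planes,
                                        unsigned n)
{
   for (unsigned i = 0; i < n; ++i) {
      if (!plane_can_compress(planes[i]))
         return false;
   }
   return true;
}
```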

Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32081>
2024-11-11 14:33:02 +00:00
Asahi Lina da1601a4ec hk: Add virtio implicit sync support
Since we can't easily know which BOs are written, just sync against all
external BOs.

This should go away once we have proper fence passing support so we can
do implicit sync passing in muvm-x11bridge.

Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32081>
2024-11-11 14:33:02 +00:00
Mary Guillemard 1a621a6967 agx: Add support for EGL_NV_context_priority_realtime
Signed-off-by: Mary Guillemard <mary@mary.zone>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32081>
2024-11-11 14:33:02 +00:00
Alyssa Rosenzweig ddc6d9e984 agx: fix atomics in tess count shaders
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32081>
2024-11-11 14:33:02 +00:00
Alyssa Rosenzweig 2c7635ab63 agx: add tests for sign/zero-extend propagate
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32081>
2024-11-11 14:33:02 +00:00
Alyssa Rosenzweig 6d56c8bc02 agx: fold zext into int sources
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32081>
2024-11-11 14:33:01 +00:00
Alyssa Rosenzweig 200d0794e2 agx: optimize signext+iadd
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32081>
2024-11-11 14:33:01 +00:00
Alyssa Rosenzweig cfe0a9acec agx: add pseudo for signext
easier to optimize

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32081>
2024-11-11 14:33:01 +00:00
Alyssa Rosenzweig 8de339c0d8 agx: change int conversion test
it's not useful as is, but we can salvage it

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32081>
2024-11-11 14:33:01 +00:00
Asahi Lina 85c5a25ec3 asahi: In-place decompress shared resources for feedback loops
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32081>
2024-11-11 14:33:01 +00:00
Asahi Lina f04387a415 asahi: Introduce batch->feedback to disable compression in PBE
Used for RTs that have feedback with in-place decompression.

Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32081>
2024-11-11 14:33:01 +00:00
Asahi Lina 9288a3a583 asahi: Extract agx_decompress_inplace()
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32081>
2024-11-11 14:33:01 +00:00
Asahi Lina f28a1b3fcf asahi: Add PIPE_BIND_SHARED to imported resources
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32081>
2024-11-11 14:33:01 +00:00
Asahi Lina 59501af723 asahi: Add pipe bind flags to resource debug
Signed-off-by: Asahi Lina <lina@asahilina.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32081>
2024-11-11 14:33:01 +00:00
Zan Dobersek e17038cc88 fd/pps: provide derived counters on a7xx
Provide various derived counters that can be reported by the freedreno
perfetto producer on a7xx devices.

Specific to a7xx is the split of counters for some countables between the
rendering and visibility bins. Such counters have to be configured
separately inside the appropriate perfcounter group, which then enables
the derived counter to use the separate counter values in its measured
metrics.

Not all possible derived counters are enabled because the perfcounter
groups cannot handle as many counters as would be necessary. There are also
disabled derived counters that would require counters from the VBIF group,
which isn't exposed for now because enabling its relevant counters is more
complex.

Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29677>
2024-11-11 13:39:40 +00:00
Zan Dobersek fae4a23ab1 fd/pps: specify counter group for each countable
For each countable that's being set up, the specific counter group is now
also required. This way on a7xx it will be possible to differentiate
between countables that have the same name but can be used through counter
groups for rendering bin or for visibility bin (e.g. CP and BV_CP).

Signed-off-by: Zan Dobersek <zdobersek@igalia.com>
Acked-by: Rob Clark <robclark@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29677>
2024-11-11 13:39:39 +00:00
Danylo Piliaiev 21359417ba ir3/parser: Print the line where parsing error occurred
Super useful with rddecompiler; otherwise it's impossible to
determine which instruction failed to parse.

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31954>
2024-11-11 11:38:17 +00:00
Samuel Pitoiset 30d9166d80 radv: dump the trap handler shader with RADV_DEBUG=dump_trap_handler
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32031>
2024-11-11 09:34:05 +00:00
Samuel Pitoiset 4d50691ae9 radv: remove unused parameter to radv_fill_nir_compiler_options()
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32031>
2024-11-11 09:34:05 +00:00
Konstantin Seurer e3cf6290e0 radv: Add RADV_DEBUG=nirdebuginfo
Annotates the shader with source locations that point into the NIR shader.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29298>
2024-11-11 08:39:14 +00:00
Konstantin Seurer cf447c5da1 nir: Do not gather source locations for phis
Phi instructions are expected to be the first instructions in a block.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29298>
2024-11-11 08:39:14 +00:00
Konstantin Seurer f2c204daf0 nir: Add a first_line parameter to gather_debug_info
Useful when the file contains multiple shaders.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29298>
2024-11-11 08:39:14 +00:00
Konstantin Seurer 736c8c6f23 radv: Dump nir shaders before compiling
It will allow adding source locations that point to the nir_string to
the shader.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29298>
2024-11-11 08:39:14 +00:00
Konstantin Seurer aaf65d6219 radv: Store debug info inside radv_shader
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29298>
2024-11-11 08:39:14 +00:00
Konstantin Seurer 54c22656b8 radv: Add a helper for accessing the shader binary
Use pointers into the blob instead of hardcoding the layout everywhere.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29298>
2024-11-11 08:39:13 +00:00
Konstantin Seurer 69ebba82d4 aco: Pass debug information to the driver
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29298>
2024-11-11 08:39:13 +00:00
Konstantin Seurer f8ef1afec8 aco: Handle nir_debug_info_instr
Propagate debug info using p_debug_info and Program::debug_info.
Offsets into the shader binary are gathered during assembly.
This will be useful for mapping the disassembled shader back to
NIR, GLSL or SPIR-V.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29298>
2024-11-11 08:39:13 +00:00
Konstantin Seurer 7dd9840128 amd: Add ac_shader_debug_info
This is very similar to nir_debug_info_instr but it can exist outside of
a nir shader.

Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29298>
2024-11-11 08:39:13 +00:00
Konstantin 4d09cd7fa5 nir/lower_non_uniform_access: Group accesses using the same resource
Avoids emitting the waterfall loop for every access if they use the same
resource:

waterfall_loop {
   access
}
waterfall_loop {
   access
}

->

waterfall_loop {
   access
   access
}
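A toy model of the saving, assuming (as a simplification) that only consecutive accesses to the same resource are grouped. Resources are plain ints standing in for descriptors, and both functions are hypothetical; the real pass operates on NIR, not on an array.

```c
#include <assert.h>
#include <stddef.h>

/* Before: every non-uniform access gets its own waterfall loop. */
static size_t loops_ungrouped(const int *resource, size_t n)
{
   (void)resource;
   return n;
}

/* After: a run of consecutive accesses using the same resource shares one
 * waterfall loop, so a new loop starts only when the resource changes. */
static size_t loops_grouped(const int *resource, size_t n)
{
   size_t loops = 0;
   for (size_t i = 0; i < n; ++i) {
      if (i == 0 || resource[i] != resource[i - 1])
         loops++;
   }
   return loops;
}
```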

Totals from 276 (0.33% of 84770) affected shaders:
MaxWaves: 3360 -> 3356 (-0.12%)
Instrs: 3759927 -> 3730650 (-0.78%)
CodeSize: 21125784 -> 20899580 (-1.07%)
VGPRs: 23096 -> 23104 (+0.03%)
Latency: 35593716 -> 35315455 (-0.78%); split: -0.78%, +0.00%
InvThroughput: 7353071 -> 7297309 (-0.76%); split: -0.76%, +0.00%
VClause: 120983 -> 118579 (-1.99%)
SClause: 113073 -> 110671 (-2.12%)
Copies: 358272 -> 348686 (-2.68%)
Branches: 166706 -> 159500 (-4.32%)
PreSGPRs: 18598 -> 18596 (-0.01%)
PreVGPRs: 21417 -> 21424 (+0.03%); split: -0.01%, +0.04%
VALU: 2354862 -> 2350053 (-0.20%)
SALU: 582291 -> 567638 (-2.52%)
SMEM: 139875 -> 137473 (-1.72%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30509>
2024-11-11 07:53:13 +00:00
Konstantin Seurer c5e40a60f8 radv: Lower non-uniform access after vectorization
Scalar access can make nir_lower_non_uniform_access emit a lot of
waterfall loops.

Totals from 83 (0.10% of 84770) affected shaders:
Instrs: 2747926 -> 2745959 (-0.07%); split: -0.07%, +0.00%
CodeSize: 15022460 -> 14998240 (-0.16%); split: -0.16%, +0.00%
Latency: 18602932 -> 18404976 (-1.06%); split: -1.18%, +0.12%
InvThroughput: 4500730 -> 4450364 (-1.12%); split: -1.18%, +0.06%
VClause: 93651 -> 91848 (-1.93%); split: -1.93%, +0.00%
SClause: 63672 -> 63595 (-0.12%); split: -0.13%, +0.00%
Copies: 229377 -> 229896 (+0.23%); split: -0.04%, +0.27%
Branches: 107630 -> 107627 (-0.00%); split: -0.01%, +0.00%
PreSGPRs: 5247 -> 5253 (+0.11%)
PreVGPRs: 5911 -> 5903 (-0.14%); split: -0.29%, +0.15%
VALU: 1761158 -> 1761540 (+0.02%); split: -0.01%, +0.03%
SALU: 419743 -> 419783 (+0.01%); split: -0.01%, +0.02%
VMEM: 152142 -> 150208 (-1.27%)
SMEM: 80251 -> 80244 (-0.01%)

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30509>
2024-11-11 07:53:13 +00:00
Konstantin Seurer d44f74896e nir: Add missing access flags to print_access
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30509>
2024-11-11 07:53:13 +00:00
Konstantin Seurer 01ca436263 util: Fix some brackets in util_dynarray_.*_ptr
Fixes a compiler error when directly accessing members of the returned
pointer.
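The class of bug being fixed can be shown with two hypothetical macros (not the actual util_dynarray definitions). Without outer parentheses, `->` binds tighter than the cast, so `BAD_ELEM_PTR(buf, struct point, 1)->x` expands to `(struct point *)(buf) + (1)->x`, which does not compile.

```c
#include <assert.h>

struct point {
   int x, y;
};

/* Hypothetical "pointer to element idx" macros. The unbracketed one still
 * works in an assignment, but breaks as soon as the caller applies ->. */
#define BAD_ELEM_PTR(buf, type, idx)  (type *)(buf) + (idx)
#define GOOD_ELEM_PTR(buf, type, idx) ((type *)(buf) + (idx))

static int second_x(void *buf)
{
   /* Member access directly on the macro result, as in the fixed code. */
   return GOOD_ELEM_PTR(buf, struct point, 1)->x;
}
```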

Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30509>
2024-11-11 07:53:13 +00:00
Visan, Tiberiu d379a3a428 amd/vpelib: remove luma offset (#459)
[WHY]
The shader and VPE do not apply brightness adjustments in the same manner.

[HOW]
Removed the luma offset added in VPE.

[TESTING]
Tested on real-time video rendering.

Co-authored-by: Tiberiu Visan <tvisan@amd.com>
Reviewed-by: Krunoslav Kovac <Krunoslav.Kovac@amd.com>
Reviewed-by: Navid Assadian <Navid.Assadian@amd.com>
Acked-by: Chenyu Chen <Chen-Yu.Chen@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32075>
2024-11-11 13:00:54 +08:00
Visan, Tiberiu 2172ab2c2a amd/vpelib: patch to match shader (#456)
[WHY]
Shader and VPE had different behavior while adjusting the brightness

[HOW]
Apply the same normalization factor

[TESTING]
Tested on real video outputs

Co-authored-by: Tiberiu Visan <tvisan@amd.com>
Reviewed-by: Jesse Agate <Jesse.Agate@amd.com>
Reviewed-by: Krunoslav Kovac <Krunoslav.Kovac@amd.com>
Acked-by: Chenyu Chen <Chen-Yu.Chen@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32075>
2024-11-11 13:00:44 +08:00
Leder, Brendan Steve 891c4694ba amd/vpelib: Refactor OCSC and update missing check
Added the missing 601 check to the limited format check.
Refactored OCSC to use specific limited depths.
Cleaned up general color processing.

Co-authored-by: Brendan <breleder@amd.com>
Reviewed-by: Jesse Agate <Jesse.Agate@amd.com>
Reviewed-by: Krunoslav Kovac <Krunoslav.Kovac@amd.com>
Acked-by: Chenyu Chen <Chen-Yu.Chen@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32075>
2024-11-11 13:00:29 +08:00
Martin Roukala (né Peres) dc1fe83aa5 zink/ci: document new-ish vangogh flakes
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32071>
2024-11-10 07:21:41 +02:00
Marek Olšák 1299f5c50a gallium/radeon: import libdrm_radeon source code, drop the dependency
Only radeon_surface.h/c is used from libdrm and radeon_drm.h is imported
too. This code doesn't change anymore. We don't need the dependency.

Acked-by: Pavel Ondračka <pavel.ondracka@gmail.com>
Acked-by: Gert Wollny <gert.wollny@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31827>
2024-11-10 00:52:18 +00:00
Russell Greene ae9d365686 perfetto: fix macos compile
On macOS, <sys/types.h> does not declare clockid_t;
it is declared in <time.h> instead. On Linux, <time.h>
also includes <sys/types.h>, so just include <time.h>
on all UNIX platforms.
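A minimal sketch of the portable pattern (not the perfetto code; `read_monotonic` is hypothetical). `<time.h>` declares both `clockid_t` and `clock_gettime()` on Linux and on macOS 10.12 or later.

```c
/* On Linux, strict ISO modes (e.g. -std=c11) hide clock_gettime() unless a
 * POSIX feature macro is set; glibc's default GNU mode already exposes it. */
#if defined(__linux__) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <time.h> /* clockid_t, CLOCK_MONOTONIC, clock_gettime() */

static int read_monotonic(struct timespec *ts)
{
   clockid_t id = CLOCK_MONOTONIC;
   return clock_gettime(id, ts);
}
```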

Fixes: a871eabc
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/12064
Tested-by: Vinson Lee <vlee@freedesktop.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31881>
2024-11-09 09:23:22 +00:00
Deborah Brouwer e368623fff freedreno/ci: add prefix for a630-vk-asan tests
Currently a630-vk-asan has separate files for its expected failures and
skips, but by using the deqp-runner prefix option, the job can use the
common a630 expectation files. This simplifies `a630-vk-asan` without any
substantive changes to the ci job.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31970>
2024-11-09 08:15:36 +00:00
Alyssa Rosenzweig 0a81434adf agx: rewrite address mode lowering
AGX loads/stores support a single family of addressing modes:

   64-bit base + sign/zero-extend(32-bit) << (format shift + optional shift)

This is a base-index addressing mode, where the index is minimally in elements
(never bytes, unless we're doing 8-bit load/stores). Both base and the resulting
address must be aligned to the format size; the mandatory shift means that
alignment of base is equivalent to alignment of the final address, which is
taken care of by lower_mem_access_bit_size anyhow.

The other key thing to note is that this is a 64-bit shift, after the sign- or
zero-extension of the 32-bit index. That means that AGX does NOT implement

   64-bit base + sign/zero-extend(32-bit << shift)

This has sweeping implications.
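The two forms above can be contrasted in plain C. This sketch models only the sign-extending variant (zero-extension is analogous), and the function names are hypothetical. The forms agree for small indices but diverge once `index << shift` no longer fits in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* What AGX implements: sign-extend the 32-bit index to 64 bits first, then
 * shift in 64-bit arithmetic, so no index bits are lost. */
static uint64_t agx_addr(uint64_t base, int32_t index, unsigned shift)
{
   return base + ((uint64_t)(int64_t)index << shift);
}

/* What AGX does NOT implement: shift in 32 bits, then sign-extend. The
 * conversion back to int32_t assumes the usual two's-complement behaviour. */
static uint64_t shift_then_extend(uint64_t base, int32_t index, unsigned shift)
{
   int32_t shifted = (int32_t)((uint32_t)index << shift);
   return base + (uint64_t)(int64_t)shifted;
}
```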

For addressing math from C-based languages (including OpenCL C), the AGX mode is
more helpful, since we tend to get 64-bit shifts instead of 32-bit shifts.
However, for addressing math coming from GLSL, the AGX mode is rather annoying
since we know UBOs/SSBOs are at most 4GB so nir_lower_io & friends are all
32-bit byte indexing. It's tricky to teach them to do otherwise, and would not
be optimal either since 64-bit adds&shifts are *usually* much more expensive
than 32-bit on AGX *except* for when fused into the load/store.

So we don't want 32-bit NIR, since then we can't use the hardware addressing
mode at all. We also don't want 64-bit NIR, since then we have excessive 64-bit
math resulting from deep deref chains from complex struct/array cases. Instead,
we want a middle ground: 32-bit operations that are guaranteed not to overflow
32-bit and can therefore be losslessly promoted to 64-bit.

We can make that no-overflow guarantee as a consequence of the maximum UBO/SSBO
size, and indeed Mesa relies on this already all over the place. So, in this
series, we use relaxed amul opcodes for addressing
math. Then, we rewrite our address mode pattern matching to fuse AGX address
modes.
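Why the no-overflow guarantee matters, as a small sketch with hypothetical helpers: a 32-bit offset computation can be promoted to 64 bits losslessly only if the 32-bit result never wraps. Bounding buffer sizes is what makes that hold for in-bounds addressing.

```c
#include <assert.h>
#include <stdint.h>

/* offset = i * stride + field, computed in 32 bits, then zero-extended. */
static uint64_t offset32_then_zext(uint32_t i, uint32_t stride, uint32_t field)
{
   return (uint64_t)(i * stride + field);
}

/* The same offset computed directly in 64-bit arithmetic. */
static uint64_t offset64(uint32_t i, uint32_t stride, uint32_t field)
{
   return (uint64_t)i * stride + field;
}
```

Both agree whenever the 32-bit math does not overflow, and differ as soon as it wraps.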

The actual pattern matching is rewritten. The old code was brittle handwritten
nir_scalar chasing, based on a faulty model of the hardware (with the 32-bit
shift). We delete it all, it's broken. In the new approach, we add some NIR
pseudo-opcodes for address math (ulea_agx/ilea_agx) which we pattern match with
NIR algebraic rules. Then the chasing required to fuse LEAs into load/stores is
trivial because we never go deeper than 1 level. After fusing, we then lower the
leftover lea/amul opcodes and let regular nir_opt_algebraic take it from
here.
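A toy model of the one-level fusion step. The IR, its opcodes and the field names are hypothetical and only mirror what is described above. Once algebraic rules have produced a ulea/ilea, fusing means inspecting the load's immediate address source and nothing deeper.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy IR, not NIR. */
enum op { OP_OTHER, OP_ULEA, OP_ILEA, OP_LOAD };

struct instr {
   enum op op;
   const struct instr *src; /* OP_LOAD: the address computation */
   uint32_t base_ssa;       /* LEA: 64-bit base value (SSA index) */
   uint32_t index_ssa;      /* LEA: 32-bit index value (SSA index) */
   unsigned shift;          /* LEA: extra shift amount */
};

/* Addressing fields that a fused load/store would encode directly. */
struct fused_load {
   bool fused;
   uint32_t base_ssa, index_ssa;
   unsigned shift;
   bool sign_extend;
};

/* Look exactly one level up: either the address is a LEA or it is not. */
static struct fused_load fuse_address(const struct instr *load)
{
   struct fused_load f = {0};
   const struct instr *addr = load->src;

   if (addr && (addr->op == OP_ULEA || addr->op == OP_ILEA)) {
      f.fused = true;
      f.base_ssa = addr->base_ssa;
      f.index_ssa = addr->index_ssa;
      f.shift = addr->shift;
      f.sign_extend = addr->op == OP_ILEA;
   }
   return f;
}
```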

We do need to be very careful around pass order to make sure things like
load/store vectorization still happen. Some passes are shuffled in this commit
to make this work. We also need to clean up amul before fusing since we
specifically do not have nir_opt_algebraic do so - the entire point of the
pseudo-opcodes is to make nir_opt_algebraic ignore the opcodes until we've had a
chance to fuse. If we simply used the .nuw bit on iadd/imul, nir_opt_algebraic
would "optimize" things and lose the bit and then we would fail to fuse
addressing modes, which is a much more expensive failure case than anything
nir_opt_algebraic can do for us. I don't know what the "optimal" pass order for
AGX would look like at this point, but what we have here is good enough for now
and is a net positive for shader-db.

That all ends up being much less code and much simpler code, while fixing the
soundness holes in the old code, and also optimizing a significantly richer set
of addressing calculations. Now we don't just optimize GL/VK modes, but also CL.
This is crucial even for GL/VK performance, since we rely on CL via libagx even
in graphics shaders.

Terraintessellation is up 10% to ~310fps, which is quite nice.

The following stats are for the end of the series together, including this
change + libagx change + the NIR changes building up to this... but not
including the SSBO vectorizer stats or the IC modelling fix. In other words,
these are the stats for "rewriting address mode handling". This is on OpenGL,
and since the old code was targeted at GL, anything that's not a loss is good
enough - we need this for the soundness fix regardless.

total instructions in shared programs: 2751356 -> 2750518 (-0.03%)
instructions in affected programs: 372143 -> 371305 (-0.23%)
helped: 715
HURT: 75
Instructions are helped.

total alu in shared programs: 2279559 -> 2278721 (-0.04%)
alu in affected programs: 304170 -> 303332 (-0.28%)
helped: 715
HURT: 75
Alu are helped.

total fscib in shared programs: 2277843 -> 2277008 (-0.04%)
fscib in affected programs: 304167 -> 303332 (-0.27%)
helped: 715
HURT: 75
Fscib are helped.

total ic in shared programs: 632686 -> 621886 (-1.71%)
ic in affected programs: 113078 -> 102278 (-9.55%)
helped: 1159
HURT: 82
Ic are helped.

total bytes in shared programs: 21489034 -> 21477530 (-0.05%)
bytes in affected programs: 3018456 -> 3006952 (-0.38%)
helped: 751
HURT: 107
Bytes are helped.

total regs in shared programs: 865148 -> 865114 (<.01%)
regs in affected programs: 1603 -> 1569 (-2.12%)
helped: 10
HURT: 9
Inconclusive result (value mean confidence interval includes 0).

total uniforms in shared programs: 2120735 -> 2120792 (<.01%)
uniforms in affected programs: 22752 -> 22809 (0.25%)
helped: 76
HURT: 49
Inconclusive result (value mean confidence interval includes 0).

total threads in shared programs: 27613312 -> 27613504 (<.01%)
threads in affected programs: 1536 -> 1728 (12.50%)
helped: 3
HURT: 0

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31964>
2024-11-08 21:15:42 -04:00
Alyssa Rosenzweig d466ccc6bd libagx: promote math to use AGX address mode
we want to fit into the 64 + ext() << #n pattern to let us fuse address
arithmetic into our loads, so rework some libagx addressing to better match that

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31964>
2024-11-08 21:15:42 -04:00
Alyssa Rosenzweig 77ce91e99b hk: reduce max SSBO size
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31964>
2024-11-08 21:15:42 -04:00
Alyssa Rosenzweig 01d2aa1d53 agx: fix bfeil timing
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31964>
2024-11-08 21:15:42 -04:00
Alyssa Rosenzweig db8d467ec6 agx: model IC dispatch
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31964>
2024-11-08 21:15:42 -04:00