Commit Graph

202095 Commits

Author SHA1 Message Date
Alyssa Rosenzweig
479d2ab53e libagx: fix wraparound issue with robust draw kernel
fixes dEQP-VK.robustness.index_access.draw_multi_indexed_2 with hard faults.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33682>
2025-02-22 02:24:28 +00:00
Alyssa Rosenzweig
bec073d3ca libagx: fix subgroup id confusion
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33682>
2025-02-22 02:24:28 +00:00
Alyssa Rosenzweig
4949ae3920 asahi: switch tib lower to intrinsic pass
fixes metadata issue.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33682>
2025-02-22 02:24:28 +00:00
Alyssa Rosenzweig
e203d04f43 asahi: use NIR_PASS to validate more
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33682>
2025-02-22 02:24:28 +00:00
Alyssa Rosenzweig
290b8da8b6 asahi: perf debug indirect tess
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33682>
2025-02-22 02:24:28 +00:00
Alyssa Rosenzweig
3060b471b5 libagx: add missing null pointer check
fixes KHR-GL46.pipeline_statistics_query_tests_ARB.functional_tess_queries
with hard fault

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33682>
2025-02-22 02:24:28 +00:00
Alyssa Rosenzweig
c7a8200dcd hk: don't allocate zero sink
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33682>
2025-02-22 02:24:28 +00:00
Alyssa Rosenzweig
f0d680437f hk: use zero sink for null index buffer
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33682>
2025-02-22 02:24:28 +00:00
Alyssa Rosenzweig
eff6b884cb asahi: use zero sink for vbuf
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33682>
2025-02-22 02:24:28 +00:00
Alyssa Rosenzweig
c14df405b9 libagx: use zero page
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33682>
2025-02-22 02:24:28 +00:00
Alyssa Rosenzweig
04eb91c68b asahi: bind zero-page
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33682>
2025-02-22 02:24:28 +00:00
Alyssa Rosenzweig
3adbf53ed6 hk: do not incorrectly offset host-image-copy sources
the source is indexed from layer 0, the dest image is indexed from whatever the
base layer is. fixes new CTS dEQP-VK.image.host_image_copy.array.*

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33682>
2025-02-22 02:24:28 +00:00
Alyssa Rosenzweig
4559cdb94b hk: fix increment CS invs
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33682>
2025-02-22 02:24:28 +00:00
Janne Grunau
27d27e08ea hk: Use rowPitch from VkImageDrmFormatModifierExplicitCreateInfoEXT
Imported linear images may have an arbitrary row pitch. As long as it is
aligned to 16 agx can support. Initialize `.linear_stride_B` from the
supplied parameter and let ail verify it.

Fixes gtk dmabuf based tests with a pitch aligned to 256.

Signed-off-by: Janne Grunau <j@jannau.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33682>
2025-02-22 02:24:28 +00:00
Alyssa Rosenzweig
5d9e600ce9 hk: implement calibrated timestamps
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33682>
2025-02-22 02:24:28 +00:00
Janne Grunau
9c704dd759 hk: Replace alloca with malloc in queue_submit
`command_count` is under control of the vulkan application and can
become quite large. At a command count around 30000 the size of the
alloca() allocated buffers exceeds the default stack size of 16MB.

Fixes fixes segfaults in 'gtk:compare vulkan lots-of-offscreens-nogl*'
gtk 4 test cases which end up with a `command_count` around 32768.

Fixes: https://gitlab.freedesktop.org/asahi/mesa/-/issues/47
Signed-off-by: Janne Grunau <j@jannau.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33682>
2025-02-22 02:24:28 +00:00
Alyssa Rosenzweig
3e9f70570a asahi: fix cull distance with GS
no, I don't know how this worked before.

fixes KHR-GL46.cull_distance.functional with nir_opt_varyings changes but
this seemed to be passing just by luck otherwise.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33682>
2025-02-22 02:24:28 +00:00
Alyssa Rosenzweig
cc04a65828 asahi: fix libwrap.dylib
libwrap.dylib is helpful to trace control streams on macOS. When it was
originally implemented, we..

* supported macOS in our OpenGL driver and needed to actually exercise these
  interfaces
* didn't have Linux support or hypervisor support or anything so needed the
  traces to be utterly thorough
* only had a single macOS version to worry about

The landscape today is very different

* no macOS support in our driver stack
* we can trace registers via the hypervisor - libwrap.dylib is no longer
  "correctness" bearing, it's just a convenience tool
* what counts is the hardware side - tracing all the macOS software structs is
  not actually useful, the hypervisor is the right place to grab control regs
* piles of macOS versions, this code only ever worked properly on 11.x and 12.x,
  but with m4 r/e coming up soon we need a lot more versions working.

So... we keep around libwrap.dylib, but slim it down to only decode the bare
minimum of macOS versioned structures, just enough to grab the control stream
pointer and dump that. This is a loss of functionality around CRs (but we have the
hypervisor as a much better way to grab CRs). In exchange it makes the code much
more manageable and less likely to break every 6 months.

So in exchange for all this deletion we also get things working again, this time
on 13.x. But porting back to 12.x or 11.x would be a very small diffstat given
the reduced focus of the new code.

Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33682>
2025-02-22 02:24:28 +00:00
Alyssa Rosenzweig
07a2abd14d asahi: clang-format
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33682>
2025-02-22 02:24:28 +00:00
Paulo Zanoni
1d23cf192b brw: don't mark instructions read from text assembly as compacted
I dumped assembly generated by our driver with INTEL_DEBUG=shaders,
copied and pasted it into a lua file, tried to run it with
src/intel/executor, but the disassembler started telling me some
instructions were invalid.

This happened because we print the "compacted" flag in our assembly
text, so when brw_gram.y parses our assembly flag, it sees the
"compacted" flag and sets it to the instruction by calling
add_instruction_option(). But the executor tool never sets the
BRW_ASSEMBLE_COMPACT flag when it calls brw_assemble(), so when
brw_assemble() calls dump_assembly(), which calls brw_disassbemble(),
the disassembler gets confused and prints misinterpreted instructions
and calls them invalid.

It is not the job of brw_gram.y (our text assembly parser) to mark
instructions as compacted.  Whatever is later assembling the
instruction is the entity that should decide if the instructions are
compacted or not. So in this patch we just ignore this flag.

Reviewed-by: Caio Oliveira <caio.oliveira@intel.com>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33614>
2025-02-22 00:38:53 +00:00
Dave Airlie
c49423ca2c vulkan/wsi/x11: don't use update_region for damage if not created
If we don't have a region in the X no MIT-SHM case don't go using
the damage call set region.

Fixes: bbdf7e45b1 ("wsi/x11: Hook up KHR_incremental_present")
Reviewed-by: Adam Jackson <ajax@redhat.com>
Acked-by: Michel Dänzer <mdaenzer@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33592>
2025-02-21 21:41:58 +00:00
Valentine Burley
b331713f20 ci: Use new kernel that supports more Mediatek devices
The only change since the previous kernel is that the new one includes
the device tree blobs for the mt8195-cherry-tomato-r2 and
mt8186-corsola-steelix-sku131072 devices.

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33606>
2025-02-21 14:52:57 +00:00
Valentine Burley
c45d7dffca intel/ci: Update GuC firmware for ADL-S and ADL-N
Certain ADL devices, like nissa, use the tgl_guc_70.bin firmware.

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33606>
2025-02-21 14:52:56 +00:00
Valentine Burley
eb5bd3bee2 ci: Don't download the kernel image in lava_build.sh
The kernel+rootfs jobs previously downloaded the prebuilt kernel iamge,
but this was unnecessary as LAVA doesn't use them here, and the images
were never uploaded to S3. LAVA acquires the kernel in lava_submit.sh,
and baremetal downloads the required images and dtbs in baremetal_build.sh.

The kernel modules are still required for some devices.

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33606>
2025-02-21 14:52:56 +00:00
Valentine Burley
5b65bbf72c ci: Simplify downloading kernel for crosvm
Directly download the kernel instead of using the
download-prebuilt-kernel.sh script.
Save the kernel to /kernel for clarity, replacing the previous
/lava-files directory.

Signed-off-by: Valentine Burley <valentine.burley@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33606>
2025-02-21 14:52:56 +00:00
Mike Blumenkrantz
d979cd8d9d zink: support cl_gl_sharing if dmabuf is supported
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33652>
2025-02-21 14:18:44 +00:00
Mike Blumenkrantz
93cd4ae0c0 zink: verify that adding a dmabuf bind actually chooses a modifier
this at least provides some checking to catch cases where something
stupid happens and it does a fallback to linear

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33652>
2025-02-21 14:18:44 +00:00
Mike Blumenkrantz
5176370694 zink: handle buffer import/export
just noping out of some image codepaths

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33652>
2025-02-21 14:18:44 +00:00
Mike Blumenkrantz
f7002369fa zink: wait on tc fence before checking for fd semaphore
this forces sync with pending flushes

cc: mesa-stable

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33652>
2025-02-21 14:18:44 +00:00
Daniel Schürmann
df2697c9ab aco/scheduler: remove unused include of unordered_set
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33644>
2025-02-21 13:49:41 +00:00
Daniel Schürmann
93872270f0 aco/scheduler: keep track of RegisterDemand at DownwardsCursor::insert_idx{_clause}
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33644>
2025-02-21 13:49:41 +00:00
Daniel Schürmann
f58654e98f aco/scheduler: keep track of RegisterDemand at UpwardsCursor::insert_idx
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33644>
2025-02-21 13:49:41 +00:00
Daniel Schürmann
52253da783 aco: unify get_addr_sgpr_from_waves() and get_addr_vgpr_from_waves() into one function
which returns the limit as RegisterDemand.

Also remove the unused get_extra_sgprs() from aco_ir.h.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33644>
2025-02-21 13:49:41 +00:00
Daniel Schürmann
6ea9443726 aco/scheduler: stop rounding down the target number of waves on GFX10+
This way, it can make use of uneven wave numbers.

Totals from 4078 (5.14% of 79395) affected shaders: (Navi21)
MaxWaves: 58715 -> 65460 (+11.49%); split: +11.49%, -0.01%
Instrs: 5033684 -> 5048244 (+0.29%); split: -0.09%, +0.38%
CodeSize: 26833884 -> 26898780 (+0.24%); split: -0.07%, +0.32%
VGPRs: 302360 -> 265312 (-12.25%); split: -12.26%, +0.01%
Latency: 34636448 -> 36044242 (+4.06%); split: -0.08%, +4.14%
InvThroughput: 7999403 -> 7662697 (-4.21%); split: -4.55%, +0.34%
VClause: 105403 -> 111996 (+6.26%); split: -0.40%, +6.66%
SClause: 132996 -> 133460 (+0.35%); split: -0.81%, +1.16%
Copies: 297036 -> 308122 (+3.73%); split: -0.64%, +4.37%
Branches: 89376 -> 89390 (+0.02%); split: -0.00%, +0.02%
VALU: 3477621 -> 3488510 (+0.31%); split: -0.05%, +0.36%
SALU: 484211 -> 484191 (-0.00%); split: -0.08%, +0.08%

Totals from 1840 (2.32% of 79395) affected shaders: (Navi31)

MaxWaves: 30714 -> 34182 (+11.29%)
Instrs: 3102955 -> 3131001 (+0.90%); split: -0.05%, +0.95%
CodeSize: 16160564 -> 16273100 (+0.70%); split: -0.04%, +0.74%
VGPRs: 174540 -> 150600 (-13.72%)
Latency: 23521914 -> 24515055 (+4.22%); split: -0.07%, +4.29%
InvThroughput: 4373397 -> 4202912 (-3.90%); split: -4.40%, +0.50%
VClause: 59087 -> 64091 (+8.47%); split: -0.24%, +8.71%
SClause: 74844 -> 75366 (+0.70%); split: -0.53%, +1.22%
Copies: 184396 -> 197747 (+7.24%); split: -0.25%, +7.49%
Branches: 46015 -> 46028 (+0.03%); split: -0.00%, +0.03%
VALU: 1929286 -> 1942709 (+0.70%); split: -0.02%, +0.71%
SALU: 216126 -> 215983 (-0.07%); split: -0.18%, +0.12%
VOPD: 1216 -> 1217 (+0.08%); split: +1.40%, -1.32%
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33644>
2025-02-21 13:49:41 +00:00
Daniel Schürmann
676b39d31f aco/scheduler: always respect min_waves on GFX10+
It could theoretically happen that for large workgroups,
the scheduler used more registers than allowed.

No fossil changes.

Cc: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33644>
2025-02-21 13:49:40 +00:00
Collabora's Gfx CI Team
9befbf54a6 Uprev Piglit to 04d901e49de6b650f9dceaf73220371273d87f73
fc8179d319...04d901e49d

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33457>
2025-02-21 11:53:36 +00:00
Danylo Piliaiev
763ddd0fd3 nir/nir_lower_multiview: Don't assert if load_deref doesn't have var
If deref chain has nir_deref_type_cast nir_intrinsic_get_var will
return null, which is valid for e.g. shader inputs, since the pass
only care about outputs.

NIR excerpt that caused issues:

```
    32x3    %6 = deref_cast (block *)%5 (ubo block)  (ptr_stride=0, align_mul=0, align_offset=0)
    32x3    %7 = deref_struct &%6->field0 (ubo vec4[4])  // &((block *)%5)->field0
    32      %8 = load_const (0x00000001)
    32x3    %9 = deref_array &(*%7)[1] (ubo vec4)  // &((block *)%5)->field0[1]
    32x4   %10 = @load_deref (%9) (access=none)
```

Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33647>
2025-02-21 11:09:22 +00:00
Daniel Stone
4f11b8d950 ci/zink: Expand flake definition on radv
We've seen a few variants of this now, so just mark them all as flaky.

Signed-off-by: Daniel Stone <daniels@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33546>
2025-02-21 09:22:03 +00:00
Erik Faye-Lund
fde6aeb886 mesa/main: wire up glapi bits for EXT_multi_draw_indirect
Turns out we were missing the glapi bits, making it impossible to use get
the function pointers for this extension. Whoops?!

[daniels: Squashed in a618 SkQP fails, presumably caused by these not
          being skipped anymore.]

Fixes: 9f5af68995 ("mesa/main: expose `EXT_multi_draw_indirect`")
Reviewed-by: Antonino Maniscalco <antomani103@gmail.com>
Tested-by: Chris Healy <healych@amazon.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33546>
2025-02-21 09:22:03 +00:00
Emma Anholt
2f57cf0323 egl: Retire NV_post_sub_buffer support.
It's never been ported to DRI3, but nobody seems to care.  Since DRI2 is
untested at this point, just drop the code.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33517>
2025-02-21 02:50:56 +00:00
Emma Anholt
f6aa27a294 egl: Retire NOK_swap_region support.
It's never been ported to DRI3, but nobody seems to care.  Since DRI2 is
untested at this point, just drop the code.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33517>
2025-02-21 02:50:56 +00:00
Emma Anholt
58e73e792f egl: Apply autopep8.
My editor does this on save, so let's just apply it to EGL's python for
consistency.  The only exception is that the genCommon import needs the
sys.path.insert, so that part of autopep8 was reverted.

Reviewed-by: Yonggang Luo <luoyonggang@gmail.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33517>
2025-02-21 02:50:56 +00:00
Emma Anholt
34fe896715 docs: Drop some weird unhelpful text about DRI2.
Both instructions for building were the same, and there's not much sense
in calling out just xcb-dri2 out of all the deps there are.

Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33517>
2025-02-21 02:50:56 +00:00
Lorenzo Rossi
a3ddb223e2 nvk, nak: Implement shaderSharedInt64Atomics
Current nvidia devices miss support for 64-bit arithmetic atomics, we
replace them with compare-and-swap loops using nir_lower_atomics.

Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/10330
Signed-off-by: Lorenzo Rossi <snowycoder@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33572>
2025-02-21 00:33:17 +00:00
Lorenzo Rossi
26079c1a93 nir: support shared atomics in nir_lower_atomics
Add support to rewrite shared atomics into compare-and-swap loops,
previously the nir_lower_atomics pass only supported global and ssbo
atomics.

Only freedreno irc3 reuses nir_lower_atomics, this change does not
impact their usage since they do not support shared atomics.

Signed-off-by: Lorenzo Rossi <snowycoder@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33572>
2025-02-21 00:33:16 +00:00
Ian Romanick
15544ed858 nir/algebraic: Undistribute b2i from logic-ops
shader-db:
All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 16973309 -> 16973173 (<.01%)
instructions in affected programs: 13780 -> 13644 (-0.99%)
helped: 31 / HURT: 0

total cycles in shared programs: 915620550 -> 915618604 (<.01%)
cycles in affected programs: 185962 -> 184016 (-1.05%)
helped: 30 / HURT: 1

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 209748003 -> 209745278 (-0.00%)
Cycle count: 30514920400 -> 30514716506 (-0.00%); split: -0.00%, +0.00%
Max live registers: 65477183 -> 65477584 (+0.00%)
Non SSA regs after NIR: 237334710 -> 237333632 (-0.00%)

Totals from 1257 (0.18% of 706651) affected shaders:
Instrs: 693039 -> 690314 (-0.39%)
Cycle count: 39792504 -> 39588610 (-0.51%); split: -0.97%, +0.46%
Max live registers: 194170 -> 194571 (+0.21%)
Non SSA regs after NIR: 821978 -> 820900 (-0.13%)

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33648>
2025-02-21 00:01:11 +00:00
Ian Romanick
a48a044cf6 nir/algebraic: Simplify equality comparisons of b2T with 1 or 0
Adding the b2i(a) == 1 and b2i(a) != 1 patterns also helps prevent
regressions when spurious negations are removed from integer equality
comparisons, as is done in !33498.

v2: Make all variables part of the iteration instead of calculating some
of them. Suggested by Alyssa.

shader-db:

All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 16973331 -> 16973309 (<.01%)
instructions in affected programs: 266 -> 244 (-8.27%)
helped: 2 / HURT: 0

total cycles in shared programs: 915620774 -> 915620550 (<.01%)
cycles in affected programs: 4360 -> 4136 (-5.14%)
helped: 2 / HURT: 0

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 209748011 -> 209748003 (-0.00%)
Cycle count: 30514920286 -> 30514920400 (+0.00%); split: -0.00%, +0.00%
Non SSA regs after NIR: 237334726 -> 237334710 (-0.00%)

Totals from 8 (0.00% of 706651) affected shaders:
Instrs: 16956 -> 16948 (-0.05%)
Cycle count: 261052 -> 261166 (+0.04%); split: -0.92%, +0.96%
Non SSA regs after NIR: 20000 -> 19984 (-0.08%)

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33648>
2025-02-21 00:01:11 +00:00
Ian Romanick
3f39d8f4ff nir/algebraic: Optimize zero comparisons of umax or umin
I observered some of the existing patterns stopped being applied after
some of the ult-to-ieq optimizations in !33498. It turns out that these
patterns occur even without those changes.

shader-db:

All Intel platforms had similar results. (Lunar Lake shown)
total instructions in shared programs: 16973339 -> 16973331 (<.01%)
instructions in affected programs: 7977 -> 7969 (-0.10%)
helped: 2 / HURT: 0

total cycles in shared programs: 915620938 -> 915620774 (<.01%)
cycles in affected programs: 136022 -> 135858 (-0.12%)
helped: 2 / HURT: 0

fossil-db:

Lunar Lake
Totals:
Instrs: 209748173 -> 209748011 (-0.00%); split: -0.00%, +0.00%
Cycle count: 30514361348 -> 30514920286 (+0.00%); split: -0.00%, +0.00%
Spill count: 511813 -> 511808 (-0.00%)
Fill count: 622537 -> 622533 (-0.00%)
Max live registers: 65477033 -> 65477183 (+0.00%); split: -0.00%, +0.00%
Non SSA regs after NIR: 237334728 -> 237334726 (-0.00%); split: -0.00%, +0.00%

Totals from 26 (0.00% of 706651) affected shaders:
Instrs: 332073 -> 331911 (-0.05%); split: -0.05%, +0.00%
Cycle count: 959758560 -> 960317498 (+0.06%); split: -0.03%, +0.09%
Spill count: 10293 -> 10288 (-0.05%)
Fill count: 23784 -> 23780 (-0.02%)
Max live registers: 9682 -> 9832 (+1.55%); split: -0.08%, +1.63%
Non SSA regs after NIR: 232135 -> 232133 (-0.00%); split: -0.03%, +0.03%

Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 233538532 -> 233536113 (-0.00%); split: -0.00%, +0.00%
Cycle count: 24428142259 -> 24426705655 (-0.01%); split: -0.01%, +0.00%
Spill count: 513128 -> 512923 (-0.04%)
Fill count: 557329 -> 557108 (-0.04%)
Max live registers: 42129806 -> 42129881 (+0.00%); split: -0.00%, +0.00%
Non SSA regs after NIR: 256711720 -> 256711718 (-0.00%); split: -0.00%, +0.00%

Totals from 26 (0.00% of 805759) affected shaders:
Instrs: 325629 -> 323210 (-0.74%); split: -0.74%, +0.00%
Cycle count: 893896782 -> 892460178 (-0.16%); split: -0.21%, +0.05%
Spill count: 10467 -> 10262 (-1.96%)
Fill count: 24291 -> 24070 (-0.91%)
Max live registers: 4946 -> 5021 (+1.52%); split: -0.08%, +1.60%
Non SSA regs after NIR: 232980 -> 232978 (-0.00%); split: -0.03%, +0.03%

Tiger Lake, Ice Lake, and Skylake had similar results. (Tiger Lake shown)
Totals:
Instrs: 237289818 -> 237289714 (-0.00%); split: -0.00%, +0.00%
Cycle count: 22959586058 -> 22960049302 (+0.00%); split: -0.00%, +0.00%
Max live registers: 42182257 -> 42182337 (+0.00%)
Non SSA regs after NIR: 255579974 -> 255579970 (-0.00%); split: -0.00%, +0.00%

Totals from 23 (0.00% of 802019) affected shaders:
Instrs: 27051 -> 26947 (-0.38%); split: -0.39%, +0.01%
Cycle count: 10545917 -> 11009161 (+4.39%); split: -0.09%, +4.49%
Max live registers: 2198 -> 2278 (+3.64%)
Non SSA regs after NIR: 31741 -> 31737 (-0.01%); split: -0.20%, +0.19%

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33648>
2025-02-21 00:01:11 +00:00
Ian Romanick
4311121e73 nir/algebraic: More (a == 0 || a == 1 || ...) patterns
At least some Total War: Warhammer3 vertex shaders associate the
comparisons differntly, so the existing patterns were not triggered.

No shader-db changes on any Intel platform.

fossil-db:

All Intel platforms had similar results. (Lunar Lake shown)
Totals:
Instrs: 209748654 -> 209748173 (-0.00%)
Cycle count: 30514333964 -> 30514361348 (+0.00%); split: -0.00%, +0.00%
Fill count: 622688 -> 622537 (-0.02%)
Max live registers: 65477039 -> 65477033 (-0.00%)
Non SSA regs after NIR: 237334768 -> 237334728 (-0.00%)

Totals from 512 (0.07% of 706651) affected shaders:
Instrs: 1000693 -> 1000212 (-0.05%)
Cycle count: 42174312 -> 42201696 (+0.06%); split: -0.15%, +0.21%
Fill count: 11456 -> 11305 (-1.32%)
Max live registers: 121599 -> 121593 (-0.00%)
Non SSA regs after NIR: 1253445 -> 1253405 (-0.00%)

Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33648>
2025-02-21 00:01:11 +00:00
Eric R. Smith
414dba9f5c panfrost: use an accessor function to read from bi_opcode_props
Use an accessor function to read opcode properties or to change the
opcode. This would allow for different instruction descriptions to
be used for different architectures. Not necessary now, but may
be useful groundwork.

Reviewed-by: Erik Faye-Lund <erik.faye-lund@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29765>
2025-02-20 23:33:00 +00:00