On AMD, this is a clear win. 2.0 is a free constant,
the multiplication can be fused into fma, or it can
be done as a free output modifier. Otherwise, fmul
and fadd have the same throughput/latency, with the only
possible downside being potentially power consumption.
For other hardware this might not be as clear,
but we should at least choose one option for NIR because
it allows more CSE.
Foz-DB Navi21:
Totals from 12231 (15.41% of 79395) affected shaders:
MaxWaves: 309068 -> 309032 (-0.01%)
Instrs: 11826395 -> 11790132 (-0.31%); split: -0.31%, +0.00%
CodeSize: 63531496 -> 63512868 (-0.03%); split: -0.03%, +0.00%
VGPRs: 551256 -> 551328 (+0.01%); split: -0.00%, +0.02%
SpillSGPRs: 984 -> 979 (-0.51%)
Latency: 88486492 -> 88394296 (-0.10%); split: -0.11%, +0.01%
InvThroughput: 22360595 -> 22300790 (-0.27%); split: -0.27%, +0.00%
VClause: 226267 -> 226273 (+0.00%); split: -0.01%, +0.01%
SClause: 293820 -> 293783 (-0.01%); split: -0.02%, +0.00%
Copies: 727187 -> 727106 (-0.01%); split: -0.03%, +0.02%
PreSGPRs: 539623 -> 539625 (+0.00%)
PreVGPRs: 440843 -> 440946 (+0.02%); split: -0.00%, +0.03%
VALU: 8324962 -> 8288809 (-0.43%); split: -0.43%, +0.00%
SALU: 1278550 -> 1278538 (-0.00%); split: -0.00%, +0.00%
VMEM: 440600 -> 440599 (-0.00%)
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32989>
The drm device file is closed and re-opened between perfetto traces. So
we either need to re-create the fd_device, or use fd_device_new_dup() so
we hold on to our own fd. The former is preferred so that the kernel
can realize when we are no longer reading the perfcntrs.
Signed-off-by: Rob Clark <robdclark@chromium.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33073>
Requests with no reply will report errors by default to the event
loop, which then usually cause the not very useful log like this
to be printed:
X Error of failed request: BadAlloc (insufficient resources for operation)
Major opcode of failed request: 149 ()
Minor opcode of failed request: 2
Serial number of failed request: 33
Current serial number in output stream: 34
This commit introduce some helpers to handle the xcb errors in Mesa,
and be able to report errors properly.
For instance the same error will now log:
MESA: error: dri3_alloc_render_buffer:1634 xcb_dri3_pixmap_from_buffer[s] failed
MESA: error: X error: 11
It's not fixing the underlying issue, but at least now tests like
"glx-visuals-stencil -pixmap" and "glx-visuals-depth pixmap" fail
properly.
cc: mesa-stable
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Eric Engestrom <eric@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33036>
This allows us to drop the reboot condition on SaharaMode which means
that every time the reboot pattern is hit, it is due to a test
execution issue (GPU hang) and not something unrelated to the job.
Additionally, this allows us to try booting up to 5 times which should
help boot reliability.
Signed-off-by: Martin Roukala (né Peres) <martin.roukala@mupuf.org>
Reviewed-by: Eric Engestrom <None>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32572>
Indeed, has_streamout is not yet properly initialized at the time
of the call of r600_init_screen_caps(). This change fixes this
issue.
Here is the issue visible on palm at the glxinfo level; the right column is affected:
Preferred profile: core (0x1) | Preferred profile: compat (0x2)
Max core profile version: 4.5 | Max core profile version: 0.0
Max compat profile version: 4.5 | Max compat profile version: 2.1
Max GLES1 profile version: 1.1 Max GLES1 profile version: 1.1
Max GLES[23] profile version: 3.1 | Max GLES[23] profile version: 2.0
Fixes: 7cd606f01b ("r600: add r600_init_screen_caps")
Signed-off-by: Patrick Lerda <patrick9876@free.fr>
Reviewed-by: Qiang Yu <yuq825@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33106>
It's not rare when only last few draws in a big renderpass disable
LRZ, we shouldn't bail out in such case.
If LRZ is disabled in dir tracking bit during binning - LRZ would
be disabled for the whole IB in the tiling step, so we should avoid
disabling via dir tracking bit and track the state inside the driver.
This doesn't work with secondary command buffers (and renderpass
resume/suspend), in such cases we have to disable LRZ via dir tracking
bit, if LRZ is not valid.
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32868>
If blend LRZ state was already calculated from static info,
re-calculating it with dynamic state would bring stale values
and therefor result in a wrong calculations.
This resulted in LRZ being disabled when it should have not in
native VK titles.
CC: mesa-stable
Signed-off-by: Danylo Piliaiev <dpiliaiev@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32868>
Currently, when allocating the dst of shared collects, the registers of
its srcs would only be reused if they are killed. This results in a lot
of needless moves due to suboptimal register allocation.
This commit addresses this generically: allow unavailable registers to
be used for a new dst iff it shares a merge set with, and has the same
offset as, the currently live value.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32021>
Detecting non-trivial collects after the fact, i.e., once they are
created by a register assignment of their dst, does not work as this may
cause two different intervals to share a physreg. For example:
_meta:collect sssa_361:150(r50.x) (wrmask=0xf),
sssa_19:12(r50.x), sssa_103:13(r50.y),
sssa_355:102(r51.z), sssa_356:103(r51.w)
This is a non-trivial collect with a partial overlap with one of its
child intervals. After moving its dst to a new interval, it will have
the same physreg as the existing interval for sssa_19, causing all sorts
of trouble for RA.
Prevent this by detecting that a future collect may become non-trivial
at the moment one of its sources gets a register assignment that does
not correspond with it merge set's preferred reg and allocating a new
interval for this component.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Fixes: b36a7ce0f1 ("ir3/ra: prevent moving source intervals for shared collects")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32021>
The userq packets are added using _pkt_begin(), _pkt_add(), _pkt_end()
functions. As of now _pkt_being() and _pkt_add() is called once. It
is not advisible to update wptr value in mqd multiple times. Hence use
next_wptr as cache in the macros and update mqd mptr before job submission
only once.
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32700>
The signal ioctl should only be called after guaranteeing that the hardware
started working on the submissions and that is only after doorbell is ringed.
Otherwise it can in theory happen that the application creates the fence and
is then interrupted before ringing the doorbell. That can result in a GPU
reset because the fence times out.
Suggested-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/32700>
Determine whether the device has hardware raytracing support early, and
then use this result where needed, instead of checking for `gfx_level`
every time.
This is a prerequisite for CYAN_SKILLFISH chip enablement. This chip is
still GFX10, not GFX10_3, but has hardware support for accelerated
`image_bvh{,64}_intersect_ray` instructions. Just checking for `gfx_level`
is insufficient for it.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/33109>