Otherwise, a noop FS will be always compiled during linking if not
provided by the application and that is too slow for fast-linking.
This should be improved to use a global noop FS but it's really tricky
because NIR linking doesn't do anything when the next stage is unknown,
and hence doesn't remove unused varyings.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21042>
compute programs can be reused across contexts, which means storing any
pointers directly like this is going to lead to desync and crash
instead, make this like regular descriptor templates and calculate the offset
from the current context to ensure that everything works as it should
fixes#8201
Fixes: 7ab5c5d36d ("zink: use EXT_descriptor_buffer with ZINK_DESCRIPTORS=db")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21020>
So we're not tied to the macOS or Linux UAPIs and are not translating awkwardly
from one to the other when creating BOs. They're not quite equivalent -- macOS
doesn't include writeback information in this flag field, and Linux doesn't have
a executable flag. (Maybe we should add one, though? Then we can enforce W^X.)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21058>
On sway+xwayland, both explicit and implicit modifiers are advertised.
While dri3proto says nothing about it, zwp_linux_dmabuf_v1 says
A compositor that sends valid modifiers and DRM_FORMAT_MOD_INVALID for
a given format supports both explicit modifiers and implicit
modifiers.
"glmark2 -b build:model=bunny --fullscreen" goes from 468 to 598fps on
a618 @ 2160x1440.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20892>
We lower NIR's load_constant to load_global_constant, which uses A64
bindless messages. As such, we do the following math to produce the
address for each load:
base_lo@32 <- BRW_SHADER_RELOC_CONST_DATA_ADDR_LOW
base_hi@32 <- BRW_SHADER_RELOC_CONST_DATA_ADDR_HIGH
base@64 <- pack_64_2x32_split(base_lo, base_hi)
addr@64 <- iadd(base@64, u2u64(offset@32))
On platforms that emulate 64-bit math, we have to emit additional code
for the 64-bit iadd to handle the possibility of a carry happening and
affecting the top bits.
However, NIR constant data is always uploaded adjacent to the shader
assembly, in the same buffer. These buffers are required to live in a
4GB region of memory starting at Instruction State Base Address. We
always place the base address at a 4GB address. So the constant data
always lives in a buffer entirely contained within a 4GB region, which
means any offsets from the start of the buffer cannot possibly affect
the high bits.
So instead, we can simply do a 32-bit addition between the low bits of
the base and the offset, then pack that with the unchanged high bits.
On anv, INSTRUCTION_STATE_POOL_MIN_ADDRESS is 8GB, so the high bits are
always 0x2. We don't even need to patch that portion of the address and
can just use an immediate value. We do still need to pack, however.
fossil-db on Icelake indicates the following for affected shaders:
Instrs: 10830023 -> 10750080 (-0.74%)
Cycles: 1048521282 -> 1046770379 (-0.17%); split: -0.33%, +0.16%
Subgroup size: 103104 -> 103112 (+0.01%)
Send messages: 570886 -> 570760 (-0.02%)
Loop count: 14428 -> 14429 (+0.01%)
Spill count: 14246 -> 14244 (-0.01%); split: -0.06%, +0.04%
Fill count: 22802 -> 22794 (-0.04%); split: -0.04%, +0.01%
Scratch Memory Size: 654336 -> 662528 (+1.25%)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20999>
We lower NIR's load_constant to load_global_constant, which uses A64
bindless messages. As such, we do the following math to produce the
address for each load:
base_lo@32 <- BRW_SHADER_RELOC_CONST_DATA_ADDR_LOW
base_hi@32 <- BRW_SHADER_RELOC_CONST_DATA_ADDR_HIGH
base@64 <- pack_64_2x32_split(base_lo, base_hi)
addr@64 <- iadd(base@64, u2u64(offset@32))
On platforms that emulate 64-bit math, we have to emit additional code
for the 64-bit iadd to handle the possibility of a carry happening and
affecting the top bits.
However, NIR constant data is always uploaded adjacent to the shader
assembly, in the same buffer. These buffers are required to live in a
4GB region of memory starting at Instruction State Base Address. We
always place the base address at a 4GB address. So the constant data
always lives in a buffer entirely contained within a 4GB region, which
means any offsets from the start of the buffer cannot possibly affect
the high bits.
So instead, we can simply do a 32-bit addition between the low bits of
the base and the offset, then pack that with the unchanged high bits.
On iris, IRIS_MEMZONE_SHADER is at [0, 4GB) so the high bits are always
zero. We don't even need to patch that portion of the address and can
simply use u2u64 to promote the 32-bit add result to a 64-bit value
where the top bits are 0.
shader-db on Icelake indicates that this:
- Helps instructions: -1.13% in 135 affected programs
- Helps spills/fills: -4.08% / -4.18% in 4 affected programs
- Gains us 1 SIMD16 compute shader instead of SIMD8
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20999>
They got accidentally disabled entirely, so they didn't block merge, but
once they re-enable then they'll block us again. The problem was that I
moved allow_failure to a .performance-rules section, but we only ever
inherit the rules from that location, not the rest of yml.
This is basically a revert of 67547a04b6 ("ci: Move the performance
jobs' allow_failure:true to the gl rules."), though I still keep the
allow_failure in a more common location with comments, since perf jobs are
a huge trap.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21002>
Complicated CFG and lots of SALU can cause this to take an extremely long
time to finish.
Fixes
dEQP-VK.graphicsfuzz.cov-value-tracking-selection-dag-negation-clamp-loop
and Monster Hunter Rise demo compile times.
fossil-db (gfx1100):
Totals from 57 (0.04% of 134574) affected shaders:
Instrs: 170919 -> 171165 (+0.14%)
CodeSize: 860144 -> 861128 (+0.11%)
Latency: 961466 -> 961505 (+0.00%)
InvThroughput: 127598 -> 127608 (+0.01%)
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8153
Fixes: 5806f0246f ("aco/gfx11: workaround VALUPartialForwardingHazard")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20941>
u_vector_add() don't keep the returned pointers valid.
After the initial size allocated in u_vector_init() is reached it will
allocate a bigger buffer and copy data from older buffer to the new
one and free the old buffer, making all the previous pointers returned
by u_vector_add() invalid and crashing the application when trying to
access it.
This is reproduced when running
dEQP-VK.synchronization.signal_order.timeline_semaphore.* in DG2 SKUs
that has 4 CCS engines, INTEL_COMPUTE_CLASS=1 is set and of course
perfetto build is enabled.
To fix this issue here I'm moving the storage/allocation of
struct intel_ds_queue to struct anv_queue/iris_batch and using
struct list_head to maintain a chain of intel_ds_queue of the
intel_ds_device.
This allows us to append or remove queues dynamically in future if
necessary.
Fixes: e760c5b37b ("anv: add perfetto source")
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Signed-off-by: José Roberto de Souza <jose.souza@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/20977>