Previously, there was some mixing up of alignments between the alignment
provided by the caller, and the minimum alignment we have (4KiB). Additionally,
there was some redundant aligning being done to data already passed in aligned.
This didn't matter because we were always using 4K pages anyways due to kernel
limitations. However, this now needs fixing to allow for larger page support.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38702>
Previously, for imports we wouldn't carry over the PTE kind with the import,
which worked fine up till now. However, compression depends on the PTE kind
being correct otherwise there will be a mismatch between both sides.
The GEM info object we get from the kernel already has the PTE kind embedded in
the tile flags object, so all we have to do is retrieve it and store in the bo
object, and then the lower layers can retrieve the kind from the bo directly.
Reviewed-by: Mel Henning <mhenning@darkrefraction.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38702>
In bccb9fe091 ("nvk/nvkmd: nouveau uses the OS page size"),
the alignment size was narrowed to the OS page size in
nvkmd_nouveau_alloc_tiled_mem. This makes the same change
for nvk_AllocateMemory.
This is being done in preparation for large page support, which will
be more picky about alignments.
Reviewed-by: Mohamed Ahmed <mohamedahmedegypt2001@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38702>
Currently, GPU state is reset immediately after each flush and during
context creation, even when the next command might be a simple BLT/RS
operation that doesn't require the full GPU rendering pipeline.
This patch introduces lazy GPU state reset by:
- Adding a needs_gpu_state_reset flag to track when reset is needed
- Setting the flag to true after flush instead of immediately resetting
- Only performing the actual reset in etna_draw_vbo() when rendering
Signed-off-by: Christian Gmeiner <cgmeiner@igalia.com>
Acked-by: Lucas Stach <l.stach@pengutronix.de>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/36565>
To implement sparse partially-resident images, we need to be able
to express mapping in terms of rectangles of texel blocks.
With row_align_B we can constrain the rows of ordered blocks to
start at mapping boundary (i.e. page size) and using array_align_B
we can ensure that each subresource starts at a multiple of
whatever sparse block size we decide to use.
Not setting each of these fields is the same as setting them to 1.
Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37483>
The staging buffer is persistent until the destruction of the pvr_pipeline
object, so we should set the allocation scope to PVR_ALLOC_SCOPE_OBJECT instead
of PVR_ALLOC_SCOPE_COMMAND.
Also did the same change in the function pvr_pds_coeff_program_create_and_upload
for the staging buffer, because that buffer is also destroyed at pipeline destruction.
Fixes dEQP-VK.api.object_management.single_alloc_callbacks.graphics_pipeline.
Signed-off-by: Leon Perianu <leon.perianu@imgtec.com>
Reviewed-by: Karmjit Mahil <karmjit.mahil@igalia.com>
Tested-by: Icenowy Zheng <uwu@icenowy.me>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38662>
We want to add some disassembly options in the future. Add new
ir3_shader_disasm_options function that takes options from a new
ir3_disasm_options struct in which we can add options later. The
original ir3_shader_disasm becomes a wrapper for the new function to not
have to update all call sites now.
Signed-off-by: Job Noorman <jnoorman@igalia.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37595>
All things related to selecting the position when no sample is covered
isn't actually dependent on fragment shader loop iteration, in fact
it's not even dependent on the shader invocation, only the sample mask
(which is from jit context, not from shader key, otherwise could just
precalculate all of it). And certainly there's no need for all the extra
per-sample selects.
Just calculate it once at interpolation context init. LLVM should be able
to easily toss out (as with the previous version) all extra code done at
interpolation init if centroid interpolation isn't actually used.
(Although the code didn't turn out as simple as I hoped...)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38664>
D3D11 is pretty strict about how to do centroid interpolation.
In particular, llvmpipe didn't honor these rules when no sample was
covered for a pixel (relevant for helper pixels), in this case llvmpipe
selected the position of the sample with the highest index (just due to
initialization, not really by choice).
Given that helper pixels are only really used for derivative calculations,
and derivatives are generally sketchy with centroid interpolation, this
seems quite a lot of work, but I suppose it could be useful if the state
sample mask has only 1 sample set (since these d3d11 rules then guarantee
that even with centroid the derivatives are actually useful as the
interpolation will be done at the position defined by the sample specified
in the sample mask, regardless if that sample is covered by the primitive
or not).
Other APIs might technically not need this (they tend to not even define
at which position centroid interpolation is done, other than it must be
inside the primitive), but it shouldn't really hurt them neither.
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38664>
This fixes a real issue when ESO uses fbfetch output because this
was determined after instead of before.
This solution isn't the most elegant one but binding graphics shaders
earlier would require more work. Let's just handle this specific corner
case for now.
This fixes
dEQP-VK.renderpasses.dynamic_rendering.primary_cmd_buff.custom_resolve.shader_objects.fragment_region*
on some GPUs.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/38617>
When the register allocator decides to spill a value, all reads of that
value are filled. This can result in cases where the same value is
filled many times in a single block. In those cases, the result of an
earlier fill may still be available when a later fill occurs.
This optimization replaces the later fill with a move from the result of
the earlier fill.
v2: Use FIXED_GRF for register overlap tests. Since this is after
register allocation, the VGRF values will not tell the whole truth.
v3: Use brw_transform_inst. Suggested by Caio. Add
brw_scratch_inst::offset instead of storing it as a source. Suggested by
Lionel.
v4: In intervening spill to the same location also invalidates the
value. 🤦
v5: Don't eliminate a fill if its destination partially overlaps the
preceeding fill destination. Fixes failures in cooperative matrix CTS.
shader-db:
Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
total instructions in shared programs: 17249903 -> 17249653 (<.01%)
instructions in affected programs: 35550 -> 35300 (-0.70%)
helped: 20 / HURT: 0
total cycles in shared programs: 893092398 -> 893101836 (<.01%)
cycles in affected programs: 2501720 -> 2511158 (0.38%)
helped: 6 / HURT: 14
total fills in shared programs: 1901 -> 1776 (-6.58%)
fills in affected programs: 1757 -> 1632 (-7.11%)
helped: 20 / HURT: 0
fossil-db:
Lunar Lake, Meteor Lake, and DG2 had similar results. (Lunar Lake shown)
Totals:
Instrs: 929949528 -> 926770338 (-0.34%)
Cycle count: 105126671329 -> 104851299099 (-0.26%); split: -0.28%, +0.02%
Fill count: 6520785 -> 5021518 (-22.99%)
Totals from 54281 (2.69% of 2018922) affected shaders:
Instrs: 239616289 -> 236437099 (-1.33%)
Cycle count: 22051883404 -> 21776511174 (-1.25%); split: -1.33%, +0.08%
Fill count: 6406295 -> 4907028 (-23.40%)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37827>
When the register allocator decides to spill a value, all writes to that
value are spilled and all reads are filled. In regions where there is
not high register pressure, a spill of a value may be followed by a fill
of that same file while the spilled register is still live. This
optimization pass finds these cases, and it converts the fill to a move
from the still-live register.
The restriction that the spill and the fill must have matching NoMask
really hampers this optimization. With the restriction removed, the pass
was more than 2x helpful.
v2: Require force_writemask_all to be the same for the spill and the fill.
v3: Use FIXED_GRF for register overlap tests. Since this is after
register allocation, the VGRF values will not tell the whole truth.
v4: Use brw_transform_inst. Suggested by Caio. The allows two of the
loops to be merged. Add brw_scratch_inst::offset instead of storing it
as a source. Suggested by Lionel.
v5: Add no-fill-opt debug option to disable optimizations. Suggested by
Lionel.
v6: Move a calculation outside a loop. Suggested by Lionel.
v7: Check that spill ranges overlap instead of just checking initial
offset. Zero shaders in fossil-db were affected, but some CTS with
spill_fs were fixed (e.g.,
dEQP-VK.subgroups.arithmetic.compute.subgroupmin_uint64_t_requiredsubgroupsize).
Suggested by Lionel.
v8: Add DEBUG_NO_FILL_OPT to debug_bits in
brw_get_compiler_config_value(). Noticed by Lionel.
shader-db:
Lunar Lake
total instructions in shared programs: 17249907 -> 17249903 (<.01%)
instructions in affected programs: 10684 -> 10680 (-0.04%)
helped: 2 / HURT: 0
total cycles in shared programs: 893092630 -> 893092398 (<.01%)
cycles in affected programs: 237320 -> 237088 (-0.10%)
helped: 2 / HURT: 0
total fills in shared programs: 1903 -> 1901 (-0.11%)
fills in affected programs: 110 -> 108 (-1.82%)
helped: 2 / HURT: 0
Meteor Lake and DG2 had similar results. (Meteor Lake shown)
total instructions in shared programs: 19968898 -> 19968778 (<.01%)
instructions in affected programs: 33020 -> 32900 (-0.36%)
helped: 10 / HURT: 0
total cycles in shared programs: 885157211 -> 884925015 (-0.03%)
cycles in affected programs: 39944544 -> 39712348 (-0.58%)
helped: 8 / HURT: 2
total fills in shared programs: 4454 -> 4394 (-1.35%)
fills in affected programs: 2678 -> 2618 (-2.24%)
helped: 10 / HURT: 0
fossil-db:
Lunar Lake
Totals:
Instrs: 930445228 -> 929949528 (-0.05%)
Cycle count: 105195579417 -> 105126671329 (-0.07%); split: -0.07%, +0.00%
Spill count: 3495279 -> 3494400 (-0.03%)
Fill count: 6767063 -> 6520785 (-3.64%)
Totals from 43844 (2.17% of 2018922) affected shaders:
Instrs: 212614840 -> 212119140 (-0.23%)
Cycle count: 19151130510 -> 19082222422 (-0.36%); split: -0.39%, +0.03%
Spill count: 2831100 -> 2830221 (-0.03%)
Fill count: 6128316 -> 5882038 (-4.02%)
Meteor Lake and DG2 had similar results. (Meteor Lake shown)
Totals:
Instrs: 1001375893 -> 1001113407 (-0.03%)
Cycle count: 92746180943 -> 92679877883 (-0.07%); split: -0.08%, +0.01%
Spill count: 3729157 -> 3728585 (-0.02%)
Fill count: 6697296 -> 6566874 (-1.95%)
Totals from 35062 (1.53% of 2284674) affected shaders:
Instrs: 179819265 -> 179556779 (-0.15%)
Cycle count: 18111194752 -> 18044891692 (-0.37%); split: -0.41%, +0.04%
Spill count: 2453752 -> 2453180 (-0.02%)
Fill count: 5279259 -> 5148837 (-2.47%)
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37827>
These opcodes are emitted during register allocation instead of the
scratch reads and writes that were previously emitted. These
instructions contain additional information (i.e., the instruction
encodes the scratch offset) that enable optimizations to be added
later.
The fill and spill opcodes are lowered to scratch reads and writes
shortly after register allocation. Eventually this lower may have some
optimizations (e.g., reuse previous address calculations for successive
spills).
v2: Add brw_scratch_inst::offset instead of storing it as a
source. Suggested by Lionel.
Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/37827>