Copied from radeonsi.
Putting in the correct metadata flush commands for eventually not
flushing L2 on CB/DB switch.
Does not remove the need for V_028A90_CACHE_FLUSH_AND_INV_TS_EVENT
at the moment.
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
When rasterization is disabled we can have that few.
Fixes: 76603aa90b "radv: Drop the default viewport when 0 viewports are given."
Reviewed-by: Dave Airlie <airlied@redhat.com>
Overall it does not really help or hurt. The deferred demo gets 1%
improvement and some games a 3% decrease, so I don't think this
should be enabled by default.
But with the code upstream it is easier to experiment with it.
v2: Remove initializing the registers from si_emit_config.
Reviewed-by: Dave Airlie <airlied@redhat.com>
amdvlk is probably more subtle than this but it never uses
the inv cb/db variants, we fail some CTS tests without this.
Fixes:
dEQP-VK.renderpass.dedicated_allocation.formats.d32_sfloat_s8_uint.input*.
Fixes: c2fbeb7ca0 (radv: add GFX9 cache flushing support.)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> (for now :-)
Signed-off-by: Dave Airlie <airlied@redhat.com>
Fix a bunch of labels indicating when registers were added/removed
and normalize the SI-class GRBM_GFX_INDEX.
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
This uses the new kernel interfaces for reduced cs overhead,
We only set the local flag for memory allocations that don't have
a dedicated allocation and ones that aren't imports.
v2: add to all the internal buffer creation paths.
v3: missed some command submission paths, handle 0/empty bo lists.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This patch helps lower high priority compute latency. Found by
bisecting a perf regression on computeparticles with high priority
compute queues enabled.
Reverting this micro-optimization doesn't seem to have any negative
effect on performance on Dota2 or ssao.
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
When WAVE_LIMIT is set, a submission will opt-in for SPI based resource
scheduling. Because this mechanism is cooperative, we must ensure that
all submissions have this field set, otherwise they will bypass resource
arbitration.
We always hardcode the field to its maximum value, instead of attempting
to calculate an approximate usage. In testing, there were no benefits to
using anything other than the maximum.
Signed-off-by: Andres Rodriguez <andresx7@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Looking at shader traces I noticed some registers were missing,
one of them was being eaten by the wrong clear state length.
Fixes: 4f42ea4dc (radv: use CLEAR_STATE for initializing some registers)
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>
These registers don't change during the lifetime of the
command buffer, there is no need to re-emit them when
binding a new pipeline.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
We are really not going to use a winsys which does not need to store
the va, so might as well store it in a standard field.
Not sure this helps perf much though, as most of the cost is in the
cache miss accessing the bo anyway, which we stil need to do.
Reviewed-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
This moves a bunch of non-draw dependent calcs into the pipeline code,
to reduce CPU overheads in the draw path.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This realigns this code with the radeonsi version and fixes
the indirect case to work properly.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reduce size of radv_pipeline.c and improve code isolation. More
code can probably moved but it's a start.
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
When I added gfx9 I did it wrong, this fixes it.
Fixes: 5247b311e9 "radv/gfx9: fix set predication packet."
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
When using DCC some clear values don't require a cmask eliminate
step. This patch adds support for black and black with alpha 1,
there are other values, but I don't have access to a comprehensive list.
This works by setting the cmask eliminate predicate when doing the
fast clear, and later when doing the cmask elimination making sure
the draws are predicated.
This increases the fps on Sascha Willems deferred.
Tonga: 580fps->670fps on a Tonga PRO card.
Polaris 730->850fps
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This doesn't get used yet, it just adds support to various PKT3
emissions to enable it later.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
The register header (and radeonsi comment) states V_411_SRC_ADDR_TC_L2
is for CIK+ only, so let's assert on earlier ASICs.
Signed-off-by: Grazvydas Ignotas <notasas@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
This seems to matter here in a profile, without this we spend a lot
more time exiting this function with no flush bits.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
No sense checking each bit separately in the common case of none
being set.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
This adds some rb+ support, as on GFX9 we have to disable
it as per radeonsi.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
GFX9 needs to write event EOP to a fence buffer, allocate some
space for this, and just write an ever increasing number to it,
this isn't exactly what radeonsi does, but it seems to work.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just adds support for initialising some GFX9 registers,
and handles the different init for the VGT reuse reg.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This adds support to the CP dma code for GFX9, ported from
radeonsi.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just adds the strings and includes the gfx9 register defs
in some files that we need them in.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This reworks this code to be like radeonsi, which will make it
easier to add GFX9 support to it in the future.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
In prep for GFX9 refactor some of the eop event writing code
out.
This changes behaviour, but aligns with what radeonsi does,
it does double emits on CIK/VI, whereas previously it only
did this on CIK.
v2: bump the size checks.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>
This just adds the chip in the right places.
We don't set the partial_vs_wave workaround, as radeonsi
doesn't, but have to confirm it's not required.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Cc: "17.1" <mesa-stable@lists.freedesktop.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>
Gives me approximately a 2% perf increase in bot dota2 & talos.
Having descriptors (both sets and vertex buffers) prefetched
didn't help so I didn't include that.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Reviewed-by: Dave Airlie <airlied@redhat.com>
We want the guardband_x/y to be the largerst scalars such that each
viewport scaled by that amount is still a subrange of [-32767, 32767].
The old code has a couple of issues:
1) It used scissor instead of viewport_scissor, potentially taking into
account a viewport that is too small and therefore selecting a scale
that is too large.
2) Merging the viewports isn't ideal, as for example viewports with
boundaries [0,1] and [1000, 1001] would allow a guardband scale of ~30k,
while their union [0, 1001] only allows a scale of ~32.
The new code just determines the guardband per viewport and takes the minimum.
Signed-off-by: Bas Nieuwenhuizen <basni@google.com>
Acked-by: Dave Airlie <airlied@redhat.com>
y is vert, x is horiz.
Noticed in visual inspection compared to radeonsi.
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Signed-off-by: Dave Airlie <airlied@redhat.com>