Commit Graph

7677 Commits

Author SHA1 Message Date
Michel Zou
e4c0a34bfe radv: fix build with mingw
Cc: 21.2 mesa-stable
Reviewed-by: Joshua Ashton <joshua@froggi.es>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>

Closes #5092

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12178>
2021-08-13 12:13:21 +02:00
Rhys Perry
c14a85e756 radv: enable DCC with signedness reinterpretation
It seems we can enable DCC if the possible formats differ in signedness
and are otherwise compatible. We just need a fast-clear eliminate for
certain clear colors.

Improves Trine 4 performance.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9387>
2021-08-12 17:14:00 +01:00
Samuel Pitoiset
96403c1ec4 radv: allow fast clears for concurrent images if comp-to-single is supported
Only GFX10+ is affected because older chips don't support
comp-to-single. For them, we need to implement FCE on compute with DCC
and eventually CMASK.

Fixes the gap between concurrent vs exclusive queue with Scarlet Nexus,
also gives a boost with Doom Eternal.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12088>
2021-08-10 12:52:14 +02:00
Samuel Pitoiset
7ccecf2096 radv: enable DCC fast-clears with comp-to-single on GFX10+
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10518>
2021-08-10 08:20:17 +00:00
Samuel Pitoiset
aafe73561e radv: skip FCE for images that are fast-cleared using comp-to-single
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10518>
2021-08-10 08:20:17 +00:00
Samuel Pitoiset
7451eb1d61 radv: implement DCC fast clears with comp-to-single
When an image supports comp-to-single, DCC is cleared to 0x10 (single)
and the clear color value is written to the beginning of each 256B
block in the image.

This allows to skip FCE.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10518>
2021-08-10 08:20:17 +00:00
Samuel Pitoiset
782e0d05b0 radv: determine if an image support fast clears using comp-to-single
Only on GFX10+ with DCC enabled.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10518>
2021-08-10 08:20:17 +00:00
Samuel Pitoiset
c336c4b0cb radv: add RADV_DCC_CLEAR_SINGLE
When DCC is cleared with that code, the hardware expects the clear
color value to be stored at the beginning of each 256B block in
the image.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10518>
2021-08-10 08:20:17 +00:00
Samuel Pitoiset
6b1afe33b2 radv: pass an image view to vi_get_fast_clear_parameters()
image_format was unused.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10518>
2021-08-10 08:20:17 +00:00
Samuel Pitoiset
139d34d657 radv: use more explicit DCC clear codes
No functional changes.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/10518>
2021-08-10 08:20:17 +00:00
Samuel Pitoiset
deecc7d109 radv: fix reported sample counts for VRS 1x1
The Vulkan spec requires ~0 for 1x1.

Fixes dEQP-VK.fragment_shading_rate.misc.shading_rates.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12245>
2021-08-10 07:24:22 +00:00
Samuel Pitoiset
1a7eb424b0 radv: bump maxFragmentShadingRateCoverageSamples to 32
Minimum required value is 16 but we support up to 32
(2x2 VRS with MSAA 8x).

Fixes dEQP-VK.fragment_shading_rate.misc.limits.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12245>
2021-08-10 07:24:22 +00:00
Samuel Pitoiset
0d605bb8e0 radv: disable fragmentShadingRateWithCustomSampleLocations
From the Vulkan spec 1.2.187.

"fragmentShadingRateWithCustomSampleLocations specifies whether
 custom sample locations are supported for multi-pixel fragments.
 It must be VK_FALSE if VK_EXT_sample_locations is not supported."

VK_EXT_sample_locations is disabled on GFX10+.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12245>
2021-08-10 07:24:22 +00:00
Samuel Pitoiset
e7e8704611 radv: bump maxFragmentSizeAspectRatio to 2
Minimum required value is 2.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12245>
2021-08-10 07:24:22 +00:00
Samuel Pitoiset
7ae3881a4b radv: flush caches before performing separate depth/stencil aspect init
It's a RMW operation, also note that DB doesn't use L2 on GFX6-8.

Fixes test_clear_depth_stencil_view() and test_discard_resource() tests
from vkd3d-proton.

Cc: 21.2 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12223>
2021-08-09 16:35:41 +00:00
Bas Nieuwenhuizen
02b6015945 radv: Allocate space for inline push constants.
In the compute dispatch path we do not allocate a huge amount
of space to cover everything so the individual functions have to
allocate. This was missing here, causing a hang in Cyberpunk when
accessing the system menu at some locations with thread tracing
enabled.

Fixes: bd1186572f ("radv: add support for push constants inlining when possible")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12271>
2021-08-09 14:26:21 +00:00
Bas Nieuwenhuizen
b2b1e8e40a radv: Use correct signedness in misalign test.
Lots of the MAX2 args end up subtracting two unsigned numbers, which
blows up when the result is negative.

Fixes: 4c99d6ff54 ("radv: flush L2 for images affected by the pipe misaligned issue on GFX10+")
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12272>
2021-08-09 14:03:37 +00:00
Samuel Pitoiset
21c8a95e34 radv: remove unnecessary FIXME about custom sample locations
VK_EXT_sample_locations is disabled on GFX10+.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12247>
2021-08-09 11:40:13 +00:00
Samuel Pitoiset
1db36422b9 radv: fix initializing the DS clear metadata value for separate aspects
We shouldn't overwrite the clear value of the other aspect (in case
separate depth/stencil layouts are used).

Found by inspection.

Cc: 21.2 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12222>
2021-08-09 10:41:39 +00:00
Samuel Pitoiset
ade66c1aeb radv: allow DCC MSAA fast clears if a FCE is needed
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12180>
2021-08-09 09:27:52 +00:00
Samuel Pitoiset
f136838d1e radv: perform a FCE for MSAA images that might have been fast-cleared
FMASK_DECOMPRESS can't eliminate DCC fast clears. This will allow to
enable DCC MSAA fast clears that require a FCE.

Only supported on GFX10+.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12180>
2021-08-09 09:27:52 +00:00
Samuel Pitoiset
3cfa3187cb radv: rework DCC, FMASK and FCE decompress path
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12180>
2021-08-09 09:27:52 +00:00
Timur Kristóf
74181ffcc5 radv: Write RSRC2_GS for NGGC when pipeline is dirty but not emitted.
The radv_emit_ngg_culling_state function won't write the
SPI_SHADER_PGM_RSRC2_GS register when it knows in advance that
radv_emit_graphics_pipeline will overwrite it anyway.

However, there is an unhandled case:

radv_emit_graphics_pipeline will not emit anything (including this
register) when the pipeline is already emitted. Hence, improve
the check in radv_emit_ngg_culling_state to consider this.

Fixes: 9a95f5487f
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12237>
2021-08-06 12:18:37 +02:00
Rhys Perry
5a536eca9c aco: calculate correct register demand for branch instructions
Since copies for the successor's linear phis are inserted before the
branch, we should consider the definitions and operands of the successor's
linear phis.

Fixes a Detroit: Become Human spilling failure with GCM+GVN.

Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12035>
2021-08-05 12:01:58 +00:00
Samuel Pitoiset
16793c8efa ac/surface: implement CmaskAddrFromCoord in NIR on GFX10+
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12182>
2021-08-05 06:37:09 +00:00
Samuel Pitoiset
1d67fa4d73 ac/surface: add tests for CmaskAddrFromCoord on GFX10+
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12182>
2021-08-05 06:37:09 +00:00
Samuel Pitoiset
0926b268fc amd/addrlib: expose CMASK address equations to drivers on GFX10+
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12182>
2021-08-05 06:37:09 +00:00
Timur Kristóf
8918a809ce ac: Remove deprecated use_late_alloc field as nobody uses it anymore.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11905>
2021-08-04 15:37:05 +00:00
Timur Kristóf
d911259c3a radv: Use ac_compute_late_alloc in radv_pipeline.
This aligns RADV with RadeonSI in how it handles late alloc,
making it easier for us to deal with deadlocks and such.

Also move setting the RSRC3 registers for VS, GS and NGG
into radv_pipeline.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11905>
2021-08-04 15:37:05 +00:00
Timur Kristóf
61eb3debcf radv: Don't toggle PC oversubscription for NGG culling.
We are going to add this directly to the pipeline.
If a pipeline has such a shader, NGG culling is turned on
most of the time, so it's not useful to toggle this setting.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11905>
2021-08-04 15:37:05 +00:00
Timur Kristóf
448592b9ae aco: Use Navi 10 empty NGG output workaround on NGG culling shaders.
Navi 10 can hang when an NGG workgroup has no output,
so we work around that by always exporting a single zero-area
triangle with a single vertex that has all-NaN coordinates.

Thus far, we only employed this for NGG GS, because on all
other stages, the output can't be empty.

However, with NGG culling, the output can be empty, so let's
apply the same workaround there too.

Cc: mesa-stable
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12169>
2021-08-04 12:28:34 +00:00
Rhys Perry
566970f273 aco: use image_dim and image_array intrinsic indices
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12190>
2021-08-04 12:09:07 +00:00
Rhys Perry
39db5a569b radv: set image_dim and image_array intrinsic indices
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12190>
2021-08-04 12:09:07 +00:00
Samuel Pitoiset
ad83c06a5f radv: fix missing cache flushes when clearing HTILE levels on GFX10+
The driver should accumulate the cache flush bits because if it uses
CP DMA for clearing the last level, it won't flush.

Found by inspection.

Cc: 21.2 mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12170>
2021-08-03 12:59:57 +00:00
Samuel Pitoiset
ebea075feb radv: fix selecting the first active CU when profiling with SQTT
Fixes: d26bcc0f5c ("radv: always select the first active CU when profiling with SQTT")
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12167>
2021-08-03 12:30:41 +00:00
Timothy Arceri
a9ed4538ab nir: add indirect loop unrolling to compiler options
This is where it should be rather than having to pass it into the
optimisation pass every time.

It also allows us to call the loop analysis pass without having to
duplicate these options which we will do later in this series.

Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12064>
2021-08-03 10:54:50 +00:00
Samuel Pitoiset
a49b397041 ac/surface: implement CmaskAddrFromCoord in NIR
It's similar to DCC, only GFX9 is currently supported.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12140>
2021-08-03 07:02:48 +00:00
Samuel Pitoiset
eedc0b59b7 ac/surface: copy the CMASK equation to radeon_surf
Only GFX9 is currently supported.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12140>
2021-08-03 07:02:48 +00:00
Samuel Pitoiset
1f12c3ccc1 ac/surface: store CMASK pitch and height to radeon_surf
Only GFX9+ is currently supported.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12140>
2021-08-03 07:02:48 +00:00
Samuel Pitoiset
132b205566 ac/surface: add tests for CmaskAddrFromCoord prototype outside of addrlib
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12140>
2021-08-03 07:02:48 +00:00
Samuel Pitoiset
96e12644f3 amd/addrlib: expose CMASK address equations to drivers on GFX9
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12140>
2021-08-03 07:02:48 +00:00
Samuel Pitoiset
501db87779 ac: introduce a structure to store DCC address equations for GFX9
CMASK addr equations will use the same struct.

Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Acked-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12140>
2021-08-03 07:02:48 +00:00
Rob Clark
6edf0d8e90 driconfig: Add support for device specific config
Add support for driconf overrides on a per-device level, for cases
where we don't want to override behavior for all devices supported
by a particular driver.

Signed-off-by: Rob Clark <robdclark@chromium.org>
Reviewed-by: Emil Velikov <emil.l.velikov@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12135>
2021-08-02 16:37:24 -07:00
Timur Kristóf
a6110f3c3a ac/nir: Remove unhelpful nir_opt_cse from ac_nir_lower_ngg_nogs.
This CSE call adds to our compile time without adding any real
benefit to the compiled code.

Fossil DB results on Sienna Cichlid (with NGGC on):

Totals from 1580 (1.23% of 128647) affected shaders:
CodeSize: 4563912 -> 4562312 (-0.04%); split: -0.07%, +0.03%
Instrs: 870722 -> 870338 (-0.04%); split: -0.09%, +0.04%
Latency: 3349863 -> 3351458 (+0.05%); split: -0.10%, +0.14%
InvThroughput: 617796 -> 617971 (+0.03%); split: -0.01%, +0.03%
VClause: 22604 -> 22568 (-0.16%); split: -0.75%, +0.59%
SClause: 16285 -> 16327 (+0.26%); split: -0.07%, +0.33%
Copies: 83472 -> 83599 (+0.15%); split: -0.07%, +0.22%
PreSGPRs: 62340 -> 62334 (-0.01%)

No Fossil DB changes with NGGC off.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11908>
2021-08-02 11:38:25 +00:00
Timur Kristóf
da9f4b2e67 nir, aco: Remove vertex and primitive count overwrite intrinsic.
It's no longer needed.

No Fossil DB changes.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11908>
2021-08-02 11:38:25 +00:00
Timur Kristóf
4b540aef2c ac/nir: Don't count vertices and primitives in wave after culling.
These are not needed anymore, because the EXEC mask doesn't depend on them.

Fossil DB results on Sienna Cichlid (with NGGC on):

Totals from 58239 (45.27% of 128647) affected shaders:
Latency: 138113669 -> 138285372 (+0.12%)
InvThroughput: 22404840 -> 22405245 (+0.00%)

No Fossil DB changes with NGGC off.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11908>
2021-08-02 11:38:25 +00:00
Timur Kristóf
a2d02c0c11 ac/nir: Use gs_accepted variable after culling.
This prevents us from recalculating the EXEC mask later
in the shader, and removes the requirement for
counting the number of primitives.

The stats are better than expected because they also
show that some code that is still there is now DCE'd by ACO.

Fossil DB results on Sienna Cichlid (with NGGC on):

Totals from 58239 (45.27% of 128647) affected shaders:
SpillSGPRs: 330 -> 340 (+3.03%)
CodeSize: 166356072 -> 162805724 (-2.13%)
Instrs: 31920041 -> 31089256 (-2.60%)
Latency: 138815742 -> 138113669 (-0.51%); split: -0.54%, +0.03%
InvThroughput: 22459553 -> 22404840 (-0.24%); split: -0.26%, +0.02%
SClause: 753746 -> 753765 (+0.00%); split: -0.00%, +0.01%
Copies: 3226647 -> 3268973 (+1.31%); split: -0.45%, +1.76%
Branches: 1223441 -> 1223440 (-0.00%); split: -0.00%, +0.00%
PreSGPRs: 2025339 -> 2091013 (+3.24%)

No Fossil DB changes with NGGC off.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11908>
2021-08-02 11:38:25 +00:00
Timur Kristóf
8159868699 ac/nir: Use es_accepted variable after culling.
This avoids re-calculating the exec mask for ES vertices,
and makes it unnecessary to count the number of vertices left.

Fossil DB results on Sienna Cichlid (with NGGC on):

Totals from 58239 (45.27% of 128647) affected shaders:
CodeSize: 166521108 -> 166356072 (-0.10%); split: -0.10%, +0.00%
Instrs: 31961308 -> 31920041 (-0.13%); split: -0.13%, +0.00%
Latency: 138820463 -> 138815742 (-0.00%); split: -0.04%, +0.04%
InvThroughput: 22460177 -> 22459553 (-0.00%); split: -0.00%, +0.00%
SClause: 753744 -> 753746 (+0.00%)
Copies: 3093140 -> 3226647 (+4.32%); split: -0.03%, +4.34%

No Fossil DB changes with NGGC off.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11908>
2021-08-02 11:38:25 +00:00
Timur Kristóf
1bbea90f50 aco, nir, ac: Simplify sequence of getting initial NGG VS edge flags.
Instead of v_bfe + v_lshl_or for each vertex, get all 3 edge flags
at once of every vertex. This takes fewer VALU instructions than
previously.

Fossil DB results on Sienna Cichlid (with NGGC on):

Totals from 56917 (44.24% of 128647) affected shaders:
CodeSize: 161028288 -> 158751628 (-1.41%)
Instrs: 30917985 -> 30519571 (-1.29%)
Latency: 130617204 -> 129975532 (-0.49%); split: -0.50%, +0.01%
InvThroughput: 21280238 -> 20927401 (-1.66%)
Copies: 3011120 -> 3011125 (+0.00%); split: -0.00%, +0.00%

No Fossil DB changed with NGGC off.

Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/11908>
2021-08-02 11:38:25 +00:00
Rhys Perry
0460f01fdc ac/llvm: implement v2f16 fsat
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Marek Olšák <marek.olsak@amd.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12143>
2021-08-02 10:02:51 +00:00