A stackless (or at least using allocated memory for stack) version
might be nice but for now this works around some games compiling
large shaders and hitting stack overflows.
CC: mesa-stable
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/21231>
This really, really helps on platforms where fabs() isn't free. A great
many shaders use a * frsq(fabs(fdot(a, a))) to normalize a vector.
Since the result of the fdot must be non-negative, the fabs can be
eliminated by an existing algebraic rule.
shader-db results:
r300 (run on R420 - X800XL)
total instructions in shared programs: 1369807 -> 1368550 (-0.09%)
instructions in affected programs: 59986 -> 58729 (-2.10%)
helped: 609
HURT: 0
total vinst in shared programs: 512899 -> 512861 (<.01%)
vinst in affected programs: 1522 -> 1484 (-2.50%)
helped: 36
HURT: 0
total sinst in shared programs: 260690 -> 260570 (-0.05%)
sinst in affected programs: 1419 -> 1299 (-8.46%)
helped: 120
HURT: 0
total consts in shared programs: 957295 -> 957230 (<.01%)
consts in affected programs: 849 -> 784 (-7.66%)
helped: 65
HURT: 0
LOST: 0
GAINED: 3
The 3 gained shaders are all vertex shaders from XCom: Enemy Unknown.
I'm guessing that game is never going to run on my X800XL. :)
i915
total instructions in shared programs: 791121 -> 780843 (-1.30%)
instructions in affected programs: 220170 -> 209892 (-4.67%)
helped: 2085
HURT: 0
total temps in shared programs: 47765 -> 47766 (<.01%)
temps in affected programs: 9 -> 10 (11.11%)
helped: 0
HURT: 1
total const in shared programs: 93048 -> 92983 (-0.07%)
const in affected programs: 784 -> 719 (-8.29%)
helped: 65
HURT: 0
LOST: 0
GAINED: 36
Haswell, Ivy Bridge, and Sandy Bridge had similar results. (Haswell shown)
total instructions in shared programs: 16702250 -> 16697908 (-0.03%)
instructions in affected programs: 119277 -> 114935 (-3.64%)
helped: 1065
HURT: 0
helped stats (abs) min: 1 max: 20 x̄: 4.08 x̃: 4
helped stats (rel) min: 0.48% max: 10.17% x̄: 3.66% x̃: 3.94%
95% mean confidence interval for instructions value: -4.26 -3.89
95% mean confidence interval for instructions %-change: -3.76% -3.56%
Instructions are helped.
total cycles in shared programs: 880772068 -> 880734134 (<.01%)
cycles in affected programs: 2134456 -> 2096522 (-1.78%)
helped: 941
HURT: 324
helped stats (abs) min: 2 max: 2180 x̄: 123.06 x̃: 44
helped stats (rel) min: 0.04% max: 49.96% x̄: 7.08% x̃: 3.81%
HURT stats (abs) min: 2 max: 2098 x̄: 240.33 x̃: 35
HURT stats (rel) min: 0.04% max: 77.07% x̄: 12.34% x̃: 3.00%
95% mean confidence interval for cycles value: -47.93 -12.04
95% mean confidence interval for cycles %-change: -2.87% -1.34%
Cycles are helped.
No shader-db changes on any other Intel platform.
Reviewed-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Emma Anholt <emma@anholt.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/17181>
The lowered LS and NGG stages use local_invocation_index and they
can benefit from the unsigned upper bound because they can emit a
less expensive integer multiplication instruction.
This was working in the past, but accidentally borked by a refactor.
Fossil DB changes on Sienna Cichlid:
Totals from 956 (0.74% of 128647) affected shaders:
CodeSize: 2354172 -> 2344712 (-0.40%)
Instrs: 434359 -> 434327 (-0.01%)
Latency: 1883949 -> 1876814 (-0.38%)
InvThroughput: 762638 -> 757405 (-0.69%)
Fossil DB changes on Sienna Cichlid (with NGGC enabled):
Totals from 57873 (44.99% of 128647) affected shaders:
CodeSize: 155844192 -> 155607064 (-0.15%)
Instrs: 29799184 -> 29799152 (-0.00%)
Latency: 130959764 -> 130814224 (-0.11%); split: -0.11%, +0.00%
InvThroughput: 21100300 -> 20928635 (-0.81%); split: -0.81%, +0.00%
Fixes: 8af6766062
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/12558>
Recall that when either value is NaN, fmax will pick the other value.
This means the result range of the fmax will either be the "ideal"
result range (calculated above) or the range of the non-NaN value.
Previously, something like fmax({gt_zero}, {lt_zero, is_a_number}) would
return a range of gt_zero. However, if the "gt_zero" parameter is NaN,
the actual result will be the "lt_zero" parameter.
This analysis depends on the is_a_number analysis also added in this MR.
Assuming this doesn't cause any unforeseen problems, I believe we should
wait a bit, then nominate a subset of the series for the stable
branches.
This fixes the piglit tests
tests/spec/glsl-1.30/execution/range_analysis_fmax_of_nan.shader_test
tests/spec/glsl-1.30/execution/range_analysis_fmin_of_nan.shader_test
from https://gitlab.freedesktop.org/mesa/piglit/-/merge_requests/463.
Even with the added fsat fixes, range_analysis_fsat_of_nan.shader_test
still fails. There are some other issues there that will be addressed
in later commits (in another MR).
v2: Add fsat fixes. Suggested by Rhys.
Fixes: 405de7ccb6 ("nir/range-analysis: Rudimentary value range analysis pass")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Shader-db results:
All Intel platforms had similar results. (Tiger Lake shown)
total instructions in shared programs: 21049290 -> 21049314 (<.01%)
instructions in affected programs: 3175 -> 3199 (0.76%)
helped: 0
HURT: 17
HURT stats (abs) min: 1 max: 3 x̄: 1.41 x̃: 1
HURT stats (rel) min: 0.20% max: 1.89% x̄: 0.97% x̃: 0.92%
95% mean confidence interval for instructions value: 1.09 1.73
95% mean confidence interval for instructions %-change: 0.75% 1.19%
Instructions are HURT.
total cycles in shared programs: 855136176 -> 855136406 (<.01%)
cycles in affected programs: 37579 -> 37809 (0.61%)
helped: 0
HURT: 17
HURT stats (abs) min: 12 max: 20 x̄: 13.53 x̃: 14
HURT stats (rel) min: 0.17% max: 1.13% x̄: 0.79% x̃: 0.91%
95% mean confidence interval for cycles value: 12.53 14.53
95% mean confidence interval for cycles %-change: 0.63% 0.94%
Cycles are HURT.
Fossil-db results:
Tiger Lake
Instructions in all programs: 160901033 -> 160902591 (+0.0%)
SENDs in all programs: 6812270 -> 6812270 (+0.0%)
Loops in all programs: 38225 -> 38225 (+0.0%)
Cycles in all programs: 7430016795 -> 7429003266 (-0.0%)
Spills in all programs: 192582 -> 192582 (+0.0%)
Fills in all programs: 304539 -> 304539 (+0.0%)
Ice Lake
Instructions in all programs: 145299102 -> 145301634 (+0.0%)
SENDs in all programs: 6863890 -> 6863890 (+0.0%)
Loops in all programs: 38219 -> 38219 (+0.0%)
Cycles in all programs: 8798390846 -> 8798589772 (+0.0%)
Spills in all programs: 216880 -> 216880 (+0.0%)
Fills in all programs: 334250 -> 334250 (+0.0%)
Skylake
Instructions in all programs: 135889478 -> 135892010 (+0.0%)
SENDs in all programs: 6802916 -> 6802916 (+0.0%)
Loops in all programs: 38216 -> 38216 (+0.0%)
Cycles in all programs: 8442624166 -> 8442597324 (-0.0%)
Spills in all programs: 194839 -> 194839 (+0.0%)
Fills in all programs: 301116 -> 301116 (+0.0%)
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9108>
This commit is necessary to support "nir/range_analysis: Fix analysis of
fmin and fmax with NaN".
No shader-db or fossil-db changes on any Intel platform.
v2: Pack and unpack is_a_number.
v3: Don't set is_a_number of integer constants. The bit pattern might
be NaN.
v4: Update handling of b2i32. intBitsToFloat(int(true)) is
1.401298464324817e-45. Return a value consistent with that.
Fixes: 405de7ccb6 ("nir/range-analysis: Rudimentary value range analysis pass")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9108>
The obvious changes to nir_search_helpers.h are in a separate commit to
limit the scope of this change. These additions are really only needed
to support the next commit "nir/range_analysis: Add "is a number" range
analysis tracking". This reduction in scope is intended to increase the
suitability for stable branches.
No shader-db or fossil-db changes on any Intel platform.
v2: Pack and unpack is_finite.
v3: Split nir_search_helpers.h changes into a separate commit.
v4: Remove assertion intended for the next commit. Update is_finite
comment for fsign. Both noticed by Rhys. Fix is_finite handling for
load_const vectors. If any element is not finite, set the flag to
false. This is the same way is_integral is already handled.
v5: Update handling of b2i32. intBitsToFloat(int(true)) is
1.401298464324817e-45. Return a value consistent with that.
Fixes: 405de7ccb6 ("nir/range-analysis: Rudimentary value range analysis pass")
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9108>
If a query is made of a vector ssa_def (possibly from an intermediate
result), return all_bits. If a constant source is a vector, swizzle
the correct component.
Unit tests were added for the constant vector cases. I don't see a
great way to make unit tests for the other cases.
v2: Add a FINIHSME comment about u16vec2 hardware.
Fixes: 96303a59ea ("nir: Add some range analysis for used bits")
Reviewed-by: Jason Ekstrand <jason@jlekstrand.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/9123>
nir_addition_might_overflow() expects the parent instruction to be
an alu instr but it might be a phi instr. Fix it by assuming that
the addition might overflow.
This fixes compiler crashes with Horizon Zero Dawn.
No fossils-db changes.
Cc: mesa-stable
Signed-off-by: Samuel Pitoiset <samuel.pitoiset@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/8268>
This adds a nir_unsigned_upper_bound() helper which does something similar
to nir_analyze_range() except it tries to obtain the largest possible
value instead of it's relation to zero.
It also adds nir_addition_might_overflow(), which uses this helper to try
to prove that an unsigned addition does not wrap around.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com>
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/2720>
This fixes a build failure on MSVC.
BTW, it looks like clang supports _Pragma() but I don't know if it
understands the "gcc unroll N" directive.
Signed-off-by: Brian Paul <brianp@vmware.com>
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
All of the tables are static const, so they only need to be validated
once. As noted in the previous commit, the compiler should be able to
eliminate all of this code when the assertions would pass. Even with
the help of the previous commit, this does not always occur.
-Og: -95.688 +/- 3.91935 (-24.9562% +/- 1.0222%) N=5
-O1: No difference proven at 95.0% confidence. N=5
-O2: -1.962 +/- 0.85001 (-0.860013% +/- 0.372589%) N=5
Reviewed-by: Eric Anholt <eric@anholt.net>
I was pretty liberal with these assertions when I wrote this code
because I had assumed that GCC would unroll the loops, inline the look ups
of static const arrays with now constant indices, and then elmininate
all the actuall assertions. It seems none of this happens even at -O3.
Adding the pragmas helps encourage loop unrolling at some optimization
levels. I tested by running shader-db with NIR_VALIDATE=false on a Core
i7 Haswell desktop system.
-Og: No difference proven at 95.0% confidence. N=5
-O1: -48.304 +/- 1.221 (-16.3343% +/- 0.412888%) N=5
-O2: -49.94 +/- 1.23521 (-17.9634% +/- 0.444303%) N=5
v2: Add a _Pragma to an inner loop that was accidentally dropped during
a rebase.
Reviewed-by: Eric Anholt <eric@anholt.net>
This lets us memoize range analysis work across instructions. Reduces
runtime of shader-db on Intel by -30.0288% +/- 2.1693% (n=3).
Fixes: 405de7ccb6 ("nir/range-analysis: Rudimentary value range analysis pass")
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com>
Reviewed-by: Connor Abbott <cwabbott0@gmail.com>