Job Noorman
584b63ecab
nir/load_store_vectorize: fix division by zero
...
Don't use glsl_get_explicit_stride as it may return 0 for vector types,
use nir_deref_instr_array_stride instead.
Signed-off-by: Job Noorman <jnoorman@igalia.com >
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31460 >
2024-10-02 05:53:57 +00:00
Rhys Perry
be64454710
nir/tests: test opt_loop_peel_initial_break with derefs in header block
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com >
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31324 >
2024-10-01 12:24:22 +00:00
Rhys Perry
0484044b1a
nir/opt_loop: rematerialize header block derefs in their use blocks
...
Otherwise, we could end up with phis of derefs.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com >
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com >
Fixes: 6b4b044739 ("nir/opt_loop: add loop peeling optimization")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31324 >
2024-10-01 12:24:22 +00:00
Gert Wollny
f19f1ec17b
nir/opt_algebraic: Allow two-step lowering of ftrunc@64 to use ffract@64
...
If ftrunc@64 is lowered by nir_lower_doubles, it is turned into a
comparatively long series of 32-bit operations. If the hardware
supports ffract@64, nir_opt_algebraic can first lower ftrunc@64
to a combination of operations using ffloor@64. These can then be turned
into a combination of fsub@64 and ffract@64, resulting in fewer
instructions overall.
Fixes: 5218cff34b ("nir/algebraic: avoid double lowering of some fp64 operations")
Signed-off-by: Gert Wollny <gert.wollny@collabora.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29281 >
2024-09-30 23:51:02 +00:00
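The two-step lowering described in f19f1ec17b can be illustrated numerically. The following is a minimal sketch, not the actual nir_opt_algebraic patterns: the function names are made up for illustration, and `ffract` is modelled as `x - floor(x)`, standing in for a hardware ffract@64.

```python
import math


def ffract(x: float) -> float:
    """Stand-in for a hardware ffract: x - floor(x), always in [0, 1)."""
    return x - math.floor(x)


def ffloor_via_ffract(x: float) -> float:
    # Second step: floor expressed as one subtraction plus ffract.
    return x - ffract(x)


def ftrunc_via_ffloor(x: float) -> float:
    # First step: truncation expressed with floor. Negative inputs are
    # negated before and after so that they round towards zero.
    if x < 0.0:
        return -ffloor_via_ffract(-x)
    return ffloor_via_ffract(x)
```

For example, `ftrunc_via_ffloor(-2.75)` evaluates `-(2.75 - 0.75)`, which matches `math.trunc(-2.75)`.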
Kenneth Graunke
0b34a7aff0
nir: Don't generate single iteration loops to zero-initialize memory
...
If the stride we're adding to our loop counter is larger than the total
amount of shared local memory we're trying to initialize, we know the
loop will run at most one time. So we can skip emitting a loop.
Loop unrolling appears to be unable to detect this currently.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31312 >
2024-09-30 05:27:17 +00:00
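The reasoning in 0b34a7aff0 can be checked with a small simulation. This is a sketch under assumptions: the helper names, the per-invocation chunk size, and the way the stride is derived are hypothetical and do not come from the Mesa source.

```python
def offsets_cleared_with_loop(local_index, chunk_size, stride, shared_size):
    """Offsets one invocation clears when the general loop form is emitted."""
    offsets = []
    offset = local_index * chunk_size
    while offset < shared_size:
        offsets.append(offset)
        offset += stride  # loop counter advances by the stride
    return offsets


def needs_loop(stride, shared_size):
    # If the stride exceeds the total size, the second iteration can never
    # start, so a single guarded store is enough.
    return stride <= shared_size


def offsets_cleared_without_loop(local_index, chunk_size, shared_size):
    """Loop-free form: one bounds-checked store per invocation."""
    offset = local_index * chunk_size
    return [offset] if offset < shared_size else []
```

With 64 invocations clearing 16 bytes each (a stride of 1024 bytes) and 256 bytes of shared memory, both forms clear the same offsets for every invocation.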
Georg Lehmann
bb7e8d51b6
nir: delete nir_opt_reuse_constants
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31031 >
2024-09-27 05:19:16 +00:00
Georg Lehmann
60776f87c3
nir/opt_remove_phis: rematerialize constants
...
Foz-DB Navi31:
Totals from 749 (0.94% of 79395) affected shaders:
Instrs: 1224359 -> 1223722 (-0.05%); split: -0.07%, +0.02%
CodeSize: 6468392 -> 6466296 (-0.03%); split: -0.06%, +0.03%
Latency: 9764410 -> 9766457 (+0.02%); split: -0.01%, +0.03%
InvThroughput: 1017401 -> 1017380 (-0.00%); split: -0.03%, +0.03%
VClause: 19902 -> 19873 (-0.15%); split: -0.16%, +0.02%
SClause: 38441 -> 38424 (-0.04%); split: -0.05%, +0.01%
Copies: 86880 -> 86304 (-0.66%); split: -0.73%, +0.06%
Branches: 34206 -> 34159 (-0.14%); split: -0.14%, +0.01%
PreSGPRs: 45557 -> 45527 (-0.07%); split: -0.08%, +0.01%
PreVGPRs: 32406 -> 32408 (+0.01%)
VALU: 671633 -> 671533 (-0.01%); split: -0.02%, +0.01%
SALU: 155284 -> 154675 (-0.39%); split: -0.40%, +0.00%
VMEM: 27303 -> 27271 (-0.12%)
SMEM: 67490 -> 67455 (-0.05%)
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31031 >
2024-09-27 05:19:16 +00:00
Georg Lehmann
40fc85c15b
nir: make nir_instr_clone usable with load_const and undef
...
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31031 >
2024-09-27 05:19:16 +00:00
Georg Lehmann
a9f8089240
nir: replace nir_opt_remove_phis_block with a single source version
...
This is what callers actually want, and it simplifies nir_opt_remove_phis
because we can assume dominance metadata is valid.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31031 >
2024-09-27 05:19:16 +00:00
Georg Lehmann
41e82b8b8e
nir: sink is_subgroup_invocation_lt_amd
...
Having it closer to the branches means we can eliminate an exec copy.
Foz-DB Navi31:
Totals from 11615 (14.63% of 79395) affected shaders:
Instrs: 6804372 -> 6804903 (+0.01%); split: -0.04%, +0.05%
CodeSize: 33684672 -> 33680584 (-0.01%); split: -0.07%, +0.05%
VGPRs: 578616 -> 578604 (-0.00%)
SpillSGPRs: 1506 -> 1304 (-13.41%)
Latency: 29817034 -> 29821320 (+0.01%); split: -0.03%, +0.05%
InvThroughput: 3581587 -> 3581217 (-0.01%); split: -0.02%, +0.01%
VClause: 124826 -> 124782 (-0.04%); split: -0.04%, +0.00%
SClause: 187916 -> 187645 (-0.14%); split: -0.27%, +0.13%
Copies: 520969 -> 510027 (-2.10%); split: -2.20%, +0.10%
PreSGPRs: 442584 -> 421344 (-4.80%)
VALU: 3810755 -> 3810267 (-0.01%); split: -0.01%, +0.00%
SALU: 763402 -> 752650 (-1.41%); split: -1.48%, +0.07%
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31184 >
2024-09-26 14:29:14 +00:00
Georg Lehmann
bcfc5c09fa
amd: add offset to is_subgroup_invocation_lt_amd
...
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31184 >
2024-09-26 14:29:13 +00:00
Marek Olšák
09e64e3682
nir/opt_shrink_vectors: shrink memory loads, not just IO
...
The problem with radeonsi+ACO is that UBO loads from vec4 uniforms using
only 1 component always load all 4 components. This fixes that.
We are only interested in shrinking UBO and SSBO loads, but I added more
intrinsics because why not.
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/29384 >
2024-09-26 03:01:38 +00:00
Timothy Arceri
f6e7520b13
glsl: remove now unused linker code
...
This has all been replaced by a NIR-based linker implementation.
Acked-by: Marek Olšák <marek.olsak@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31137 >
2024-09-25 09:39:44 +00:00
Timothy Arceri
fe9b93fc1c
nir: handle wildcard array deref
...
Here we add handling of wildcard array derefs when attempting to mark
an io as partially used rather than hitting an assert.
Acked-by: Marek Olšák <marek.olsak@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31137 >
2024-09-25 09:39:44 +00:00
Timothy Arceri
6bb6b0e5ad
nir: add nir_intrinsic_deref_implicit_array_length intrinsic
...
This will be used to handle .length() calls on unsized arrays.
Acked-by: Marek Olšák <marek.olsak@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31137 >
2024-09-25 09:39:44 +00:00
Timothy Arceri
60937b5286
nir: add implicit_conversion_prohibited field to nir_parameter
...
Will be used in link time validation in following patches.
Acked-by: Marek Olšák <marek.olsak@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31137 >
2024-09-25 09:39:44 +00:00
Timothy Arceri
5645495156
nir: store variable mode in nir_parameter
...
This will be used by the nir glsl linker in following patches.
Acked-by: Marek Olšák <marek.olsak@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31137 >
2024-09-25 09:39:44 +00:00
Timothy Arceri
89a2411c54
nir: serialize nir_parameter type
...
Acked-by: Marek Olšák <marek.olsak@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31137 >
2024-09-25 09:39:44 +00:00
Timothy Arceri
6ff3e87e5f
nir: add function in/outs to variable modes
...
Acked-by: Marek Olšák <marek.olsak@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31137 >
2024-09-25 09:39:44 +00:00
Timothy Arceri
1cb115abd2
nir: add nir_function_impl_clone_remap_globals()
...
This will be used by the GLSL NIR linker when we are combining
different shaders from the same shader stage that might have multiple
declarations of global variables across the different shaders.
Acked-by: Marek Olšák <marek.olsak@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31137 >
2024-09-25 09:39:43 +00:00
Timothy Arceri
7a1061e0dd
nir: add max_ifc_array_access field to vars
...
This will be used in following patches by the nir based glsl
linker code.
Acked-by: Marek Olšák <marek.olsak@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31137 >
2024-09-25 09:39:43 +00:00
Timothy Arceri
7c5b21c032
glsl: add support for converting global instructions to NIR
...
NIR doesn't really support global instructions such as global variable
initialisation. So here we add functionality to glsl_to_nir() to
put these instructions into a temporary function that will later be
inlined into main.
We give the function a name starting with gl_mesa_tmp_, as names
starting with gl_ are reserved and so cannot clash with
user functions. We finish the name with the blake3 hash of the shader
source to avoid conflicts between multiple shaders attached to a single
stage.
Acked-by: Marek Olšák <marek.olsak@amd.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31137 >
2024-09-25 09:39:43 +00:00
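The naming scheme from 7c5b21c032 could look roughly like the sketch below. Two things are assumptions: the helper name `tmp_function_name` is hypothetical, and `hashlib.blake2s` is swapped in for blake3 only because the Python standard library does not ship blake3. The exact name layout Mesa uses is not given in the commit message.

```python
import hashlib

# Prefix taken from the commit message; names starting with gl_ are reserved.
TMP_PREFIX = "gl_mesa_tmp_"


def tmp_function_name(shader_source: str) -> str:
    """Build a per-shader temporary function name from a hash of its source."""
    # blake2s is a stand-in here; the commit describes using blake3.
    digest = hashlib.blake2s(shader_source.encode("utf-8")).hexdigest()
    return TMP_PREFIX + digest
```

Two shaders with different source text attached to the same stage get different names, while the same source always maps to the same name.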
Georg Lehmann
e0bcab953d
nir: add amd shared append/consume
...
Reviewed-by: Rhys Perry <pendingchaos02@gmail.com >
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31075 >
2024-09-19 16:21:47 +00:00
Boris Brezillon
eeb3512498
nir/lower_ssbo: Extend the load_ssbo_address intrinsic to pass an offset
...
On Mali (Valhall), the bounds checking can be done in hardware, but
for this to work properly, we need to pass the offset to the
nir_load_ssbo_address() intrinsic.
Add an offset source to the intrinsic, and adjust the lowering pass
to conditionally lower the offset addition.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com >
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Acked-by: Eric R. Smith <eric.smith@collabora.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31164 >
2024-09-18 13:45:57 +00:00
Boris Brezillon
adadb097a3
nir/lower_ssbo: Add an option to conditionally lower loads
...
On Mali (Valhall), we have a way to load SSBO data without going through
an SSBO index -> global address translation, so let's provide a way
to tell nir_lower_ssbo() when it shouldn't lower loads.
Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com >
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Acked-by: Eric R. Smith <eric.smith@collabora.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31164 >
2024-09-18 13:45:57 +00:00
Georg Lehmann
a3d6a770c0
nir/instr_set: fix fp_fast_math
...
We can't just ignore the flags of the match, we need the union.
Fixes: 666647acae ("nir: track some float controls bits per instruction")
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31195 >
2024-09-17 20:00:03 +00:00
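The fix in a3d6a770c0 can be pictured with a toy instruction set. This is only a sketch: the `Alu` class, the flag constants, and `add_or_rewrite` are hypothetical stand-ins, and it assumes the fp_fast_math bits act as restrictions that the surviving instruction must keep honouring.

```python
from dataclasses import dataclass

# Hypothetical flag bits, for illustration only.
PRESERVE_SIGNED_ZERO = 1 << 0
PRESERVE_INF = 1 << 1
PRESERVE_NAN = 1 << 2


@dataclass
class Alu:
    op: str
    srcs: tuple
    fp_fast_math: int = 0


def add_or_rewrite(instr_set: dict, instr: Alu) -> Alu:
    """Return the instruction that should be used in place of `instr`."""
    key = (instr.op, instr.srcs)
    match = instr_set.get(key)
    if match is None:
        instr_set[key] = instr
        return instr
    # The match replaces `instr`, so it must carry the union of both flag sets.
    match.fp_fast_math |= instr.fp_fast_math
    return match
```

If two otherwise identical `fadd` instructions carry different flags, the one that survives ends up with both sets of bits.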
Ian Romanick
6a09d33549
nir: Add a pass to generate BFI instructions from logical operations
...
Inspired by a commit message in !30934, I set about optimizing the code
generated for nir_copysign. It would be possible to just implement an
opt_algebraic pattern for the specific values used by nir_copysign, but
this casts a slightly larger net.
As noted in a comment in the code, there may be variations of the
pattern that this pass misses. The opt_algebraic pattern would miss them
too.
v2: Use nir_def_replace. Suggested by Alyssa. Allow more "root"
instruction types. Suggested by Georg.
v3: Treat extract_u16(x, 0) as (x & 0x0000ffff), and treat extract_u8(x,
0) as (x & 0x000000ff).
v4: Use nir_scalar. Suggested by Georg.
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31006 >
2024-09-13 00:21:00 +00:00
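The kind of logical-operation pattern that 6a09d33549 targets can be sketched on plain 32-bit integers. The function names below are illustrative, and the sketch does not model the exact operand conventions of NIR's bfi opcode; it only shows the select-by-mask shape and the extract equivalences mentioned in v3.

```python
MASK32 = 0xFFFFFFFF


def select_bits(mask: int, a: int, b: int) -> int:
    """Take bits of `a` where mask is 1 and bits of `b` where mask is 0."""
    return ((a & mask) | (b & ~mask)) & MASK32


def copysign_bits(x_bits: int, y_bits: int) -> int:
    # copysign on raw float32 bits: magnitude from x, sign bit from y.
    return select_bits(0x80000000, y_bits, x_bits)


def extract_u16_0(x: int) -> int:
    # extract_u16(x, 0) is the same as masking the low 16 bits.
    return x & 0x0000FFFF


def extract_u8_0(x: int) -> int:
    # extract_u8(x, 0) is the same as masking the low 8 bits.
    return x & 0x000000FF
```

For instance, `copysign_bits(0x3F800000, 0xC0000000)` combines the magnitude of 1.0 with the sign of -2.0 and yields `0xBF800000`, the bit pattern of -1.0.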
Ian Romanick
057c7c9f53
nir/algebraic: Recognize open-coded bitfield_reverse in XCOM 2
...
The XCOM 2 shaders in my shader-db use iadd instead of ior.
No fossil-db changes on any Intel platform.
shader-db:
All Intel platforms had similar results. (Meteor Lake shown)
total instructions in shared programs: 19787210 -> 19787034 (<.01%)
instructions in affected programs: 1187 -> 1011 (-14.83%)
helped: 6 / HURT: 0
total cycles in shared programs: 906024436 -> 906012612 (<.01%)
cycles in affected programs: 72978 -> 61154 (-16.20%)
helped: 6 / HURT: 0
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31006 >
2024-09-13 00:21:00 +00:00
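Why iadd can stand in for ior in 057c7c9f53 is easiest to see on the classic swap-based bit reversal. The sketch below is a textbook formulation, not the shader code from XCOM 2 or the pattern added to nir_opt_algebraic. At each step the two masked halves occupy disjoint bit positions, so adding them can never carry and gives the same result as OR.

```python
import operator

# (mask, shift) pairs for swapping bits, pairs, nibbles, bytes and halves.
_STEPS = (
    (0x55555555, 1),
    (0x33333333, 2),
    (0x0F0F0F0F, 4),
    (0x00FF00FF, 8),
    (0x0000FFFF, 16),
)


def bitfield_reverse(x: int, combine) -> int:
    """Reverse the bits of a 32-bit value, merging halves with `combine`."""
    for mask, shift in _STEPS:
        high_moved_down = (x >> shift) & mask
        low_moved_up = (x & mask) << shift
        # The two operands never share a set bit, so add and or agree.
        x = combine(high_moved_down, low_moved_up) & 0xFFFFFFFF
    return x
```

Calling it with `operator.add` or `operator.or_` gives identical results for any 32-bit input.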
Rhys Perry
97f4250a7c
nir: skip opt_loop_peel_initial_break if continue block only has phis
...
Doing that optimization wouldn't do anything useful in this case.
nir_block_has_non_copy() is used by opt_loop_peel_initial_break().
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31002 >
2024-09-12 23:36:58 +00:00
Rhys Perry
8410b4cdd6
nir/tests: add some loop peeling tests
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31002 >
2024-09-12 23:36:58 +00:00
Rhys Perry
64ac601049
nir/opt_loop: skip peeling if the loop ends with any kind of jump
...
Any kind of jump prevents us from moving it to the top of the loop, not
just breaks.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Fixes: 6b4b044739 ("nir/opt_loop: add loop peeling optimization")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31002 >
2024-09-12 23:36:58 +00:00
Rhys Perry
af3b099e0a
nir/opt_loop: skip peeling if the break is non-trivial
...
If this nir_if contains continues or other breaks, we can't move it
outside the loop.
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Fixes: 6b4b044739 ("nir/opt_loop: add loop peeling optimization")
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31002 >
2024-09-12 23:36:57 +00:00
Rhys Perry
4f44a944bb
nir/opt_if: fix fighting between split_alu_of_phi and peel_initial_break
...
Signed-off-by: Rhys Perry <pendingchaos02@gmail.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Fixes: 6b4b044739 ("nir/opt_loop: add loop peeling optimization")
Closes: https://gitlab.freedesktop.org/mesa/mesa/-/issues/11822
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31002 >
2024-09-12 23:36:57 +00:00
Georg Lehmann
7fa7812219
nir: merge out of loop decision with nir_can_move_instr logic
...
One place to modify instead of two when adding new intrinsics here.
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30906 >
2024-09-12 21:49:34 +00:00
Georg Lehmann
91f8e32a85
nir/opt_sink: do not sink inverse_ballot out of loops
...
The inverse_ballot result is undefined if the input is not dynamically uniform,
and sinking out of loops might make the input divergent.
Fixes: 18a0ff137f ("nir: sink/move inverse_ballot like moves")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30906 >
2024-09-12 21:49:34 +00:00
Georg Lehmann
1ec3cc2aed
nir/opt_sink: do not sink load_ubo_vec4 out of loops
...
Same reason as for load_ubo.
Fixes: d199d65c3a ("nir/nir_opt_move,sink: Include load_ubo_vec4 as a load_ubo instr.")
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Reviewed-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30906 >
2024-09-12 21:49:34 +00:00
Caio Oliveira
1e7f1c2039
nir: Allow Mesh/Task to use implicit LOD when DERIVATIVE_GROUP is set
...
Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com >
Reviewed-by: Daniel Schürmann <daniel@schuermann.dev >
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30956 >
2024-09-10 18:22:42 +00:00
David Heidelberg
6bf7b5bcd8
nir_lower_mem_access_bit_sizes: Assert when 0 components or bits are requested
...
Prevent the accidental passing of 0 components or bits, as it makes no sense.
Cc: mesa-stable
Reviewed-by: Karol Herbst <kherbst@redhat.com >
Reviewed-by: Faith Ekstrand <faith.ekstrand@collabora.com >
Suggested-by: Karol Herbst <kherbst@redhat.com >
Signed-off-by: David Heidelberg <david@ixit.cz >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31103 >
2024-09-10 11:17:48 +00:00
Ian Romanick
a780305818
nir/algebraic: Optimize more comparisons with b2f
...
shader-db:
All Intel platforms had similar results. (Meteor Lake shown)
total instructions in shared programs: 19781108 -> 19772614 (-0.04%)
instructions in affected programs: 372638 -> 364144 (-2.28%)
helped: 2915 / HURT: 0
total cycles in shared programs: 905907644 -> 905822682 (<.01%)
cycles in affected programs: 5573453 -> 5488491 (-1.52%)
helped: 2363 / HURT: 234
LOST: 42
GAINED: 16
fossil-db:
All Intel platforms had similar results. (Meteor Lake shown)
Totals:
Instrs: 152519634 -> 152519610 (-0.00%)
Cycle count: 17122707642 -> 17122710974 (+0.00%); split: -0.00%, +0.00%
Totals from 5 (0.00% of 633222) affected shaders:
Instrs: 2827 -> 2803 (-0.85%)
Cycle count: 83089 -> 86421 (+4.01%); split: -0.12%, +4.13%
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/31068 >
2024-09-10 04:15:58 +00:00
Alyssa Rosenzweig
b7542c4390
nir: CSE comparisons in atan2
...
Same code generated on AGX but simplified NIR.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30934 >
2024-09-07 00:54:35 +00:00
Alyssa Rosenzweig
7546ae96a7
nir: drop NaN fixup for atan
...
this existed due to the min/max, per the comment. now that we don't do min/max,
the whole routine is NaN correct so the fixup is pointless.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Suggested-by: Ian Romanick <ian.d.romanick@intel.com >
Reviewed-by: Georg Lehmann <dadschoorse@gmail.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30934 >
2024-09-07 00:54:35 +00:00
Alyssa Rosenzweig
ab8547a002
nir: push up abs in atan2 calculation
...
everybody has abs on fmul, not everyone has abs on bcsel. should help agx and
bifrost.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30934 >
2024-09-07 00:54:35 +00:00
Alyssa Rosenzweig
398e1ad46c
nir: fuse ffma for atan range fixup
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30934 >
2024-09-07 00:54:35 +00:00
Alyssa Rosenzweig
47e7cd268c
nir: negate an expression in atan
...
we're going to fix up the sign immediately anyway.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30934 >
2024-09-07 00:54:35 +00:00
Alyssa Rosenzweig
5318b8868b
nir: simplify atan range reduction fixup
...
the original version sure is creative.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30934 >
2024-09-07 00:54:35 +00:00
Alyssa Rosenzweig
87b99d5797
nir: use copysign for atan
...
this does two things:
* ignores sign of negative numbers which lets us play fast and loose later in the
series
* avoids an expensive fsign instruction in favour of a cheap bitwise op
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30934 >
2024-09-07 00:54:35 +00:00
Alyssa Rosenzweig
95215a094a
nir: extend copysign for no-integer hw
...
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30934 >
2024-09-07 00:54:35 +00:00
Alyssa Rosenzweig
0a4a0df283
nir: push down fabs for atan
...
worse in terms of NIR instruction count but lets the fabs fold easier. (on agx,
which has fabs on comparisons and fmul but not on bcsel. should be no worse if
ISA has fabs on all 3.)
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30934 >
2024-09-07 00:54:35 +00:00
Alyssa Rosenzweig
8579375777
nir: simplify atan range reduction
...
just implement what the comment says, don't be clever. the clever thing is worse
on all architectures i'm familiar with, because the fdiv will turn into
fmul+frcp.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30934 >
2024-09-07 00:54:35 +00:00
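The atan range reduction that this series keeps reworking rests on a standard identity: for a > 0, atan(a) = pi/2 - atan(1/a). The sketch below is an illustration of that identity only. It is not the NIR builder code, and `math.atan` on the reduced argument stands in for the polynomial approximation a driver would use.

```python
import math


def atan_range_reduced(x: float) -> float:
    """atan(x) evaluated only on [0, 1], then fixed up for range and sign."""
    ax = abs(x)
    swapped = ax > 1.0
    # Range reduction: a plain reciprocal, no min/max division trick.
    u = 1.0 / ax if swapped else ax
    r = math.atan(u)  # stand-in for the polynomial core on [0, 1]
    if swapped:
        # Range fixup: atan(a) = pi/2 - atan(1/a) for a > 0.
        r = math.pi / 2 - r
    # Restore the sign of the input with copysign.
    return math.copysign(r, x)
```

The reciprocal in the reduction step is the cheap part on most GPUs, which is consistent with the commit's remark that a full fdiv would turn into fmul+frcp anyway.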
Alyssa Rosenzweig
a32b1a975d
nir: correct comment for atan range reduction
...
the code did not match the comment, blew a sign.
Signed-off-by: Alyssa Rosenzweig <alyssa@rosenzweig.io >
Reviewed-by: Ian Romanick <ian.d.romanick@intel.com >
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30934 >
2024-09-07 00:54:35 +00:00