Merged
Conversation
antoyo
pushed a commit
that referenced
this pull request
Nov 21, 2024
We can make use of the integrated rotate step of the XAR instruction
to implement most vector integer rotates, as long we zero out one
of the input registers for it. This allows for a lower-latency sequence
than the fallback SHL+USRA, especially when we can hoist the zeroing operation
away from loops and hot parts. This should be safe to do for 64-bit vectors
as well even though the XAR instructions operate on 128-bit values, as the
bottom 64-bit results is later accessed through the right subregs.
This strategy is used whenever we have XAR instructions, the logic
in aarch64_emit_opt_vec_rotate is adjusted to resort to
expand_rotate_as_vec_perm only when it's expected to generate a single REV*
instruction or when XAR instructions are not present.
With this patch we can gerate for the input:
v4si
G1 (v4si r)
{
return (r >> 23) | (r << 9);
}
v8qi
G2 (v8qi r)
{
return (r << 3) | (r >> 5);
}
the assembly for +sve2:
G1:
movi v31.4s, 0
xar z0.s, z0.s, z31.s, #23
ret
G2:
movi v31.4s, 0
xar z0.b, z0.b, z31.b, #5
ret
instead of the current:
G1:
shl v31.4s, v0.4s, 9
usra v31.4s, v0.4s, 23
mov v0.16b, v31.16b
ret
G2:
shl v31.8b, v0.8b, 3
usra v31.8b, v0.8b, 5
mov v0.8b, v31.8b
ret
Bootstrapped and tested on aarch64-none-linux-gnu.
Signed-off-by: Kyrylo Tkachov <ktkachov@nvidia.com>
gcc/
* config/aarch64/aarch64.cc (aarch64_emit_opt_vec_rotate): Add
generation of XAR sequences when possible.
gcc/testsuite/
* gcc.target/aarch64/rotate_xar_1.c: New test.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This fix a bug in the garbage collector of GCC that happens when compiling multiple times with
-fltoenabled.