-
Notifications
You must be signed in to change notification settings - Fork 13.6k
Integrated arm assembler doesn't understand vuzpq.u16 #20797
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
assigned to @rengolin |
As a temporary workaround, you can use the real assembly instruction that this is supposed to be: vuzp.u16 This is a GAS extension AFAIK that basically is meant to help prevent accidental register swaps: vuzp takes either dx or qx while vuzpq only accepts qx. Im torn on this. This is unlikely to be common and is really an extension, but can be somewhat useful for users. |
Hi Saleem, This falls into the category of being ridiculously simple and clean that it may actually be worth it. But I also don't know how widespread that is, so it's hard to measure its usefulness, given that we don't want to encourage usage outside of the ARM ARM. I have a patch ready, but I'm not sure I should apply. Nico, could this one be changed in the source? cheers, |
As far as I can tell, this is only used in a single place in skia assembly, so we could change that. However, gcc's arm_neon.h also has a vuzpq_u16() intrinsic that's used in two places. Its implementation looks different from the vuzp_u16() intrinsic implementation -- are you sure that they're the same instruction? (vuzpq_u16() calls __builtin_neon_vuzpv8hi(), while vuzp_u16() calls __builtin_neon_vuzpv4hi(). The former takes two and returns an uint16x8x2_t, the later uses uint16x4x2_t instead.) It seems like it's probably a good idea to support the instructions that have intrinsics in arm_neon.h? |
Well, intrinsics are not always directly tied to instructions, and that's the point of having them in the first place, so we can abstract things. Both intrinsics will map to the same instructions (with different registers). Let's change this on the source and mark is as won't fix for now. If we find another bug in a less friendly source, we re-open this and add the aliases. :) Thanks! |
And one implementation ending in 8hi and the other in 4hi doesn't make a difference? |
Nope. They're both lowered by the same piece of code and the only resulting difference is the register they operate on, which vzip/vuzp already support. Having them in arm_neon.h is distinct from having them as an assembly alias. If I add it to the assembler, the only thing I'd do is to change vzipq back to vzip and vuzpq to vuzp. |
Extended Description
This works fine with gcc:
thakis@ubu:
$ arm-linux-gnueabihf-g++ -c test.cc -mfpu=neon$ cat test.ccthakis@ubu:
void foo() {
asm volatile ("vuzpq.u16 q0, q1\n\t" : : :);
}
But clang says:
thakis$ ~/src/llvm-build/bin/clang -target arm-linux-androideabi -c -mfpu=neon test.cc
:1:2: error: invalid instruction
vuzpq.u16 q0, q1
^
1 error generated.
(this is used in skia)
The text was updated successfully, but these errors were encountered: