Skip to content

Some floating point test failures on Raspberry Pi 4 with Neon #41

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
KodrAus opened this issue Oct 16, 2020 · 9 comments · Fixed by #357
Closed

Some floating point test failures on Raspberry Pi 4 with Neon #41

KodrAus opened this issue Oct 16, 2020 · 9 comments · Fixed by #357
Labels
ISA-ARMv7 ARM's 7th generation of CPUs

Comments

@KodrAus
Copy link
Contributor

KodrAus commented Oct 16, 2020

Probably because #39 🙂

I've been testing stdsimd a bit on a Raspberry Pi 4 and ran into a couple float test failures with +neon:

cargo test --target armv7-unknown-linux-gnueabihf

succeeds, but:

RUSTFLAGS="-C target-feature=+neon" cargo test --target armv7-unknown-linux-gnueabihf

fails with:

failures:

---- ops_impl::f32::f32x16::fract_odd_floats stdout ----
thread 'ops_impl::f32::f32x16::fract_odd_floats' panicked at 'assertion failed: `(left == right)`
  left: `[0.0 (0), 0.0 (0), 0.0 (0), 0.0 (0), 0.0 (0), 0.0 (0), NaN (7fc00000), NaN (7fc00000), 0.000000000000000000000000000000000000011754944 (800000), -0.000000000000000000000000000000000000011754944 (80800000), 0.00000011920929 (34000000), -0.00000011920929 (b4000000), NaN (7fc00000), NaN (7fc00000), 0.33333206 (3eaaaa80), -0.33333206 (beaaaa80)]`,
 right: `[0.0 (0), 0.0 (0), 0.0 (0), 0.0 (0), 0.0 (0), 0.0 (0), NaN (7fc00000), NaN (7fc00000), 0.000000000000000000000000000000000000011754944 (800000), -0.000000000000000000000000000000000000011754944 (80800000), 0.00000011920929 (34000000), -0.00000011920929 (b4000000), NaN (7fc00000), NaN (ffc00000), 0.33333206 (3eaaaa80), -0.33333206 (beaaaa80)]`', crates/core_simd/tests/ops_impl/f32.rs:6:1
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- ops_impl::f32::f32x2::fract_odd_floats stdout ----
thread 'ops_impl::f32::f32x2::fract_odd_floats' panicked at 'assertion failed: `(left == right)`
  left: `[NaN (7fc00000), NaN (7fc00000)]`,
 right: `[NaN (7fc00000), NaN (ffc00000)]`', crates/core_simd/tests/ops_impl/f32.rs:3:1

---- ops_impl::f32::f32x4::fract_odd_floats stdout ----
thread 'ops_impl::f32::f32x4::fract_odd_floats' panicked at 'assertion failed: `(left == right)`
  left: `[NaN (7fc00000), NaN (7fc00000), 0.33333206 (3eaaaa80), -0.33333206 (beaaaa80)]`,
 right: `[NaN (7fc00000), NaN (ffc00000), 0.33333206 (3eaaaa80), -0.33333206 (beaaaa80)]`', crates/core_simd/tests/ops_impl/f32.rs:4:1

---- ops_impl::f32::f32x8::fract_odd_floats stdout ----
thread 'ops_impl::f32::f32x8::fract_odd_floats' panicked at 'assertion failed: `(left == right)`
  left: `[0.000000000000000000000000000000000000011754944 (800000), -0.000000000000000000000000000000000000011754944 (80800000), 0.00000011920929 (34000000), -0.00000011920929 (b4000000), NaN (7fc00000), NaN (7fc00000), 0.33333206 (3eaaaa80), -0.33333206 (beaaaa80)]`,
 right: `[0.000000000000000000000000000000000000011754944 (800000), -0.000000000000000000000000000000000000011754944 (80800000), 0.00000011920929 (34000000), -0.00000011920929 (b4000000), NaN (7fc00000), NaN (ffc00000), 0.33333206 (3eaaaa80), -0.33333206 (beaaaa80)]`', crates/core_simd/tests/ops_impl/f32.rs:5:1


failures:
    ops_impl::f32::f32x16::fract_odd_floats
    ops_impl::f32::f32x2::fract_odd_floats
    ops_impl::f32::f32x4::fract_odd_floats
    ops_impl::f32::f32x8::fract_odd_floats

test result: FAILED. 2242 passed; 4 failed; 0 ignored; 0 measured; 0 filtered out
cat /proc/cpuinfo

processor       : 0
model name      : ARMv7 Processor rev 3 (v7l)
BogoMIPS        : 108.00
Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32 
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xd08
CPU revision    : 3

processor       : 1
model name      : ARMv7 Processor rev 3 (v7l)
BogoMIPS        : 108.00
Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32 
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xd08
CPU revision    : 3

processor       : 2
model name      : ARMv7 Processor rev 3 (v7l)
BogoMIPS        : 108.00
Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32 
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xd08
CPU revision    : 3

processor       : 3
model name      : ARMv7 Processor rev 3 (v7l)
BogoMIPS        : 108.00
Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32 
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xd08
CPU revision    : 3

Hardware        : BCM2711
Revision        : c03112
Serial          : 10000000421d9bd8
rustc --version

rustc 1.49.0-nightly (e160e5cb8 2020-10-14)
uname -a

Linux raspberrypi 5.4.51-v7l+ #1333 SMP Mon Aug 10 16:51:40 BST 2020 armv7l GNU/Linux

I haven't investigated this much yet, but just thought I'd drop it here as a start.

@programmerjake
Copy link
Member

looks like Neon treats NaN signs differently than scalar floats. LLVM currently doesn't specify which NaNs are returned, so I'd consider this an over-strict test case, rather than the code being tested being incorrect.

@hsivonen
Copy link
Member

As noted on the PR, RUSTFLAGS="-C target-feature=+neon" cargo test --target armv7-unknown-linux-gnueabihf is bad for performance, not suitable for deployment, and, therefore, not really that useful to test. The fix is --target thumbv7neon-unknown-linux-gnueabihf, which uses a NEON-enabled std.

@workingjubilee
Copy link
Member

Hum, I thought thumbv7neon was for "T32" code and armv7 +neon would be the logical choice for "A32" code, so I feel I am missing some nuance here. Nonetheless, if it is really that bad in perf and unsuitable for deployment, we should seriously consider raising a rustc issue to lint against using that target in that way.

@KodrAus
Copy link
Contributor Author

KodrAus commented Oct 16, 2020

Yeh, I think armv7-unknown-linux-gnueabihf was the target rustup suggested for me so I went with it.

RUSTFLAGS="-C target-feature=+neon" cargo test --target thumbv7neon-unknown-linux-gnueabihf

does actually still fail in the same way as armv7-unknown-linux-gnueabihf.

@KodrAus KodrAus added the ISA-ARMv7 ARM's 7th generation of CPUs label Oct 16, 2020
@hsivonen
Copy link
Member

hsivonen commented Oct 19, 2020

There currently isn't a 32-bit ARM Linux target that would have non-Thumb instruction encoding and std compiled with NEON enabled.

The reason is that Firefox for Android needed a NEON-enabled target and already used the Thumb instruction encoding. The NEON-enabled Linux targets are consistent with that to enable testing of the same kind of codegen on non-Android Linux.

There wasn't a justification to spend build resources for a non-Thumb variant target without anyone showing up with a use case.

@hsivonen
Copy link
Member

RUSTFLAGS="-C target-feature=+neon" cargo test --target thumbv7neon-unknown-linux-gnueabihf

The -C target-feature=+neon bit should be unnecessary in this case, since it's baked into the target.

@KodrAus
Copy link
Contributor Author

KodrAus commented Oct 19, 2020

Ok, I think I misunderstood you when you suggested the fix was to use thumbv7neon-unknown-linux-gnueabihf instead of armv7-unknown-linux-gnueabihf. I thought you meant that was a fix for these specific failing tests, not a fix for deploying sub-optimal code. I'm not actually trying to deploy any "real" code on this target, just messing around with stdsimd on it 🙂

So on @programmerjake's point, are these tests too strict? Or do we want to keep assumptions like this around for now to try flush out differences?

@programmerjake
Copy link
Member

all FP tests should test for equality using the following function:

pub fn to_canonical(v: f32, flush_subnormals_to_zero: bool) -> f32 {
    match v.classify() {
        FpCategory::Nan => {
            // return canonical NaN because, IIRC, LLVM doesn't distuinguish between
            // signaling/quiet NaNs. If not, adjust to return canonical signaling/quiet NaN.
            f32::NAN
        }
        FpCategory::Subnormal if flush_subnormals_to_zero => 0.0 * v, // convert to signed zero
        _ => v,
    }
}
pub fn is_same(a: f32, b: f32, flush_subnormals_to_zero: bool) -> bool {
    to_canonical(a, flush_subnormals_to_zero).to_bits()
    == to_canonical(b, flush_subnormals_to_zero).to_bits()
}

flush_subnormals_to_zero should be true on the platforms such as Arm NEON that flush subnormals to zero for SIMD operations. most platforms don't by default.

@thomcc
Copy link
Member

thomcc commented Oct 20, 2020

That's certainly a choice that could be made. The code as-is was deliberately attempting to flush out such differences, as they are distinguishable differences between the SIMD and a scalar implementation.

Edit: FWIW I'm more supportive of this for subnormal than for NaN.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ISA-ARMv7 ARM's 7th generation of CPUs
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants