Some floating point test failures on Raspberry Pi 4 with Neon #41

KodrAus · 2020-10-16T00:31:42Z

Probably because #39 🙂

I've been testing stdsimd a bit on a Raspberry Pi 4 and ran into a couple float test failures with +neon:

cargo test --target armv7-unknown-linux-gnueabihf

succeeds, but:

RUSTFLAGS="-C target-feature=+neon" cargo test --target armv7-unknown-linux-gnueabihf

fails with:

failures:

---- ops_impl::f32::f32x16::fract_odd_floats stdout ----
thread 'ops_impl::f32::f32x16::fract_odd_floats' panicked at 'assertion failed: `(left == right)`
  left: `[0.0 (0), 0.0 (0), 0.0 (0), 0.0 (0), 0.0 (0), 0.0 (0), NaN (7fc00000), NaN (7fc00000), 0.000000000000000000000000000000000000011754944 (800000), -0.000000000000000000000000000000000000011754944 (80800000), 0.00000011920929 (34000000), -0.00000011920929 (b4000000), NaN (7fc00000), NaN (7fc00000), 0.33333206 (3eaaaa80), -0.33333206 (beaaaa80)]`,
 right: `[0.0 (0), 0.0 (0), 0.0 (0), 0.0 (0), 0.0 (0), 0.0 (0), NaN (7fc00000), NaN (7fc00000), 0.000000000000000000000000000000000000011754944 (800000), -0.000000000000000000000000000000000000011754944 (80800000), 0.00000011920929 (34000000), -0.00000011920929 (b4000000), NaN (7fc00000), NaN (ffc00000), 0.33333206 (3eaaaa80), -0.33333206 (beaaaa80)]`', crates/core_simd/tests/ops_impl/f32.rs:6:1
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- ops_impl::f32::f32x2::fract_odd_floats stdout ----
thread 'ops_impl::f32::f32x2::fract_odd_floats' panicked at 'assertion failed: `(left == right)`
  left: `[NaN (7fc00000), NaN (7fc00000)]`,
 right: `[NaN (7fc00000), NaN (ffc00000)]`', crates/core_simd/tests/ops_impl/f32.rs:3:1

---- ops_impl::f32::f32x4::fract_odd_floats stdout ----
thread 'ops_impl::f32::f32x4::fract_odd_floats' panicked at 'assertion failed: `(left == right)`
  left: `[NaN (7fc00000), NaN (7fc00000), 0.33333206 (3eaaaa80), -0.33333206 (beaaaa80)]`,
 right: `[NaN (7fc00000), NaN (ffc00000), 0.33333206 (3eaaaa80), -0.33333206 (beaaaa80)]`', crates/core_simd/tests/ops_impl/f32.rs:4:1

---- ops_impl::f32::f32x8::fract_odd_floats stdout ----
thread 'ops_impl::f32::f32x8::fract_odd_floats' panicked at 'assertion failed: `(left == right)`
  left: `[0.000000000000000000000000000000000000011754944 (800000), -0.000000000000000000000000000000000000011754944 (80800000), 0.00000011920929 (34000000), -0.00000011920929 (b4000000), NaN (7fc00000), NaN (7fc00000), 0.33333206 (3eaaaa80), -0.33333206 (beaaaa80)]`,
 right: `[0.000000000000000000000000000000000000011754944 (800000), -0.000000000000000000000000000000000000011754944 (80800000), 0.00000011920929 (34000000), -0.00000011920929 (b4000000), NaN (7fc00000), NaN (ffc00000), 0.33333206 (3eaaaa80), -0.33333206 (beaaaa80)]`', crates/core_simd/tests/ops_impl/f32.rs:5:1


failures:
    ops_impl::f32::f32x16::fract_odd_floats
    ops_impl::f32::f32x2::fract_odd_floats
    ops_impl::f32::f32x4::fract_odd_floats
    ops_impl::f32::f32x8::fract_odd_floats

test result: FAILED. 2242 passed; 4 failed; 0 ignored; 0 measured; 0 filtered out

cat /proc/cpuinfo

processor       : 0
model name      : ARMv7 Processor rev 3 (v7l)
BogoMIPS        : 108.00
Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32 
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xd08
CPU revision    : 3

processor       : 1
model name      : ARMv7 Processor rev 3 (v7l)
BogoMIPS        : 108.00
Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32 
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xd08
CPU revision    : 3

processor       : 2
model name      : ARMv7 Processor rev 3 (v7l)
BogoMIPS        : 108.00
Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32 
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xd08
CPU revision    : 3

processor       : 3
model name      : ARMv7 Processor rev 3 (v7l)
BogoMIPS        : 108.00
Features        : half thumb fastmult vfp edsp neon vfpv3 tls vfpv4 idiva idivt vfpd32 lpae evtstrm crc32 
CPU implementer : 0x41
CPU architecture: 7
CPU variant     : 0x0
CPU part        : 0xd08
CPU revision    : 3

Hardware        : BCM2711
Revision        : c03112
Serial          : 10000000421d9bd8

rustc --version

rustc 1.49.0-nightly (e160e5cb8 2020-10-14)

uname -a

Linux raspberrypi 5.4.51-v7l+ #1333 SMP Mon Aug 10 16:51:40 BST 2020 armv7l GNU/Linux

I haven't investigated this much yet, but just thought I'd drop it here as a start.

The text was updated successfully, but these errors were encountered:

programmerjake · 2020-10-16T01:46:32Z

looks like Neon treats NaN signs differently than scalar floats. LLVM currently doesn't specify which NaNs are returned, so I'd consider this an over-strict test case, rather than the code being tested being incorrect.

hsivonen · 2020-10-16T08:01:56Z

As noted on the PR, RUSTFLAGS="-C target-feature=+neon" cargo test --target armv7-unknown-linux-gnueabihf is bad for performance, not suitable for deployment, and, therefore, not really that useful to test. The fix is --target thumbv7neon-unknown-linux-gnueabihf, which uses a NEON-enabled std.

workingjubilee · 2020-10-16T17:49:10Z

Hum, I thought thumbv7neon was for "T32" code and armv7 +neon would be the logical choice for "A32" code, so I feel I am missing some nuance here. Nonetheless, if it is really that bad in perf and unsuitable for deployment, we should seriously consider raising a rustc issue to lint against using that target in that way.

KodrAus · 2020-10-16T22:48:53Z

Yeh, I think armv7-unknown-linux-gnueabihf was the target rustup suggested for me so I went with it.

RUSTFLAGS="-C target-feature=+neon" cargo test --target thumbv7neon-unknown-linux-gnueabihf

does actually still fail in the same way as armv7-unknown-linux-gnueabihf.

hsivonen · 2020-10-19T08:38:31Z

There currently isn't a 32-bit ARM Linux target that would have non-Thumb instruction encoding and std compiled with NEON enabled.

The reason is that Firefox for Android needed a NEON-enabled target and already used the Thumb instruction encoding. The NEON-enabled Linux targets are consistent with that to enable testing of the same kind of codegen on non-Android Linux.

There wasn't a justification to spend build resources for a non-Thumb variant target without anyone showing up with a use case.

hsivonen · 2020-10-19T08:39:22Z

RUSTFLAGS="-C target-feature=+neon" cargo test --target thumbv7neon-unknown-linux-gnueabihf

The -C target-feature=+neon bit should be unnecessary in this case, since it's baked into the target.

KodrAus · 2020-10-19T21:06:48Z

Ok, I think I misunderstood you when you suggested the fix was to use thumbv7neon-unknown-linux-gnueabihf instead of armv7-unknown-linux-gnueabihf. I thought you meant that was a fix for these specific failing tests, not a fix for deploying sub-optimal code. I'm not actually trying to deploy any "real" code on this target, just messing around with stdsimd on it 🙂

So on @programmerjake's point, are these tests too strict? Or do we want to keep assumptions like this around for now to try flush out differences?

programmerjake · 2020-10-20T05:16:11Z

all FP tests should test for equality using the following function:

pub fn to_canonical(v: f32, flush_subnormals_to_zero: bool) -> f32 {
    match v.classify() {
        FpCategory::Nan => {
            // return canonical NaN because, IIRC, LLVM doesn't distuinguish between
            // signaling/quiet NaNs. If not, adjust to return canonical signaling/quiet NaN.
            f32::NAN
        }
        FpCategory::Subnormal if flush_subnormals_to_zero => 0.0 * v, // convert to signed zero
        _ => v,
    }
}
pub fn is_same(a: f32, b: f32, flush_subnormals_to_zero: bool) -> bool {
    to_canonical(a, flush_subnormals_to_zero).to_bits()
    == to_canonical(b, flush_subnormals_to_zero).to_bits()
}

flush_subnormals_to_zero should be true on the platforms such as Arm NEON that flush subnormals to zero for SIMD operations. most platforms don't by default.

thomcc · 2020-10-20T05:18:11Z

That's certainly a choice that could be made. The code as-is was deliberately attempting to flush out such differences, as they are distinguishable differences between the SIMD and a scalar implementation.

Edit: FWIW I'm more supportive of this for subnormal than for NaN.

KodrAus mentioned this issue Oct 16, 2020

add armv7 target #42

Closed

KodrAus added the ISA-ARMv7 ARM's 7th generation of CPUs label Oct 16, 2020

calebzulawski mentioned this issue Jul 27, 2023

Fix subnormals #357

Merged

calebzulawski closed this as completed in #357 Sep 10, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Some floating point test failures on Raspberry Pi 4 with Neon #41

Some floating point test failures on Raspberry Pi 4 with Neon #41

KodrAus commented Oct 16, 2020

programmerjake commented Oct 16, 2020

hsivonen commented Oct 16, 2020

workingjubilee commented Oct 16, 2020

KodrAus commented Oct 16, 2020

hsivonen commented Oct 19, 2020 •

edited

Loading

hsivonen commented Oct 19, 2020

KodrAus commented Oct 19, 2020

programmerjake commented Oct 20, 2020

thomcc commented Oct 20, 2020 •

edited

Loading

Some floating point test failures on Raspberry Pi 4 with Neon #41

Some floating point test failures on Raspberry Pi 4 with Neon #41

Comments

KodrAus commented Oct 16, 2020

programmerjake commented Oct 16, 2020

hsivonen commented Oct 16, 2020

workingjubilee commented Oct 16, 2020

KodrAus commented Oct 16, 2020

hsivonen commented Oct 19, 2020 • edited Loading

hsivonen commented Oct 19, 2020

KodrAus commented Oct 19, 2020

programmerjake commented Oct 20, 2020

thomcc commented Oct 20, 2020 • edited Loading

hsivonen commented Oct 19, 2020 •

edited

Loading

thomcc commented Oct 20, 2020 •

edited

Loading