Skip to content

rustc 1.18.0 misaligned loads and stores on SPARC #43346

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
dhduvall opened this issue Jul 19, 2017 · 7 comments
Closed

rustc 1.18.0 misaligned loads and stores on SPARC #43346

dhduvall opened this issue Jul 19, 2017 · 7 comments
Labels
O-SPARC Target: SPARC processors

Comments

@dhduvall
Copy link

My apologies for not catching this sooner.

The struct alignment and packing that landed (well, was re-enabled) with 1.18.0 produces either improperly aligned structs or improperly aligned loads and stores (or both) on SPARC. The following test program demonstrates it (extracted from where I originally saw the compiler die):

pub struct Blake2bCtx {
    a: [u8; 8],
    b: u16,
    c: u8,
}

#[inline(never)]
fn blake2b_compress(ctx: &mut Blake2bCtx) {

    // Re-interpret the input buffer in the state as an array of little-endian
    // u64s, converting them to machine endianness. It's OK to modify the
    // buffer in place since this is the last time this data will be accessed
    // before it's overwritten.

    let m: &mut [u64; 1] = unsafe {
        let t: &mut [u8; 8] = &mut ctx.a;
        ::std::mem::transmute(t)
    };

    for word in &mut m[..] {
        *word = *word + 1;
    }
}

fn main() {
    let mut ctx = Blake2bCtx {
        a: [0; 8],
        b: 0,
        c: 0,
    };

    blake2b_compress(&mut ctx)
}

The disassembly looks like this:

_ZN5blake16blake2b_compress17h54f2249749134ad6E()
    blake::blake2b_compress::h54f2249749134ad6:      9d e3 bf 80  save      %sp, -0x80, %sp
    blake::blake2b_compress::h54f2249749134ad6+0x4:  f2 5e 20 02  ldx       [%i0 + 0x2], %i1
    blake::blake2b_compress::h54f2249749134ad6+0x8:  b2 06 60 01  add       %i1, 0x1, %i1
    blake::blake2b_compress::h54f2249749134ad6+0xc:  f2 76 20 02  stx       %i1, [%i0 + 0x2]
    blake::blake2b_compress::h54f2249749134ad6+0x10: 81 c7 e0 08  ret
    blake::blake2b_compress::h54f2249749134ad6+0x14: 81 e8 00 00  restore

Note the ldx of [%i0 + 0x2]. That would be valid if %i0 were itself misaligned, but it's not, and ldx requires an 8-byte alignment. I can attach the entire IR if necessary, but I believe these are the relevant bits:

; ModuleID = 'blake.cgu-0.rs'
source_filename = "blake.cgu-0.rs"
target datalayout = "E-m:e-i64:64-n32:64-S128"
target triple = "sparcv9-sun-solaris"

%Blake2bCtx = type { i16, [0 x i8], [8 x i8], [0 x i8], i8, [1 x i8] }
%"core::slice::IterMut<u64>" = type { i64*, [0 x i8], i64*, [0 x i8], %"core::marker::PhantomData<&mut u64>", [0 x i8] }
%"core::marker::PhantomData<&mut u64>" = type {}

@__rustc_debug_gdb_scripts_section__ = internal unnamed_addr constant [34 x i8] c"\01gdb_load_rust_pretty_printers.py\00", section ".debug_gdb_scripts", align 1

; Function Attrs: noinline nounwind uwtable
define internal fastcc void @_ZN5blake16blake2b_compress17h54f2249749134ad6E(%Blake2bCtx* dereferenceable(12)) unnamed_addr #0 !dbg !5 {
bb6:
  tail call void @llvm.dbg.value(metadata %Blake2bCtx* %0, i64 0, metadata !23, metadata !57), !dbg !58
  tail call void @llvm.dbg.value(metadata %Blake2bCtx* %0, i64 0, metadata !24, metadata !57), !dbg !59
  %1 = getelementptr inbounds %Blake2bCtx, %Blake2bCtx* %0, i64 0, i32 2, !dbg !60
  %2 = bitcast [8 x i8]* %1 to i64*, !dbg !61
  tail call void @llvm.dbg.value(metadata %"core::slice::IterMut<u64>"* undef, i64 0, metadata !39, metadata !62), !dbg !63
  tail call void @llvm.dbg.value(metadata i64* %2, i64 0, metadata !54, metadata !57), !dbg !64
  %3 = load i64, i64* %2, align 8, !dbg !65
  %4 = add i64 %3, 1, !dbg !65
  store i64 %4, i64* %2, align 8, !dbg !65
  tail call void @llvm.dbg.value(metadata %"core::slice::IterMut<u64>"* undef, i64 0, metadata !39, metadata !62), !dbg !63
  ret void, !dbg !66
}

I don't see anything obviously wrong in the code, but my knowledge here is pretty slim, so if it's obvious to someone else, I'd love the help. Otherwise, I'm probably going to be putting debugging statements into rustc::ty::layout::Struct::new, hope I'm poking around in the right area, and doing a lot of recompiling.

cc @binarycrusader

@jonas-schievink
Copy link
Contributor

You're transmuting from a u8 array, which has no alignment requirements, to a u64 array, which has an alignment requirement of 8. There was no guarantee that this would work, even before that patch, since the Blake2bCtx struct only has an alignment of 2 due to the u16 inside.

@dhduvall
Copy link
Author

dhduvall commented Jul 19, 2017

Perhaps I failed to trim down blake2b.rs correctly, but this is the same error I'm seeing in the compiler, out of https://github.com/rust-lang/rust/blob/master/src/librustc_data_structures/blake2b.rs

Though there the loads are more complex:

    rustc_data_structures::blake2b::blake2b_compress::hb082a32bbe0c2ce5+0x14:  ca 5e 00 00  ldx       [%i0], %g5
    rustc_data_structures::blake2b::blake2b_compress::hb082a32bbe0c2ce5+0x18:  c8 5e 20 08  ldx       [%i0 + 0x8], %g4
    rustc_data_structures::blake2b::blake2b_compress::hb082a32bbe0c2ce5+0x1c:  c6 5e 20 10  ldx       [%i0 + 0x10], %g3
    rustc_data_structures::blake2b::blake2b_compress::hb082a32bbe0c2ce5+0x20:  fa 5e 20 18  ldx       [%i0 + 0x18], %i5
    rustc_data_structures::blake2b::blake2b_compress::hb082a32bbe0c2ce5+0x24:  e0 5e 20 20  ldx       [%i0 + 0x20], %l0
    rustc_data_structures::blake2b::blake2b_compress::hb082a32bbe0c2ce5+0x28:  f8 5e 20 28  ldx       [%i0 + 0x28], %i4
    rustc_data_structures::blake2b::blake2b_compress::hb082a32bbe0c2ce5+0x2c:  f6 5e 20 30  ldx       [%i0 + 0x30], %i3
    rustc_data_structures::blake2b::blake2b_compress::hb082a32bbe0c2ce5+0x30:  d2 5e 20 5a  ldx       [%i0 + 0x5a], %o1
    rustc_data_structures::blake2b::blake2b_compress::hb082a32bbe0c2ce5+0x34:  c4 5e 20 38  ldx       [%i0 + 0x38], %g2
    rustc_data_structures::blake2b::blake2b_compress::hb082a32bbe0c2ce5+0x38:  e2 5e 20 40  ldx       [%i0 + 0x40], %l1
    rustc_data_structures::blake2b::blake2b_compress::hb082a32bbe0c2ce5+0x3c:  d0 5e 20 48  ldx       [%i0 + 0x48], %o0

(See the +0x5a which is where we get the SIGBUS.)

@jonas-schievink
Copy link
Contributor

Ah, ok. That would have worked before the patch, of course.

@dhduvall
Copy link
Author

Yes. Or by using -Z fuel=<mod>=0.

@sanxiyn sanxiyn added the O-SPARC Target: SPARC processors label Jul 20, 2017
@dhduvall
Copy link
Author

It doesn't look like I can use RUSTFLAGS to inject -Z fuel=rustc_data_structures::blake2b=0, since my bootstrapping compiler is 1.17.0 and doesn't understand that.

I reimplemented the transmute() in blake2b_compress with a pair of for loops and lots of shifting and oring, and the compiler works again, even if it uses a lot more instructions doing so (and blake2b_selftest() passes). If there's a faster way that's akin to transmute(), but works, I've no idea. I looked at the various alternatives in the transmute() docs, but when I got a warning about a non-scalar cast, I gave up. I don't know how performance-critical this is, anyway.

I could, I suppose, build a patched 1.18.0 and use that for the bootstrap so that the above RUSTFLAGS workaround could then work to build a good compiler against the existing sources, but I keep worrying about the use of -Z going away, and don't really want to depend on that.

I'll leave this open until someone else wants to pick it up. If anyone thinks it'd be worthwhile for me to simply submit a PR with my slow code, and get it code-reviewed into existence as folks have time, I'm happy to do that.

@ollie27
Copy link
Member

ollie27 commented Jul 20, 2017

I think the simplest workaround would be to add #[repr(C)] to Blake2bCtx. If you can confirm that that works then it would be worth submitting that. There's probably a better way of fixing this though like changing the type of Blake2bCtx.b to [u64; 16] because transmuting from &mut [u64; 16] to &mut [u8; 128] should be safe.

@dhduvall
Copy link
Author

Indeed, #[repr(C)] does the trick.

dhduvall pushed a commit to dhduvall/rust that referenced this issue Jul 25, 2017
On SPARC, optimization fuel ends up emitting incorrect load and store
instructions for the transmute() call in blake2b_compress().  If we
force Blake2bCtx to be repr(C), the problem disappears.

Fixes rust-lang#43346
Mark-Simulacrum added a commit to Mark-Simulacrum/rust that referenced this issue Jul 26, 2017
Constrain the layout of Blake2bCtx for proper SPARC compilation

On SPARC, optimization fuel ends up emitting incorrect load and store
instructions for the transmute() call in blake2b_compress().  If we
force Blake2bCtx to be repr(C), the problem disappears.

Fixes rust-lang#43346
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
O-SPARC Target: SPARC processors
Projects
None yet
Development

No branches or pull requests

4 participants