Skip to content

with_exposed_provenance(0).with_addr(addr) is compiled as gep null #131741

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
saethlin opened this issue Oct 15, 2024 · 5 comments
Open

with_exposed_provenance(0).with_addr(addr) is compiled as gep null #131741

saethlin opened this issue Oct 15, 2024 · 5 comments
Labels
A-strict-provenance Area: Strict provenance for raw pointers C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-opsem Relevant to the opsem team

Comments

@saethlin
Copy link
Member

Miri executes this program without error, but I believe our lowering to LLVM IR adds UB:

#![feature(strict_provenance, exposed_provenance)]
#[no_mangle]
pub fn from_exposed_null(addr: usize) -> *const u8 {
    let ptr = 0x0 as *const u8;
    ptr.with_addr(addr)
}

// Main for Miri
fn main() {
    let data = 0u8;
    let addr = std::ptr::addr_of!(data).expose_provenance();
    let ptr = from_exposed_null(addr);
    unsafe {
        println!("{}", *ptr);
    }
}

With optimizations enabled, LLVM cleans up from_exposed_null to

define noalias noundef ptr @from_exposed_null(i64 noundef %addr) unnamed_addr #0 !dbg !7 {
  %0 = getelementptr i8, ptr null, i64 %addr, !dbg !12
  ret ptr %0, !dbg !28
}

With -Cno-prepopulate-passes I believe the codegen also returns a pointer that definitely has no provenance, though it's just harder to read.
godbolt: https://godbolt.org/z/s414rfK44

I think this happens because codegen is implicitly const-propagating through the int-to-pointer cast via OperandValue. At least I don't see any other way to get from this MIR:

    bb0: {
        _2 = const 0_usize as *const u8 (PointerWithExposedProvenance);
        StorageLive(_3);
        StorageLive(_5);
        StorageLive(_6);
        StorageLive(_4);
        _4 = copy _2 as usize (Transmute);
        _3 = move _4 as isize (IntToInt);
        StorageDead(_4);
        _5 = copy _1 as isize (IntToInt);
        _6 = Sub(copy _5, copy _3);
        _0 = arith_offset::<u8>(move _2, move _6) -> [return: bb1, unwind unreachable];
    }

    bb1: {
        StorageLive(_7);
        _7 = copy _0 as *const () (PtrToPtr);
        StorageDead(_7);
        StorageDead(_6);
        StorageDead(_5);
        StorageDead(_3);
        return;
    }

To this LLVM IR

define noundef ptr @from_exposed_null(i64 noundef %addr) unnamed_addr #0 !dbg !7 {
start:
  %0 = alloca [8 x i8], align 8, !dbg !12
  %offset = sub i64 %addr, 0, !dbg !12
  call void @llvm.lifetime.start.p0(i64 8, ptr %0), !dbg !28
  %1 = getelementptr i8, ptr null, i64 %offset, !dbg !28
  store ptr %1, ptr %0, align 8, !dbg !28
  %self = load ptr, ptr %0, align 8, !dbg !28
  call void @llvm.lifetime.end.p0(i64 8, ptr %0), !dbg !28
  ret ptr %self, !dbg !34
}

godbolt: https://godbolt.org/z/MjPvcdj9h

@saethlin saethlin added the C-bug Category: This is a bug. label Oct 15, 2024
@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Oct 15, 2024
@jwong101
Copy link
Contributor

jwong101 commented Oct 15, 2024

This ends up not mattering, since LLVM also folds inttoptr i64 0 to ptr to null:

define void @src(i64 noundef) {
start:
  %1 = inttoptr i64 0 to ptr
  %2 = getelementptr i8, ptr %1, i64 %0
  store i8 0, ptr %2
  ret void
}

With -passes='early-cse,instcombine' -debug:

; EarlyCSE Simplify:   %1 = inttoptr i64 0 to ptr  to: ptr null
define void @src(i64 noundef %0) {
  %1 = getelementptr i8, ptr null, i64 %0
  store i8 poison, ptr %1, align 1
  ret void
}

Alive2 thinks this is illegal:

https://alive2.llvm.org/ce/z/EJxazG

However, their LangRef also says that:

  • An integer constant other than zero or a pointer value returned from a function not defined within LLVM may be associated with address ranges allocated through mechanisms other than those provided by LLVM.
  • A pointer value formed from a scalar getelementptr operation is based on the pointer-typed operand of the getelementptr.
  • A pointer value formed by an inttoptr is based on all pointer values that contribute (directly or indirectly) to the computation of the pointer’s value.

So they might have a special case where inttoptr i64 0 to ptr has no provenance? I'm not sure what their intended semantics are for that.

@saethlin
Copy link
Member Author

saethlin commented Oct 15, 2024

This ends up not mattering

It definitely matters. There's no hope of the full compiler getting this right if rustc corrupts the program. We should have an LLVM bug filed for this; can you file one or link an existing issue?

@jwong101
Copy link
Contributor

There appears to be another issue about this in the UCG repo here: rust-lang/unsafe-code-guidelines#507.

Also, I think this is a similar issue about i2p 0 != null: #107326

On the LLVM side:

I'll see if I can find any issues about i2p 0 != null. If not I'll open a new issue.

I think the underlying cause is this line of code:
https://github.com/llvm/llvm-project/blob/ec78f0da0e9b1b8e2b2323e434ea742e272dd913/llvm/lib/IR/ConstantFold.cpp#L146

  if (V->isNullValue() && !DestTy->isX86_AMXTy() &&
      opc != Instruction::AddrSpaceCast)
    return Constant::getNullValue(DestTy);

which gets called by simplifyInstruction on int2ptr.

@lolbinarycat lolbinarycat added A-strict-provenance Area: Strict provenance for raw pointers WG-ucg labels Oct 16, 2024
@saethlin saethlin added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-opsem Relevant to the opsem team and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. WG-ucg labels Oct 18, 2024
@RalfJung
Copy link
Member

RalfJung commented Nov 3, 2024

Cc @rust-lang/opsem @nikic

@nikic
Copy link
Contributor

nikic commented Nov 3, 2024

Yeah, the fact that inttoptr 0 folds to null is a "well known" issue in LLVM, and as usual, hard to fix :) We do go out of the way not to produce ptrtoint 0 in cases where this is likely to matter in practice.

See also AliveToolkit/alive2#929 for some related discussion from an alive2 perspective. I guess the memset(0) question at the end is resolved on the Rust side by saying that transmuting that memset to a pointer wouldn't have provenance anyway, but on the LLVM side this is still an unresolved issue (cf byte type).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-strict-provenance Area: Strict provenance for raw pointers C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-opsem Relevant to the opsem team
Projects
None yet
Development

No branches or pull requests

6 participants