Segfault when using rust inside a go program #64834

Closed
joelwurtz opened this issue Sep 27, 2019 · 1 comment
Labels
C-bug Category: This is a bug. E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example

Comments

@joelwurtz

This is a really weird bug that occurs in one of our programs. I managed to make a reproducible example in this repository: https://github.com/joelwurtz/segfault-golang-with-rust

This only happens when calling a specific function of the https://github.com/servo/html5ever library with a specific parameter; I also opened an issue on that library: servo/html5ever#393

However, I'm wondering whether the bug actually comes from rust-lang.

What it does:

  • We create a static library (on the musl target)
  • We link this static library into a Go program with musl gcc
  • We create a signal channel before calling the exposed API
  • We then call this function, which causes the segfault (it happens on the LocalName::from(&*attribute_name) call in the https://github.com/servo/html5ever library, with a local name that is not predefined)

This results in a segfault 100% of the time.
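Roughly, the library side of the call looks like this (a simplified sketch, not an exact copy of the code in the repository; the real entry point forwards through a second function):

use std::ffi::CStr;
use std::os::raw::c_char;

use html5ever::LocalName; // string_cache Atom type re-exported by html5ever

// C ABI entry point, called from the Go program through cgo.
// `data_cstr` is a NUL-terminated string chosen by the caller.
#[no_mangle]
pub extern "C" fn api_do_segfault(data_cstr: *const c_char) {
    let data = unsafe { CStr::from_ptr(data_cstr) }
        .to_str()
        .expect("caller passed invalid UTF-8");

    // Interning a name that is not listed in local_names.txt forces the
    // dynamic string cache to be initialized lazily; that is the point
    // where the crash is observed when a signal channel was created first.
    let _name = LocalName::from(data);
}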

There are some workarounds for this:

  • Adding the attribute to the local_names.txt file fixes the problem
  • Calling the exposed API before creating the signal handler fixes the problem

I really don't know what happens internally, and it's difficult to trace due to the way this is compiled.

I'm wondering whether using compiler_builtins version 0.1.19 would help trace this bug, given rust-lang/compiler-builtins@985a430

If you have any insight on how I can produce a better debugging log, I will gladly do so.

@jonas-schievink jonas-schievink added C-bug Category: This is a bug. E-needs-mcve Call for participation: This issue has a repro, but needs a Minimal Complete and Verifiable Example labels Sep 27, 2019
@nagisa
Member

nagisa commented Sep 29, 2019

This is the relevant stack trace:

#0  __rust_probestack () at /cargo/registry/src/github.1485827954.workers.dev-1ecc6299db9ec823/compiler_builtins-0.1.18/src/probestack.rs:55
#1  0x000000000050134a in std::sync::mutex::Mutex<T>::new (t=...) at /rustc/488381ce9ef0ceabe83b73127c659e5d38137df0/src/libstd/sync/mutex.rs:168
#2  0x0000000000501707 in <string_cache::atom::STRING_CACHE as core::ops::deref::Deref>::deref::__static_ref_initialize () at /home/nagisa/.cargo/registry/src/github.1485827954.workers.dev-1ecc6299db9ec823/string_cache-0.7.3/src/atom.rs:46
#3  core::ops::function::FnOnce::call_once () at /rustc/488381ce9ef0ceabe83b73127c659e5d38137df0/src/libcore/ops/function.rs:227
#4  0x000000000050117c in lazy_static::lazy::Lazy<T>::get::{{closure}} () at /home/nagisa/.cargo/registry/src/github.1485827954.workers.dev-1ecc6299db9ec823/lazy_static-1.4.0/src/inline_lazy.rs:31
#5  0x000000000050132e in std::sync::once::Once::call_once::{{closure}} () at /rustc/488381ce9ef0ceabe83b73127c659e5d38137df0/src/libstd/sync/once.rs:225
#6  0x00000000005325a8 in std::sync::once::Once::call_inner () at src/libstd/sync/once.rs:392
#7  0x00000000005012b3 in std::sync::once::Once::call_once (self=0x70aee8 <<string_cache::atom::STRING_CACHE as core::ops::deref::Deref>::deref::__stability::LAZY+32784>, f=...)
    at /rustc/488381ce9ef0ceabe83b73127c659e5d38137df0/src/libstd/sync/once.rs:225
#8  0x0000000000505afb in lazy_static::lazy::Lazy<T>::get (self=0x702ed8 <<string_cache::atom::STRING_CACHE as core::ops::deref::Deref>::deref::__stability::LAZY>, f=0x710060)
    at /home/nagisa/.cargo/registry/src/github.1485827954.workers.dev-1ecc6299db9ec823/lazy_static-1.4.0/src/inline_lazy.rs:30
#9  <string_cache::atom::STRING_CACHE as core::ops::deref::Deref>::deref::__stability () at <::lazy_static::__lazy_static_internal macros>:16
#10 <string_cache::atom::STRING_CACHE as core::ops::deref::Deref>::deref (self=0x600e30) at <::lazy_static::__lazy_static_internal macros>:18
#11 0x00000000004d628c in <string_cache::atom::Atom<Static> as core::convert::From<alloc::borrow::Cow<str>>>::from (string_to_add=...) at /home/nagisa/.cargo/registry/src/github.1485827954.workers.dev-1ecc6299db9ec823/string_cache-0.7.3/src/atom.rs:310
#12 0x00000000004d6875 in <string_cache::atom::Atom<Static> as core::convert::From<&str>>::from (string_to_add=...) at /home/nagisa/.cargo/registry/src/github.1485827954.workers.dev-1ecc6299db9ec823/string_cache-0.7.3/src/atom.rs:323
#13 0x00000000004d5fb0 in segfaulthtml5evergolang::do_segfault (data=...) at src/lib.rs:32
#14 0x00000000004d5c9b in api_do_segfault (data_cstr=0x710040 "segfault") at src/lib.rs:9
#15 0x00000000004c9e10 in runtime.asmcgocall () at /nix/store/vdlp4402c4vdk86w74rk1njx5vicdlag-go-1.12.9/share/go/src/runtime/asm_amd64.s:635
#16 0x00007ffff5b1a8f8 in ?? ()
#17 0x00000000004c71b8 in runtime.goready.func1 () at /nix/store/vdlp4402c4vdk86w74rk1njx5vicdlag-go-1.12.9/share/go/src/runtime/proc.go:312
#18 0x000000c000000180 in ?? ()
#19 0x00000000004a5c40 in ?? () at /nix/store/vdlp4402c4vdk86w74rk1njx5vicdlag-go-1.12.9/share/go/src/runtime/proc.go:1082
#20 0x0000000000000027 in ?? ()
#21 0x0000000000020000 in ?? ()
#22 0x0000000000000000 in ?? ()

Mutex<T>::new requests 0x18098 bytes of stack. That's around 96 kB, which is fairly reasonable, given that stacks are usually megabytes in size. This request most likely occurs because T is very large. __rust_probestack attempts to verify that there is sufficient stack, which there isn't, which is why this is failing. As the Go runtime is responsible for allocating the stack in this case, it looks to me like insufficient stack was allocated.
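To illustrate the mechanism with a hypothetical type (a stand-in of roughly the right size, not the actual string_cache internals): a Mutex::new over a ~96 kB value gives the initializing function a ~96 kB frame, and with stack probes enabled __rust_probestack has to touch every page of that frame before it is used.

use std::sync::Mutex;

// Hypothetical stand-in for a large interner table:
// 12 * 1024 * 8 bytes = 0x18000, roughly the request seen above.
struct BigTable {
    buckets: [u64; 12 * 1024],
}

fn init_cache() -> Mutex<BigTable> {
    // The table is built by value in this frame and then moved into the
    // mutex, so the function needs ~96 kB of stack. __rust_probestack
    // touches each page of that range up front and faults if one of them
    // is not mapped readable and writable.
    Mutex::new(BigTable { buckets: [0u64; 12 * 1024] })
}

fn main() {
    let cache = init_cache();
    println!("{} bytes", std::mem::size_of_val(&*cache.lock().unwrap()));
}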


For some arbitrary execution, on entry:

(gdb) p $rsp
$2 = (*mut ()) 0x7ffff5b02370

and it eventually segfaults when the stack probing routine reaches…

(gdb) p $rsp
$3 = (*mut ()) 0x7ffff5af9370

With a stack size request of 0x18000, all pages between address 0x7ffff5b02370 and address 0x7ffff5aea370 must be valid to read and write. Note that the segfaulting address is well within this range.

Looking at the memory maps for the process:

...
7ffff5af8000-7ffff5afa000 ---p 00000000 00:00 0
7ffff5afa000-7ffff5b1b000 rw-p 00000000 00:00 0 
...

You can see that the stack address which did not probe successfully lies inside the non-readable, non-writable page(s). These are usually called "guard pages" and are used to ensure that no code writes outside its stack. Thus the too-small-stack hypothesis is confirmed.
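As a sanity check, the numbers fit together; plugging the addresses from the gdb session and the memory map into a few lines (values copied from this comment, nothing new):

fn main() {
    let entry_rsp: u64 = 0x7ffff5b02370;  // rsp on entry
    let request: u64 = 0x18000;           // stack requested by Mutex::<T>::new
    let fault_rsp: u64 = 0x7ffff5af9370;  // rsp when the probe segfaulted
    let guard = 0x7ffff5af8000u64..0x7ffff5afa000u64; // the ---p mapping

    let lowest_probed = entry_rsp - request;
    println!("probed range: {:#x}..={:#x}", lowest_probed, entry_rsp);
    println!("fault inside probed range: {}",
             (lowest_probed..=entry_rsp).contains(&fault_rsp));
    println!("fault inside guard pages:  {}", guard.contains(&fault_rsp));
}

Both checks print true: the faulting address is inside the range the probe has to touch, and that address sits in the unmapped guard pages.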

Closing as not-a-bug.

cc @SimonSapin
