WIP: Emulate the vsyscall page in userspace in the x86_64 Docker image #157
Since some recent distros are shipping with vsyscall=none by default, the manylinux1 Docker image doesn't work. Fortunately, we can emulate everything in userspace by catching segmentation faults for the vsyscall addresses and forcing the program to execute a normal syscall instead. Install a global preload library to do this, and also attempt to keep other segmentation fault handlers working. This is brittle and isn't intended for anything other than running the Docker image long enough to build some wheels.
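A minimal sketch of the dispatch step described above, not the code from this PR: given the address that faulted, decide which real syscall it stands for. The three addresses are the fixed entry points of the legacy x86_64 vsyscall page and the numbers are the x86_64 syscall numbers; the function name `vsyscall_to_sysno` and the macro names are made up for illustration.

```c
#include <stdint.h>

/* Fixed entry points of the legacy x86_64 vsyscall page. */
#define VSYSCALL_GETTIMEOFDAY 0xffffffffff600000ULL
#define VSYSCALL_TIME         0xffffffffff600400ULL
#define VSYSCALL_GETCPU       0xffffffffff600800ULL

/* x86_64 syscall numbers, spelled out so the sketch compiles on any host. */
enum { NR_GETTIMEOFDAY = 96, NR_TIME = 201, NR_GETCPU = 309 };

/*
 * Hypothetical helper: map a faulting address to the syscall that should be
 * issued instead. Returns -1 for any address that is not a vsyscall entry
 * point, in which case the fault is a genuine segfault and should be passed
 * on to whatever handler the program installed.
 */
long vsyscall_to_sysno(uint64_t fault_addr)
{
    switch (fault_addr) {
    case VSYSCALL_GETTIMEOFDAY: return NR_GETTIMEOFDAY;
    case VSYSCALL_TIME:         return NR_TIME;
    case VSYSCALL_GETCPU:       return NR_GETCPU;
    default:                    return -1;
    }
}
```

In a real SIGSEGV handler (installed with SA_SIGINFO), the address would presumably come from `si_addr` or the saved instruction pointer, the arguments from the saved registers, and the handler would then have to store the result and emulate the `ret` back to the caller. That part is omitted here because it depends on the details of the signal frame.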
Oh my. |
From The Linux Programming Interface, p. 452:
SIGBUS, SIGFPE, SIGILL, and SIGSEGV can be generated as a consequence of a hardware exception.... SUSv3 specifies that the behavior of a process is undefined if it returns from a handler for the signal, or if it ignores it or blocks the signal...
* Blocking the signal: ... On Linux 2.4 and earlier, the kernel simply ignores attempts to block a hardware-generated signal; the signal is delivered to the process anyway, and then either terminates the process or is caught by a signal handler, if one has been established. Starting with Linux 2.6, if the signal is blocked, then the process is always immediately killed by that signal, even if the process has installed a handler for the signal. (The rationale for the Linux 2.6 change in the treatment of blocked hardware-generated signals was that the Linux 2.4 behavior hid bugs and could cause deadlocks in threaded programs.)
Some programs, like rpm, aggressively block signals with masks that include SIGSEGV. Under normal circumstances with Linux 2.6 or later, a SIGSEGV raised while it is blocked always terminates the process. However, with our user-space SIGSEGV-based vsyscall handling, sometimes these programs terminate, as described above, while other times they resume execution at the instruction that accessed the offending address and enter an infinite loop. Presumably our signal handler would have run with Linux 2.4!
This commit patches sigprocmask(2) to remove SIGSEGV from a new signal set in a way that should be invisible to the program that's installing it.
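A rough sketch of the sigprocmask(2) filtering idea, under stated assumptions: the names `strip_sigsegv` and `filtered_sigprocmask` are hypothetical, and in an actual preload library the wrapper would be named `sigprocmask` and forward to the real one (for example via `dlsym(RTLD_NEXT, ...)`). It only shows the removal of SIGSEGV from the incoming set, not the bookkeeping needed to make the change invisible in the old mask handed back to the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Hypothetical helper: copy `in` to `out` with SIGSEGV removed. */
static void strip_sigsegv(const sigset_t *in, sigset_t *out)
{
    *out = *in;
    sigdelset(out, SIGSEGV);
}

/*
 * Hypothetical wrapper: behaves like sigprocmask(2), except that a caller
 * can never add SIGSEGV to the blocked set. SIG_UNBLOCK is passed through
 * untouched, since unblocking SIGSEGV is harmless here.
 */
int filtered_sigprocmask(int how, const sigset_t *set, sigset_t *oldset)
{
    sigset_t filtered;

    if (set != NULL && (how == SIG_BLOCK || how == SIG_SETMASK)) {
        strip_sigsegv(set, &filtered);
        set = &filtered;
    }
    return sigprocmask(how, set, oldset);
}
```

Threaded programs would likely need the same treatment for `pthread_sigmask`, which this sketch does not cover.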
Whew! I've been testing this patch in Docker containers on my desktop with I opened a PR to fix a segfault in I think maybe https://gist.github.com/markrwilliams/786d855e56ca88ba2ea76c4304b68a02#file-gcc-sigsegv-L303 |
Wouldn't it be simpler and more reliable to ...I guess "simpler and more reliable" is not necessarily the goal here though, given that we already have a working patch for glibc. |
@njsmith It would in fact be much simpler and preferable, but the address is in kernelspace so we can't map it from userspace. |
A code grep indicates gcc is probably calling To expand a bit on why I think this is a better plan than patching glibc:
|
Ah, darn. My general concern is that there's a risk we'll find ourselves hunting down weird corner cases here for several years. These images have a very wide user base, most of whom have no idea what this witchcraft is, and the symptoms are likely to be super obscure, so it'll be hard to even diagnose why things are breaking and get the right people looking at them.
One option would be to use this just for building the patched glibc, in case we need to bootstrap that on a system with vsyscall=none.
YAGNI. If this unlikely event occurs, we can always reevaluate.
It seems like there are probably other, less risky ways to address this? Do we just need a spinner process to tell Travis that we haven't frozen? What about using circleci or the auto-builders that the image repositories use? Re: |
One thing I'd really like to do is make this module have no effect on systems booted without Then this patch is strictly an improvement - it will only have an effect if you were going to segfault anyway, and it will eliminate some segfaults - hopefully all of them but the worst case is that you still segfault.
Which I acknowledge is an argument that this approach is pretty brittle... I was hoping to only do the things needed to compile software using the compilers in the image, but it's certainly true that someone might be wgetting some random commercial compiler during their build or whatever. (Also, another approach that doesn't actually help the build but might be worth doing: add something to detect if the vsyscall page is missing, print a clear error to stderr about what's going on, and exit instead of just segfaulting.) |
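The "detect and print a clear error" idea could look roughly like the following. This is a sketch that assumes the kernel lists a `[vsyscall]` entry in /proc/self/maps whenever the page is available and omits it when booted with vsyscall=none; the function names and the wording of the message are made up.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/*
 * Return 1 if any line of a /proc/<pid>/maps-style stream names the
 * [vsyscall] mapping, 0 otherwise. Lines longer than the buffer are read
 * in pieces, which is fine here because the [vsyscall] line is short.
 */
int maps_has_vsyscall(FILE *maps)
{
    char line[512];

    while (fgets(line, sizeof line, maps) != NULL) {
        if (strstr(line, "[vsyscall]") != NULL)
            return 1;
    }
    return 0;
}

/*
 * Return 0 if the vsyscall page appears to be mapped. Otherwise print a
 * diagnostic to stderr and return 1 so the caller can exit cleanly.
 */
int check_vsyscall_or_warn(void)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    int present = (maps != NULL) && maps_has_vsyscall(maps);

    if (maps != NULL)
        fclose(maps);
    if (!present) {
        fprintf(stderr,
                "vsyscall page not found; the host kernel may be running "
                "with vsyscall=none, so old glibc binaries may segfault\n");
        return 1;
    }
    return 0;
}
```

Something like `check_vsyscall_or_warn()` would have to run from a binary that does not itself depend on the old glibc, otherwise it would crash before it could print anything.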
@geofft patching
The only
Running
The conspicuous absence of
I don't think this means this approach is dead in the water yet. An obvious next step is to patch out @njsmith I agree that the immediate value of this PR is that it would allow us to patch I also think it will take non-trivial effort to get this right :( Maybe it's better to move this to its own repository? Regardless, this PR has convinced me that patching |
https://github.com/geofft/manylinux/blob/ptrace/docker/vsyscall_emu/vsyscall_trace.c I haven't tested this against Docker, but I have confirmed that attaching it to the bash in my terminal and running a bash + libc from Wheezy causes things to work fine, and there's no perceivable slowdown. As soon as I ^C the tracer, python, bash, etc. start dying again. @markrwilliams (or anyone) - can you try running this on the pid of your Docker daemon on a vsyscall=none host, and see if the CentOS 5 Docker image or the normal manylinux1 Docker image will run? I'm not familiar enough with Docker to know if we can ship this inside the container unprivileged, or this would just be a separate tool you'd have to manually run (which would require access to the Docker socket but not a reboot, so it should help with CI etc.). Perhaps the right answer is to somehow get this into Docker itself, to preserve Docker's implicit ABI compatibility promise. |
0449b6e to 4b62b83
Closing in favor of #158, which uses the ptrace-based approach. |
This is an LD_PRELOAD to catch segfaults from a kernel booted with vsyscall=none and turn them into normal syscalls. It works well enough to run bash from wheezy, but I'm not sure how well it works in general and it is probably extremely buggy in the general case. I'm mostly posting it here to run Travis against the Dockerfile and for @markrwilliams' feedback / testing. :) I'll update this comment once it's close to ready for merge.