Description
What version of Go are you using (go version)?
$ go version
go version go1.20.6 linux/amd64
Does this issue reproduce with the latest release?
Yes.
What operating system and processor architecture are you using (go env)?
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/cyphar/.cache/go-build"
GOENV="/home/cyphar/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/cyphar/.local/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/cyphar/.local"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/lib64/go/1.20"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib64/go/1.20/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.20.6"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/cyphar/.local/src/github.com/opencontainers/runc/go.mod"
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build3122356899=/tmp/go-build -gno-record-gcc-switches"
What did you do?
For a bit of background, I am working on a runc patch to move some code we currently do in C to Go. The particular issue I ran into is related to some slightly awful stuff we do in runc in order to protect against certain container attacks: we create a copy of the running executable (from /proc/self/exe) and actually execute that copy of the binary (usually a memfd) when running code inside the container.
I am giving this background to preempt questions about "why on earth do you need to do this" when I give you the example program. 😅
https://go.dev/play/p/EN7Bf-OThar
package main

import (
	"fmt"
	"os"
	"os/exec"
	"runtime"
	"syscall"
)

func mustOpen(path string) *os.File {
	f, err := os.Open(path)
	if err != nil {
		panic(err)
	}
	return f
}

func main() {
	// A file we want to execute.
	binTrueOriginal := mustOpen("/bin/true")
	// A random other file we cannot execute.
	devNullOriginal := mustOpen("/dev/null")

	// Change the file descriptors such that devNull is a large descriptor and
	// binTrue is one higher. This will cause binTrue to become nextfd and thus
	// be clobbered by the devNull copy made for the shuffling.
	devNullFd := 9000
	binTrueFd := devNullFd + 1

	if err := syscall.Dup2(int(devNullOriginal.Fd()), devNullFd); err != nil {
		panic(err)
	}
	devNull := os.NewFile(uintptr(devNullFd), "/dev/null (dup'd)")

	if err := syscall.Dup2(int(binTrueOriginal.Fd()), binTrueFd); err != nil {
		panic(err)
	}
	binTrue := os.NewFile(uintptr(binTrueFd), "/bin/true (dup'd)")

	// Try to run binTrue through /proc/self/fd/$n.
	path := fmt.Sprintf("/proc/self/fd/%d", binTrue.Fd())
	cmd := exec.Command(path)
	cmd.ExtraFiles = []*os.File{devNull}

	err := cmd.Run()
	fmt.Printf("run /bin/true: %v\n", err)
	// Keep binTrue reachable so its descriptor is not closed by a finalizer
	// before the child has been started.
	runtime.KeepAlive(binTrue)
}
If you adjust binTrueFd to be any other value, you'll see the program runs without issues.
The workaround for this problem is to pass the intended executable file as an entry in attr.Files, even though we don't use it, but this results in a non-O_CLOEXEC descriptor being passed to the child, which I consider a security risk (at least in the context of runc). We have many other protections against leaking file descriptors to containers, so this isn't a problem for us, but it seems that this is an actual bug in the stdlib that should be fixed.
What did you expect to see?
The syscall.StartProcess call should execute the file descriptor specified by /proc/self/fd/$n without the Go stdlib overwriting said file descriptor.
What did you see instead?
run /bin/true: fork/exec /proc/self/fd/9001: permission denied
The execve will attempt to exec a completely incorrect file descriptor, which in the best case will fail, and in the worst case will execute a completely unexpected program (in runc's case, as root).
In the case of the runc PR I mentioned above, this issue is only triggered by a single test because there is an apparent file descriptor leak which causes the file descriptor to be large enough that it gets overwritten, meaning that the possible security issue (runc runs as root and has no restrictions in this context) is non-deterministic in our testing.
Analysis
The bug is caused by an assumption in forkAndExecInChild1 that the largest file descriptor relevant to the process is always included in attr.Files and that thus any larger file descriptors can be used as scratch space.
Unfortunately there isn't a particularly pretty solution to figuring out the largest file descriptor present in a process other than doing a readdir of /proc/self/fd. I suspect that one of the following solutions would be more workable:
- Open the execve program path as O_PATH and make sure nextfd is larger than it (or special-case it like pipe is today), and then do execveat(AT_EMPTY_PATH) to exec the program through a file descriptor (as an aside, the ability to do this as a user would be really nice!). This would plug this particular hole, and I suspect that the execve path is the only case where this bug could be hit.
- Rather than using this whole nextfd logic for file descriptor shuffling (which appears to be used purely because dup3() requires a target descriptor and we want the cloned descriptors to be O_CLOEXEC), use F_DUPFD_CLOEXEC to avoid having to manage the new file descriptor number.
I think the second option would clean up the existing code the most, but as a user it would be nice to be able to use execveat(2) with the Go stdlib.