Skip to content

syscall: nextfd handling for attr.Files shuffle will clobber files #61751

Open
@cyphar

Description

@cyphar

What version of Go are you using (go version)?

$ go version
go version go1.20.6 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/cyphar/.cache/go-build"
GOENV="/home/cyphar/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/cyphar/.local/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/cyphar/.local"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/lib64/go/1.20"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib64/go/1.20/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.20.6"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/cyphar/.local/src/github.com/opencontainers/runc/go.mod"
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build3122356899=/tmp/go-build -gno-record-gcc-switches"
cyphar@senku

What did you do?

For a bit of background, I am working on a runc patch to move some code we currently do in C to Go. The particular issue I ran into is related to some slightly awful stuff we do in runc in order to protect against certain container attacks (such as ) where we create a copy of the running executable (from /proc/self/exe) and actually execute a copy of the binary (usually a memfd) when running code inside the container.

I am giving this background to preempt questions about "why on earth do you need to do this" when I give you the example program. 😅

https://go.dev/play/p/EN7Bf-OThar
package main

import (
	"fmt"
	"os"
	"os/exec"
	"runtime"
	"syscall"
)

func mustOpen(path string) *os.File {
	f, err := os.Open(path)
	if err != nil {
		panic(err)
	}
	return f
}

func main() {
	// A file we want to execute.
	binTrueOriginal := mustOpen("/bin/true")
	// A random other file we cannot execute.
	devNullOriginal := mustOpen("/dev/null")

	// Change the file descriptors such that devNull is a large descriptor and
	// binTrue is one higher. This will cause binTrue to become nextfd and thus
	// be clobbered by the devNull copy made for the shuffling.
	devNullFd := 9000
	binTrueFd := devNullFd + 1
	if err := syscall.Dup2(int(devNullOriginal.Fd()), devNullFd); err != nil {
		panic(err)
	}
	devNull := os.NewFile(uintptr(devNullFd), "/dev/null (dup'd)")
	if err := syscall.Dup2(int(binTrueOriginal.Fd()), binTrueFd); err != nil {
		panic(err)
	}
	binTrue := os.NewFile(uintptr(binTrueFd), "/dev/null (dup'd)")

	// Try to run binTrue through /proc/self/fd/$n.
	path := fmt.Sprintf("/proc/self/fd/%d", binTrue.Fd())
	cmd := exec.Command(path)
	cmd.ExtraFiles = []*os.File{devNull}
	err := cmd.Run()
	fmt.Printf("run /bin/true: %v\n", err)

	runtime.KeepAlive(binTrue)
}

If you adjust binTrueFd to be any other value, you'll see the program runs without issues.

The workaround for this problem is to pass the intended executable file as an attr.Files, even though we don't use it, but this results in a non-O_CLOEXEC descriptor being passed to the child which I consider a security risk (at least in the context of runc). We have many other protections against leaking file descriptors to containers, so this isn't a problem for us -- but it seems that this is an actual bug in the stdlib that should be fixed.

What did you expect to see?

The syscall.StartProcess call should execute the file descriptor specified by /proc/self/fd/$n without the Go stdlib overwriting said file descriptor.

What did you see instead?

run /bin/true: fork/exec /proc/self/fd/9001: permission denied

The execve will attempt to exec a completely incorrect file descriptor, which in the best case will fail, and in the worst case will execute a completely unexpected program (in runc's case, as root).

In the case of the runc PR I mentioned above, this issue is only triggered by a single test because there is an apparent file descriptor leak which causes the file descriptor to be large enough that it gets overwritten -- meaning that the possible security issue (runc runs as root and has no restrictions in this context) is non-deterministic in our testing.

Analysis

The bug is caused by an assumption in forkAndExecInChild1 that the largest file descriptor relevant to the process is always included in attr.Files and that thus any larger file descriptors can be used as scratch space.

Unfortunately there isn't a particularly pretty solution to figuring out the largest file descriptor present in a process other that doing a readdir of /proc/self/fd. I suspect that one of the following solutions would be more workable:

  • Open the execve program path as O_PATH and make sure nextfd is larger than it (or special-case it like pipe is today), and then do execveat(AT_EMPTY_PATH) to exec the program through a file descriptor (as an aside, the ability to do this as a user would be really nice!). This would plug this particular hole, and I suspect that the execve path is the only case where this bug could be hit.
  • Rather than using this whole nextfd logic for file descriptor shuffling (which appears to be used purely because dup3() requires a target descriptor and we want the cloned descriptors to be O_CLOEXEC), use F_DUPFD_CLOEXEC to avoid having to manage the new file descriptor number.

I think the second option would clean up the existing code the most, but as a user it would be nice to be able to use execveat(2) with the Go stdlib.

Metadata

Metadata

Assignees

Labels

NeedsFixThe path to resolution is known, but the work has not been done.compiler/runtimeIssues related to the Go compiler and/or runtime.

Type

No type

Projects

Status

In Progress

Relationships

None yet

Development

No branches or pull requests

Issue actions