Description
Go version
go version go1.23.0 linux/amd64
Output of go env
in your module/workspace:
GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/root/.cache/go-build'
GOENV='/root/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/root/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/root/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.23.0'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/root/.config/go/telemetry'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='0'
GOMOD='/src/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -fno-caret-diagnostics -Qunused-arguments -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build2731747166=/tmp/go-build -gno-record-gcc-switches'
What did you do?
When attempting to build a valid Go program, we are getting import errors reported against source lines that don't match the actual contents of the file they are reported on. The errors go away and the program builds successfully once the go build cache at /root/.cache/go-build is deleted, with nothing else changed.
The repro is quite complex and takes a long time to trigger, so the best I could do for now was create a docker image with the go toolchain, source code and go build cache in place that reproduces the problem. Commands for reproducing with that image are included below.
For more context, we are hitting this in Dagger, which is a container-based DAG execution engine that, among other things, does a lot of containerized building of Go code.
We specifically see this problem arise during integration tests, which will run, over the course of ~20min, many (probably 100+) go build executions in separate containers. The most relevant details I can think of:
- All containers are using the same go toolchain version (1.23.0 currently) and the same base image
- All containers have a shared bind mount for the go build cache (always mounted at /root/.cache/go-build) and the go mod cache (always mounted at /go/pkg/mod)
- Source code is always mounted at /src and built with the command go build -o /runtime . from within that /src directory
  - A lot of the source code will end up with similar and sometimes identical subpackages under /src/internal. They may also have the same go mod name at times.
- Builds can happen in parallel and in serial across the integration test suite
- The integration tests are quite heavy in terms of CPU usage and disk read/write bandwidth, the hosts are often under quite a bit of load
- We don't do any manual fiddling around with the go build cache; we just run commands like go build, go mod tidy, etc. in containers
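To make the setup above concrete, here is a rough Go sketch of a stress harness along the same lines. It is an assumption-laden approximation, not our actual test suite and not a confirmed reproducer: the helper names (moduleFiles, writeModule, build), the round count and the temp-directory layout are all made up for illustration. It also differs from the real setup in one important way: each module lives in its own directory, whereas in our containers the source is always mounted at /src.

```go
// Stress sketch: build look-alike modules in parallel against one shared GOCACHE.
package main

import (
	"fmt"
	"os"
	"os/exec"
	"path/filepath"
	"sync"
)

// moduleFiles returns a tiny module (relative path -> content) whose layout
// mirrors the report: a main package importing <mod>/internal/querybuilder.
// The file contents are hypothetical stand-ins for the generated code.
func moduleFiles(modName string) map[string]string {
	return map[string]string{
		"go.mod":  "module " + modName + "\n\ngo 1.23\n",
		"main.go": "package main\n\nimport \"" + modName + "/internal/querybuilder\"\n\nfunc main() { println(querybuilder.Name()) }\n",
		"internal/querybuilder/qb.go": "package querybuilder\n\n// Name is identical in every generated module.\nfunc Name() string { return \"qb\" }\n",
	}
}

// writeModule materializes moduleFiles(modName) under dir.
func writeModule(dir, modName string) error {
	for rel, content := range moduleFiles(modName) {
		path := filepath.Join(dir, filepath.FromSlash(rel))
		if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
			return err
		}
		if err := os.WriteFile(path, []byte(content), 0o644); err != nil {
			return err
		}
	}
	return nil
}

// build runs `go build -o <dir>/runtime .` in dir with GOCACHE set to cache.
func build(dir, cache string) ([]byte, error) {
	cmd := exec.Command("go", "build", "-o", filepath.Join(dir, "runtime"), ".")
	cmd.Dir = dir
	cmd.Env = append(os.Environ(), "GOCACHE="+cache)
	return cmd.CombinedOutput()
}

func main() {
	root, err := os.MkdirTemp("", "cache-stress")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer os.RemoveAll(root)

	cache := filepath.Join(root, "go-build") // shared by every build below
	mods := []string{"dagger/test", "dagger/bare"}

	var wg sync.WaitGroup
	for round := 0; round < 4; round++ {
		for _, mod := range mods {
			dir := filepath.Join(root, fmt.Sprintf("src-%d-%s", round, filepath.Base(mod)))
			if err := writeModule(dir, mod); err != nil {
				fmt.Fprintln(os.Stderr, err)
				return
			}
			wg.Add(1)
			go func(dir, mod string) {
				defer wg.Done()
				if out, err := build(dir, cache); err != nil {
					fmt.Printf("%s (%s): %v\n%s", mod, dir, err, out)
				}
			}(dir, mod)
		}
	}
	wg.Wait()
}
```

Any build failure is printed together with the module name, so an error that mentions the other module's import path would stand out. I have not seen this small version trigger the problem.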
What did you see happen?
As mentioned above, the best I could do for now was capture the state of one of the containers hitting this error in a docker image. I pushed the image to dockerhub at eriksipsma/corrupt-cache:latest. It's a linux/amd64-only image, unfortunately, since that's what our CI runs on, and CI is the only place I can get this to happen consistently.
Trigger the go build error:
$ docker run --rm -it eriksipsma/corrupt-cache:latest sh -c '/usr/local/go/bin/go build -C /src .'
go: downloading go.opentelemetry.io/otel v1.27.0
go: downloading go.opentelemetry.io/otel/sdk v1.27.0
go: downloading go.opentelemetry.io/otel/trace v1.27.0
go: downloading github.com/99designs/gqlgen v0.17.49
go: downloading golang.org/x/exp v0.0.0-20231110203233-9a3e6036ecaa
go: downloading github.com/Khan/genqlient v0.7.0
go: downloading golang.org/x/sync v0.7.0
go: downloading github.com/vektah/gqlparser/v2 v2.5.16
go: downloading go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploggrpc v0.0.0-20240518090000-14441aefdf88
go: downloading go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploghttp v0.3.0
go: downloading go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc v1.27.0
go: downloading go.opentelemetry.io/otel/log v0.3.0
go: downloading go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.27.0
go: downloading go.opentelemetry.io/otel/sdk/log v0.3.0
go: downloading go.opentelemetry.io/proto/otlp v1.3.1
go: downloading go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.27.0
go: downloading google.golang.org/grpc v1.64.0
go: downloading github.com/go-logr/logr v1.4.1
go: downloading go.opentelemetry.io/otel/metric v1.27.0
go: downloading golang.org/x/sys v0.21.0
go: downloading google.golang.org/protobuf v1.34.1
go: downloading google.golang.org/genproto/googleapis/rpc v0.0.0-20240515191416-fc5f0ca64291
go: downloading github.com/go-logr/stdr v1.2.2
go: downloading github.com/cenkalti/backoff/v4 v4.3.0
go: downloading github.com/grpc-ecosystem/grpc-gateway/v2 v2.20.0
go: downloading github.com/google/uuid v1.6.0
go: downloading github.com/sosodev/duration v1.3.1
go: downloading golang.org/x/net v0.26.0
go: downloading google.golang.org/genproto/googleapis/api v0.0.0-20240520151616-dc85e6b867a5
go: downloading golang.org/x/text v0.16.0
internal/dagger/dagger.gen.go:23:2: package dagger/test/internal/querybuilder is not in std (/usr/local/go/src/dagger/test/internal/querybuilder)
internal/dagger/dagger.gen.go:24:2: package dagger/test/internal/telemetry is not in std (/usr/local/go/src/dagger/test/internal/telemetry)
The errors refer to the source file at /src/internal/dagger/dagger.gen.go. However, the imports it's erroring on are not the actual imports in the source file:
$ docker run --rm -it eriksipsma/corrupt-cache:latest head -n25 /src/internal/dagger/dagger.gen.go
// Code generated by dagger. DO NOT EDIT.
package dagger
import (
"context"
"encoding/json"
"errors"
"fmt"
"net"
"net/http"
"os"
"reflect"
"strconv"
"strings"
"github.com/Khan/genqlient/graphql"
"github.com/vektah/gqlparser/v2/gqlerror"
"go.opentelemetry.io/otel"
"go.opentelemetry.io/otel/propagation"
"go.opentelemetry.io/otel/trace"
"dagger/bare/internal/querybuilder"
"dagger/bare/internal/telemetry"
)
- Note that the errors refer to dagger/test/internal/ but the actual imports in the source code are dagger/bare/internal
- Also worth noting that other containers do build source code with similar package layouts and contents except the import is dagger/test/internal. So it seems like go build here is somehow finding something in the cache from a previous build and incorrectly using it for this one.
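The mismatch can also be checked mechanically rather than by eye. The sketch below uses go/parser in ImportsOnly mode to list the import paths a file really contains, with their line numbers, and then looks up the two paths named in the error output. The function name importLines and the abbreviated sample source are illustrative assumptions; pass the real file path as the first argument to check /src/internal/dagger/dagger.gen.go instead of the sample.

```go
// Check whether the import paths named in a go build error actually
// appear in the source file the error points at.
package main

import (
	"fmt"
	"go/parser"
	"go/token"
	"os"
	"strconv"
)

// sample is an abbreviated stand-in for the head of dagger.gen.go.
const sample = `package dagger

import (
	"context"

	"dagger/bare/internal/querybuilder"
	"dagger/bare/internal/telemetry"
)
`

// importLines parses only the import declarations of a Go file and returns
// import path -> line number. If src is nil the file is read from disk;
// otherwise src (string or []byte) is used as the file contents.
func importLines(filename string, src any) (map[string]int, error) {
	fset := token.NewFileSet()
	f, err := parser.ParseFile(fset, filename, src, parser.ImportsOnly)
	if err != nil {
		return nil, err
	}
	out := make(map[string]int, len(f.Imports))
	for _, spec := range f.Imports {
		path, err := strconv.Unquote(spec.Path.Value)
		if err != nil {
			return nil, err
		}
		out[path] = fset.Position(spec.Pos()).Line
	}
	return out, nil
}

func main() {
	filename, src := "sample.go", any(sample)
	if len(os.Args) > 1 {
		filename, src = os.Args[1], nil // read the named file from disk
	}
	imports, err := importLines(filename, src)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	// Paths taken from the error messages in this report.
	reported := []string{
		"dagger/test/internal/querybuilder",
		"dagger/test/internal/telemetry",
	}
	for _, want := range reported {
		if line, ok := imports[want]; ok {
			fmt.Printf("%s: imported on line %d of %s\n", want, line, filename)
		} else {
			fmt.Printf("%s: not imported by %s\n", want, filename)
		}
	}
}
```

If the reported paths are absent from the file on disk, the error text must have come from somewhere other than the current source, which is consistent with the stale-cache suspicion above.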
The error goes away if you first clear the build cache and then run the same go build command:
$ docker run --rm -it eriksipsma/corrupt-cache:latest sh -c 'rm -rf /root/.cache/go-build && /usr/local/go/bin/go build -C /src .'
go: downloading go.opentelemetry.io/otel/sdk v1.27.0
go: downloading go.opentelemetry.io/otel/trace v1.27.0
go: downloading go.opentelemetry.io/otel v1.27.0
go: downloading github.com/99designs/gqlgen v0.17.49
go: downloading github.com/Khan/genqlient v0.7.0
go: downloading golang.org/x/exp v0.0.0-20231110203233-9a3e6036ecaa
go: downloading golang.org/x/sync v0.7.0
go: downloading github.com/vektah/gqlparser/v2 v2.5.16
go: downloading go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploggrpc v0.0.0-20240518090000-14441aefdf88
go: downloading go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploghttp v0.3.0
go: downloading go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.27.0
go: downloading go.opentelemetry.io/otel/log v0.3.0
go: downloading go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc v1.27.0
go: downloading go.opentelemetry.io/otel/sdk/log v0.3.0
go: downloading go.opentelemetry.io/proto/otlp v1.3.1
go: downloading go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.27.0
go: downloading google.golang.org/grpc v1.64.0
go: downloading github.com/go-logr/logr v1.4.1
go: downloading go.opentelemetry.io/otel/metric v1.27.0
go: downloading golang.org/x/sys v0.21.0
go: downloading google.golang.org/protobuf v1.34.1
go: downloading google.golang.org/genproto/googleapis/rpc v0.0.0-20240515191416-fc5f0ca64291
go: downloading github.com/google/uuid v1.6.0
go: downloading github.com/sosodev/duration v1.3.1
go: downloading github.com/go-logr/stdr v1.2.2
go: downloading github.com/cenkalti/backoff/v4 v4.3.0
go: downloading github.com/grpc-ecosystem/grpc-gateway/v2 v2.20.0
go: downloading golang.org/x/net v0.26.0
go: downloading google.golang.org/genproto/googleapis/api v0.0.0-20240520151616-dc85e6b867a5
go: downloading golang.org/x/text v0.16.0
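For anyone who wants to compare the two cases without deleting the suspect cache, a non-destructive variant is to point GOCACHE at an empty temporary directory for the second build. The following is only a sketch under the assumptions of this report (source at /src, cache at /root/.cache/go-build); cacheEnv and buildWith are hypothetical helper names. Note that the first build may still write new entries into the existing cache.

```go
// A/B sketch: run the same build against the existing cache and a fresh one.
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// cacheEnv returns a copy of base with GOCACHE appended. os/exec uses the
// last value when a key appears more than once, so this overrides any
// GOCACHE already present in base.
func cacheEnv(base []string, cache string) []string {
	env := append([]string{}, base...)
	return append(env, "GOCACHE="+cache)
}

// buildWith runs `go build -C src .` using the given build cache directory.
func buildWith(src, cache string) ([]byte, error) {
	cmd := exec.Command("go", "build", "-C", src, ".")
	cmd.Env = cacheEnv(os.Environ(), cache)
	return cmd.CombinedOutput()
}

func main() {
	const src = "/src"                        // source mount from the report
	const existing = "/root/.cache/go-build" // cache path from the report

	fresh, err := os.MkdirTemp("", "fresh-go-build")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		return
	}
	defer os.RemoveAll(fresh)

	runs := []struct{ label, dir string }{
		{"existing cache", existing},
		{"fresh cache", fresh},
	}
	for _, r := range runs {
		out, err := buildWith(src, r.dir)
		fmt.Printf("== %s (%s): err=%v\n%s", r.label, r.dir, err, out)
	}
}
```

If only the run against the existing cache fails, that points at the cache contents rather than the source or the toolchain.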
What did you expect to see?
For go build to not report errors that don't correspond to the source contents, and for the go build cache to not need to be cleared in order to get rid of the errors.