Skip to content

cmd/go: go build errors that don't correspond to source file until build cache is cleared #69179

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
sipsma opened this issue Aug 30, 2024 · 12 comments
Assignees
Labels
GoCommand cmd/go NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.

Comments

@sipsma
Copy link

sipsma commented Aug 30, 2024

Go version

go version go1.23.0 linux/amd64

Output of go env in your module/workspace:

GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/root/.cache/go-build'
GOENV='/root/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/root/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/root/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/usr/local/go'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/usr/local/go/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.23.0'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/root/.config/go/telemetry'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='0'
GOMOD='/src/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -fno-caret-diagnostics -Qunused-arguments -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build2731747166=/tmp/go-build -gno-record-gcc-switches'

What did you do?

When attempting to build a valid go program, we are getting errors about imports on certain lines of source code that don't match the actual contents of the source file they are reported on. The errors go away and the program builds successfully once the go build cache at /root/.cache/go-build is deleted and nothing else changes.

The repro is quite complex and takes a long time to trigger, so the best I could do for now was create a docker image with the go toolchain, source code and go build cache in place that reproduces the problem. Including commands for reproducing with that image below.

For more context, we are hitting this in Dagger, which is a container-based DAG execution engine that, among other things, does a lot of containerized building of Go code.

We specifically see this problem arise during integration tests, which will run, over the course of ~20min, many (probably 100+) go build executions in separate containers. The most relevant details I can think of:

  1. All containers are using the same go toolchain version (1.23.0 currently) and the same base image
  2. All containers have a shared bind mount for the go build cache (always mounted at /root/.cache/go-build) and the go mod cache (always mounted at /go/pkg/mod)
  3. Source code is always mounted at /src and built with the command go build -o /runtime . from within that /src directory
    • A lot of the source code will end up with similar and sometimes identical subpackages under /src/internal. They may also have the same go mod name at times.
  4. Builds can happen in parallel and in serial across the integration test suite
  5. The integration tests are quite heavy in terms of CPU usage and disk read/write bandwidth, the hosts are often under quite a bit of load
  6. We don't do any manual fiddling around with the go build cache; we just run commands like go build, go mod tidy, etc. in containers

What did you see happen?

As mentioned above, the best I could do for now was capture the state of one of the containers hitting this error in a docker image. I pushed the image to dockerhub at eriksipsma/corrupt-cache:latest. It's a linux/amd64 only image unfortunately since that's what our CI is, which is the only place I can get this to happen consistently.

Trigger the go build error:

$ docker run --rm -it eriksipsma/corrupt-cache:latest sh -c '/usr/local/go/bin/go build -C /src .'
go: downloading go.opentelemetry.io/otel v1.27.0
go: downloading go.opentelemetry.io/otel/sdk v1.27.0
go: downloading go.opentelemetry.io/otel/trace v1.27.0
go: downloading github.com/99designs/gqlgen v0.17.49
go: downloading golang.org/x/exp v0.0.0-20231110203233-9a3e6036ecaa
go: downloading github.com/Khan/genqlient v0.7.0
go: downloading golang.org/x/sync v0.7.0
go: downloading github.com/vektah/gqlparser/v2 v2.5.16
go: downloading go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploggrpc v0.0.0-20240518090000-14441aefdf88
go: downloading go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploghttp v0.3.0
go: downloading go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc v1.27.0
go: downloading go.opentelemetry.io/otel/log v0.3.0
go: downloading go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.27.0
go: downloading go.opentelemetry.io/otel/sdk/log v0.3.0
go: downloading go.opentelemetry.io/proto/otlp v1.3.1
go: downloading go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.27.0
go: downloading google.golang.org/grpc v1.64.0
go: downloading github.com/go-logr/logr v1.4.1
go: downloading go.opentelemetry.io/otel/metric v1.27.0
go: downloading golang.org/x/sys v0.21.0
go: downloading google.golang.org/protobuf v1.34.1
go: downloading google.golang.org/genproto/googleapis/rpc v0.0.0-20240515191416-fc5f0ca64291
go: downloading github.com/go-logr/stdr v1.2.2
go: downloading github.com/cenkalti/backoff/v4 v4.3.0
go: downloading github.com/grpc-ecosystem/grpc-gateway/v2 v2.20.0
go: downloading github.com/google/uuid v1.6.0
go: downloading github.com/sosodev/duration v1.3.1
go: downloading golang.org/x/net v0.26.0
go: downloading google.golang.org/genproto/googleapis/api v0.0.0-20240520151616-dc85e6b867a5
go: downloading golang.org/x/text v0.16.0
internal/dagger/dagger.gen.go:23:2: package dagger/test/internal/querybuilder is not in std (/usr/local/go/src/dagger/test/internal/querybuilder)
internal/dagger/dagger.gen.go:24:2: package dagger/test/internal/telemetry is not in std (/usr/local/go/src/dagger/test/internal/telemetry)

The errors refer to the source file at /src/internal/dagger/dagger.gen.go. However, the imports it's erroring on are not the actual imports in the source file:

$ docker run --rm -it eriksipsma/corrupt-cache:latest head -n25 /src/internal/dagger/dagger.gen.go
// Code generated by dagger. DO NOT EDIT.

package dagger

import (
        "context"
        "encoding/json"
        "errors"
        "fmt"
        "net"
        "net/http"
        "os"
        "reflect"
        "strconv"
        "strings"

        "github.com/Khan/genqlient/graphql"
        "github.com/vektah/gqlparser/v2/gqlerror"
        "go.opentelemetry.io/otel"
        "go.opentelemetry.io/otel/propagation"
        "go.opentelemetry.io/otel/trace"

        "dagger/bare/internal/querybuilder"
        "dagger/bare/internal/telemetry"
)
  • Note that the errors refer to dagger/test/internal/ but the actual imports in the source code are dagger/bare/internal
  • Also worth noting that other containers do build source code with similar package layouts and contents except the import is dagger/test/internal. So it seems like go build here is somehow finding something in the cache from a previous build and incorrectly using it for this one.

The error goes away if you first clear the build cache and then run the same go build command:

$ docker run --rm -it eriksipsma/corrupt-cache:latest sh -c 'rm -rf /root/.cache/go-build && /usr/local/go/bin/go build -C /src .'
go: downloading go.opentelemetry.io/otel/sdk v1.27.0
go: downloading go.opentelemetry.io/otel/trace v1.27.0
go: downloading go.opentelemetry.io/otel v1.27.0
go: downloading github.com/99designs/gqlgen v0.17.49
go: downloading github.com/Khan/genqlient v0.7.0
go: downloading golang.org/x/exp v0.0.0-20231110203233-9a3e6036ecaa
go: downloading golang.org/x/sync v0.7.0
go: downloading github.com/vektah/gqlparser/v2 v2.5.16
go: downloading go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploggrpc v0.0.0-20240518090000-14441aefdf88
go: downloading go.opentelemetry.io/otel/exporters/otlp/otlplog/otlploghttp v0.3.0
go: downloading go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp v1.27.0
go: downloading go.opentelemetry.io/otel/log v0.3.0
go: downloading go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc v1.27.0
go: downloading go.opentelemetry.io/otel/sdk/log v0.3.0
go: downloading go.opentelemetry.io/proto/otlp v1.3.1
go: downloading go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.27.0
go: downloading google.golang.org/grpc v1.64.0
go: downloading github.com/go-logr/logr v1.4.1
go: downloading go.opentelemetry.io/otel/metric v1.27.0
go: downloading golang.org/x/sys v0.21.0
go: downloading google.golang.org/protobuf v1.34.1
go: downloading google.golang.org/genproto/googleapis/rpc v0.0.0-20240515191416-fc5f0ca64291
go: downloading github.com/google/uuid v1.6.0
go: downloading github.com/sosodev/duration v1.3.1
go: downloading github.com/go-logr/stdr v1.2.2
go: downloading github.com/cenkalti/backoff/v4 v4.3.0
go: downloading github.com/grpc-ecosystem/grpc-gateway/v2 v2.20.0
go: downloading golang.org/x/net v0.26.0
go: downloading google.golang.org/genproto/googleapis/api v0.0.0-20240520151616-dc85e6b867a5
go: downloading golang.org/x/text v0.16.0

What did you expect to see?

For go build to not report errors that don't correspond to the source contents, and for the go build cache to not need to be cleared in order to get rid of the errors.

@sipsma sipsma changed the title go build: errors that don't correspond to source file until build cache is cleared cmd/go: go build errors that don't correspond to source file until build cache is cleared Aug 30, 2024
@seankhliao
Copy link
Member

I think somethhing to do with the module index?

Cache verify:

/src # export GODEBUG=gocacheverify=1
/src # /usr/local/go/bin/go build
panic: go: internal cache error: cache verify failed: id=e0ec2602ae240ba6cf7c4e5e75c7c3300aadf0258b376c72d3b612eadbf9e7c4 changed:<<<
	modroot /src
	package go1.23.0 go index v2 /src/internal/dagger
	file dagger.gen.go 2024-08-30 21:58:16.88202819 +0000 UTC 208312
	
	>>>
	old: cf4f84f1aa46a7c9ae121f6d99fa0c53ec025338cc6dcad292a986782e7254c1 815
	new: 704f06df319148867ac8bd9cbcd63e03491a3d078d3083719e85e7e89436f384 815

goroutine 21 [running]:
cmd/go/internal/cache.(*DiskCache).putIndexEntry(0xc0001325d0, {0xe0, 0xec, 0x26, 0x2, 0xae, 0x24, 0xb, 0xa6, 0xcf, ...}, ...)
	cmd/go/internal/cache/cache.go:444 +0x605
cmd/go/internal/cache.(*DiskCache).put(0xc0001325d0, {0xe0, 0xec, 0x26, 0x2, 0xae, 0x24, 0xb, 0xa6, 0xcf, ...}, ...)
	cmd/go/internal/cache/cache.go:524 +0x21a
cmd/go/internal/cache.(*DiskCache).Put(0x1?, {0xe0, 0xec, 0x26, 0x2, 0xae, 0x24, 0xb, 0xa6, 0xcf, ...}, ...)
	cmd/go/internal/cache/cache.go:494 +0x85
cmd/go/internal/cache.PutBytes({0xc19820, 0xc0001325d0}, {0xe0, 0xec, 0x26, 0x2, 0xae, 0x24, 0xb, 0xa6, ...}, ...)
	cmd/go/internal/cache/cache.go:529 +0xc7
cmd/go/internal/modindex.openIndexPackage.func1()
	cmd/go/internal/modindex/read.go:216 +0x1c7
cmd/go/internal/par.(*ErrCache[...]).Do.func1()
	cmd/go/internal/par/work.go:119 +0x13
cmd/go/internal/par.(*Cache[...]).Do(0xc1d7c0, {{0xc000014044, 0x4}, {0xc000000060, 0x14}}, 0xc000411848)
	cmd/go/internal/par/work.go:160 +0x11b
cmd/go/internal/par.(*ErrCache[...]).Do(0xc000146140?, {{0xc000014044, 0x4}, {0xc000000060, 0x14}}, 0x0?)
	cmd/go/internal/par/work.go:118 +0x46
cmd/go/internal/modindex.openIndexPackage({0xc000014044?, 0xc000146140?}, {0xc000000060?, 0xc000146140?})
	cmd/go/internal/modindex/read.go:205 +0x98
cmd/go/internal/modindex.GetPackage({0xc000014044, 0x4}, {0xc000000060, 0x14})
	cmd/go/internal/modindex/read.go:142 +0x14d
cmd/go/internal/modload.dirInModule.func2()
	cmd/go/internal/modload/import.go:723 +0x2f
cmd/go/internal/par.(*ErrCache[...]).Do.func1()
	cmd/go/internal/par/work.go:119 +0x13
cmd/go/internal/par.(*Cache[...]).Do(0xc1d440, {0xc000000060, 0x14}, 0xc000411a28)
	cmd/go/internal/par/work.go:160 +0xfb
cmd/go/internal/par.(*ErrCache[...]).Do(0xc1d3c0?, {0xc000000060?, 0xd?}, 0xc000411b08?)
	cmd/go/internal/par/work.go:118 +0x38
cmd/go/internal/modload.dirInModule({0xc0001d8d05, 0x1b}, {0xc0001d8d05?, 0xb}, {0xc000014044, 0x4}, 0x1)
	cmd/go/internal/modload/import.go:719 +0x271
cmd/go/internal/modload.importFromModules({0xc18e90, 0xfedba0}, {0xc0001d8d05, 0x1b}, 0xc00017c2d0, 0x0, 0x0)
	cmd/go/internal/modload/import.go:430 +0xcd9
cmd/go/internal/modload.(*loader).load(0xc0001a20e0, {0xc18e90, 0xfedba0}, 0xc000210c30)
	cmd/go/internal/modload/load.go:1852 +0xa5
cmd/go/internal/modload.(*loader).pkg.func1.1()
	cmd/go/internal/modload/load.go:1665 +0x25
cmd/go/internal/par.(*Queue).Add.func1()
	cmd/go/internal/par/queue.go:58 +0x5c
created by cmd/go/internal/par.(*Queue).Add in goroutine 19
	cmd/go/internal/par/queue.go:56 +0x1d3

With the module index on (default):

/src # export GODEBUG=goindex=1
/src # /usr/local/go/bin/go build
internal/dagger/dagger.gen.go:23:2: package dagger/test/internal/querybuilder is not in std (/usr/local/go/src/dagger/test/internal/querybuilder)
internal/dagger/dagger.gen.go:24:2: package dagger/test/internal/telemetry is not in std (/usr/local/go/src/dagger/test/internal/telemetry)

With module index off, build succeeds:

/src # export GODEBUG=goindex=0
/src # /usr/local/go/bin/go build

cc @matloob @samthanawalla

@seankhliao seankhliao added NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. GoCommand cmd/go labels Sep 1, 2024
@matloob matloob self-assigned this Sep 3, 2024
@matloob
Copy link
Contributor

matloob commented Sep 3, 2024

I'll look into this

@sipsma
Copy link
Author

sipsma commented Sep 16, 2024

Let me know if there's anything I can do to help debug further!

@sipsma
Copy link
Author

sipsma commented Sep 18, 2024

@matloob @seankhliao Quick question pending the investigation on your end, is there any obvious reason it would be a bad idea to run with GODEBUG=goindex=0 (disable the module index) for now?

Asking because this bug is causing lots of problems in our CI test suite specifically, so I'm considering just disabling the mod index there while things get figured out upstream here. In some preliminary runs I didn't notice any performance impact for us when running with it disabled for the whole test suite (and no occurrences of the bug so far 🤞).

However, that goindex setting is undocumented as far as I can tell so want to double-check with you disabling it isn't going to break go tooling in some other unobvious way or otherwise be a bad idea.

@matloob
Copy link
Contributor

matloob commented Sep 19, 2024

I don't think disabling the module index should cause any correctness problems.

(edit: correctness, not performance)

@erikburt
Copy link

erikburt commented Feb 14, 2025

Something similar is also happening to us. Our scenario:

  • go version go1.23.4 linux/amd64 (env below)
  • We are doing this natively in Github Actions
  • We are restoring the build cache from our default branch
  • export GODEBUG=goindex=0 fixes the problem
  • The change is a single line:
    • updates the import path for a test utility package, removing a local reference and replacing it with one from an external repository (commit) (diff below).
    • reference before
    • reference after
> go test -c
# github.com/smartcontractkit/chainlink/v2/core/capabilities/ccip/common_test [github.com/smartcontractkit/chainlink/v2/core/capabilities/ccip/common.test]
./common_test.go:11:2: "github.com/smartcontractkit/chainlink-integrations/evm/assets" imported and not used
./common_test.go:12:2: could not import github.com/smartcontractkit/chainlink-integrations/evm/testutils (open : no such file or directory)

> GODEBUG="gocacheverify=1" go test -c
panic: go: internal cache error: cache verify failed: id=eed4ffd0459bb2c81a7adfa3b664867773e2847ad408ae7946aa2404d819eb89 changed:<<<
        modroot /home/runner/work/chainlink/chainlink
        package go1.23.4 go index v2 /home/runner/work/chainlink/chainlink/core/capabilities/ccip/common
        file common.go 1970-01-01 00:00:01 +0000 UTC 729
        file common_test.go 1970-01-01 00:00:01 +0000 UTC 1547
        file extradatacodec.go 1970-01-01 00:00:01 +0000 UTC 1841

        >>>
        old: 0da6e2f6a8a63d6fb5d57742f3e767cff87b74eb86add0e7785368fc1dee5f02 1729
        new: 48201bba5330550de3351369f970310703d192f90d00475ef4d7ce88a8b03d70 1729

I too have created a docker image which shows the reproduction. I can share a link if needed.


Commit Diff
diff --git a/core/capabilities/ccip/common/common_test.go b/core/capabilities/ccip/common/common_test.go
index 9b974ec60e..f1695d4187 100644
--- a/core/capabilities/ccip/common/common_test.go
+++ b/core/capabilities/ccip/common/common_test.go
@@ -9,9 +9,9 @@ import (
        "github.com/stretchr/testify/require"
 
        "github.com/smartcontractkit/chainlink-integrations/evm/assets"
+       "github.com/smartcontractkit/chainlink-integrations/evm/testutils"
        capcommon "github.com/smartcontractkit/chainlink/v2/core/capabilities/ccip/common"
        kcr "github.com/smartcontractkit/chainlink/v2/core/gethwrappers/keystone/generated/capabilities_registry_1_1_0"
-       "github.com/smartcontractkit/chainlink/v2/core/internal/testutils"
 )
 
 func Test_HashedCapabilityId(t *testing.T) {
go env
GO111MODULE=''
GOARCH='amd64'
GOBIN=''
GOCACHE='/home/runner/.cache/go-build'
GOENV='/root/.config/go/env'
GOEXE=''
GOEXPERIMENT=''
GOFLAGS=''
GOHOSTARCH='amd64'
GOHOSTOS='linux'
GOINSECURE=''
GOMODCACHE='/home/runner/go/pkg/mod'
GONOPROXY=''
GONOSUMDB=''
GOOS='linux'
GOPATH='/root/go'
GOPRIVATE=''
GOPROXY='https://proxy.golang.org,direct'
GOROOT='/opt/hostedtoolcache/go/1.23.4/x64'
GOSUMDB='sum.golang.org'
GOTMPDIR=''
GOTOOLCHAIN='auto'
GOTOOLDIR='/opt/hostedtoolcache/go/1.23.4/x64/pkg/tool/linux_amd64'
GOVCS=''
GOVERSION='go1.23.4'
GODEBUG=''
GOTELEMETRY='local'
GOTELEMETRYDIR='/root/.config/go/telemetry'
GCCGO='gccgo'
GOAMD64='v1'
AR='ar'
CC='gcc'
CXX='g++'
CGO_ENABLED='1'
GOMOD='/home/runner/work/chainlink/chainlink/go.mod'
GOWORK=''
CGO_CFLAGS='-O2 -g'
CGO_CPPFLAGS=''
CGO_CXXFLAGS='-O2 -g'
CGO_FFLAGS='-O2 -g'
CGO_LDFLAGS='-O2 -g'
PKG_CONFIG='pkg-config'
GOGCCFLAGS='-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -ffile-prefix-map=/tmp/go-build3059515582=/tmp/go-build -gno-record-gcc-switches'

@agis
Copy link

agis commented Apr 7, 2025

We also started facing this, but unfortunately I don't have a reproduction script. Just in case it helps, this is what we get:

panic: go: internal cache error: cache verify failed: id=f8f7dd4f43e561b17ed917ce1e23ee5bb4924bbba2f16f36be2d3130b4ca1401 changed:<<<
        modroot /Users/xxx/go/src/github.com/xxx/xxx/xxx
        package go1.23.7 go index v2 /Users/xxx/go/src/github.com/xxx/xxx/xxx/xxx/xxx
        file xxx.go 2025-02-14 14:49:51.648907954 +0200 EET 2367
        
        >>>
        old: f6f4d92fdf563fe4d884df09fd32a411174f015e84a0ff41050e74e6f7b962aa 485
        new: cd0500369b89095734b03629a1968b922a470e25b904e5e36d67ec943d2959ef 161

goroutine 10 [running]:
cmd/go/internal/cache.(*DiskCache).putIndexEntry(0x140002a5680, {0xf8, 0xf7, 0xdd, 0x4f, 0x43, 0xe5, 0x61, 0xb1, 0x7e, ...}, ...)
        cmd/go/internal/cache/cache.go:444 +0x474
cmd/go/internal/cache.(*DiskCache).put(0x140002a5680, {0xf8, 0xf7, 0xdd, 0x4f, 0x43, 0xe5, 0x61, 0xb1, 0x7e, ...}, ...)
        cmd/go/internal/cache/cache.go:524 +0x1bc
cmd/go/internal/cache.(*DiskCache).Put(0x1?, {0xf8, 0xf7, 0xdd, 0x4f, 0x43, 0xe5, 0x61, 0xb1, 0x7e, ...}, ...)
        cmd/go/internal/cache/cache.go:494 +0x70
cmd/go/internal/cache.PutBytes({0x10547ad40, 0x140002a5680}, {0xf8, 0xf7, 0xdd, 0x4f, 0x43, 0xe5, 0x61, 0xb1, ...}, ...)
        cmd/go/internal/cache/cache.go:529 +0xa4
cmd/go/internal/modindex.openIndexPackage.func1()
        cmd/go/internal/modindex/read.go:216 +0x164
cmd/go/internal/par.(*ErrCache[...]).Do.func1()
        cmd/go/internal/par/work.go:119 +0x24
cmd/go/internal/par.(*Cache[...]).Do(0x10547ec40, {{0x140000280f4, 0x43}, {0x14000029ae0, 0x4f}}, 0x14000a51830)
        cmd/go/internal/par/work.go:160 +0x130
cmd/go/internal/par.(*ErrCache[...]).Do(0x43?, {{0x140000280f4, 0x43}, {0x14000029ae0, 0x4f}}, 0x14000a518d8?)
        cmd/go/internal/par/work.go:118 +0x4c
cmd/go/internal/modindex.openIndexPackage({0x140000280f4?, 0x140001fc100?}, {0x14000029ae0?, 0x140001fc100?})
        cmd/go/internal/modindex/read.go:205 +0x84
cmd/go/internal/modindex.GetPackage({0x140000280f4, 0x43}, {0x14000029ae0, 0x4f})
        cmd/go/internal/modindex/read.go:142 +0x158
cmd/go/internal/modload.dirInModule.func2()
        cmd/go/internal/modload/import.go:723 +0x34
cmd/go/internal/par.(*ErrCache[...]).Do.func1()
        cmd/go/internal/par/work.go:119 +0x24
cmd/go/internal/par.(*Cache[...]).Do(0x10547e8c0, {0x14000029ae0, 0x4f}, 0x14000a51a30)
        cmd/go/internal/par/work.go:160 +0x128
cmd/go/internal/par.(*ErrCache[...]).Do(0x10547e840?, {0x14000029ae0?, 0x48?}, 0x14000a51b18?)
        cmd/go/internal/par/work.go:118 +0x40
cmd/go/internal/modload.dirInModule({0x14000da20df, 0x11}, {0x14000da20df?, 0x5}, {0x140000280f4, 0x43}, 0x1)
        cmd/go/internal/modload/import.go:719 +0x1cc
cmd/go/internal/modload.importFromModules({0x10547a450, 0x10583f080}, {0x14000da20df, 0x11}, 0x14000204190, 0x0, 0x0)
        cmd/go/internal/modload/import.go:430 +0xb04
cmd/go/internal/modload.(*loader).load(0x140002220e0, {0x10547a450, 0x10583f080}, 0x14000e1e410)
        cmd/go/internal/modload/load.go:1852 +0x94
cmd/go/internal/modload.(*loader).pkg.func1.1()
        cmd/go/internal/modload/load.go:1665 +0x2c
cmd/go/internal/par.(*Queue).Add.func1()
        cmd/go/internal/par/queue.go:58 +0x60
created by cmd/go/internal/par.(*Queue).Add in goroutine 161
        cmd/go/internal/par/queue.go:56 +0x198

@matloob
Copy link
Contributor

matloob commented Apr 7, 2025

@erikburt I think I understand why this is happening in your case. We use the modification time on each of the files as part of the cache key for the module index file. If a file in your package changes, but the file has the same size and modification time we try to use the same cache entry.

Thanks for providing the reference file! It's clear why the size of the file isn't changing since the import is moving in the file, but the value for the time stands out.

I was going to say that we should reject time.Unix(0,0) but it seems like the value you're getting is time.Unix(1,0). It would be interesting to know where that value is coming from. But that should be pretty simple to fix once we know- we'll reject using the index in those cases.

@agis I'm less clear about what's happening in your case. Though one thing stands out to me: the size of the 'new' index file is significantly smaller than the size of the 'old' file. Do you remember what change you made to the package before you saw this? Could you tell us which operating system you were using and which filesystem the files you're building are stored on?

@erikburt
Copy link

erikburt commented Apr 9, 2025

@matloob - thank you for the response.

We are actively setting the modtime for our files to 1970-01-01 00:00:01 +0000. This ensures our caches are still usable after we clone our repository in an ephemeral CI environment.

I guess it's technically not valid given the above case, but it doesn't seem like there's any other workaround.

@sipsma
Copy link
Author

sipsma commented Apr 10, 2025

We use the modification time on each of the files as part of the cache key for the module index file. If a file in your package changes, but the file has the same size and modification time we try to use the same cache entry.

@matloob that's extremely useful information, I actually wouldn't be entirely surprised if that's what caused this problem for us too despite the fact that we aren't resetting any timestamps to constant values.

The reason being that we have actually hit problems in the past caused by two different go source files ending up with the same length and nanosecond mod timestamp despite having different contents.

  • Specifically, we hit bugs in dagger's features around filesyncing with go source files, which has a similar heuristic of modtime+size for "file identity".
  • There's an issue here with details, but gets out of scope of go specific problems

Surprisingly, it turns out this is actually not an absurdly obscure situation to run into in the real world, mainly due to two factors

  1. "nanosecond" timestamps on files in Linux are not really nanosecond precision. The kernel uses less precise timers to set filestamps that are cached for up to 10ms (depending on various hardware details)
    • Specifically, they are set using ktime_get_coarse_real_ts64 (doc'd here)
  2. Different go command invocations that share and underlying cache (e.g. containerized build systems that use a common bind mount for go cache) can easily get closely synced up when they are all waiting on the same operation and waiting on the same flocks
    • This makes it relatively easy to end up with two processes executing concurrently, modifying files within the same time bucket such that they end up creating separate files that have the same length+modtime but w/ different contents.

I don't know if there's any possible better heuristic for the go index to switch to than "size+modtime" that's also feasible. Content-hashing is an obvious option but comes w/ plenty of performance tradeoffs of course.

Personally, I'd be happy if there was just an "officially enshrined" way to disable the go index. We haven't had any problems since disabling it but it feels like somewhat shaky ground since it's essentially an undocumented feature flag (at least last I checked).

@matloob
Copy link
Contributor

matloob commented Apr 11, 2025

@sipsma Thanks for the information. A couple of questions

  • "nanosecond" timestamps on files in Linux are not really nanosecond precision. The kernel uses less precise timers to set filestamps that are cached for up to 10ms (depending on various hardware details)

    • Specifically, they are set using ktime_get_coarse_real_ts64 (doc'd here)

We wait a little bit before allowing a file to allowing a file to be cached https://cs.opensource.google/go/go/+/master:src/cmd/go/internal/modindex/read.go;drc=665af869920432879629c1d64cf59f129942dcd6;l=100 . My understanding is that after the wait the files should have settled, so that if there were two writes done so close to each other that the get the same mtime, only the contents of the actual later write that won should be cached. I wonder if there's something incorrect in that logic?

  • Different go command invocations that share and underlying cache (e.g. containerized build systems that use a common bind mount for go cache) can easily get closely synced up when they are all waiting on the same operation and waiting on the same flocks

    • This makes it relatively easy to end up with two processes executing concurrently, modifying files within the same time bucket such that they end up creating separate files that have the same length+modtime but w/ different contents.

In this example the processes are modifying source files, right? Are these things like automated go generate runs?

Personally, I'd be happy if there was just an "officially enshrined" way to disable the go index. We haven't had any problems since disabling it but it feels like somewhat shaky ground since it's essentially an undocumented feature flag (at least last I checked).

If we can't solve this we may want to document the GODEBUG but I'd feel better if we could find a way to better detect situations where we should automatically skip indexing a package.

graham-chainlink added a commit to smartcontractkit/chainlink that referenced this issue May 13, 2025
Due to an issue documented [here](golang/go#69179), this occurs when adding a new import for chainlink-deployments-framework/deployments, i need to set the goindex=0 for godebug. There was a line setting it already but it didn not apply correctly.
graham-chainlink added a commit to smartcontractkit/chainlink that referenced this issue May 13, 2025
Due to an issue documented [here](golang/go#69179), this occurs when adding a new import for chainlink-deployments-framework/deployments, i need to set the goindex=0 for godebug. There was a line setting it already but it didn not apply correctly.
graham-chainlink added a commit to smartcontractkit/chainlink that referenced this issue May 13, 2025
Due to an issue documented [here](golang/go#69179), this occurs when adding a new import for chainlink-deployments-framework/deployments, i need to set the goindex=0 for godebug. There was a line setting it already but it didn not apply correctly.
graham-chainlink added a commit to smartcontractkit/chainlink that referenced this issue May 13, 2025
Due to an issue documented [here](golang/go#69179), this occurs when adding a new import for chainlink-deployments-framework/deployments, i need to set the goindex=0 for godebug. There was a line setting it already but it didn not apply correctly.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
GoCommand cmd/go NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests

6 participants