@dzbarsky
Contributor

@dzbarsky dzbarsky commented May 27, 2025

What type of PR is this?
Bug fix for Nix

What does this PR do? Why is it needed?
The current implementation depends on mktemp and rm being in the PATH, which they're not on Nix. This alternative construction executes go build directly without run_shell or the coreutils deps.

While here, I set a few extra env vars to disable new Go features we don't want (GOTELEMETRY, GOENV).
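
A rough sketch of what disabling these features looks like at the shell level, assuming a POSIX shell. The helper name `run_go_hermetic` is hypothetical and not part of rules_go; the PR sets these variables on the build action itself rather than through a wrapper like this.

```shell
# Hypothetical helper (not from rules_go): run a command with Go telemetry
# and the per-user go env config file disabled, for that command only.
run_go_hermetic() {
  GOTELEMETRY=off GOENV=off "$@"
}

# Example: the child process sees both variables set to "off".
run_go_hermetic sh -c 'echo "GOTELEMETRY=$GOTELEMETRY GOENV=$GOENV"'
```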

If this works, we may be able to remove the Windows codepath.

Which issue(s) does this PR fix?

Fixes #

Other notes for review

@dzbarsky dzbarsky force-pushed the zbarsky/hermetic branch 2 times, most recently from f5ccf9d to ed64111 Compare May 27, 2025 01:20
@dzbarsky dzbarsky changed the title Drop non-hermetic deps in _go_tool_binary_impl WIP: Drop non-hermetic deps in _go_tool_binary_impl May 27, 2025
@dzbarsky dzbarsky force-pushed the zbarsky/hermetic branch from ed64111 to 86f25cd Compare May 27, 2025 01:35
@randomizedcoder

Tested manually on NixOS running unstable from a week or so ago. This resolved the "mktemp not found" error.

Tested using the examples/hello/ in this repo: https://github.com/calmette54/rules_go/tree/master/examples/hello

bazel_dep(name = "rules_go", version = "0.54.1")

patch = use_extension("@bazel_tools//tools/build_defs/repo:extensions.bzl", "patch")
patch.module(
    name = "rules_go_patch",
    module_name = "rules_go",
    patches = ["//patches:rules_go_binary_hermetic.patch"],
)
use_repo(patch)

and

[das@t:~/Downloads/huricz/rules_go/examples/hello]$ ls -la ./patches/
total 16
drwxr-xr-x 2 das users 4096 May 26 18:00 .
drwxr-xr-x 3 das users 4096 May 26 17:59 ..
-rw-r--r-- 1 das users   48 May 26 18:00 BUILD.bazel
-rw-r--r-- 1 das users 2955 May 26 17:59 f6141a7583a5289da5a5f3d2d072bd7ef4c3a38b.patch
lrwxrwxrwx 1 das users   46 May 26 18:00 rules_go_binary_hermetic.patch -> f6141a7583a5289da5a5f3d2d072bd7ef4c3a38b.patch

[das@t:~/Downloads/huricz/rules_go/examples/hello]$ cat ./patches/BUILD.bazel 
exports_files([rules_go_binary_hermetic.patch])

[das@t:~/Downloads/huricz/rules_go/examples/hello]$

Also retested this on Ubuntu Server LTS; with the patch applied, the build continues to work.

@dzbarsky dzbarsky force-pushed the zbarsky/hermetic branch 6 times, most recently from ecaaf6a to 5fcc83f Compare May 27, 2025 03:11
@dzbarsky dzbarsky force-pushed the zbarsky/hermetic branch 2 times, most recently from bfddafd to 728403e Compare May 27, 2025 12:25
@dzbarsky
Contributor Author

@fmeum ok, so I think we have the following options:

  1. Write to /tmp. This is a little sketchy under some sandboxing impls as you point out.
  2. Write to $TMPDIR. This requires env var expansion at action execution time...
    2a) We do this via ctx.actions.run_shell. This is the easiest tweak but it will bring back the shell dependency to the ruleset (it's the only one) which feels unfortunate.
    2b) We write a wrapper binary that tweaks env and execs the toolchain. The wrapper needs to be either built on the fly (Go? complicates repo rules) or precompiled (distribution problem - maybe cosmo libc single artifact?)
  3. We use ctx.actions.declare_directory like in the Windows impl. The drawback is that the resulting directory runs afoul of the reproducibility tester, though we can filter it out like we do with some other outputs. It's not clear to me why this isn't already a problem on Windows.

With all of these solutions, the key thing to remember is that, because of the -a flag, we are insulated from anything messing with the cache dir: we only write to it, never read from it.

Based on all of the above, I am tempted to do (3). It feels like the best way to ensure the output is insulated and matches what we do on Windows, opening the path to combining the codepaths.

WDYT?

@fmeum
Member

fmeum commented May 29, 2025

I'm pretty strongly in favor of 3 - I want to avoid adding precompiled binaries to the mix that aren't part of a Go SDK, but also don't like dependencies on hardcoded directory paths.

@jayconrod @linzhp Any concerns?

@dzbarsky dzbarsky force-pushed the zbarsky/hermetic branch from 728403e to 8f1f4ad Compare May 29, 2025 12:32
@dzbarsky dzbarsky changed the title WIP: Drop non-hermetic deps in _go_tool_binary_impl Drop non-hermetic deps in _go_tool_binary_impl May 29, 2025
@dzbarsky dzbarsky force-pushed the zbarsky/hermetic branch 2 times, most recently from 3358174 to c2d4e08 Compare May 29, 2025 13:01
@jayconrod
Collaborator

I don't feel like any of these are great options. Depending on the shell for this one bootstrapping step seems reasonable to me.

If we write to /tmp, it MUST be done securely for the reasons @fmeum described. That means mktemp or equivalent. I don't think we should assume $TMPDIR is not /tmp and is safe to write to, even though in practice it might be.

ctx.actions.declare_directory seems wrong for this purpose: it's not an output and shouldn't be in the action graph.

If we really had to avoid a shell dependency, I think shipping a precompiled binary would be better.

@fmeum
Member

fmeum commented May 29, 2025

> ctx.actions.declare_directory seems wrong for this purpose: it's not an output and shouldn't be in the action graph.

@dzbarsky Do you know whether the cache directory contents are deterministic (assuming exclusive access by one action) and how many files the cache typically contains?

@dzbarsky
Contributor Author

@jayconrod that's fair, I'm not thrilled about any of these options :) Note that on Windows we are already doing the output directory trick because there's no mktemp

@fmeum I don't think it's deterministic; that's why I had to exclude it from the reproducibility tests. Perhaps it could be made deterministic with the right flags and we just never had to try before. It's also possible that the Go compiler doesn't care whether the contents match precisely, as long as they can be used to emit reproducible artifacts (i.e. I assume it stores extra metadata about when an entry was created or read so it can be LRU'ed, but that doesn't affect build results). I think the cache for this action was a few hundred files; I can check more precisely later if you're curious.

@fmeum
Member

fmeum commented May 29, 2025

Non-determinism in outputs is bad even if these outputs don't end up being used - this will show up as an anomaly when diffing exec logs.

If we keep the dep on a shell, is there a way to make this work for nix by modifying how we look for mktemp?

@dzbarsky
Contributor Author

> Non-determinism in outputs is bad even if these outputs don't end up being used - this will show up as an anomaly when diffing exec logs.
>
> If we keep the dep on a shell, is there a way to make this work for nix by modifying how we look for mktemp?

Is it enough to put the outputs under TMPDIR or do you think we also need a mktemp under the TMPDIR?

@jayconrod
Collaborator

> Note that on Windows we are already doing the output directory trick because there's no mktemp

I know. Someone should do something :P

> Is it enough to put the outputs under TMPDIR or do you think we also need a mktemp under the TMPDIR?

I think we need mktemp under TMPDIR. I don't know whether Bazel randomizes TMPDIR or just gives you /tmp. I expect at least in some configurations it is not randomized, which means it wouldn't be safe, but I'd be happy to be wrong on that.
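
A minimal sketch of "mktemp under TMPDIR", assuming a POSIX shell and that `mktemp` is on PATH (which is exactly what fails on Nix in this PR). The `gocache.XXXXXX` template and the function name are illustrative and not taken from rules_go.

```shell
# Create a private, unpredictably named cache directory under TMPDIR,
# falling back to /tmp when TMPDIR is unset or empty.
make_private_gocache() {
  base="${TMPDIR:-/tmp}"
  # mktemp -d creates a fresh directory with owner-only permissions and
  # fails rather than reusing a path that already exists.
  mktemp -d "$base/gocache.XXXXXX"
}

GOCACHE="$(make_private_gocache)" || exit 1
export GOCACHE
echo "GOCACHE=$GOCACHE"
```

The caller is still responsible for removing the directory afterwards, which is where the dependency on `rm` comes from.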

@fmeum
Member

fmeum commented May 29, 2025

@dzbarsky
Contributor Author

How do we feel about picking up a dependency on Aspect's coreutils toolchain? That would give us another path to solving this hermetically.

And perhaps we can file a bug in upstream Go to allow compilation without GOCACHE though it may take some convincing and would be a while before we could rely on that here ..

@randomizedcoder

randomizedcoder commented May 30, 2025 via email

@jayconrod
Collaborator

> How do we feel about picking up a dependency on Aspect's coreutils toolchain? That would give us another path to solving this hermetically.

I'd prefer not to add a dependency for this.

> And perhaps we can file a bug in upstream Go to allow compilation without GOCACHE though it may take some convincing and would be a while before we could rely on that here ..

That's unlikely to happen: it would mean reversing a large effort that's been in a positive direction. GOCACHE is now a fundamental part of how the go tool works. All compiler outputs are written there. Previously, they were written to GOROOT or GOPATH, but that caused a lot of problems. The Windows use of ctx.actions.declare_directory we were talking about earlier was added in #3385 because the standard library is no longer compiled into GOROOT and is now in GOCACHE.

> The challenge seems to be that we are talking about the bootstrap. For example, it would potentially be elegant to allow rules_go to create its own dependencies, via https://github.com/u-root/u-root/blob/main/cmds/core/mktemp/mktemp.go, for instance. However, the challenge is that we are talking about sandbox setup. At the point mktemp is being called, we don't have Go yet, so we couldn't supply a Go-only mktemp even if we wanted to.
>
> Maybe the undeclared dependency is within Bazel, rather than rules_go? rules_go arguably starts after this sandbox setup?

I'm not sure what this means? The problem is whether we can depend on mktemp and rm being available in the execution environment, and if they're not, how we can work around their absence.


Thinking more about precompiled binaries: I don't think this would work. I'd actually love to ship a precompiled binary for the builder tool (replacing this whole rule), but we'd need one for every execution platform, and it'd need to work at every commit, not just release. So we'd still need to fall back to the current go_tool_binary in some cases, and the release might get a lot heavier.

Other thoughts:

  • Does GOCACHE actually need to be in a temporary directory or can it be part of the execroot? We only use this internal rule in a couple places, so it won't collide with anything.
  • Once the builder tool is built, we could make rm one of its subcommands so it can clean up after itself.

The problem with this approach is that if we fail to compile the builder tool for some reason, we have no way to clean up the GOCACHE directory. We could run go clean -cache, but that won't remove the directory itself.
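
If a shell with `mkdir` and `rm` can be assumed (the very assumption this thread is questioning), cleanup that runs regardless of the build result could be sketched as follows. `build_with_cleanup` is a hypothetical name, and the command passed to it stands in for the real `go build` invocation; the cache path should be absolute.

```shell
# Run a command with GOCACHE pointing at the given directory, then remove
# the directory whether the command succeeded or failed.
build_with_cleanup() {
  cache_dir="$1"
  shift
  mkdir -p "$cache_dir" || return 1
  status=0
  GOCACHE="$cache_dir" "$@" || status=$?
  rm -rf "$cache_dir"
  # Propagate the command's exit status to the caller.
  return "$status"
}
```

Because the cleanup lives in the shell wrapper rather than in the builder tool, it still runs when compiling the builder fails.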

@dzbarsky dzbarsky force-pushed the zbarsky/hermetic branch from c2d4e08 to ed75a7e Compare May 31, 2025 15:21
@dzbarsky
Contributor Author

@jayconrod in an earlier iteration of this PR I had tried to point HOME (and thus GOCACHE) at the execroot and it caused constant cache invalidation. I just looked at it again and realized that it was due to it being a relative path which is late-expanded within go build and pollutes unintended locations. We can use a combination of run_shell and pwd to absolutize it early in the action execution and fix both the caching and the reproducibility test. Thanks for prompting me to take another look at it!
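
The absolutization step described here might look roughly like this inside a `run_shell` command, assuming a POSIX shell. The directory name `gocache` and the helper `absolutize` are placeholders, not the names used in the merged change.

```shell
# Turn a possibly relative path into an absolute one, resolved against the
# directory the action starts in, before any tool gets a chance to
# reinterpret it from a different working directory.
absolutize() {
  case "$1" in
    /*) printf '%s\n' "$1" ;;
    *)  printf '%s/%s\n' "$(pwd)" "$1" ;;
  esac
}

GOCACHE="$(absolutize gocache)"
export GOCACHE
echo "GOCACHE=$GOCACHE"
```

`pwd` and `printf` are builtins in common shells such as bash and dash, so a sketch like this needs a shell but not `mktemp` or `rm`.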

I've updated the PR, how do we feel about the approach now?

@dzbarsky dzbarsky force-pushed the zbarsky/hermetic branch 8 times, most recently from f1eb3bc to ea6433d Compare June 16, 2025 16:13
@dzbarsky
Contributor Author

@jayconrod @fmeum I've updated with all the feedback, PTAL?

Collaborator

@jayconrod jayconrod left a comment

Looks good. No other comments from me. Thanks for improving this.

@fmeum fmeum merged commit 1172e60 into bazel-contrib:master Jun 17, 2025
1 check passed