Proposal: remove big old blobs from the git history #2937

Closed
@tbruyelle

The CLI repo currently contains a large number of big blobs, mainly because it includes many binary files. The largest one, nodetime, accounts for 5.5 GB of data on its own!

This has bad consequences. First, cloning and any other history-based operation is very slow (a clone takes more than 4 min). More importantly, it forces us to disable the sumdb when we request a hash (see golang/go#56174).
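To see which blobs are responsible, the history can be inspected with plain git. The function below is a minimal sketch, not something taken from the repo: the name `largest_blobs` and the default of 20 entries are assumptions for illustration.

```shell
# List the N largest blobs reachable from any ref, largest first.
# Output columns: size in bytes, object id, path.
# `largest_blobs` is a hypothetical helper name; run it inside the repo.
largest_blobs() {
  count="${1:-20}"
  git rev-list --objects --all |
    git cat-file --batch-check='%(objecttype) %(objectsize) %(objectname) %(rest)' |
    awk '$1 == "blob" { sub(/^blob /, ""); print }' |
    sort -n -r -k1,1 |
    head -n "$count"
}
```

Running `largest_blobs 10` inside a clone should show whether a handful of files (such as the nodetime binaries) dominate the history.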

There aren't a million ways to fix this. On my side, I know only one: it's called BFG (for those who get the reference: :rage1:) https://rtyley.github.io/bfg-repo-cleaner/

When used with the --strip-blobs-bigger-than flag, it removes from the history all old blobs larger than the value of the flag. Big blobs that still exist in the latest commit stay untouched. For the demo, I ran it with --strip-blobs-bigger-than 10M, and pushed the result to https://github.com/tbruyelle/cli-diet. This new repo takes less than ~30s to clone.
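For reference, the cleanup follows the usual steps from the BFG documentation: mirror-clone, run BFG, then expire the reflog and garbage-collect. The sketch below wraps them in a function. The name `strip_big_blobs`, the `BFG_JAR` variable, and the default mirror directory are assumptions for illustration, not the exact commands used for the demo.

```shell
# Sketch: rewrite a mirror clone with BFG, then shrink it locally.
# Arguments: repo URL, size threshold (default 10M), mirror directory.
# BFG_JAR is an assumed variable pointing at the downloaded BFG jar.
strip_big_blobs() {
  repo_url="$1"
  threshold="${2:-10M}"
  mirror_dir="${3:-repo-mirror.git}"
  bfg_jar="${BFG_JAR:-bfg.jar}"

  # A mirror clone carries every ref, so all branches and tags get rewritten.
  git clone --mirror "$repo_url" "$mirror_dir" || return 1

  # Drop blobs above the threshold from history (the latest commit is kept as-is).
  java -jar "$bfg_jar" --strip-blobs-bigger-than "$threshold" "$mirror_dir" || return 1

  # BFG only rewrites refs; the old objects disappear once they are pruned.
  (
    cd "$mirror_dir" || exit 1
    git reflog expire --expire=now --all &&
      git gc --prune=now --aggressive
  )
}
```

The final push is deliberately left out of the function, since it rewrites the shared history and should only happen once everyone has stopped working on the repo.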

⚠️ Now the downsides ⚠️

  • History is rewritten, so during the process everyone should stop interacting with the repo until the cleaned repo has been pushed with --force. After that, everyone should clone it again.
  • Since the old binaries won't exist anymore, it will no longer be possible to build a CLI from an old tag (or, more precisely, the built binary won't contain those binaries, which could lead to failures with some commands).
  • Since the commits are rewritten, they get new hashes. I need to check what happens to existing GitHub releases, since they are based on hashes.
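One way to check the hash changes from the last point is to record which commit each tag points to before and after the rewrite, then compare the two lists. This is a minimal sketch; the helper name `tag_commits` and the file names in the usage note are made up for illustration.

```shell
# Print "tag commit" for every tag in the current repo.
# Annotated tags are peeled to the commit they point at.
# `tag_commits` is a hypothetical helper name.
tag_commits() {
  git for-each-ref --format='%(refname:short)' refs/tags |
    while read -r tag; do
      printf '%s %s\n' "$tag" "$(git rev-parse "refs/tags/$tag^{commit}")"
    done
}
```

For example, `tag_commits > tags-before.txt` in the original clone and `tag_commits > tags-after.txt` in the cleaned one, followed by `diff tags-before.txt tags-after.txt`, would list every tag whose commit hash changed.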
