Implement BadgerDB garbage collection #757

Merged: 1 commit into fluxcd:main on May 23, 2025

Conversation


@stealthybox stealthybox commented Apr 30, 2025

Garbage collection of the BadgerDB WAL/value-logs is our responsibility to orchestrate.
This patch implements it in a similar manner to dgraph.

db.RunValueLogGC() will rewrite only a single file.
We don't have to guard against concurrent runs of it ourselves, but we do need to run it repeatedly until there is nothing left to rewrite.

gc.discardValueLogFiles() does this, and we wrap it in a goroutine that waits gc.Interval between each run.
We use a timer instead of a ticker so that a GC run can never overrun itself, even if there are many files or the disk is very slow.

I've bounded the number of discards to prevent bugs in library code and infrastructure we don't control from causing an infinite loop.
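As a rough sketch of what that bounded loop can look like (the type name, field names, and the 0.5 discard ratio are assumptions for illustration, not necessarily the exact code in this patch):

package database

import (
	"errors"

	badger "github.com/dgraph-io/badger/v4"
)

// upper bound for loop
const maxDiscards = 1000

// BadgerGarbageCollector holds the database handle the GC operates on.
// The struct and field names here are illustrative assumptions.
type BadgerGarbageCollector struct {
	db *badger.DB
}

// discardValueLogFiles calls RunValueLogGC repeatedly; each successful call
// rewrites at most one value-log file. It stops when Badger reports nothing
// left to rewrite, or when the maxDiscards bound is reached.
func (gc *BadgerGarbageCollector) discardValueLogFiles() (int, error) {
	for discarded := 0; discarded < maxDiscards; discarded++ {
		err := gc.db.RunValueLogGC(0.5)
		if errors.Is(err, badger.ErrNoRewrite) {
			return discarded, nil // nothing left to garbage collect
		}
		if err != nil {
			return discarded, err
		}
	}
	return maxDiscards, nil
}

badger.ErrNoRewrite is how Badger signals that no value-log file met the discard ratio, which is the natural stopping condition for the loop.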

The Garbage Collector is added to the controller manager by implementing controller-runtime's Runnable interface.
This means that the controller runtime will

  • tell the GC to stop via a context, immediately upon receiving SIGTERM
  • provide a logr.Logger via a context
  • wait until the GC shuts down

Notably, the GC will shut down while other reconcilers and database writes may still be occurring.
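A minimal sketch of such a Runnable, building on the type sketched above and assuming an added Interval time.Duration field (the log messages mirror the test output further down, but the exact structure is my guess):

import (
	"context"
	"time"

	"github.com/go-logr/logr"
)

// Start implements controller-runtime's Runnable interface. It blocks until
// ctx is cancelled, running one GC pass per Interval. The timer is reset only
// after a pass finishes, so a slow pass can never overrun the next one.
func (gc *BadgerGarbageCollector) Start(ctx context.Context) error {
	log := logr.FromContextOrDiscard(ctx)
	log.Info("Starting Badger GC")

	timer := time.NewTimer(gc.Interval)
	defer timer.Stop()
	for {
		select {
		case <-ctx.Done():
			log.Info("Stopped Badger GC")
			return nil
		case <-timer.C:
			discarded, err := gc.discardValueLogFiles()
			if err != nil {
				log.Error(err, "Badger GC failed")
			}
			log.Info("Ran Badger GC", "discarded_vlogs", discarded)
			timer.Reset(gc.Interval)
		}
	}
}

Registering it is then just mgr.Add(gc); the manager cancels the context on SIGTERM and waits for Start to return.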

Now that I've updated this patch, the channel implementation is much simpler than using a mutex (#758).

I've added a test that

  • opens a db
  • creates a GC with a fast interval
  • uses the db
  • ensures the GC can be stopped in a timely manner

This test could be extended into a benchmark that also exercises the value-log discard path by running for longer and writing millions of tags/repos.
Alternatively, we could run the GC as a full e2e/scale test with an image on an actual cluster, reading from OCI registries.
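Roughly, such a test might be structured like this (a sketch assuming the BadgerGarbageCollector fields from the earlier sketches, and a direct badger write in place of the controller's database layer):

import (
	"context"
	"testing"
	"time"

	badger "github.com/dgraph-io/badger/v4"
	"github.com/go-logr/logr"
	"github.com/go-logr/logr/testr"
)

func TestBadgerGarbageCollectorDoesStop(t *testing.T) {
	// open a db
	db, err := badger.Open(badger.DefaultOptions(t.TempDir()))
	if err != nil {
		t.Fatal(err)
	}
	defer db.Close()

	// create a GC with a fast interval and start it with a test logger
	gc := &BadgerGarbageCollector{db: db, Interval: 500 * time.Millisecond}
	ctx, cancel := context.WithCancel(logr.NewContext(context.Background(), testr.New(t)))
	stopped := make(chan struct{})
	go func() {
		_ = gc.Start(ctx)
		close(stopped)
	}()

	// use the db: write some tag data so the GC has something to look at
	if err := db.Update(func(txn *badger.Txn) error {
		return txn.Set([]byte("repo/example"), []byte("v1.0.0,v1.0.1"))
	}); err != nil {
		t.Fatal(err)
	}
	t.Log("wrote tags successfully")

	// ensure the GC can be stopped in a timely manner
	time.Sleep(2 * gc.Interval)
	t.Log("waiting for GC stop")
	cancel()
	select {
	case <-stopped:
		t.Log("GC Stopped")
	case <-time.After(5 * time.Second):
		t.Fatal("GC did not stop")
	}
}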

Related #748

}

// upper bound for loop
const maxDiscards = 1000
Member Author

I don't know if this is a reasonable bound on the number of discards, but it felt like the controller could sample, delete, and rewrite 1000 value log files relatively quickly in its own goroutine without impacting the controller-runtime too much.

@stealthybox stealthybox force-pushed the gc-channel branch 2 times, most recently from 1f78ec0 to 9ff8d70 on May 7, 2025 06:13
@stealthybox stealthybox changed the title Implement BadgerDB garbage collector with channels Implement BadgerGC with controller-runtime May 7, 2025
@stealthybox stealthybox marked this pull request as ready for review May 7, 2025 06:44
"github.com/go-logr/logr/testr"
)

func TestBadgerGarbageCollectorDoesStop(t *testing.T) {
Member Author

Example Successful run (2.3 seconds)

go test -run '^TestBadgerGarbageCollectorDoesStop$' github.com/fluxcd/image-reflector-controller/internal/database -v -count 1

=== RUN   TestBadgerGarbageCollectorDoesStop
badger 2025/05/07 00:04:06 INFO: All 0 tables opened in 0s
badger 2025/05/07 00:04:06 INFO: Discard stats nextEmptySlot: 0
badger 2025/05/07 00:04:06 INFO: Set nextTxnTs to 0
    badger_gc.go:61: test-badger-gc: "level"=0 "msg"="Starting Badger GC"
    badger_gc.go:84: test-badger-gc: "level"=0 "msg"="Ran Badger GC" "discarded_vlogs"=0
    badger_gc_test.go:47: wrote tags successfully
    badger_gc.go:84: test-badger-gc: "level"=0 "msg"="Ran Badger GC" "discarded_vlogs"=0
    badger_gc.go:84: test-badger-gc: "level"=0 "msg"="Ran Badger GC" "discarded_vlogs"=0
    badger_gc_test.go:52: waiting for GC stop
    badger_gc.go:70: test-badger-gc: "level"=0 "msg"="Stopped Badger GC"
    badger_gc_test.go:57: GC Stopped
badger 2025/05/07 00:04:08 INFO: Lifetime L0 stalled for: 0s
badger 2025/05/07 00:04:08 INFO: 
Level 0 [ ]: NumTables: 01. Size: 216 B of 0 B. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 64 MiB
Level 1 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 2 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 3 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 4 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 5 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 6 [B]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level Done
--- PASS: TestBadgerGarbageCollectorDoesStop (2.06s)
PASS
ok  	github.com/fluxcd/image-reflector-controller/internal/database	2.355s

Synthetic Failure (sleep for a minute when stopped) (7.3 seconds)

go test -run '^TestBadgerGarbageCollectorDoesStop$' github.com/fluxcd/image-reflector-controller/internal/database -v -count 1

=== RUN   TestBadgerGarbageCollectorDoesStop
badger 2025/05/06 23:46:31 INFO: All 0 tables opened in 0s
badger 2025/05/06 23:46:31 INFO: Discard stats nextEmptySlot: 0
badger 2025/05/06 23:46:31 INFO: Set nextTxnTs to 0
    badger_gc.go:61: test-badger-gc: "level"=0 "msg"="Starting Badger GC"
    badger_gc.go:85: test-badger-gc: "level"=0 "msg"="Ran Badger GC" "discarded_vlogs"=0
    badger_gc_test.go:47: wrote tags successfully
    badger_gc.go:85: test-badger-gc: "level"=0 "msg"="Ran Badger GC" "discarded_vlogs"=0
    badger_gc.go:85: test-badger-gc: "level"=0 "msg"="Ran Badger GC" "discarded_vlogs"=0
    badger_gc.go:85: test-badger-gc: "level"=0 "msg"="Ran Badger GC" "discarded_vlogs"=0
    badger_gc_test.go:52: waiting for GC stop
    badger_gc_test.go:55: GC did not stop
badger 2025/05/06 23:46:38 INFO: Lifetime L0 stalled for: 0s
badger 2025/05/06 23:46:38 INFO: 
Level 0 [ ]: NumTables: 01. Size: 216 B of 0 B. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 64 MiB
Level 1 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 2 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 3 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 4 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 5 [ ]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level 6 [B]: NumTables: 00. Size: 0 B of 10 MiB. Score: 0.00->0.00 StaleData: 0 B Target FileSize: 2.0 MiB
Level Done
--- FAIL: TestBadgerGarbageCollectorDoesStop (7.04s)
FAIL
FAIL	github.com/fluxcd/image-reflector-controller/internal/database	7.332s
FAIL

@stealthybox stealthybox force-pushed the gc-channel branch 2 times, most recently from d507687 to 3b04a83 on May 9, 2025 05:34
@stealthybox
Member Author

Notably, because we are using the controller manager, the Garbage Collector Runnable will only run when the controller is the Leader:
https://github.com/kubernetes-sigs/controller-runtime/blob/b6c5897febe566008678f765ec5a3a1ea04a123a/pkg/manager/runnable_group.go#L55-L76

This is probably okay behavior, but if we want the GC to run all the time (e.g. when the controller becomes a follower), we can implement LeaderElectionRunnable and return false, which will add it to the "Others" runnable group.
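For reference, that opt-out is a single method; attaching it to the GC type here is hypothetical, not part of this patch:

// NeedLeaderElection implements controller-runtime's LeaderElectionRunnable.
// Returning false places the GC in the "Others" runnable group, so it runs
// even when this replica is not the leader.
func (gc *BadgerGarbageCollector) NeedLeaderElection() bool {
	return false
}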

main.go Outdated
@@ -94,6 +99,7 @@ func main() {
flag.StringVar(&healthAddr, "health-addr", ":9440", "The address the health endpoint binds to.")
flag.StringVar(&storagePath, "storage-path", "/data", "Where to store the persistent database of image metadata")
flag.Int64Var(&storageValueLogFileSize, "storage-value-log-file-size", 1<<28, "Set the database's memory mapped value log file size in bytes. Effective memory usage is about two times this size.")
flag.Uint16Var(&gcInterval, "gc-interval", 1, "The number of minutes to wait between garbage collections. 0 disables the garbage collector.")
Member

I'm wondering if the 1m interval is too aggressive and if it will increase CPU usage unnecessarily for most users. How about we go with 5m as the default and document the flag here (those that have a massive number of images could set it to 1m).

Member Author

After running some load tests and considering how much smaller image tag data is, I don't think it's very common for this GC code to run with the default BadgerDB table/LSM/vlog settings. The database has to contain a large number of image tags and use more than the default memory limit of 1GB before it starts flushing to vlog files.

I've updated the default interval to 10 minutes.
My hypothesis right now is that end-users who are seeing this in production have either:
a) a very large number of image repos, or
b) a constantly changing set of repos with taglists larger than 1MB each

@stefanprodan
Member

@stealthybox can you please rebase with upstream main

@stealthybox
Member Author

stealthybox commented May 23, 2025

Rebased. Load-testing code and notes are in #763.

It's a separate issue, but I think we could implement some much more conservative defaults right now for badger.
I've only been running synthetic tests with our controller libraries, but I think it could be pretty easy to OOM kill the controller with a small number of repos over time with the default 1GB memory limit.

With the badger defaults, a synthetic test in a 1GB-limited docker container can OOM with just 400 repos & ~17MB of tag data, using fewer than 60 value changes across the repos.

This isn't because of the garbage collector. Most tag values will be under the 1MB value-log threshold and get stored directly next to their repo keys in the memtables. With the default table sizes and quantities, this fills up memory well beyond 1GB before anything at all is marked to be discarded or flushed to disk. I can double-check, but I'm pretty sure it OOMs even if I disable the GC process, because it's basically a noop for this scenario.
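For example, more conservative Badger options might look roughly like this; the specific values are illustrative guesses, not tested recommendations from this PR (storagePath and storageValueLogFileSize are the existing flags from main.go):

opts := badger.DefaultOptions(storagePath).
	WithMemTableSize(16 << 20). // 16 MiB memtables instead of the 64 MiB default
	WithNumMemtables(2).        // hold fewer memtables in RAM (default is 5)
	WithValueLogFileSize(storageValueLogFileSize)

db, err := badger.Open(opts)

Smaller and fewer memtables mean data is flushed to the LSM tree (and eventually to value logs) sooner, trading some write throughput for a lower steady-state memory footprint.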

Member

@stefanprodan stefanprodan left a comment

LGTM

Thanks @stealthybox

@stefanprodan
Member

@stealthybox please rebase your fork with upstream main; I can't merge as your branch is out of date.

@stealthybox
Member Author

Something is broken on GitHub's side with this PR.
I've pushed 8ec426f to stealthybox/gc-channel, but this PR is still stuck on d01d018.

@stealthybox
Member Author

Editing the PR and re-selecting main as the base branch fixed it:
https://stackoverflow.com/a/76590821

@stealthybox stealthybox merged commit c491302 into fluxcd:main May 23, 2025
6 checks passed
@matheuscscp matheuscscp changed the title Implement BadgerGC with controller-runtime Implement BadgerDB garbage collection May 27, 2025