Tracing GC for konserve #9
Based partially on datacrypt-project#24. Rewritten to primarily use core.async.
* src/hitchhiker/tree/konserve.cljc (create-id): new function; prepends the current timestamp as hex to the UUID key.
  (KonserveBackend.-write-node): use create-id to generate the storage ID.
* src/hitchhiker/tree/tracing-gc/konserve.cljc: new namespace.
* src/hitchhiker/tree/tracing-gc.cljc: new namespace.
* .gitignore: ignore IntelliJ files.
* project.clj: update konserve to 0.6.0-SNAPSHOT.
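The idea behind create-id in the commit above can be sketched as follows. This is a minimal illustration, not the code from the PR: the function names `create-id-sketch` and `id-millis`, the `-` separator, and the zero-padding to 16 hex digits are all assumptions. The padding is added here so that keys sort lexicographically in timestamp order; the commit message does not say whether the real implementation pads.

```clojure
;; Hypothetical sketch: names, separator and padding are assumptions,
;; only "timestamp as hex prepended to the UUID" comes from the commit message.
(defn create-id-sketch
  "Returns a storage key of the form <16-digit-hex-millis>-<uuid>."
  ([] (create-id-sketch (System/currentTimeMillis) (java.util.UUID/randomUUID)))
  ([millis uuid]
   (str (format "%016x" millis) "-" uuid)))

(defn id-millis
  "Recovers the millisecond timestamp from a key built by create-id-sketch."
  [id]
  (Long/parseLong (subs id 0 16) 16))
```

With a fixed-width prefix, a GC pass could decide whether a key is older than a given epoch by parsing only the first 16 characters of the key.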
Your work looks very promising and it is great fun to implement my first GC with you! I think we can get that done soon and plug it into Datahike. Could you also write some tests for the GC code?
Also add protocol for custom stamping mechanisms, just use a current timestamp for stamping nodes by default.
* src/hitchhiker/tree/bootstrap/konserve.cljc (PKonserveStampGenerator): new protocol.
  (KonsorveTimestampGenerator): new record.
  (+default-stamp-generator+): new var.
  (create-id): take a stamp generator, use that to create the timestamp. Return a vector of `[stamp guid]`, not a string.
  (KonserveAddr): implement -raw-address method.
* src/hitchhiker/tree/bootstrap/redis.clj (RedisAddr): implement -raw-address method.
* src/hitchhiker/tree/tracing_gc/konserve.cljc (compare-stamps): new fn, implement for clj and cljs (do the right thing in cljs).
  (within-epoch?): rename to after-epoch?. Treat addresses as vectors of timestamp, guid.
* src/hitchhiker/tree/utils/plot.clj: pass nil for stamp-generator arg for ->KonsorveBackend.
* src/hitchhiker/tree/node.cljc (IAddress): add -raw-address method.
* src/hitchhiker/tree/tracing-gc.cljc (trace-gc!): use -raw-address to get the underlying storage address.
* src/hitchhiker/tree.cljc (IndexNode, DataNode): implement -raw-address method.
* test/hitchhiker/tree/node/testing.cljc (TestingAddr): implement -raw-address method.
* test/hitchhiker/tree/testing_gc_test.clj: new test.
* test/hitchhiker/konserve_test.cljc: pass nil as stamp-generator to ->KonserveBackend.
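A pluggable stamping mechanism of the kind this commit describes might look roughly like the sketch below. Every name in it (`StampGenerator`, `next-stamp`, `TimestampGenerator`, `FixedStampGenerator`, `create-stamped-id`, `after-epoch-sketch?`) is hypothetical, and treating a nil generator as "use the timestamp default" is an assumption suggested by the call sites that pass nil. Only the `[stamp guid]` address shape is taken from the commit message. The sketch targets JVM Clojure only.

```clojure
;; Hypothetical protocol: not the PR's PKonserveStampGenerator.
(defprotocol StampGenerator
  (next-stamp [this] "Returns a stamp for a newly written node."))

;; Default behaviour: stamp nodes with the current time in milliseconds.
(defrecord TimestampGenerator []
  StampGenerator
  (next-stamp [_] (System/currentTimeMillis)))

;; A deterministic generator, handy for tests.
(defrecord FixedStampGenerator [stamp]
  StampGenerator
  (next-stamp [_] stamp))

(defn create-stamped-id
  "Returns [stamp guid]. Assumption: a nil generator falls back to timestamps."
  [stamp-generator]
  [(next-stamp (or stamp-generator (->TimestampGenerator)))
   (java.util.UUID/randomUUID)])

(defn after-epoch-sketch?
  "True when the address's stamp is strictly greater than epoch."
  [epoch [stamp _guid]]
  (pos? (compare stamp epoch)))
```

Keeping the stamp as a separate vector element, rather than encoding it into a string, means the comparison never has to parse the key.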
Rename konserve -> epoch, it better captures what that does now. Don't make tracing-gc! async. Update gc test.
That is already considerably simpler. It would be ideal to collect all hitchhiker-tree marks in a set and then return it. That way we can add additional keys to mark (e.g. for the roots of all Datahike branches or maybe other things in store) and call a
The mark phase is a pure function because the tree roots we give it are constants and the store is immutable at that point; we just happen to read the constants from disk. That way we separate it from the side effects of removing the stale nodes.
mark just collects addresses into a set (possibly collecting more addresses into an existing set). sweep! walks a sequence (not channel) of addresses, and calls delete-fn on any address that is both accepted by accept-fn and does not appear in the marked addresses set. Removes IGCScratch protocol. epoch just contains a function-builder-function that returns true if the timestamp in a key is before an epoch.
Don't take any accept-fn in sweep!, just leave it up to the caller to filter the seq of addresses. Leave hitchhiker-tree.tracing-gc.epoch for accept-before-epoch fn, since it's useful for consumers of the GC API.
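The mark/sweep split described in the last two commits can be sketched as below. This is an illustration under stated assumptions, not the PR's code: `mark-sketch`, `sweep-sketch!`, `accept-before-epoch-sketch`, and the `children-fn` argument (a function from an address to its child addresses) are hypothetical, and an atom holding a map stands in for the real store. Addresses are `[stamp guid]` vectors, with keywords standing in for the guids.

```clojure
(defn mark-sketch
  "Collects every address reachable from root into the set marked.
  children-fn is a hypothetical function: address -> seq of child addresses."
  ([children-fn root] (mark-sketch children-fn root #{}))
  ([children-fn root marked]
   (if (contains? marked root)
     marked
     (reduce (fn [acc child] (mark-sketch children-fn child acc))
             (conj marked root)
             (children-fn root)))))

(defn sweep-sketch!
  "Calls delete-fn on each address not in marked; returns the deleted addresses.
  Any filtering (e.g. by epoch) is left to the caller."
  [marked addresses delete-fn]
  (let [stale (vec (remove #(contains? marked %) addresses))]
    (run! delete-fn stale)
    stale))

(defn accept-before-epoch-sketch
  "Builds a predicate that is true for addresses stamped before epoch."
  [epoch]
  (fn [[stamp _guid]] (neg? (compare stamp epoch))))

;; Toy store: address -> child addresses. [1 :c] is unreachable garbage,
;; [9 :d] is unreachable but newer than the epoch, so it must survive.
(def store (atom {[1 :a] [[1 :b]], [1 :b] [], [1 :c] [], [9 :d] []}))

(def marked (mark-sketch #(get @store % []) [1 :a]))

(def deleted
  (sweep-sketch! marked
                 (filter (accept-before-epoch-sketch 5) (keys @store))
                 #(swap! store dissoc %)))
```

Because marking only reads and returns a set, marks from several roots can be merged by passing the set from one call into the next before a single sweep runs. The epoch filter protects nodes written after the mark started, which no root snapshot could have reached yet.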
@csm Hey. We have now implemented this with the help of metadata support in our key-value store protocol, which does not clutter the value semantics with timestamps. Thanks a lot for pushing the GC and if you have any further needs, feel free to reach out or open more PRs.
Based partially on datacrypt-project/hitchhiker-tree#24. Rewritten to primarily use core.async.
Would need something like replikativ/konserve#29 to iterate keys in the store.