Tracing GC for konserve #9

Closed
wants to merge 8 commits into from

csm commented Nov 18, 2019

Based partially on datacrypt-project/hitchhiker-tree#24. Rewritten to primarily use core.async.

Would need something like replikativ/konserve#29 to iterate keys in the store.

  • src/hitchhiker/tree/konserve.cljc (create-id): new function; prepends
    the current timestamp as hex to the UUID key.
    (KonserveBackend.-write-node): use create-id to generate the storage ID.
  • src/hitchhiker/tree/tracing-gc/konserve.cljc: new namespace.
  • src/hitchhiker/tree/tracing-gc.cljc: new namespace.
  • .gitignore: ignore IntelliJ files.
  • project.clj: update konserve to 0.6.0-SNAPSHOT.
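
The PR description does not show the exact key format, so the following is only a minimal sketch of the idea behind `create-id`: prefix a random UUID with the current timestamp rendered as hex. The zero-padding width, the `-` separator, and the helper `id-timestamp` are assumptions made for illustration, and the sketch is JVM-only, whereas the real file is `.cljc`.

```clojure
;; Hypothetical sketch, not the code from the PR.
(defn create-id
  "Returns a storage key: a 16-digit zero-padded hex timestamp, a dash,
  and a random UUID. Padding and separator are assumptions."
  ([] (create-id (System/currentTimeMillis)))
  ([millis]
   (str (format "%016x" millis) "-" (java.util.UUID/randomUUID))))

(defn id-timestamp
  "Hypothetical helper: parses the hex timestamp prefix back into a long."
  [id]
  (Long/parseLong (subs id 0 16) 16))
```

With a fixed-width prefix, keys sort by write time as plain strings, which is what lets a collector decide from the key alone whether a node is older than some cutoff.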

Member

whilo left a comment

Your work looks very promising and it is great fun to implement my first GC with you! I think we can get that done soon and plug it into Datahike. Could you also write some tests for the GC code?

Also add a protocol for custom stamping mechanisms; by default, nodes
are stamped with the current timestamp.

* src/hitchhiker/tree/bootstrap/konserve.cljc
  (PKonserveStampGenerator): new protocol.
  (KonsorveTimestampGenerator): new record.
  (+default-stamp-generator+): new var.
  (create-id): take a stamp generator, use that to create the timestamp.
  Return a vector of `[stamp guid]`, not a string.
  (KonserveAddr): implement -raw-address method.
* src/hitchhiker/tree/bootstrap/redis.clj (RedisAddr): implement -raw-address
  method.
* src/hitchhiker/tree/tracing_gc/konserve.cljc (compare-stamps): new fn,
  implement for clj and cljs (do the right thing in cljs).
  (within-epoch?): rename to after-epoch?. Treat addresses as vectors
  of timestamp, guid.
* src/hitchhiker/tree/utils/plot.clj: pass nil for stamp-generator arg
  for ->KonserveBackend.
* src/hitchhiker/tree/node.cljc (IAddress): add -raw-address method.
* src/hitchhiker/tree/tracing-gc.cljc (trace-gc!): use -raw-address to
  get the underlying storage address.
* src/hitchhiker/tree.cljc (IndexNode, DataNode): implement -raw-address
  method.
* test/hitchhiker/tree/node/testing.cljc (TestingAddr): implement -raw-address
  method.
* test/hitchhiker/tree/testing_gc_test.clj: new test.
* test/hitchhiker/konserve_test.cljc: pass nil as stamp-generator to
  ->KonserveBackend.
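
The commit message above names the pieces but not their bodies, so here is a rough sketch of how a pluggable stamp generator could fit together with `create-id` and `after-epoch?`. The protocol name `PStampGenerator`, the method name `-stamp`, the record name, and the fallback to a default when `nil` is passed are all assumptions; only the `[stamp guid]` address shape and the function names `create-id`, `compare-stamps`, and `after-epoch?` come from the message. The sketch is JVM-only.

```clojure
;; Hypothetical sketch, not the code from the PR.
(defprotocol PStampGenerator
  (-stamp [this] "Returns a stamp for a node about to be written."))

(defrecord TimestampGenerator []
  PStampGenerator
  (-stamp [_] (System/currentTimeMillis)))

(def default-stamp-generator (->TimestampGenerator))

(defn create-id
  "Returns an address as a vector [stamp guid].
  Falling back to the default generator on nil is an assumption."
  [stamp-generator]
  [(-stamp (or stamp-generator default-stamp-generator))
   (java.util.UUID/randomUUID)])

(defn compare-stamps
  "JVM-only stand-in; the PR mentions a separate cljs implementation."
  [a b]
  (compare a b))

(defn after-epoch?
  "True if the address was stamped strictly after `epoch`."
  [epoch [stamp _guid]]
  (pos? (compare-stamps stamp epoch)))
```

Keeping the stamp as its own vector element, rather than encoding it into a string, means the collector can compare stamps directly without parsing keys.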
Rename konserve -> epoch; the name better captures what it does now.
Don't make tracing-gc! async.
Update gc test.
Member

whilo commented Nov 22, 2019

That is already considerably simpler. It would be ideal to collect all hitchhiker-tree marks in a set and then return it. That way we can add additional keys to mark (e.g. for the roots of all Datahike branches or maybe other things in store) and call a sweep function separately with the marks set. Sorry, I should have been more explicit about this. It would also be nice if the collection of marks would work with core.async as well (without <??), but we can fix that later.

Member

whilo commented Nov 22, 2019

The mark phase is a pure function because the tree roots we give it are constants and the store is immutable at that point. We just happen to read constants from disk. That way we separate it from the side effects of removing the stale nodes.

mark just collects addresses into a set (possibly collecting more
addresses into an existing set).

sweep! walks a sequence (not channel) of addresses, and calls delete-fn
on any address that is both accepted by accept-fn and does not appear
in the marked addresses set.

Removes IGCScratch protocol.

epoch just contains a higher-order function that builds a predicate;
the predicate returns true if the timestamp in a key is before a given
epoch.

Don't take any accept-fn in sweep!; just leave it up to the caller
to filter the seq of addresses.

Leave hitchhiker-tree.tracing-gc.epoch for accept-before-epoch fn,
since it's useful for consumers of the GC API.
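
The mark/sweep split described in these commit messages can be sketched roughly as follows. This is not the PR's code: the argument order, the `children-fn` parameter (a hypothetical function from an address to its child addresses), and the synchronous traversal are assumptions. Addresses are modelled as `[stamp guid]` vectors, with keywords standing in for GUIDs, and an atom holding a map stands in for the store.

```clojure
;; Hypothetical sketch, not the code from the PR.
(defn mark
  "Collects every address reachable from `roots` into a set, optionally
  extending an existing set `marked`. Already-marked addresses are not
  traversed again."
  ([roots children-fn] (mark #{} roots children-fn))
  ([marked roots children-fn]
   (loop [marked marked
          stack  (vec roots)]
     (if (empty? stack)
       marked
       (let [addr  (peek stack)
             stack (pop stack)]
         (if (contains? marked addr)
           (recur marked stack)
           (recur (conj marked addr)
                  (into stack (children-fn addr)))))))))

(defn sweep!
  "Calls `delete-fn` on each address in `addresses` that is not marked.
  Filtering candidates (e.g. by epoch) is left to the caller."
  [marked addresses delete-fn]
  (doseq [addr addresses
          :when (not (contains? marked addr))]
    (delete-fn addr)))

(defn accept-before-epoch
  "Builds a predicate that is true for addresses stamped before `epoch`."
  [epoch]
  (fn [[stamp _guid]]
    (neg? (compare stamp epoch))))

;; Usage: address -> child addresses.
(def store (atom {[1 :a] [[1 :b]]
                  [1 :b] []
                  [2 :c] []     ; unreachable and older than the epoch
                  [9 :d] []}))  ; unreachable but newer than the epoch

(def marked (mark [[1 :a]] #(get @store %)))

(sweep! marked
        (filter (accept-before-epoch 5) (keys @store))
        #(swap! store dissoc %))
```

Because `mark` only returns a set, callers can add further roots to it before sweeping, and the epoch filter protects nodes written after marking began from being deleted.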
Member

whilo commented Sep 19, 2020

@csm Hey. We have now implemented this with the help of metadata support in our key-value store protocol, which does not clutter the value semantics with timestamps. Thanks a lot for pushing the GC and if you have any further needs, feel free to reach out or open more PRs.

@whilo whilo closed this Sep 19, 2020

3 participants