Add protocol for iterating keys in a store #29
Conversation
Introduces a new protocol `PKeyIterable`, which defines the method `-keys` that returns a channel yielding all toplevel keys in the store, in sorted order. Adds implementations of this for the memory store and filestore.

This adds a function `konserve.core/keys`, which might not be the best name since it shadows `clojure.core/keys`. Alternate name suggestions are welcome.

* src/konserve/core.cljc (keys): new function.
* src/konserve/filestore.clj (list-keys): do I/O in a thread. (FileSystemStore): add PKeyIterable protocol.
* src/konserve/memory.cljc (MemoryStore): add PKeyIterable protocol. (new-mem-store): force state to be a sorted-map.
* src/konserve/protocols.cljc: add PKeyIterable protocol.
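A minimal sketch of the shape this takes (the namespace, arglists, and docstrings here are assumptions for illustration, not the PR's literal code):

```clojure
(ns konserve.keys-sketch
  ;; konserve.core/keys shadows clojure.core/keys, so callers (and this
  ;; sketch) exclude the core var.
  (:refer-clojure :exclude [keys]))

;; PKeyIterable as described above: -keys returns a channel that yields
;; the store's toplevel keys.
(defprotocol PKeyIterable
  (-keys [this] "Returns a channel yielding all toplevel keys of the store."))

;; konserve.core/keys would then just delegate to the protocol method.
(defn keys
  "Returns a channel yielding all toplevel keys in store."
  [store]
  (-keys store))
```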
I was trying to think of ways to do node GC in hitchhiker-tree, and at least some way to iterate keys in the store is a starting point. There might be other approaches, and the keys output might not need to be sorted.
Nice work. I think using the Clojure name for `keys` is fine because we use the same names as for hash-maps intentionally. Usually I require konserve under the alias `k`, which makes its usage explicit.
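For example, a call site with that alias might look like this (a usage sketch; whether the channel from `k/keys` yields keys one by one or as a single collection is an assumption here):

```clojure
(require '[clojure.core.async :refer [<!!]]
         '[konserve.core :as k]
         '[konserve.memory :refer [new-mem-store]])

(def store (<!! (new-mem-store)))
(<!! (k/assoc-in store [:foo] 42))

;; The k/ prefix makes the shadowing of clojure.core/keys explicit.
(<!! (k/keys store))
```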
* src/konserve/filestore.clj: remove 1-arg -keys.
* src/konserve/memory.clj: remove 1-arg -keys; make keys iteration exclusive of start-key.
* src/konserve/protocols.clj: remove 1-arg -keys; update docstring to mention that key iteration is exclusive of start-key.
src/konserve/key_compare.clj (outdated excerpt)

```clojure
(ns konserve.key-compare
  "Comparator for arbitrary types.")

(defn key-compare
  ;; ...
  )
```
I'm unsure if this is the right approach. I can't use `clojure.core/compare` because that won't work with heterogeneous types. This will sort keywords before symbols, and symbols before strings, for example. It could be refined so that all named values sort together (e.g. `:bar` < `foo` < `"quux"`, and say `:bar` < `bar` < `"bar"`), but this might suffice.

Possibly just compare-by-edn is best: `(compare (pr-str k1) (pr-str k2))`?
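To make the two options concrete, here is a hedged sketch of each (illustrative only; this is not necessarily what `konserve.key-compare` actually does):

```clojure
;; Option 1: compare-by-edn -- a total order over the printed representation.
(defn edn-compare [k1 k2]
  (compare (pr-str k1) (pr-str k2)))

;; Option 2: rank heterogeneous types first, then use natural order within
;; a type. The particular ranking (keyword < symbol < string < number < other)
;; is an illustrative assumption.
(defn type-rank [x]
  (cond (keyword? x) 0
        (symbol? x)  1
        (string? x)  2
        (number? x)  3
        :else        4))

(defn key-compare [k1 k2]
  (let [r (compare (type-rank k1) (type-rank k2))]
    (if (zero? r)
      (try
        (compare k1 k2)            ; natural order for keys of the same type
        (catch Exception _
          (edn-compare k1 k2)))    ; fall back for values compare can't handle
      r)))
```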
Also, it may be fine to leave it up to the implementation to define sort order for heterogeneous types, as long as keys of the same type are in natural order.
Yes, I think so. The hitchhiker-tree implements a comparison protocol for edn (https://github.com/replikativ/hitchhiker-tree/blob/master/src/hitchhiker/tree/key_compare.cljc) that we build upon in Datahike. Why do you need the keys sorted?

Btw., there was already work on a tracing GC for the hitchhiker-tree that I thought about building on: https://github.com/replikativ/hitchhiker-tree/tree/tracing-gc.
My idea was to change the ID format for hh-tree nodes in konserve to something like `<hex-sequential-id>.<node-guid>`, where some kind of sequential ID is added to each node address, and the GC process scans the storage in order, removing unreferenced addresses, until the sequential ID is >= the current sequential ID when the GC started. That way there's no need to worry about new addresses being added during the GC, because their sequential ID part will be larger. Having keys sorted would just help in stopping the scan as soon as a later identifier is encountered; it's not strictly necessary, though. The same process would work even if keys aren't sorted.

The sequential ID could just be a value in the `:db` key that is incremented on each flush, or a current timestamp.
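To make that concrete, a rough sketch of the sweep under the proposed key format (everything here, including `referenced?` and `delete!`, is hypothetical and not existing konserve or hitchhiker-tree API; string keys of the form "hexid.guid" are assumed):

```clojure
(require '[clojure.string :as str])

(defn parse-seq-id
  "Extracts the hex sequential-id prefix from a string key like \"1a2b.<node-guid>\"."
  [k]
  (-> (str/split k #"\." 2) first (Long/parseLong 16)))

(defn gc-sweep
  "Deletes unreferenced keys whose sequential id predates gc-start-id, the id
  observed when the GC started. Keys written during the GC carry a larger id
  and are therefore never touched."
  [store-keys referenced? gc-start-id delete!]
  (doseq [k store-keys
          :let [id (parse-seq-id k)]
          :when (and (< id gc-start-id)
                     (not (referenced? k)))]
    (delete! k)))
```

With sorted keys the same loop could additionally stop at the first id >= gc-start-id (e.g. via `take-while`), but as noted that is only an optimization.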
Yes, that makes sense. Flushing then needs an additional argument in the hitchhiker-tree, and we can just use any monotone lattice that provides a happened-before relation, e.g. a counter. That way we could GC even after merges of databases. Keeping a consistent active set can be tricky in a distributed system, though, because we ship the root nodes to reading client replicas; so maybe we want to use physical time on the transactor, so that we GC only values that are older than some time window needed for clients to replicate index fragments.
I think it would be better to do the sorting of keys just in memory after having loaded them and leave this implementation detail out of the keys protocol for now. Even in a large DB with millions of tree nodes (i.e. billions of datoms) the keys only have a size of maybe a few dozen megabytes in memory.
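For instance, something like this (a sketch assuming the channel from `k/keys` delivers the whole key collection at once, and reusing a comparator like the one sketched earlier):

```clojure
;; <!! is clojure.core.async/<!!; key-compare is any total comparator over keys.
(sort key-compare (<!! (k/keys store)))
```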
I was leaning towards using a timestamp, since that's monotonic without any coordination -- even though keeping accurate time in a distributed system is difficult, we only need it to be approximate, just enough so the GC doesn't remove newly added nodes.
I hear you about the sorting requirements in the protocol -- I'll remove them; that kind of optimization isn't worth the effort, and the GC doesn't need the keys sorted to work properly anyway.
Yes, I'm just keeping an eye on forking and joining databases and on which operations would make that hard. But a timestamp with a conservatively large window should be fine.
* src/konserve/core.clj (keys): add note that the order of types is implementation-dependent, but keys of the same type are in natural order.
* src/konserve/key_compare.cljc: moved to cljc from clj.
* src/konserve/protocols.cljc (-keys): add note about how types should be ordered.
Add some basic tests for keys call.
Fix -keys implementation in filestore.
Very nice work. Thank you!