Core protocol v3.0 - stores and storage protocol #30
Conversation
This isn't really ready for review yet, but comments and questions are welcome if anyone is looking. I was originally going to stick close to the v2 protocol, but in trying to write this I've had a deeper think about inefficiencies in the v2 protocol when layered on top of storage with high latency (e.g., cloud object storage). There are some horrible inefficiencies with v2, which can be worked around with the consolidated metadata extension, but which do not need to be quite so bad if the protocol is redesigned with the limitations of cloud storage in mind. So this PR currently has something quite different from the v2 protocol, but which should be much more efficient for cloud storage and distributed use in general. In the coming days I'll try to explain the thinking behind the current proposed solution in this PR.
OK, here is an attempt to illustrate the problems with the current v2.0 protocol when used with high latency stores like cloud object stores. Basically, certain common tasks like listing the children of a group, browsing a hierarchy, or creating groups or arrays end up requiring many store operations. That works fine on a local file system, where each store operation has low latency, but works badly on high latency stores, because each operation requires network communication. I believe the proposal for v3.0 currently in this PR solves these problems, but I need a bit more time to work that through and unpack why and how. Will try to follow up with more on that next week.
Btw just to add, I think consolidated metadata is an important feature and should end up being a protocol extension. However, the core protocol could be redesigned so that at least certain basic tasks like exploring hierarchies work much better on cloud storage, without always requiring consolidated metadata.
Force-pushed from 8c14a6a to 420f438.
Just to say I've pushed a different solution, still aiming to minimise store operations but maybe a little more intuitive.
I see that the metadata and data have been moved out into `meta/` and `data/` prefixes. Is the reasoning behind this that listing on S3-like stores is faster when based on a prefix (such as `meta/`)? I guess the alternative is to have this information (such as what groups there are) in some central place (such as in the root metadata), or tree-ing, so a parent knows its children but not its grandchildren. I need to spend some time writing my thoughts up on this more coherently, but I'd be keen to hear your thinking.
Hi @tam203, a few quick responses, I'll follow up with more detail next week...
Yes, listing all keys will not work well with 1000s of keys and is to be avoided if possible. Object stores do support list with a prefix, and also list with prefix and delimiter (analogous to a "directory listing"); I think we could design the protocol so those can be used where available.
Yes, list with "meta/" as prefix gets you all the metadata keys, which is a way to get the whole hierarchy if you need it.
I'd be interested in any performance experience of using list with a prefix or list with a prefix and delimiter on buckets with lots of keys.
I think that we can balance these needs between the core protocol, which provides some basic functionality and is robust to lots of things happening in parallel (including adding new groups and arrays), and protocol extensions like consolidated metadata and your proposal for groups to list their children, which provide ways of "snapshotting" either the whole hierarchy or parts of the hierarchy. But it will be good to unpack all that and work it through carefully.
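To make the listing operations above a little more concrete, here is a minimal sketch of how list-with-prefix and list-with-prefix-and-delimiter map onto an S3-style object store via boto3. The bucket name and helper function names are assumptions made up for illustration, not part of any spec:

```python
# Sketch only: prefix and prefix+delimiter listing against an S3-style
# store. The bucket name and function names are hypothetical.
import boto3

s3 = boto3.client("s3")
BUCKET = "example-zarr-bucket"  # hypothetical bucket

def list_prefix(prefix):
    """All keys under a prefix, e.g. 'meta/' to find every metadata key."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=prefix):
        for obj in page.get("Contents", []):
            yield obj["Key"]

def list_dir(prefix):
    """Directory-style listing: only the immediate children of a prefix."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET, Prefix=prefix, Delimiter="/"):
        for obj in page.get("Contents", []):
            yield obj["Key"]            # "files" directly under the prefix
        for sub in page.get("CommonPrefixes", []):
            yield sub["Prefix"]         # "subdirectories" under the prefix
```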
Here's an attempt to explain why it's worth reconsidering some aspects of the protocol with high latency stores in mind.

Creating hierarchies; implicit groups

Consider a user that is creating a hierarchy. They are creating groups and arrays at various paths within the hierarchy.

The current zarr protocol (v2) says that if creating a node at some path, all of the groups above that node in the hierarchy must also be explicitly created, each with its own metadata document.

There's a couple of potential problems with this. First, when creating a node at a deep path, several store operations are needed to create all of the ancestor groups, which is slow on a high latency store.

However, what if the arrays "/foo/bar/baz" and "/foo/bar/qux" are being created at the same time by two different processes? Both would need to ensure the groups "/foo" and "/foo/bar" exist, which either requires coordination between the processes or risks conflicting writes.

Instead of this, I've been thinking about an approach that (1) reduces the number of store operations needed to create a node and (2) avoids any need for coordination between processes creating nodes in parallel.

The protocol change would be to say that, if a node is being created at some path, then all ancestor groups are implied and do not need to be explicitly created.

E.g., if a user creates an array at path "/foo/bar/baz", they do not need to create the groups "/foo" or "/foo/bar" first; those groups exist implicitly by virtue of a node existing below them.

Hierarchy node discovery

Say a user is given enough information to open a zarr hierarchy, but is not told anything about what groups or arrays it contains.

Let's assume the user explores from the top down. I.e., they ask, what are the children of the root group, then what are the children of each of those children, and so on.

With the current v2 protocol, there are two problems, and which one is hit depends on the capabilities of the store.

If the store supports a directory-style listing operation, then the hierarchy can be explored, but each group visited requires one or more store operations, which adds up quickly on a high latency store.

If the store does not support a directory-style listing operation, the only option is to list all keys in the store and infer the hierarchy from them, which can be extremely slow when there are large numbers of chunk keys.

There is also a third problem, which is that some stores might not support any listing operations at all, in which case there is currently no way to discover the hierarchy.

There are several possible solutions for this, and I don't have a strong view yet on which is best.

Conclusions

It would be good if v3.0 could come up with an approach that resolves (1) creating hierarchies efficiently and safely in parallel, and (2) discovering the structure of a hierarchy without excessive store operations.

My proposal for (1) is to allow implicit groups, as described above.

I don't have a strong opinion about a solution to (2) yet. If we say stores must support listing then some types of store are excluded; if we say listing is optional then discovery will not always be possible without something like consolidated metadata.

Just thinking out loud, interested in thoughts if anyone has managed to read this far.
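As a rough illustration of the implicit groups idea, here is a minimal sketch (the function and variable names are made up for illustration, not taken from any proposal text) showing how the groups implied by a set of array paths can be derived purely from the paths, with no store operations at all:

```python
# Minimal sketch of the "implicit groups" idea: given only the paths of
# nodes that have been explicitly created, every ancestor path is taken
# to be an implied group. Names here are illustrative only.

def implied_groups(node_paths):
    """Return the set of group paths implied by a collection of node paths."""
    groups = {"/"}  # the root group is always implied
    for path in node_paths:
        parts = path.strip("/").split("/")
        for i in range(1, len(parts)):
            groups.add("/" + "/".join(parts[:i]))
    return groups

# Creating these two arrays needs no explicit group creation, yet the
# groups "/", "/foo" and "/foo/bar" all exist implicitly:
print(sorted(implied_groups(["/foo/bar/baz", "/foo/bar/qux"])))
# ['/', '/foo', '/foo/bar']
```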
Just to mention I've reverted the draft in this PR to something simpler and a little closer to current zarr v2 and n5. For now I've still left in the prefixes "meta" and "data" for metadata and chunks respectively; I think that's still worth considering. But I still need to figure out how the protocol supports discovery of the hierarchy structure.
@alimanfoo super write up of the problem... I'm going to go away and think some more about it. I'd be really keen to not require listing. For what it's worth, I think splitting metadata and data under different prefixes is a good idea, to potentially help with listing if required.
Just to have a straw man, I've pushed a possible solution here, which addresses the issues described above. Not precious about this at all, just wanted to illustrate one possible approach. Here's the essence: metadata documents live under a "meta/" prefix, chunk data lives under a "data/" prefix, and groups can exist implicitly as described above.

Very happy to discuss or consider alternatives.
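One way to picture a layout along these lines (the document names and chunk key syntax below are invented purely for illustration, not what the draft actually specifies):

```python
# Hypothetical key layout splitting metadata from chunk data.
example_keys = [
    "meta/foo/bar/baz.array.json",  # array metadata
    "meta/foo/bar/qux.array.json",
    "data/foo/bar/baz/0.0",         # chunks of the "baz" array
    "data/foo/bar/baz/0.1",
    "data/foo/bar/qux/0.0",
]

# Listing with prefix "meta/" touches only metadata keys, however many
# chunks the store holds, which is what keeps hierarchy exploration cheap.
metadata_keys = [k for k in example_keys if k.startswith("meta/")]
```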
| Parameters: `key`
| Output: `value`

``set`` - Store a (`key`, `value`) pair.
Perhaps ``set`` and ``delete`` should be required for writeable stores? I.e., you could have read-only stores.
Yes, definitely - the contract should specify that any store opened in read mode will only touch ``get``.
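As a sketch of how that read/write split might look in an implementation (the class layering and type hints here are illustrative, not normative):

```python
# Illustrative store interface: all stores support "get"; only writeable
# stores are required to support "set" and "delete".
from abc import ABC, abstractmethod


class Store(ABC):
    @abstractmethod
    def get(self, key: str) -> bytes:
        """Retrieve the value for a given key."""


class WriteableStore(Store):
    @abstractmethod
    def set(self, key: str, value: bytes) -> None:
        """Store a (key, value) pair."""

    @abstractmethod
    def delete(self, key: str) -> None:
        """Delete the given key."""

# A store opened in read mode would only ever be used via "get" (plus the
# listing operations), so a read-only store need not implement "set" or
# "delete" at all.
```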
I think this is ready for wider discussion; @zarr-developers/core-devs, any comments welcome.
@tam203 very interested in your thoughts here.
docs/protocol/core/v3.0.rst
| Parameters: none
| Output: set of `keys`

``listpre`` - Retrieve all keys with a given prefix.
I don't think any of my filesystem map layers have this, but clearly they can and should.
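For a layer that can only enumerate all keys, one fallback (a hypothetical helper, not something from an existing library) is to filter client-side, though that re-introduces the cost of listing everything:

```python
# Sketch: "listpre" implemented on top of a plain "list all keys"
# operation by filtering client-side; names are illustrative.
def listpre(all_keys, prefix):
    """Retrieve all keys with a given prefix."""
    return {key for key in all_keys if key.startswith(prefix)}

# e.g. listpre(store.list(), "meta/") would return every metadata key,
# but it still pays the cost of enumerating every chunk key first.
```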
docs/protocol/core/v3.0.rst
(This subsection is not normative.)

TODO describe possible implementations.
A Python dict? Probably the simplest, except the methods are not named as above.
Yes, that would be a good example. I was thinking to use this subsection just to briefly describe a small variety of possible implementations, to reinforce that this protocol is not just about file system storage.
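For example, a minimal in-memory store backed by a Python dict, with methods named after the protocol operations rather than the MutableMapping interface (purely illustrative):

```python
# Toy in-memory implementation of the storage protocol operations.
class MemoryStore:
    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data[key]

    def set(self, key, value):
        self._data[key] = value

    def delete(self, key):
        del self._data[key]

    def list(self):
        return set(self._data)

    def listpre(self, prefix):
        return {k for k in self._data if k.startswith(prefix)}
```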
@alimanfoo Sorry for being slow. So I think this is a good idea, I certainly prefer it to the alternatives.
@alimanfoo I was considering a different approach that, my guess is, people would feel is maybe outside the core of zarr, but I would be keen to discuss. I've not had the time to properly work it through but I thought I'd share. What I was thinking is somewhat a version of metadata consolidation. So I would allow implicit groups, but also include in the root some item of metadata that points to all the child groups. The twist I was thinking of was that this could be a mapping rather than a list, carried in the root attrs (meta/group.json).
In that version there are two "sets" (not the best word, just made up for now). A set would be a collection of "semantic paths" mapped to "real paths". Thinking about it, maybe it would actually be better to only have one "set" for one zarr. Let's call that a "mapping", held in the root attrs (meta/group.json), with one per zarr, say my_zarr_v1 and my_zarr_v2.
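A hypothetical sketch of the kind of mapping being described (the attribute name, array names and real paths below are all invented for illustration):

```python
# Two hypothetical root metadata documents whose attrs map "semantic
# paths" to "real paths", letting two zarrs share some underlying data.
my_zarr_v1_root_attrs = {
    "mapping": {
        "/temperature": "arrays/temperature-v1",  # original data
        "/lat_lon_grid": "arrays/lat_lon_grid",   # shared between versions
    }
}

my_zarr_v2_root_attrs = {
    "mapping": {
        "/temperature": "arrays/temperature-v2",  # updated data, new real path
        "/lat_lon_grid": "arrays/lat_lon_grid",   # unchanged, still shared
    }
}
```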
Above I've created two zarrs that share some underlying data. It's probably clear that the other motivation for this approach is that it allows live additions to zarrs in a "safe-ish" way. Additions can be made to the zarr on disk, but until those modifications are reflected in the root metadata it's like they haven't happened. In this way it also could allow for versioning parts of a big zarr group and updating those versions. The old version could still work, and if done correctly the eventually consistent nature of some backends doesn't matter (v1 looks under a different key to v2). It also would allow sharing of zarrays between multiple zarr groups (useful for example with common metadata such as a lat-lon grid or maybe a land mask). My current thinking is to keep the actual metadata in the groups but to provide a route to discover them. I'm sorry this is a bit of a rant/stream of consciousness, I just wanted to get something down before the weekend. I'm going to try to create a notebook explaining this better, but in the meantime any thoughts welcome.
Thanks @tam203, those are all great thoughts. FWIW I think it would be very doable to define these kinds of features via one or more protocol extensions. Within a protocol extension I think it would be fine to define whatever rules and semantics you like for embedding additional metadata within a group metadata document. So currently I think my favoured approach would be to have a minimal core protocol that provides support for a core feature set without any additional metadata, then allow freedom for these other needs to be addressed via extensions. If the extensions are widely useful then hopefully they would get implemented widely, but decoupling them from the core would just give us a way to work through the basics first. Very happy to discuss though.
This PR has become a bit of a beast I'm afraid; lots of editing on related pieces has ended up here. I think probably the best thing for now will be to merge into the dev branch, then I'll write up some information on where this has got to, and we can review, revisit and rework any parts as needed.
I've merged, but will write up some information over on #16, and am very happy to revisit any aspect of this.
This PR contains work towards straw-man content for the sections on stores and the storage protocol.