Store conversion methods #137
That is a very elegant idea!
On Wed, Mar 1, 2017 at 5:15 PM, jakirkham wrote:
Musing on this a bit, maybe leveraging the update methods from the
MutableMapping interface might be better. One would then provide it the
store they would like to move their data to. For instance, converting to a
DirectoryStore or a TempStore should be able to follow similar code
paths, which should just work thanks to inheritance. This should also allow
it to remain extensible in the future.
--
Alistair Miles
Head of Epidemiological Informatics
Centre for Genomics and Global Health <http://cggh.org>
The Wellcome Trust Centre for Human Genetics
Roosevelt Drive
Oxford
OX3 7BN
United Kingdom
Web: http://purl.org/net/aliman
Twitter: https://twitter.com/alimanfoo
Tel: +44 (0)1865 287721
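The idea quoted above relies on `update()` being available on every `MutableMapping`. A minimal sketch, assuming a hypothetical `SimpleDirStore` class (not zarr's own `DirectoryStore`): it implements only the five abstract methods, inherits `update()` from `collections.abc.MutableMapping`, and can therefore be filled from any other mapping in a single call. The source here is a plain dict holding placeholder bytes rather than real zarr metadata and chunks.

```python
import os
import tempfile
from collections.abc import MutableMapping


class SimpleDirStore(MutableMapping):
    """Hypothetical directory-backed store, NOT zarr's DirectoryStore.

    Only the five abstract methods are written here; update(), pop(),
    setdefault(), clear() and friends are inherited from MutableMapping.
    """

    def __init__(self, path):
        self.path = path
        os.makedirs(path, exist_ok=True)

    def _file(self, key):
        # '/'-separated keys map onto nested files, e.g. "foo/0.0" -> <path>/foo/0.0
        return os.path.join(self.path, *key.split("/"))

    def __getitem__(self, key):
        try:
            with open(self._file(key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            raise KeyError(key) from None

    def __setitem__(self, key, value):
        fn = self._file(key)
        os.makedirs(os.path.dirname(fn), exist_ok=True)
        with open(fn, "wb") as f:
            f.write(value)

    def __delitem__(self, key):
        try:
            os.remove(self._file(key))
        except FileNotFoundError:
            raise KeyError(key) from None

    def __iter__(self):
        for root, _dirs, files in os.walk(self.path):
            for name in files:
                rel = os.path.relpath(os.path.join(root, name), self.path)
                yield rel.replace(os.sep, "/")

    def __len__(self):
        return sum(1 for _ in self)


# Any mapping of key -> bytes can be the source; placeholder bytes only.
source = {".zgroup": b"{}", "foo/.zarray": b"{}", "foo/0.0": b"\x00\x01"}

with tempfile.TemporaryDirectory() as tmp:
    dest = SimpleDirStore(os.path.join(tmp, "data"))
    dest.update(source)  # inherited: sets dest[k] = v for every pair in source
    copied = dict(dest)  # read everything back before the directory is removed
```

Because the copy goes through the generic mapping interface, the same `dest.update(source)` line would work for any other store class that subclasses `MutableMapping`.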
I think it would be worth adding an example to the tutorial to show how to copy data from one store to another.
Just to clarify, we are thinking of this as a docs/examples item? If so, I think I agree. The existing interface is already sufficient, but a couple-line example would be good. Any thoughts on where this would be documented?
I was actually thinking that a convenience function like
zarr.copy_store(source, dest, source_path=None, dest_path=None) might be
useful. If source_path and dest_path were None this would be equivalent to
copying all key/value pairs from source to dest. If source_path were given
this would restrict copying to keys with this prefix, and then dest_path
would give a new prefix to prepend before storing.
Basically, it's a convenient way to copy data from one hierarchy directly
to another, without having to do any decompression/recompression, and
allowing for the path within source and dest to be provided so that parts
of a hierarchy can be selectively copied.
What do you think?
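The proposed `copy_store()` semantics can be sketched in pure Python over plain dicts. This is a minimal sketch under stated assumptions: the name `copy_store_sketch` and the helper `_normalize_prefix` are hypothetical, and a path prefix is treated as a whole path component, so `source_path="a"` matches `a/0` but not `ab/0`.

```python
def _normalize_prefix(path):
    # None or "" means "whole store"; otherwise ensure exactly one trailing "/"
    if not path:
        return ""
    path = path.strip("/")
    return path + "/" if path else ""


def copy_store_sketch(source, dest, source_path=None, dest_path=None):
    """Hypothetical store-to-store copy of raw key/value pairs.

    Keys under source_path are copied with that prefix replaced by dest_path.
    Values are copied as-is, so compressed chunks are never decoded.
    Returns the number of keys copied.
    """
    src_prefix = _normalize_prefix(source_path)
    dst_prefix = _normalize_prefix(dest_path)
    n_copied = 0
    for key in list(source):
        if not key.startswith(src_prefix):
            continue
        dest[dst_prefix + key[len(src_prefix):]] = source[key]
        n_copied += 1
    return n_copied


source = {
    ".zgroup": b"{}",
    "a/.zarray": b"{}",
    "a/0": b"chunk-0",
    "b/.zarray": b"{}",
}

everything = {}
n_all = copy_store_sketch(source, everything)  # 4 keys, identical to source

subset = {}
n_sub = copy_store_sketch(source, subset, source_path="a", dest_path="x/y")
# subset == {"x/y/.zarray": b"{}", "x/y/0": b"chunk-0"}
```

With both paths left as `None` the function degenerates to copying every key/value pair, matching the "equivalent to copying all key/value pairs" case described above.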
No complaints from me. This is somewhat similar to how I imagined the Group …
I was thinking there is potentially a use for both a low-level zarr.copy_store(source, dest, ...) where source and dest are store objects, and a higher-level zarr.copy(source, dest, ...) where source is array-like or group-like and dest is group-like. The low-level copy_store() is for the case where you want to replicate data exactly, and so you just copy key/value pairs from one store to another, which is going to be faster because there is no need to decompress/recompress chunks, and because only initialized chunks will get copied. The higher-level copy() would be for a more general case where you want to copy an entity (group or array), going via the create_group/create_dataset API. This could potentially mean that source and dest could be anything array- or group-like, e.g., either zarr or h5py, i.e., this would provide a way to migrate data between two zarr hierarchies, or zarr to/from h5py. This could also allow for copying but using different compression to store data in the destination than is used in the source.
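The two costs mentioned above (decoding/re-encoding, and touching uninitialized chunks) can be contrasted in a small sketch. Everything here is an assumption for illustration: a hypothetical 1-D array whose chunk `i` is stored under `arr/<i>`, metadata keys omitted, and Python's standard `zlib` and `bz2` modules standing in for zarr compressors. `raw_copy` plays the role of a store-level copy; `entity_copy` is a deliberately naive entity-level copy that walks the whole chunk grid.

```python
import bz2
import zlib

# Hypothetical layout: chunk i of array "arr" is stored under "arr/<i>".
# Chunks that were never written are absent and read back as the fill value.
N_CHUNKS = 5
CHUNK_BYTES = 4
FILL = b"\x00" * CHUNK_BYTES


def raw_copy(source, dest):
    """Store-level copy: only existing (initialized) keys, bytes untouched."""
    for key in source:
        dest[key] = source[key]
    return len(source)


def entity_copy(source, dest, decompress=zlib.decompress, compress=zlib.compress):
    """Naive entity-level copy: walk the whole chunk grid, decode, re-encode."""
    for i in range(N_CHUNKS):
        key = f"arr/{i}"
        data = decompress(source[key]) if key in source else FILL
        dest[key] = compress(data)
    return N_CHUNKS


# Only chunks 0 and 3 were ever written, compressed with zlib.
source = {"arr/0": zlib.compress(b"abcd"), "arr/3": zlib.compress(b"wxyz")}

raw_dest = {}
n_raw = raw_copy(source, raw_dest)  # 2 keys, byte-for-byte identical

bz2_dest = {}
n_entity = entity_copy(source, bz2_dest, compress=bz2.compress)  # 5 keys, now bz2
```

The raw path cannot change the compressor, while the entity path can, at the price of decoding every chunk and materializing the missing ones.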
I've had situations where I'd like to copy between two zarr objects, but
change the compressor, in which case it would not be possible to do a
direct store-to-store copy. So it would be nice to handle that use case, I
was thinking via copy().
On Monday, November 20, 2017, jakirkham wrote:
I see. Sure that seems fine.
My initial thinking with copy was that it would special-case Zarr objects
and thus handle direct data replication with the performance of copy_store,
but through the same easy high level interface.
--
Alistair Miles
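The two positions above can coexist: a high-level copy that special-cases matching compressors and takes the raw key/value path, falling back to decode/re-encode when the compressor changes. A minimal sketch, assuming a hypothetical `copy_sketch()` function and a hypothetical `CODECS` registry built from Python's standard `zlib`, `bz2` and `lzma` modules (stand-ins, not zarr/numcodecs compressors). Metadata is copied unchanged for brevity; a real implementation would also have to update the compressor recorded in the array metadata.

```python
import bz2
import lzma
import zlib

# Hypothetical codec registry: name -> (compress, decompress).
CODECS = {
    "zlib": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def is_metadata_key(key):
    # zarr v2 style metadata keys: ".zgroup", ".zarray", ".zattrs"
    return key.rsplit("/", 1)[-1] in (".zgroup", ".zarray", ".zattrs")


def copy_sketch(source, dest, source_codec, dest_codec):
    """Hypothetical copy() that special-cases matching codecs.

    Returns "raw" if it used the copy_store-style fast path, else "recompress".
    """
    if source_codec == dest_codec:
        # Same compressor: copy key/value pairs byte-for-byte, no decoding.
        dest.update(source)
        return "raw"
    _, decompress = CODECS[source_codec]
    compress, _ = CODECS[dest_codec]
    for key, value in source.items():
        if is_metadata_key(key):
            # Simplification: metadata bytes are copied unchanged here.
            dest[key] = value
        else:
            dest[key] = compress(decompress(value))
    return "recompress"


payload = bytes(range(256)) * 4
src = {"a/.zarray": b"{}", "a/0": zlib.compress(payload)}

same = {}
mode_same = copy_sketch(src, same, "zlib", "zlib")  # fast path

changed = {}
mode_changed = copy_sketch(src, changed, "zlib", "lzma")  # decode + re-encode
```

Callers see a single entry point, while the compressor-change case still goes through full decompression as described above.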
As a follow-up to, and extension of, the discussion in issue https://github.com/alimanfoo/zarr/issues/129, it would be pretty handy to have some methods to convert back and forth between DirectoryStore and ZipStore. Given that Zip files really are best treated as a write-once medium, this would allow one to write out to a DirectoryStore and then convert to a ZipStore. Similarly, if editing needs to occur, one could extract the ZipStore, perform edits on the DirectoryStore, and then archive it afterwards.
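Assuming that each store key corresponds to a relative file path in a DirectoryStore and to a member name in a ZipStore archive, the round trip described above can be sketched with only the standard library `zipfile` module. The function names `directory_to_zip` and `zip_to_directory` are hypothetical. Members are written with `ZIP_STORED` on the assumption that chunks are usually already compressed, so a second compression pass would mostly be wasted work.

```python
import os
import tempfile
import zipfile


def directory_to_zip(dir_path, zip_path):
    """Archive every file under dir_path; member names are '/'-separated relative paths."""
    with zipfile.ZipFile(zip_path, mode="w", compression=zipfile.ZIP_STORED) as zf:
        for root, _dirs, files in os.walk(dir_path):
            for name in sorted(files):
                full = os.path.join(root, name)
                arcname = os.path.relpath(full, dir_path).replace(os.sep, "/")
                zf.write(full, arcname)


def zip_to_directory(zip_path, dir_path):
    """Extract the archive so the tree can be edited and re-archived later."""
    with zipfile.ZipFile(zip_path, mode="r") as zf:
        zf.extractall(dir_path)


with tempfile.TemporaryDirectory() as tmp:
    # Build a small directory tree with placeholder bytes for each key.
    src = os.path.join(tmp, "data.zarr")
    os.makedirs(os.path.join(src, "foo"))
    for key, value in {".zgroup": b"{}", "foo/.zarray": b"{}", "foo/0": b"\x01\x02"}.items():
        with open(os.path.join(src, *key.split("/")), "wb") as f:
            f.write(value)

    zpath = os.path.join(tmp, "data.zip")
    directory_to_zip(src, zpath)  # directory -> write-once archive
    with zipfile.ZipFile(zpath) as zf:
        members = sorted(zf.namelist())

    out = os.path.join(tmp, "extracted.zarr")
    zip_to_directory(zpath, out)  # archive -> editable directory
    with open(os.path.join(out, "foo", "0"), "rb") as f:
        roundtrip = f.read()
```

Opening the archive with mode `"w"` always writes a fresh file, which fits the write-once treatment of Zip files: edits happen on the extracted directory, which is then archived again.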