
Possible feature request on saving array mechanism #1140


Open
AhmetCanSolak opened this issue Sep 21, 2022 · 1 comment
Labels
enhancement New features or improvements

Comments

@AhmetCanSolak

Hello all,

As far as I understand, the current zarr-python implementation overwrites all chunks by default when an array that was already saved is modified and saved again. (Correct me if I am wrong, but here is what I am looking at:

_create_array(arr, store=_store, overwrite=True, zarr_version=zarr_version, path=path,

for example.) In some cases (such as image-acquisition software that saves more chunks as data arrives and keeps writing chunks over hours or days), it can be wasteful to overwrite all chunks, especially when only the new chunks are different.

A less wordy explanation of the concern:

  • imagine you have an array on disk with 1000 chunks.
  • you want to append, say, 1000 more chunks of data to the array.
  • you want the zarr API to realize that the first 1000 chunks will be identical anyway, skip rewriting them, and only add the new chunks.
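The wished-for behaviour can be sketched without zarr at all. The `ChunkStore` class and its method names below are hypothetical, purely to illustrate skipping writes for chunks whose content is already on disk:

```python
import hashlib


class ChunkStore:
    """Toy chunk store that writes a chunk only if its content changed.

    Hypothetical illustration -- this is not zarr's actual storage layer.
    """

    def __init__(self):
        self._chunks = {}   # chunk index -> bytes
        self._hashes = {}   # chunk index -> content digest
        self.writes = 0     # count of actual write operations

    def save(self, index, data):
        digest = hashlib.sha256(data).hexdigest()
        if self._hashes.get(index) == digest:
            return  # identical chunk already stored: skip the write
        self._chunks[index] = data
        self._hashes[index] = digest
        self.writes += 1


store = ChunkStore()

# first save: 1000 chunks, 1000 writes
for i in range(1000):
    store.save(i, b"chunk-%d" % i)

# second save: the same 1000 chunks plus 1000 new ones
for i in range(2000):
    store.save(i, b"chunk-%d" % i)

print(store.writes)  # 2000, not 3000: the unchanged chunks were skipped
```

The second pass touches all 2000 chunk indices, but only the 1000 new chunks trigger actual writes.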

Here at the opensci2022 meeting I discussed this with @jakirkham, and he suggested that one can resize the array first and then fill only the new chunks with the newly available values/frames. I think that is a valid way to address the concern. I would like to discuss whether we can implement this internally and do it by default if possible. It may or may not change the existing public API (happy to discuss here). A few implementation ideas:

  • there is a require_dataset endpoint:
    def require_dataset(self, name, shape, dtype=None, exact=False, **kwargs):
    maybe we can implement a similar function, say require_chunks, that does the check internally, and call it from the save_array endpoint?
  • there is already an append API here:
    def append(self, data, axis=0):
    but I am not sure whether it would work as described above in all cases? As I understand it, it works on one axis at a time.

Any ideas/comments/discussions welcome!

@joshmoore
Member

Thanks, @AhmetCanSolak. Cross-linking here, as discussed during the community meeting: #1017
