Skip to content

WIP, RFC: Optimize setitem with chunk equal to fill_value #367

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
26 changes: 23 additions & 3 deletions zarr/core.py
Original file line number Diff line number Diff line change
Expand Up @@ -1476,6 +1476,17 @@ def _set_basic_selection_zd(self, selection, value, fields=None):
else:
chunk[selection] = value

# clear chunk if it only contains the fill value
if np.all(np.equal(chunk, self._fill_value)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it could sometimes be an expensive operation. So it should be optional.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How large are typical chunk sizes for you?

Copy link
Contributor

@d-v-b d-v-b Apr 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

regardless of chunk size, in many cases the user knows in advance whether this check will ever be useful. For dense imaging data, this conditional will almost never evaluate to true; for sparse data, this conditional will often evaluate to true. So the user should have some discretion over whether this check occurs.

try:
del self.chunk_store[ckey]
return
except KeyError:
return
except Exception:
# deleting failed, fallback to overwriting
pass

# encode and store
cdata = self._encode_chunk(chunk)
self.chunk_store[ckey] = cdata
Expand Down Expand Up @@ -1716,10 +1727,19 @@ def _chunk_setitem_nosync(self, chunk_coords, chunk_selection, value, fields=Non
else:
chunk[chunk_selection] = value

# encode chunk
cdata = self._encode_chunk(chunk)
# clear chunk if it only contains the fill value
if np.all(np.equal(chunk, self._fill_value)):
try:
del self.chunk_store[ckey]
return
except KeyError:
return
except Exception:
# deleting failed, fallback to overwriting
pass

# store
# encode and store
cdata = self._encode_chunk(chunk)
self.chunk_store[ckey] = cdata

def _chunk_key(self, chunk_coords):
Expand Down