-
-
Notifications
You must be signed in to change notification settings - Fork 329
WIP, RFC: Optimize setitem with chunk equal to fill_value #367
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP, RFC: Optimize setitem with chunk equal to fill_value #367
Conversation
Matches how these lines are written in `_set_basic_selection_zd`.
Add a simple check to see if the key-value pair is just being set with a chunk equal to the fill value. If so, simply delete the key-value pair instead of storing a chunk that only contains the fill value. The Array will behave the same externally. However this will cutdown on the space require to store the Array. Also will make sure that copying one Array to another Array won't dramatically effect the storage size.
I have a very large, relatively sparse (lots of empty chunks) array. I would really like to avoid writing all those empty chunks to disk. So 👍 to this! |
@@ -1476,6 +1476,17 @@ def _set_basic_selection_zd(self, selection, value, fields=None): | |||
else: | |||
chunk[selection] = value | |||
|
|||
# clear chunk if it only contains the fill value | |||
if np.all(np.equal(chunk, self._fill_value)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like it could sometimes be an expensive operation. So it should be optional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How large are typical chunk sizes for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
regardless of chunk size, in many cases the user knows in advance whether this check will ever be useful. For dense imaging data, this conditional will almost never evaluate to true; for sparse data, this conditional will often evaluate to true. So the user should have some discretion over whether this check occurs.
Thanks! Yeah that was the idea behind it. There was funkiness with Can give this another look 🙂 |
I actually assumed recently that this was the default. |
Nope, but I think we discussed recently how we might get this to work for |
I was also bitten by thinking that was the default! Would love to see this merged. |
@jakirkham I would be happy to move this PR forward, but I don't think I have push access to your fork. I made some changes (added a kwarg to the |
Sounds good. In that case feel free to send a PR to zarr-developers. Am going to close this out. |
Fixes #366
If a chunk only contains the
fill_value
, delete the key-value pair for that chunk instead of storing the array. Should cutdown on storage space require for theArray
. Also should improve the performance of copying oneArray
to another.TODO:
tox -e docs
)