FSStore: use ensure_bytes()
#1285
Codecov Report
@@            Coverage Diff            @@
##              main    #1285    +/-   ##
=========================================
  Coverage   100.00%  100.00%
=========================================
  Files           35       35
  Lines        14388    14404    +16
=========================================
+ Hits         14388    14404    +16
@normanrz can you check if this fixes the issue?
Yes! That looks good to me. Thanks for the quick fix!
Thanks for this PR @madsbk! I recently hit a similar issue involving zero-dimensional arrays (which have compression turned off by default). I read the discussion in #1279, and I am left wondering why we leave it up to the store to call What do others think about this? |
We changed this with #934, now we also allow Having said that, I was surprised that |
I see that this has been implemented in the zarr-python implementation. And I thank everyone who worked on this amazing feature! 🚀 But that doesn't mean it is consistent with what the spec says:
It is certainly straightforward to convert an uncompressed ndarray into a sequence of bytes. So maybe I am overthinking this... Or maybe the spec needs to be updated to reflect these new implementation changes? 🤔 My only goal here is to make sure that the spec and the implementation are consistent with each other. Otherwise we risk interoperability with other zarr implementations. If we are fine with saying "a value is a sequence of bytes or a raw uncompressed array" then I am fine with leaving it up to the store implementation whether to convert arrays to bytes. |
Good point and I agree, the spec and implementation should match up! I must admit I am reading "A value is a sequence of bytes" in a more general sense, not as a reference to the |
Hey Mads, thanks for following up on this issue. That said, I don't think we should be making this change. In general, it is already the expectation in zarr-python & Numcodecs that Python Buffer Protocol supporting objects are used ( How to word this in the spec is nuanced. The spec itself is not meant to be Python-specific, but to support other languages as well. They may not have Perhaps others who found this wording in the spec confusing can chime in on how to clarify it. Maybe we can say "a value contains binary data"?
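For readers less familiar with the Python Buffer Protocol mentioned above, here is a minimal, standard-library-only sketch of what "buffer protocol supporting objects" means in practice. The helper name `nbytes_of` and the sample payloads are illustrative assumptions, not code from zarr-python or Numcodecs.

```python
from array import array


def nbytes_of(obj) -> int:
    """Return the size in bytes of any buffer-protocol object without copying it."""
    with memoryview(obj) as view:
        return view.nbytes


# bytes, bytearray and array.array all expose the buffer protocol,
# so one code path can handle all three.
payloads = [b"abcd", bytearray(b"abcd"), array("i", [1, 2, 3])]
sizes = [nbytes_of(p) for p in payloads]

# A memoryview shares memory with its source: no bytes are copied.
source = bytearray(b"abcd")
view = memoryview(source)
source[0] = ord("z")
first_byte = view[0]  # reflects the change made through `source`
```

The point of the sketch is only that "binary data" in Python need not be a `bytes` instance; anything exposing the buffer protocol can be read as raw bytes without a copy.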
@jakirkham, I am not sure I understand. Are you saying that we shouldn't use I have no strong opinion about the specifications, but I think it is important that we allow |
Could we call |
I'm not sure that would work. I think the trick is that at some point, |
There's |
Since most other stores use |
I am fine with this fix for now, since it seems consistent with what most other stores do.
We can revisit the question of whether this ensure_bytes logic should live elsewhere in the future.
Sorry to be a stickler. The main reason I bring up |
Any chance we could get this merged and released?
I'd like to hear from @jakirkham - John, would you be okay with addressing
I think @jakirkham's latest suggestion is a straightforward compromise that could be incorporated as part of this PR.
Done
Re-opening to try to fix the codecov build.
Now green. @jakirkham: does this match your expectations? If so, happy to get it in and prep a release during the holidays. (Help with the release notes will be appreciated!)
If we're doing a release, let's get #1304 in too.
See #1309 (comment). Leaving this PR for @jakirkham to review.
Thanks Mads! 🙏
LGTM. Made a few comments that we can follow up on separately
@@ -696,3 +701,28 @@ def all_equal(value: Any, array: Any):
# using == raises warnings from numpy deprecated pattern, but
# using np.equal() raises type errors for structured dtypes...
return np.all(value == array)


def ensure_contiguous_ndarray_or_bytes(buf) -> Union[NDArrayLike, bytes]:
We may want to move this to numcodecs.compat, but that can be follow-on work
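The body of `ensure_contiguous_ndarray_or_bytes` is not shown in this hunk. Going only by its name and return type, the idea appears to be "hand back a contiguous array-like when possible, otherwise `bytes`". A rough standard-library analogue of that idea, with `memoryview` standing in for an ndarray-like and a hypothetical helper name, might look like this; it is an assumption-laden sketch, not the function added in this PR.

```python
def contiguous_view_or_bytes(buf):
    """Hypothetical helper: zero-copy view when possible, else a bytes copy."""
    try:
        view = memoryview(buf)
    except TypeError:
        # `buf` does not expose the buffer protocol; fall back to copying.
        return bytes(buf)
    if view.contiguous:
        # Contiguous buffers can be handed on without copying.
        return view
    # Non-contiguous buffers (e.g. strided slices) are copied into bytes.
    return view.tobytes()


zero_copy = contiguous_view_or_bytes(b"abc")
copied = contiguous_view_or_bytes(memoryview(b"abcdef")[::2])
from_list = contiguous_view_or_bytes([1, 2, 3])
```

In this sketch a copy only happens when the input cannot be used as-is, which is the property a store would care about for large chunks.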
values = {
self._normalize_key(key): ensure_contiguous_ndarray_or_bytes(val)
for key, val in values.items()
}
Something we may consider is delaying the instantiation of the dict so these copies occur as these values are requested. This can be a follow-on though
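One way the "delay the dict" idea could be sketched is a read-only mapping that normalizes keys up front but converts each value only on access. The class name `LazyConvertedMapping` and the `to_bytes` converter are hypothetical and chosen for illustration; nothing here is taken from the zarr-python code base.

```python
from collections.abc import Mapping


class LazyConvertedMapping(Mapping):
    """Hypothetical wrapper that converts each value only when it is read."""

    def __init__(self, raw, convert_key, convert_value):
        # Keys are normalized eagerly (cheap); values are left untouched.
        self._raw = {convert_key(k): v for k, v in raw.items()}
        self._convert_value = convert_value

    def __getitem__(self, key):
        # The (possibly copying) conversion happens here, on demand.
        return self._convert_value(self._raw[key])

    def __iter__(self):
        return iter(self._raw)

    def __len__(self):
        return len(self._raw)


calls = []  # records every conversion so the laziness is observable


def to_bytes(value):
    calls.append(value)
    return bytes(value)


lazy = LazyConvertedMapping(
    {"A/0": bytearray(b"xy"), "A/1": bytearray(b"zz")}, str.lower, to_bytes
)
```

Note that this sketch converts again on every access; whether to cache the converted value is a separate trade-off between memory and repeated copies.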
@jakirkham: ok to get this in the 2.13.x series? @madsbk: I'll get this out ASAP but might wait on any emergency needs re: #1324. Suggestions for the docs/release.rst statement would be welcome. 🙏🏽
Oops, sorry, I should have asked for that here (or just added it). Submitted a PR (#1325).
Fixes #1279
TODO: