Comparison of non-trivial, uncompressed, in-memory Zarr Arrays fails #348
Comments
A couple of options we might consider here are as follows.
Option 1 can be nice as this is consistent with the behavior of compressors currently and even more so historically. It also maps very well to what we already do with other stores where we are also writing out blobs of
Option 2 can be nice from the standpoint of avoiding copies. Also
Option 3 would avoid making any changes to how we store data currently. However this could be pretty complicated to implement. Plus stores would need to start learning about NumPy
There may be other options that have not been listed, which would be interesting to evaluate and discuss.
Have marked this as a release blocker as this is causing problems for the Numcodecs upgrade (#347). We effectively have a solution modeled after option 1 in that PR, but I don't think we should be beholden to that implementation or the choices made there when it comes to solving this issue.
Thanks @jakirkham for capturing this so clearly.
FWIW I think option 1 would be very reasonable. It seems like a good idea to ensure in general that the store has a reference to data that isn't going to change. Note that many of the compression codecs (Blosc, Zlib, LZMA, LZ4, Zstd) did previously return a bytes object from encode(), so in cases where a compressor is configured the store would not need to make a copy: it would already be given a bytes object by the compressor and could just keep that. However, we changed that in zarr-developers/numcodecs#136 and zarr-developers/numcodecs#144, so now an ndarray is returned, in which case obtaining bytes would require a copy. I wonder if we should back out those changes and just let the codecs return bytes as they did previously.
Option 2 could also be good. I guess we could check whether the memoryview is writeable and only take a copy if it is.
Agree option 3 seems not so good. Better to have a clean solution that allows store comparison and ensures data are immutable IMO.
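The "copy only if writeable" idea for option 2 could look roughly like the sketch below. It uses only the standard library; the helper name freeze_buffer is hypothetical and is not part of Zarr or Numcodecs.

```python
def freeze_buffer(value):
    """Return an immutable stand-in for `value`, copying only when needed.

    `value` may be any object exporting the buffer protocol.
    Hypothetical helper for illustration; not a Zarr API.
    """
    view = memoryview(value)
    if view.readonly:
        # The underlying buffer cannot be modified through this view,
        # so keeping a reference to it avoids a copy.
        return view
    # Writable buffer: snapshot the contents into an immutable bytes object.
    return view.tobytes()


frozen_ro = freeze_buffer(b"abc")             # read-only input, no copy
frozen_rw = freeze_buffer(bytearray(b"abc"))  # writable input, copied
same_contents = frozen_ro == frozen_rw        # compares as a single bool
```

One trade-off of this sketch is that the store may end up holding two different value types (memoryview and bytes), whereas option 1 always yields bytes.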
Completely agree. Probably option 1 is the most logical. Option 2 we'd probably have an
The next question is where we should perform this coercion. Should it happen in the Array or should it happen in the store?
It's a good point. Have also been thinking about whether that is still a good idea. Happy to back that out if that is best. |
As a step in this direction, have added PR ( #350 ), which ensures the contents of |
The next question is where we should perform this coercion. Should it happen in the Array or should it happen in the store?
Just to confirm, I think it should happen in the store. I.e., a store should accept any object exporting the buffer protocol as a value. It may then do whatever it likes internally.
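As a rough illustration of doing the coercion in the store (option 1), here is a minimal standard-library sketch. The class name BytesStore is hypothetical and this is not how Zarr's own stores are implemented; it only shows a mapping that accepts any buffer-protocol object and keeps an immutable bytes copy.

```python
from array import array
from collections.abc import MutableMapping


class BytesStore(MutableMapping):
    """Hypothetical in-memory store that coerces every value to bytes."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        # memoryview() accepts any object exporting the buffer protocol;
        # tobytes() copies the contents into an immutable bytes object.
        self._data[key] = memoryview(value).tobytes()

    def __delitem__(self, key):
        del self._data[key]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)


a, b = BytesStore(), BytesStore()
buf = bytearray(b"\x00\x01\x02\x03")
a["0"] = buf                       # writable buffer
b["0"] = array("B", [0, 1, 2, 3])  # a different buffer-protocol type
buf[0] = 255                       # later mutation does not reach the store
stores_equal = a == b              # Mapping.__eq__ yields a single bool
```

Because the values are always bytes, comparing two such stores gives a plain True or False, and the stored data cannot change behind the store's back.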
Thanks for confirming. Agree that seems like the most reasonable option. |
It seems that comparisons fail for a few stores. Added a test to demonstrate this in PR (#357).
The original issue has been fixed by PR (#359). A similar fix has been applied to
Testing of comparisons is being discussed in PR (#357), but is not yet complete. However, good discussion is already being had there. Am going to go ahead and close this as the original issue is fixed. Associated testing can continue to be discussed in the aforementioned PR.
Comparing two in-memory Zarr arrays that do not use compressors errors out for anything other than the trivial case.
The issue being that NumPy arrays are being placed in the store (dict in this case). Even though NumPy arrays can be compared, the comparison is vectorized, which does not return a single bool that can be used in conditional expressions.
This usually works fine for other compressors as they often return bytes, which are placed in the store and can be compared to return a single bool. However this is not the case when no compressors are used or when some combination of filters and/or compressors are used that do return a NumPy array.
Minimal, reproducible code sample, a copy-pastable example if possible
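The mechanism described above can be sketched with NumPy alone. This is not the reporter's Zarr sample; it assumes plain dicts stand in for the in-memory stores, with made-up keys and values.

```python
import numpy as np

# Plain dicts stand in for two in-memory stores (illustrative assumption).
store_a = {"0": np.arange(4, dtype="u1")}
store_b = {"0": np.arange(4, dtype="u1")}


def try_compare(a, b):
    """Return the result of a == b, or the error text if comparing raises."""
    try:
        return a == b
    except ValueError as exc:
        return f"ValueError: {exc}"


# dict equality calls == on the values and then needs a single bool,
# but ndarray == ndarray is element-wise, so bool() of the result raises.
ndarray_result = try_compare(store_a, store_b)

# With bytes values, the same comparison yields a single bool.
bytes_a = {k: v.tobytes() for k, v in store_a.items()}
bytes_b = {k: v.tobytes() for k, v in store_b.items()}
bytes_result = try_compare(bytes_a, bytes_b)
```

Here ndarray_result holds the text of the ValueError, while bytes_result is an ordinary bool.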
Problem description
Ideally comparison here would return True or False instead of erroring out.
Version and installation information
Please provide the following:
zarr.__version__: 2.2.0
numcodecs.__version__: 0.5.5
Also, if you think it might be relevant, please provide the output from pip freeze or conda env export depending on which was used to install Zarr.