Bump Numcodecs requirement to 0.6.1 #347
Conversation
Thanks @jakirkham. Probably worth adding a brief release note.
FTR, saw some test failures locally that CI also seems to encounter. Figured we'd want to look at these and determine how we want to fix them. |
Ah, yes, sorry, forgot about that. I expect some test failures due to #324. May be other consequences of the upgrade too. Suggest dealing with them in this PR. |
SGTM. How about this one?

```
# create an object array using pickle
z = self.create_array(shape=10, chunks=3, dtype=object, object_codec=Pickle())
z[0] = 'foo'
> assert z[0] == 'foo'
```

ref: https://travis-ci.org/zarr-developers/zarr/jobs/461883537#L2717-L2720

...or this one?

```
with pytest.raises(RuntimeError):
    # noinspection PyStatementEffect
>   v[:]
E   Failed: DID NOT RAISE <type 'exceptions.RuntimeError'>
```

ref: https://travis-ci.org/zarr-developers/zarr/jobs/461883537#L2621-L2624 |
Previously MsgPack was turning bytes objects to unicode objects when round-tripping them. However this has been fixed in the latest version of Numcodecs. So correct this test now that MsgPack is working correctly.
Also there is another failure. Appears |
Based on testing locally, it looks like we are down to this one:

```
for compressor in Zlib(1), Blosc():
    z = self.create_array(shape=len(data), chunks=30, dtype=object,
                          object_codec=Categorize(greetings,
                                                  dtype=object),
                          compressor=compressor)
    z[:] = data
    v = z.view(filters=[])
    with pytest.raises(RuntimeError):
        # noinspection PyStatementEffect
>       v[:]
E       Failed: DID NOT RAISE <type 'exceptions.RuntimeError'>
```
|
Hi @jakirkham, took the liberty to push a commit that should resolve the test failure due to RuntimeError not being raised for object arrays. Other issues should be resolved with the fixes you've merged into numcodecs, so I'll cut a numcodecs 0.6.2 release then come back and update the version requirement here. |
Thanks @alimanfoo! |
As we already ensured the `chunk` is an `ndarray` viewing the original data, there is no need to do that here as well. Plus the checks performed by `ensure_contiguous_ndarray` are not needed for our use case, particularly as we have already handled the unusual type cases above. We also don't need to constrain the buffer size. As such, the only thing we really need is to flatten the array and make it contiguous, which is what we handle here directly.
As both the expected `object` case and the non-`object` case perform a `reshape` to flatten the data, go ahead and refactor that out of both cases and handle it generally. Simplifies the code a bit.
As refactoring the `reshape` step has effectively dropped the expected `object` type case, the checks for different types are a little more complicated than needed. To fix this, invert and swap the case ordering. This way we can handle all generally expected types first and simply cast them, then raise if an unexpected `object` type shows up.
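The three commit notes above describe normalizing a chunk to a flat, contiguous `ndarray` with a single shared `reshape` step and an inverted case order. A minimal sketch of that flow (hypothetical function name, not zarr's actual internals):

```python
import numpy as np

def normalize_chunk(chunk):
    # Hypothetical sketch of the refactored flow described above:
    # handle the generally expected dtypes first, then raise if an
    # unexpected object dtype shows up.
    chunk = np.asanyarray(chunk)
    if chunk.dtype == object:
        raise RuntimeError('object arrays require an object_codec')
    # the single, shared flatten step: a contiguous 1-D array,
    # copying only if the input was not already contiguous
    return np.ascontiguousarray(chunk).reshape(-1)
```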
Did a little bit of refactoring on your change. Hope that is ok. Please let me know your thoughts on that. |
The `DictStore` is pretty reliant on the fact that values are immutable and can be easily compared. For example, `__eq__` assumes that all contents can be compared easily. This works fine if the data is `bytes`, but it doesn't really work for `ndarray`s, for example. Previously we would have stored whatever the user gave us here, which means comparisons could fall down in those cases as well (much as the example in the tutorial has highlighted on CI). Now we effectively require that the data either be `bytes` to begin with or be coercible to `bytes` (e.g. via the new/old buffer protocol). Make sure not to force this requirement when nesting one `MutableMapping` within another.
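A minimal sketch of a store with the coercion behaviour described above (hypothetical class, not the actual `DictStore` implementation): leaf values are coerced to `bytes` via the buffer protocol, nested mappings are stored as-is, and `__eq__` then becomes well-defined.

```python
from collections.abc import MutableMapping

class BytesDictStore(MutableMapping):
    """Illustrative sketch only: coerce leaf values to bytes so that
    store comparisons are always well-defined."""

    def __init__(self):
        self._store = {}

    def __setitem__(self, key, value):
        if isinstance(value, MutableMapping):
            # don't force coercion on nested stores
            self._store[key] = value
        else:
            # buffer protocol: works for bytes, bytearray, ndarray, ...
            self._store[key] = bytes(memoryview(value))

    def __getitem__(self, key):
        return self._store[key]

    def __delitem__(self, key):
        del self._store[key]

    def __iter__(self):
        return iter(self._store)

    def __len__(self):
        return len(self._store)

    def __eq__(self, other):
        # safe now: all leaf values are bytes, so equality is well-defined
        return isinstance(other, BytesDictStore) and self._store == other._store
```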
This test case seems to be ill-posed. Anytime we store `object`s to `Array`s we require an `object_codec` to be specified. Otherwise we have no clean way to serialize the data. However this `DictStore` test breaks that assumption by explicitly storing an `object` type in it even though this would never work for the other stores (particularly when working with `Array`s). This includes in-memory Zarr `Array`s, which would be backed by `DictStore`. Given this, we go ahead and drop this test case.
Instead of using a Python `dict` as the `default` store for a Zarr `Array`, use the `DictStore`. This ensures that all blobs will be represented as `bytes` regardless of what the user provided as data. Thus things like comparisons of stores will work well in the default case.
On a different point, it looks like

Looking at this example before the recent Numcodecs release, it seems the contents of

However there is a wrinkle with this approach. It appears we had a test that was assuming that

Also it appears that the

Happy to hear thoughts or receive pushback on this approach as I may very well be missing something. If you know of a better approach, would also be interested in hearing about that as well.

Also raised issue ( #348 ) to provide a nice simple example of this problem. |
NVM, this was more fallout from the switch to

Edit: This has since been fixed. |
As we are now using `DictStore` to back the `Array`, we can correctly measure how much memory it is using. So update the examples in `info` and the tutorial to show how much memory is being used. Also update the store type listed in info as well.
As Numcodecs now includes a very versatile and effective `ensure_bytes` function, there is no need to define our own in `zarr.storage` as well. So go ahead and drop it.
As `binary_type` was only used by `ensure_bytes`, and that function has now been dropped, go ahead and drop `binary_type` as well.
Make use of Numcodecs' `ensure_ndarray` to take `ndarray` views onto buffers to be stored in a few cases so as to reshape them and avoid a copy (thanks to the buffer protocol).
Rewrite `buffer_size` to just use Numcodecs' `ensure_ndarray` to get an `ndarray` that views the data. Once we have the `ndarray`, all that is needed is to access its `nbytes` member, which returns the number of bytes it takes up.
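A rough sketch of the idea, using `np.frombuffer` as a stand-in for Numcodecs' `ensure_ndarray`: take a zero-copy view of any buffer-protocol object and read off `nbytes`.

```python
import numpy as np

def buffer_size(v):
    # np.frombuffer stands in here for numcodecs' ensure_ndarray:
    # a zero-copy ndarray view onto any buffer-protocol object
    a = np.frombuffer(v, dtype='u1')
    # nbytes is all we need: the total size of the data in bytes
    return a.nbytes
```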
If the data is already a `str` instance, turn `ensure_str` into a no-op. For all other cases, make use of Numcodecs' `ensure_bytes` to aid `ensure_str` in coercing data through the buffer protocol. If we are on Python 3, then decode the `bytes` object to a `str`.
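A sketch of that logic, covering the Python 3 path only, with `bytes(memoryview(...))` standing in for Numcodecs' `ensure_bytes`:

```python
def ensure_str(data):
    # str passes through unchanged (the no-op case)
    if isinstance(data, str):
        return data
    # anything else goes through the buffer protocol and is decoded;
    # bytes(memoryview(...)) is a stand-in for numcodecs' ensure_bytes
    return bytes(memoryview(data)).decode()
```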
As `DictStore` now must only store `bytes` or types coercible to bytes via the buffer protocol, there is no possibility for it to have unknown sizes as `bytes` always have a known size. So drop these cases where the size can be `-1`.
Simplifies some of the code by making use of Numcodecs utility functions. This avoids a few copies when working with some of the stores. Also generally makes the rest easier to understand. |
Make sure that datetime/timedelta arrays are cast to a type that supports the buffer protocol. Ensure this is a type that can handle all of the datetime/timedelta values and has the same itemsize.
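For instance, a `datetime64` array can be viewed as `'i8'`, which has the same itemsize, represents every value including NaT, and supports the buffer protocol (illustrative only; the actual cast used may differ):

```python
import numpy as np

# datetime64 arrays don't support the buffer protocol directly, but an
# 'i8' view has the same itemsize and covers every value, including NaT
a = np.array(['2007-07-13', 'NaT'], dtype='M8[D]')
b = a.view('i8')
mv = memoryview(b)  # works, whereas memoryview(a) would raise
```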
Instead of using `ensure_ndarray`, use `ensure_contiguous_ndarray` with the stores. This ensures that datetime/timedeltas are handled by default. Also catches things like object arrays. Finally this handles flattening the array if needed.
As there are some critical fixes in the latest Numcodecs, this bumps our lower bound to the latest version.
TODO:

- `tox -e docs`