-
-
Notifications
You must be signed in to change notification settings - Fork 329
AsType Filter #96
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
AsType Filter #96
Conversation
FWICT both based on the tests and playing with it. This does what I would want. Namely store data on disk as one type and read it in as another (e.g. |
enc = np.empty_like(arr, dtype=self.dtype) | ||
|
||
# Convert and copy | ||
enc[...] = arr.astype(self.astype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be enc[...] = arr.astype(self.dtype)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Fixed.
I'm not sure this will work as intended with just this filter added to the array filters. I think the problem is that within This is why I think you also need to create a view, because you need to also modify the dtype attribute on the parent zarr array, as well as adding in the filter to perform the astype conversion. I think that if you were to call the
However I haven't tested this, there may be other issues we'd need to resolve to make this work. |
|
||
def astype(self, dtype): | ||
filt = AsType(dtype=self._dtype, astype=dtype) | ||
return self.view(filters=[filt], dtype=dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will work if self has no filters, but I think would need to check if there are any filters on the self array, and if so make a new filters list by copying the existing list and prepending the new AsType filter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. That would explain the test failure I was seeing. 😆
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is now fixed in the latest changes.
Thanks for the suggestion above about using Figured that wrapping it in an |
No problem, I think the astype method is nice, very happy to merge in if
tests look good.
…On Wednesday, December 7, 2016, jakirkham ***@***.***> wrote:
Thanks for the suggestion above about using view. Was looking at that a
little bit last night, but wasn't sure whether I should be going that route
or not. The nudge in the right direction helped.
Figured that wrapping it in an astype method should make the interaction
with the filters a little nicer and a little more accessible. It's not a
one-to-one of the h5py astype function, but I'm not sure that it is
desirable for it to be. This is a bit more like NumPy's astype so
hopefully a bit more familiar to end users.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<https://github.com/alimanfoo/zarr/pull/96#issuecomment-265515361>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAq8QkewzBtKUg-gUbEXuGn9LCx_mRJWks5rFu5tgaJpZM4LGEbS>
.
--
Alistair Miles
Head of Epidemiological Informatics
Centre for Genomics and Global Health <http://cggh.org>
The Wellcome Trust Centre for Human Genetics
Roosevelt Drive
Oxford
OX3 7BN
United Kingdom
Email: [email protected]
Web: http://purl.org/net/aliman
Twitter: https://twitter.com/alimanfoo
Tel: +44 (0)1865 287721
|
Alright, finishing a couple more things ATM. Almost done. |
No rush, whenever you're happy. |
|
||
filters = [] | ||
if self._filters: | ||
filters.extend(self._filters) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry had just realized _filters
could be None
. So had to update this like so. Please let me know if this is ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks ok to me.
astype = np.dtype(np.float32) | ||
|
||
store = dict() | ||
init_array(store, shape=shape, chunks=10, dtype=dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this test to handle the case where self._filters
is None
.
|
||
data = np.arange(np.prod(shape), dtype=dtype).reshape(shape) | ||
|
||
z1 = self.create_array(shape=shape, chunks=chunks, dtype=dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added this case for when there are other filters.
---------- | ||
dtype : string or dtype | ||
NumPy dtype. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So one could technically write to disk when using astype
. I don't know if we should add a note or warning about this. Maybe something similar to what view
has. Please let me know if you have any thoughts on this.
Cool. Sorry had found a bug when I saw you post that. So wanted to make sure I got it fixed first. Have since fixed it and added a test for that case as noted in the diff comments. I'm happy with this as is. Just not sure if we should warn people about using |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good. May need some discussion around the init arguments.
If `astype` is an integer data type, please ensure that it is | ||
sufficiently large to store encoded values. No checks are made and data | ||
may become corrupted due to integer overflow if `astype` is too small. | ||
Note also that the encoded data for each chunk includes the absolute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This sentence beginning "Note also..." comes from the DeltaFilter IIRC and shouldn't apply here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dropped.
|
||
Notes | ||
----- | ||
If `astype` is an integer data type, please ensure that it is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this note applies here, at least not exactly. Probably what is more relevant is the casting mode used within numpy.astype, as it is possible to use different casting modes, at least some of which would prevent anything like integer overflow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, this was copy-pasted. I think we should have some note, but not the one here currently.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaced with a better note.
|
||
codec_id = 'astype' | ||
|
||
def __init__(self, dtype, astype=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be worth adding a casting
argument here and storing that for use later within the call to numpy.astype
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about something like:
Notes
-----
This method returns a new Array object which is a view on the same underlying
chunk data. Modifying any data via the view will also modify data in the original
array. This is an experimental feature and care may be needed to avoid data
corruption or unexpected loss of precision.
See Also
--------
Array.view
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So at least initially I like the idea of casting
as an argument. However, my concern is this makes conversion asymmetric, but in a way that might not be obvious.
For example, say we have some int
initially and we want to use astype
to convert it to some float
. No matter what conversion I chose the conversion from int
to float
works fine. However, if one tries to set a value in the view, this won't work unless the casting is unsafe
.
We can come up with other types where a different casting constraint sets this threshold of whether it proceeds or not. Though instead of getting an unclear error about type conversion, maybe the better parameter instead of casting
is simply whether we make the view read_only
or not.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I took most of the original message, but have tweaked it a bit as I have disabled writing on the view provided by astype
.
enc = np.empty_like(arr, dtype=self.dtype) | ||
|
||
# Convert and copy | ||
enc[...] = arr.astype(self.dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you had a casting
property you could use that here, i.e., arr.astype(self.dtype, casting=self.casting)
.
copy_needed = True | ||
|
||
# Convert and copy | ||
dec[...] = enc.astype(self.astype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly could do enc.astype(self.astype, self.casting
.
config['id'] = self.codec_id | ||
config['dtype'] = self.dtype.str | ||
config['astype'] = self.astype.str | ||
return config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you did add a casting
property, would need to also include it in the config.
|
||
filters = [] | ||
if self._filters: | ||
filters.extend(self._filters) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks ok to me.
dtype : dtype | ||
Data type used to store the data. | ||
astype : dtype, optional | ||
Data type of the data read in. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On reflection there may be potential for some confusion here. In this filter currently dtype
is the data type used for encoded data, and astype
is the data type used for decoded data. This has come about naturally because the primary use case was decoding. However, these argument names are used the other way around on other filters, e.g., from the Delta filter::
Parameters
----------
dtype : dtype
Data type to use for decoded data.
astype : dtype, optional
Data type to use for encoded data.
I think I'd suggest switching these two arguments around for consistency with other filters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH when I saw Delta
. The naming felt unclear and a bit confusing. That being said, I completely agree that having this opposite to Delta
doesn't help either. Personally, I'd rather change the names of these to something else entirely like from_dtype
and to_dtype
or perhaps data_dtype
and view_dtype
. Though any names that make it clear what these dtypes are applied seem ok IMHO. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to encode_dtype
and decode_dtype
.
So I responded in the comments to some of the big points. I'll try to summarize it here. The names of the type arguments are not great in either order. Instead of flipping them, I'd prefer we come up with clearer names. This should avoid confusing alternatives to other filters or confusing what is going on with this filter. Casting is an interesting idea. Though it mainly seems to set a constraint on backpropagation of data. Perhaps instead we should expose the |
I sympathize re the naming of __init__ arguments to the AsType filter
class. I struggled with this previously for the other filter classes, I
tried various combinations before settling on something that seemed the
least worst.
How about for the AsType filter having decode_astype and encode_astype as
__init__ arguments? I.e., encode_astype is the dtype of the encoded data,
and decode_astype is the dtype of the decoded data? Also encode_astype
could be None and then encode() would be a no-op, and similar for
decode_astype.
FWIW I'd rather not change __init__ param names for other filter classes
because this would break backwards compability and require a major version
bump. I'd like to avoid that for now at least.
It also might help to try and separate discussion a bit here.
One thread we are discussing is how to design the AsType codec, which
enables the use of numpy.astype to perform both encoding and decoding
operations on buffers. This can be considered entirely independently of the
rest of Zarr. In fact I am planning to rip out the codecs module from the
next minor version (2.1) of Zarr, I have factored it out into a separate
package called numcodecs. This doesn't need to change anything for this PR
because I haven't done the work on the Zarr side yet, and I can port over
the AsType filter to numcodecs later. However it's just to illustrate the
idea that codec classes may have a life of their own, independent of Zarr.
For codec classes in general it's been natural to think of encoding first
and decoding second, because usually you start out with some data, then you
want to encode it (e.g., to store it), then you want to decode it (e.g., to
retrieve it again). That's partly why other filter classes have naming the
other way around. I.e., the __init__ argument order (dtype, astype)
reflects the direction of encoding rather than decoding.
The second thread is the design of the Array.astype method. This makes use
of the AsType codec, but that is an implementation detail that can be
hidden from the user. Here the primary use case is to view an existing
array as a different dtype (which uses AsType.decode under the hood). But
we are also trying to figure out what should happen if the user tries to
update the view (which would use AsType.encode under the hood).
So for the Array.astype method the thinking is naturally the other way
around, because we think first about enabling the view (using
AsType.decode), then think second about what happens if data is updated via
the view (using AsType.encode).
I'm not sure if this is helping at all, I guess just trying to explain
these two different points of view.
Regarding casting, this is an interesting point. I guess we could either
leave as-is so the behaviour is to use numpy default (unsafe) casting
during both encoding and decoding operations. An alternative might be to
have encode_casting and decode_casting arguments, so there could be
different casting modes for encoding and decoding operations. This may be
an unnecessary level of detail and not needed for the read-only view use
case which only needs the decoding operation, and here I assume unsafe is
fine for most situations. But it might be worth thinking ahead a bit here
and doing a bit of future-proofing.
Separately it might be worth adding a read_only argument to Array.astype()
so the user can ask for a read-only view.
…On Thursday, December 8, 2016, jakirkham ***@***.***> wrote:
So I responded in the comments to some of the big points. I'll try to
summarize it here.
The names of the type arguments are not great in either order. Instead of
flipping them, I'd prefer we come up with clearer names. This should avoid
confusing alternatives to other filters or confusing what is going on with
this filter.
Casting is an interesting idea. Though it mainly seems to set a constraint
on backpropagation of data. Perhaps instead we should expose the read_only
flag and let users set that. We could also default it to True, which
avoids some surprises out of the box.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<https://github.com/alimanfoo/zarr/pull/96#issuecomment-265776432>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAq8QuGIdddY5NOFUFHmqCg5Uif8cScHks5rGClqgaJpZM4LGEbS>
.
--
Alistair Miles
Head of Epidemiological Informatics
Centre for Genomics and Global Health <http://cggh.org>
The Wellcome Trust Centre for Human Genetics
Roosevelt Drive
Oxford
OX3 7BN
United Kingdom
Email: [email protected]
Web: http://purl.org/net/aliman
Twitter: https://twitter.com/alimanfoo
Tel: +44 (0)1865 287721
|
Ok, so I have renamed the types to
Of course, wasn't planning on it. Though maybe they can be updated at some future date when a breaking change is planned.
Yes, it makes sense. Codecs in general focus on how to store new data; whereas, with astype we are thinking more about how to work with existing data.
Yeah, I don't claim to have the right answer with casting. Just wanted to point out that My inclination would be to leave it alone until we get a better idea of how people wish to use this. We can even add a note that this method is experimental and subject to change in the future.
Sure this seems totally reasonable. One could even go so far as setting |
filters.extend(self._filters) | ||
filters.insert(0, AsType(encode_dtype=self._dtype, decode_dtype=dtype)) | ||
|
||
return self.view(filters=filters, dtype=dtype, read_only=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I'm thinking about it. Honestly it seems pretty reasonable to me to just set read_only
to True
for now on the view. I am having trouble seeing a good use case for using this to write to disk. Especially as there are easier and clearer ways to do this. If someone asks, it is pretty easy to relax this constraint later. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should add this avoids some of the concerns raised about using casting
as this allows it to have the typical meaning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the hiatus. I'm very happy to set read_only=True by default.
|
||
codec_id = 'astype' | ||
|
||
def __init__(self, encode_dtype, decode_dtype=None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm starting to question my choice to leave decode_dtype
as None
by default. We were borrowing from Delta
before, but Delta
still does something if these are the same. However, for AsType
this is an identity in that case. May try to drop this default.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
So I have convinced myself that making the view returned by If we agree on However, the ref: https://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.astype.html |
So I have convinced myself that making the view returned by astype simply
read-only is a good thing. This is because of all the confusing and
concerning points raised in our discussion when considering write support.
Also if one wants to write data with a different type, there is already a
good mechanism for that. Namely one can convert their data to the type used
by the Zarr Array and write it in. This even still works when considering
Zarr Arrays and this version of the astype function. Please let me know
your thoughts on this.
Agreed, I think read_only is a good thing.
If we agree on astype being read-only always, then I see no harm in
adding a casting argument that behaves exactly the same as NumPy's.
Further it's meaning should be just as clear. Looking at NumPy's other
arguments for astype, most of them (e.g. subok and copy) really don't
make sense here. So I suspect we won't want to support them.
Yes adding a "casting" argument to Array.astype() makes sense to me.
However, the order argument actually does make sense particularly with
Zarr. That said, I'm a little unsure if there is more than say just passing
the order argument through. Also on the filter front, I could see value
in that being its own filter independent of AsType (maybe AsType should
be renamed or not?). The other reason I bring order up is this argument
comes before casting in NumPy's astype method. So I don't know how this
impacts support of casting near term. Would be curious to hear your
thoughts on this as well.
FWIW I don't think we want to expose an "order" argument in Array.astype(),
there are some complexities around what that should mean and I don't think
we want to go there right now.
…--
Alistair Miles
Head of Epidemiological Informatics
Centre for Genomics and Global Health <http://cggh.org>
The Wellcome Trust Centre for Human Genetics
Roosevelt Drive
Oxford
OX3 7BN
United Kingdom
Email: [email protected]
Web: http://purl.org/net/aliman
Twitter: https://twitter.com/alimanfoo
Tel: +44 (0)1865 287721
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies again for the hiatus. A few minor suggestions but looks good.
Parameters | ||
---------- | ||
encode_dtype : dtype | ||
Data type used to store the data. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest to change this to "Data type to use for encoded data."
encode_dtype : dtype | ||
Data type used to store the data. | ||
decode_dtype : dtype, optional | ||
Data type of the data read in. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest to change this to "Data type to use for decoded data."
>>> z = f.encode(y) | ||
>>> z | ||
array([100, 102, 104, 106, 108, 110, 112, 114, 116, 118], dtype=int8) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this example, can I suggest to use 'i4' instead of 'i8'. It's just that that 'i8' and 'int8' look similar and so there is potential for confusion for users less familiar with NumPy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively we could just use np.int8
instead of i1
and such to avoid any confusion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Went ahead and switch to np.int8
and the like. Please let me know if this is acceptable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, that's great.
enc = np.empty_like(arr, dtype=self.encode_dtype) | ||
|
||
# Convert and copy | ||
enc[...] = arr.astype(self.encode_dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this might mean two copies are being done here? I.e., could you remove a copy if you replaced lines 534 and 537 with just:
enc = arr.astype(self.encode_dtype)
...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you just mean we don't need the np.empty
call above? Agreed. Will drop it.
|
||
# handle output | ||
if copy_needed: | ||
out = _buffer_copy(dec, out) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is an opportunity to remove a data copy here if out is None but can't immediately see the best way right now. Not essential but maybe worth it if straightforward to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't that happen already? We don't change out
here so I think that just works, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies, the logic is a bit tortuous here, I'm having trouble myself reconstructing everything after not looking at it for a while. I think this can actually be simplified as follows (which will also remove a memory copy in the case where out is None):
# view encoded data as 1D array
enc = _ndarray_from_buffer(buf, self.encode_dtype)
# apply decoding
dec = enc.astype(self.decode_dtype)
# handle output
out = _buffer_copy(dec, out)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just so I understand, are you suggesting just cutting the "setup decoded output" section basically? I think I copied it from the Delta
filter. Though you are probably right that it is overkill.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, this is now cleaned up.
def __repr__(self): | ||
r = '%s(encode_dtype=%s' % (type(self).__name__, self.encode_dtype) | ||
if self.decode_dtype != self.encode_dtype: | ||
r += ', decode_dtype=%s' % self.decode_dtype |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest to show both encode_dtype and decode_dtype in repr even if they are identical, as both are mandatory for this filter.
Sorry for the long turnaround. Have been out for a bit. Thanks for the feedback. Have implemented most of it verbatim. Please let me know if there is anything else. |
Thanks, all looks great to me. OK to merge? |
One last thing before I forget, could you add the new astype method into the API docs for the Array class, under docs/api/core.rst. Also could you add the new AsType codec class into the API docs under docs/api/codecs.rst. |
Good catch. Have added them. Also tried generating the documentation locally to make sure it shows up ok. Side note: I don't know if this fits with your style of documentation, but one can generate these auto command lines automatically with a few lines of code. |
Thanks for the good review. Nothing more needed on my end. At some point we should revisit casting, but am happy enough with this change as is. Shouldn't be hard to add support for it later. |
On Wednesday, January 4, 2017, jakirkham ***@***.***> wrote:
Good catch. Have added them. Also tried generating the documentation
locally to make sure it shows up ok.
Side note: I don't know if this fits with your style of documentation, but
one can generate these auto command lines automatically with a few lines
of code
<https://github.com/jakirkham/kenjutsu/blob/8bde7c5464c8ad83971bdefd6a9eef97c1b170a8/docs/conf.py#L283-L306>
.
Thanks, that's really useful to know. I'm manually crafting the API docs
for now but may use in future.
… —
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<https://github.com/alimanfoo/zarr/pull/96#issuecomment-270263520>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AAq8QgzWkXSVHiuHZTKYJqWnfgrB3OrAks5rOuYGgaJpZM4LGEbS>
.
--
Alistair Miles
Head of Epidemiological Informatics
Centre for Genomics and Global Health <http://cggh.org>
The Wellcome Trust Centre for Human Genetics
Roosevelt Drive
Oxford
OX3 7BN
United Kingdom
Email: [email protected]
Web: http://purl.org/net/aliman
Twitter: https://twitter.com/alimanfoo
Tel: +44 (0)1865 287721
|
Thanks a lot for seeing this one through, it's a nice feature. Just to let you know, when I come to do other work for the 2.2 release I will refactor the AsType codec class upstream to the numcodecs package along with all other codec classes. |
Thanks for all the tips along the way. Sure make sense. Let me know if you need any help refactoring it. |
Fixes https://github.com/alimanfoo/zarr/issues/94
Allows data to be stored with one data type. When read in with this filter, it is automatically converted to another type specified. Based largely off the
Delta
filter, but tweaked a bit. Added a few tests to make sure this is working correctly.