-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
Switch (some) coding/encoding in conventions.py to use xarray.coding. #1803
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The goal here is to eventually convert everything in xarray.conventions to using the new coding module, which is more modular and supports dask arrays. For now, I have switched over datetime, timedelta, unsigned integer, scaling and mask coding to use new coders. Integrating these into xarray.conventions lets us harness our existing test suite and delete a lot of redundant code. Most of the code/tests is simply reorganized. There should be no changes to public API (to keep this manageable for review). All of the original tests that are still relevant should still be present, though I have reorganized many of them into new locations to match the revised code.
not np.any(pd.isnull(fill_value))) | ||
if (has_fill or scale_factor is not None or add_offset is not None): | ||
if has_fill and np.array(fill_value).dtype.kind in ['U', 'S', 'O']: | ||
if string_encoding is not None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed this check. I think I put this logic in the wrong place: we really should ensure that _FillValue
is not provided when writing/encoding a variable to disk, not in reading/decoding.
xarray/backends/zarr.py
Outdated
coding.variables.CFMaskCoder(), | ||
coding.variables.UnsignedCoder()]: | ||
var = coder.encode(var, name=name) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shoyer - what do you think about adding encode/decode methods to the AbstractWritableDataStore
? It seems each backend handles these slightly differently but this step happens for all backends.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, some of these need to get associated with the backend classes in some way. I was waiting to do that until we finish porting all of the stuff in conventions
into coding
.
scale_factor=scale_factor, | ||
add_offset=add_offset, | ||
dtype=dtype) | ||
data = lazy_elemwise_func(data, transform, dtype) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! When did lazy_elemwise_func
get added? I missed that somehow. This solves our problems related to multiple types of lazy arrays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was waiting for more review but if you think this is good to go we can merge it. I think this also fixes #1781 -- let me add a regression test. |
I'll give it another review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a good incremental step towards the overall refactor. I went through the code and can't see any obvious problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, I had just two minor suggestions.
xarray/coding/times.py
Outdated
|
||
|
||
TIME_UNITS = frozenset(['days', 'hours', 'minutes', 'seconds', | ||
'milliseconds', 'microseconds']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can we put all these module level variables/constants up top. Since this is a new module, it would be nice to stick to an order of:
imports
constants
functions
classes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
xarray/coding/times.py
Outdated
|
||
def _infer_time_units_from_diff(unique_timedeltas): | ||
for time_unit, delta in [('days', 86400), ('hours', 3600), | ||
('minutes', 60), ('seconds', 1)]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you just iterate over _NS_PER_TIME_DELTA
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sort of. I refactored it to use _NS_PER_TIME_DELTA
but it's not that much cleaner than before.
Thanks @shoyer! |
Fixes #1781
The goal here is to eventually convert everything in xarray.conventions to
using the new coding module, which is more modular and supports dask arrays.
For now, I have switched over datetime, timedelta, unsigned integer, scaling
and mask coding to use new coders. Integrating these into xarray.conventions
lets us harness our existing test suite and delete a lot of redundant code.
Most of the code/tests is simply reorganized. There should be no changes to
public API (to keep this manageable for review). All of the original tests that
are still relevant should still be present, though I have reorganized many of
them into new locations to match the revised code.
git diff upstream/master **/*py | flake8 --diff
(remove if you did not edit any Python files)