
Switch (some) coding/encoding in conventions.py to use xarray.coding. #1803


Merged: 6 commits merged into pydata:master on Jan 11, 2018

Conversation

shoyer (Member) commented Dec 30, 2017

Fixes #1781

The goal here is to eventually convert everything in xarray.conventions to
using the new coding module, which is more modular and supports dask arrays.
For now, I have switched over datetime, timedelta, unsigned integer, scaling
and mask coding to use new coders. Integrating these into xarray.conventions
lets us harness our existing test suite and delete a lot of redundant code.

Most of the code and tests are simply reorganized. There should be no changes to the
public API (to keep this manageable for review). All of the original tests that
are still relevant should still be present, though I have reorganized many of
them into new locations to match the revised code.

  • Tests added (for all bug fixes or enhancements)
  • Tests passed (for all non-documentation changes)
  • Passes git diff upstream/master **/*py | flake8 --diff (remove if you did not edit any Python files)
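
(For orientation, not part of the PR: a toy sketch of the coder pattern described above, i.e. an object with paired encode/decode methods that transform a Variable. It mimics the spirit of the mask coder, but the class name and logic below are illustrative only, not xarray's implementation.)

import numpy as np
import xarray as xr


class SketchMaskCoder:
    """Toy coder in the spirit of xarray.coding.variables.CFMaskCoder."""

    def encode(self, var, name=None):
        # Replace NaN with the fill value declared in .encoding and record
        # the _FillValue attribute, as CF encoding for on-disk storage would.
        fill = var.encoding.get('_FillValue')
        if fill is None:
            return var
        data = np.where(np.isnan(var.values), fill, var.values)
        return xr.Variable(var.dims, data, dict(var.attrs, _FillValue=fill))

    def decode(self, var, name=None):
        # Mask the fill value back to NaN and move _FillValue into .encoding.
        fill = var.attrs.get('_FillValue')
        if fill is None:
            return var
        data = np.where(var.values == fill, np.nan, var.values)
        attrs = {k: v for k, v in var.attrs.items() if k != '_FillValue'}
        return xr.Variable(var.dims, data, attrs,
                           encoding={'_FillValue': fill})


var = xr.Variable(('x',), np.array([1.0, np.nan, 3.0]),
                  encoding={'_FillValue': -9999.0})
coder = SketchMaskCoder()
encoded = coder.encode(var, name='example')
decoded = coder.decode(encoded, name='example')
print(encoded.values, decoded.values)  # NaN packed as -9999.0, then restored

In conventions.py the real coders are chained, each one transforming the Variable in turn before it reaches a backend.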

Diff context:

    not np.any(pd.isnull(fill_value)))
    if (has_fill or scale_factor is not None or add_offset is not None):
    if has_fill and np.array(fill_value).dtype.kind in ['U', 'S', 'O']:
    if string_encoding is not None:
shoyer (Member, Author) commented:

I removed this check. I think I put this logic in the wrong place: we really should ensure that _FillValue is not provided when writing/encoding a variable to disk, not in reading/decoding.
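
(A hypothetical sketch of the encode-side guard being described, keyed off the same string-dtype test as the removed check; the helper name and exact error are illustrative, not code from this PR.)

import numpy as np
import xarray as xr


def check_string_fill_value_on_encode(var: xr.Variable, name: str) -> None:
    # Hypothetical: reject a string/object-dtype _FillValue when a variable
    # is about to be encoded for writing, instead of erroring at decode time.
    fill_value = var.attrs.get('_FillValue', var.encoding.get('_FillValue'))
    if fill_value is None:
        return
    if np.asarray(fill_value).dtype.kind in ('U', 'S', 'O'):
        raise NotImplementedError(
            'string _FillValue is not supported when encoding variable '
            '%r for writing to disk' % name)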

Diff context:

    coding.variables.CFMaskCoder(),
    coding.variables.UnsignedCoder()]:
    var = coder.encode(var, name=name)

Reviewer (Member) commented:

@shoyer - what do you think about adding encode/decode methods to the AbstractWritableDataStore? It seems each backend handles these slightly differently but this step happens for all backends.

shoyer (Member, Author) replied:

Yes, some of these need to get associated with the backend classes in some way. I was waiting to do that until we finish porting all of the stuff in conventions into coding.
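
(An illustrative sketch of the suggestion above, assuming a hypothetical store base class; the class and method names are not xarray's actual AbstractWritableDataStore API, and the only real call used is conventions.encode_cf_variable.)

from xarray import conventions


class HypotheticalWritableStore:
    # Sketch: a shared base class owns the CF encode step, so every backend
    # writer receives variables that are already encoded.

    def encode_variable(self, variable):
        # Apply the conventions-level encoding (scaling, masking, times, ...).
        return conventions.encode_cf_variable(variable)

    def encode(self, variables, attributes):
        variables = {name: self.encode_variable(v)
                     for name, v in variables.items()}
        return variables, attributes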

Diff context:

    scale_factor=scale_factor,
    add_offset=add_offset,
    dtype=dtype)
    data = lazy_elemwise_func(data, transform, dtype)
Reviewer (Contributor) commented:

This looks great! When did lazy_elemwise_func get added? I missed that somehow. This solves our problems related to multiple types of lazy arrays.
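
(A rough, self-contained illustration of the deferred element-wise idea; the partial-built transform below stands in for what lazy_elemwise_func would apply lazily over a backend or dask array, and the numbers are made up.)

import functools
import numpy as np


def _scale_offset_decode(data, scale_factor, add_offset, dtype):
    # Element-wise CF unpacking: packed integers -> physical values.
    data = np.asarray(data, dtype=dtype)
    if scale_factor is not None:
        data = data * scale_factor
    if add_offset is not None:
        data = data + add_offset
    return data


# The transform is constructed eagerly, but a lazy wrapper only needs to keep
# (array, func, dtype) around and call func when values are actually read,
# or map it over dask blocks for chunked data.
transform = functools.partial(_scale_offset_decode, scale_factor=0.01,
                              add_offset=273.15, dtype=np.float64)
packed = np.array([0, 100, 200], dtype=np.int16)
print(transform(packed))  # approximately [273.15 274.15 275.15]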

jhamman (Member) commented Jan 9, 2018

@shoyer - can we merge this soon so I can use it in #1800?

shoyer (Member, Author) commented Jan 9, 2018

I was waiting for more review but if you think this is good to go we can merge it.

I think this also fixes #1781 -- let me add a regression test.

jhamman (Member) commented Jan 9, 2018

I'll give it another review.

rabernat (Contributor) left a comment:

This seems like a good incremental step towards the overall refactor. I went through the code and can't see any obvious problems.

jhamman (Member) left a comment:

Looks good, I had just two minor suggestions.



Diff context:

    TIME_UNITS = frozenset(['days', 'hours', 'minutes', 'seconds',
                            'milliseconds', 'microseconds'])
Reviewer (Member) commented:


nit: can we put all these module-level variables/constants up top? Since this is a new module, it would be nice to stick to an order of:

imports
constants
functions
classes

shoyer (Member, Author) replied:


done
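
(A minimal skeleton of the ordering suggested above; the names are placeholders.)

# imports
import numpy as np

# constants
TIME_UNITS = frozenset(['days', 'hours', 'minutes', 'seconds',
                        'milliseconds', 'microseconds'])

# functions
def _example_helper(values):
    return np.asarray(values)

# classes
class ExampleCoder:
    pass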


Diff context:

    def _infer_time_units_from_diff(unique_timedeltas):
        for time_unit, delta in [('days', 86400), ('hours', 3600),
                                 ('minutes', 60), ('seconds', 1)]:
Reviewer (Member) commented:

can you just iterate over _NS_PER_TIME_DELTA?

shoyer (Member, Author) replied:

Sort of. I refactored it to use _NS_PER_TIME_DELTA but it's not that much cleaner than before.
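
(A rough sketch of the refactor being discussed, assuming a _NS_PER_TIME_DELTA-style mapping from unit name to nanoseconds; the constant and helper below are stand-ins, not the PR's exact code.)

import numpy as np

# Stand-in for _NS_PER_TIME_DELTA: nanoseconds per unit, coarsest first.
NS_PER_TIME_DELTA = {'days': 86_400 * 10**9,
                     'hours': 3_600 * 10**9,
                     'minutes': 60 * 10**9,
                     'seconds': 10**9}


def infer_time_units_from_diff(unique_timedeltas):
    # Pick the coarsest unit that evenly divides every timedelta.
    unique_ns = np.asarray(unique_timedeltas,
                           dtype='timedelta64[ns]').astype('int64')
    for unit, ns_per_unit in NS_PER_TIME_DELTA.items():
        if np.all(unique_ns % ns_per_unit == 0):
            return unit
    return 'seconds'


print(infer_time_units_from_diff(np.array([3600, 7200], dtype='timedelta64[s]')))
# -> 'hours'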

jhamman merged commit 50b0a69 into pydata:master on Jan 11, 2018
jhamman (Member) commented Jan 11, 2018

Thanks @shoyer!

Development: merging this pull request may close the linked issue "UnboundLocalError when opening netCDF file" (#1781).

3 participants