Switch (some) coding/encoding in conventions.py to use xarray.coding. #1803

shoyer · 2017-12-30T23:59:11Z

The goal here is to eventually convert everything in xarray.conventions to
use the new coding module, which is more modular and supports dask arrays.
For now, I have switched over datetime, timedelta, unsigned integer, scaling
and mask coding to use the new coders. Integrating these into
xarray.conventions lets us harness our existing test suite and delete a lot of
redundant code.

Most of the code and tests are simply reorganized. There should be no changes
to the public API (to keep this manageable for review). All of the original
tests that are still relevant should still be present, though I have moved many
of them to new locations to match the revised code.
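
To illustrate the coder pattern described above, here is a minimal sketch in plain Python. The names `ToyVariable` and `ToyScaleOffsetCoder` are hypothetical and are not xarray's actual classes; the sketch only assumes that a coder pairs an `encode` step with a `decode` step that inverts it, moving the packing parameters between `encoding` and `attrs` along the way.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVariable:
    """Hypothetical stand-in for a variable: values plus metadata."""
    data: list
    attrs: dict = field(default_factory=dict)
    encoding: dict = field(default_factory=dict)


class ToyScaleOffsetCoder:
    """Hypothetical coder that packs and unpacks with scale_factor/add_offset."""

    def encode(self, var):
        # Move the packing parameters from encoding to attrs, then pack.
        attrs = dict(var.attrs)
        encoding = dict(var.encoding)
        scale = encoding.pop('scale_factor', 1.0)
        offset = encoding.pop('add_offset', 0.0)
        attrs.update(scale_factor=scale, add_offset=offset)
        data = [(x - offset) / scale for x in var.data]
        return ToyVariable(data, attrs, encoding)

    def decode(self, var):
        # Move the packing parameters back to encoding, then unpack.
        attrs = dict(var.attrs)
        encoding = dict(var.encoding)
        scale = attrs.pop('scale_factor', 1.0)
        offset = attrs.pop('add_offset', 0.0)
        encoding.update(scale_factor=scale, add_offset=offset)
        data = [x * scale + offset for x in var.data]
        return ToyVariable(data, attrs, encoding)


coder = ToyScaleOffsetCoder()
original = ToyVariable(
    [10.0, 12.5], encoding={'scale_factor': 0.5, 'add_offset': 10.0})
packed = coder.encode(original)
roundtripped = coder.decode(packed)
```

Because each coder is self-contained, a round trip through `encode` and `decode` can be tested in isolation from any backend.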

Tests added (for all bug fixes or enhancements)
Tests passed (for all non-documentation changes)
Passes git diff upstream/master **/*py | flake8 --diff (remove if you did not edit any Python files)

shoyer · 2017-12-31T00:56:00Z

xarray/conventions.py

-                    not np.any(pd.isnull(fill_value)))
-        if (has_fill or scale_factor is not None or add_offset is not None):
-            if has_fill and np.array(fill_value).dtype.kind in ['U', 'S', 'O']:
-                if string_encoding is not None:


I removed this check. I think I put this logic in the wrong place: we really should ensure that _FillValue is not provided when writing/encoding a variable to disk, not in reading/decoding.

jhamman · 2018-01-03T17:30:43Z

xarray/backends/zarr.py

+                  coding.variables.CFMaskCoder(),
+                  coding.variables.UnsignedCoder()]:
+        var = coder.encode(var, name=name)
+


@shoyer - what do you think about adding encode/decode methods to the AbstractWritableDataStore? It seems each backend handles these slightly differently but this step happens for all backends.

Yes, some of these need to get associated with the backend classes in some way. I was waiting to do that until we finish porting all of the stuff in conventions into coding.
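
One way the suggestion above could look is sketched below. This is only an illustration under assumptions: `ToyWritableStore`, `ToyZarrStore`, and the two toy coders are hypothetical names, not xarray's `AbstractWritableDataStore` or its real coders. The idea shown is that each backend declares its own list of coders while the base class owns the loop that applies them.

```python
class ToyWritableStore:
    """Hypothetical base class; not xarray's AbstractWritableDataStore."""

    # Subclasses list the coders they need, in the order they are applied.
    coders = ()

    def encode_variable(self, var, name=None):
        for coder in self.coders:
            var = coder.encode(var, name=name)
        return var

    def decode_variable(self, var, name=None):
        # Decoding undoes encoding, so apply the coders in reverse order.
        for coder in reversed(self.coders):
            var = coder.decode(var, name=name)
        return var


class AddOneCoder:
    """Toy coder operating on plain integers."""

    def encode(self, var, name=None):
        return var + 1

    def decode(self, var, name=None):
        return var - 1


class DoubleCoder:
    """Toy coder operating on plain integers."""

    def encode(self, var, name=None):
        return var * 2

    def decode(self, var, name=None):
        return var // 2


class ToyZarrStore(ToyWritableStore):
    # A backend only declares which coders it uses.
    coders = (AddOneCoder(), DoubleCoder())


store = ToyZarrStore()
encoded = store.encode_variable(3, name='x')
decoded = store.decode_variable(encoded, name='x')
```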

rabernat · 2018-01-08T19:32:49Z

xarray/coding/variables.py

+                                scale_factor=scale_factor,
+                                add_offset=add_offset,
+                                dtype=dtype)
+            data = lazy_elemwise_func(data, transform, dtype)


This looks great! When did lazy_elemwise_func get added? I missed that somehow. This solves our problems related to multiple types of lazy arrays.

@rabernat - see #1752
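
For readers unfamiliar with the idea, the sketch below shows the general technique of deferring an elementwise transform until data is actually indexed. It is not how `lazy_elemwise_func` is implemented in xarray (see #1752 for that); `LazyElementwiseList` and `unpack` are hypothetical names, and a plain list stands in for an on-disk array.

```python
class LazyElementwiseList:
    """Hypothetical wrapper that applies `func` only when items are read."""

    def __init__(self, data, func):
        self.data = data
        self.func = func
        self.calls = 0  # counts how many elements were actually transformed

    def __len__(self):
        return len(self.data)

    def __getitem__(self, key):
        selected = self.data[key]
        if isinstance(key, slice):
            self.calls += len(selected)
            return [self.func(x) for x in selected]
        self.calls += 1
        return self.func(selected)


def unpack(x, scale_factor=0.5, add_offset=10.0):
    """Toy decoding transform: unpack a scaled, offset value."""
    return x * scale_factor + add_offset


raw = list(range(1000))
lazy = LazyElementwiseList(raw, unpack)  # nothing is computed yet
first_two = lazy[:2]                     # only two elements are transformed
```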

jhamman · 2018-01-09T21:25:02Z

@shoyer - can we merge this soon so I can use it in #1800?

shoyer · 2018-01-09T22:11:37Z

I was waiting for more review but if you think this is good to go we can merge it.

I think this also fixes #1781 -- let me add a regression test.

jhamman · 2018-01-09T22:20:04Z

I'll give it another review.

rabernat

This seems like a good incremental step towards the overall refactor. I went through the code and can't see any obvious problems.

jhamman

Looks good, I had just two minor suggestions.

jhamman · 2018-01-10T18:56:40Z

xarray/coding/times.py

+
+
+TIME_UNITS = frozenset(['days', 'hours', 'minutes', 'seconds',
+                        'milliseconds', 'microseconds'])


nit: can we put all these module-level variables/constants up top? Since this is a new module, it would be nice to stick to an order of:

imports
constants
functions
classes

jhamman · 2018-01-10T18:58:42Z

xarray/coding/times.py

+
+def _infer_time_units_from_diff(unique_timedeltas):
+    for time_unit, delta in [('days', 86400), ('hours', 3600),
+                             ('minutes', 60), ('seconds', 1)]:


can you just iterate over _NS_PER_TIME_DELTA?

Sort of. I refactored it to use _NS_PER_TIME_DELTA but it's not that much cleaner than before.
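
A rough sketch of what iterating over a lookup table could look like follows. The table contents and the function body here are assumptions made for illustration (timedeltas are plain integers in nanoseconds, and the names drop the leading underscore to mark them as hypothetical); this is not the code that was committed.

```python
# Assumed contents, for illustration only: nanoseconds per unit,
# ordered from the coarsest unit to the finest.
NS_PER_TIME_DELTA = {
    'days': 86400 * 10**9,
    'hours': 3600 * 10**9,
    'minutes': 60 * 10**9,
    'seconds': 10**9,
}


def infer_time_units_from_diff(unique_timedeltas_ns):
    """Return the coarsest unit that divides every timedelta evenly."""
    for time_unit, delta_ns in NS_PER_TIME_DELTA.items():
        if all(td % delta_ns == 0 for td in unique_timedeltas_ns):
            return time_unit
    return 'seconds'  # fall back to the finest unit considered here


hourly = [3600 * 10**9, 7200 * 10**9]
units = infer_time_units_from_diff(hourly)
```

This relies on dictionaries preserving insertion order (Python 3.7+), so the coarsest unit is tried first.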

jhamman · 2018-01-11T16:53:33Z

Thanks @shoyer!

shoyer added 2 commits December 30, 2017 18:03

Fix zarr and cmds export

61de0ce

shoyer commented Dec 31, 2017

jhamman reviewed Jan 3, 2018

rabernat reviewed Jan 8, 2018

rabernat approved these changes Jan 9, 2018

add whats-new and small cleanup

f23dfe4

jhamman mentioned this pull request Jan 10, 2018

WIP: Performance improvements for zarr backend #1800

Merged

5 tasks

Merge branch 'master' into use-new-variable-coders

4b2ca19

jhamman approved these changes Jan 10, 2018

shoyer added 2 commits January 10, 2018 19:51

Move constant to top of module

1b24ef7

use _NS_PER_TIME_DELTA

67f86fc

jhamman merged commit 50b0a69 into pydata:master Jan 11, 2018

shoyer deleted the use-new-variable-coders branch January 11, 2018 17:21

fujiisoup mentioned this pull request Mar 21, 2018

Unexpected decoded time in xarray >= 0.10.1 #2002

Closed

jhamman mentioned this pull request Feb 4, 2019

Expose a public interface for CF encoding/decoding functions #155

Open


