
Commit eeb109d

duncanwp authored and shoyer committed
xarray to and from Iris (#1750)
* Added a first go at a converter from xarray.DataArrays objects to Iris.cube.Cube objects. Uses the same template as the cdms2 conversion.
* Update tests to use original.coords and add extra tests for coord attributes
* Remove set literals just in case. Replace has_key with in. Use AuxCoord and DimCoord correctly so 2d coords will work. Use dims variable to convert dimension numbers into names. Add 2d coord to test DataArray.
* Create dimensions if the Iris cube does not have any
* Add code to convert cell_methods
* Don't append blank cell method
* Update cell method code to use internal Iris functions. Also add tests.
* Update the API for IRIS change
* Move helper functions outside of main functions
* Update to build dims with mix of Dimension and Auxiliary coordinates
* Fix import after merge
* Bug fix / refactoring
* Change the dencode_cf method and raise error if coord has no var_name
* Add missing test and two minor fixes
* Replacing function calls with set literals
* Adding what's new and updating api.rst
* Adding from_cube method to docs
* Make Iris dependency >=1.10 so we can rely on the newer parse_cell_methods interface (this is already pretty old now)
* Adding Iris section to the I/O docs
* Fixing typos
* Improving error message
* Test edge cases
* Preserve dask data arrays if at all possible. Convert to (dask) masked arrays when going to Iris and filled (dask) arrays when coming from Iris.
* Updates to remove the hard dependency on dask, and to test the conversion of dask arrays which may or may not be masked.
* Minor doc fixes
* Minor typo
* Use dask_array_type for dask checks
* Updating to run the iris docs
* Fix typo
* Fixing long lines
1 parent 76e9ed8 commit eeb109d

9 files changed: +371 −12 lines
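As a quick orientation before the per-file diffs, the round trip added by this commit looks roughly like the sketch below. This is a hedged example based on the doc/io.rst addition further down, not part of the commit itself; it assumes iris >= 1.10 is installed and uses random values and the made-up name 'foo' purely for illustration.

    import numpy as np
    import pandas as pd
    import xarray as xr

    # Build a small DataArray with a numeric and a datetime coordinate
    da = xr.DataArray(np.random.rand(4, 5), dims=['x', 'y'],
                      coords=dict(x=[10, 20, 30, 40],
                                  y=pd.date_range('2000-01-01', periods=5)),
                      name='foo')

    cube = da.to_iris()                   # iris.cube.Cube with matching coordinates
    back = xr.DataArray.from_iris(cube)   # convert back to an xarray.DataArray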

.travis.yml

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@ matrix:
   - python: 2.7
     env: CONDA_ENV=py27-min
   - python: 2.7
-    env: CONDA_ENV=py27-cdat+pynio
+    env: CONDA_ENV=py27-cdat+iris+pynio
   - python: 3.4
     env: CONDA_ENV=py34
   - python: 3.5

ci/requirements-py27-cdat+pynio.yml renamed to ci/requirements-py27-cdat+iris+pynio.yml

Lines changed: 1 addition & 0 deletions
@@ -22,6 +22,7 @@ dependencies:
   - seaborn
   - toolz
   - rasterio
+  - iris>=1.10
   - zarr
   - pip:
     - coveralls

doc/api.rst

Lines changed: 2 additions & 0 deletions
@@ -450,6 +450,8 @@ DataArray methods
    DataArray.to_index
    DataArray.to_masked_array
    DataArray.to_cdms2
+   DataArray.to_iris
+   DataArray.from_iris
    DataArray.to_dict
    DataArray.from_series
    DataArray.from_cdms2

doc/environment.yml

Lines changed: 1 addition & 0 deletions
@@ -17,3 +17,4 @@ dependencies:
   - rasterio=0.36.0
   - sphinx-gallery
   - zarr
+  - iris

doc/io.rst

Lines changed: 32 additions & 0 deletions
@@ -338,6 +338,38 @@ supported by netCDF4-python: 'standard', 'gregorian', 'proleptic_gregorian' 'nol
 By default, xarray uses the 'proleptic_gregorian' calendar and units of the smallest time
 difference between values, with a reference time of the first time value.
 
+.. _io.iris:
+
+Iris
+----
+
+The Iris_ tool allows easy reading of common meteorological and climate model formats
+(including GRIB and UK MetOffice PP files) into ``Cube`` objects which are in many ways very
+similar to ``DataArray`` objects, while enforcing a CF-compliant data model. If iris is
+installed xarray can convert a ``DataArray`` into a ``Cube`` using
+:py:meth:`~xarray.DataArray.to_iris`:
+
+.. ipython:: python
+
+    da = xr.DataArray(np.random.rand(4, 5), dims=['x', 'y'],
+                      coords=dict(x=[10, 20, 30, 40],
+                                  y=pd.date_range('2000-01-01', periods=5)))
+
+    cube = da.to_iris()
+    cube
+
+Conversely, we can create a new ``DataArray`` object from a ``Cube`` using
+:py:meth:`~xarray.DataArray.from_iris`:
+
+.. ipython:: python
+
+    da_cube = xr.DataArray.from_iris(cube)
+    da_cube
+
+
+.. _Iris: http://scitools.org.uk/iris
+
+
 OPeNDAP
 -------
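Beyond what the added docs show, the commit message notes that dask-backed arrays are preserved "if at all possible". Here is a hedged sketch of what that is intended to look like (not part of the committed docs; assumes dask and iris >= 1.10 are installed, and whether the data actually stays lazy also depends on the Iris version in use):

    import numpy as np
    import xarray as xr

    # Chunking gives the DataArray a dask-backed .data attribute
    da = xr.DataArray(np.random.rand(4, 5), dims=['x', 'y']).chunk({'x': 2})

    cube = da.to_iris()                  # data is handed over as a (dask) masked array
    back = xr.DataArray.from_iris(cube)  # masked points, if any, come back filled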

doc/whats-new.rst

Lines changed: 4 additions & 0 deletions
@@ -25,6 +25,9 @@ Enhancements
   1D co-ordinate (e.g. time) and a 2D co-ordinate (e.g. depth as a function of
   time) (:issue:`1737`).
   By `Deepak Cherian <https://github.com/dcherian>`_.
+- Added :py:meth:`DataArray.to_iris <xray.DataArray.to_iris>` and :py:meth:`DataArray.from_iris <xray.DataArray.from_iris>` for
+  converting data arrays to and from Iris_ Cubes with the same data and coordinates (:issue:`621` and :issue:`37`).
+  By `Neil Parley <https://github.com/nparley>`_ and `Duncan Watson-Parris <https://github.com/duncanwp>`_.
 - Use ``pandas.Grouper`` class in xarray resample methods rather than the
   deprecated ``pandas.TimeGrouper`` class (:issue:`1766`).
   By `Joe Hamman <https://github.com/jhamman>`_.
@@ -36,6 +39,7 @@ Enhancements
 
 .. _Zarr: http://zarr.readthedocs.io/
 
+.. _Iris: http://scitools.org.uk/iris
 
 **New functions/methods**

xarray/convert.py

Lines changed: 172 additions & 10 deletions
@@ -7,24 +7,42 @@
 import numpy as np
 
 from .core.dataarray import DataArray
+from .core.pycompat import OrderedDict, range
+from .core.dtypes import get_fill_value
 from .conventions import (
     maybe_encode_timedelta, maybe_encode_datetime, decode_cf)
 
-ignored_attrs = set(['name', 'tileIndex'])
+cdms2_ignored_attrs = {'name', 'tileIndex'}
+iris_forbidden_keys = {'standard_name', 'long_name', 'units', 'bounds', 'axis',
+                       'calendar', 'leap_month', 'leap_year', 'month_lengths',
+                       'coordinates', 'grid_mapping', 'climatology',
+                       'cell_methods', 'formula_terms', 'compress',
+                       'missing_value', 'add_offset', 'scale_factor',
+                       'valid_max', 'valid_min', 'valid_range', '_FillValue'}
+cell_methods_strings = {'point', 'sum', 'maximum', 'median', 'mid_range',
+                        'minimum', 'mean', 'mode', 'standard_deviation',
+                        'variance'}
+
+
+def encode(var):
+    return maybe_encode_timedelta(maybe_encode_datetime(var.variable))
+
+
+def _filter_attrs(attrs, ignored_attrs):
+    """ Return attrs that are not in ignored_attrs
+    """
+    return dict((k, v) for k, v in attrs.items() if k not in ignored_attrs)
 
 
 def from_cdms2(variable):
     """Convert a cdms2 variable into an DataArray
     """
-    def get_cdms2_attrs(var):
-        return dict((k, v) for k, v in var.attributes.items()
-                    if k not in ignored_attrs)
-
     values = np.asarray(variable)
     name = variable.id
-    coords = [(v.id, np.asarray(v), get_cdms2_attrs(v))
+    coords = [(v.id, np.asarray(v),
+               _filter_attrs(v.attributes, cdms2_ignored_attrs))
              for v in variable.getAxisList()]
-    attrs = get_cdms2_attrs(variable)
+    attrs = _filter_attrs(variable.attributes, cdms2_ignored_attrs)
     dataarray = DataArray(values, coords=coords, name=name, attrs=attrs)
     return decode_cf(dataarray.to_dataset())[dataarray.name]
 
@@ -35,9 +53,6 @@ def to_cdms2(dataarray):
     # we don't want cdms2 to be a hard dependency
     import cdms2
 
-    def encode(var):
-        return maybe_encode_timedelta(maybe_encode_datetime(var.variable))
-
     def set_cdms2_attrs(var, attrs):
         for k, v in attrs.items():
             setattr(var, k, v)
@@ -53,3 +68,150 @@ def set_cdms2_attrs(var, attrs):
     cdms2_var = cdms2.createVariable(var.values, axes=axes, id=dataarray.name)
     set_cdms2_attrs(cdms2_var, var.attrs)
     return cdms2_var
+
+
+def _pick_attrs(attrs, keys):
+    """ Return attrs with keys in keys list
+    """
+    return dict((k, v) for k, v in attrs.items() if k in keys)
+
+
+def _get_iris_args(attrs):
+    """ Converts the xarray attrs into args that can be passed into Iris
+    """
+    # iris.unit is deprecated in Iris v1.9
+    import cf_units
+    args = {'attributes': _filter_attrs(attrs, iris_forbidden_keys)}
+    args.update(_pick_attrs(attrs, ('standard_name', 'long_name',)))
+    unit_args = _pick_attrs(attrs, ('calendar',))
+    if 'units' in attrs:
+        args['units'] = cf_units.Unit(attrs['units'], **unit_args)
+    return args
+
+
+# TODO: Add converting bounds from xarray to Iris and back
+def to_iris(dataarray):
+    """ Convert a DataArray into a Iris Cube
+    """
+    # Iris not a hard dependency
+    import iris
+    from iris.fileformats.netcdf import parse_cell_methods
+    from xarray.core.pycompat import dask_array_type
+
+    dim_coords = []
+    aux_coords = []
+
+    for coord_name in dataarray.coords:
+        coord = encode(dataarray.coords[coord_name])
+        coord_args = _get_iris_args(coord.attrs)
+        coord_args['var_name'] = coord_name
+        axis = None
+        if coord.dims:
+            axis = dataarray.get_axis_num(coord.dims)
+        if coord_name in dataarray.dims:
+            iris_coord = iris.coords.DimCoord(coord.values, **coord_args)
+            dim_coords.append((iris_coord, axis))
+        else:
+            iris_coord = iris.coords.AuxCoord(coord.values, **coord_args)
+            aux_coords.append((iris_coord, axis))
+
+    args = _get_iris_args(dataarray.attrs)
+    args['var_name'] = dataarray.name
+    args['dim_coords_and_dims'] = dim_coords
+    args['aux_coords_and_dims'] = aux_coords
+    if 'cell_methods' in dataarray.attrs:
+        args['cell_methods'] = \
+            parse_cell_methods(dataarray.attrs['cell_methods'])
+
+    # Create the right type of masked array (should be easier after #1769)
+    if isinstance(dataarray.data, dask_array_type):
+        from dask.array import ma as dask_ma
+        masked_data = dask_ma.masked_invalid(dataarray)
+    else:
+        masked_data = np.ma.masked_invalid(dataarray)
+
+    cube = iris.cube.Cube(masked_data, **args)
+
+    return cube
+
+
+def _iris_obj_to_attrs(obj):
+    """ Return a dictionary of attrs when given a Iris object
+    """
+    attrs = {'standard_name': obj.standard_name,
+             'long_name': obj.long_name}
+    if obj.units.calendar:
+        attrs['calendar'] = obj.units.calendar
+    if obj.units.origin != '1':
+        attrs['units'] = obj.units.origin
+    attrs.update(obj.attributes)
+    return dict((k, v) for k, v in attrs.items() if v is not None)
+
+
+def _iris_cell_methods_to_str(cell_methods_obj):
+    """ Converts a Iris cell methods into a string
+    """
+    cell_methods = []
+    for cell_method in cell_methods_obj:
+        names = ''.join(['{}: '.format(n) for n in cell_method.coord_names])
+        intervals = ' '.join(['interval: {}'.format(interval)
+                              for interval in cell_method.intervals])
+        comments = ' '.join(['comment: {}'.format(comment)
+                             for comment in cell_method.comments])
+        extra = ' '.join([intervals, comments]).strip()
+        if extra:
+            extra = ' ({})'.format(extra)
+        cell_methods.append(names + cell_method.method + extra)
+    return ' '.join(cell_methods)
+
+
+def from_iris(cube):
+    """ Convert a Iris cube into an DataArray
+    """
+    import iris.exceptions
+    from xarray.core.pycompat import dask_array_type
+
+    name = cube.var_name
+    dims = []
+    for i in range(cube.ndim):
+        try:
+            dim_coord = cube.coord(dim_coords=True, dimensions=(i,))
+            dims.append(dim_coord.var_name)
+        except iris.exceptions.CoordinateNotFoundError:
+            dims.append("dim_{}".format(i))
+
+    coords = OrderedDict()
+
+    for coord in cube.coords():
+        coord_attrs = _iris_obj_to_attrs(coord)
+        coord_dims = [dims[i] for i in cube.coord_dims(coord)]
+        if not coord.var_name:
+            raise ValueError("Coordinate '{}' has no "
+                             "var_name attribute".format(coord.name()))
+        if coord_dims:
+            coords[coord.var_name] = (coord_dims, coord.points, coord_attrs)
+        else:
+            coords[coord.var_name] = ((),
+                                      np.asscalar(coord.points), coord_attrs)
+
+    array_attrs = _iris_obj_to_attrs(cube)
+    cell_methods = _iris_cell_methods_to_str(cube.cell_methods)
+    if cell_methods:
+        array_attrs['cell_methods'] = cell_methods
+
+    # Deal with iris 1.* and 2.*
+    cube_data = cube.core_data() if hasattr(cube, 'core_data') else cube.data
+
+    # Deal with dask and numpy masked arrays
+    if isinstance(cube_data, dask_array_type):
+        from dask.array import ma as dask_ma
+        filled_data = dask_ma.filled(cube_data, get_fill_value(cube.dtype))
+    elif isinstance(cube_data, np.ma.MaskedArray):
+        filled_data = np.ma.filled(cube_data, get_fill_value(cube.dtype))
+    else:
+        filled_data = cube_data
+
+    dataarray = DataArray(filled_data, coords=coords, name=name,
+                          attrs=array_attrs, dims=dims)
+    decoded_ds = decode_cf(dataarray._to_temp_dataset())
+    return dataarray._from_temp_dataset(decoded_ds)
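To illustrate the cell_methods handling in to_iris()/from_iris() above, here is a hedged sketch of the expected round trip. It is not part of the diff; it assumes iris >= 1.10, and the variable name 'tas' and the CF-style cell_methods string are made up for the example.

    import numpy as np
    import xarray as xr

    da = xr.DataArray(np.random.rand(3), dims='time',
                      coords=dict(time=[0, 1, 2]), name='tas',
                      attrs={'cell_methods': 'time: mean (interval: 1 hour)'})

    cube = da.to_iris()
    # parse_cell_methods() turned the string into iris CellMethod objects
    print(cube.cell_methods)

    back = xr.DataArray.from_iris(cube)
    # _iris_cell_methods_to_str() reassembles the string on the way out
    print(back.attrs['cell_methods'])   # expected: 'time: mean (interval: 1 hour)'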

xarray/core/dataarray.py

Lines changed: 13 additions & 0 deletions
@@ -1540,6 +1540,19 @@ def from_cdms2(cls, variable):
         from ..convert import from_cdms2
         return from_cdms2(variable)
 
+    def to_iris(self):
+        """Convert this array into a iris.cube.Cube
+        """
+        from ..convert import to_iris
+        return to_iris(self)
+
+    @classmethod
+    def from_iris(cls, cube):
+        """Convert a iris.cube.Cube into an xarray.DataArray
+        """
+        from ..convert import from_iris
+        return from_iris(cube)
+
     def _all_compat(self, other, compat_str):
         """Helper function for equals and identical"""
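Finally, a short hedged sketch of the missing-value behaviour exposed through these two wrapper methods (again not part of the commit; assumes iris >= 1.10, with an illustrative name 'tas'): NaNs are masked on the way into Iris, and masked points are filled on the way back, with NaN as the fill value for float data.

    import numpy as np
    import xarray as xr

    da = xr.DataArray([1.0, np.nan, 3.0], dims='x',
                      coords=dict(x=[0, 1, 2]), name='tas')

    cube = da.to_iris()                  # the NaN becomes a masked point in the Cube
    back = xr.DataArray.from_iris(cube)  # the masked point is filled with NaN again
    print(np.isnan(back.values[1]))      # expected: True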
