Commit 5ae4e08

Author: Daniel Rothenberg
Merge branch 'master' into dt_accessor
2 parents: 74f8756 + a189e8a

32 files changed: +1014 −571 lines

conftest.py

Lines changed: 0 additions & 2 deletions

@@ -5,5 +5,3 @@ def pytest_addoption(parser):
     """Add command-line flags for pytest."""
     parser.addoption("--run-flaky", action="store_true",
                      help="runs flaky tests")
-    parser.addoption("--skip-slow", action="store_true",
-                     help="skips slow tests")

doc/api.rst

Lines changed: 4 additions & 0 deletions

@@ -80,6 +80,7 @@ Dataset contents
    Dataset.merge
    Dataset.rename
    Dataset.swap_dims
+   Dataset.expand_dims
    Dataset.drop
    Dataset.set_coords
    Dataset.reset_coords
@@ -223,6 +224,7 @@ DataArray contents
    DataArray.pipe
    DataArray.rename
    DataArray.swap_dims
+   DataArray.expand_dims
    DataArray.drop
    DataArray.reset_coords
    DataArray.copy
@@ -422,6 +424,7 @@ Dataset methods
    Dataset.from_dict
    Dataset.close
    Dataset.compute
+   Dataset.persist
    Dataset.load
    Dataset.chunk
    Dataset.filter_by_attrs
@@ -447,6 +450,7 @@ DataArray methods
    DataArray.from_cdms2
    DataArray.from_dict
    DataArray.compute
+   DataArray.persist
    DataArray.load
    DataArray.chunk

doc/dask.rst

Lines changed: 18 additions & 6 deletions

@@ -144,12 +144,23 @@ Explicit conversion by wrapping a DataArray with ``np.asarray`` also works:
         [ 1.337e+00, -1.531e+00, ..., 8.726e-01, -1.538e+00],
         ...

-With the current version of dask, there is no automatic alignment of chunks when
-performing operations between dask arrays with different chunk sizes. If your
-computation involves multiple dask arrays with different chunks, you may need to
-explicitly rechunk each array to ensure compatibility. With xarray, both
-converting data to a dask arrays and converting the chunk sizes of dask arrays
-is done with the :py:meth:`~xarray.Dataset.chunk` method:
+Alternatively you can load the data into memory but keep the arrays as
+dask arrays using the `~xarray.Dataset.persist` method:
+
+.. ipython::
+
+    ds = ds.persist()
+
+This is particularly useful when using a distributed cluster because the data
+will be loaded into distributed memory across your machines and be much faster
+to use than reading repeatedly from disk. Warning that on a single machine
+this operation will try to load all of your data into memory. You should make
+sure that your dataset is not larger than available memory.
+
+For performance you may wish to consider chunk sizes. The correct choice of
+chunk size depends both on your data and on the operations you want to perform.
+With xarray, both converting data to a dask arrays and converting the chunk
+sizes of dask arrays is done with the :py:meth:`~xarray.Dataset.chunk` method:

 .. ipython:: python
    :suppress:
@@ -226,6 +237,7 @@ larger chunksizes.
     import os
     os.remove('example-data.nc')

+
 Optimization Tips
 -----------------
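
As an annotation on this hunk: a minimal runnable sketch of the persist-and-rechunk workflow the new prose describes — the dataset, variable name, and chunk sizes below are invented for illustration, not taken from the commit:

    import numpy as np
    import xarray as xr

    # A toy dataset, chunked so its variable is backed by a dask array.
    ds = xr.Dataset({'temperature': (('x', 'y'), np.random.rand(100, 100))})
    ds = ds.chunk({'x': 50, 'y': 50})

    # persist() evaluates the graph but keeps dask arrays: on a distributed
    # cluster the results live in worker memory; on a single machine the
    # whole dataset is loaded into local RAM.
    ds = ds.persist()

    # chunk() also rechunks existing dask arrays when the sizes are a poor
    # fit for the intended operations.
    ds = ds.chunk({'x': 25, 'y': 100})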

doc/faq.rst

Lines changed: 13 additions & 7 deletions

@@ -140,18 +140,24 @@ If you are using xarray and would like to cite it in academic publication, we
 would certainly appreciate it. We recommend two citations.

 1. At a minimum, we recommend citing the xarray overview journal article,
-   submitted to the Journal of Open Research Software.
+   published in the Journal of Open Research Software.

-   - Hoyer, S., Hamman, J. (In revision). Xarray: N-D labeled arrays and
-     datasets in Python. Journal of Open Research Software.
+   - Hoyer, S. & Hamman, J., (2017). xarray: N-D labeled Arrays and
+     Datasets in Python. Journal of Open Research Software. 5(1), p.10.
+     DOI: http://doi.org/10.5334/jors.148

    Here’s an example of a BibTeX entry::

        @article{hoyer2017xarray,
-         title = {xarray: {N-D} labeled arrays and datasets in {Python}},
-         author = {Hoyer, S. and J. Hamman},
-         journal = {In revision, J. Open Res. Software},
-         year = {2017}
+         title = {xarray: {N-D} labeled arrays and datasets in {Python}},
+         author = {Hoyer, S. and J. Hamman},
+         journal = {Journal of Open Research Software},
+         volume = {5},
+         number = {1},
+         year = {2017},
+         publisher = {Ubiquity Press},
+         doi = {10.5334/jors.148},
+         url = {http://doi.org/10.5334/jors.148}
        }

 2. You may also want to cite a specific version of the xarray package. We

doc/index.rst

Lines changed: 4 additions & 0 deletions

@@ -57,11 +57,15 @@ Documentation
 See also
 --------

+- Stephan Hoyer and Joe Hamman's `Journal of Open Research Software paper`_ describing the xarray project.
+- The `UW eScience Institute's Geohackweek`_ tutorial on xarray for geospatial data scientists.
 - Stephan Hoyer's `SciPy2015 talk`_ introducing xarray to a general audience.
 - Stephan Hoyer's `2015 Unidata Users Workshop talk`_ and `tutorial`_ (`with answers`_) introducing
   xarray to users familiar with netCDF.
 - `Nicolas Fauchereau's tutorial`_ on xarray for netCDF users.

+.. _Journal of Open Research Software paper: http://doi.org/10.5334/jors.148
+.. _UW eScience Institute's Geohackweek : https://geohackweek.github.io/nDarrays/
 .. _SciPy2015 talk: https://www.youtube.com/watch?v=X0pAhJgySxk
 .. _2015 Unidata Users Workshop talk: https://www.youtube.com/watch?v=J9ypQOnt5l8
 .. _tutorial: https://github.com/Unidata/unidata-users-workshop/blob/master/notebooks/xray-tutorial.ipynb

doc/reshaping.rst

Lines changed: 22 additions & 0 deletions

@@ -27,6 +27,28 @@ on a :py:class:`~xarray.Dataset`, use :py:meth:`~xarray.DataArray.transpose` or
     ds.transpose('y', 'z', 'x')
     ds.T

+Expand and squeeze dimensions
+-----------------------------
+
+To expand a :py:class:`~xarray.DataArray` or all
+variables on a :py:class:`~xarray.Dataset` along a new dimension,
+use :py:meth:`~xarray.DataArray.expand_dims`
+
+.. ipython:: python
+
+    expanded = ds.expand_dims('w')
+    expanded
+
+This method attaches a new dimension with size 1 to all data variables.
+
+To remove such a size-1 dimension from the :py:class:`~xarray.DataArray`
+or :py:class:`~xarray.Dataset`,
+use :py:meth:`~xarray.DataArray.squeeze`
+
+.. ipython:: python
+
+    expanded.squeeze('w')
+
 Converting between datasets and arrays
 --------------------------------------
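
A compact sketch of the two methods this new section documents; the array and dimension names are made up for the example:

    import numpy as np
    import xarray as xr

    da = xr.DataArray(np.arange(3), dims='x')

    # expand_dims inserts a new dimension of size 1 at the front.
    expanded = da.expand_dims('w')
    print(expanded.dims)               # ('w', 'x')

    # squeeze drops the size-1 dimension again.
    print(expanded.squeeze('w').dims)  # ('x',)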

doc/whats-new.rst

Lines changed: 4 additions & 4 deletions

@@ -22,17 +22,17 @@ v0.9.3 (unreleased)
 Enhancements
 ~~~~~~~~~~~~

-- Add ``.dt`` accessor to DataArrays for computing datetime-like properties
-  for the values they contain, similar to ``pandas.Series`` (:issue:`358`).
-  By `Daniel Rothenberg <https://github.com/darothen>`_.
-
 - Add ``.persist()`` method to Datasets and DataArrays to enable persisting
   data in distributed memory (:issue:`1344`).
   By `Matthew Rocklin <https://github.com/mrocklin>`_.

 - New :py:meth:`~xarray.DataArray.expand_dims` method for ``DataArray`` and
   ``Dataset`` (:issue:`1326`).
   By `Keisuke Fujii <https://github.com/fujiisoup>`_.
+
+- Add ``.dt`` accessor to DataArrays for computing datetime-like properties
+  for the values they contain, similar to ``pandas.Series`` (:issue:`358`).
+  By `Daniel Rothenberg <https://github.com/darothen>`_.

 Bug fixes
 ~~~~~~~~~
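
Since the ``.dt`` accessor is the feature this branch merges, a brief usage sketch may help; the dates and frequency are chosen only for illustration:

    import pandas as pd
    import xarray as xr

    times = xr.DataArray(pd.date_range('2000-01-01', periods=4, freq='6H'),
                         dims='time')

    # Datetime components are exposed under .dt, mirroring pandas.Series.dt.
    print(times.dt.hour.values)       # [ 0  6 12 18]
    print(times.dt.dayofyear.values)  # [1 1 1 1]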

xarray/backends/api.py

Lines changed: 20 additions & 21 deletions

@@ -318,12 +318,8 @@ def maybe_decode_store(store, lock=False):
         return maybe_decode_store(store)


-def open_dataarray(filename_or_obj, group=None, decode_cf=True,
-                   mask_and_scale=True, decode_times=True,
-                   concat_characters=True, decode_coords=True, engine=None,
-                   chunks=None, lock=None, cache=None, drop_variables=None):
-    """
-    Opens an DataArray from a netCDF file containing a single data variable.
+def open_dataarray(*args, **kwargs):
+    """Open an DataArray from a netCDF file containing a single data variable.

     This is designed to read netCDF files with only one data variable. If
     multiple variables are present then a ValueError is raised.
@@ -353,6 +349,10 @@ def open_dataarray(filename_or_obj, group=None, decode_cf=True,
     decode_times : bool, optional
         If True, decode times encoded in the standard NetCDF datetime format
         into datetime objects. Otherwise, leave them encoded as numbers.
+    autoclose : bool, optional
+        If True, automatically close files to avoid OS Error of too many files
+        being open. However, this option doesn't work with streams, e.g.,
+        BytesIO.
     concat_characters : bool, optional
         If True, concatenate along the last dimension of character arrays to
         form string arrays. Dimensions will only be concatenated over (and
@@ -400,10 +400,7 @@ def open_dataarray(filename_or_obj, group=None, decode_cf=True,
     --------
     open_dataset
     """
-    dataset = open_dataset(filename_or_obj, group, decode_cf,
-                           mask_and_scale, decode_times,
-                           concat_characters, decode_coords, engine,
-                           chunks, lock, cache, drop_variables)
+    dataset = open_dataset(*args, **kwargs)

     if len(dataset.data_vars) != 1:
         raise ValueError('Given file dataset contains more than one data '
@@ -536,7 +533,7 @@ def open_mfdataset(paths, chunks=None, concat_dim=_CONCAT_DIM_DEFAULT,
               'h5netcdf': backends.H5NetCDFStore}


-def to_netcdf(dataset, path=None, mode='w', format=None, group=None,
+def to_netcdf(dataset, path_or_file=None, mode='w', format=None, group=None,
               engine=None, writer=None, encoding=None, unlimited_dims=None):
     """This function creates an appropriate datastore for writing a dataset to
     disk as a netCDF file
@@ -547,18 +544,19 @@ def to_netcdf(dataset, path=None, mode='w', format=None, group=None,
     """
     if encoding is None:
         encoding = {}
-    if path is None:
-        path = BytesIO()
+    if path_or_file is None:
         if engine is None:
             engine = 'scipy'
-        elif engine is not None:
+        elif engine != 'scipy':
             raise ValueError('invalid engine for creating bytes with '
                              'to_netcdf: %r. Only the default engine '
                              "or engine='scipy' is supported" % engine)
-    else:
+    elif isinstance(path_or_file, basestring):
         if engine is None:
-            engine = _get_default_engine(path)
-        path = _normalize_path(path)
+            engine = _get_default_engine(path_or_file)
+        path_or_file = _normalize_path(path_or_file)
+    else:  # file-like object
+        engine = 'scipy'

     # validate Dataset keys, DataArray names, and attr keys/values
     _validate_dataset_names(dataset)
@@ -575,17 +573,18 @@ def to_netcdf(dataset, path=None, mode='w', format=None, group=None,
     # if a writer is provided, store asynchronously
     sync = writer is None

-    store = store_cls(path, mode, format, group, writer)
+    target = path_or_file if path_or_file is not None else BytesIO()
+    store = store_cls(target, mode, format, group, writer)

     if unlimited_dims is None:
         unlimited_dims = dataset.encoding.get('unlimited_dims', None)
     try:
         dataset.dump_to_store(store, sync=sync, encoding=encoding,
                               unlimited_dims=unlimited_dims)
-        if isinstance(path, BytesIO):
-            return path.getvalue()
+        if path_or_file is None:
+            return target.getvalue()
     finally:
-        if sync:
+        if sync and isinstance(path_or_file, basestring):
             store.close()

     if not sync:
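
To make the three ``path_or_file`` branches above concrete, a hedged usage sketch — the filenames are placeholders, and the bytes-returning path assumes the scipy engine is available:

    import numpy as np
    import xarray as xr

    ds = xr.Dataset({'a': ('x', np.arange(3))})

    # 1. String path: the engine is inferred from the target, and the store
    #    is closed automatically once writing finishes.
    ds.to_netcdf('example.nc')

    # 2. No target: the dataset is serialized in memory with the scipy
    #    engine and the raw netCDF bytes are returned.
    raw = ds.to_netcdf()
    assert isinstance(raw, bytes)

    # 3. File-like object: handled by the new `else` branch, which forces
    #    the scipy engine and leaves closing the file to the caller.
    with open('also-example.nc', 'wb') as f:
        ds.to_netcdf(f)

    # open_dataarray round-trips a single-variable file to a DataArray and
    # raises ValueError if the file contains more than one data variable.
    da = xr.open_dataarray('example.nc')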

xarray/backends/netcdf3.py

Lines changed: 2 additions & 2 deletions

@@ -6,7 +6,7 @@
 import numpy as np

 from .. import conventions, Variable
-from ..core import ops
+from ..core import duck_array_ops
 from ..core.pycompat import basestring, unicode_type, OrderedDict


@@ -45,7 +45,7 @@ def coerce_nc3_dtype(arr):
     if ((('int' in dtype or 'U' in dtype) and
          not (cast_arr == arr).all()) or
         ('float' in dtype and
-         not ops.allclose_or_equiv(cast_arr, arr))):
+         not duck_array_ops.allclose_or_equiv(cast_arr, arr))):
         raise ValueError('could not safely cast array from dtype %s to %s'
                          % (dtype, new_dtype))
     arr = cast_arr

xarray/conventions.py

Lines changed: 2 additions & 2 deletions

@@ -11,7 +11,7 @@
 from collections import defaultdict
 from pandas.tslib import OutOfBoundsDatetime

-from .core import indexing, ops, utils
+from .core import duck_array_ops, indexing, ops, utils
 from .core.formatting import format_timestamp, first_n_items, last_item
 from .core.variable import as_variable, Variable
 from .core.pycompat import iteritems, OrderedDict, PY3, basestring
@@ -632,7 +632,7 @@ def maybe_encode_dtype(var, name=None):
                       'point data as an integer dtype without '
                       'any _FillValue to use for NaNs' % name,
                       RuntimeWarning, stacklevel=3)
-        data = ops.around(data)[...]
+        data = duck_array_ops.around(data)[...]
     if dtype == 'S1' and data.dtype != 'S1':
         data = string_to_char(np.asarray(data, 'S'))
         dims = dims + ('string%s' % data.shape[-1],)
