@@ -37,13 +37,14 @@ which allows Dask to take full advantage of multiple processors available on
 most modern computers.
 
 For more details on Dask, read `its documentation <http://dask.pydata.org/>`__.
+Note that xarray only makes use of ``dask.array`` and ``dask.delayed``.
 
 .. _dask.io:
 
 Reading and writing data
 ------------------------
 
-The usual way to create a dataset filled with Dask arrays is to load the
+The usual way to create a ``Dataset`` filled with Dask arrays is to load the
 data from a netCDF file or files. You can do this by supplying a ``chunks``
 argument to :py:func:`~xarray.open_dataset` or using the
 :py:func:`~xarray.open_mfdataset` function.
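As a sketch of how the ``chunks`` argument maps onto Dask chunk sizes — using a small in-memory ``Dataset`` with made-up data instead of a real netCDF file:

```python
import numpy as np
import xarray as xr

# Stand-in for data that would normally be read from a netCDF file.
ds = xr.Dataset(
    {"temperature": (("time", "latitude", "longitude"), np.zeros((30, 4, 5)))}
)

# Equivalent in effect to open_dataset(..., chunks={"time": 10}): only
# ``time`` is split; latitude and longitude each stay as a single chunk.
chunked = ds.chunk({"time": 10})
print(chunked["temperature"].data.chunks)  # ((10, 10, 10), (4,), (5,))
```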
@@ -71,8 +72,8 @@ argument to :py:func:`~xarray.open_dataset` or using the
 
 In this example ``latitude`` and ``longitude`` do not appear in the ``chunks``
 dict, so only one chunk will be used along those dimensions. It is also
-entirely equivalent to opening a dataset using ``open_dataset`` and then
-chunking the data using the ``chunk`` method, e.g.,
+entirely equivalent to opening a dataset using :py:meth:`~xarray.open_dataset`
+and then chunking the data using the ``chunk`` method, e.g.,
 ``xr.open_dataset('example-data.nc').chunk({'time': 10})``.
 
 To open multiple files simultaneously, use :py:func:`~xarray.open_mfdataset`::
@@ -81,11 +82,14 @@ To open multiple files simultaneously, use :py:func:`~xarray.open_mfdataset`::
 
 This function will automatically concatenate and merge dataset into one in
 the simple cases that it understands (see :py:func:`~xarray.auto_combine`
-for the full disclaimer). By default, ``open_mfdataset`` will chunk each
+for the full disclaimer). By default, :py:meth:`~xarray.open_mfdataset` will chunk each
 netCDF file into a single Dask array; again, supply the ``chunks`` argument to
 control the size of the resulting Dask arrays. In more complex cases, you can
-open each file individually using ``open_dataset`` and merge the result, as
-described in :ref:`combining data`.
+open each file individually using :py:meth:`~xarray.open_dataset` and merge the result, as
+described in :ref:`combining data`. If you have a distributed cluster running,
+passing the keyword argument ``parallel=True`` to :py:meth:`~xarray.open_mfdataset`
+will speed up the reading of large multi-file datasets by executing those read tasks
+in parallel using ``dask.delayed``.
 
 You'll notice that printing a dataset still shows a preview of array values,
 even if they are actually Dask arrays. We can do this quickly with Dask because
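A minimal sketch of the mechanism that ``parallel=True`` relies on: each per-file read is wrapped in ``dask.delayed`` so the reads can run concurrently. Here ``load_one`` and the file names are hypothetical stand-ins, not xarray API:

```python
import dask


@dask.delayed
def load_one(path):
    # Hypothetical stand-in for reading one netCDF file;
    # pretend every file contributes 10 records.
    return 10


paths = ["file-a.nc", "file-b.nc", "file-c.nc"]
lazy_reads = [load_one(p) for p in paths]  # nothing has been "read" yet
counts = dask.compute(*lazy_reads)         # all three tasks execute in parallel
print(sum(counts))                         # 30
```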
@@ -105,7 +109,7 @@ usual way.
     ds.to_netcdf('manipulated-example-data.nc')
 
 By setting the ``compute`` argument to ``False``, :py:meth:`~xarray.Dataset.to_netcdf`
-will return a Dask delayed object that can be computed later.
+will return a ``dask.delayed`` object that can be computed later.
 
 .. ipython:: python
 
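For instance, a sketch of the deferred-write round trip — assuming a writable temporary directory and that a netCDF backend (netCDF4 or scipy) is installed:

```python
import os
import tempfile

import numpy as np
import xarray as xr

ds = xr.Dataset({"x": ("t", np.arange(6.0))}).chunk({"t": 3})
path = os.path.join(tempfile.mkdtemp(), "example.nc")

delayed_write = ds.to_netcdf(path, compute=False)  # a dask.delayed object
delayed_write.compute()                            # the write happens here

with xr.open_dataset(path) as roundtrip:
    print(roundtrip["x"].values)  # [0. 1. 2. 3. 4. 5.]
```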
@@ -146,7 +150,7 @@ enable label based indexing, xarray will automatically load coordinate labels
 into memory.
 
 The easiest way to convert an xarray data structure from lazy Dask arrays into
-eager, in-memory NumPy arrays is to use the :py:meth:`~xarray.Dataset.load` method:
+*eager*, in-memory NumPy arrays is to use the :py:meth:`~xarray.Dataset.load` method:
 
 .. ipython:: python
 
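The lazy-to-eager conversion can be sketched on a toy dataset like this:

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"y": ("t", np.arange(4.0))}).chunk({"t": 2})
assert not isinstance(ds["y"].data, np.ndarray)  # still a lazy dask array

ds.load()  # computes every variable in place

assert isinstance(ds["y"].data, np.ndarray)      # now plain, eager NumPy
```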
@@ -189,6 +193,7 @@ across your machines and be much faster to use than reading repeatedly from
 disk.
 
 .. warning::
+
     On a single machine :py:meth:`~xarray.Dataset.persist` will try to load all of
     your data into memory. You should make sure that your dataset is not larger than
     available memory.
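The difference from ``load`` can be sketched like this: after ``persist`` the data have been computed and held in memory, but the variables are still wrapped in Dask arrays (shown here on a small in-memory dataset):

```python
import dask.array
import numpy as np
import xarray as xr

ds = xr.Dataset({"z": ("t", np.arange(4.0))}).chunk({"t": 2})
persisted = ds.persist()  # triggers computation; results stay in memory

# Unlike load(), the variables remain dask arrays, so subsequent
# operations still go through the Dask task graph.
assert isinstance(persisted["z"].data, dask.array.Array)
print(float(persisted["z"].sum()))  # 6.0
```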