
Commit 1bbf8bf

Little more dask.
1 parent 051f6ba commit 1bbf8bf

File tree

1 file changed (+13, -8)

1 file changed

+13
-8
lines changed

doc/dask.rst

Lines changed: 13 additions & 8 deletions
@@ -37,13 +37,14 @@ which allows Dask to take full advantage of multiple processors available on
 most modern computers.

 For more details on Dask, read `its documentation <http://dask.pydata.org/>`__.
+Note that xarray only makes use of ``dask.array`` and ``dask.delayed``.

 .. _dask.io:

 Reading and writing data
 ------------------------

-The usual way to create a dataset filled with Dask arrays is to load the
+The usual way to create a ``Dataset`` filled with Dask arrays is to load the
 data from a netCDF file or files. You can do this by supplying a ``chunks``
 argument to :py:func:`~xarray.open_dataset` or using the
 :py:func:`~xarray.open_mfdataset` function.
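
A minimal sketch of the usage described above, assuming a hypothetical file ``example-data.nc`` with a ``time`` dimension::

    import xarray as xr

    # Supplying ``chunks`` makes each variable load lazily as a Dask array,
    # split into blocks of 10 along the ``time`` dimension.
    ds = xr.open_dataset('example-data.nc', chunks={'time': 10})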
@@ -71,8 +72,8 @@ argument to :py:func:`~xarray.open_dataset` or using the
 
 In this example ``latitude`` and ``longitude`` do not appear in the ``chunks``
 dict, so only one chunk will be used along those dimensions. It is also
-entirely equivalent to opening a dataset using ``open_dataset`` and then
-chunking the data using the ``chunk`` method, e.g.,
+entirely equivalent to opening a dataset using :py:meth:`~xarray.open_dataset`
+and then chunking the data using the ``chunk`` method, e.g.,
 ``xr.open_dataset('example-data.nc').chunk({'time': 10})``.

 To open multiple files simultaneously, use :py:func:`~xarray.open_mfdataset`::
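
The equivalence mentioned above, spelled out as a sketch (same hypothetical file)::

    import xarray as xr

    # Chunk while opening ...
    ds1 = xr.open_dataset('example-data.nc', chunks={'time': 10})

    # ... or open first and chunk afterwards; both produce a Dataset
    # backed by Dask arrays with the same blocked layout.
    ds2 = xr.open_dataset('example-data.nc').chunk({'time': 10})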
@@ -81,11 +82,14 @@ To open multiple files simultaneously, use :py:func:`~xarray.open_mfdataset`::
 
 This function will automatically concatenate and merge datasets into one in
 the simple cases that it understands (see :py:func:`~xarray.auto_combine`
-for the full disclaimer). By default, ``open_mfdataset`` will chunk each
+for the full disclaimer). By default, :py:meth:`~xarray.open_mfdataset` will chunk each
 netCDF file into a single Dask array; again, supply the ``chunks`` argument to
 control the size of the resulting Dask arrays. In more complex cases, you can
-open each file individually using ``open_dataset`` and merge the result, as
-described in :ref:`combining data`.
+open each file individually using :py:meth:`~xarray.open_dataset` and merge the result, as
+described in :ref:`combining data`. If you have a distributed cluster running,
+passing the keyword argument ``parallel=True`` to :py:meth:`~xarray.open_mfdataset`
+will speed up the reading of large multi-file datasets by executing those read tasks
+in parallel using ``dask.delayed``.

 You'll notice that printing a dataset still shows a preview of array values,
 even if they are actually Dask arrays. We can do this quickly with Dask because
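
A sketch of the ``parallel=True`` usage introduced here; the glob pattern and the local cluster are assumptions::

    import xarray as xr
    from dask.distributed import Client

    client = Client()  # start or connect to a distributed scheduler

    # Each per-file open is wrapped in ``dask.delayed``, so the files are
    # read in parallel before being combined into one Dataset.
    ds = xr.open_mfdataset('my/files/*.nc', parallel=True)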
@@ -105,7 +109,7 @@ usual way.
     ds.to_netcdf('manipulated-example-data.nc')

 By setting the ``compute`` argument to ``False``, :py:meth:`~xarray.Dataset.to_netcdf`
-will return a Dask delayed object that can be computed later.
+will return a ``dask.delayed`` object that can be computed later.

 .. ipython:: python

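
A minimal sketch of the delayed write, assuming the hypothetical dataset from above::

    import xarray as xr

    ds = xr.open_dataset('example-data.nc', chunks={'time': 10})

    # With ``compute=False`` nothing is written yet; ``to_netcdf`` returns
    # a ``dask.delayed`` object that performs the write when computed.
    delayed_write = ds.to_netcdf('manipulated-example-data.nc', compute=False)
    delayed_write.compute()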
@@ -146,7 +150,7 @@ enable label based indexing, xarray will automatically load coordinate labels
 into memory.

 The easiest way to convert an xarray data structure from lazy Dask arrays into
-eager, in-memory NumPy arrays is to use the :py:meth:`~xarray.Dataset.load` method:
+*eager*, in-memory NumPy arrays is to use the :py:meth:`~xarray.Dataset.load` method:

 .. ipython:: python

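
For instance, continuing with the hypothetical dataset from above::

    # ``load`` computes every Dask array in place, replacing each with an
    # in-memory NumPy array, and returns the Dataset for chaining.
    ds.load()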
@@ -189,6 +193,7 @@ across your machines and be much faster to use than reading repeatedly from
 disk.

 .. warning::
+
     On a single machine :py:meth:`~xarray.Dataset.persist` will try to load all of
     your data into memory. You should make sure that your dataset is not larger than
     available memory.
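
A short sketch of ``persist``, assuming a distributed client as above::

    # ``persist`` starts computing in the background and returns a new
    # Dataset whose Dask arrays point at the in-memory results.
    ds = ds.persist()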
