doc/dask.rst

Parallel computing with Dask
============================
xarray integrates with `Dask <http://dask.pydata.org/>`__ to support parallel
computations and streaming computation on datasets that don't fit into memory.
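As a quick illustration of what this integration looks like, the following sketch (with made-up variable names and chunk sizes) converts an in-memory Dataset into a Dask-backed one and runs a lazy reduction:

```python
import numpy as np
import xarray as xr

# A small in-memory Dataset standing in for data loaded from disk.
ds = xr.Dataset(
    {"temperature": (("time", "lat", "lon"), np.random.rand(10, 4, 4))}
)

# Chunking converts the underlying arrays to Dask arrays; subsequent
# operations build a lazy task graph instead of computing eagerly.
chunked = ds.chunk({"time": 5})

# Nothing is computed until .compute() (or .load()) is called.
mean = chunked["temperature"].mean("time").compute()
```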
Currently, Dask is an entirely optional feature for xarray. However, the
benefits of using Dask are sufficiently strong that Dask may become a required
dependency in a future version of xarray.
For a full example of how to use xarray's Dask integration, read the
`blog post introducing xarray and Dask`_. More up-to-date examples
may be found at the `Pangeo project's use-cases <http://pangeo.io/use_cases/index.html>`_.
.. _blog post introducing xarray and Dask: http://stephanhoyer.com/2015/06/11/xray-dask-out-of-core-labeled-arrays/
With analysis pipelines involving both spatial subsetting and temporal resampling, the following tips can help:
2. Save intermediate results to disk as netCDF files (using ``to_netcdf()``) and then load them again with ``open_dataset()`` for further computations. For example, if subtracting the temporal mean from a dataset, save the temporal mean to disk before subtracting. Again, in theory, Dask should be able to do the computation in a streaming fashion, but in practice this is a fail case for the Dask scheduler, because it tries to keep every chunk of an array that it computes in memory. (See `Dask issue #874 <https://github.com/dask/dask/issues/874>`_.)
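A minimal sketch of this save-then-reload pattern, assuming xarray with Dask support, a writable working directory, and an illustrative file name ``t_mean.nc``:

```python
import numpy as np
import xarray as xr

# A small chunked Dataset standing in for a large on-disk dataset.
ds = xr.Dataset(
    {"t": (("time", "x"), np.random.rand(8, 3))}
).chunk({"time": 4})

# Compute the temporal mean and persist it to disk first ...
ds["t"].mean("time").to_dataset(name="t_mean").to_netcdf("t_mean.nc")

# ... then reload the (now small, fully computed) mean and subtract it,
# so the scheduler never has to hold the whole reduction graph in memory.
t_mean = xr.open_dataset("t_mean.nc")["t_mean"]
anomaly = ds["t"] - t_mean
```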
3. Specify smaller chunks across space when using :py:meth:`~xarray.open_mfdataset` (e.g., ``chunks={'latitude': 10, 'longitude': 10}``). This makes spatial subsetting easier, because there's no risk you will load chunks of data referring to different chunks (probably not necessary if you follow suggestion 1).
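A sketch of this suggestion, writing two small illustrative files and combining them with small spatial chunks (the file names, sizes, and chunk sizes are made up):

```python
import numpy as np
import xarray as xr

# Two small files standing in for a multi-file dataset on disk.
for i, name in enumerate(["part0.nc", "part1.nc"]):
    xr.Dataset(
        {"t": (("time", "latitude", "longitude"), np.random.rand(2, 20, 20))},
        coords={"time": [2 * i, 2 * i + 1]},
    ).to_netcdf(name)

# Small chunks across space keep spatial subsetting cheap: selecting a
# lat/lon window only touches the chunks that overlap it.
ds = xr.open_mfdataset(
    ["part0.nc", "part1.nc"],
    combine="by_coords",
    chunks={"latitude": 10, "longitude": 10},
)
subset = ds["t"].isel(latitude=slice(0, 10), longitude=slice(0, 10))
```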
4. Using the h5netcdf package (by passing ``engine='h5netcdf'`` to :py:meth:`~xarray.open_mfdataset`) can be quicker than the default ``engine='netcdf4'``, which uses the netCDF4 package.
5. Some Dask-specific tips may be found in the Dask `array best practices <https://docs.dask.org/en/latest/array-best-practices.html>`_ documentation.
6. The Dask `diagnostics <https://docs.dask.org/en/latest/understanding-performance.html>`_ can be useful in identifying performance bottlenecks.
7. Installing the optional `bottleneck <https://github.com/kwgoodman/bottleneck>`_ library will result in greatly reduced memory usage when using :py:meth:`~xarray.Dataset.rolling`.
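A small sketch of a moving-window computation of the kind that benefits from bottleneck when it is installed (the array contents and window size are made up):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.random.rand(100), dims="time")

# rolling(...).mean() dispatches to bottleneck's low-memory moving-window
# routines automatically when the bottleneck package is installed.
smoothed = da.rolling(time=10, center=True).mean()
```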