Commit 23ddc75

Add examples to docs (#630)
1 parent 8781b5f commit 23ddc75

File tree

15 files changed: +163 −105 lines


docs/configuration.md

Lines changed: 2 additions & 2 deletions

@@ -44,7 +44,7 @@ All arrays in any given computation must share the same `spec` instance.
 A YAML file is a good way to encapsulate the configuration in a single file that lives outside the Python program.
 It's a useful way to package up the settings for running using a particular executor, so it can be reused.
-The Cubed [examples](https://github.com/cubed-dev/cubed/blob/main/examples/README.md) use YAML files for this reason.
+The Cubed [examples](examples/index.md) use YAML files for this reason.

 ```yaml
 spec:

@@ -166,7 +166,7 @@ Note that there is currently no way to set a timeout for the Dask executor.
 | Property | Default | Description |
 |------------------------------|---------|-------------|
 | `retries` | 2 | The number of times to retry a task if it fails. |
-| `timeout` | `None` | Tasks that take longer than the timeout will be automatically killed and retried. Defaults to the timeout specified when [deploying the lithops runtime image](https://lithops-cloud.github.io/docs/source/cli.html#lithops-runtime-deploy-runtime-name). This is 180 seconds in the [examples](https://github.com/cubed-dev/cubed/blob/main/examples/README.md). |
+| `timeout` | `None` | Tasks that take longer than the timeout will be automatically killed and retried. Defaults to the timeout specified when [deploying the lithops runtime image](https://lithops-cloud.github.io/docs/source/cli.html#lithops-runtime-deploy-runtime-name). This is 180 seconds in the [examples](https://github.com/cubed-dev/cubed/blob/main/examples/lithops/aws/README.md). |
 | `use_backups` | | Whether to use backup tasks for mitigating stragglers. Defaults to `True` only if `work_dir` is a filesystem supporting atomic writes (currently a cloud store like S3 or GCS). |
 | `compute_arrays_in_parallel` | `False` | Whether arrays are computed one at a time or in parallel. |
 | Other properties | N/A | Other properties will be passed as keyword arguments to the [`lithops.executors.FunctionExecutor`](https://lithops-cloud.github.io/docs/source/api_futures.html#lithops.executors.FunctionExecutor) constructor. |
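The `retries` behavior in the table above can be sketched generically. This is an editorial illustration only, not Lithops' or Cubed's actual implementation; `run_with_retries` and `flaky` are hypothetical names:

```python
def run_with_retries(task, retries=2):
    """Run `task`, retrying on failure up to `retries` additional times,
    mirroring the `retries` property described above."""
    for attempt in range(retries + 1):
        try:
            return task()
        except Exception:
            if attempt == retries:
                raise


calls = {"n": 0}

def flaky():
    # Fails twice, then succeeds: with retries=2 (three attempts) it completes.
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


print(run_with_retries(flaky))
# → ok  (succeeds on the third attempt)
```

With `retries=2` a task is attempted up to three times in total before the failure is propagated.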

docs/examples/basic-array-ops.md

Lines changed: 85 additions & 0 deletions
@@ -0,0 +1,85 @@

# Basic array operations

The following examples show how to run a few basic Array API operations on Cubed arrays.

## Adding two small arrays

The first example adds two small 4x4 arrays together, and is useful for checking that the runtime is working.

```{eval-rst}
.. literalinclude:: ../../examples/add-asarray.py
```

Paste the code into a file called `add-asarray.py`, or [download](https://github.com/cubed-dev/cubed/blob/main/examples/add-asarray.py) it from GitHub, then run it with:

```shell
python add-asarray.py
```

If successful, it will print a 4x4 array:

```
[[ 2  4  6  8]
 [10 12 14 16]
 [18 20 22 24]
 [26 28 30 32]]
```
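As an aside for readers checking their setup: the expected output above is just element-wise doubling of the values 1–16, which can be verified with plain NumPy. This check is an editorial sketch, not part of the Cubed example itself:

```python
import numpy as np

# The add-asarray.py example adds a 4x4 array of the values 1..16 to itself.
a = np.arange(1, 17).reshape(4, 4)
print(a + a)
# First row is [2 4 6 8], last row is [26 28 30 32].
```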
## Adding two larger arrays

The next example generates two random 20GB arrays and then adds them together.

```{eval-rst}
.. literalinclude:: ../../examples/add-random.py
```

Paste the code into a file called `add-random.py`, or [download](https://github.com/cubed-dev/cubed/blob/main/examples/add-random.py) it from GitHub, then run it with:

```shell
python add-random.py
```

This example demonstrates how we can use callbacks to gather information about the computation.

- `RichProgressBar` shows a progress bar for the computation as it is running.
- `TimelineVisualizationCallback` produces a plot (after the computation has completed) showing the timeline of events in the task lifecycle.
- `HistoryCallback` produces various stats about the computation once it has completed.

The plots and stats are written to a timestamped subdirectory of the `history` directory. You can open the latest plot with

```shell
open $(ls -d history/compute-* | tail -1)/timeline.svg
```

## Matmul

The next example generates two random 5GB arrays and then multiplies them together. This is a more intensive computation than addition, and will take a few minutes to run locally.

```{eval-rst}
.. literalinclude:: ../../examples/matmul-random.py
```

Paste the code into a file called `matmul-random.py`, or [download](https://github.com/cubed-dev/cubed/blob/main/examples/matmul-random.py) it from GitHub, then run it with:

```shell
python matmul-random.py
```

## Trying different executors

You can run these scripts using different executors by setting environment variables to control the Cubed configuration.

For example, this will use the `processes` executor to run the example:

```shell
CUBED_SPEC__EXECUTOR_NAME=processes python add-random.py
```
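The double underscore in `CUBED_SPEC__EXECUTOR_NAME` separates nested configuration keys (`spec` → `executor_name`). A hypothetical sketch of that mapping is below — Cubed actually delegates this to the Donfig library, so this function is illustrative only:

```python
def env_to_nested(prefix, environ):
    """Illustrative mapping of PREFIX_A__B=value environment variables
    to nested config dicts, e.g. CUBED_SPEC__EXECUTOR_NAME=processes
    -> {'spec': {'executor_name': 'processes'}}."""
    config = {}
    for key, value in environ.items():
        if not key.startswith(prefix + "_"):
            continue
        # Strip the prefix, lowercase, and split on double underscores.
        path = key[len(prefix) + 1:].lower().split("__")
        node = config
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = value
    return config


print(env_to_nested("CUBED", {"CUBED_SPEC__EXECUTOR_NAME": "processes"}))
# → {'spec': {'executor_name': 'processes'}}
```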
For cloud executors, it's usually best to put all of the configuration in one YAML file, and set the `CUBED_CONFIG` environment variable to point to it:

```shell
export CUBED_CONFIG=/path/to/lithops/aws/cubed.yaml
python add-random.py
```

You can read more about how [configuration](../configuration.md) works in Cubed in general, and find detailed steps for running on a particular cloud service [here](#cloud-set-up).
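For reference, such a YAML file might look like the following. This is a hedged sketch: the keys mirror `cubed.Spec` arguments used elsewhere in these docs, but the bucket name, memory limit, and runtime options are placeholders, not values taken from the Cubed examples.

```yaml
# Hypothetical cubed.yaml for a Lithops-on-AWS run (all values are placeholders)
spec:
  work_dir: "s3://example-cubed-temp-bucket"
  allowed_mem: "2GB"
  executor_name: "lithops"
  executor_options:
    runtime: "example-cubed-runtime"
    runtime_memory: 2000
```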

docs/examples/how-to-run.md

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@

# How to run

## Local machine

All the examples can be run on your laptop, so you can try them out in a familiar environment before moving to the cloud.
No extra set up is necessary in this case.

(cloud-set-up)=
## Cloud set up

If you want to run using a cloud executor, first read <project:#which-cloud-service>.

Then follow the instructions for your chosen executor runtime from the table below. They assume that you have cloned the Cubed GitHub repository locally, so that you have access to the files needed for setting up the cloud executor.

```shell
git clone https://github.com/cubed-dev/cubed
cd cubed/examples
cd lithops/aws # or whichever executor/cloud combination you are using
```

| Executor | Cloud | Set up instructions |
|-----------|--------|------------------------------------------------|
| Lithops | AWS | [lithops/aws/README.md](https://github.com/cubed-dev/cubed/blob/main/examples/lithops/aws/README.md) |
| | Google | [lithops/gcp/README.md](https://github.com/cubed-dev/cubed/blob/main/examples/lithops/gcp/README.md) |
| Modal | AWS | [modal/aws/README.md](https://github.com/cubed-dev/cubed/blob/main/examples/modal/aws/README.md) |
| | Google | [modal/gcp/README.md](https://github.com/cubed-dev/cubed/blob/main/examples/modal/gcp/README.md) |
| Coiled | AWS | [coiled/aws/README.md](https://github.com/cubed-dev/cubed/blob/main/examples/coiled/aws/README.md) |
| Beam | Google | [dataflow/README.md](https://github.com/cubed-dev/cubed/blob/main/examples/dataflow/README.md) |

docs/examples/index.md

Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@

# Examples

Various examples demonstrating what you can do with Cubed.

```{toctree}
---
maxdepth: 2
---
how-to-run
basic-array-ops
pangeo
```

docs/examples/pangeo.md

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@

# Pangeo

## Notebooks

The following example notebooks demonstrate the use of Cubed with Xarray to tackle some challenging Pangeo workloads:

1. [Pangeo Vorticity Workload](https://github.com/cubed-dev/cubed/blob/main/examples/pangeo-1-vorticity.ipynb)
2. [Pangeo Quadratic Means Workload](https://github.com/cubed-dev/cubed/blob/main/examples/pangeo-2-quadratic-means.ipynb)
3. [Pangeo Transformed Eulerian Mean Workload](https://github.com/cubed-dev/cubed/blob/main/examples/pangeo-3-tem.ipynb)
4. [Pangeo Climatological Anomalies Workload](https://github.com/cubed-dev/cubed/blob/main/examples/pangeo-4-climatological-anomalies.ipynb)

## Running the notebook examples

Before running these notebook examples, you will need to install some additional dependencies (besides Cubed):

`conda install rich pydot flox cubed-xarray`

- `cubed-xarray` is necessary to wrap Cubed arrays as Xarray DataArrays or Datasets.
- `flox` supports efficient groupby operations in Xarray.
- `pydot` allows plotting the Cubed execution plan.
- `rich` shows the progress of array operations within callbacks applied to Cubed plan operations.
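To give a flavor of what the climatological-anomalies workload computes, here is a schematic pure-NumPy version of the groupby pattern. This is an editorial sketch: the notebook itself uses Xarray with Cubed and flox on real datasets, not this code.

```python
import numpy as np

# Toy data: 24 "months" x 3 spatial points; month-of-year is time % 12.
rng = np.random.default_rng(0)
data = rng.standard_normal((24, 3))
month = np.arange(24) % 12

# Climatology: mean over all years for each month-of-year (a groupby-mean).
clim = np.stack([data[month == m].mean(axis=0) for m in range(12)])

# Anomaly: subtract each timestep's monthly climatology.
anom = data - clim[month]

# Grouped anomalies average to ~0 by construction.
print(np.allclose(np.stack([anom[month == m].mean(axis=0) for m in range(12)]), 0))
# → True
```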

docs/index.md

Lines changed: 1 addition & 1 deletion
@@ -31,7 +31,7 @@ Cubed is horizontally scalable and stateless, and can scale to multi-TB datasets
 :caption: For users
 getting-started
 user-guide/index
-Examples <https://github.com/tomwhite/cubed/tree/main/examples/README.md>
+examples/index
 api
 array-api
 configuration

docs/user-guide/diagnostics.md

Lines changed: 1 addition & 1 deletion
@@ -91,7 +91,7 @@ The timeline callback will write a graphic `timeline.svg` to a directory with th
 ```

 ### Examples in use
-See the [examples](https://github.com/cubed-dev/cubed/blob/main/examples/README.md) for more information about how to use them.
+See the [examples](../examples/index.md) for more information about how to use them.

 ## Memray
docs/user-guide/executors.md

Lines changed: 2 additions & 1 deletion
@@ -12,6 +12,7 @@ The `processes` executor also runs on a single machine, and uses all the cores o

 There is a third local executor called `single-threaded` that runs tasks sequentially in a single thread, and is intended for testing on small amounts of data.

+(which-cloud-service)=
 ## Which cloud service executor should I use?

 When it comes to scaling out, there are a number of executors that work in the cloud.

@@ -39,4 +40,4 @@ spec = cubed.Spec(
 )
 ```

-A default spec may also be configured using a YAML file. The [examples](https://github.com/cubed-dev/cubed/blob/main/examples/README.md) show this in more detail for all of the executors described above.
+A default spec may also be configured using a YAML file. The [examples](#cloud-set-up) show this in more detail for all of the executors described above.

examples/README.md

Lines changed: 2 additions & 89 deletions
@@ -1,92 +1,5 @@
 # Examples

-## Running on a local machine
+This directory contains Cubed examples in the form of Python scripts and Jupyter notebooks. There are also instructions for setting up Cubed executors to run on various cloud services.

-The `processes` executor is the recommended executor for running on a single machine, since it can use all the cores on the machine.
-
-## Which cloud service executor should I use?
-
-When it comes to scaling out, there are a number of executors that work in the cloud.
-
-[**Lithops**](https://lithops-cloud.github.io/) is the executor we recommend for most users, since it has had the most testing so far (~1000 workers).
-If your data is in Amazon S3 then use Lithops with AWS Lambda, and if it's in GCS use Lithops with Google Cloud Functions. You have to build a runtime environment as a part of the setting up process.
-
-[**Modal**](https://modal.com/) is very easy to get started with because it handles building a runtime environment for you automatically (note that it requires that you [sign up](https://modal.com/signup) for a free account). **At the time of writing, Modal does not guarantee that functions run in any particular cloud region, so it is not currently recommended that you run large computations since excessive data transfer fees are likely.**
-
-[**Coiled**](https://www.coiled.io/) is also easy to get started with ([sign up](https://cloud.coiled.io/signup)). It uses [Coiled Functions](https://docs.coiled.io/user_guide/usage/functions/index.html) and has a 1-2 minute overhead to start a cluster.
-
-[**Google Cloud Dataflow**](https://cloud.google.com/dataflow) is relatively straightforward to get started with. It has the highest overhead for worker startup (minutes compared to seconds for Modal or Lithops), and although it has only been tested with ~20 workers, it is a mature service and therefore should be reliable for much larger computations.
-
-## Set up
-
-Follow the instructions for setting up Cubed to run on your executor runtime:
-
-| Executor | Cloud | Set up instructions |
-|-----------|--------|------------------------------------------------|
-| Processes | N/A | `pip install 'cubed[diagnostics]'` |
-| Lithops | AWS | [lithops/aws/README.md](lithops/aws/README.md) |
-| | Google | [lithops/gcp/README.md](lithops/gcp/README.md) |
-| Modal | AWS | [modal/aws/README.md](modal/aws/README.md) |
-| | Google | [modal/gcp/README.md](modal/gcp/README.md) |
-| Coiled | AWS | [coiled/aws/README.md](coiled/aws/README.md) |
-| Beam | Google | [dataflow/README.md](dataflow/README.md) |
-
-## Examples
-
-The `add-asarray.py` script is a small example that adds two small 4x4 arrays together, and is useful for checking that the runtime is working.
-Export `CUBED_CONFIG` as described in the set up instructions, then run the script. This is for running on the local machine using the `processes` executor:
-
-```shell
-export CUBED_CONFIG=$(pwd)/processes/cubed.yaml
-python add-asarray.py
-```
-
-This is for Lithops on AWS:
-
-```shell
-export CUBED_CONFIG=$(pwd)/lithops/aws/cubed.yaml
-python add-asarray.py
-```
-
-If successful it should print a 4x4 array.
-
-The other examples are run in a similar way:
-
-```shell
-export CUBED_CONFIG=...
-python add-random.py
-```
-
-and
-
-```shell
-export CUBED_CONFIG=...
-python matmul-random.py
-```
-
-These will take longer to run as they operate on more data.
-
-The last two examples use `TimelineVisualizationCallback` which produce a plot showing the timeline of events in the task lifecycle, and `HistoryCallback` to produce stats about memory usage.
-The plots are SVG files and are written in the `history` directory in a directory with a timestamp. Open the latest one with
-
-```shell
-open $(ls -d history/compute-* | tail -1)/timeline.svg
-```
-
-The memory usage stats are in a CSV file which you can view with
-
-```shell
-open $(ls -d history/compute-* | tail -1)/stats.csv
-```
-
-## Running the notebook examples
-
-Before running these notebook examples, you will need to install some additional dependencies (besides Cubed).
-
-`mamba install rich pydot flox cubed-xarray`
-
-`cubed-xarray` is necessary to wrap Cubed arrays as Xarray DataArrays or Xarray Datasets.
-`flox` is for supporting efficient groupby operations in Xarray.
-`pydot` allows plotting the Cubed execution plan.
-`rich` is for showing progress of array operations within callbacks applied to Cubed plan operations.
+
+See the [documentation](https://cubed-dev.github.io/cubed/examples/index.html) for details.

examples/coiled/aws/README.md

Lines changed: 1 addition & 1 deletion
@@ -25,4 +25,4 @@ Before running the examples, first change to the top-level examples directory (`
 export CUBED_CONFIG=$(pwd)/coiled/aws
 ```

-Then you can run the examples described [there](../../README.md).
+Then you can run the examples in the [docs](https://cubed-dev.github.io/cubed/examples/index.html).

0 commit comments
