Solutions to the latest Tutorial Exercises #45 #55

Open
wants to merge 11 commits into
base: master
from
9 changes: 9 additions & 0 deletions exercises/solutions/.gitignore
@@ -0,0 +1,9 @@
# Ignore .html and .json files generated by Ray's UI
*.html
*.json

# Ignore ipynb checkpoints
.ipynb_checkpoints

# Ignore temporary files
.*~
327 changes: 327 additions & 0 deletions exercises/solutions/exercise01.ipynb
@@ -0,0 +1,327 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Exercise 1 - Simple Data Parallel Example\n",
"\n",
"**GOAL:** The goal of this exercise is to show how to run simple tasks in parallel.\n",
"\n",
"This script is too slow, and the computation is embarrassingly parallel. In this exercise, you will use Ray to execute the functions in parallel to speed it up.\n",
"\n",
"### Concept for this Exercise - Remote Functions\n",
"\n",
"The standard way to turn a Python function into a remote function is to add the `@ray.remote` decorator. Here is an example.\n",
"\n",
"```python\n",
"# A regular Python function.\n",
"def regular_function():\n",
" return 1\n",
"\n",
"# A Ray remote function.\n",
"@ray.remote\n",
"def remote_function():\n",
" return 1\n",
"```\n",
"\n",
"The differences are the following:\n",
"\n",
"1. **Invocation:** The regular version is called with `regular_function()`, whereas the remote version is called with `remote_function.remote()`.\n",
"2. **Return values:** `regular_function` immediately executes and returns `1`, whereas `remote_function` immediately returns an object ID (a future) and then creates a task that will be executed on a worker process. The result can be obtained with `ray.get`.\n",
" ```python\n",
" >>> regular_function()\n",
" 1\n",
" \n",
" >>> remote_function.remote()\n",
" ObjectID(1c80d6937802cd7786ad25e50caf2f023c95e350)\n",
" \n",
" >>> ray.get(remote_function.remote())\n",
" 1\n",
" ```\n",
"3. **Parallelism:** Invocations of `regular_function` happen **serially**, for example\n",
" ```python\n",
" # These happen serially.\n",
" for _ in range(4):\n",
" regular_function()\n",
" ```\n",
" whereas invocations of `remote_function` happen in **parallel**, for example\n",
" ```python\n",
" # These happen in parallel.\n",
" for _ in range(4):\n",
" remote_function.remote()\n",
" ```"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from __future__ import absolute_import\n",
"from __future__ import division\n",
"from __future__ import print_function\n",
"\n",
"import ray\n",
"import time"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Start Ray. By default, Ray does not schedule more tasks concurrently than there are CPUs. This example requires four tasks to run concurrently, so we tell Ray that there are four CPUs. Usually this is not done and Ray computes the number of CPUs using `psutil.cpu_count()`. The argument `redirect_output=True` just suppresses some logging.\n",
"\n",
"The call to `ray.init` starts a number of processes."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Waiting for redis server at 127.0.0.1:49263 to respond...\n",
"Waiting for redis server at 127.0.0.1:61954 to respond...\n",
"Starting local scheduler with 4 CPUs, 0 GPUs\n",
"\n",
"======================================================================\n",
"View the web UI at http://localhost:8893/notebooks/ray_ui84549.ipynb?token=33d5b878725f4d8f053fe2bdbfac3b2a22fd0a4ccbb3aa20\n",
"======================================================================\n",
"\n"
]
},
{
"data": {
"text/plain": [
"{'local_scheduler_socket_names': ['/tmp/scheduler21215428'],\n",
" 'node_ip_address': '127.0.0.1',\n",
" 'object_store_addresses': [ObjectStoreAddress(name='/tmp/plasma_store93576587', manager_name='/tmp/plasma_manager5822887', manager_port=40447)],\n",
" 'redis_address': '127.0.0.1:49263',\n",
" 'webui_url': 'http://localhost:8893/notebooks/ray_ui84549.ipynb?token=33d5b878725f4d8f053fe2bdbfac3b2a22fd0a4ccbb3aa20'}"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"ray.init(num_cpus=4, redirect_output=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**EXERCISE:** The function below is slow. Turn it into a remote function using the `@ray.remote` decorator."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# This function is a proxy for a more interesting and computationally\n",
"# intensive function.\n",
"def slow_function(i):\n",
" time.sleep(1)\n",
" return i\n",
"@ray.remote\n",
"def remote_function(i):\n",
" time.sleep(1)\n",
" return i"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**EXERCISE:** The loop below takes too long. The four function calls could be executed in parallel. Instead of four seconds, it should only take one second. Once `slow_function` has been made a remote function, execute these four tasks in parallel by calling `slow_function.remote()`. Then obtain the results by calling `ray.get` on a list of the resulting object IDs."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Sleep a little to improve the accuracy of the timing measurements below.\n",
"# We do this because workers may still be starting up in the background.\n",
"time.sleep(4.0) # Increased the sleep time as 2.0 was small at times\n",
"start_time = time.time()\n",
"\n",
"results = []\n",
"for i in range(4):\n",
" results.append(remote_function.remote(i))\n",
"results = ray.get(results)\n",
"end_time = time.time()\n",
"duration = end_time - start_time"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**VERIFY:** Run some checks to verify that the changes you made to the code were correct. Some of the checks should fail when you initially run the cells. After completing the exercises, the checks should pass."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Success! The example took 1.0059635639190674 seconds.\n"
]
}
],
"source": [
"assert results == [0, 1, 2, 3], 'Did you remember to call ray.get?'\n",
"assert duration < 1.1, ('The loop took {} seconds. This is too slow.'\n",
" .format(duration))\n",
"assert duration > 1, ('The loop took {} seconds. This is too fast.'\n",
" .format(duration))\n",
"\n",
"print('Success! The example took {} seconds.'.format(duration))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**EXERCISE:** Use the UI to view the task timeline and to verify that the four tasks were executed in parallel. After running the cell below, you'll need to click on **View task timeline**\".\n",
"- Using the **second** button, you can click and drag to **move** the timeline.\n",
"- Using the **third** button, you can click and drag to **zoom**. You can also zoom by holding \"alt\" and scrolling.\n",
"\n",
"**NOTE:** Normally our UI is used as a separate Jupyter notebook. However, for simplicity we embedded the relevant feature here in this notebook.\n",
"\n",
"**NOTE:** The first time you click **View task timeline** it may take **several minutes** to start up. This will change.\n",
"\n",
"**NOTE:** If you run more tasks and want to regenerate the UI, you need to move the slider bar a little bit and then click **View task timeline** again.\n",
"\n",
"**NOTE:** The timeline visualization may only work in **Chrome**."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"To view fullscreen, open chrome://tracing in Google Chrome and load `/tmp/tmpxo6zov96.json`\n"
]
},
{
"data": {
"text/html": [
"\n",
" <iframe\n",
" width=\"900\"\n",
" height=\"800\"\n",
" src=\"tmp_ycklcbh.html\"\n",
" frameborder=\"0\"\n",
" allowfullscreen\n",
" ></iframe>\n",
" "
],
"text/plain": [
"<IPython.lib.display.IFrame at 0x7fac883b59e8>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import ray.experimental.ui as ui\n",
"ui.task_timeline()"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.0"
},
"widgets": {
"state": {
"18e4f7a4f0d14019b433838f43a21dde": {
"views": [
{
"cell_index": 11
}
]
},
"5629e28d1420403fbe81f0a88fbdeab4": {
"views": [
{
"cell_index": 11
}
]
},
"5a1628570ea7498580bec45dd430f87c": {
"views": [
{
"cell_index": 11
}
]
},
"84312b77b729401b9c6e82c457a8385c": {
"views": [
{
"cell_index": 11
}
]
},
"fcf4d4d5865342bfb66b8efb46f8d9b9": {
"views": [
{
"cell_index": 11
}
]
}
},
"version": "1.2.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
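
For readers who want to try the pattern without a Ray installation, the submit-then-gather structure of the solution above (launch four tasks, keep the returned futures, then block on all results) can be sketched with the standard library. This is only an analogy and not Ray's API: `concurrent.futures` threads stand in for Ray workers, `pool.submit` for `.remote()`, and `Future.result()` for `ray.get`. The helper name `run_in_parallel` and the shortened 0.2-second sleep are illustrative assumptions, not part of this PR.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_function(i):
    # Stand-in for a slow computation (shortened from the notebook's 1 second).
    time.sleep(0.2)
    return i


def run_in_parallel(n_tasks=4):
    # Hypothetical helper: one worker thread per task, so all tasks can
    # sleep at the same time.
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        start_time = time.time()
        # Analogous to remote_function.remote(i): returns a future immediately.
        futures = [pool.submit(slow_function, i) for i in range(n_tasks)]
        # Analogous to ray.get(futures): blocks until every result is ready.
        results = [f.result() for f in futures]
        duration = time.time() - start_time
    return results, duration


results, duration = run_in_parallel()
print(results, "in {:.2f} seconds".format(duration))
```

Threads are enough here only because the work is a sleep. Ray executes tasks on separate worker processes, as the notebook describes, so the same structure also applies to CPU-bound functions.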