Commit a285ef6

Changed min/max workers to single field (#266)
* Changed min/max workers to single field
* Updated to num_workers, num_gpus, and demo update
* Delete comment on 144
* Updated preview nbs
1 parent 415b055 commit a285ef6
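
The rename described in the commit message can be illustrated with a small helper that maps older keyword arguments onto the new single-field names. This is only a sketch: the old names `min_worker`, `max_worker`, and `gpu` are assumptions (the diff shown here contains only the new `num_workers` and `num_gpus`), and `migrate_cluster_kwargs` is a hypothetical function, not part of codeflare-sdk.

```python
def migrate_cluster_kwargs(old_kwargs):
    """Return a copy of old_kwargs that uses num_workers / num_gpus.

    Hypothetical helper: the old key names (min_worker, max_worker, gpu)
    are assumed for illustration and may not match the real SDK history.
    """
    new_kwargs = dict(old_kwargs)  # never mutate the caller's dict

    min_w = new_kwargs.pop("min_worker", None)
    max_w = new_kwargs.pop("max_worker", None)
    if min_w is not None or max_w is not None:
        # A single field cannot represent a range, so refuse ambiguous input.
        if min_w is not None and max_w is not None and min_w != max_w:
            raise ValueError(
                f"min_worker={min_w} and max_worker={max_w} differ; "
                "choose one value for num_workers"
            )
        workers = max_w if max_w is not None else min_w
        # An explicit num_workers already present takes precedence.
        new_kwargs.setdefault("num_workers", workers)

    if "gpu" in new_kwargs:
        gpus = new_kwargs.pop("gpu")
        new_kwargs.setdefault("num_gpus", gpus)

    return new_kwargs


old = {"name": "raytest", "min_worker": 2, "max_worker": 2, "gpu": 0}
print(migrate_cluster_kwargs(old))
```

Raising on a mismatched min/max pair is a design choice for this sketch; silently picking one of the two values would hide a change in how many workers are requested.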

15 files changed, +2154 -112 lines changed
@@ -0,0 +1,202 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"id": "8d4a42f6",
6+
"metadata": {},
7+
"source": [
8+
"In this first notebook, we will go through the basics of using the SDK to:\n",
9+
" - Spin up a Ray cluster with our desired resources\n",
10+
" - View the status and specs of our Ray cluster\n",
11+
" - Take down the Ray cluster when finished"
12+
]
13+
},
14+
{
15+
"cell_type": "code",
16+
"execution_count": null,
17+
"id": "b55bc3ea-4ce3-49bf-bb1f-e209de8ca47a",
18+
"metadata": {},
19+
"outputs": [],
20+
"source": [
21+
"# Import pieces from codeflare-sdk\n",
22+
"from codeflare_sdk.cluster.cluster import Cluster, ClusterConfiguration\n",
23+
"from codeflare_sdk.cluster.auth import TokenAuthentication"
24+
]
25+
},
26+
{
27+
"cell_type": "code",
28+
"execution_count": null,
29+
"id": "614daa0c",
30+
"metadata": {},
31+
"outputs": [],
32+
"source": [
33+
"# Create authentication object for oc user permissions\n",
34+
"auth = TokenAuthentication(\n",
35+
" token = \"XXXXX\",\n",
36+
" server = \"XXXXX\",\n",
37+
" skip_tls=False\n",
38+
")\n",
39+
"auth.login()"
40+
]
41+
},
42+
{
43+
"cell_type": "markdown",
44+
"id": "bc27f84c",
45+
"metadata": {},
46+
"source": [
47+
"Here, we want to define our cluster by specifying the resources we require for our batch workload. Below, we define our cluster object (which generates a corresponding AppWrapper)."
48+
]
49+
},
50+
{
51+
"cell_type": "code",
52+
"execution_count": null,
53+
"id": "0f4bc870-091f-4e11-9642-cba145710159",
54+
"metadata": {},
55+
"outputs": [],
56+
"source": [
57+
"# Create and configure our cluster object (and appwrapper)\n",
58+
"cluster = Cluster(ClusterConfiguration(\n",
59+
" name='raytest',\n",
60+
" namespace='default',\n",
61+
" num_workers=2,\n",
62+
" min_cpus=1,\n",
63+
" max_cpus=1,\n",
64+
" min_memory=4,\n",
65+
" max_memory=4,\n",
66+
" num_gpus=0,\n",
67+
" image=\"quay.io/project-codeflare/ray:2.5.0-py38-cu116\", #current default\n",
68+
" instascale=False\n",
69+
"))"
70+
]
71+
},
72+
{
73+
"cell_type": "markdown",
74+
"id": "12eef53c",
75+
"metadata": {},
76+
"source": [
77+
"Next, we bring our cluster up by calling the `up()` function below, which submits our cluster AppWrapper YAML to the MCAD queue and begins the process of obtaining our resource cluster."
78+
]
79+
},
80+
{
81+
"cell_type": "code",
82+
"execution_count": null,
83+
"id": "f0884bbc-c224-4ca0-98a0-02dfa09c2200",
84+
"metadata": {},
85+
"outputs": [],
86+
"source": [
87+
"# Bring up the cluster\n",
88+
"cluster.up()"
89+
]
90+
},
91+
{
92+
"cell_type": "markdown",
93+
"id": "657ebdfb",
94+
"metadata": {},
95+
"source": [
96+
"Now, we want to check on the status of our resource cluster, and wait until it is finally ready for use."
97+
]
98+
},
99+
{
100+
"cell_type": "code",
101+
"execution_count": null,
102+
"id": "3c1b4311-2e61-44c9-8225-87c2db11363d",
103+
"metadata": {},
104+
"outputs": [],
105+
"source": [
106+
"cluster.status()"
107+
]
108+
},
109+
{
110+
"cell_type": "code",
111+
"execution_count": null,
112+
"id": "a99d5aff",
113+
"metadata": {},
114+
"outputs": [],
115+
"source": [
116+
"cluster.wait_ready()"
117+
]
118+
},
119+
{
120+
"cell_type": "code",
121+
"execution_count": null,
122+
"id": "df71c1ed",
123+
"metadata": {},
124+
"outputs": [],
125+
"source": [
126+
"cluster.status()"
127+
]
128+
},
129+
{
130+
"cell_type": "markdown",
131+
"id": "b3a55fe4",
132+
"metadata": {},
133+
"source": [
134+
"Let's quickly verify that the specs of the cluster are as expected."
135+
]
136+
},
137+
{
138+
"cell_type": "code",
139+
"execution_count": null,
140+
"id": "7fd45bc5-03c0-4ae5-9ec5-dd1c30f1a084",
141+
"metadata": {},
142+
"outputs": [],
143+
"source": [
144+
"cluster.details()"
145+
]
146+
},
147+
{
148+
"cell_type": "markdown",
149+
"id": "5af8cd32",
150+
"metadata": {},
151+
"source": [
152+
"Finally, we bring our resource cluster down and release/terminate the associated resources, bringing everything back to the way it was before our cluster was brought up."
153+
]
154+
},
155+
{
156+
"cell_type": "code",
157+
"execution_count": null,
158+
"id": "5f36db0f-31f6-4373-9503-dc3c1c4c3f57",
159+
"metadata": {},
160+
"outputs": [],
161+
"source": [
162+
"cluster.down()"
163+
]
164+
},
165+
{
166+
"cell_type": "code",
167+
"execution_count": null,
168+
"id": "0d41b90e",
169+
"metadata": {},
170+
"outputs": [],
171+
"source": [
172+
"auth.logout()"
173+
]
174+
}
175+
],
176+
"metadata": {
177+
"kernelspec": {
178+
"display_name": "Python 3 (ipykernel)",
179+
"language": "python",
180+
"name": "python3"
181+
},
182+
"language_info": {
183+
"codemirror_mode": {
184+
"name": "ipython",
185+
"version": 3
186+
},
187+
"file_extension": ".py",
188+
"mimetype": "text/x-python",
189+
"name": "python",
190+
"nbconvert_exporter": "python",
191+
"pygments_lexer": "ipython3",
192+
"version": "3.8.13"
193+
},
194+
"vscode": {
195+
"interpreter": {
196+
"hash": "f9f85f796d01129d0dd105a088854619f454435301f6ffec2fea96ecbd9be4ac"
197+
}
198+
}
199+
},
200+
"nbformat": 4,
201+
"nbformat_minor": 5
202+
}
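
The first notebook checks `cluster.status()` and then blocks on `cluster.wait_ready()`. The general idea of polling until ready, with a timeout, can be sketched as below. `wait_until_ready` and its `is_ready`, `timeout`, and `interval` parameters are hypothetical names for illustration; this is not how codeflare-sdk implements `wait_ready()`.

```python
import time


def wait_until_ready(is_ready, timeout=60.0, interval=5.0):
    """Poll is_ready() until it returns True or the timeout elapses.

    is_ready: zero-argument callable returning a bool (hypothetical; in
              practice it might wrap a status check on the cluster).
    Returns True if the resource became ready, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Usage with a stand-in readiness check that succeeds on the third poll.
states = iter([False, False, True])
print(wait_until_ready(lambda: next(states), timeout=1.0, interval=0.0))
```

Using `time.monotonic()` rather than wall-clock time keeps the timeout unaffected by system clock adjustments.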
@@ -0,0 +1,172 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"id": "9865ee8c",
6+
"metadata": {},
7+
"source": [
8+
"In this second notebook, we will go over the basics of using InstaScale to scale up/down necessary resources that are not currently available on your OpenShift Cluster (in cloud environments)."
9+
]
10+
},
11+
{
12+
"cell_type": "code",
13+
"execution_count": null,
14+
"id": "b55bc3ea-4ce3-49bf-bb1f-e209de8ca47a",
15+
"metadata": {},
16+
"outputs": [],
17+
"source": [
18+
"# Import pieces from codeflare-sdk\n",
19+
"from codeflare_sdk.cluster.cluster import Cluster, ClusterConfiguration\n",
20+
"from codeflare_sdk.cluster.auth import TokenAuthentication"
21+
]
22+
},
23+
{
24+
"cell_type": "code",
25+
"execution_count": null,
26+
"id": "614daa0c",
27+
"metadata": {},
28+
"outputs": [],
29+
"source": [
30+
"# Create authentication object for oc user permissions\n",
31+
"auth = TokenAuthentication(\n",
32+
" token = \"XXXXX\",\n",
33+
" server = \"XXXXX\",\n",
34+
" skip_tls=False\n",
35+
")\n",
36+
"auth.login()"
37+
]
38+
},
39+
{
40+
"cell_type": "markdown",
41+
"id": "bc27f84c",
42+
"metadata": {},
43+
"source": [
44+
"This time, we are working in a cloud environment, and our OpenShift cluster does not have the resources needed for our desired workloads. We will use InstaScale to dynamically scale up guaranteed resources based on our request (these will also automatically scale down when we are finished working):"
45+
]
46+
},
47+
{
48+
"cell_type": "code",
49+
"execution_count": null,
50+
"id": "0f4bc870-091f-4e11-9642-cba145710159",
51+
"metadata": {},
52+
"outputs": [],
53+
"source": [
54+
"# Create and configure our cluster object (and appwrapper)\n",
55+
"cluster = Cluster(ClusterConfiguration(\n",
56+
" name='instascaletest',\n",
57+
" namespace='default',\n",
58+
" num_workers=2,\n",
59+
" min_cpus=2,\n",
60+
" max_cpus=2,\n",
61+
" min_memory=8,\n",
62+
" max_memory=8,\n",
63+
" num_gpus=1,\n",
64+
" instascale=True, # InstaScale now enabled, will scale OCP cluster to guarantee resource request\n",
65+
" machine_types=[\"m5.xlarge\", \"g4dn.xlarge\"] # Head, worker AWS machine types desired\n",
66+
"))"
67+
]
68+
},
69+
{
70+
"cell_type": "markdown",
71+
"id": "12eef53c",
72+
"metadata": {},
73+
"source": [
74+
"Same as last time, we will bring the cluster up, wait for it to be ready, and confirm that the specs are as requested:"
75+
]
76+
},
77+
{
78+
"cell_type": "code",
79+
"execution_count": null,
80+
"id": "f0884bbc-c224-4ca0-98a0-02dfa09c2200",
81+
"metadata": {},
82+
"outputs": [],
83+
"source": [
84+
"# Bring up the cluster\n",
85+
"cluster.up()\n",
86+
"cluster.wait_ready()"
87+
]
88+
},
89+
{
90+
"cell_type": "markdown",
91+
"id": "6abfe904",
92+
"metadata": {},
93+
"source": [
94+
"While the resources are being scaled, we can also go into the console and take a look at the InstaScale logs, as well as the new machines/nodes spinning up.\n",
95+
"\n",
96+
"Once the cluster is ready, we can confirm the specs:"
97+
]
98+
},
99+
{
100+
"cell_type": "code",
101+
"execution_count": null,
102+
"id": "7fd45bc5-03c0-4ae5-9ec5-dd1c30f1a084",
103+
"metadata": {},
104+
"outputs": [],
105+
"source": [
106+
"cluster.details()"
107+
]
108+
},
109+
{
110+
"cell_type": "markdown",
111+
"id": "5af8cd32",
112+
"metadata": {},
113+
"source": [
114+
"Finally, we bring our resource cluster down and release/terminate the associated resources, bringing everything back to the way it was before our cluster was brought up."
115+
]
116+
},
117+
{
118+
"cell_type": "code",
119+
"execution_count": null,
120+
"id": "5f36db0f-31f6-4373-9503-dc3c1c4c3f57",
121+
"metadata": {},
122+
"outputs": [],
123+
"source": [
124+
"cluster.down()"
125+
]
126+
},
127+
{
128+
"cell_type": "markdown",
129+
"id": "c883caea",
130+
"metadata": {},
131+
"source": [
132+
"Once again, we can look at the machines/nodes and see that everything has been successfully scaled down!"
133+
]
134+
},
135+
{
136+
"cell_type": "code",
137+
"execution_count": null,
138+
"id": "0d41b90e",
139+
"metadata": {},
140+
"outputs": [],
141+
"source": [
142+
"auth.logout()"
143+
]
144+
}
145+
],
146+
"metadata": {
147+
"kernelspec": {
148+
"display_name": "Python 3 (ipykernel)",
149+
"language": "python",
150+
"name": "python3"
151+
},
152+
"language_info": {
153+
"codemirror_mode": {
154+
"name": "ipython",
155+
"version": 3
156+
},
157+
"file_extension": ".py",
158+
"mimetype": "text/x-python",
159+
"name": "python",
160+
"nbconvert_exporter": "python",
161+
"pygments_lexer": "ipython3",
162+
"version": "3.8.13"
163+
},
164+
"vscode": {
165+
"interpreter": {
166+
"hash": "f9f85f796d01129d0dd105a088854619f454435301f6ffec2fea96ecbd9be4ac"
167+
}
168+
}
169+
},
170+
"nbformat": 4,
171+
"nbformat_minor": 5
172+
}
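
Both notebooks follow the same lifecycle: `up()`, `wait_ready()`, some work, then `down()`. A minimal sketch of wrapping that sequence in a context manager, so that `down()` still runs if a step in between raises, might look like the following. `managed_cluster` and `FakeCluster` are hypothetical names used only for illustration and are not codeflare-sdk APIs; the only assumption about the cluster object is that it exposes the `up()`, `wait_ready()`, and `down()` methods used in the notebooks.

```python
from contextlib import contextmanager


@contextmanager
def managed_cluster(cluster):
    """Bring a cluster up, yield it once ready, and always bring it down."""
    cluster.up()
    try:
        cluster.wait_ready()
        yield cluster
    finally:
        # Runs on normal exit and on exceptions, releasing the resources.
        cluster.down()


class FakeCluster:
    """Stand-in that records calls instead of talking to OpenShift."""

    def __init__(self):
        self.calls = []

    def up(self):
        self.calls.append("up")

    def wait_ready(self):
        self.calls.append("wait_ready")

    def details(self):
        self.calls.append("details")

    def down(self):
        self.calls.append("down")


def demo():
    cluster = FakeCluster()
    with managed_cluster(cluster) as ready_cluster:
        ready_cluster.details()
    return cluster.calls


print(demo())
```

With a real `Cluster` object in place of `FakeCluster`, the same pattern would avoid leaving scaled-up machines running after a failed cell, which matters most when InstaScale has provisioned cloud nodes.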
