Commit 4246907
feat(runtimes): add support for launcher resource allocation in MPI jobs (kubeflow#2653)
* feat(runtime): add support for launcher resource allocation in MPI jobs
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
* Add unit tests
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
* Set numProcPerNode for MPI plugin
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
* Move util func to runtime package
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
* Fix torchtune plugin
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
* Inline if for GPU check
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
* Assign container resources once
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
* Add todo for test wrappers
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
---------
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
Co-authored-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>1 parent ca37406 commit 4246907
11 files changed
Lines changed: 2027 additions & 1056 deletions
File tree
- examples
- deepspeed/text-summarization
- mlx
- image-classification
- language-modeling
- pkg
- runtime
- framework
- core
- plugins
- jobset
- mpi
- torch
- util/testing
Lines changed: 475 additions & 411 deletions
Large diffs are not rendered by default.
Large diffs are not rendered by default.
Large diffs are not rendered by default.
Large diffs are not rendered by default.
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
101 | 101 | | |
102 | 102 | | |
103 | 103 | | |
| 104 | + | |
| 105 | + | |
| 106 | + | |
| 107 | + | |
| 108 | + | |
| 109 | + | |
| 110 | + | |
| 111 | + | |
104 | 112 | | |
105 | 113 | | |
106 | 114 | | |
| 115 | + | |
107 | 116 | | |
108 | | - | |
109 | | - | |
| 117 | + | |
| 118 | + | |
110 | 119 | | |
111 | | - | |
| 120 | + | |
112 | 121 | | |
113 | 122 | | |
114 | 123 | | |
115 | | - | |
116 | | - | |
117 | | - | |
118 | | - | |
119 | 124 | | |
120 | 125 | | |
121 | 126 | | |
| |||
130 | 135 | | |
131 | 136 | | |
132 | 137 | | |
| 138 | + | |
| 139 | + | |
| 140 | + | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
| 144 | + | |
| 145 | + | |
| 146 | + | |
| 147 | + | |
133 | 148 | | |
134 | 149 | | |
135 | 150 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
113 | 113 | | |
114 | 114 | | |
115 | 115 | | |
116 | | - | |
117 | | - | |
118 | 116 | | |
119 | 117 | | |
120 | 118 | | |
| |||
125 | 123 | | |
126 | 124 | | |
127 | 125 | | |
| 126 | + | |
| 127 | + | |
| 128 | + | |
| 129 | + | |
| 130 | + | |
| 131 | + | |
| 132 | + | |
| 133 | + | |
| 134 | + | |
128 | 135 | | |
129 | 136 | | |
130 | 137 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
26 | 26 | | |
27 | 27 | | |
28 | 28 | | |
| 29 | + | |
29 | 30 | | |
30 | 31 | | |
31 | 32 | | |
| |||
350 | 351 | | |
351 | 352 | | |
352 | 353 | | |
| 354 | + | |
| 355 | + | |
| 356 | + | |
| 357 | + | |
| 358 | + | |
| 359 | + | |
| 360 | + | |
| 361 | + | |
| 362 | + | |
| 363 | + | |
| 364 | + | |
| 365 | + | |
| 366 | + | |
| 367 | + | |
| 368 | + | |
| 369 | + | |
| 370 | + | |
| 371 | + | |
| 372 | + | |
| 373 | + | |
| 374 | + | |
| 375 | + | |
| 376 | + | |
| 377 | + | |
| 378 | + | |
| 379 | + | |
| 380 | + | |
| 381 | + | |
| 382 | + | |
| 383 | + | |
| 384 | + | |
| 385 | + | |
| 386 | + | |
| 387 | + | |
| 388 | + | |
| 389 | + | |
| 390 | + | |
| 391 | + | |
| 392 | + | |
| 393 | + | |
| 394 | + | |
| 395 | + | |
| 396 | + | |
| 397 | + | |
| 398 | + | |
| 399 | + | |
| 400 | + | |
| 401 | + | |
| 402 | + | |
| 403 | + | |
| 404 | + | |
| 405 | + | |
| 406 | + | |
| 407 | + | |
| 408 | + | |
| 409 | + | |
| 410 | + | |
| 411 | + | |
| 412 | + | |
| 413 | + | |
| 414 | + | |
| 415 | + | |
| 416 | + | |
| 417 | + | |
| 418 | + | |
| 419 | + | |
| 420 | + | |
| 421 | + | |
| 422 | + | |
| 423 | + | |
| 424 | + | |
| 425 | + | |
| 426 | + | |
| 427 | + | |
| 428 | + | |
| 429 | + | |
| 430 | + | |
| 431 | + | |
| 432 | + | |
| 433 | + | |
| 434 | + | |
| 435 | + | |
| 436 | + | |
| 437 | + | |
| 438 | + | |
| 439 | + | |
| 440 | + | |
| 441 | + | |
| 442 | + | |
| 443 | + | |
| 444 | + | |
| 445 | + | |
| 446 | + | |
| 447 | + | |
| 448 | + | |
| 449 | + | |
| 450 | + | |
| 451 | + | |
| 452 | + | |
| 453 | + | |
| 454 | + | |
| 455 | + | |
| 456 | + | |
| 457 | + | |
| 458 | + | |
| 459 | + | |
| 460 | + | |
| 461 | + | |
| 462 | + | |
| 463 | + | |
| 464 | + | |
| 465 | + | |
| 466 | + | |
| 467 | + | |
| 468 | + | |
| 469 | + | |
| 470 | + | |
| 471 | + | |
| 472 | + | |
| 473 | + | |
| 474 | + | |
| 475 | + | |
| 476 | + | |
| 477 | + | |
| 478 | + | |
| 479 | + | |
| 480 | + | |
| 481 | + | |
| 482 | + | |
| 483 | + | |
| 484 | + | |
| 485 | + | |
| 486 | + | |
| 487 | + | |
| 488 | + | |
| 489 | + | |
| 490 | + | |
353 | 491 | | |
354 | 492 | | |
355 | 493 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
20 | 20 | | |
21 | 21 | | |
22 | 22 | | |
23 | | - | |
24 | 23 | | |
25 | 24 | | |
26 | 25 | | |
| |||
30 | 29 | | |
31 | 30 | | |
32 | 31 | | |
33 | | - | |
34 | 32 | | |
35 | 33 | | |
36 | 34 | | |
| |||
113 | 111 | | |
114 | 112 | | |
115 | 113 | | |
116 | | - | |
| 114 | + | |
117 | 115 | | |
118 | 116 | | |
119 | 117 | | |
120 | | - | |
| 118 | + | |
121 | 119 | | |
122 | 120 | | |
123 | 121 | | |
| |||
204 | 202 | | |
205 | 203 | | |
206 | 204 | | |
207 | | - | |
208 | | - | |
209 | | - | |
210 | | - | |
211 | | - | |
212 | | - | |
213 | | - | |
214 | | - | |
215 | | - | |
216 | | - | |
217 | | - | |
218 | | - | |
219 | | - | |
220 | | - | |
221 | | - | |
222 | | - | |
223 | | - | |
224 | | - | |
225 | | - | |
226 | | - | |
227 | | - | |
228 | | - | |
229 | | - | |
230 | | - | |
231 | | - | |
232 | | - | |
233 | | - | |
234 | | - | |
235 | | - | |
236 | | - | |
237 | | - | |
238 | | - | |
239 | | - | |
240 | | - | |
241 | | - | |
242 | | - | |
243 | | - | |
244 | | - | |
245 | | - | |
246 | | - | |
247 | | - | |
248 | | - | |
249 | | - | |
250 | | - | |
251 | | - | |
252 | | - | |
253 | | - | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
51 | 51 | | |
52 | 52 | | |
53 | 53 | | |
54 | | - | |
| 54 | + | |
55 | 55 | | |
56 | 56 | | |
57 | 57 | | |
58 | | - | |
| 58 | + | |
59 | 59 | | |
60 | 60 | | |
61 | 61 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
20 | 20 | | |
21 | 21 | | |
22 | 22 | | |
| 23 | + | |
23 | 24 | | |
24 | 25 | | |
25 | 26 | | |
26 | 27 | | |
27 | 28 | | |
28 | 29 | | |
| 30 | + | |
29 | 31 | | |
30 | 32 | | |
| 33 | + | |
31 | 34 | | |
32 | 35 | | |
33 | 36 | | |
| |||
240 | 243 | | |
241 | 244 | | |
242 | 245 | | |
| 246 | + | |
| 247 | + | |
| 248 | + | |
| 249 | + | |
| 250 | + | |
| 251 | + | |
| 252 | + | |
| 253 | + | |
| 254 | + | |
| 255 | + | |
| 256 | + | |
| 257 | + | |
| 258 | + | |
| 259 | + | |
| 260 | + | |
| 261 | + | |
| 262 | + | |
| 263 | + | |
| 264 | + | |
| 265 | + | |
| 266 | + | |
| 267 | + | |
| 268 | + | |
| 269 | + | |
| 270 | + | |
| 271 | + | |
| 272 | + | |
| 273 | + | |
| 274 | + | |
| 275 | + | |
| 276 | + | |
| 277 | + | |
| 278 | + | |
| 279 | + | |
| 280 | + | |
| 281 | + | |
| 282 | + | |
| 283 | + | |
| 284 | + | |
| 285 | + | |
| 286 | + | |
| 287 | + | |
| 288 | + | |
| 289 | + | |
| 290 | + | |
| 291 | + | |
| 292 | + | |
0 commit comments