Commit b9cc7e8
authored
[Trainer][Megatron] Sequence packing + Context Parallel for Megatron (#274)
# Overview
Adds sequence packing + context parallel support for megatron backend.
Note that context parallel without sequence packing is not supported.
## Correctness Check
### CP + TP + PP
<img width="368" height="275" alt="image"
src="https://github.com/user-attachments/assets/53fdd009-3af9-4352-8e63-7604b2dfdeee"
/>
### Just Sequence Packing
<img width="366" height="278" alt="image"
src="https://github.com/user-attachments/assets/9a40dfdf-af8c-44e8-bc54-78e13d187daa"
/>
### Just CP + Sequence Packing
<img width="364" height="281" alt="image"
src="https://github.com/user-attachments/assets/c69522e8-52b1-4581-8a66-a579b29bbb0d"
/>
### Timing
Adding CP is slower as expected, adding just sequence packing is also
slightly slower for tp=2,pp=2.
<img width="362" height="286" alt="image"
src="https://github.com/user-attachments/assets/9109ce98-0740-46ce-8a92-de5cd8cf2ec2"
/>
This seems to be because of overhead in computing rotary positional embeddings - without sequence packing, it's a batched call for a well formed tensor, while without sequence packing, it iterates over sequences one by one: #274 (comment)1 parent c2bfe61 commit b9cc7e8
File tree
5 files changed
+282
-58
lines changed- skyrl-train
- examples/training_backends/megatron
- skyrl_train
- distributed/megatron
- utils
- workers/megatron
- tests/gpu
5 files changed
+282
-58
lines changedLines changed: 5 additions & 2 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
15 | 15 | | |
16 | 16 | | |
17 | 17 | | |
| 18 | + | |
18 | 19 | | |
19 | 20 | | |
20 | 21 | | |
| |||
29 | 30 | | |
30 | 31 | | |
31 | 32 | | |
| 33 | + | |
32 | 34 | | |
| 35 | + | |
33 | 36 | | |
34 | | - | |
| 37 | + | |
35 | 38 | | |
36 | 39 | | |
37 | 40 | | |
| |||
56 | 59 | | |
57 | 60 | | |
58 | 61 | | |
59 | | - | |
| 62 | + | |
60 | 63 | | |
61 | 64 | | |
62 | 65 | | |
Lines changed: 144 additions & 0 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | 1 | | |
2 | 2 | | |
| 3 | + | |
3 | 4 | | |
4 | 5 | | |
5 | 6 | | |
| |||
27 | 28 | | |
28 | 29 | | |
29 | 30 | | |
| 31 | + | |
30 | 32 | | |
31 | 33 | | |
32 | 34 | | |
| |||
291 | 293 | | |
292 | 294 | | |
293 | 295 | | |
| 296 | + | |
| 297 | + | |
| 298 | + | |
| 299 | + | |
| 300 | + | |
| 301 | + | |
| 302 | + | |
| 303 | + | |
| 304 | + | |
| 305 | + | |
| 306 | + | |
| 307 | + | |
| 308 | + | |
| 309 | + | |
| 310 | + | |
| 311 | + | |
| 312 | + | |
| 313 | + | |
| 314 | + | |
| 315 | + | |
| 316 | + | |
| 317 | + | |
| 318 | + | |
| 319 | + | |
| 320 | + | |
| 321 | + | |
| 322 | + | |
| 323 | + | |
| 324 | + | |
| 325 | + | |
| 326 | + | |
| 327 | + | |
| 328 | + | |
| 329 | + | |
| 330 | + | |
| 331 | + | |
| 332 | + | |
| 333 | + | |
| 334 | + | |
| 335 | + | |
| 336 | + | |
| 337 | + | |
| 338 | + | |
| 339 | + | |
| 340 | + | |
| 341 | + | |
| 342 | + | |
| 343 | + | |
| 344 | + | |
| 345 | + | |
| 346 | + | |
| 347 | + | |
| 348 | + | |
| 349 | + | |
| 350 | + | |
| 351 | + | |
| 352 | + | |
| 353 | + | |
| 354 | + | |
| 355 | + | |
| 356 | + | |
| 357 | + | |
| 358 | + | |
| 359 | + | |
| 360 | + | |
| 361 | + | |
| 362 | + | |
| 363 | + | |
| 364 | + | |
| 365 | + | |
| 366 | + | |
| 367 | + | |
| 368 | + | |
| 369 | + | |
| 370 | + | |
| 371 | + | |
| 372 | + | |
| 373 | + | |
| 374 | + | |
| 375 | + | |
| 376 | + | |
| 377 | + | |
| 378 | + | |
| 379 | + | |
| 380 | + | |
| 381 | + | |
| 382 | + | |
| 383 | + | |
| 384 | + | |
| 385 | + | |
| 386 | + | |
| 387 | + | |
| 388 | + | |
| 389 | + | |
| 390 | + | |
| 391 | + | |
| 392 | + | |
| 393 | + | |
| 394 | + | |
| 395 | + | |
| 396 | + | |
| 397 | + | |
| 398 | + | |
| 399 | + | |
| 400 | + | |
| 401 | + | |
| 402 | + | |
| 403 | + | |
| 404 | + | |
| 405 | + | |
| 406 | + | |
| 407 | + | |
| 408 | + | |
| 409 | + | |
| 410 | + | |
| 411 | + | |
| 412 | + | |
| 413 | + | |
| 414 | + | |
| 415 | + | |
| 416 | + | |
| 417 | + | |
| 418 | + | |
| 419 | + | |
| 420 | + | |
| 421 | + | |
| 422 | + | |
| 423 | + | |
| 424 | + | |
| 425 | + | |
| 426 | + | |
| 427 | + | |
| 428 | + | |
| 429 | + | |
| 430 | + | |
| 431 | + | |
| 432 | + | |
| 433 | + | |
| 434 | + | |
| 435 | + | |
| 436 | + | |
| 437 | + | |
294 | 438 | | |
295 | 439 | | |
296 | 440 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
120 | 120 | | |
121 | 121 | | |
122 | 122 | | |
123 | | - | |
| 123 | + | |
| 124 | + | |
| 125 | + | |
| 126 | + | |
| 127 | + | |
| 128 | + | |
| 129 | + | |
124 | 130 | | |
125 | 131 | | |
126 | 132 | | |
127 | | - | |
128 | | - | |
129 | | - | |
130 | | - | |
| 133 | + | |
| 134 | + | |
| 135 | + | |
131 | 136 | | |
132 | 137 | | |
133 | 138 | | |
| |||
Lines changed: 75 additions & 28 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
13 | 13 | | |
14 | 14 | | |
15 | 15 | | |
| 16 | + | |
| 17 | + | |
16 | 18 | | |
17 | 19 | | |
18 | 20 | | |
| |||
34 | 36 | | |
35 | 37 | | |
36 | 38 | | |
| 39 | + | |
37 | 40 | | |
38 | 41 | | |
39 | 42 | | |
| |||
86 | 89 | | |
87 | 90 | | |
88 | 91 | | |
| 92 | + | |
89 | 93 | | |
90 | 94 | | |
91 | 95 | | |
| |||
96 | 100 | | |
97 | 101 | | |
98 | 102 | | |
99 | | - | |
100 | | - | |
101 | | - | |
102 | | - | |
103 | | - | |
104 | | - | |
105 | | - | |
| 103 | + | |
| 104 | + | |
| 105 | + | |
| 106 | + | |
| 107 | + | |
| 108 | + | |
| 109 | + | |
| 110 | + | |
| 111 | + | |
| 112 | + | |
| 113 | + | |
| 114 | + | |
| 115 | + | |
| 116 | + | |
| 117 | + | |
| 118 | + | |
| 119 | + | |
106 | 120 | | |
107 | 121 | | |
108 | 122 | | |
109 | 123 | | |
110 | 124 | | |
| 125 | + | |
111 | 126 | | |
112 | 127 | | |
113 | | - | |
114 | | - | |
115 | | - | |
116 | | - | |
117 | | - | |
118 | | - | |
119 | | - | |
| 128 | + | |
| 129 | + | |
| 130 | + | |
| 131 | + | |
| 132 | + | |
| 133 | + | |
| 134 | + | |
| 135 | + | |
| 136 | + | |
| 137 | + | |
| 138 | + | |
| 139 | + | |
| 140 | + | |
| 141 | + | |
| 142 | + | |
| 143 | + | |
| 144 | + | |
120 | 145 | | |
121 | 146 | | |
122 | 147 | | |
| |||
192 | 217 | | |
193 | 218 | | |
194 | 219 | | |
| 220 | + | |
195 | 221 | | |
196 | 222 | | |
197 | 223 | | |
| |||
240 | 266 | | |
241 | 267 | | |
242 | 268 | | |
243 | | - | |
244 | | - | |
245 | | - | |
246 | | - | |
247 | | - | |
248 | | - | |
249 | | - | |
| 269 | + | |
| 270 | + | |
| 271 | + | |
| 272 | + | |
| 273 | + | |
| 274 | + | |
| 275 | + | |
| 276 | + | |
| 277 | + | |
| 278 | + | |
| 279 | + | |
| 280 | + | |
| 281 | + | |
| 282 | + | |
| 283 | + | |
| 284 | + | |
| 285 | + | |
250 | 286 | | |
251 | 287 | | |
252 | 288 | | |
253 | 289 | | |
254 | 290 | | |
| 291 | + | |
255 | 292 | | |
256 | 293 | | |
257 | | - | |
258 | | - | |
259 | | - | |
260 | | - | |
261 | | - | |
262 | | - | |
263 | | - | |
| 294 | + | |
| 295 | + | |
| 296 | + | |
| 297 | + | |
| 298 | + | |
| 299 | + | |
| 300 | + | |
| 301 | + | |
| 302 | + | |
| 303 | + | |
| 304 | + | |
| 305 | + | |
| 306 | + | |
| 307 | + | |
| 308 | + | |
| 309 | + | |
| 310 | + | |
264 | 311 | | |
265 | 312 | | |
266 | 313 | | |
| |||
0 commit comments