Commit 718677e
Don't project unmasked tokens for MLM loss (#859)
Summary:
This saves ~4-5 GB of GPU memory when training RoBERTa-large with `seq_len=512`.
With this change, `--max-sentences=16` fits on `volta32gb` for `roberta-large`.
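
A rough back-of-the-envelope sketch of why skipping the vocabulary projection for unmasked positions helps. It compares the size of the logits tensor when every position is projected against projecting only the masked ones. Only the batch size (`--max-sentences=16`) and `seq_len=512` come from the summary above. The vocabulary size, masking rate, and fp32 element size are assumptions for illustration, and `logits_bytes` is a hypothetical helper, not part of fairseq.

```python
def logits_bytes(num_positions: int, vocab_size: int, bytes_per_element: int) -> int:
    """Size in bytes of a (num_positions, vocab_size) logits tensor."""
    return num_positions * vocab_size * bytes_per_element


# Taken from the commit summary.
batch_size = 16   # --max-sentences=16
seq_len = 512     # seq_len=512

# Assumptions for illustration only (not stated in the commit).
vocab_size = 50265       # assumed RoBERTa vocabulary size
mask_prob = 0.15         # assumed BERT-style masking rate
bytes_per_element = 4    # assumed fp32 logits

total_positions = batch_size * seq_len
masked_positions = round(total_positions * mask_prob)

# Projecting every position versus only the masked positions.
full = logits_bytes(total_positions, vocab_size, bytes_per_element)
masked_only = logits_bytes(masked_positions, vocab_size, bytes_per_element)

gib = 1024 ** 3
print(f"all positions:    {full / gib:.2f} GiB")
print(f"masked positions: {masked_only / gib:.2f} GiB")
```

This counts a single logits tensor only. The larger figure quoted in the summary plausibly also reflects gradients and softmax intermediates of the same shape, which this sketch does not model.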
Pull Request resolved: fairinternal/fairseq-py#859
Differential Revision: D17435814
fbshipit-source-id: 2663909768fac0ef0102107613770ee01b1f8c00
1 parent 31dd13f commit 718677e
2 files changed
Lines changed: 16 additions & 9 deletions
File 1 of 2 (6 additions, 2 deletions), changed hunks:
@@ -30,8 +30,11 @@
@@ -43,7 +46,7 @@
@@ -64,6 +67,7 @@
File 2 of 2 (10 additions, 7 deletions), changed hunks:
@@ -201,14 +201,17 @@
@@ -265,7 +268,7 @@
@@ -283,7 +286,7 @@
@@ -293,8 +296,8 @@