Commit ff5c5b3

Authored by NicolasHug, committed by facebook-github-bot

[fbsync] Clarification for training resnext101_32x8d on ImageNet (#4390)

Summary:
* Fix training resuming in references/segmentation
* Clarification for training resnext101_32x8d
* Update references/classification/README.md

Reviewed By: kazhang
Differential Revision: D30898330
fbshipit-source-id: 195c24c57ad3abe2e23e08b3b9251db68790914c
Co-authored-by: Nicolas Hug <[email protected]>
Co-authored-by: Nicolas Hug <[email protected]>
1 parent 16e774a commit ff5c5b3

File tree

1 file changed: +6 −1 lines


references/classification/README.md

Lines changed: 6 additions & 1 deletion
````diff
@@ -40,12 +40,17 @@ python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py\

 ### ResNext-101 32x8d

-On 8 nodes, each with 8 GPUs (for a total of 64 GPUS)
 ```
 python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py\
     --model resnext101_32x8d --epochs 100
 ```
+
+Note that the above command corresponds to a single node with 8 GPUs. If you use
+a different number of GPUs and/or a different batch size, then the learning rate
+should be scaled accordingly. For example, the pretrained model provided by
+`torchvision` was trained on 8 nodes, each with 8 GPUs (for a total of 64 GPUs),
+with `--batch_size 16` and `--lr 0.4`, instead of the current defaults
+which are respectively batch_size=32 and lr=0.1

 ### MobileNetV2
 ```
````
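The note added by this commit describes the linear learning-rate scaling rule: when the global batch size (number of GPUs × per-GPU batch size) changes, the learning rate should change in proportion. A minimal sketch of that arithmetic, taking the quoted defaults (8 GPUs, `batch_size=32`, `lr=0.1`) as the reference point; the helper name `scaled_lr` is illustrative and not part of the torchvision references:

```python
def scaled_lr(num_gpus, batch_size_per_gpu,
              ref_global_batch=8 * 32, ref_lr=0.1):
    """Scale the learning rate linearly with the global batch size.

    The reference point matches the defaults quoted in the note above:
    8 GPUs with a per-GPU batch size of 32 and a learning rate of 0.1.
    """
    global_batch = num_gpus * batch_size_per_gpu
    return ref_lr * global_batch / ref_global_batch

# Single node with 8 GPUs at the default batch size: lr stays at 0.1.
print(scaled_lr(8, 32))
# The pretrained resnext101_32x8d setup from the note: 64 GPUs with
# --batch_size 16, i.e. a 4x larger global batch, giving lr 0.4.
print(scaled_lr(64, 16))
```

This reproduces the numbers in the note: 64 × 16 = 1024 is four times the default global batch of 8 × 32 = 256, so 0.1 scales to 0.4.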
