diff --git a/references/classification/README.md b/references/classification/README.md
index 210a63c0bca..61c81666e16 100644
--- a/references/classification/README.md
+++ b/references/classification/README.md
@@ -40,12 +40,17 @@ python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py\
 ### ResNext-101 32x8d
 
-On 8 nodes, each with 8 GPUs (for a total of 64 GPUS)
 ```
 python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py\
     --model resnext101_32x8d --epochs 100
 ```
+Note that the above command corresponds to a single node with 8 GPUs. If you use
+a different number of GPUs and/or a different batch size, then the learning rate
+should be scaled accordingly. For example, the pretrained model provided by
+`torchvision` was trained on 8 nodes, each with 8 GPUs (for a total of 64 GPUs),
+with `--batch_size 16` and `--lr 0.4`, instead of the current defaults,
+which are `--batch_size 32` and `--lr 0.1` respectively.
 
 ### MobileNetV2
 ```
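The added note implies the standard linear learning-rate scaling rule: the reference configuration of 8 GPUs with `--batch_size 32` and `--lr 0.1` gives a global batch of 256, and 64 GPUs with `--batch_size 16` give a global batch of 1024, hence `--lr 0.4`. A minimal sketch of that rule (the `scaled_lr` helper is hypothetical, not part of `torchvision` or the training script):

```python
def scaled_lr(base_lr, base_global_batch, gpus, batch_size_per_gpu):
    """Scale the learning rate linearly with the global batch size.

    base_lr:           learning rate tuned for base_global_batch
    base_global_batch: global batch size the base_lr was tuned for
                       (e.g. 8 GPUs * 32 per GPU = 256 in the README defaults)
    """
    global_batch = gpus * batch_size_per_gpu
    return base_lr * global_batch / base_global_batch

# 64 GPUs with --batch_size 16 -> global batch 1024, four times the
# reference global batch of 256, so the lr scales from 0.1 to 0.4.
print(scaled_lr(0.1, 8 * 32, gpus=64, batch_size_per_gpu=16))  # 0.4
```

This is the same linear scaling convention used elsewhere in the README's distributed-training examples: double the global batch, double the learning rate.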