pytorch · yiwen-song · Dec 10, 2021 · Dec 10, 2021
diff --git a/references/classification/README.md b/references/classification/README.md
@@ -125,7 +125,7 @@ torchrun --nproc_per_node=8 train.py\
 ```
 Here `$MODEL` is one of `regnet_x_400mf`, `regnet_x_800mf`, `regnet_x_1_6gf`, `regnet_y_400mf`, `regnet_y_800mf` and `regnet_y_1_6gf`. Please note we used learning rate 0.4 for `regnet_y_400mf` to get the same Acc@1 as [the paper](https://arxiv.org/abs/2003.13678).
 
-### Medium models
+#### Medium models
 ```
 torchrun --nproc_per_node=8 train.py\
      --model $MODEL --epochs 100 --batch-size 64 --wd 0.00005 --lr=0.4\
@@ -134,7 +134,7 @@ torchrun --nproc_per_node=8 train.py\
 ```
 Here `$MODEL` is one of `regnet_x_3_2gf`, `regnet_x_8gf`, `regnet_x_16gf`, `regnet_y_3_2gf` and `regnet_y_8gf`.
 
-### Large models
+#### Large models
 ```
 torchrun --nproc_per_node=8 train.py\
      --model $MODEL --epochs 100 --batch-size 32 --wd 0.00005 --lr=0.2\
@@ -143,6 +143,28 @@ torchrun --nproc_per_node=8 train.py\
 ```
 Here `$MODEL` is one of `regnet_x_32gf`, `regnet_y_16gf` and `regnet_y_32gf`.
 
+### Vision Transformer
+
+#### Base models
+```
+torchrun --nproc_per_node=8 train.py\
+    --model $MODEL --epochs 300 --batch-size 64 --opt adamw --lr 0.003 --wd 0.3\
+    --lr-scheduler cosineannealinglr --lr-warmup-method linear --lr-warmup-epochs 30\
+    --lr-warmup-decay 0.033 --amp --label-smoothing 0.11 --mixup-alpha 0.2 --auto-augment ra\
+    --clip-grad-norm 1 --ra-sampler --cutmix-alpha 1.0 --model-ema
+```
+Here `$MODEL` is one of `vit_b_16` and `vit_b_32`.
+
+#### Large models
+```
+torchrun --nproc_per_node=8 train.py\
+    --model $MODEL --epochs 300 --batch-size 16 --opt adamw --lr 0.003 --wd 0.3\
+    --lr-scheduler cosineannealinglr --lr-warmup-method linear --lr-warmup-epochs 30\
+    --lr-warmup-decay 0.033 --amp --label-smoothing 0.11 --mixup-alpha 0.2 --auto-augment ra\
+    --clip-grad-norm 1 --ra-sampler --cutmix-alpha 1.0 --model-ema
+```
+Here `$MODEL` is one of `vit_l_16` and `vit_l_32`.
+
 ## Mixed precision training
 Automatic Mixed Precision (AMP) training on GPU for PyTorch can be enabled with [torch.cuda.amp](https://pytorch.org/docs/stable/amp.html?highlight=amp#module-torch.cuda.amp).
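
For readers unfamiliar with `torch.cuda.amp`, the following is a minimal sketch of the usual autocast plus gradient-scaling pattern. It is not taken from `train.py`: the tiny linear model, the SGD optimizer, the random batch, and the `train_step` helper are hypothetical stand-ins chosen only to keep the example self-contained, and AMP is disabled automatically when no GPU is present.

```python
import torch
from torch import nn

# Fall back to plain FP32 on machines without a GPU so the sketch still runs.
use_amp = torch.cuda.is_available()
device = torch.device("cuda" if use_amp else "cpu")

# Hypothetical stand-ins for the real model, optimizer, and loss.
model = nn.Linear(16, 4).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

# GradScaler scales the loss to limit FP16 gradient underflow.
# With enabled=False it passes values through unchanged.
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)


def train_step(images, targets):
    """Run one optimization step, using mixed precision when available."""
    images, targets = images.to(device), targets.to(device)
    optimizer.zero_grad()
    # Inside autocast, eligible ops run in reduced precision.
    with torch.cuda.amp.autocast(enabled=use_amp):
        loss = criterion(model(images), targets)
    scaler.scale(loss).backward()  # backward pass on the scaled loss
    scaler.step(optimizer)         # unscales gradients, then steps the optimizer
    scaler.update()                # adjusts the scale factor for the next step
    return loss.item()


# Random batch: 8 samples, 16 features, 4 classes (illustrative only).
loss = train_step(torch.randn(8, 16), torch.randint(0, 4, (8,)))
print(f"loss: {loss:.4f}")
```

In the recipes above, passing `--amp` is what switches mixed precision on; the sketch only illustrates the general mechanism that such a flag would typically enable.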