
Commit 3aee34e

Update README
1 parent 7c8c719 commit 3aee34e

File tree

1 file changed: +54 −0 lines changed


references/classification/README.md

Lines changed: 54 additions & 0 deletions
@@ -143,6 +143,60 @@ torchrun --nproc_per_node=8 train.py\
```
Here `$MODEL` is one of `regnet_x_32gf`, `regnet_y_16gf` and `regnet_y_32gf`.

### Vision Transformer

#### vit_b_16
```
torchrun --nproc_per_node=8 train.py\
    --model vit_b_16 --epochs 300 --batch-size 512 --opt adamw --lr 0.003 --wd 0.3\
    --lr-scheduler cosineannealinglr --lr-warmup-method linear --lr-warmup-epochs 30\
    --lr-warmup-decay 0.033 --amp --label-smoothing 0.11 --mixup-alpha 0.2 --auto-augment ra\
    --clip-grad-norm 1 --ra-sampler --cutmix-alpha 1.0 --model-ema
```

Note that the above command corresponds to training on a single node with 8 GPUs.
For generating the pre-trained weights, we trained with 8 nodes, each with 8 GPUs (for a total of 64 GPUs),
and `--batch-size 64`.

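The exact multi-node launch configuration is not part of this recipe. As a rough sketch only (the rendezvous options and the `$MASTER_ADDR`/`$NODE_RANK` placeholders are assumptions, not something this README specifies), an 8-node job with 8 GPUs per node could be started by running on every node:

```
torchrun --nnodes=8 --node_rank=$NODE_RANK --nproc_per_node=8\
    --master_addr=$MASTER_ADDR --master_port=29500 train.py\
    --model vit_b_16 --epochs 300 --batch-size 64 --opt adamw --lr 0.003 --wd 0.3\
    --lr-scheduler cosineannealinglr --lr-warmup-method linear --lr-warmup-epochs 30\
    --lr-warmup-decay 0.033 --amp --label-smoothing 0.11 --mixup-alpha 0.2 --auto-augment ra\
    --clip-grad-norm 1 --ra-sampler --cutmix-alpha 1.0 --model-ema
```

The same pattern applies to the other ViT variants below, substituting the node count and `--batch-size` value given in each note.
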
#### vit_b_32
```
torchrun --nproc_per_node=8 train.py\
    --model vit_b_32 --epochs 300 --batch-size 512 --opt adamw --lr 0.003 --wd 0.3\
    --lr-scheduler cosineannealinglr --lr-warmup-method linear --lr-warmup-epochs 30\
    --lr-warmup-decay 0.033 --amp --label-smoothing 0.11 --mixup-alpha 0.2 --auto-augment imagenet\
    --clip-grad-norm 1 --ra-sampler --cutmix-alpha 1.0 --model-ema
```

Note that the above command corresponds to training on a single node with 8 GPUs.
For generating the pre-trained weights, we trained with 2 nodes, each with 8 GPUs (for a total of 16 GPUs),
and `--batch-size 256`.

#### vit_l_16
```
torchrun --nproc_per_node=8 train.py\
    --model vit_l_16 --epochs 600 --batch-size 128 --lr 0.5 --lr-scheduler cosineannealinglr\
    --lr-warmup-method linear --lr-warmup-epochs 5 --label-smoothing 0.1 --mixup-alpha 0.2\
    --auto-augment ta_wide --random-erase 0.1 --weight-decay 0.00002 --norm-weight-decay 0.0\
    --clip-grad-norm 1 --ra-sampler --cutmix-alpha 1.0 --model-ema --val-resize-size 232
```

Note that the above command corresponds to training on a single node with 8 GPUs.
For generating the pre-trained weights, we trained with 2 nodes, each with 8 GPUs (for a total of 16 GPUs),
and `--batch-size 64`.

#### vit_l_32
```
torchrun --nproc_per_node=8 train.py\
    --model vit_l_32 --epochs 300 --batch-size 512 --opt adamw --lr 0.003 --wd 0.3\
    --lr-scheduler cosineannealinglr --lr-warmup-method linear --lr-warmup-epochs 30\
    --lr-warmup-decay 0.033 --amp --label-smoothing 0.11 --mixup-alpha 0.2 --auto-augment ra\
    --clip-grad-norm 1 --ra-sampler --cutmix-alpha 1.0 --model-ema
```

Note that the above command corresponds to training on a single node with 8 GPUs.
For generating the pre-trained weights, we trained with 8 nodes, each with 8 GPUs (for a total of 64 GPUs),
and `--batch-size 64`.

## Mixed precision training
Automatic Mixed Precision (AMP) training on GPU for PyTorch can be enabled with the [torch.cuda.amp](https://pytorch.org/docs/stable/amp.html?highlight=amp#module-torch.cuda.amp) module.
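In the ViT commands above, the `--amp` flag of `train.py` switches this on. As a minimal illustration only (reusing the `$MODEL` placeholder from the RegNet note, so the concrete model and the rest of the hyper-parameters are assumptions left at their defaults), mixed precision is enabled by appending the flag to a launch command:

```
torchrun --nproc_per_node=8 train.py --model $MODEL --amp
```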
