Commit ca26537

Adding revamp docs for vision_transformers and regnet (#5856)

* Add docs for regnet, still need to update the comment docs on models
* Fix a little typo on .rst file
* Update regnet docstring
* Add vision_transformer docs, and fix typo on regnet docs
* Update docstring to make sure it does not exceed 120 chars per line
* Improve formatting
* Change the new line location for vision_transformer docstring

1 parent a64c674

File tree

8 files changed: +320 −54 lines changed

docs/source/models/regnet.rst

Lines changed: 37 additions & 0 deletions

@@ -0,0 +1,37 @@
+RegNet
+======
+
+.. currentmodule:: torchvision.models
+
+The RegNet model is based on the `Designing Network Design Spaces
+<https://arxiv.org/abs/2003.13678>`_ paper.
+
+
+Model builders
+--------------
+
+The following model builders can be used to instantiate a RegNet model, with or
+without pre-trained weights. All the model builders internally rely on the
+``torchvision.models.regnet.RegNet`` base class. Please refer to the `source code
+<https://github.com/pytorch/vision/blob/main/torchvision/models/regnet.py>`_ for
+more details about this class.
+
+.. autosummary::
+   :toctree: generated/
+   :template: function.rst
+
+   regnet_y_400mf
+   regnet_y_800mf
+   regnet_y_1_6gf
+   regnet_y_3_2gf
+   regnet_y_8gf
+   regnet_y_16gf
+   regnet_y_32gf
+   regnet_y_128gf
+   regnet_x_400mf
+   regnet_x_800mf
+   regnet_x_1_6gf
+   regnet_x_3_2gf
+   regnet_x_8gf
+   regnet_x_16gf
+   regnet_x_32gf
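The builder names above encode the model variant and its compute regime. A minimal sketch of the naming convention, using a hypothetical helper that is not part of torchvision (the paper's convention is assumed here: "x" models use plain residual blocks, "y" models add Squeeze-and-Excitation, and the suffix is the rough compute budget in mega- or giga-flops):

```python
# Hypothetical helper (not part of torchvision): decodes the builder names
# listed above under the assumed RegNet naming convention.
def parse_regnet_name(name: str):
    parts = name.split("_")
    assert parts[0] == "regnet"
    variant = parts[1]                          # "x" or "y"
    size = "_".join(parts[2:])                  # e.g. "400mf", "1_6gf"
    unit = "MF" if size.endswith("mf") else "GF"
    flops = float(size[:-2].replace("_", "."))  # "1_6" -> 1.6
    return variant, flops, unit

print(parse_regnet_name("regnet_y_1_6gf"))  # ('y', 1.6, 'GF')
```

So ``regnet_y_1_6gf`` is the SE-equipped variant at roughly 1.6 GF per forward pass.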

docs/source/models/resnet.rst

Lines changed: 1 addition & 1 deletion

@@ -10,7 +10,7 @@ The ResNet model is based on the `Deep Residual Learning for Image Recognition
 Model builders
 --------------
 
-The following model builders can be used to instanciate a ResNet model, with or
+The following model builders can be used to instantiate a ResNet model, with or
 without pre-trained weights. All the model builders internally rely on the
 ``torchvision.models.resnet.ResNet`` base class. Please refer to the `source
 code

docs/source/models/squeezenet.rst

Lines changed: 1 addition & 1 deletion

@@ -11,7 +11,7 @@ paper.
 Model builders
 --------------
 
-The following model builders can be used to instanciate a SqueezeNet model, with or
+The following model builders can be used to instantiate a SqueezeNet model, with or
 without pre-trained weights. All the model builders internally rely on the
 ``torchvision.models.squeezenet.SqueezeNet`` base class. Please refer to the `source
 code

docs/source/models/vgg.rst

Lines changed: 1 addition & 1 deletion

@@ -10,7 +10,7 @@ Image Recognition <https://arxiv.org/abs/1409.1556>`_ paper.
 Model builders
 --------------
 
-The following model builders can be used to instanciate a VGG model, with or
+The following model builders can be used to instantiate a VGG model, with or
 without pre-trained weights. All the model buidlers internally rely on the
 ``torchvision.models.vgg.VGG`` base class. Please refer to the `source code
 <https://github.com/pytorch/vision/blob/main/torchvision/models/vgg.py>`_ for
docs/source/models/vision_transformer.rst

Lines changed: 28 additions & 0 deletions

@@ -0,0 +1,28 @@
+VisionTransformer
+=================
+
+.. currentmodule:: torchvision.models
+
+The VisionTransformer model is based on the `An Image is Worth 16x16 Words:
+Transformers for Image Recognition at Scale <https://arxiv.org/abs/2010.11929>`_ paper.
+
+
+Model builders
+--------------
+
+The following model builders can be used to instantiate a VisionTransformer model, with or
+without pre-trained weights. All the model builders internally rely on the
+``torchvision.models.vision_transformer.VisionTransformer`` base class.
+Please refer to the `source code
+<https://github.com/pytorch/vision/blob/main/torchvision/models/vision_transformer.py>`_ for
+more details about this class.
+
+.. autosummary::
+   :toctree: generated/
+   :template: function.rst
+
+   vit_b_16
+   vit_b_32
+   vit_l_16
+   vit_l_32
+   vit_h_14
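The builder names above follow the paper's configuration naming. A minimal sketch with a hypothetical helper (not part of torchvision; it assumes "b"/"l"/"h" stand for the Base/Large/Huge model sizes and the trailing number is the square input patch size):

```python
# Hypothetical helper (not part of torchvision): maps the builder names above
# to the assumed ViT configuration naming from the paper.
VIT_SIZES = {"b": "Base", "l": "Large", "h": "Huge"}

def parse_vit_name(name: str):
    _, size, patch = name.split("_")
    return VIT_SIZES[size], int(patch)

print(parse_vit_name("vit_b_16"))  # ('Base', 16)
```

So ``vit_h_14`` names the Huge configuration operating on 14×14-pixel patches.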

docs/source/models_new.rst

Lines changed: 2 additions & 0 deletions

@@ -36,9 +36,11 @@ weights:
 .. toctree::
    :maxdepth: 1
 
+   models/regnet
    models/resnet
    models/squeezenet
    models/vgg
+   models/vision_transformer
 
 
 Table of all available classification weights

torchvision/models/regnet.py

Lines changed: 190 additions & 36 deletions
Large diffs are not rendered by default.

torchvision/models/vision_transformer.py

Lines changed: 60 additions & 15 deletions

@@ -490,11 +490,20 @@ class ViT_H_14_Weights(WeightsEnum):
 def vit_b_16(*, weights: Optional[ViT_B_16_Weights] = None, progress: bool = True, **kwargs: Any) -> VisionTransformer:
     """
     Constructs a vit_b_16 architecture from
-    `"An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" <https://arxiv.org/abs/2010.11929>`_.
+    `An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale <https://arxiv.org/abs/2010.11929>`_.
 
     Args:
-        weights (ViT_B_16_Weights, optional): The pretrained weights for the model
-        progress (bool): If True, displays a progress bar of the download to stderr
+        weights (:class:`~torchvision.models.vision_transformer.ViT_B_16_Weights`, optional): The pretrained
+            weights to use. See :class:`~torchvision.models.vision_transformer.ViT_B_16_Weights`
+            below for more details and possible values. By default, no pre-trained weights are used.
+        progress (bool, optional): If True, displays a progress bar of the download to stderr. Default is True.
+        **kwargs: parameters passed to the ``torchvision.models.vision_transformer.VisionTransformer``
+            base class. Please refer to the `source code
+            <https://github.com/pytorch/vision/blob/main/torchvision/models/vision_transformer.py>`_
+            for more details about this class.
+
+    .. autoclass:: torchvision.models.vision_transformer.ViT_B_16_Weights
+        :members:
     """
     weights = ViT_B_16_Weights.verify(weights)

@@ -514,11 +523,20 @@ def vit_b_16(*, weights: Optional[ViT_B_16_Weights] = None, progress: bool = Tru
 def vit_b_32(*, weights: Optional[ViT_B_32_Weights] = None, progress: bool = True, **kwargs: Any) -> VisionTransformer:
     """
     Constructs a vit_b_32 architecture from
-    `"An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" <https://arxiv.org/abs/2010.11929>`_.
+    `An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale <https://arxiv.org/abs/2010.11929>`_.
 
     Args:
-        weights (ViT_B_32_Weights, optional): The pretrained weights for the model
-        progress (bool): If True, displays a progress bar of the download to stderr
+        weights (:class:`~torchvision.models.vision_transformer.ViT_B_32_Weights`, optional): The pretrained
+            weights to use. See :class:`~torchvision.models.vision_transformer.ViT_B_32_Weights`
+            below for more details and possible values. By default, no pre-trained weights are used.
+        progress (bool, optional): If True, displays a progress bar of the download to stderr. Default is True.
+        **kwargs: parameters passed to the ``torchvision.models.vision_transformer.VisionTransformer``
+            base class. Please refer to the `source code
+            <https://github.com/pytorch/vision/blob/main/torchvision/models/vision_transformer.py>`_
+            for more details about this class.
+
+    .. autoclass:: torchvision.models.vision_transformer.ViT_B_32_Weights
+        :members:
     """
     weights = ViT_B_32_Weights.verify(weights)

@@ -538,11 +556,20 @@ def vit_b_32(*, weights: Optional[ViT_B_32_Weights] = None, progress: bool = Tru
 def vit_l_16(*, weights: Optional[ViT_L_16_Weights] = None, progress: bool = True, **kwargs: Any) -> VisionTransformer:
     """
     Constructs a vit_l_16 architecture from
-    `"An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" <https://arxiv.org/abs/2010.11929>`_.
+    `An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale <https://arxiv.org/abs/2010.11929>`_.
 
     Args:
-        weights (ViT_L_16_Weights, optional): The pretrained weights for the model
-        progress (bool): If True, displays a progress bar of the download to stderr
+        weights (:class:`~torchvision.models.vision_transformer.ViT_L_16_Weights`, optional): The pretrained
+            weights to use. See :class:`~torchvision.models.vision_transformer.ViT_L_16_Weights`
+            below for more details and possible values. By default, no pre-trained weights are used.
+        progress (bool, optional): If True, displays a progress bar of the download to stderr. Default is True.
+        **kwargs: parameters passed to the ``torchvision.models.vision_transformer.VisionTransformer``
+            base class. Please refer to the `source code
+            <https://github.com/pytorch/vision/blob/main/torchvision/models/vision_transformer.py>`_
+            for more details about this class.
+
+    .. autoclass:: torchvision.models.vision_transformer.ViT_L_16_Weights
+        :members:
     """
     weights = ViT_L_16_Weights.verify(weights)

@@ -562,11 +589,20 @@ def vit_l_16(*, weights: Optional[ViT_L_16_Weights] = None, progress: bool = Tru
 def vit_l_32(*, weights: Optional[ViT_L_32_Weights] = None, progress: bool = True, **kwargs: Any) -> VisionTransformer:
     """
     Constructs a vit_l_32 architecture from
-    `"An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" <https://arxiv.org/abs/2010.11929>`_.
+    `An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale <https://arxiv.org/abs/2010.11929>`_.
 
     Args:
-        weights (ViT_L_32_Weights, optional): The pretrained weights for the model
-        progress (bool): If True, displays a progress bar of the download to stderr
+        weights (:class:`~torchvision.models.vision_transformer.ViT_L_32_Weights`, optional): The pretrained
+            weights to use. See :class:`~torchvision.models.vision_transformer.ViT_L_32_Weights`
+            below for more details and possible values. By default, no pre-trained weights are used.
+        progress (bool, optional): If True, displays a progress bar of the download to stderr. Default is True.
+        **kwargs: parameters passed to the ``torchvision.models.vision_transformer.VisionTransformer``
+            base class. Please refer to the `source code
+            <https://github.com/pytorch/vision/blob/main/torchvision/models/vision_transformer.py>`_
+            for more details about this class.
+
+    .. autoclass:: torchvision.models.vision_transformer.ViT_L_32_Weights
+        :members:
     """
     weights = ViT_L_32_Weights.verify(weights)

@@ -585,11 +621,20 @@ def vit_l_32(*, weights: Optional[ViT_L_32_Weights] = None, progress: bool = Tru
 def vit_h_14(*, weights: Optional[ViT_H_14_Weights] = None, progress: bool = True, **kwargs: Any) -> VisionTransformer:
     """
     Constructs a vit_h_14 architecture from
-    `"An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" <https://arxiv.org/abs/2010.11929>`_.
+    `An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale <https://arxiv.org/abs/2010.11929>`_.
 
     Args:
-        weights (ViT_H_14_Weights, optional): The pretrained weights for the model
-        progress (bool): If True, displays a progress bar of the download to stderr
+        weights (:class:`~torchvision.models.vision_transformer.ViT_H_14_Weights`, optional): The pretrained
+            weights to use. See :class:`~torchvision.models.vision_transformer.ViT_H_14_Weights`
+            below for more details and possible values. By default, no pre-trained weights are used.
+        progress (bool, optional): If True, displays a progress bar of the download to stderr. Default is True.
+        **kwargs: parameters passed to the ``torchvision.models.vision_transformer.VisionTransformer``
+            base class. Please refer to the `source code
+            <https://github.com/pytorch/vision/blob/main/torchvision/models/vision_transformer.py>`_
+            for more details about this class.
+
+    .. autoclass:: torchvision.models.vision_transformer.ViT_H_14_Weights
+        :members:
     """
     weights = ViT_H_14_Weights.verify(weights)
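The revised docstrings all document the same contract: ``weights`` accepts a weights-enum member or ``None`` (the default, meaning no pre-trained weights), and each builder normalizes its input via ``verify``. A minimal stand-in sketch of that pattern, using hypothetical classes (a deliberate simplification, not torchvision's actual ``WeightsEnum`` implementation):

```python
from enum import Enum
from typing import Any, Optional

# Hypothetical stand-in for the weights-enum pattern the docstrings describe.
class ViTB16Weights(Enum):
    IMAGENET1K_V1 = "vit_b_16-imagenet1k-v1.pth"

    @classmethod
    def verify(cls, obj):
        # Accept None (no pre-trained weights), an enum member, or a member name.
        if obj is None or isinstance(obj, cls):
            return obj
        if isinstance(obj, str):
            return cls[obj]
        raise TypeError(f"unsupported weights argument: {obj!r}")

def vit_b_16_sketch(*, weights=None, progress: bool = True, **kwargs: Any) -> dict:
    weights = ViTB16Weights.verify(weights)
    # A real builder would construct the VisionTransformer and, when weights
    # is not None, download and load the matching state dict (showing a
    # progress bar if `progress` is True). Here we just report what resolved.
    return {"weights": weights, "progress": progress, **kwargs}

print(vit_b_16_sketch()["weights"])  # None
```

The keyword-only signature (note the leading ``*``) forces callers to spell out ``weights=...``, which is what lets the docstring point readers at the enum's possible values.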
