
Commit 1d748ae

Merge branch 'main' into revamp-prototype-features-transforms

2 parents: 7cffef6 + dad6e6a


45 files changed (+817, -430 lines)

.circleci/unittest/linux/scripts/run_test.sh (1 addition, 1 deletion)

@@ -7,4 +7,4 @@ conda activate ./env
 
 export PYTORCH_TEST_WITH_SLOW='1'
 python -m torch.utils.collect_env
-pytest --cov=torchvision --junitxml=test-results/junit.xml -v --durations 20
+pytest --junitxml=test-results/junit.xml -v --durations 20

.circleci/unittest/windows/scripts/run_test.sh (1 addition, 1 deletion)

@@ -10,4 +10,4 @@ source "$this_dir/set_cuda_envs.sh"
 
 export PYTORCH_TEST_WITH_SLOW='1'
 python -m torch.utils.collect_env
-pytest --cov=torchvision --junitxml=test-results/junit.xml -v --durations 20
+pytest --junitxml=test-results/junit.xml -v --durations 20

.coveragerc (7 deletions)

This file was deleted.

CONTRIBUTING.md (2 additions, 3 deletions)

@@ -186,10 +186,9 @@ You can also choose to only build a subset of the examples by using the
 example ``EXAMPLES_PATTERN="transforms" make html`` will only build the examples
 with "transforms" in their name.
 
-### New model
+### New architecture or improved model weights
 
-More details on how to add a new model will be provided later. Please, do not send any PR with a new model without discussing
-it in an issue as, most likely, it will not be accepted.
+Please refer to the guidelines in [Contributing to Torchvision - Models](https://github.com/pytorch/vision/blob/main/CONTRIBUTING_MODELS.md).
 
 ### New dataset

CONTRIBUTING_MODELS.md (new file, 65 additions)

# Contributing to Torchvision - Models

- [New Model Architectures - Overview](#new-model-architectures---overview)
- [New Weights for Existing Model Architectures](#new-weights-for-existing-model-architectures)

## New Model Architectures - Overview

Anyone interested in adding a new model architecture is also expected to train it, so here are a few important considerations:

- Training big models requires lots of resources, and the cost quickly adds up
- Reproducing models is fun but also risky, as you might not always get the results reported in the paper; closing the gap may require a huge amount of effort
- The contribution might not get merged if it significantly lags in accuracy, speed, etc.
- Including a new model in TorchVision might not be the best approach, so other options, such as releasing the model through [PyTorch Hub](https://pytorch.org/hub/), should be considered

So, before starting any work and submitting a PR, a few critical things need to be taken into account to make sure the planned contribution is within the scope of TorchVision and that requirements and expectations are discussed beforehand. If this step is skipped and a PR is submitted without prior discussion, it will almost certainly be rejected.

### 1. Preparation work

- Start by looking into this [issue](https://github.com/pytorch/vision/issues/2707) to get an idea of the models that are being considered, express your willingness to add a new model, and discuss with the community whether this model should be included in TorchVision. It is very important at this stage to make sure there is agreement on the value of having this model in TorchVision and that no one else is already working on it.
- If the decision is to include the new model, please create a new ticket, which will be used for all design and implementation discussions prior to the PR. One of the TorchVision maintainers will reach out at this stage and act as your point of contact from then on, providing support, guidance, and regular feedback.

### 2. Implement the model

Please take a look at existing models in TorchVision to get familiar with the idioms, and at recent contributions of new models. If in doubt about any design decision, ask for feedback on the issue created in step 1. Examples of things to take into account:

- The implementation should be as close as possible to the canonical implementation/paper
- The PR must include the code implementation, documentation, and tests
- It should also extend the existing reference scripts used to train the model
- The weights need to closely reproduce the accuracy reported in the paper, even though the final weights to be deployed will be those trained by the TorchVision maintainers
- The PR description should include the commands/configuration used to train the model, so the TorchVision maintainers can easily run them to verify the implementation and generate the final model to be released
- Make sure to re-use existing components as much as possible (inheritance)
- New primitives (transforms, losses, etc.) can be added if necessary, but their final location will be determined after discussion with the dedicated maintainer
- Please take a look at the detailed [implementation and documentation guidelines](https://github.com/pytorch/vision/issues/5319) for a fine-grained list of things not to be missed

### 3. Train the model with reference scripts

To validate the new model against the common benchmark, and to generate pre-trained weights, you must use TorchVision's reference scripts to train the model.

Make sure all logs and a final (or best) checkpoint are saved, because a submission is expected to show that the model has been successfully trained and that the results are in line with the original paper/repository. This allows the reviewers to quickly check the validity of the submission. Note, however, that the final released model will be re-trained by the maintainers, in order to verify reproducibility, to ensure that changes made during the PR review did not introduce any bugs, and to avoid moving around a large amount of data (including all checkpoints and logs).

### 4. Submit a PR

Submit a PR and tag the assigned maintainer. This PR should:

- Link the original ticket
- Provide a link to the original paper and, if available, the original repository
- Highlight the important test metrics and how they compare to the original paper
- Highlight any design choices that deviate from the original paper/implementation, and the rationale for these choices

## New Weights for Existing Model Architectures

The process for improving existing models, for instance improving accuracy by retraining the model with a different set of hyperparameters or augmentations, is the following:

1. Open a ticket and discuss with the community and maintainers whether the improvement should be added to TorchVision. Note that to justify new weights the improvement should be significant.

2. Train the model using the TorchVision reference scripts. You can add new primitives (transforms, losses, etc.) when necessary, but their final location will be determined after discussion with the dedicated maintainer.

3. Open a PR with the new weights, together with the training logs and the chosen checkpoint, so the reviewers can verify the submission. Details on how the model was trained, i.e., the training command using the reference scripts, should be included in the PR.

4. The PR reviewers will replicate the results on their side to verify the submission, and if all goes well the new weights will be ready for release!
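Editor's note: for orientation on step 2 of the new guide, here is a minimal sketch of the usual TorchVision model idiom — an `nn.Module` plus a lowercase builder function. All names here (`MyNet`, `mynet`) are hypothetical placeholders, not APIs from this commit.

```python
# Hedged sketch of the idiom a new model contribution typically follows.
# `MyNet` / `mynet` are placeholders for illustration only.
import torch
from torch import nn


class MyNet(nn.Module):
    def __init__(self, num_classes: int = 1000) -> None:
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(64, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = torch.flatten(self.avgpool(x), 1)
        return self.classifier(x)


def mynet(pretrained: bool = False, progress: bool = True, **kwargs) -> MyNet:
    model = MyNet(**kwargs)
    if pretrained:
        # In torchvision, weights would be fetched via load_state_dict_from_url here.
        raise NotImplementedError("No released weights for this placeholder model.")
    return model
```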

docs/source/datasets.rst (68 additions, 22 deletions)

@@ -5,7 +5,7 @@ Torchvision provides many built-in datasets in the ``torchvision.datasets``
 module, as well as utility classes for building your own datasets.
 
 Built-in datasets
-~~~~~~~~~~~~~~~~~
+-----------------
 
 All datasets are subclasses of :class:`torch.utils.data.Dataset`
 i.e, they have ``__getitem__`` and ``__len__`` methods implemented.

@@ -25,6 +25,8 @@ All the datasets have almost similar API. They all have two common arguments:
 ``transform`` and ``target_transform`` to transform the input and target respectively.
 You can also create your own datasets using the provided :ref:`base classes <base_classes_datasets>`.
 
+Image classification
+~~~~~~~~~~~~~~~~~~~~
 
 .. autosummary::
     :toctree: generated/

@@ -35,61 +37,105 @@ You can also create your own datasets using the provided :ref:`base classes <base_classes_datasets>`.
     CelebA
     CIFAR10
     CIFAR100
-    Cityscapes
-    CocoCaptions
-    CocoDetection
     Country211
     DTD
     EMNIST
     EuroSAT
     FakeData
     FashionMNIST
     FER2013
+    FGVCAircraft
     Flickr8k
     Flickr30k
     Flowers102
-    FlyingChairs
-    FlyingThings3D
     Food101
-    FGVCAircraft
     GTSRB
-    HD1K
-    HMDB51
-    ImageNet
     INaturalist
-    Kinetics400
-    Kitti
-    KittiFlow
+    ImageNet
     KMNIST
     LFWPeople
-    LFWPairs
     LSUN
     MNIST
     Omniglot
     OxfordIIITPet
-    PCAM
-    PhotoTour
     Places365
-    RenderedSST2
+    PCAM
     QMNIST
-    SBDataset
-    SBU
+    RenderedSST2
     SEMEION
-    Sintel
+    SBU
     StanfordCars
     STL10
     SUN397
     SVHN
-    UCF101
     USPS
+
+Image detection or segmentation
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. autosummary::
+    :toctree: generated/
+    :template: class_dataset.rst
+
+    CocoDetection
+    CelebA
+    Cityscapes
+    GTSRB
+    Kitti
+    OxfordIIITPet
+    SBDataset
     VOCSegmentation
     VOCDetection
     WIDERFace
 
+Optical Flow
+~~~~~~~~~~~~
+
+.. autosummary::
+    :toctree: generated/
+    :template: class_dataset.rst
+
+    FlyingChairs
+    FlyingThings3D
+    HD1K
+    KittiFlow
+    Sintel
+
+Image pairs
+~~~~~~~~~~~
+
+.. autosummary::
+    :toctree: generated/
+    :template: class_dataset.rst
+
+    LFWPairs
+    PhotoTour
+
+Image captioning
+~~~~~~~~~~~~~~~~
+
+.. autosummary::
+    :toctree: generated/
+    :template: class_dataset.rst
+
+    CocoCaptions
+
+Video classification
+~~~~~~~~~~~~~~~~~~~~
+
+.. autosummary::
+    :toctree: generated/
+    :template: class_dataset.rst
+
+    HMDB51
+    Kinetics400
+    UCF101
+
+
 .. _base_classes_datasets:
 
 Base classes for custom datasets
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+--------------------------------
 
 .. autosummary::
     :toctree: generated/
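Editor's note: the reorganized page only regroups the datasets by task; the construction pattern mentioned in the context lines (common `transform` and `target_transform` arguments) is unchanged. A minimal usage sketch with one of the classification datasets listed above:

```python
# All built-in datasets share this construction pattern and plug directly
# into torch.utils.data.DataLoader.
import torch
import torchvision
from torchvision import transforms

dataset = torchvision.datasets.CIFAR10(
    root="./data",
    train=True,
    download=True,
    transform=transforms.ToTensor(),  # `transform` applies to each input image
)
loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)
images, labels = next(iter(loader))  # images: (32, 3, 32, 32), labels: (32,)
```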

docs/source/models.rst (24 additions, 1 deletion)

@@ -89,6 +89,10 @@ You can construct a model with random weights by calling its constructor:
     vit_b_32 = models.vit_b_32()
     vit_l_16 = models.vit_l_16()
     vit_l_32 = models.vit_l_32()
+    convnext_tiny = models.convnext_tiny()
+    convnext_small = models.convnext_small()
+    convnext_base = models.convnext_base()
+    convnext_large = models.convnext_large()
 
 We provide pre-trained models, using the PyTorch :mod:`torch.utils.model_zoo`.
 These can be constructed by passing ``pretrained=True``:

@@ -136,6 +140,10 @@ These can be constructed by passing ``pretrained=True``:
     vit_b_32 = models.vit_b_32(pretrained=True)
     vit_l_16 = models.vit_l_16(pretrained=True)
     vit_l_32 = models.vit_l_32(pretrained=True)
+    convnext_tiny = models.convnext_tiny(pretrained=True)
+    convnext_small = models.convnext_small(pretrained=True)
+    convnext_base = models.convnext_base(pretrained=True)
+    convnext_large = models.convnext_large(pretrained=True)
 
 Instancing a pre-trained model will download its weights to a cache directory.
 This directory can be set using the `TORCH_HOME` environment variable. See

@@ -248,7 +256,10 @@ vit_b_16 81.072 95.318
 vit_b_32                         75.912        92.466
 vit_l_16                         79.662        94.638
 vit_l_32                         76.972        93.070
-convnext_tiny (prototype)        82.520        96.146
+convnext_tiny                    82.520        96.146
+convnext_small                   83.616        96.650
+convnext_base                    84.062        96.870
+convnext_large                   84.414        96.976
 ================================ ============= =============
 
 

@@ -464,6 +475,18 @@ VisionTransformer
     vit_l_16
     vit_l_32
 
+ConvNeXt
+--------
+
+.. autosummary::
+    :toctree: generated/
+    :template: function.rst
+
+    convnext_tiny
+    convnext_small
+    convnext_base
+    convnext_large
+
 Quantized Models
 ----------------
 
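Editor's note: since ConvNeXt is now exposed through the stable `pretrained=True` API documented above, a minimal inference sketch looks as follows. It uses a dummy input; real use requires standard ImageNet preprocessing (an assumption, not taken from this diff).

```python
# Hedged sketch exercising the convnext_tiny builder documented above.
import torch
import torchvision.models as models

model = models.convnext_tiny(pretrained=True)
model.eval()

with torch.no_grad():
    batch = torch.rand(1, 3, 224, 224)  # stand-in for a preprocessed image batch
    logits = model(batch)               # shape (1, 1000): ImageNet class scores
    top5 = logits.topk(5).indices
```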

docs/source/ops.rst (1 addition)

@@ -21,6 +21,7 @@ Operators
     clip_boxes_to_image
     deform_conv2d
     generalized_box_iou
+    generalized_box_iou_loss
     masks_to_boxes
     nms
     ps_roi_align
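Editor's note: unlike `generalized_box_iou`, which returns a pairwise matrix, the newly documented `generalized_box_iou_loss` pairs predicted and target boxes elementwise. A small sketch, assuming the usual `(x1, y1, x2, y2)` box format and a `reduction` keyword:

```python
import torch
from torchvision.ops import generalized_box_iou_loss

pred = torch.tensor([[10.0, 10.0, 50.0, 50.0]], requires_grad=True)
target = torch.tensor([[12.0, 12.0, 52.0, 52.0]])

loss = generalized_box_iou_loss(pred, target, reduction="mean")
loss.backward()  # differentiable, so it can serve directly as a box-regression loss
```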

hubconf.py (1 addition)

@@ -2,6 +2,7 @@
 dependencies = ["torch"]
 
 from torchvision.models.alexnet import alexnet
+from torchvision.models.convnext import convnext_tiny, convnext_small, convnext_base, convnext_large
 from torchvision.models.densenet import densenet121, densenet169, densenet201, densenet161
 from torchvision.models.efficientnet import (
     efficientnet_b0,
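Editor's note: with the ConvNeXt builders exported in `hubconf.py`, they should also become reachable through `torch.hub`; a sketch (the `main` ref is an assumption about the checkout used):

```python
import torch

# Loads the builder exported above; pretrained=True is forwarded to it.
model = torch.hub.load("pytorch/vision:main", "convnext_tiny", pretrained=True)
model.eval()
```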

references/classification/README.md (3 additions, 2 deletions)

@@ -201,11 +201,12 @@ and `--batch_size 64`.
 ### ConvNeXt
 ```
 torchrun --nproc_per_node=8 train.py\
---model convnext_tiny --batch-size 128 --opt adamw --lr 1e-3 --lr-scheduler cosineannealinglr \
+--model $MODEL --batch-size 128 --opt adamw --lr 1e-3 --lr-scheduler cosineannealinglr \
 --lr-warmup-epochs 5 --lr-warmup-method linear --auto-augment ta_wide --epochs 600 --random-erase 0.1 \
 --label-smoothing 0.1 --mixup-alpha 0.2 --cutmix-alpha 1.0 --weight-decay 0.05 --norm-weight-decay 0.0 \
---train-crop-size 176 --model-ema --val-resize-size 236 --ra-sampler --ra-reps 4
+--train-crop-size 176 --model-ema --val-resize-size 232 --ra-sampler --ra-reps 4
 ```
+Here `$MODEL` is one of `convnext_tiny`, `convnext_small`, `convnext_base` and `convnext_large`. Note that each variant had its `--val-resize-size` optimized in a post-training step; see its `Weights` entry for the exact value.
 
 Note that the above command corresponds to training on a single node with 8 GPUs.
 For generating the pre-trained weights, we trained with 2 nodes, each with 8 GPUs (for a total of 16 GPUs),

references/classification/train.py (1 addition, 1 deletion)

@@ -178,7 +178,7 @@ def load_data(traindir, valdir, args):
 
     print("Creating data loaders")
     if args.distributed:
-        if args.ra_sampler:
+        if hasattr(args, "ra_sampler") and args.ra_sampler:
             train_sampler = RASampler(dataset, shuffle=True, repetitions=args.ra_reps)
         else:
             train_sampler = torch.utils.data.distributed.DistributedSampler(dataset)
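Editor's note: the `hasattr` guard exists because `load_data` is shared across reference scripts (e.g., `train_quantization.py` imports it) whose argument parsers may not define `--ra-sampler` at all; a plain attribute access would then raise. A minimal sketch of the failure mode it avoids:

```python
from argparse import Namespace

args = Namespace(distributed=True)  # parser without --ra-sampler: no attribute
# args.ra_sampler                   # would raise AttributeError
use_ra = hasattr(args, "ra_sampler") and args.ra_sampler
print(use_ra)  # False: falls back to the plain DistributedSampler branch
```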

references/classification/train_quantization.py (15 additions, 7 deletions)

@@ -13,14 +13,16 @@
 
 
 try:
-    from torchvision.prototype import models as PM
+    from torchvision import prototype
 except ImportError:
-    PM = None
+    prototype = None
 
 
 def main(args):
-    if args.weights and PM is None:
+    if args.prototype and prototype is None:
         raise ImportError("The prototype module couldn't be found. Please install the latest torchvision nightly.")
+    if not args.prototype and args.weights:
+        raise ValueError("The weights parameter works only in prototype mode. Please pass the --prototype argument.")
     if args.output_dir:
         utils.mkdir(args.output_dir)

@@ -54,14 +56,14 @@ def main(args):
 
     print("Creating model", args.model)
     # when training quantized models, we always start from a pre-trained fp32 reference model
-    if not args.weights:
+    if not args.prototype:
         model = torchvision.models.quantization.__dict__[args.model](pretrained=True, quantize=args.test_only)
     else:
-        model = PM.quantization.__dict__[args.model](weights=args.weights, quantize=args.test_only)
+        model = prototype.models.quantization.__dict__[args.model](weights=args.weights, quantize=args.test_only)
     model.to(device)
 
     if not (args.test_only or args.post_training_quantize):
-        model.fuse_model()
+        model.fuse_model(is_qat=True)
         model.qconfig = torch.ao.quantization.get_default_qat_qconfig(args.backend)
         torch.ao.quantization.prepare_qat(model, inplace=True)

@@ -95,7 +97,7 @@ def main(args):
             ds, batch_size=args.batch_size, shuffle=False, num_workers=args.workers, pin_memory=True
         )
         model.eval()
-        model.fuse_model()
+        model.fuse_model(is_qat=False)
         model.qconfig = torch.ao.quantization.get_default_qconfig(args.backend)
         torch.ao.quantization.prepare(model, inplace=True)
         # Calibrate first

@@ -264,6 +266,12 @@ def get_args_parser(add_help=True):
     parser.add_argument("--clip-grad-norm", default=None, type=float, help="the maximum gradient norm (default None)")
 
     # Prototype models only
+    parser.add_argument(
+        "--prototype",
+        dest="prototype",
+        help="Use prototype model builders instead those from main area",
+        action="store_true",
+    )
     parser.add_argument("--weights", default=None, type=str, help="the weights enum name to load")
 
     return parser
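Editor's note: the `is_qat` argument distinguishes the two fusion paths in the diff above. QAT fuses in training mode so BatchNorm statistics can keep adapting, while post-training quantization fuses in eval mode and folds BatchNorm away. A hedged sketch of both flows with a quantizable model (`mobilenet_v2` chosen only as a stand-in):

```python
import torch
import torchvision

# --- Quantization-aware training (QAT) path, mirroring the diff above ---
model = torchvision.models.quantization.mobilenet_v2(pretrained=True, quantize=False)
model.train()
model.fuse_model(is_qat=True)  # training-time fusion keeps BN adaptable
model.qconfig = torch.ao.quantization.get_default_qat_qconfig("fbgemm")
torch.ao.quantization.prepare_qat(model, inplace=True)
# ... fine-tune with fake-quant observers enabled, then:
model.eval()
quantized = torch.ao.quantization.convert(model)

# --- Post-training quantization (PTQ) path ---
model = torchvision.models.quantization.mobilenet_v2(pretrained=True, quantize=False)
model.eval()
model.fuse_model(is_qat=False)  # eval-time fusion folds BN into conv
model.qconfig = torch.ao.quantization.get_default_qconfig("fbgemm")
torch.ao.quantization.prepare(model, inplace=True)
# ... run calibration batches, then:
quantized = torch.ao.quantization.convert(model)
```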
