Prototype references #7220

NicolasHug
Member

Same as #6433 but with latest changes / design decisions.

I encountered quite a few issues / bugs (see the TODOs). We will need solid integration tests covering different transformation pipelines before we migrate.

pmeier and others added 30 commits August 17, 2022 10:32
@NicolasHug
Member Author

NicolasHug commented Mar 24, 2023

torch core version: 2.0.0.dev20230210

Classification

MobileNet V3

(pt) ➜  classification git:(proto_references_latest_at_least_for_now) ✗ PYTHONPATH=$PYTHONPATH:$(pwd) python -u ~/slurm/run_with_submitit.py --ngpus 8 --nodes 1 --model mobilenet_v3_small --epochs 600 --opt rmsprop --batch-size 128 --lr 0.064 --wd 0.00001 --lr-step-size 2 --lr-gamma 0.973 --auto-augment imagenet --random-erase 0.2 --data-path /datasets01_ontap/imagenet_full_size/061417/

V1 on main, V2 on 170ed2a

| job | last epoch | total | Acc@1 | Acc@5 |
|---|---|---|---|---|
| V1 PIL | 0:07:00 | 3 days, 0:35:31 | 66.738 | 87.004 |
| V2 PIL | 0:07:07 | 3 days, 1:45:51 | 66.880 | 86.834 |
| V2 datapoint | 0:08:05 | 3 days, 13:06:11 | 67.078 | 87.00 |
| V2 tensor | 0:08:01 | 3 days, 12:02:46 | 65.196 | 85.734 |
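To express the classification timings in relative terms, here is a small Python sketch (not part of the references) that parses the `total` column and reports each job's training time relative to V1 PIL. `parse_duration` is a hypothetical helper, and it assumes durations are printed as `H:MM:SS` or `N day(s), H:MM:SS`, as they are in the table.

```python
import re
from datetime import timedelta


def parse_duration(text: str) -> timedelta:
    """Parse 'H:MM:SS' or 'N day(s), H:MM:SS' as printed in the results tables."""
    match = re.fullmatch(r"(?:(\d+) days?, )?(\d+):(\d{2}):(\d{2})", text.strip())
    if match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    days, hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


# Total training times copied from the MobileNet V3 table.
totals = {
    "V1 PIL": "3 days, 0:35:31",
    "V2 PIL": "3 days, 1:45:51",
    "V2 datapoint": "3 days, 13:06:11",
    "V2 tensor": "3 days, 12:02:46",
}

baseline = parse_duration(totals["V1 PIL"])
# timedelta / timedelta gives a float ratio of the two durations.
ratios = {job: parse_duration(t) / baseline for job, t in totals.items()}
for job, ratio in ratios.items():
    print(f"{job:>13}: {ratio:.3f}x V1 PIL total time ({(ratio - 1) * 100:+.1f}%)")
```

By this arithmetic, V2 PIL is within about 2% of V1 PIL, while V2 datapoint and V2 tensor take roughly 16 to 17% longer in total.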

Detection

SSD lite

EDIT: see new results below #7220 (comment)

(pt) ➜  detection git:(main) ✗ PYTHONPATH=$PYTHONPATH:$(pwd) python -u ~/slurm/run_with_submitit.py --ngpus 8 --nodes 1 --dataset coco --model ssdlite320_mobilenet_v3_large --epochs 660 \
    --aspect-ratio-group-factor 3 --lr-scheduler cosineannealinglr --lr 0.15 --batch-size 24 \
    --weight-decay 0.00004 --data-augmentation ssdlite --data-path /datasets01_ontap/COCO/022719

V1 on main, V2 on 035ccd7

| job | last epoch | total | AP 0.5:0.95 | AP 0.5 | AP 0.75 |
|---|---|---|---|---|---|
| V1 PIL | 0:03:18 | 1 day, 22:36:23 | 0.210 | 0.341 | 0.219 |
| V2 PIL | 0:05:04 | 2 days, 21:42:16 | 0.211 | 0.342 | 0.219 |
| V2 datapoint | 0:05:08 | 2 days, 22:11:00 | 0.211 | 0.341 | 0.219 |
| V2 tensor | 0:05:07 | 2 days, 22:17:10 | 0.211 | 0.340 | 0.222 |

Mask R-CNN

(pt) ➜  detection git:(main) ✗ PYTHONPATH=$PYTHONPATH:$(pwd) python -u ~/slurm/run_with_submitit.py --ngpus 8 --nodes 1 --dataset coco --model maskrcnn_resnet50_fpn --epochs 26 --lr-steps 16 22 --aspect-ratio-group-factor 3 --weights-backbone ResNet50_Weights.IMAGENET1K_V1 --data-path /datasets01_ontap/COCO/022719

V1 on main efd6bc0, V2 on b5e3b91

| job | last epoch | total | box AP .5:.95 | mask AP .5:.95 |
|---|---|---|---|---|
| V1 PIL | 0:15:08 | 7:21:32 | 0.388 | 0.348 |
| V2 PIL | 0:20:21 | 9:31:45 | 0.386 | 0.346 |
| V2 datapoint | 0:20:13 | 9:33:37 | 0.386 | 0.346 |
| V2 tensor | 0:20:21 | 9:32:49 | 0.386 | 0.347 |

Keypoint RCNN

EDIT: this isn't very relevant, since keypoints aren't supported / transformed by V2.

(pt) ➜  detection git:(main) ✗ PYTHONPATH=$PYTHONPATH:$(pwd) python -u ~/slurm/run_with_submitit.py --ngpus 8 --nodes 1 --dataset coco_kp --model keypointrcnn_resnet50_fpn --epochs 46 \
    --lr-steps 36 43 --aspect-ratio-group-factor 3 --weights-backbone ResNet50_Weights.IMAGENET1K_V1 --data-path /datasets01_ontap/COCO/022719

V1 on main efd6bc0, V2 on c00a181

| job | last epoch | total | box AP .5:.95 | KP AP .5:.95 |
|---|---|---|---|---|
| V1 PIL | 0:07:39 | 6:20:53 | 0.555 | 0.649 |
| V2 PIL | 0:08:36 | 7:03:21 | 0.533 | 0.578 |
| V2 datapoint | 0:08:33 | 7:01:41 | 0.534 | 0.572 |
| V2 tensor | 0:08:28 | 7:01:05 | 0.532 | 0.576 |

Segmentation

LRASPP

(pt) ➜  segmentation git:(proto_references_latest_at_least_for_now) ✗ PYTHONPATH=$PYTHONPATH:$(pwd) python -u ~/slurm/run_with_submitit.py --ngpus 8 --nodes 1 --dataset coco -b 4 --model lraspp_mobilenet_v3_large --wd 0.000001 --weights-backbone MobileNet_V3_Large_Weights.IMAGENET1K_V1 --data-path /datasets01_ontap/COCO/022719 --backend pil

V1 on main efd6bc0, V2 on 5147d8b

| job | last epoch | total | mIoU | pix acc |
|---|---|---|---|---|
| V1 PIL | 0:05:08 | 2:43:33 | 55.1 | 90.5 |
| V2 PIL | 0:05:17 | 2:48:04 | 55.2 | 90.7 |
| V2 datapoint | 0:05:35 | 2:55:18 | 55.2 | 90.4 |
| V2 tensor | 0:05:36 | 2:57:15 | 55.2 | 90.7 |

@pytorch-bot

pytorch-bot bot commented Mar 24, 2023

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/vision/7220

Note: Links to docs will display an error until the docs builds have been completed.

❌ 4 Failures

As of commit e4de74b:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@NicolasHug
Member Author

New results with @pmeier, following the investigations in #7494 and the improvements to the V2 dataset wrapper in #7488.

The V2 detection training references are much faster than V1 (~30%). As described in #7494, these improvements mostly don't come from the transforms but from the V2 dataset wrapper, which a) is much faster than the one in the current references, and b) doesn't return masks by default, so the V2 references skip the unnecessary work of transforming masks (which V1 still does).

Addressing b) in the current references is possible and is tracked in #7489. It would allow a more accurate speed comparison between the V1 and V2 transforms.
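For illustration, here is a minimal Python sketch of the idea behind b): filter each target dict down to the keys the model needs before any transform sees it, so masks are never transformed. It assumes a dataset that returns `(image, target_dict)` pairs. `DEFAULT_KEYS`, `drop_unused_targets` and `FilteredTargets` are hypothetical names. They are not torchvision API and not the change tracked in #7489.

```python
from typing import Any, Dict, Iterable, Sequence, Tuple

# Hypothetical: keys a detection model consumes. "masks" is deliberately
# absent, so masks never reach a downstream transform pipeline.
DEFAULT_KEYS = ("image_id", "boxes", "labels")


def drop_unused_targets(
    sample: Tuple[Any, Dict[str, Any]], keep: Iterable[str] = DEFAULT_KEYS
) -> Tuple[Any, Dict[str, Any]]:
    """Return (image, target) keeping only the requested target keys."""
    image, target = sample
    wanted = set(keep)
    return image, {k: v for k, v in target.items() if k in wanted}


class FilteredTargets:
    """Wrap an indexable (image, target) dataset and filter targets lazily."""

    def __init__(self, dataset: Sequence, keep: Iterable[str] = DEFAULT_KEYS):
        self.dataset = dataset
        self.keep = tuple(keep)

    def __len__(self) -> int:
        return len(self.dataset)

    def __getitem__(self, idx: int):
        # Filter here, so transforms applied to the result never see masks.
        return drop_unused_targets(self.dataset[idx], self.keep)


# Toy usage: a one-sample "dataset" whose target includes a mask entry.
toy = [("img0", {"image_id": 0, "boxes": [[0, 0, 10, 10]], "labels": [1], "masks": "<large mask>"})]
image, target = FilteredTargets(toy)[0]
print(sorted(target))  # ['boxes', 'image_id', 'labels']
```

Filtering lazily in `__getitem__` keeps the underlying dataset untouched, so callers that do need masks can pass a different `keep` set.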

SSD lite

(pt) ➜  detection git:(main) ✗ PYTHONPATH=$PYTHONPATH:$(pwd) python -u ~/slurm/run_with_submitit.py --ngpus 8 --nodes 1 --dataset coco --model ssdlite320_mobilenet_v3_large --epochs 660 \
    --aspect-ratio-group-factor 3 --lr-scheduler cosineannealinglr --lr 0.15 --batch-size 24 \
    --weight-decay 0.00004 --data-augmentation ssdlite --data-path /datasets01_ontap/COCO/022719

V1 PIL comes from main 5b07d6c, while the V2 runs come from e4de74b.

| job | last epoch | total | AP 0.5:0.95 | AP 0.5 | AP 0.75 |
|---|---|---|---|---|---|
| V1 PIL | 0:03:19 | 2 days, 0:48:03 | 0.212 | 0.341 | 0.221 |
| V2 PIL | 0:02:10 | 1 day, 12:38:20 | 0.211 | 0.341 | 0.221 |
| V2 datapoint | 0:02:13 | 1 day, 12:57:52 | 0.210 | 0.340 | 0.221 |
| V2 tensor | 0:02:16 | 1 day, 12:36:16 | 0.211 | 0.343 | 0.221 |
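To see where the ~30% speedup comes from in these SSD lite numbers, here is a small Python sketch (not part of the references) that converts the `last epoch` and `total` columns to seconds and reports how much shorter each V2 run is than V1 PIL. `to_seconds` is a hypothetical helper and assumes the `H:MM:SS` / `N day(s), H:MM:SS` format used in the table.

```python
def to_seconds(text: str) -> int:
    """Convert 'H:MM:SS' or 'N day(s), H:MM:SS' to a number of seconds."""
    days = 0
    if "," in text:
        day_part, text = text.split(",")
        days = int(day_part.split()[0])
    hours, minutes, seconds = (int(x) for x in text.strip().split(":"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


# (last epoch, total) for each job, copied from the SSD lite table.
rows = {
    "V1 PIL": ("0:03:19", "2 days, 0:48:03"),
    "V2 PIL": ("0:02:10", "1 day, 12:38:20"),
    "V2 datapoint": ("0:02:13", "1 day, 12:57:52"),
    "V2 tensor": ("0:02:16", "1 day, 12:36:16"),
}

v1_epoch, v1_total = (to_seconds(t) for t in rows["V1 PIL"])
reductions = {}
for job, (epoch, total) in rows.items():
    if job == "V1 PIL":
        continue
    # Fraction of wall-clock time saved relative to V1 PIL.
    epoch_cut = 1 - to_seconds(epoch) / v1_epoch
    total_cut = 1 - to_seconds(total) / v1_total
    reductions[job] = (epoch_cut, total_cut)
    print(f"{job:>13}: last epoch {epoch_cut:.1%} shorter, total {total_cut:.1%} shorter")
```

The V2 runs come out about 32 to 35% shorter per epoch and about 24 to 25% shorter in total wall-clock time, which brackets the ~30% figure.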
