RAFT training reference Improvement #5590

YosuaMichael · 2022-03-11T00:30:59Z

Addresses some of the tasks in #5056:

Rename the function validate to evaluate, so it is similar to other references
Support --device and enable training in non-distributed mode
Include the optimizer and scheduler in the checkpoint
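
As a rough illustration of the checkpoint item, here is a minimal sketch of bundling the optimizer and scheduler state together with the model weights. The helper names (`unwrap_ddp`, `build_checkpoint`, `restore_checkpoint`) and the dictionary keys are assumptions made for this example, not the names used in train.py. The arguments only need to expose `state_dict()` and `load_state_dict()`, as PyTorch modules, optimizers, and LR schedulers do.

```python
def unwrap_ddp(model):
    """Return the wrapped module if `model` has a `.module` attribute.

    DistributedDataParallel stores the real model under `.module`; a plain
    model is returned unchanged. (Assumption: the model itself has no
    unrelated attribute named `module`.)
    """
    return getattr(model, "module", model)


def build_checkpoint(model, optimizer, scheduler, epoch):
    """Collect everything needed to resume training into one dict."""
    return {
        "model": unwrap_ddp(model).state_dict(),
        "optimizer": optimizer.state_dict(),
        "scheduler": scheduler.state_dict(),
        "epoch": epoch,
    }


def restore_checkpoint(checkpoint, model, optimizer, scheduler):
    """Load the saved state in place and return the epoch to resume from."""
    unwrap_ddp(model).load_state_dict(checkpoint["model"])
    optimizer.load_state_dict(checkpoint["optimizer"])
    scheduler.load_state_dict(checkpoint["scheduler"])
    # The saved epoch has already been completed, so resume at the next one.
    return checkpoint["epoch"] + 1
```

In a PyTorch script the dictionary would typically be written with `torch.save(checkpoint, path)` and read back with `torch.load(path, map_location="cpu")` before calling the restore helper.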

Sample command to run in non-distributed mode on CPU:

 python train.py \
    --dataset-root $dataset_root \
    --name $name_chairs \
    --model raft_small \
    --train-dataset chairs \
    --batch-size 2 \
    --lr 0.0004 \
    --weight-decay 0.0001 \
    --epochs 2 \
    --output-dir $out_chairs \
    --device cpu

To test on CPU, I ran on a mock dataset by replacing https://github.com/pytorch/vision/blob/main/torchvision/datasets/_optical_flow.py with https://gist.github.com/YosuaMichael/9c49729243ff9d467ece06ab8641680d.

Note that, for now, running in distributed mode with torchrun requires --device cuda.
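
The constraint above could be enforced with a small check at startup. This is only a sketch under stated assumptions: the helper names `is_distributed` and `resolve_device` are hypothetical, the `cuda` default for `--device` is assumed, and distributed mode is detected by looking for the `RANK` and `WORLD_SIZE` environment variables that torchrun sets. The actual train.py may detect and handle this differently.

```python
import argparse
import os


def is_distributed(env=None):
    """Assumption: a torchrun launch is recognised by RANK and WORLD_SIZE being set."""
    env = os.environ if env is None else env
    return "RANK" in env and "WORLD_SIZE" in env


def resolve_device(device, env=None):
    """Return `device`, rejecting anything other than cuda in distributed mode."""
    if is_distributed(env) and device != "cuda":
        raise ValueError(
            f"Distributed mode requires --device cuda, got --device {device}"
        )
    return device


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Device selection sketch")
    # Assumed default; the real script may use a different one.
    parser.add_argument("--device", default="cuda", help="device to train on, e.g. cuda or cpu")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(resolve_device(args.device))
```

Launched as a plain `python` process with `--device cpu`, the check passes; launched under torchrun with the same flag, it raises before any training starts.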

facebook-github-bot · 2022-03-11T00:31:07Z

💊 CI failures summary and remediations

As of commit 0e7ab27 (more details on the Dr. CI page):

💚 💚 Looks good so far! There are no failures yet. 💚 💚

NicolasHug

Thank you for the PR @YosuaMichael. There are 2 minor issues (see below), but otherwise this looks great!

references/optical_flow/train.py

YosuaMichael · 2022-03-11T15:03:12Z

Update:

Support saving the optimizer and scheduler in the checkpoint.

YosuaMichael · 2022-03-11T15:05:27Z

Hi @NicolasHug, I decided to include the commit for saving the optimizer and scheduler in this PR as well: 09d78d1
Could you review this too? Thanks!

NicolasHug

Thanks @YosuaMichael, we're almost there :). I made a few comments below.

references/optical_flow/train.py

NicolasHug

Thanks @YosuaMichael, nice work! There was one minor issue left, which I fixed in 2857e21: when no train set is specified, we want to go directly to evaluate without touching train_dataset. The previous code would fail because train_dataset is None.
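
The control flow described in that fix can be sketched roughly as follows. The `run` function and its `train` and `evaluate` callables are hypothetical stand-ins for this example; only the early return when `train_dataset` is None mirrors the comment above.

```python
from types import SimpleNamespace


def run(args, train, evaluate):
    """Hypothetical driver: evaluate only when no training dataset is given."""
    if args.train_dataset is None:
        # Evaluation-only run: skip every step that would read train_dataset.
        evaluate(args)
        return "evaluated"
    train(args)
    return "trained"


if __name__ == "__main__":
    calls = []
    args = SimpleNamespace(train_dataset=None)
    run(args, train=lambda a: calls.append("train"), evaluate=lambda a: calls.append("evaluate"))
    print(calls)
```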

github-actions · 2022-03-15T10:20:28Z

Hey @NicolasHug!

You merged this PR, but no labels were added. The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py

Summary:
* Change optical flow train.py function name from validate to evaluate so it is similar to other references
* Add --device as parameter and enable to run in non distributed mode
* Format with ufmt
* Fix unneccessary param and bug
* Enable saving the optimizer and scheduler on the checkpoint
* Fix bug when evaluate before resume and save or load model without ddp
* Fix case where --train-dataset is None

(Note: this ignores all push blocking failures!)

Reviewed By: YosuaMichael
Differential Revision: D35216768
fbshipit-source-id: 3b575d9f4a51caed920ff402e160a26ff6f3c2d4
Co-authored-by: Nicolas Hug <[email protected]>

YosuaMichael added 2 commits March 10, 2022 21:03

Change optical flow train.py function name from validate to evaluate so it is similar to other references

1e00873

Add --device as parameter and enable to run in non distributed mode

0eab0bf

pytorch-bot bot added the ciflow/default label Mar 11, 2022

facebook-github-bot added the cla signed label Mar 11, 2022

Format with ufmt

7502076

NicolasHug reviewed Mar 11, 2022

YosuaMichael added 4 commits March 11, 2022 14:13

Fix unneccessary param and bug

2276b22

Merge branch 'main' into raft-reference-improvement

38f5a9c

Enable saving the optimizer and scheduler on the checkpoint

09d78d1

Merge branch 'raft-reference-improvement' of github.com:YosuaMichael/vision into raft-reference-improvement

3a4d43a

YosuaMichael changed the title ~~Enable RAFT training reference to run on cpu and non-distributed mode~~ RAFT training reference Improvement Mar 11, 2022

NicolasHug reviewed Mar 14, 2022

YosuaMichael and others added 2 commits March 14, 2022 23:05

Fix bug when evaluate before resume and save or load model without ddp

83a09cd

Fix case where --train-dataset is None

2857e21

NicolasHug approved these changes Mar 15, 2022

Merge branch 'main' into raft-reference-improvement

0e7ab27

NicolasHug merged commit 3aa2a93 into pytorch:main Mar 15, 2022

NicolasHug mentioned this pull request Mar 15, 2022

Follow-up improvements to RAFT training reference #5056

Closed

7 tasks

NicolasHug added the enhancement and module: reference scripts labels Mar 15, 2022

NicolasHug mentioned this pull request Mar 21, 2022

Minor updates to optical flow ref for consistency #5654

Merged
