
Add training reference for optical flow models #5027


Merged: 9 commits into pytorch:main on Dec 7, 2021

Conversation

NicolasHug (Member) commented Dec 3, 2021

Towards #4644

This PR adds a training reference for optical flow models (mostly just RAFT at the moment), as well as utilities for evaluating the model on Sintel (epe, 1px, 3px, 5px) or Kitti (per-image-epe, F1).
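For reference, epe is the mean end-point error (the Euclidean distance between predicted and ground-truth flow vectors), and the Npx numbers are the fraction of pixels with an EPE below N. A minimal sketch of the computation (the helper name and signature are illustrative, not the exact code in this PR):

```python
import torch

def sintel_metrics(flow_pred, flow_gt, valid_mask=None):
    # flow_pred, flow_gt: (N, 2, H, W) flow fields; valid_mask: optional (N, H, W) bool
    # End-point error: Euclidean distance between predicted and true flow vectors
    epe = torch.sum((flow_pred - flow_gt) ** 2, dim=1).sqrt()
    if valid_mask is not None:
        epe = epe[valid_mask]
    return {
        "epe": epe.mean().item(),
        "1px": (epe < 1).float().mean().item(),  # fraction of pixels with EPE below 1
        "3px": (epe < 3).float().mean().item(),
        "5px": (epe < 5).float().mean().item(),
    }
```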

Right now, the training script assumes that both CUDA and DDP are available. It must be run with torchrun, e.g.

torchrun --nproc_per_node 8 --nnodes 1 references/optical_flow/train.py --batch-size 10 --train-dataset chairs --val-dataset kitti sintel

Our custom run_with_submitit.py script is also partially supported (but it's not as useful anyway, because the training procedure involves training on more than one dataset).

I left a few TODO comments as potential future improvements.

CC @fmassa @datumbox @haooooooqi


@NicolasHug added the module: reference scripts and other labels on Dec 3, 2021
facebook-github-bot commented Dec 3, 2021

💊 CI failures summary and remediations

As of commit 2ef1af5 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI.

@NicolasHug mentioned this pull request on Dec 3, 2021


def sequence_loss(flow_preds, flow_gt, valid_flow_mask, gamma=0.8, max_flow=400):
"""Loss function defined over sequence of flow predictions"""
NicolasHug (Member Author) commented:

This loss function is very RAFT-specific, because it assumes the model outputs a series of predictions, instead of a single predicted flow.
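Concretely, RAFT's sequence loss is an exponentially weighted sum of per-iteration L1 losses, with the last (most refined) prediction weighted highest. A rough sketch of the idea, not the exact code in this diff:

```python
import torch

def sequence_loss_sketch(flow_preds, flow_gt, valid_flow_mask, gamma=0.8, max_flow=400):
    # flow_preds: list of (N, 2, H, W) predictions, one per refinement iteration.
    # Exclude pixels whose ground-truth flow magnitude is unreasonably large.
    flow_norm = torch.sum(flow_gt ** 2, dim=1).sqrt()
    valid = valid_flow_mask & (flow_norm < max_flow)

    n = len(flow_preds)
    loss = 0.0
    for i, pred in enumerate(flow_preds):
        weight = gamma ** (n - i - 1)  # final prediction weighted most heavily
        abs_diff = (pred - flow_gt).abs().sum(dim=1)  # L1 over the 2 flow channels
        loss = loss + weight * abs_diff[valid].mean()
    return loss
```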

import torch.nn.functional as F


class SmoothedValue:
NicolasHug (Member Author) commented:

This and the MetricLogger below have been copy/pasted from the classification references. I only made some very minor changes, like setting some defaults, so it's probably not worth reviewing.
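For context, the helper boils down to something like this simplified sketch (window-based running statistics; the real version in the references also synchronizes across processes):

```python
from collections import deque

class SmoothedValueSketch:
    """Tracks a windowed average over recent updates plus a global average."""

    def __init__(self, window_size=20):
        self.deque = deque(maxlen=window_size)
        self.total = 0.0
        self.count = 0

    def update(self, value, n=1):
        self.deque.append(value)
        self.total += value * n
        self.count += n

    @property
    def avg(self):
        # Average over the last `window_size` updates only
        return sum(self.deque) / len(self.deque)

    @property
    def global_avg(self):
        return self.total / self.count
```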

fmassa (Member) left a comment:

Thanks for the PR!

I've made a few comments, all of which can be addressed in follow-up PRs.

In particular, I think we can split the loss up a bit more, so that parts of it can be re-used across train / val.

Comment on lines +252 to +253
torch.save(model.state_dict(), Path(args.output_dir) / f"{args.name}_{current_epoch}.pth")
torch.save(model.state_dict(), Path(args.output_dir) / f"{args.name}.pth")
fmassa (Member) commented:

Should we also save the optimizer and the scheduler so that we can resume training? This is what we do in the other reference scripts
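For reference, the resume pattern in the other reference scripts looks roughly like this (a sketch; model, optimizer, scheduler, args, and epoch are assumed to come from the surrounding training loop):

```python
from pathlib import Path

import torch

# Saving: bundle everything needed to resume, not just the weights.
# (model, optimizer, scheduler, args, epoch assumed from the training loop.)
checkpoint = {
    "model": model.state_dict(),
    "optimizer": optimizer.state_dict(),
    "scheduler": scheduler.state_dict(),
    "epoch": epoch,
    "args": args,
}
torch.save(checkpoint, Path(args.output_dir) / f"{args.name}_{epoch}.pth")

# Resuming later:
checkpoint = torch.load(args.resume, map_location="cpu")
model.load_state_dict(checkpoint["model"])
optimizer.load_state_dict(checkpoint["optimizer"])
scheduler.load_state_dict(checkpoint["scheduler"])
start_epoch = checkpoint["epoch"] + 1
```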

A contributor commented:

We should definitely do this. I think it's worth refactoring the script to have the same functionality and structure as the other reference scripts. Moreover, we will need to link the reference scripts with the model prototype and add the --weights feature switch.

@NicolasHug do you mind creating an issue for all of the above so that we don't forget?

NicolasHug (Member Author) commented:

Opened #5056

Comment on lines +32 to +34
# As future improvement, we could probably be using a distributed sampler here
# The distribution is S(.71), T(.135), K(.135), H(.02)
return 100 * sintel + 200 * kitti + 5 * hd1k + things_clean
fmassa (Member) commented:

Ok with me. So you added support for __mul__ in those datasets?
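For anyone wondering how 100 * sintel + 200 * kitti can work: one way is to define __rmul__ on a dataset base class as repeated concatenation, since torch.utils.data.Dataset already provides __add__ via ConcatDataset. A sketch under that assumption (FlowDatasetSketch is hypothetical, not necessarily what this PR does):

```python
from torch.utils.data import ConcatDataset, Dataset

class FlowDatasetSketch(Dataset):
    """Hypothetical base class showing how `5 * dataset` could be supported."""

    def __init__(self, samples):
        self.samples = samples

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        return self.samples[idx]

    def __rmul__(self, v):
        # `v * dataset` concatenates `v` copies of the dataset
        return ConcatDataset([self] * v)

if __name__ == "__main__":
    a = FlowDatasetSketch(list(range(3)))
    b = FlowDatasetSketch(list(range(5)))
    mixed = 2 * a + b  # Dataset.__add__ gives a ConcatDataset
    print(len(mixed))  # 11
```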

@NicolasHug merged commit 4dd8b5c into pytorch:main on Dec 7, 2021
facebook-github-bot pushed a commit that referenced this pull request Dec 9, 2021
Reviewed By: NicolasHug

Differential Revision: D32950938

fbshipit-source-id: 0f271d45026c821c109493d9aa7f404b5373012d
Labels: ciflow/default, cla signed, module: reference scripts, other