Skip to content

Added SceneFLow variant datasets #6345

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Aug 3, 2022
Merged

Added SceneFLow variant datasets #6345

merged 4 commits into from
Aug 3, 2022

Conversation

TeodorPoncu
Copy link
Contributor

This is a continuation of the PR split (#6311, #6269) which contains the SceneFlow dataset variants.

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @TeodorPoncu , just a few comments but it looks great already

"final": "frames_finalpass",
}

num_examples = 1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
num_examples = 1


Args:
root (string): Root directory where SceneFlow is located.
split (string): Which dataset variant to user, "FlyingThings3D" (default), "Monkaa" or "Driving".
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following up on our discussoin in #6269 (comment), it looks like we don't really need to expose this dataset. There's no obvious use-case yet, so I'm happy to drop it.

If really we want to allow users to control this, I think we should just call this parameter variant instead of split, as split is usually reserved for train / test / val. But perhaps the dataset authors have a dedicated name already?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the CREStereo paper, they have a vague formulation from which one could deduce that they are using ALL mentioned datasets (about 6 synthetic datasets if I recall correctly).

I agree that variant might be a better naming scheme.

class SceneFlowStereoTestCase(datasets_utils.ImageDatasetTestCase):
DATASET_CLASS = datasets.SceneFlowStereo
ADDITIONAL_CONFIGS = datasets_utils.combinations_grid(
split=("FlyingThings3D", "Driving", "Monkaa"), pass_name=("clean", "final")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
split=("FlyingThings3D", "Driving", "Monkaa"), pass_name=("clean", "final")
split=("FlyingThings3D", "Driving", "Monkaa"), pass_name=("clean", "final", "both")

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @TeodorPoncu LGTM, just one final question regarding the variant parameter


root = Path(root) / "SceneFlow"

verify_str_arg(variant, "variant", valid_values=("FlyingThings3D", "Driving", "Monkaa"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused: I thought we were relying on all variants for the training? Do we want to allow "all" here as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, in the Optical Flow datasets we directly provide access to FlyingThings3D. Some authors use all the variants, or just a subset of them. An "all" split could work as well, however if a user requires some sort of variant combination I guess they could opt for ConcatDataset, without requiring us to provide a way of handling variant permutations.

If I recall correctly we settled for removing variants for the CREStereo Dataset since the only existing use-case is to use it with all its variants (#6351, #6269 (comment)).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I recall correctly we settled for removing variants for the CREStereo Dataset since the only existing use-case is to use it with all its variants (#6351, #6269 (comment)).

Is this the case here as well?

Honestly I'm fine with either at this point, I'm just surprised that we don't need "all" because this is what we needed for CREStereo

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could add an "all" split as well. As a user, generally my preference is to have access to a "smaller" version of a dataset for quick experimentation / validation. Similarly to the ImageNet vs. ImageNette scenario.

As far as I am aware RAFT-Stereo uses all 3 variants as well, however I do not have an exhaustive list to guarantee that this is a universally used approach.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, well let's just leave it as is and figure out whether not having it complicates things in the training reference. Thanks for the details

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For instance, this is a configuration example for a dataset chain schedule, similarly to how RAFT-Stereo is used, where the datasets are trained in sequential order, performing optimisation using samples from the dataset for the specified number of steps. Adding the "all" variant or removing the variant argument all-together would result in:

train_dataset: ["instereo-2k", "sintel", "sceneflow"]
dataset_steps: [200_000, 150_000, 100_000]

Having the current version would yield:

train_dataset: ["instereo-2k", "sintel", "flythings3d", "monkaa", "driving"]
dataset_steps: [200_000, 150_000, 33_000, 33_000, 33_000]

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @TeodorPoncu

@TeodorPoncu TeodorPoncu merged commit 96aa3d9 into main Aug 3, 2022
@github-actions
Copy link

github-actions bot commented Aug 3, 2022

Hey @TeodorPoncu!

You merged this PR, but no labels were added. The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py

facebook-github-bot pushed a commit that referenced this pull request Aug 23, 2022
Summary:
* added SceneFLow variant datasets

* Changed split name to variant name

* removed trailing commented code line

Reviewed By: datumbox

Differential Revision: D38824231

fbshipit-source-id: 14dc283f11df26287fe6446946b441f51eb82181
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants