Skip to content

Adding support of Video to remaining Transforms and Kernels #6724

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Oct 10, 2022

Conversation

datumbox
Copy link
Contributor

@datumbox datumbox commented Oct 7, 2022

Builds upon #6667

This PR ensures that the following have support for Video:

  • Updated Transforms: Grayscale, RandomGrayscale, FiveCrop, TenCrop, ConvertImageDtype
  • Updated Dispatchers: rgb_to_grayscale, get_image_size, five_crop, ten_crop
  • New Kernels: five_crop_video, ten_crop_video

Pending: MixUp and CutMix that might require additional thought on how to implement them.

@datumbox datumbox marked this pull request as draft October 7, 2022 15:32
@datumbox datumbox changed the title [WIP] Adding support of Video to missed Transforms and Kernels [WIP] Adding support of Video to remaining Transforms and Kernels Oct 7, 2022
@pmeier
Copy link
Collaborator

pmeier commented Oct 10, 2022

From #6667 (comment) only MixUp and CutMix are missing. Since there has been no discussion on the first PR, maybe we can do it here? Do we want those transforms for videos as well?

Plus, why are we only supporting images and videos in {five, ten}_crop? If we support more than images, i.e. making them more regular, shouldn't we also support bounding boxes and masks?

@datumbox
Copy link
Contributor Author

@pmeier Yes we should add Mixup/Cutmix. The PR is still in progress, I'll update it with more commits soon.

Concerning Five/Ten crop, this is usually an inference transform used in Classification. As discussed previously, supporting Detection/Segmentation will require updating the labels/bboxes not just the images. Thus we would need a BC break to support them. Extending support to them would require additional discussions. This PR aims to ensure we support all existing Video compatible kernels in V2 as we do in V1.

@datumbox
Copy link
Contributor Author

@pmeier As discussed offline, to unblock other pending PRs and reduce conflicts, we'll add MixUp/CutMix on a follow up.

@datumbox datumbox requested a review from pmeier October 10, 2022 09:16
@datumbox datumbox marked this pull request as ready for review October 10, 2022 09:17
@datumbox datumbox changed the title [WIP] Adding support of Video to remaining Transforms and Kernels Adding support of Video to remaining Transforms and Kernels Oct 10, 2022
Copy link
Collaborator

@pmeier pmeier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments inline. Otherwise LGTM if CI is green.

... def forward(self, sample: Tuple[Tuple[Union[features.Image, features.Video], ...], features.Label]):
... images_or_videos, labels = sample
... batch_size = len(images_or_videos)
... images_or_videos = features.Image.wrap_like(images_or_videos[0], torch.stack(images_or_videos))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
... images_or_videos = features.Image.wrap_like(images_or_videos[0], torch.stack(images_or_videos))
... image_or_video = images_or_videos[0]
... images_or_videos = type(image_or_video).wrap_like(image_or_video, torch.stack(images_or_videos))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I forced pushed and lost this change. Let me reapply.

Comment on lines +183 to +189
) -> Tuple[
features.ImageOrVideoType,
features.ImageOrVideoType,
features.ImageOrVideoType,
features.ImageOrVideoType,
features.ImageOrVideoType,
]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to ignore if mypy is happy

This is not accurate. We don't have Tuple[features.ImageOrVideoType, ...] here, but rather Union[Tuple[features.ImageType],...], Tuple[features.VideoType, ...]]. Meaning, the type will not vary inside the returned tuple. We either get a tuple of images or a tuple of videos.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is why IMO we should avoid features.ImageOrVideoType and instead define it as Union[Image, Video]. Since this will be fixed in a follow up, there is no point messing with mypy (which is happy) here. I'll implement this on a follow up.

@@ -55,6 +55,10 @@ def get_spatial_size_image_pil(image: PIL.Image.Image) -> List[int]:
return [height, width]


# TODO: Should we have get_spatial_size_video here? How about masks/bbox etc? What is the criterion for deciding when
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_spatial_size should apply to everything, right? That was the whole reason we have extracted it out, because bounding boxes and masks can provide this information, while num_channels is reserved for images and videos.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've already updated get_spatial_size to handle all inputs.

I think you are trying to answer a different question from what I ask here. What I think we should discuss is whether there should be specific kernels for each type, unrelated to whether the dispatcher can handle everything. We already have kernels (like erase_video) that aren't necessarily used in the dispatcher. So here I'm asking, what should the convention over providing kernels for individual types should be.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, sorry, yes I was confused. That is a good question and I don't have an answer for it yet. My gut says that we should stay consistent and provide the kernels just as we do for the other transforms.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same feeling here. I'll leave the TODO for the follow up. I think we can answer this on the PR where we switch image_size to spatial_size

@datumbox datumbox force-pushed the prototype/video_corrections branch from ef29a77 to 9cbec19 Compare October 10, 2022 09:31
@datumbox datumbox force-pushed the prototype/video_corrections branch from c9e876b to 11bc8c2 Compare October 10, 2022 10:31
Comment on lines 1423 to 1424
tmp = (inpt.wrap_like(inpt, item) for item in output) # type: ignore[arg-type]
output = tmp # type: ignore[assignment]
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay on merging. JIT was driving nuts:

RuntimeError: expected type comment but found 'ident' here:
output = (inpt.wrap_like(inpt, item) for item in output)  # type: ignore[assignment,arg-type]
~~~~~~ <--- HERE

Something breaks if ignore contains both assignment and arg-type directives. I suspect JIT might be looking for an assignment directive and completely breaks when we apply the mypy workaround. This idiom passes both JIT and mypy.

@datumbox datumbox merged commit a3fe870 into pytorch:main Oct 10, 2022
@datumbox datumbox deleted the prototype/video_corrections branch October 10, 2022 11:40
facebook-github-bot pushed a commit that referenced this pull request Oct 17, 2022
…6724)

Summary:
* Adding support of Video to missed Transforms and Kernels

* Fixing Grayscale Transform.

* Fixing FiveCrop and TenCrop Transforms.

* Fix Linter

* Fix more kernels.

* Add `five_crop_video` and `ten_crop_video` kernels

* Added a TODO.

* Missed Video isinstance

* nits

* Fix bug on AugMix

* Nits and TODOs.

* Reapply Philip's recommendation

* Fix mypy and JIT

* Fixing test

Reviewed By: NicolasHug

Differential Revision: D40427468

fbshipit-source-id: e7f699aee86b80ea3f614dc4e09ae1aaf22fc37d
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants