Adding support of Video to remaining Transforms and Kernels #6724
Conversation
From #6667 (comment): "Plus, why are we only supporting images and videos in …?"

@pmeier Yes, we should add MixUp/CutMix. The PR is still in progress; I'll update it with more commits soon. Concerning FiveCrop/TenCrop: these are usually inference transforms used in classification. As discussed previously, supporting detection/segmentation would require updating the labels/bboxes, not just the images, so we would need a BC break to support them. Extending support to them would require additional discussion. This PR aims to ensure that V2 supports all Video-compatible kernels that exist in V1.

@pmeier As discussed offline, to unblock other pending PRs and reduce conflicts, we'll add MixUp/CutMix in a follow-up.
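For context, a MixUp-style augmentation blends each sample in a batch with a randomly chosen partner and mixes the one-hot labels with the same coefficient. A minimal, torchvision-independent sketch of the idea (the function names and list-based "tensors" below are illustrative, not the V2 API):

```python
import random

def mixup_pair(x1, x2, lam):
    # Convex combination of two samples; real code would blend image or
    # video tensors, here we use flat lists of floats for illustration.
    return [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]

def mixup_batch(batch, one_hot_labels, alpha=0.2):
    # Pair each sample with a shuffled partner, then mix inputs and labels
    # with the SAME lambda -- that coupling is the essence of MixUp.
    lam = random.betavariate(alpha, alpha)
    partners = list(range(len(batch)))
    random.shuffle(partners)
    mixed_x = [mixup_pair(batch[i], batch[j], lam) for i, j in enumerate(partners)]
    mixed_y = [mixup_pair(one_hot_labels[i], one_hot_labels[j], lam) for i, j in enumerate(partners)]
    return mixed_x, mixed_y
```

Because each mixed label is a convex combination of one-hot rows, every mixed label still sums to 1.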
Some minor comments inline. Otherwise LGTM if CI is green.
```python
... def forward(self, sample: Tuple[Tuple[Union[features.Image, features.Video], ...], features.Label]):
...     images_or_videos, labels = sample
...     batch_size = len(images_or_videos)
...     images_or_videos = features.Image.wrap_like(images_or_videos[0], torch.stack(images_or_videos))
```
Suggested change:

```diff
- images_or_videos = features.Image.wrap_like(images_or_videos[0], torch.stack(images_or_videos))
+ image_or_video = images_or_videos[0]
+ images_or_videos = type(image_or_video).wrap_like(image_or_video, torch.stack(images_or_videos))
```
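The point of the suggestion is that hard-coding `features.Image.wrap_like` would mislabel a batch of videos as an image, while dispatching on `type(...)` preserves the subclass. A minimal, torchvision-independent sketch of the pattern (the `Feature`/`Image`/`Video` classes below are hypothetical stand-ins for the prototype feature classes):

```python
class Feature:
    def __init__(self, data):
        self.data = data

    @classmethod
    def wrap_like(cls, other, data):
        # Wrap raw data as feature type `cls` (simplified stand-in for the
        # prototype wrap_like classmethod).
        return cls(data)

class Image(Feature): pass
class Video(Feature): pass

videos = [Video([1, 2]), Video([3, 4])]
first = videos[0]
stacked = [v.data for v in videos]  # stand-in for torch.stack

wrong = Image.wrap_like(first, stacked)        # always an Image: mislabels videos
right = type(first).wrap_like(first, stacked)  # dispatches on the input's type

print(type(wrong).__name__, type(right).__name__)  # Image Video
```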
Sorry, I force-pushed and lost this change. Let me reapply it.
```python
) -> Tuple[
    features.ImageOrVideoType,
    features.ImageOrVideoType,
    features.ImageOrVideoType,
    features.ImageOrVideoType,
    features.ImageOrVideoType,
]:
```
Feel free to ignore if mypy is happy. This is not accurate: we don't have `Tuple[features.ImageOrVideoType, ...]` here, but rather `Union[Tuple[features.ImageType, ...], Tuple[features.VideoType, ...]]`. Meaning, the type will not vary inside the returned tuple; we either get a tuple of images or a tuple of videos.
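The distinction matters because `Tuple[Union[Image, Video], ...]` admits mixed tuples, whereas `Union[Tuple[Image, ...], Tuple[Video, ...]]` forces homogeneity. A small runtime illustration (the `Image`/`Video` classes and the checker are stand-ins, not torchvision code):

```python
from typing import Tuple, Union

class Image: pass
class Video: pass

# Admits mixed tuples such as (Image(), Video()):
Mixed = Tuple[Union[Image, Video], ...]

# Homogeneous: all images or all videos, never a mix -- which matches what
# the crop transforms actually return.
Homogeneous = Union[Tuple[Image, ...], Tuple[Video, ...]]

def is_homogeneous(t):
    # Runtime check corresponding to the Homogeneous annotation.
    return len({type(x) for x in t}) <= 1

print(is_homogeneous((Image(), Image())))  # True
print(is_homogeneous((Image(), Video())))  # False
```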
This is why, IMO, we should avoid `features.ImageOrVideoType` and instead define it as `Union[Image, Video]`. Since this will be fixed in a follow-up, there is no point messing with mypy (which is happy) here. I'll implement this in a follow-up.
```python
@@ -55,6 +55,10 @@ def get_spatial_size_image_pil(image: PIL.Image.Image) -> List[int]:
    return [height, width]


# TODO: Should we have get_spatial_size_video here? How about masks/bbox etc? What is the criterion for deciding when
```
`get_spatial_size` should apply to everything, right? That was the whole reason we extracted it out: bounding boxes and masks can provide this information, while `num_channels` is reserved for images and videos.
I've already updated `get_spatial_size` to handle all inputs.

I think you are trying to answer a different question from the one I'm asking here. What I think we should discuss is whether there should be specific kernels for each type, unrelated to whether the dispatcher can handle everything. We already have kernels (like `erase_video`) that aren't necessarily used in the dispatcher. So here I'm asking: what should the convention be for providing kernels for individual types?
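As a sketch of why `get_spatial_size` can cover every type while `num_channels` cannot: all of these features carry an (H, W), but only images and videos have a channel dimension. The classes and dispatcher below are hypothetical stand-ins, not the torchvision implementation:

```python
class Image:
    def __init__(self, h, w, channels=3):
        self.spatial_size, self.num_channels = (h, w), channels

class Video(Image): pass

class BoundingBox:
    def __init__(self, coords, h, w):
        # Boxes know the size of the canvas they live on, but have no channels.
        self.coords, self.spatial_size = coords, (h, w)

class Mask:
    def __init__(self, h, w):
        self.spatial_size = (h, w)

def get_spatial_size(inpt):
    # Every feature type can report its spatial size...
    return list(inpt.spatial_size)

def get_num_channels(inpt):
    # ...but channels only make sense for images and videos.
    if not isinstance(inpt, Image):
        raise TypeError(f"no channel dimension for {type(inpt).__name__}")
    return inpt.num_channels

print(get_spatial_size(BoundingBox([0, 0, 5, 5], 32, 64)))  # [32, 64]
```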
Ah, sorry, yes I was confused. That is a good question and I don't have an answer for it yet. My gut says that we should stay consistent and provide the kernels just as we do for the other transforms.
Same feeling here. I'll leave the TODO for the follow-up. I think we can answer this on the PR where we switch `image_size` to `spatial_size`.
force-pushed from ef29a77 to 9cbec19
force-pushed from c9e876b to 11bc8c2
```python
tmp = (inpt.wrap_like(inpt, item) for item in output)  # type: ignore[arg-type]
output = tmp  # type: ignore[assignment]
```
Sorry for the delay on merging. JIT was driving me nuts:

```
RuntimeError: expected type comment but found 'ident' here:
output = (inpt.wrap_like(inpt, item) for item in output)  # type: ignore[assignment,arg-type]
                                                          ~~~~~~ <--- HERE
```

Something breaks if the ignore comment contains both `assignment` and `arg-type` directives. I suspect JIT might be looking for an `assignment` directive and completely breaks when we apply the mypy workaround. This idiom passes both JIT and mypy.
…6724)

Summary:
* Adding support of Video to missed Transforms and Kernels
* Fixing Grayscale Transform.
* Fixing FiveCrop and TenCrop Transforms.
* Fix Linter
* Fix more kernels.
* Add `five_crop_video` and `ten_crop_video` kernels
* Added a TODO.
* Missed Video isinstance
* nits
* Fix bug on AugMix
* Nits and TODOs.
* Reapply Philip's recommendation
* Fix mypy and JIT
* Fixing test

Reviewed By: NicolasHug
Differential Revision: D40427468
fbshipit-source-id: e7f699aee86b80ea3f614dc4e09ae1aaf22fc37d
Builds upon #6667

This PR ensures that the following have support for Video:

* Transforms: `Grayscale`, `RandomGrayscale`, `FiveCrop`, `TenCrop`, `ConvertImageDtype`
* Dispatchers: `rgb_to_grayscale`, `get_image_size`, `five_crop`, `ten_crop`
* New kernels: `five_crop_video`, `ten_crop_video`

Pending: `MixUp` and `CutMix`, which might require additional thought on how to implement them.
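The video variants of the crop kernels fall out naturally because the crops act only on the spatial dims, so a video kernel can reuse the image kernel per frame (a tensor implementation would just slice the trailing H, W dims of a (T, C, H, W) tensor). A hedged, list-based sketch; the function names mirror but are not the actual torchvision kernels:

```python
def crop(img, top, left, h, w):
    # img is a 2D grid (list of rows); slice out an h x w window.
    return [row[left:left + w] for row in img[top:top + h]]

def five_crop_image(img, size):
    # Four corner crops plus a center crop, as in the classic FiveCrop.
    H, W = len(img), len(img[0])
    h = w = size
    tl = crop(img, 0, 0, h, w)
    tr = crop(img, 0, W - w, h, w)
    bl = crop(img, H - h, 0, h, w)
    br = crop(img, H - h, W - w, h, w)
    center = crop(img, (H - h) // 2, (W - w) // 2, h, w)
    return tl, tr, bl, br, center

def five_crop_video(video, size):
    # Apply the image kernel frame by frame, then regroup so the result is
    # one entry per crop position, each holding all frames.
    crops = [five_crop_image(frame, size) for frame in video]
    return tuple(list(pos) for pos in zip(*crops))

# A "video" of 2 frames, each a 4x4 grid.
video = [[[r * 10 + c for c in range(4)] for r in range(4)] for _ in range(2)]
tl, tr, bl, br, center = five_crop_video(video, 2)
print(len(tl), len(tl[0]), len(tl[0][0]))  # 2 2 2 (2 frames of 2x2 crops)
```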