[NOMERGE] Review new Transforms API #5500
Conversation
```diff
@@ -39,6 +39,7 @@ def __new__(
     def to_format(self, format: Union[str, BoundingBoxFormat]) -> BoundingBox:
         # TODO: this is useful for developing and debugging but we should remove or at least revisit this before we
         # promote this out of the prototype state
+        # Do we still need this?
```
Need, no. Want, yes. I actually want to keep them after this is promoted out of prototype. It is a convenient way to work with the features outside of a training pipeline. Compare, for example,

```python
from torchvision.prototype import features, transforms

transform = transforms.ConvertBoundingBoxFormat(format="xyxy")
bounding_box = features.BoundingBox(...)
xyxy_bounding_box = transform(bounding_box)
```

or

```python
from torchvision.prototype import features
from torchvision.prototype.transforms import functional as F

bounding_box = features.BoundingBox(...)
xyxy_bounding_box = features.BoundingBox.new_like(
    bounding_box,
    F.convert_bounding_box_format(bounding_box, old_format=bounding_box.format, new_format="xyxy"),
    format="xyxy",
)
```

to

```python
from torchvision.prototype import features

bounding_box = features.BoundingBox(...)
xyxy_bounding_box = bounding_box.to_format("xyxy")
```

Still, I agree, the lazy import needs to go. My proposal is to move the actual conversion code from `transforms.functional` to a private function like `_convert_bounding_box_format` in this module. We could use it here in `to_format` and simply expose it in `transforms.functional`.

This comment applies to all other comments that ask the same.
I agree that the verbosity of option 2 is an indication that we need a utility. It doesn't necessarily mean that the method should be part of `BoundingBox`, so let's explore options. Note that we should avoid the use of strings in favour of their enums in our examples and internal code to get better static analysis.

This comment applies to all other comments that ask the same.
The two visualization methods `show()` and `draw_bounding_box()` on `Image` are not the same case, though. We should remove them, as we don't necessarily want to push PIL on the new API.
> Note that we should avoid the use of strings in favour of their enums in our examples and internal code to get better static analysis.

Internally I agree 100%, but do we actually want to discourage users from using them in application code? `"xyxy"` is more concise than `BoundingBoxFormat.XYXY` and potentially needs one less import.
We have previously made the decision to move with enums (you can see this on multiple APIs including io, multi-weights etc.). For consistency we should continue this here.

Where we agree is that people should be allowed to pass strings if they want to. I want to discourage them from doing so by offering all the examples and internal code using enums, but if they choose to pass strings, so be it. I just don't want to promote this usage. Note that this is true for virtually all places where we have enums.
As discussed offline, quietly supporting strings is still a good idea, but for everything else like documentation or examples we should use the enums.
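For illustration, the two spellings side by side, in a sketch reusing `ConvertBoundingBoxFormat` from the example above (and assuming `BoundingBoxFormat` is exposed under `features`, as the diff suggests):

```python
from torchvision.prototype import features, transforms
from torchvision.prototype.features import BoundingBoxFormat

# Documented / internal style: enums, for better static analysis.
transform = transforms.ConvertBoundingBoxFormat(format=BoundingBoxFormat.XYXY)

# Quietly supported for application code: plain strings.
transform = transforms.ConvertBoundingBoxFormat(format="xyxy")
```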
```python
# On io, we currently support Alpha transparency for Grayscale and RGB. We also support palette for PNG.
# Our enum must know about these. We should also allow for users to strip out the alpha transparency which is
# currently not supported by any colour transform.
```
> We also support palette for PNG.

Could you add a link for that? Otherwise, we are talking about this enum, correct?

`vision/torchvision/io/image.py`, lines 16 to 31 in `95d4189`:

```python
class ImageReadMode(Enum):
    """
    Support for various modes while reading images.

    Use ``ImageReadMode.UNCHANGED`` for loading the image as-is,
    ``ImageReadMode.GRAY`` for converting to grayscale,
    ``ImageReadMode.GRAY_ALPHA`` for grayscale with transparency,
    ``ImageReadMode.RGB`` for RGB and ``ImageReadMode.RGB_ALPHA`` for
    RGB with transparency.
    """

    UNCHANGED = 0
    GRAY = 1
    GRAY_ALPHA = 2
    RGB = 3
    RGB_ALPHA = 4
```
I'll send a PR to include `RGBA` and `GRAYSCALE_ALPHA` in the enum and also provide conversion functions for them.
Looking into this, conversion from RGBA to RGB is more complicated. For a proper conversion we need to know the background color. We could assume white which will probably work for most cases, but we probably should discuss this first.
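For reference, a minimal sketch of the alpha compositing this would involve, assuming float images with values in `[0, 1]`, channels-first layout, and the white-background assumption above; the helper name is hypothetical:

```python
import torch

def rgba_to_rgb(image: torch.Tensor, background: float = 1.0) -> torch.Tensor:
    # Hypothetical helper: composite an RGBA image over a solid background
    # color. background=1.0 corresponds to assuming white.
    rgb, alpha = image[..., :3, :, :], image[..., 3:, :, :]
    return rgb * alpha + background * (1.0 - alpha)
```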
Palettes are supported if the user reads with `IMAGE_READ_MODE_UNCHANGED`. The outcome will be an image with a single channel that has integer IDs for every colour. This is ideal for reading masks that have integer IDs for the classes.

Effectively, palettes look like grayscale images, but with their values being IDs on the 0-255 scale. Given they can be useful only for niche applications, I don't feel strongly about supporting them on this API.
Paletted images can also be used to store segmentation masks, e.g. VOC.
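To illustrate the workflow described above (the path is a placeholder):

```python
import torch
from torchvision.io import read_image, ImageReadMode

# Reading a paletted PNG (e.g. a VOC segmentation mask) unchanged keeps the
# palette indices: a single-channel uint8 tensor of per-pixel class IDs.
mask = read_image("path/to/voc_mask.png", mode=ImageReadMode.UNCHANGED)
print(torch.unique(mask))  # the class IDs present in the mask
```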
```python
# 1. Deprecated methods which have equivalents on the new API (_legacy.py?):
# - Grayscale, RandomGrayscale: use ConvertImageColorSpace
```
👍
```python
# - Pad, RandomCrop, RandomHorizontalFlip, RandomVerticalFlip, RandomPerspective, FiveCrop, TenCrop, ColorJitter,
#   RandomRotation, RandomAffine, GaussianBlur, RandomInvert, RandomPosterize, RandomSolarize, RandomAdjustSharpness,
#   RandomAutocontrast, RandomEqualize, LinearTransformation (must be added)
```
👍 For everything but `FiveCrop` and `TenCrop`. They are outliers in the sense that they take one input and produce multiple outputs. This kind of structure would be suited a lot better to a datapipe than to a transform.

This is also reinforced by the fact that these transforms can only be put at the end of a pipeline with the old API, given that the output type is not compatible with all the other transforms. Even then, using such a transform would most likely need a custom collate function.

I couldn't find any usage in our reference scripts. Does someone know how exactly they are used, so I can propose something more concrete here?
The problem is that if we want to roll out the API with BC support, we must include implementations for all previous methods and offer alternatives for those that would be placed in a deprecated state. Though I understand the challenge, excluding these methods is not OK, and we should flag if others like these exist.

The specific transforms are used to produce multiple crops for inference in classification. It is a common trick (especially in the early days of the ImageNet competition) to boost the accuracy of the model by averaging the responses over multiple crops. You can see an example of how this is used here.
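For reference, roughly how the multi-crop trick is typically wired up with the old API (a sketch following the pattern from the `FiveCrop` docstring; the model is just a stand-in):

```python
import torch
import torch.nn as nn
from PIL import Image
from torchvision import transforms
from torchvision.transforms import functional as F

# FiveCrop returns a tuple of 5 crops; stack them into a single tensor.
transform = transforms.Compose([
    transforms.FiveCrop(224),
    transforms.Lambda(lambda crops: torch.stack([F.to_tensor(c) for c in crops])),
])

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 10))  # stand-in classifier
batch = transform(Image.new("RGB", (256, 256))).unsqueeze(0)  # (1, 5, 3, 224, 224)

bs, ncrops, c, h, w = batch.shape
logits = model(batch.view(-1, c, h, w))               # (bs * ncrops, num_classes)
avg_logits = logits.view(bs, ncrops, -1).mean(dim=1)  # average over the crops
```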
```python
# - PILToTensor, ToPILImage (_legacy.py?)
```
Shouldn't we also have a more general `ConvertImageType` transform here that takes one of `"pil"`, `"vanilla_tensor"`, or `"feature"` as input? That would follow the same approach that we have for the other conversions.
Here is what I mean. Let's say we are close to releasing the prototype and we have achieved a significant level of BC. On day 0, we will need a PILToTensor to exist. These are likely to be in a deprecated state, with warnings informing the user that this functionality will be removed and they should use X. Your proposal seems like a valid X alternative.

So here I think we need to figure out 2 things:

- What we do with these legacy classes, and provide their implementations on the new API (or aliases). Make sure that the classes continue to work along with the other non-deprecated classes in existing pipelines.
- Provide an alternative class that offers the new functionality in the new API.
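For concreteness, here is a rough sketch of what such a transform could look like. Everything here is hypothetical: the class name and the output-type spellings come from the comment above, and it assumes `features.Image` can wrap a plain tensor, as in the earlier examples:

```python
from typing import Union

import PIL.Image
import torch
from torchvision.prototype import features
from torchvision.transforms import functional as F

class ConvertImageType(torch.nn.Module):
    def __init__(self, output_type: str) -> None:
        super().__init__()
        self.output_type = output_type  # "pil", "vanilla_tensor", or "feature"

    def forward(self, image: Union[PIL.Image.Image, torch.Tensor]):
        if self.output_type == "pil":
            return image if isinstance(image, PIL.Image.Image) else F.to_pil_image(image)
        tensor = F.pil_to_tensor(image) if isinstance(image, PIL.Image.Image) else image
        if self.output_type == "vanilla_tensor":
            return tensor.as_subclass(torch.Tensor)  # strip any feature subclass
        return features.Image(tensor)  # self.output_type == "feature"
```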
```python
# Question: Why do we have both this and a ToDtype Transform? Is this due to BC? Ideally we could move people off
# from methods that did an implicit normalization of the values (like this one, or to_tensor). cc @vfdev-5
# If that's the case, we should move to _legacy and add deprecation warnings from day one to push people to use
# the new methods.
```
One idea that @datumbox and I discussed a while back is to abolish the close correspondence between the dtype and the value range for images. Right now we assume the range `[0, 1]` for floating point images and `[0, torch.iinfo(dtype).max]` for integral images. A possible solution would be to have a `value_range` metadata field on the `features.Image` class. It can be set with the defaults from above, so for a regular use case the user won't feel any difference.

For example, let's say we have a uint8 image:

```python
import torch
from torchvision.prototype import features

image = features.Image(torch.randint(0, 256, (3, 128, 128), dtype=torch.uint8))
```

By default, `print(image.value_range)` would print `(0, 255)`. If we now convert the dtype with `image = image.to(torch.float32)`, printing the value range again would give `(0.0, 255.0)` rather than the `(0.0, 1.0)` that happens through `ConvertImageDtype`.

This means all transformations that implicitly rely on the value range, like `Normalize`, need to be adapted to use the explicit value instead, but we could get rid of `ConvertImageDtype` altogether.
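A rough sketch of what carrying that metadata could look like (entirely hypothetical; a real `features.Image` would also need to propagate `value_range` through operations like `.to()`, which this toy version does not do):

```python
from typing import Optional, Tuple

import torch

class Image(torch.Tensor):
    value_range: Tuple[float, float]

    def __new__(cls, data, *, value_range: Optional[Tuple[float, float]] = None):
        image = torch.as_tensor(data).as_subclass(cls)
        if value_range is None:
            # Default to today's dtype-based convention.
            if image.is_floating_point():
                value_range = (0.0, 1.0)
            else:
                value_range = (0, torch.iinfo(image.dtype).max)
        image.value_range = value_range
        return image

image = Image(torch.randint(0, 256, (3, 128, 128), dtype=torch.uint8))
print(image.value_range)  # (0, 255)
```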
I absolutely love this and 100% agree we should do it. I've previously asked @vfdev-5 to create an issue and see which methods of the low-level kernels should be expanded to receive a `max_value`. We can write this in a BC manner on the low-level kernels: if the `max_value` is not passed, we do the previous behaviour. Tensors will continue making this assumption, but Images shouldn't.

I feel this is a critical part of the proposal that needs to be investigated prior to implementing all the other transforms. This is probably one of the few remaining uninvestigated bits of the proposal.
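A sketch of what that BC-friendly kernel signature could look like, using invert as an example (the extra `max_value` parameter is hypothetical):

```python
from typing import Optional

import torch

def invert_image_tensor(image: torch.Tensor, max_value: Optional[float] = None) -> torch.Tensor:
    # BC-friendly sketch: if max_value is not passed, fall back to the current
    # dtype-based assumption; an Image caller would pass its explicit range.
    if max_value is None:
        max_value = 1.0 if image.is_floating_point() else float(torch.iinfo(image.dtype).max)
    return (max_value - image).to(image.dtype)
```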
FYI #5502
```diff
@@ -63,3 +69,13 @@
 )
 from ._misc import normalize_image_tensor, gaussian_blur_image_tensor
 from ._type_conversion import decode_image_with_pil, decode_video_with_av, label_to_one_hot
+
+# What are the migration plans for public methods without new API equivalents? There are two categories:
```
@pmeier Reminder for the migration plan on these as well. It might already be on your radar as part of the Transform Classes.
Adding comments for the proposed new Transforms API as a whole. Some of the comments need to be addressed now, others are FYI, and others can be tackled by opening issues to track the work.

There are a few low-hanging-fruit improvements that can be done immediately in separate PRs without much discussion. Others definitely require more thought and discussion, so for these I propose having a chat offline and then summarizing here.

cc @pmeier @vfdev-5