[NOMERGE] Review new Transforms API #5500
@@ -17,6 +17,9 @@ class ColorSpace(StrEnum):
    OTHER = 0
    GRAYSCALE = 1
    RGB = 3
    # On io, we currently support alpha transparency for grayscale and RGB. We also support palette for PNG.
    # Our enum must know about these. We should also allow users to strip out the alpha transparency, which is
    # currently not supported by any color transform.
Comment on lines +20 to +22:

Could you add a link for that? Otherwise, we are talking about this enum, correct? (torchvision/io/image.py, lines 16 to 31 @ 95d4189) I'll send a PR to include the …

Looking into this, conversion from RGBA to RGB is more complicated. For a proper conversion we need to know the background color. We could assume white, which will probably work for most cases, but we should probably discuss this first.

Palettes are supported if the user reads with …. Effectively, palettes look like grayscale images, but with their values being ids on the 0-255 scale. Given they can be useful only for niche applications, I don't feel strongly about supporting them on this API.

Paletted images can also be used to store segmentation masks, e.g. VOC.
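To make the RGBA-to-RGB point concrete, here is a minimal sketch, assuming float images in [0, 1] and a white background; the function name and layout are illustrative only and not part of this PR:

```python
import torch

def rgba_to_rgb(image: torch.Tensor, background: float = 1.0) -> torch.Tensor:
    # Alpha-composite a (..., 4, H, W) float RGBA image over a constant
    # background; 1.0 corresponds to white. Returns a (..., 3, H, W) RGB image.
    rgb, alpha = image[..., :3, :, :], image[..., 3:, :, :]
    return rgb * alpha + background * (1.0 - alpha)
```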

class Image(_Feature):
@@ -78,9 +81,11 @@ def guess_color_space(data: torch.Tensor) -> ColorSpace:
    def show(self) -> None:
        # TODO: this is useful for developing and debugging but we should remove or at least revisit this before we
        # promote this out of the prototype state
        # Do we still need this?
        to_pil_image(make_grid(self.view(-1, *self.shape[-3:]))).show()

    def draw_bounding_box(self, bounding_box: BoundingBox, **kwargs: Any) -> Image:
        # TODO: this is useful for developing and debugging but we should remove or at least revisit this before we
        # promote this out of the prototype state
        # Do we still need this?
        return Image.new_like(self, draw_bounding_boxes(self, bounding_box.to_format("xyxy").view(-1, 4), **kwargs))
@@ -12,3 +12,14 @@
from ._misc import Identity, Normalize, ToDtype, Lambda
from ._presets import CocoEval, ImageNetEval, VocEval, Kinect400Eval, RaftEval
from ._type_conversion import DecodeImage, LabelToOneHot

# What are the migration plans for classes without new API equivalents? There are two categories:
# 1. Deprecated methods which have equivalents on the new API (_legacy.py?):
#    - Grayscale, RandomGrayscale: use ConvertImageColorSpace
Comment on lines +17 to +18:

👍
# 2. Those without equivalents on the new API:
#    - Pad, RandomCrop, RandomHorizontalFlip, RandomVerticalFlip, RandomPerspective, FiveCrop, TenCrop, ColorJitter,
#      RandomRotation, RandomAffine, GaussianBlur, RandomInvert, RandomPosterize, RandomSolarize, RandomAdjustSharpness,
#      RandomAutocontrast, RandomEqualize, LinearTransformation (must be added)
Comment on lines +20 to +22:

👍 For everything but …. This is also reinforced by the fact that these transforms can only be put at the end of any pipeline with the old API, given that their output type is not compatible with all the other transforms. Even then, using such a transform would most likely need a custom collate function. I couldn't find any usage in our reference scripts. Does someone know how exactly they are used, so I can propose something more concrete here?

The problem is that if we want to roll out the API with BC support, we must include implementations for all previous methods plus offer alternatives for those that would be placed in a deprecated state. Though I understand the challenge, excluding these methods is not OK, and we should flag if others like these exist. The specific transforms are used to produce multiple crops for inference in classification. It is a common trick (especially in the early days of the ImageNet competition) to boost the accuracy of the model by averaging the responses of multiple crops. You can see an example of how this is used here.
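To illustrate the multi-crop trick described above, a minimal sketch using the stable torchvision API; `model` is a placeholder for any classifier:

```python
import torch
from torchvision import transforms

# Produce ten crops (four corners, center, and their horizontal flips) and
# stack them into a single batch.
preprocess = transforms.Compose([
    transforms.TenCrop(224),
    transforms.Lambda(lambda crops: torch.stack([transforms.ToTensor()(c) for c in crops])),
])

def predict(model, pil_image):
    crops = preprocess(pil_image)  # (10, C, H, W)
    with torch.no_grad():
        logits = model(crops)      # (10, num_classes)
    return logits.mean(dim=0)      # average the responses over the crops
```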
#    - PILToTensor, ToPILImage (_legacy.py?)

Shouldn't we also have a more general …?

Here is what I mean. Let's say we are close to releasing the prototype and we have achieved a significant level of BC. On day 0, we will need a PILToTensor to exist. These are likely to be in a deprecated state, with warnings informing the user that this functionality is removed and that they should use X. Your proposal seems like a valid X alternative. So here I think we need to figure out 2 things: …
#    - ToTensor (deprecate vfdev-5?)
# We need a plan for both categories implemented on the new API.
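One way to read the `_legacy.py` suggestion above: keep the deprecated class importable but have it delegate to its replacement. A rough sketch; the `ConvertImageColorSpace` import path and constructor signature are assumptions based on this diff, not confirmed API:

```python
import warnings
from torchvision.prototype import features
from torchvision.prototype.transforms import ConvertImageColorSpace

# Hypothetical _legacy.py shim: Grayscale keeps working but forwards to the
# new-API transform and nudges users toward it.
class Grayscale(ConvertImageColorSpace):
    def __init__(self) -> None:
        warnings.warn("Grayscale is deprecated; use ConvertImageColorSpace instead.", DeprecationWarning)
        super().__init__(color_space=features.ColorSpace.GRAYSCALE)
```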
@@ -34,6 +34,10 @@ def __init__(
        if p < 0 or p > 1:
            raise ValueError("Random erasing probability should be between 0 and 1")
        # TODO: deprecate p in favor of wrapping the transform in a RandomApply
        # The above approach is very composable but will lead to verbose code. Instead, we can create a base class
        # that inherits from Transform (say RandomTransform) that receives the `p` on the constructor and by default
        # implements the `p` random check on forward. This is likely to happen in the final clean-ups, so perhaps
        # update the comment to indicate accordingly OR create an issue to track this discussion.
        self.p = p
        self.scale = scale
        self.ratio = ratio
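A rough sketch of the `RandomTransform` base class floated in the comment above; the class name and the `Transform.forward` contract are assumptions about the prototype API, not settled design:

```python
import torch
from typing import Any

class RandomTransform(Transform):
    # Hypothetical base class: subclasses implement the actual transformation;
    # the `p` check lives here so it is not repeated in every random transform.
    def __init__(self, p: float = 0.5) -> None:
        super().__init__()
        if not (0.0 <= p <= 1.0):
            raise ValueError("Probability p should be between 0 and 1")
        self.p = p

    def forward(self, *inputs: Any) -> Any:
        # With probability 1 - p, pass the sample through unchanged.
        if float(torch.rand(())) >= self.p:
            return inputs if len(inputs) > 1 else inputs[0]
        return super().forward(*inputs)
```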
@@ -84,6 +88,8 @@ def _get_params(self, sample: Any) -> Dict[str, Any]:
                break
        else:
            i, j, h, w, v = 0, 0, img_h, img_w, image
            # FYI: Now that we are not JIT-scriptable, I can probably avoid copy-pasting the image onto itself in
            # this scenario. Perhaps a simple clone would do.
Comment on lines +91 to +92:

Yeah, there is major cleanup coming up. A lot of the annotations on the transforms are either wrong or too strict to appease torchscript. We should revisit before we promote out of prototype. There is also a user request for this: #5398
        return dict(zip("ijhwv", (i, j, h, w, v)))
@@ -95,6 +101,7 @@ def _transform(self, input: Any, params: Dict[str, Any]) -> Any:
            return features.Image.new_like(input, output)
        elif isinstance(input, torch.Tensor):
            return F.erase_image_tensor(input, **params)
        # FYI: we plan to add support for PIL, as part of Batteries Included
        else:
            return input
@@ -112,6 +119,8 @@ def __init__(self, *, alpha: float) -> None:
        self._dist = torch.distributions.Beta(torch.tensor([alpha]), torch.tensor([alpha]))

    def _get_params(self, sample: Any) -> Dict[str, Any]:
        # Question: Shall we enforce here the existence of Labels in the sample? If yes, this method of validating
        # input won't work if _get_params() becomes public and the user is able to provide params in forward.
        return dict(lam=float(self._dist.sample(())))

    def _transform(self, input: Any, params: Dict[str, Any]) -> Any:
@@ -134,6 +143,7 @@ def __init__(self, *, alpha: float) -> None:
        self._dist = torch.distributions.Beta(torch.tensor([alpha]), torch.tensor([alpha]))

    def _get_params(self, sample: Any) -> Dict[str, Any]:
        # Question: Same as above for Labels.
        lam = float(self._dist.sample(()))

        image = query_image(sample)
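To make the label-validation question concrete, a standalone sketch; `query_label` is a hypothetical helper by analogy with `query_image`, and the sample layout is an assumption:

```python
import torch
from typing import Any, Dict, Optional

def query_label(sample: Any) -> Optional[Any]:
    # Hypothetical helper, by analogy with query_image: return the label
    # carried by the sample, or None if there is none.
    if isinstance(sample, dict):
        return sample.get("label")
    return None

def get_mixup_params(dist: torch.distributions.Beta, sample: Any) -> Dict[str, Any]:
    # Enforce that a label is present before drawing the mixing coefficient.
    if query_label(sample) is None:
        raise TypeError("MixUp requires a label in the sample")
    return dict(lam=float(dist.sample(())))
```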
@@ -4,7 +4,7 @@
import torch
from torchvision.prototype import features
from torchvision.prototype.transforms import Transform, functional as F
-from torchvision.transforms.functional import convert_image_dtype
+from torchvision.transforms.functional import convert_image_dtype  # We should have our own alias for this on the new API


class ConvertBoundingBoxFormat(Transform):
@@ -23,6 +23,10 @@ def _transform(self, input: Any, params: Dict[str, Any]) -> Any:


class ConvertImageDtype(Transform):
    # Question: Why do we have both this and a ToDtype Transform? Is this due to BC? Ideally we could move people off
    # from methods that did an implicit normalization of the values (like this one, or to_tensor). cc @vfdev-5
    # If that's the case, we should move to _legacy and add deprecation warnings from day one to push people to use
    # the new methods.
Comment on lines +26 to +29:

One idea that @datumbox and I discussed a while back is to abolish the close correspondence between the dtype and the value range for images. Right now we assume the range …. For example, let's say we have a uint8 image:

```python
import torch
from torchvision.prototype import features

image = features.Image(torch.randint(0, 256, (3, 128, 128), dtype=torch.uint8))
```

By default …. This means all transformations that implicitly rely on the value range, like ….

I absolutely love this and 100% agree we should do it. I've previously asked @vfdev-5 to create an issue and see which methods of the low-level kernels should be expanded to receive a max_value. We can write this in a BC manner on the low-level kernels; if the …. I feel this is a critical part of the proposal that needs to be investigated prior to implementing all other transforms. This is probably one of the few remaining uninvestigated bits of the proposal.

FYI #5502
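A sketch of what a max_value-aware kernel could look like, staying BC by deriving the default from the dtype; the signature and kernel name are assumptions for discussion, not actual torchvision API:

```python
import torch
from typing import Optional

def adjust_brightness(image: torch.Tensor, factor: float, max_value: Optional[float] = None) -> torch.Tensor:
    # If max_value is not given, fall back to the dtype-derived range, which
    # reproduces the current behavior (BC). Passing it explicitly decouples
    # the value range from the dtype.
    if max_value is None:
        max_value = 255.0 if image.dtype == torch.uint8 else 1.0
    return (image * factor).clamp_(0.0, max_value).to(image.dtype)
```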
    def __init__(self, dtype: torch.dtype = torch.float32) -> None:
        super().__init__()
        self.dtype = dtype
@@ -59,7 +63,7 @@ def _transform(self, input: Any, params: Dict[str, Any]) -> Any:
            return features.Image.new_like(input, output, color_space=self.color_space)
        elif isinstance(input, torch.Tensor):
            if self.old_color_space is None:
-               raise RuntimeError("")
+               raise RuntimeError("")  # Add better exception message

            return F.convert_image_color_space_tensor(
                input, old_color_space=self.old_color_space, new_color_space=self.color_space
@@ -1,3 +1,9 @@
# We should create an issue that lists the steps that need to be performed for rolling out the API to main TorchVision.
# I created a similar one for the models, see here: https://github.com/pytorch/vision/issues/4679
# One of the key things we would need to take care of is that all the kernels below will need logging. This is because
# there will be no high-level kernel (like `F` on main) and we would instead need to put tracking directly in each
# low-level kernel, which will now be public (currently functional_pil|tensor are private).

from torchvision.transforms import InterpolationMode  # usort: skip
from ._meta import (
    convert_bounding_box_format,
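What per-kernel logging might look like; `torch._C._log_api_usage_once` is the existing PyTorch tracking hook, while the kernel shown here is just a stand-in:

```python
import torch

def horizontal_flip_image_tensor(image: torch.Tensor) -> torch.Tensor:
    # Each public low-level kernel logs its own usage, since there is no
    # high-level dispatcher left to do it centrally.
    torch._C._log_api_usage_once("torchvision.prototype.transforms.functional.horizontal_flip_image_tensor")
    return image.flip(-1)
```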
@@ -63,3 +69,13 @@
)
from ._misc import normalize_image_tensor, gaussian_blur_image_tensor
from ._type_conversion import decode_image_with_pil, decode_video_with_av, label_to_one_hot

# What are the migration plans for public methods without new API equivalents? There are two categories:
# 1. Deprecated methods which have equivalents on the new API (_legacy.py?):
#    - get_image_size, get_image_num_channels: use get_dimensions_image_tensor|pil
#    - to_grayscale, rgb_to_grayscale: use convert_image_color_space_tensor|pil
# 2. Those without equivalents on the new API:
#    - five_crop, ten_crop (must be added)
#    - pil_to_tensor, to_pil_image (_legacy.py?)
#    - to_tensor() (deprecate vfdev-5?)
# We need a plan for both categories implemented on the new API.

@pmeier Reminder for migration plan on these as well. Might be already on your radar as part of the Transform Classes.
Need, no. Want, yes. I actually want to keep them after this is promoted out of prototype. It is a convenient way to work with the features outside of a training pipeline. Compare for example … or … to ….

Still, I agree, the lazy import needs to go. My proposal is to move the actual conversion code from `transforms.functional` to a private function like `_convert_bounding_box_format` in this module. We could use it here in `to_format` and simply expose it in `transforms.functional`. This comment applies to all other comments that ask the same.

I agree that the verbosity of option 2 is an indication that we need a utility. It doesn't necessarily mean that the method should be part of BoundingBox, so let's explore options. Note that we should avoid the use of strings in favour of their enums in our examples and internal code to get better static analysis.

The two visualization methods `show()` and `draw_bounding_box()` on `Image` are not the same thing. We should remove them, as we don't want to necessarily push PIL on the new API.

Internally I agree 100%, but do we actually want to discourage users from using them in application code? `"xyxy"` is more concise than `BoundingBoxFormat.XYXY` and potentially needs one less import.

We have previously made the decision to move with enums (you can see this on multiple APIs including io, multi-weights, etc.). For consistency we should continue this here.

Where we agree is that people should be allowed to pass strings if they want to. I want to discourage them from doing so by offering all the examples and internal code using enums, but if they choose to pass strings, so be it. I just don't want to promote this usage. Note that this is true for virtually all places where we have enums.

As discussed offline, quietly supporting strings is still a good idea, but for everything else like documentation or examples we should use the enums.
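A sketch of the private-helper proposal above; the conversion body and the `BoundingBoxFormat` members shown are assumptions for illustration:

```python
import torch
from enum import Enum

class BoundingBoxFormat(Enum):
    XYXY = "xyxy"
    XYWH = "xywh"

def _convert_bounding_box_format(
    box: torch.Tensor, old_format: BoundingBoxFormat, new_format: BoundingBoxFormat
) -> torch.Tensor:
    # Only the xyxy <-> xywh pair is sketched here; box has shape (..., 4).
    if old_format == new_format:
        return box
    x1, y1, a, b = box.unbind(-1)
    if old_format == BoundingBoxFormat.XYXY:  # xyxy -> xywh
        return torch.stack((x1, y1, a - x1, b - y1), dim=-1)
    return torch.stack((x1, y1, x1 + a, y1 + b), dim=-1)  # xywh -> xyxy
```

Both `BoundingBox.to_format` and the public `transforms.functional.convert_bounding_box_format` could then delegate to this one helper, which addresses the lazy-import concern.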
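Continuing the sketch above, "quietly supporting strings" could mean coercing a string to the enum once at the public boundary, while documentation and internal code stick to the enum members:

```python
from typing import Union

def convert_bounding_box_format(
    box,
    old_format: Union[str, BoundingBoxFormat],
    new_format: Union[str, BoundingBoxFormat],
):
    # Accept "xyxy" as well as BoundingBoxFormat.XYXY, but normalize immediately
    # so everything downstream only ever sees the enum.
    if isinstance(old_format, str):
        old_format = BoundingBoxFormat(old_format.lower())
    if isinstance(new_format, str):
        new_format = BoundingBoxFormat(new_format.lower())
    return _convert_bounding_box_format(box, old_format=old_format, new_format=new_format)
```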