Update low-level functional transforms with value_range #5502
I agree with the idea of passing value_range. Concerning using single values vs a single MinMax vs multi MinMax, I think it all boils down to what our target is. Do we see TorchVision offering native support for transforms on more exotic non-RGB/Grayscale spaces? Perhaps that could be useful for medical or astronomical applications, but we should be careful not to overcomplicate the API with features we won't use. Are transforms that use value_range needed beyond those niche cases? I would like to get the advice of you and @pmeier on this one for how we proceed.
Apart from making the implicit ranges more explicit, what other problems would having value_range solve? I agree that the way we currently handle ranges isn't nice (read: not documented enough). But from my non-informed POV, it looks like adding value_range would bring a fair amount of extra complexity to the API.
@NicolasHug Thanks for the feedback, I think we have a description/context gap on this issue. Instead of jumping directly to the problem, we should provide info on what we are trying to solve and why. The current API uses ToTensor not only to convert the image to a Tensor but also to scale it to [0, 1] (unlike PILToTensor, which keeps the original uint8 values). This is somewhat problematic because ToTensor does too many things. As a result of this implicit scaling, the low-level kernels try to guess the max pixel value of the image from its dtype. Though we will maintain this guessing for BC, providing a value_range argument would make the assumption explicit.
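A minimal illustration of the implicit scaling described above (the specific tensor values assume a plain Pillow RGB image; this is an editorial sketch, not code from the thread):

```python
import torch
from PIL import Image
from torchvision import transforms

img = Image.new("RGB", (2, 2), color=(255, 0, 0))

as_float = transforms.ToTensor()(img)     # float32, values rescaled to [0, 1]
as_uint8 = transforms.PILToTensor()(img)  # uint8, original values in [0, 255]

print(as_float.dtype, as_float.max())  # torch.float32 tensor(1.)
print(as_uint8.dtype, as_uint8.max())  # torch.uint8 tensor(255, dtype=torch.uint8)
```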
Summarizing an internal discussion about this: apart from Keras, all other frameworks also make assumptions on the value range of images. Of all the approaches we found, we think our approach, which is also used by Albumentations, is the most reasonable: we infer the assumed maximum value from the image dtype. We could provide a utility function that returns the assumed value range for a given dtype; this could also be used to document the assumption in one place.
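A sketch of what such a utility could look like (the name assumed_value_range and the [0, 1] convention for floats are illustrative assumptions, not an existing torchvision API):

```python
from typing import Tuple, Union

import torch

def assumed_value_range(dtype: torch.dtype) -> Tuple[Union[int, float], Union[int, float]]:
    """Return the (min, max) range the low-level kernels currently assume for a dtype."""
    if dtype.is_floating_point:
        # floating-point images are assumed to live in [0, 1]
        return 0.0, 1.0
    # integer images are assumed to start at 0 and span up to the dtype max
    return 0, torch.iinfo(dtype).max
```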
If you check our current code you will see a TODO for replacing the hand-rolled _max_value helper with torch.iinfo once torchscript supports it. Do you know whether torch.iinfo or torch.finfo is scriptable yet? @vfdev-5 I wonder if you can help us with the above?
No, neither torch.iinfo nor torch.finfo is scriptable yet.
@vfdev-5 Thanks for checking. Is it possible to simplify the _max_value implementation some other way?
I think the reason is that torchscript does not support globals. We could, however, do something like:

```python
def convert_image_dtype(image: torch.Tensor, dtype: torch.dtype = torch.float) -> torch.Tensor:
    # ...
    _max_values: Dict[torch.dtype, int] = {
        torch.uint8: 255,
        torch.int8: 127,
        torch.int16: int(2 ** 15),
        torch.int32: int(2 ** 31),
        torch.int64: int(2 ** 63),
        ...
    }
    input_max = _max_values[image.dtype]
```

@datumbox do you think it is worth a change? EDIT: updated 256 -> 255
I think that makes sense. There is a typo in the first record for torch.uint8: it should be 255, not 256.
For the integer types, there is an off-by-one error: the maximum representable value is 2**n - 1, not 2**n.

```python
>>> assert torch.iinfo(torch.int16).max == 2**15 - 1
```
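For reference, a corrected version of the dictionary with every hard-coded value cross-checked against torch.iinfo (an editorial sketch, not the code that was eventually merged):

```python
import torch

_max_values = {
    torch.uint8: 255,
    torch.int8: 127,
    torch.int16: 2 ** 15 - 1,
    torch.int32: 2 ** 31 - 1,
    torch.int64: 2 ** 63 - 1,
}

# sanity check: each entry matches the dtype's true maximum
for dtype, max_value in _max_values.items():
    assert torch.iinfo(dtype).max == max_value
```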
* Removed _max_value method and added a dictionary. Related to #5502
* Addressed failing tests and restored _max_value method
* Added xfailing test to switch quicker
* Switch to if/else impl
I think we should close this for now.
Following #5500 (comment) we may want to update the low-level functional transforms with a value_range argument to avoid the implicit, hard-coded max range definition. Today this is done for:

- adjust_hue (vision/torchvision/transforms/functional_tensor.py, Lines 210 to 211 in 95d4189)
- all ops using _blend (vision/torchvision/transforms/functional_tensor.py, Lines 259 to 262 in 95d4189)

We can introduce a new value_range argument and use it explicitly for these ops (a rough sketch follows below). In general, we can think of value_range as a (min, max) tuple, which would cover the majority of imagery where channel ranges are similar. There could, however, be other types of images (e.g. non-RGB color spaces or specialized imagery) where the value range varies per channel, so we may need to represent value_range as a list of 2-tuples: [(min_1, max_1), (min_2, max_2), ...]

cc @bjuncek @datumbox
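To make the proposal concrete, here is a rough sketch of what a value_range-aware kernel could look like. The signature, the None default, and the fallback bound are illustrative assumptions, not the final design; the fallback mimics the current dtype-based guess and the clamping mimics _blend:

```python
from typing import Optional, Tuple

import torch

def adjust_brightness(
    img: torch.Tensor,
    brightness_factor: float,
    value_range: Optional[Tuple[float, float]] = None,
) -> torch.Tensor:
    if value_range is None:
        # backward-compatible fallback: guess the range from the dtype
        bound = 1.0 if img.is_floating_point() else 255.0
        value_range = (0.0, bound)
    low, high = value_range
    # brightness adjustment is a blend with a black image, clamped to the range
    return (img * brightness_factor).clamp(low, high).to(img.dtype)
```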