[RFC] How do we want to deal with images that include alpha channels? #5510
Comments
I personally would go with a combined approach:
I agree that for torchvision-supported datasets, if the alpha channel is not used later by a CNN, then we can strip it from the beginning. For example, as you say, since ImageNet can have RGBA images, it would be better to provide only RGB images, as is done right now in the stable API. One way I could think of is to have preset datapipe transforms that could be deactivated: `dp = datasets.load("imagenet", decode_presets=datasets.imagenet.decode_preset)`.

When you say that "image reading functions support RGBA images": is RGBA conventional, as in PIL's "RGBA" mode, or premultiplied, as in PIL's "RGBa" mode? (For the difference see https://shawnhargreaves.com/blog/premultiplied-alpha.html.)

Another question, …

Some thoughts on …

By the way, PIL handles the alpha channel correctly on transformations, so the impacted color transforms are only related to …

As for choosing …
I don't think that's the focus right now. Maybe in the long-term future, but definitely not something we plan to do anytime soon.
+1 on that. Given that our existing functional transforms have fill=0, does it make sense to go with black?
I might be pedantic here, but I want to be sure we are all on the same page: we cannot simply strip the alpha channel unless the alpha channel is all ones or 255s depending on the input type. If that is not the case, stripping the alpha channel has an effect on the actual color of the image as can be seen in the equation above.
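For a concrete, made-up numeric example (assuming the compositing formula from the top comment and a white background): a float pixel with value 0.2 and alpha 0.5 composites to 0.5 * 0.2 + (1 - 0.5) * 1.0 = 0.6, whereas simply dropping the alpha channel would leave it at 0.2.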
How would this approach know which image needs the special conversion? If we have a general approach like this, we also need to solve the conversion in a general sense.
I don't know. @datumbox has introduced the enum. Do you have any insights here?
Yes. For now, we only support what is listed in vision/torchvision/prototype/features/_image.py, lines 16 to 19 in 71d2bb0.
But if we decide how we want to handle RGBA we can of course add it.
That is already implemented; see vision/torchvision/prototype/transforms/_meta.py, lines 55 to 78 in 71d2bb0.
I agree with @datumbox here. While the API allows all these conversions, we shouldn't add anything here unless there is a specific user request. If we have other color spaces that are equivalent to RGB such as HSV or HSL, we definitely need a …
The alpha channel is different from a fill color. Choosing an all-white alpha (i.e. the maximum value) means no transparency, and thus the background is irrelevant. For all of our current datasets, RGBA images are outliers that are most likely not intended by the authors. They happen if the images are scraped from the internet and the person who uploaded them mistakenly included an alpha channel although there was no need for it. For example, this is the RGBA image from ImageNet (in thumbnail size to limit copyright implications) including the alpha channel:
Maybe I'm missing something important, but for me everything could just follow the same logic as the current API does with Pillow images: when an image is read and decoded, you'll have a color space. The prior info is that for ImageNet we provide RGB-only images. So, all images with non-…
That is the crucial point I was touching on in this issue. How are we going to implement the conversion from RGBA to RGB? Not all image files carry information about the background color in the first place. The image above is one of them. In such a case PIL falls back to Floyd-Steinberg dithering. Not only would we need to implement custom kernels for that, but dithering might also significantly reduce the image quality. Compare this test image
… or converted by the formula above assuming a white background.
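As a rough illustration of the two variants being compared (a hypothetical sketch with a made-up file name, not the code used to produce the images in the original comment):

```python
import PIL.Image

# Hypothetical input; any RGBA image with real transparency will do.
rgba = PIL.Image.open("test.png").convert("RGBA")

# Naive conversion: drops the alpha channel, so transparent pixels keep
# whatever RGB values they happen to carry.
naive = rgba.convert("RGB")

# Explicit compositing over a white canvas before dropping alpha, which is
# what "assuming a white background" means here.
canvas = PIL.Image.new("RGBA", rgba.size, (255, 255, 255, 255))
white_bg = PIL.Image.alpha_composite(canvas, rgba).convert("RGB")
```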
So the two key approaches discussed here are:
What is the approach used by CV2?
I'll dig into what PIL is exactly doing, because what I failed to realize is that they also seem to assume a white background for the dithering.
CV2: I'm not 100% sure, but my best guess is that white is also assumed. I've tracked it to this OpenCL call. There are multiple issues on the OpenCV repo, such as opencv/opencv#13135, that report that the conversion does not take alpha into account. I think the consensus is that one should not use …

skimage: skimage has an …

Thus, I think we are fairly safe if we also assume white and provide an option to set it manually if required.
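For reference, scikit-image already exposes exactly this kind of knob: its rgba2rgb conversion composites over a background that defaults to white but can be overridden. A minimal sketch with random data:

```python
import numpy as np
from skimage.color import rgba2rgb

# Float RGBA image with values in [0, 1]; in practice this would come from a decoder.
rgba = np.random.rand(32, 32, 4)

rgb_white = rgba2rgb(rgba)                        # background defaults to white
rgb_green = rgba2rgb(rgba, background=(0, 1, 0))  # explicit background color
```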
After some offline discussion, we decided to align with PIL for now. The only difference should be that we fail the transformation if the alpha channel is not at the maximum value everywhere. This way we can later implement the correct conversion, as detailed in my top comment, without worrying about BC.
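A minimal sketch of what that decision could look like for tensor images (hypothetical helper name and messages, not the actual prototype code):

```python
import torch

def rgba_to_rgb_strict(image: torch.Tensor) -> torch.Tensor:
    # Only allow the conversion when the alpha channel is at its maximum
    # everywhere, i.e. when dropping it cannot change any visible color.
    if image.shape[-3] != 4:
        raise ValueError(f"Expected an RGBA image with 4 channels, got {image.shape[-3]}")

    alpha = image[..., 3:, :, :]
    max_value = 255 if image.dtype == torch.uint8 else 1.0
    if not bool((alpha == max_value).all()):
        raise ValueError("Alpha channel is not trivial; refusing to strip it without a background color")

    return image[..., :3, :, :]
```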
@pmeier You mean the rgba2rgb method, https://github.com/python-pillow/Pillow/blob/c58d2817bc891c26e6b8098b8909c0eb2e7ce61b/src/libImaging/Convert.c#L443-L453, right? How do you see that it assumes a white background? The pixel size is 4 for RGB: https://github.com/python-pillow/Pillow/blob/95cff6e959bb3c37848158ed2145d49d49806a31/src/libImaging/Storage.c#L125-L129
In the PIL docs they say dithering is used for conversions like RGB to "P" or "1".
You are right. PIL also seems to only "strip" the alpha channel, effectively treating it as 255 everywhere:

```python
import PIL.Image
import torch

from torchvision.transforms.functional import pil_to_tensor

image = PIL.Image.open("text2.png")

a = pil_to_tensor(image)[:3, ...]
b = pil_to_tensor(image.convert("RGB"))

torch.testing.assert_close(a, b)
```

Thus, the behavior is aligned with OpenCV. This doesn't change the plan from #5510 (comment).
Isn't "strip" the alpha channel means assuming a black background? If I take a RGBA image with transparent background and call Image.convert(), it will become RGB with black background. Which is different from white background acchieved from alpha_composite(). |
@bsun0802 Could you share the image so I can take a look? |
Sorry, I was a little slow this morning. Note that what you call the "background color", i.e. the black region around the people, is not the same as the "background" we are talking about here; it is rather the foreground. In case there is transparency in the image, the background color is determined by the canvas you are drawing on. In your case that is the white of the notebook. If you put the same image on a green canvas, the background will be green. Meaning, you are using the alpha channel here to mask out the black background the image originally has. This issue is about what we want to do in case we are forced to convert from RGBA to RGB but don't have a canvas to draw on.
This discussion started in #5500 (comment) and @vfdev-5 and I continued offline.
PIL as well as our image reading functions support RGBA images (see vision/torchvision/io/image.py, lines 16 to 31 in 95d4189), but our color transformations currently only support RGB images, ignoring an extra alpha channel. This leads to wrong results. One thing that we agreed upon is that these transforms should fail if anything but 3 channels is detected.
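A hedged sketch of such a guard (hypothetical helper name, not the actual transform code):

```python
import torch

def _assert_rgb_image(image: torch.Tensor) -> None:
    # A color transform would run a check like this up front and fail loudly
    # instead of silently producing wrong results on RGBA (or other) inputs.
    num_channels = image.shape[-3]
    if num_channels != 3:
        raise TypeError(f"Color transforms only support RGB images with 3 channels, got {num_channels}")
```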
Still, some datasets include non-RGB images, so we need to deal with this for a smooth UX. Previously we implicitly converted every image to RGB before returning it from a dataset (see vision/torchvision/datasets/folder.py, lines 245 to 249 in f9fbc10).
Since we no longer decode images in the datasets, we need to provide a solution for the users here. I currently see two possible options:
1. We could deal with this on a per-image basis within the dataset. For example, the train split of ImageNet contains a single RGBA image. We could simply perform an appropriate conversion for irregular image modes in the dataset so this issue is abstracted away from the user. tensorflow-datasets uses this approach: https://github.com/tensorflow/datasets/blob/a1caff379ed3164849fdefd147473f72a22d3fa7/tensorflow_datasets/image_classification/imagenet.py#L105-L131

2. The most common non-RGB images in datasets are grayscale images. For example, the train split of ImageNet contains 19970 grayscale images. Thus, the users will need a transforms.ConvertImageColorSpace("rgb") in most cases anyway. If that supported RGBA to RGB conversions, the problem would also be solved. The conversion happens with this formula:

   pixel_new = alpha * pixel_old + (1 - alpha) * background

   where pixel_{old|new} is a single value from a color channel. Since we don't know background, we need to either make assumptions or require the user to provide a value for it. I'd wager a guess that in 99% of the cases the background is white, i.e. background == 1, but we can't be sure about that. Another issue with this is that the user has no option to set the background on a per-image basis in the transforms pipeline if that is needed.

   In the special case of alpha == 1 everywhere, the equation above simplifies to

   pixel_new = pixel_old

   which is equivalent to stripping the alpha channel. We could check for that and only perform the RGBA to RGB transform if the condition holds or the user supplies a background color (a rough sketch of this conversion follows after this list).
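A rough sketch of such a conversion for float tensor images (assumed helper name and defaults; white as the default background is only the assumption discussed above):

```python
import torch

def rgba_to_rgb(image: torch.Tensor, background: float = 1.0) -> torch.Tensor:
    # image: float tensor of shape (..., 4, H, W) with values in [0, 1].
    # background: assumed background intensity; 1.0 corresponds to white.
    rgb, alpha = image[..., :3, :, :], image[..., 3:, :, :]

    # If the alpha channel is trivial, compositing reduces to stripping it.
    if bool((alpha == 1.0).all()):
        return rgb

    # pixel_new = alpha * pixel_old + (1 - alpha) * background
    return alpha * rgb + (1.0 - alpha) * background
```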
cc @pmeier @vfdev-5 @datumbox @bjuncek