[RFC] How do we want to deal with images that include alpha channels? #5510

Closed · pmeier opened this issue Mar 2, 2022 · 18 comments · Fixed by #5567


pmeier commented Mar 2, 2022

This discussion started in #5500 (comment), and @vfdev-5 and I continued it offline.

PIL, as well as our image reading functions, supports RGBA images

class ImageReadMode(Enum):
    """
    Support for various modes while reading images.

    Use ``ImageReadMode.UNCHANGED`` for loading the image as-is,
    ``ImageReadMode.GRAY`` for converting to grayscale,
    ``ImageReadMode.GRAY_ALPHA`` for grayscale with transparency,
    ``ImageReadMode.RGB`` for RGB and ``ImageReadMode.RGB_ALPHA`` for
    RGB with transparency.
    """

    UNCHANGED = 0
    GRAY = 1
    GRAY_ALPHA = 2
    RGB = 3
    RGB_ALPHA = 4

but our color transformations currently only support RGB images and ignore an extra alpha channel, which leads to wrong results. One thing we agreed upon is that these transforms should fail if anything other than 3 channels is detected.
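
A minimal sketch of such a guard inside a color transform kernel (illustrative only, not the actual implementation):

if image.shape[-3] != 3:
    raise TypeError(f"Expected an RGB image with 3 channels, but got {image.shape[-3]} channels")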

Still, some datasets include non-RGB images, so we need to deal with this for a smooth UX. Previously, we implicitly converted every image to RGB before returning it from a dataset

def pil_loader(path: str) -> Image.Image:
    # open path as file to avoid ResourceWarning (https://github.com/python-pillow/Pillow/issues/835)
    with open(path, "rb") as f:
        img = Image.open(f)
        return img.convert("RGB")

Since we no longer decode images in the datasets, we need to provide a solution for the users here. I currently see two possible options:

  1. We could deal with this on a per-image basis within the dataset. For example, the train split of ImageNet contains a single RGBA image. We could simply perform an appropriate conversion for irregular image modes in the dataset so this issue is abstracted away from the user. tensorflow-datasets uses this approach: https://github.com/tensorflow/datasets/blob/a1caff379ed3164849fdefd147473f72a22d3fa7/tensorflow_datasets/image_classification/imagenet.py#L105-L131

  2. The most common non-RGB images in datasets are grayscale images. For example, the train split of ImageNet contains 19970 grayscale images. Thus, users will need a transforms.ConvertImageColorSpace("rgb") in most cases anyway. If that also supported RGBA to RGB conversions, the problem would be solved. The conversion happens with this formula:

     pixel_new = (1 - alpha) * background + alpha * pixel_old

     where pixel_{old|new} is a single value from a color channel. Since we don't know background, we either need to make assumptions or require the user to provide a value for it. I'd wager that in 99% of cases the background is white, i.e. background == 1, but we can't be sure about that.

     Another issue is that the user has no option to set the background on a per-image basis in the transforms pipeline if that is needed.

     In the special case of alpha == 1 everywhere, the equation above simplifies to

     pixel_new = pixel_old

     which is equivalent to stripping the alpha channel. We could check for that and only perform the RGBA to RGB transform if the condition holds or the user supplies a background color (see the sketch after this list).
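
To make the formula concrete, here is a minimal sketch of the blending for float tensor images in [0, 1]; the function name and signature are hypothetical, not an existing torchvision API:

import torch

def blend_rgba_onto_background(image: torch.Tensor, background: float = 1.0) -> torch.Tensor:
    # image has shape (..., 4, H, W) with values in [0, 1]
    rgb, alpha = image[..., :3, :, :], image[..., 3:, :, :]
    # pixel_new = (1 - alpha) * background + alpha * pixel_old
    return (1.0 - alpha) * background + alpha * rgb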

cc @pmeier @vfdev-5 @datumbox @bjuncek


pmeier commented Mar 2, 2022

I personally would go with a combined approach:

  1. If we are aware of any images in a dataset outside of what we can convert automatically without assumptions, the dataset should handle these images.
  2. Add a background parameter to convert_image_color_space. If it is not passed and the alpha channel is not 1 everywhere, fail (see the sketch below).
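
A minimal sketch of that behavior for float images in [0, 1] (names and error message are illustrative, not the actual API):

def rgba_to_rgb(image, background=None):
    rgb, alpha = image[..., :3, :, :], image[..., 3:, :, :]
    if background is None:
        # without a background we can only safely strip a fully opaque alpha channel
        if not bool((alpha == 1.0).all()):
            raise ValueError("Cannot convert RGBA to RGB: the alpha channel carries information, please pass background=...")
        return rgb
    return (1.0 - alpha) * background + alpha * rgb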


vfdev-5 commented Mar 2, 2022

I agree that for torchvision-supported datasets, if the alpha channel is not used later by a CNN, then we can strip it from the beginning. For example, as you say, ImageNet can have RGBA images; it would be better to provide only RGB images, as is done right now in the stable API. One way I could think of is to have preset datapipe transforms that could be deactivated:

dp = datasets.load("imagenet", decode_presets=datasets.imagenet.decode_preset)

When you say that "image reading functions support RGBA images": is RGBA conventional, as in PIL's "RGBA" mode, or premultiplied, as in PIL's "RGBa" mode? (For the differences, see https://shawnhargreaves.com/blog/premultiplied-alpha.html.)

Another question: can features.Image have RGBA as its ColorSpace?

Some thoughts on ConvertImageColorSpace("rgb"): this means that we should be aware of the current mode and could possibly transform to HSV, HSL, and other modes (e.g. like OpenCV).

By the way, PIL handles the alpha channel correctly on transformations, so the impacted color transforms are only those on features.Image.

As for choosing the background as 1, 255, or white: a problem can be seen with geometric transforms, where the default fill color is 0, i.e. black. Either the user has to set the fill color to white everywhere, or they will end up with two types of no-data pixels: black and white...


datumbox commented Mar 2, 2022

could possibly transform to HSV, HSL, and other modes

I don't think that's the focus right now. Maybe in the long, long term, but definitely not something we plan to do anytime soon.

As for choosing the background as 1, 255, or white, ...

+1 on that. Given that our existing functional transforms have fill=0, does it make sense to go with black?


pmeier commented Mar 2, 2022

@vfdev-5

I agree that for torchvision-supported datasets, if the alpha channel is not used later by a CNN, then we can strip it from the beginning.

I might be pedantic here, but I want to be sure we are all on the same page: we cannot simply strip the alpha channel unless it is all 1s or all 255s, depending on the input dtype. If that is not the case, stripping the alpha channel changes the actual colors of the image, as can be seen in the equation above.

For example, as you say, ImageNet can have RGBA images; it would be better to provide only RGB images, as is done right now in the stable API. One way I could think of is to have preset datapipe transforms that could be deactivated:

How would this approach know which image needs the special conversion? If we have a general approach like this, we also need to solve the conversion in a general sense.

When you say that "image reading functions support RGBA images": is RGBA conventional, as in PIL's "RGBA" mode, or premultiplied, as in PIL's "RGBa" mode?

I don't know. @datumbox has introduced the enum. Do you have any insights here?

Another question: can features.Image have RGBA as its ColorSpace?

Yes. For now, we only support

class ColorSpace(StrEnum):
    OTHER = StrEnum.auto()
    GRAYSCALE = StrEnum.auto()
    RGB = StrEnum.auto()

But if we decide how we want to handle RGBA we can of course add it.

Some thoughts on ConvertImageColorSpace("rgb"): this means that we should be aware of the current mode and ...

That is already implemented. features.Image and PIL.Image.Image objects store this information. Only for vanilla tensors does the user need to provide it manually:

if isinstance(input, features.Image):
    output = F.convert_image_color_space_tensor(
        input, old_color_space=input.color_space, new_color_space=self.color_space
    )
    return features.Image.new_like(input, output, color_space=self.color_space)
elif isinstance(input, torch.Tensor):
    if self.old_color_space is None:
        raise RuntimeError(
            f"In order to convert vanilla tensor images, `{type(self).__name__}(...)` "
            f"needs to be constructed with the `old_color_space=...` parameter."
        )
    return F.convert_image_color_space_tensor(
        input, old_color_space=self.old_color_space, new_color_space=self.color_space
    )
elif isinstance(input, PIL.Image.Image):
    old_color_space = {
        "L": features.ColorSpace.GRAYSCALE,
        "RGB": features.ColorSpace.RGB,
    }.get(input.mode, features.ColorSpace.OTHER)
    return F.convert_image_color_space_pil(
        input, old_color_space=old_color_space, new_color_space=self.color_space
    )

could possibly transform to HSV, HSL, and other modes (e.g. like OpenCV).

I agree with @datumbox here. While the API allows all these conversions, we shouldn't add anything here unless there is a specific user request. If we do add other color spaces that are equivalent to RGB, such as HSV or HSL, we definitely need a color_space parameter on all color transforms and have to convert to RGB before applying the transform.

As for choosing the background as 1, 255, or white: a problem can be seen with geometric transforms, where the default fill color is 0, i.e. black. Either the user has to set the fill color to white everywhere, or they will end up with two types of no-data pixels: black and white...

The alpha channel is different from a fill color. Setting the alpha channel to its maximum everywhere means no transparency, and thus the background is irrelevant. For all of our current datasets, RGBA images are outliers that were most likely not intended by the authors. They happen when images are scraped from the internet and the person who uploaded them mistakenly included an alpha channel although there was no need for it. For example, this is the RGBA image from ImageNet (in thumbnail size to limit copyright implications) including the alpha channel:

[image: imagenet_rgba]


vfdev-5 commented Mar 2, 2022

How would this approach know which image needs the special conversion? If we have a general approach like this, we also need to solve the conversion in a general sense.

Maybe I'm missing something important, but for me everything could just follow the same logic as the current API does with Pillow images: when an image is read and decoded, you'll have a color space. The prior information is that for ImageNet we provide only RGB images. So, all images with a non-RGB color space should be converted to RGB.


pmeier commented Mar 2, 2022

So, all images with a non-RGB color space should be converted to RGB.

That is the crucial point I was touching on in this issue. How are we going to implement the conversion from RGBA to RGB? Not all image files carry information about the background color in the first place. The image above is one of them. In such a case PIL falls back to Floyd-Steinberg dithering.

Not only would we need to implement custom kernels for that, but dithering might also significantly reduce the image quality. Compare this test image

.convert("RGB")'ed by PIL with Floyd-Steinberg:

[image: dithered]

or converted by the formula above assuming a white background:

[image: converted]
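
For reference, both variants can be reproduced in code; a minimal sketch, where "test.png" is a placeholder name for the RGBA test image:

import numpy as np
import PIL.Image

img = PIL.Image.open("test.png")  # placeholder name for the RGBA test image

# PIL's built-in conversion
dithered = img.convert("RGB")

# the formula from the top comment with a white background, i.e. background == 1
rgba = np.asarray(img).astype(np.float64) / 255.0
rgb, alpha = rgba[..., :3], rgba[..., 3:]
blended = (1.0 - alpha) * 1.0 + alpha * rgb
converted = PIL.Image.fromarray((blended * 255).round().astype(np.uint8))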


datumbox commented Mar 4, 2022

So the two key approaches discussed here are:

  1. Handle RGBA to RGB conversion exactly like PIL using Floyd-Steinberg
  2. Use the formula provided by Philip and assume a white background

What is the approach used by CV2?


pmeier commented Mar 4, 2022

I'll dig into what exactly PIL is doing, because what I failed to realize is that it also seems to assume a white background for the dithering.


pmeier commented Mar 7, 2022

PIL assumes a white background.


datumbox commented Mar 7, 2022

@pmeier Can you also check CV2? If it does the same, I think it would make sense to make the same assumption in TorchVision.

@vfdev-5 Thoughts?


pmeier commented Mar 7, 2022

CV2

I'm not 100% sure, but my best guess is that white is also assumed. I've tracked it down to this OpenCL call. There are multiple issues on the OpenCV repo, such as opencv/opencv#13135, reporting that the conversion does not take alpha into account. I think the consensus is that one should not use cv2.cvtColor for RGBA -> RGB conversions, but rather blend manually if the alpha channel actually contains information. Ignoring the alpha channel is effectively the same as setting it to its maximum value everywhere.
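
A minimal sketch of the two paths, using a synthetic array, illustrating the reported behavior:

import cv2
import numpy as np

rgba = np.random.randint(0, 256, size=(32, 32, 4), dtype=np.uint8)

# cv2.cvtColor simply drops the alpha channel ...
rgb_cvt = cv2.cvtColor(rgba, cv2.COLOR_RGBA2RGB)
assert np.array_equal(rgb_cvt, rgba[..., :3])

# ... whereas blending onto a white background takes it into account
alpha = rgba[..., 3:].astype(np.float64) / 255.0
rgb_blend = ((1.0 - alpha) * 255.0 + alpha * rgba[..., :3]).round().astype(np.uint8)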

skimage

skimage has a background parameter on its rgba2rgb() conversion function. It defaults to white.
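
A minimal sketch with synthetic data (skimage works on float images in [0, 1] here):

import numpy as np
from skimage.color import rgba2rgb

rgba = np.random.rand(32, 32, 4)

rgb_white = rgba2rgb(rgba)                        # background defaults to white, i.e. (1, 1, 1)
rgb_black = rgba2rgb(rgba, background=(0, 0, 0))  # explicit black background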

Thus, I think we are fairly safe if we also assume white and provide an option to set it manually if required.


pmeier commented Mar 8, 2022

After some offline discussion, we decided to align with PIL for now. The only difference is that we should fail the transformation if the alpha channel is not at its maximum value everywhere. This way, we can later implement the correct conversion as detailed in my top comment without worrying about BC.

pmeier self-assigned this Mar 8, 2022

vfdev-5 commented Mar 8, 2022


pmeier commented Mar 8, 2022

You are right. PIL also seems to only "strip" the alpha channel by setting it to 255 everywhere:

import PIL.Image
import torch
from torchvision.transforms.functional import pil_to_tensor

image = PIL.Image.open("text2.png")
a = pil_to_tensor(image)[:3, ...]
b = pil_to_tensor(image.convert("RGB"))

torch.testing.assert_close(a, b)

Thus, the behavior is aligned with OpenCV. This doesn't change the plan from #5510 (comment).


bsun0802 commented Mar 28, 2023

You are right. PIL also seems to only "strip" the alpha channel by setting it to 255 everywhere:

import PIL.Image
import torch
from torchvision.transforms.functional import pil_to_tensor

image = PIL.Image.open("text2.png")
a = pil_to_tensor(image)[:3, ...]
b = pil_to_tensor(image.convert("RGB"))

torch.testing.assert_close(a, b)

Thus, the behavior is aligned with OpenCV. This doesn't change the plan from #5510 (comment).

Isn't "strip" the alpha channel means assuming a black background? If I take a RGBA image with transparent background and call Image.convert(), it will become RGB with black background. Which is different from white background acchieved from alpha_composite().

[image]


pmeier commented Mar 28, 2023

@bsun0802 Could you share the image so I can take a look?

@bsun0802

@pmeier
[image: the RGBA image in question]


pmeier commented Mar 28, 2023

Sorry, I was a little slow this morning. Note that what you call the "background color", i.e. the black region around the people, is not the same as the "background" we are talking about here; it is rather the foreground.

In case there is transparency in the image, the background color is determined by the canvas you are drawing on. In your case, that is the white of the notebook. If you put the same image on a green canvas, the background will be green. Meaning, you are using the alpha channel here to mask out the black background the image originally has.

This issue is about what we want to do in case we are forced to convert from RGBA to RGB but don't have a canvas to draw on. PIL just strips the alpha channel, thus revealing all colors in the image that were previously masked, i.e. the black regions in your case (note that there is also some white in the top left corner that you cannot see if you display the image on a white canvas).
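
To illustrate, a minimal sketch that composites the same RGBA image onto canvases of different colors ("image.png" is a placeholder file name):

import PIL.Image

img = PIL.Image.open("image.png").convert("RGBA")  # placeholder file name

def on_canvas(image, color):
    # draw the image onto an opaque canvas of the given color
    canvas = PIL.Image.new("RGBA", image.size, color + (255,))
    return PIL.Image.alpha_composite(canvas, image).convert("RGB")

on_white = on_canvas(img, (255, 255, 255))  # what a white notebook page shows
on_green = on_canvas(img, (0, 128, 0))      # same image on a green canvas
stripped = img.convert("RGB")               # alpha just dropped: masked colors revealed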
