Support specifying output channels (RGB vs grayscale) for io.image.read_image #2948

Closed
rwightman opened this issue Nov 2, 2020 · 3 comments · Fixed by #2988

Comments

@rwightman
Contributor

🚀 Feature

Similar to the channels argument of tf.io.decode_image or PIL.Image.open(..).convert('format'), allow specifying the output channel 'format' and, if it is set, have a sensible conversion performed for you when decoding an image.

Motivation

Avoid the need to manually check for and perform color conversion (RGB -> grayscale or grayscale -> RGB), since datasets often contain a mix of RGB and grayscale images. The output color format of image loading in a data pipeline needs to be consistent for many typical use cases.

The lower-level libraries (libjpeg, libpng, etc.) have facilities for specifying the output color space and will do the conversion for you if you set it up (e.g. out_color_space for libjpeg). This is what TensorFlow's decode_image does, and it is the most efficient approach.
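
To make the manual step concrete, here is a minimal sketch of the kind of per-image check and conversion a pipeline has to do today. It is not torchvision code: the helper names and the nested-list image layout (channels, then rows, then pixel values) are assumptions chosen to keep the example dependency-free, and the grayscale weights are the common ITU-R BT.601 luma coefficients, which may not match what a given decoder uses.

```python
# Hypothetical helpers for illustration only (not part of torchvision).
# An image is a list of channels; each channel is a list of rows of 0-255 ints.

def to_three_channels(image):
    """Return a 3-channel image, replicating a single gray channel if needed."""
    if len(image) == 3:
        return image
    if len(image) == 1:
        gray = image[0]
        # Copy each row so the three output channels do not share storage.
        return [[row[:] for row in gray] for _ in range(3)]
    raise ValueError(f"unsupported channel count: {len(image)}")


def to_one_channel(image):
    """Return a 1-channel image, using BT.601 luma weights for RGB input."""
    if len(image) == 1:
        return image
    if len(image) == 3:
        red, green, blue = image
        gray = [
            [round(0.299 * r + 0.587 * g + 0.114 * b)
             for r, g, b in zip(red_row, green_row, blue_row)]
            for red_row, green_row, blue_row in zip(red, green, blue)
        ]
        return [gray]
    raise ValueError(f"unsupported channel count: {len(image)}")
```

Every caller has to repeat this branching after decoding, which is the overhead the requested argument would move into the decoder.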

Pitch

Add a channels= arg to read_image that matches TensorFlow semantics.

  • channels=0 - leave as original, a grayscale image decodes to 1 channel out, rgb to 3
  • channels=1 - grayscale out
  • channels=3 - RGB out
  • channels=4 - RGBA out (PNG or formats that support alpha, not valid for jpeg)
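
The rules in the list above could be sketched as a small resolver that maps the requested value and the file's native layout to an output channel count. The function name and the is_jpeg flag are hypothetical and only illustrate the proposed semantics; this is not the read_image implementation.

```python
def resolve_output_channels(requested, native_channels, is_jpeg):
    """Return the number of channels the decoder should emit.

    requested: value of the proposed channels= argument (0, 1, 3 or 4).
    native_channels: channel count stored in the file.
    is_jpeg: True if the file format cannot carry alpha (hypothetical flag).
    """
    if requested == 0:
        # Leave as original: grayscale stays 1 channel, RGB stays 3.
        return native_channels
    if requested in (1, 3):
        return requested
    if requested == 4:
        if is_jpeg:
            raise ValueError("channels=4 (RGBA) is not valid for JPEG")
        return 4
    raise ValueError(f"unsupported channels value: {requested}")
```

With channels=3, a dataset that mixes grayscale and RGB files always yields 3-channel output, while channels=0 keeps whatever each file stores.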

@datumbox
Contributor

datumbox commented Nov 3, 2020

@rwightman Thanks for reporting.

This is also related to #2823.

@fmassa
Member

fmassa commented Nov 5, 2020

Thanks for the feature request @rwightman !

I think this is a nice addition to have, and we will be looking into adding support for this feature in torchvision.
We are already planning a few improvements to PNG support, as @datumbox mentioned, so we could go one step further and also allow finer-grained control over the output format for read_image.

@datumbox datumbox self-assigned this Nov 11, 2020
@datumbox
Contributor

datumbox commented Nov 11, 2020

Since we now support palette images, we need to adapt the pitch slightly:

  • channels=0 - leave as original (grayscale, palette, grayscale with alpha, rgb, rgb with alpha)
  • channels=1 - grayscale out
  • channels=2 - grayscale with alpha out (PNG or formats that support alpha, not valid for jpeg)
  • channels=3 - RGB out
  • channels=4 - RGBA out (PNG or formats that support alpha, not valid for jpeg)
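
One way to express the five values above is an integer enum plus a check for the alpha modes that JPEG cannot produce. The class and member names here are illustrative assumptions for this sketch, not names confirmed by the thread.

```python
from enum import IntEnum


class ChannelMode(IntEnum):
    """Hypothetical names for the five proposed channels= values."""
    UNCHANGED = 0   # leave as stored in the file
    GRAY = 1        # grayscale out
    GRAY_ALPHA = 2  # grayscale with alpha out
    RGB = 3         # RGB out
    RGB_ALPHA = 4   # RGBA out


# Modes that require an alpha channel, which JPEG does not support.
ALPHA_MODES = frozenset({ChannelMode.GRAY_ALPHA, ChannelMode.RGB_ALPHA})


def is_valid_for_jpeg(mode):
    """Return True if the requested mode can be produced from a JPEG."""
    return ChannelMode(mode) not in ALPHA_MODES
```

Using an IntEnum keeps the plain integers from the pitch usable (ChannelMode(3) is ChannelMode.RGB) while giving call sites a readable name.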

4 participants