A request: generalizing the design of affine transforms #7240


Open

xvdp opened this issue Feb 14, 2023 · 1 comment

Comments

@xvdp

xvdp commented Feb 14, 2023

🚀 The feature

Torchvision transformations contain legacy code related to PIL, which makes the code somewhat cumbersome and limited and forces special cases.

  1. PIL should be fully split off in some form, maybe with overloads? If one weren't concerned with legacy it ought to be removed completely.
  2. In future versions torchvision should be unified with higher-dimensional vision algorithms.

PIL
On the image I/O side, PIL only handles a limited number of basic formats and not the more interesting ones supporting HDR and floating-point data, for instance .exr. While it is true that most data in the wild is .png or .jpg, this is constricting.

An example within the affine() function in torchvision/transforms/functional.py:

  • The interpolation arg is typed as a legacy PIL enum containing interpolation modes not supported by the torch interpolate function in torch/nn/functional.py.
    This special case then requires another special case in the affine inside torchvision/transforms/functional_tensor.py, which is not up to date with the torch interpolate function.
    Specifically, the assert on line 611 (as of 2023.02.13 7074570):
    _assert_grid_transform_inputs(img, matrix, interpolation, fill, ["nearest", "bilinear"])
    Should this not support nearest, area, bilinear, and bicubic, without this blocking assert, which looks derived from having to support both PIL.Image and Tensor in the same function? (See the sketch below.)
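
As a minimal illustrative sketch (not torchvision code): the tensor affine path ultimately samples through torch.nn.functional.grid_sample, which already accepts "nearest", "bilinear" and "bicubic" for 4-D inputs, so a bicubic warp is possible at the backend level even though the assertion above only allows the first two. (Note that "area" is an interpolate mode, not a grid_sample mode.) The rotation matrix below is just a made-up example.

    import torch
    import torch.nn.functional as F

    img = torch.rand(1, 3, 64, 64)                  # N, C, H, W float image
    theta = torch.tensor([[[0.866, -0.5, 0.0],      # ~30 degree rotation as a
                           [0.5,  0.866, 0.0]]])    # 1 x 2 x 3 affine matrix
    grid = F.affine_grid(theta, size=list(img.shape), align_corners=False)

    # Bicubic sampling works directly at the grid_sample level.
    warped = F.grid_sample(img, grid, mode="bicubic", align_corners=False)
    print(warped.shape)  # torch.Size([1, 3, 64, 64])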

There are other design choices that ought to be cleaned up, such as the same function requiring Lists (excluding Tensors? what if I get the center from data.mean(axis=...)? and so on) for center, translate and shear, or angles being required in degrees instead of the native radians: if one has the angle in radians, one incurs a useless loss in the conversion and reconversion.

3d vision
Why 3d as well as 2d? Even though 2d has long been a research topic for DL vision, the full ML vision pipeline has always included 3d, along with homogeneous coordinate systems, the widely used full camera matrix with radial and tangential distortion, and the annoying OGL projective space.

There is no reason why torchvision transformation code should only support images and not higher dimensional matrix operations.

I do understand that removing legacy code is not simple, and yet computer vision is more than uint8 images.

Motivation, pitch

Static images, images in motion, images extracted from a 3d world, or images projected into the 3d world have no substantive difference: they are all classical computer vision. The trend to re-unify these spaces is here, from 3d GANs to neural rendering.

Why should one not use pytorch3d instead? It is also cumbersome in that it again considers 3d as separate from 2d rather than a continuum in the field of computer vision.

Alternatives

To use drjit?

Additional context

No response

@pmeier
Collaborator

pmeier commented Feb 27, 2023

Hey @xvdp, I tried to understand your points, but I'm not sure I got the spirit. To be frank, the comment reads more like a general rant than a feature request. I'll focus on a few points below that touch on affine transformations, per the title of the issue.

Torchvision transformations contain legacy code related to PIL
[...]
interpolation arg is typed as a legacy PIL enum containing interpolation modes not supported by torch interpolate

PIL is part of the core API and not legacy in any way. We also don't support any PIL enum but have our own:

class InterpolationMode(Enum):

The three interpolation modes

# For PIL compatibility
BOX = "box"
HAMMING = "hamming"
LANCZOS = "lanczos"

are just added on top to enable selecting these interpolation modes when using PIL since the tensor backend does not support them.
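
To illustrate that split with a hedged sketch (behavior may differ slightly across torchvision versions): a PIL-compatibility mode such as LANCZOS works when the input is a PIL image, while the tensor backend is expected to reject it.

    from PIL import Image
    import torch
    import torchvision.transforms.functional as F
    from torchvision.transforms import InterpolationMode

    pil_img = Image.new("RGB", (64, 64))
    tensor_img = torch.rand(3, 64, 64)

    # Works: LANCZOS is implemented by the PIL backend.
    F.resize(pil_img, [32, 32], interpolation=InterpolationMode.LANCZOS)

    # Expected to raise: the tensor backend does not implement LANCZOS.
    try:
        F.resize(tensor_img, [32, 32], interpolation=InterpolationMode.LANCZOS)
    except (ValueError, NotImplementedError, RuntimeError) as err:
        print(err)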

On the image io side, PIL for only handles a limited number of basic formats and not the more interesting ones supporting hdr and floating point data: for instance .exr.

You are not restricted to using PIL for image I/O. Use whatever library you want to get the raw data and use torchvision as soon as you have a tensor. Most libraries in the PyData ecosystem support conversion to numpy. If you have an np.ndarray, you can just use torch.from_numpy to create a tensor from it.

Plus, the tensor backend of our transformations fully supports floating point images. Be aware that the implicit assumption is that values are in the [0.0, 1.0] range. That shouldn't make a difference for affine ops, but can lead to errors when color ops are used.
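
For example, a minimal sketch of that workflow (the EXR read is stood in by a random array; use imageio, OpenCV, OpenEXR, or any other I/O library you like):

    import numpy as np
    import torch
    import torchvision.transforms.functional as F

    # Stand-in for float/HDR data read by an external library (e.g. from an .exr).
    hdr = (np.random.rand(256, 256, 3) * 4.0).astype(np.float32)

    img = torch.from_numpy(hdr).permute(2, 0, 1)   # HWC -> CHW, as torchvision expects

    # Geometric ops such as affine do not care about the value range ...
    rotated = F.affine(img, angle=30.0, translate=[0, 0], scale=1.0, shear=[0.0])

    # ... but color ops assume values in [0.0, 1.0], so rescale first.
    img01 = img / img.max()
    brighter = F.adjust_brightness(img01, 1.2)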

There are other design choices that ought to be cleaned up such the same function as requiring Lists (excluding Tensors? what if I get the center from data.mean(axis-...)? and so on) for center, translate and shear,

You can just call .tolist() on any tensor to convert it into a list. Even if we allowed passing tensors there, we would need to deconstruct them internally anyway, eliminating any performance benefit.

angles being required in degrees instead of the native radians: if one has the angle in radians one will incur useless loss in the conversion and reconversion.

That works both ways. What if you have degrees and we require radians?
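
To make both conversions above concrete, a small hedged sketch (the tensor center and the radian angle are hypothetical caller-side inputs, not part of any torchvision API):

    import math
    import torch
    import torchvision.transforms.functional as F

    img = torch.rand(3, 64, 64)
    center_t = torch.tensor([31.5, 31.5])   # e.g. computed from data, comes back as a tensor
    angle_rad = math.pi / 6                 # angle available in radians

    out = F.affine(
        img,
        angle=math.degrees(angle_rad),      # radians -> degrees on the caller side
        translate=[0, 0],
        scale=1.0,
        shear=[0.0],
        center=center_t.tolist(),           # tensor -> list of floats
    )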


To summarize some of the suggestions:

  • Remove PIL support? Not going to happen anytime soon (if ever). The tensor backend has been available for quite some time and still the PIL backend is dominant. A handful of PIL operators are currently an order of magnitude faster than their tensor backend counterparts: Performance improvements for transforms v2 vs. v1 #6818 (comment). We are on that, but it takes time.
  • Add support for bicubic interpolation to the affine transformations? Seems like a reasonable suggestion. AFAIK this was not supported in the past, which is why you see the restricting assertion in the implementation. cc @vfdev-5 => Add new interpolation modes to grid_sample pytorch#25039
  • Allow tensors as parameters to affine ops: There is an issue that requests just that, but for a different reason. The request is about being able to backprop through these parameters, which is why a tensor is needed. @vfdev-5 Could you link that? I was unable to find it. => [RFC] Make transforms.functional methods differential w.r.t. their parameters #5157
  • Change the "format" of a parameter like degrees to radians? Not going to happen for BC reasons unless there is a very good reason for it. Saving two scalar multiplications is not one of them.
