A request: generalizing the design of affine transforms #7240
Hey @xvdp I tried to understand your points, but I'm not sure I got the spirit. To be frank, the comment reads more like a general rant than a feature request. I'll focus on a few points below that touch on affine transformations, per the title of the issue.
PIL is a core part of the API and not legacy in any way. We also don't support any of the PIL-specific interpolation modes in the tensor backend; the three interpolation modes in vision/torchvision/transforms/functional.py, lines 31 to 34 in af04819, are just added on top to enable selecting these interpolation modes when using PIL.
You are not restricted to using PIL. Plus, the tensor backend of our transformations fully supports floating-point images. Be aware that the implicit assumption is that values are in the [0, 1] range.
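For instance, a minimal sketch of the tensor path with a floating-point image (the image and parameter values here are made up for illustration):

```python
import torch
import torchvision.transforms.functional as F
from torchvision.transforms import InterpolationMode

# A float CHW image; the tensor backend assumes float values lie in [0.0, 1.0].
img = torch.rand(3, 256, 256)

# This runs entirely through the tensor backend, no PIL involved.
out = F.affine(
    img,
    angle=30.0,
    translate=[0, 0],
    scale=1.0,
    shear=[0.0, 0.0],
    interpolation=InterpolationMode.BILINEAR,
)
```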
You can just call `.tolist()` on the tensor (see the sketch after the summary below).
That works both ways. What if you have degrees and we require radians? To summarize some of the suggestions:
- support HDR and floating-point formats such as .exr on the io side
- lift the nearest/bilinear restriction in the tensor affine path
- accept Tensors (and radians) for angle, center, translate, and shear
- extend the transforms beyond 2d images toward 3d and homogeneous coordinates
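A sketch of both caller-side conversions mentioned above (`.tolist()` for tensor arguments, radians-to-degrees for the angle), with made-up values:

```python
import math

import torch
import torchvision.transforms.functional as F
from torchvision.transforms import InterpolationMode

img = torch.rand(3, 256, 256)
angle_rad = 0.5                          # angle held in radians
center_t = torch.tensor([128.0, 128.0])  # a center that came out of tensor math

out = F.affine(
    img,
    angle=math.degrees(angle_rad),  # radians to degrees on the caller side
    translate=[0, 0],
    scale=1.0,
    shear=[0.0, 0.0],
    center=center_t.tolist(),       # Tensor to List via .tolist()
    interpolation=InterpolationMode.BILINEAR,
)
```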
🚀 The feature
Torchvision transformations contain legacy code related to PIL, making the code somewhat cumbersome, limited, and full of special cases.
PIL
On the image io side, PIL only handles a limited number of basic formats, and not the more interesting ones that support HDR and floating-point data, for instance .exr. While it is true that most data in the wild is .png or .jpg, this is constricting.
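For context, a sketch of how floating-point HDR data such as .exr can be read today with a third-party reader and moved into a tensor; this assumes imageio with an EXR-capable plugin installed, and the file name is a placeholder:

```python
import imageio.v3 as iio
import torch

# "render.exr" is a placeholder path; EXR reading in imageio requires an
# OpenEXR/FreeImage plugin to be available in the environment.
arr = iio.imread("render.exr")                 # float HWC array, values unbounded (HDR)
img = torch.from_numpy(arr).permute(2, 0, 1)   # to a CHW float tensor
```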
An example is the affine() function in torchvision/transforms/functional.py (compare torch/nn/functional.py): the PIL special case there then requires another special case in the affine() inside torchvision/transforms/functional_tensor.py, which is not up to date with the torch interpolate function.
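Paraphrased, not verbatim, the dispatch pattern in question looks roughly like the following; `_pil_affine` and `_tensor_affine` are stand-in names, not the real internal functions:

```python
import torch

def _pil_affine(img, matrix, interpolation, fill):
    """Stand-in for the PIL backend (its own resampling enums, uint8-centric)."""
    ...

def _tensor_affine(img, matrix, interpolation, fill):
    """Stand-in for the tensor backend in functional_tensor.py."""
    ...

def affine(img, matrix, interpolation, fill):
    # The special case: PIL inputs take one code path, tensors another,
    # and each path carries its own restrictions.
    if not isinstance(img, torch.Tensor):
        return _pil_affine(img, matrix, interpolation, fill)
    return _tensor_affine(img, matrix, interpolation, fill)
```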
Specifically, the assert on line 611 (as of 2023.02.13, commit 7074570):
_assert_grid_transform_inputs(img, matrix, interpolation, fill, ["nearest", "bilinear"])
Should this not support nearest, area, bilinear, and bicubic, without this blocking assert, which looks like it derives from having to support both PIL.Image and Tensor in the same function?
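For reference, torch.nn.functional.interpolate itself already accepts all four of those modes on 4-D inputs; a quick check:

```python
import torch
import torch.nn.functional as F

x = torch.rand(1, 3, 64, 64)
for mode in ("nearest", "area", "bilinear", "bicubic"):
    # align_corners may only be passed for the linear/cubic modes.
    kwargs = {"align_corners": False} if mode in ("bilinear", "bicubic") else {}
    y = F.interpolate(x, scale_factor=2.0, mode=mode, **kwargs)
    print(mode, tuple(y.shape))
```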
There are other design choices that ought to be cleaned up, such as the same function requiring Lists for center, translate, and shear (excluding Tensors? what if I get the center from data.mean(axis=...)? and so on), or angles being required in degrees instead of the native radians: if one has the angle in radians, one incurs a useless precision loss in the conversion and reconversion.
3d vision
Why separate 3d and 2d? Even though 2d has been a longtime research topic for DL vision, the full ML vision pipeline has always included 3d, as well as homogeneous coordinate systems, the most-used basic full camera matrix with radial and tangential distortion, and the annoying OGL projective space.
There is no reason why torchvision's transformation code should only support images and not higher-dimensional matrix operations.
I do understand that removing legacy code is not simple, and yet computer vision is more than uint8 images.
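As a concrete point of reference, torch core already handles the volumetric case end to end; a minimal sketch applying an identity affine to an (N, C, D, H, W) volume:

```python
import torch
import torch.nn.functional as F

vol = torch.rand(1, 1, 32, 32, 32)    # a 5-D volume: (N, C, D, H, W)
theta = torch.eye(3, 4).unsqueeze(0)  # (N, 3, 4) identity affine in homogeneous form

# Both affine_grid and grid_sample support the volumetric case directly.
grid = F.affine_grid(theta, size=list(vol.shape), align_corners=False)
out = F.grid_sample(vol, grid, mode="bilinear", align_corners=False)
```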
Motivation, pitch
Static images, images in motion, images extracted from a 3d world, or images projected into the 3d world have no substantive difference: they are all classical computer vision. The trend to re-unify these spaces is here, from 3d GANs to neural rendering.
Why should one not use pytorch3d instead? It is also cumbersome in that, again, it treats 3d as separate from 2d rather than as a continuum within the field of computer vision.
Alternatives
To use drjit?
Additional context
No response