[prototype] Optimize and clean up all affine methods#6945
[prototype] Optimize and clean up all affine methods#6945datumbox merged 18 commits intopytorch:mainfrom
Conversation
|
There are some related failures. I'll check them on Monday. |
| float32_vs_uint8=True, | ||
| # TODO: investigate | ||
| closeness_kwargs=pil_reference_pixel_difference(100, agg_method="mean"), | ||
| closeness_kwargs=pil_reference_pixel_difference(110, agg_method="mean"), |
There was a problem hiding this comment.
Flaky test unrelated to this PR that popped up previously on another PR:
Unrelated flakiness:
FAILED test/test_prototype_transforms_functional.py::TestKernels::test_against_reference[rotate_image_tensor-38] - AssertionError: The 'mean' of the absolute difference is 104.21571906354515, but only 100.0 is allowed.
There was a problem hiding this comment.
Here are some high-level notes on the changes. After this PR we would very minimally use _FT in our code-base. Some changes can be reverted or moved on FT stable if we want to minimize copy-pasted code.
I will follow up with benchmarks for the affected methods.
| raise ValueError(f"Interpolation mode '{interpolation}' is unsupported with Tensor input") | ||
|
|
||
|
|
||
| def _affine_grid( |
There was a problem hiding this comment.
This one is modified minimally. Only a rename (to match the naming of other methods such as _perspective_grid) and 1 tiny in-place that won't matter. Can be reverted.
| def _get_inverse_affine_matrix( | ||
| center: List[float], angle: float, translate: List[float], scale: float, shear: List[float], inverted: bool = True | ||
| ) -> List[float]: |
There was a problem hiding this comment.
Here we do some caching of intermediate results and minor refactoring (especially with the negative values) to make the code a bit more readable IMO. Can be reverted.
| return matrix | ||
|
|
||
|
|
||
| def _compute_affine_output_size(matrix: List[float], w: int, h: int) -> Tuple[int, int]: |
There was a problem hiding this comment.
aminmax call + a bunch of in-place ops to speed things up.
| return int(size[0]), int(size[1]) # w, h | ||
|
|
||
|
|
||
| def _apply_grid_transform( |
There was a problem hiding this comment.
We do in-place ops where possible, plus a bit of refactoring. Important note is that the input image must be float. This is because the handling/casting can be done more efficiently outside of the method.
| return float_img | ||
|
|
||
|
|
||
| def _assert_grid_transform_inputs( |
There was a problem hiding this comment.
Minor clean ups to make the if statements clearer.
| if ndim > 4: | ||
| image = image.reshape((-1,) + shape[-3:]) | ||
| needs_unsquash = True | ||
| elif ndim == 3: | ||
| image = image.unsqueeze(0) | ||
| needs_unsquash = True | ||
| else: | ||
| needs_unsquash = False |
There was a problem hiding this comment.
Adopting the idiom from other places to handle squashing. I do the same in all methods where possible below.
There was a problem hiding this comment.
Do we need the special casing for ndim == 3 here? Is there actually a significant perf difference between reshape and unsqueeze in this case?
There was a problem hiding this comment.
This is not a perf. This is because _apply_grid_transform assumes batched input.
There was a problem hiding this comment.
As discussed offline, this is copied from something so not a new idiom that will be introduced here. I'll look into it in a follow-up.
| dtype = image.dtype if fp else torch.float32 | ||
| theta = torch.tensor(matrix, dtype=dtype, device=image.device).reshape(1, 2, 3) | ||
| grid = _affine_grid(theta, w=width, h=height, ow=width, oh=height) | ||
| output = _apply_grid_transform(image if fp else image.to(dtype), grid, interpolation.value, fill=fill) |
There was a problem hiding this comment.
Porting the code over from _FT.affine to be able to use the optimized private methods defined above. I do the same to all other kernels where possible below.
| output = image | ||
| new_width, new_height = _compute_affine_output_size(matrix, width, height) if expand else (width, height) |
There was a problem hiding this comment.
Here I don't adopt the squash idiom from other places because it would lead to more complex logic. I have concerns about whether the implementation followed here (maintained from main) actually handles properly all corner-cases (ill formed images with 0 elements), similar to the issue observed with pad.
@vfdev-5 Might be worth talking a look on your side to see if a mitigation is necessary similar to #6949 (aka sending the image through the kernel normally).
| if coeffs is not None and len(coeffs) != 8: | ||
| raise ValueError("Argument coeffs should have 8 float values") |
There was a problem hiding this comment.
What is coeffs in an affine transformation? That seems out of place here.
There was a problem hiding this comment.
It's the perspective coeff. I'm not writing new code, I'm porting the existing methods here. I don't think changes of changing the validation should be in scope on this PR because it will get really fast really quickly.
There was a problem hiding this comment.
Let me send a follow-up PR then.
| if ndim > 4: | ||
| image = image.reshape((-1,) + shape[-3:]) | ||
| needs_unsquash = True | ||
| elif ndim == 3: | ||
| image = image.unsqueeze(0) | ||
| needs_unsquash = True | ||
| else: | ||
| needs_unsquash = False |
There was a problem hiding this comment.
Do we need the special casing for ndim == 3 here? Is there actually a significant perf difference between reshape and unsqueeze in this case?
pmeier
left a comment
There was a problem hiding this comment.
Unless benchmarks show a perf regression somewhere, LGTM if CI is green!
Summary: * Clean up `_get_inverse_affine_matrix` and `_compute_affine_output_size` * Optimize `_apply_grid_transform` * Cleanup `_assert_grid_transform_inputs` * Fix bugs on `_pad_with_scalar_fill` & `crop_mask` and port `crop_image_tensor` * Call directly `_pad_with_scalar_fill` * Fix linter * Clean up `center_crop_image_tensor` * Fix comments. * Fixing rounding issues. * Bumping tolerance for rotate which is unrelated to this PR. * Fix tolerance threshold for RandomPerspective. * Clean up `_affine_grid` and `affine_image_tensor` * Clean up `rotate_image_tensor` * Fixing linter * Address code-review comments. Reviewed By: YosuaMichael Differential Revision: D41376279 fbshipit-source-id: bc21d68d6374b39d78ae74ba01e49633b2d8d1e2
Related to #6818
Clean ups:
_get_inverse_affine_matrix_assert_grid_transform_inputs_perspective_grid_affine_bounding_box_xyxy_affine_gridPerfs:
_compute_affine_output_size_apply_grid_transformAffected methods (to benchmark):
rotate_image_tensorperspective_image_tensorelastic_image_tensoraffine_image_tensorBenchmarks:
The PR improves all the kernels across the board. There is only one reported slowdown on
elastic_image_tensor cpu torch.float32but these 2 runs have high std. I did a few consequent runs and it looks good. I think the changes look like they give us a 10-20% speed boost. We will measure it more carefully on the placeholder ticket.Before the PR (main branch):
This PR:
cc @vfdev-5 @bjuncek @pmeier