
replace tensor division with scalar division and tensor multiplication #6903


Merged

pmeier merged 3 commits into pytorch:main from perf-div on Nov 4, 2022

Conversation

@pmeier pmeier commented Nov 3, 2022

As discussed in #6830 (comment), dividing a tensor by a Python scalar is slower than inverting the Python scalar first and performing a tensor multiplication afterwards (see the sketch after the benchmarks below). The linked comment identified three places where we could use that optimization:

[------------------------------- posterize --------------------------------]
                                      |        main       |      perf-div   
1 threads: -----------------------------------------------------------------
      (3, 512, 512), float32, cpu     |   219 (+-  2) us  |   183 (+-  1) us
      (5, 3, 512, 512), float32, cpu  |  1208 (+- 66) us  |  1123 (+-180) us

Times are in microseconds (us).
[-------------------- convert_dtype (int -> float) --------------------]
                                    |     perf-div     |       main     
1 threads: -------------------------------------------------------------
      (3, 512, 512), uint8, cpu     |   95 (+-  2) us  |  132 (+-  3) us
      (5, 3, 512, 512), uint8, cpu  |  433 (+-  9) us  |  645 (+-  5) us

Times are in microseconds (us).
[----------------------------- adjust_hue -----------------------------]
                                    |     perf-div     |       main     
1 threads: -------------------------------------------------------------
      (3, 512, 512), uint8, cpu     |   15 (+-  1) ms  |   14 (+-  1) ms
      (5, 3, 512, 512), uint8, cpu  |   92 (+-  3) ms  |   90 (+-  4) ms

Times are in milliseconds (ms).

The performance improvement is significant for posterize and convert_dtype. For adjust_hue the change is within measurement tolerance. LMK if we still want this change there.
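
For reference, the optimization just hoists the division onto the Python scalar so that the per-element tensor op becomes a multiplication. A minimal sketch of the pattern (the tensor name and shape here are illustrative, not the actual kernel code):

import torch

image = torch.rand(3, 512, 512)  # illustrative input, shaped like the benchmark cases

# before: element-wise division by a Python scalar
out_div = image / 255.0

# after: invert the scalar once in Python, then do a single tensor multiplication
inv = 1.0 / 255.0
out_mul = image * inv

# the two results agree up to floating point rounding (roughly one float32 ULP)
torch.testing.assert_close(out_div, out_mul)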

Apart from the ops above, there are a few more places that divide by a Python scalar, but they are always accompanied by floor rounding, for example

step = num_non_max_pixels.div_(255, rounding_mode="floor")

Since Tensor.mul does not have that option, we would need an additional .floor_() afterwards, which eliminates the gains.
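
To illustrate, a rough sketch with made-up values (the real tensor is computed inside the kernel; only the pattern matters here):

import torch

num_non_max_pixels = torch.tensor([1000.0, 2500.0, 42.0])  # made-up values for illustration

# current code: a single fused floor division
step = num_non_max_pixels.clone().div_(255, rounding_mode="floor")

# multiplication-based rewrite: needs a separate floor pass, which cancels the speedup
step_alt = num_non_max_pixels.clone().mul_(1 / 255).floor_()

# for these values the results match; exact multiples of 255 could differ due to rounding
torch.testing.assert_close(step, step_alt)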

cc @vfdev-5 @datumbox @bjuncek

@NicolasHug

Are we sure it also provides the same results as a division?
I'm wondering because if this was a free win without trade-off, I assume it would have been implemented like that in core already?

pmeier commented Nov 4, 2022

We have pretty extensive tests that make sure that it does. The only thing I needed to adapt was to allow some tolerances on the ConvertImageDtype consistency test (v1 vs. v2). But this is expected, since there might be some minor implementation differences between division and multiplying by the inverse.

TBH, the difference is really insignificant:

FAILED test/test_prototype_transforms_consistency.py::test_call_consistency[ConvertImageDtype-2] - AssertionError: Tensor image consistency check failed with: 

Tensor-likes are not equal!

Mismatched elements: 1392 / 2772 (50.2%)
Greatest absolute difference: 5.960464477539063e-08 at index (0, 0, 0, 0)
Greatest relative difference: 7.9162417294075e-08 at index (0, 0, 0, 31)

This failure is for torch.float32. The differences seem significant for torch.bfloat16, but this is only due to its abysmal precision in favor of a large value range. All differences are well below the already strict default tolerances of torch.testing.assert_close.
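
For anyone curious, here is a minimal, self-contained way to reproduce the effect with random data (not the actual test inputs):

import torch

torch.manual_seed(0)
x = torch.rand(3, 512, 512, dtype=torch.float32) * 255  # values in [0, 255), like an image in uint8 range

a = x / 255.0          # plain division by the Python scalar
b = x * (1.0 / 255.0)  # multiplication by the precomputed inverse

# the worst-case mismatch is on the order of one float32 ULP of the result
print((a - b).abs().max())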

@datumbox datumbox left a comment

LGTM, thanks!

@pmeier pmeier merged commit 9b0da0c into pytorch:main Nov 4, 2022
@pmeier pmeier deleted the perf-div branch November 4, 2022 09:54
facebook-github-bot pushed a commit that referenced this pull request Nov 14, 2022
replace tensor division with scalar division and tensor multiplication (#6903)

Summary:
* replace tensor division with scalar division and tensor multiplication

* fix consistency test tolerances

Reviewed By: NicolasHug

Differential Revision: D41265200

fbshipit-source-id: bb438d768ebe1137f84df559589ad19eb2e0fc9e