replace new_like with wrap_like #6718

Merged
merged 6 commits into pytorch:main on Oct 7, 2022

Conversation

@pmeier (Collaborator) commented Oct 7, 2022

Throughout this comment I'm using Image as a proxy for all of our features, for simplicity.


This is my take on reducing the overhead that transforms v2 has. Currently, we are using the idiom

return Image.new_like(self, output)

everywhere to wrap a plain tensor into the image feature. Doing so results in multiple __torch_function__ calls as detailed in #6681. Similar to the constructor, the Image.new_like method accepts arbitrary data: Any as input and thus has to go through the constructor every time.

However, we never call it without a tensor input. Plus, whenever we pass dtype to Image.new_like, it is not to change the dtype of the tensor to be wrapped, but rather to retain it:

return BoundingBox.new_like(self, output, dtype=output.dtype)

Taking this one step further, this also means that the new_like name is somewhat misleading. Yes, one gets a new features.Image object, but unlike the torch.*_like methods, we don't get new storage unless the dtype or device is changed.
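
To make the storage claim concrete, here is a minimal sketch run against main (where new_like still exists); it only relies on the behaviour described above, namely that wrapping without a dtype/device change does not copy:

import torch
from torchvision.prototype import features

image = features.Image(torch.rand(3, 16, 16))
plain = torch.rand(3, 16, 16)

# A true `*_like` method such as `torch.zeros_like` always allocates new storage ...
assert torch.zeros_like(plain).data_ptr() != plain.data_ptr()

# ... whereas `new_like` with unchanged dtype and device merely re-wraps the input
# tensor, so the "new" feature shares its storage with `plain`.
wrapped = features.Image.new_like(image, plain)
assert wrapped.data_ptr() == plain.data_ptr()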

This PR proposes to fix the above by refactoring Image.new_like into Image.wrap_like. As opposed to new_like, wrap_like only takes a tensor to be wrapped as well as the metadata for the specific type, i.e. color_space for features.Image. This removes the need to go through the constructor and results in no __torch_function__ calls at all:

import unittest.mock

import torch
from torchvision.prototype import features

image = features.Image(torch.rand(3, 16, 16))

with unittest.mock.patch(
    "torchvision.prototype.features._feature._Feature.__torch_function__", side_effect=AssertionError
):
    # This has to be `.new_like` on `main` and `.wrap_like` on the PR
    features.Image.wrap_like(image, torch.rand(3, 16, 16))
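
For context, the key ingredient that makes a constructor-free wrap_like possible is Tensor.as_subclass, which re-types a tensor without copying data and without triggering __torch_function__. The sketch below only illustrates the idea; the helper name wrap_like_sketch and the exact metadata handling are illustrative assumptions, not the code of this PR:

import torch
from torchvision.prototype import features


def wrap_like_sketch(other: features.Image, tensor: torch.Tensor, *, color_space=None) -> features.Image:
    # `as_subclass` only changes the Python type of `tensor`: no copy, no call into
    # `Image.__new__`, and therefore no `__torch_function__` dispatch.
    image = tensor.as_subclass(features.Image)
    # Metadata is either passed explicitly or taken over from the reference feature.
    image.color_space = color_space if color_space is not None else other.color_space
    return image


reference = features.Image(torch.rand(3, 16, 16))
wrapped = wrap_like_sketch(reference, torch.rand(3, 16, 16))
assert isinstance(wrapped, features.Image)
assert wrapped.color_space == reference.color_space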

We can estimate the impact of this change on one classification training:

from time import perf_counter_ns

import torch
from torchvision.prototype import features

# @datumbox, @vfdev-5: please let me know if these assumptions don't reflect reality
# This comes from @vfdev-5's benchmarks
num_calls_per_sample = 20
# Number of samples in imagenet training set
num_samples_per_epoch = 1_200_000
num_epochs = 600
# The wrapping happens in the transforms pipeline, i.e. on each worker individually. 
# Thus, each worker only has a fraction of samples to process
num_processes = 8

input = features.Image(torch.rand(3, 512, 512))
output = torch.rand(3, 512, 512)


time_diffs = []
for _ in range(1000):
    time_diff_per_sample = 0
    for _ in range(num_calls_per_sample):
        start = perf_counter_ns()
        # This has to be `.new_like` on `main` and `.wrap_like` on the PR
        features.Image.wrap_like(input, output)
        stop = perf_counter_ns()
        time_diff_per_sample += stop - start
    time_diffs.append(time_diff_per_sample)

overhead_per_sample = float(torch.tensor(time_diffs).to(torch.float64).median()) * 1e-9
print(f"Overhead per sample: {overhead_per_sample*1e6:5.1f} µs")

estimated_overhead_per_training = overhead_per_sample * num_samples_per_epoch * num_epochs / num_processes
print(f"Estimated overhead per training: {estimated_overhead_per_training / 60 / 60:.1f} h")
Overhead per sample:  21.2 µs
Estimated overhead per training: 0.5 h

Although the overhead is quite low at roughly 20 µs per sample, i.e. about 1 µs per call, the enormous number of calls during a full training blows this up into a significant amount of time. However, running the same benchmark on main yields

Overhead per sample: 208.3 µs
Estimated overhead per training: 5.2 h

To put it in words, this PR achieves roughly a 10x reduction of the overhead (208.3 µs vs. 21.2 µs per sample). The only thing we lose is the ability to pass arbitrary data to the wrapping function or to change the dtype and device in the process. We don't do either today, and I currently don't see a use case for it.

Note that this doesn't affect the ability to pass arbitrary data to the constructor; that is still supported. Plus, in contrast to the proposed wrap_like function, the constructor may also process the metadata, e.g. guess the color_space if none is passed to features.Image, whereas wrap_like only accepts metadata of the correct type or takes the value from the reference feature.
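
To make that difference concrete, here is a small usage sketch of the two entry points as described above (an illustration of the intended behaviour, not a test of the final interface):

import torch
from torchvision.prototype import features

data = torch.rand(3, 16, 16)

# The constructor accepts arbitrary data and may infer missing metadata,
# e.g. it guesses the color space from the input if none is passed.
image = features.Image(data)
print(image.color_space)

# wrap_like only accepts a tensor and, if no color_space is passed,
# copies the metadata from the reference feature instead of guessing.
wrapped = features.Image.wrap_like(image, data + 1)
assert wrapped.color_space == image.color_space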

@datumbox (Contributor) left a comment

LGTM, thanks @pmeier! I only have a couple of nit comments but nothing major.

@vfdev-5 my understanding is that it's still worth proceeding with some of the ideas from #6681 to reduce the __torch_function__ calls. Could you confirm that this PR doesn't affect the approach you are favouring to solve this?

@vfdev-5 (Collaborator) left a comment

LGTM, let's move on!
Thanks @pmeier

pmeier added 3 commits October 7, 2022 16:34
Conflicts:
	torchvision/prototype/transforms/_auto_augment.py
	torchvision/prototype/transforms/_color.py
	torchvision/prototype/transforms/functional/_augment.py
@pmeier (Collaborator, Author) commented Oct 7, 2022

As discussed offline, there are multiple ways we could approach the interface design; nothing is set in stone here. We'll move forward with what I have proposed, with the strong possibility of refactoring later. The performance gain is too high to hold this up with bikeshedding.

@datumbox datumbox merged commit 4c049ca into pytorch:main Oct 7, 2022
facebook-github-bot pushed a commit that referenced this pull request Oct 17, 2022
Summary:
* replace new_like with wrap_like

* fix videos

* revert casting in favor of ignoring mypy

Reviewed By: NicolasHug

Differential Revision: D40427465

fbshipit-source-id: 04b854225fe6a886cbe468b1277a0b73ca273885