replace new_like with wrap_like #6718

Merged
merged 6 commits into pytorch:main on Oct 7, 2022

Conversation

@pmeier (Collaborator) commented Oct 7, 2022

Throughout this comment I'm using Image as a proxy for all of our features, for simplicity.


This is my take on reducing the overhead that transforms v2 has. Currently, we are using the idiom

return Image.new_like(self, output)

everywhere to wrap a plain tensor into the image feature. Doing so results in multiple __torch_function__ calls as detailed in #6681. Similar to the constructor, the Image.new_like method accepts arbitrary data: Any as input and thus has to go through the constructor every time.

However, we never call it without a tensor input. Plus, whenever we pass dtype to Image.new_like, it is not to change the dtype of the tensor to be wrapped, but rather to retain it:

return BoundingBox.new_like(self, output, dtype=output.dtype)

Taking this one step further, this also means that the new_like name is somewhat misleading. Yes, one gets a new features.Image object, but unlike the torch.*_like methods, we don't get new storage unless the dtype or device is changed.
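
To make the storage claim concrete, here is a minimal sketch run against main (where new_like still exists); it only relies on the behaviour described above, namely that wrapping without a dtype/device change does not copy:

import torch
from torchvision.prototype import features

image = features.Image(torch.rand(3, 16, 16))
plain = torch.rand(3, 16, 16)

# A true `*_like` method such as `torch.zeros_like` always allocates new storage ...
assert torch.zeros_like(plain).data_ptr() != plain.data_ptr()

# ... whereas `new_like` with unchanged dtype and device merely re-wraps the input
# tensor, so the "new" feature shares its storage with `plain`.
wrapped = features.Image.new_like(image, plain)
assert wrapped.data_ptr() == plain.data_ptr()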

This PR proposes to fix the above by refactoring Image.new_like into Image.wrap_like. As opposed to new_like, wrap_like only takes a tensor to be wrapped as well as the metadata for the specific type, i.e. color_space for features.Image. This removes the need to go through the constructor and results in no __torch_function__ calls at all:

import unittest.mock

import torch
from torchvision.prototype import features

image = features.Image(torch.rand(3, 16, 16))

with unittest.mock.patch(
    "torchvision.prototype.features._feature._Feature.__torch_function__", side_effect=AssertionError
):
    # This has to be `.new_like` on `main` and `.wrap_like` on the PR
    features.Image.wrap_like(image, torch.rand(3, 16, 16))
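
For context, the key ingredient that makes a constructor-free wrap_like possible is Tensor.as_subclass, which re-types a tensor without copying data and without triggering __torch_function__. The sketch below only illustrates the idea; the helper name wrap_like_sketch and the exact metadata handling are illustrative assumptions, not the code of this PR:

import torch
from torchvision.prototype import features


def wrap_like_sketch(other: features.Image, tensor: torch.Tensor, *, color_space=None) -> features.Image:
    # `as_subclass` only changes the Python type of `tensor`: no copy, no call into
    # `Image.__new__`, and therefore no `__torch_function__` dispatch.
    image = tensor.as_subclass(features.Image)
    # Metadata is either passed explicitly or taken over from the reference feature.
    image.color_space = color_space if color_space is not None else other.color_space
    return image


reference = features.Image(torch.rand(3, 16, 16))
wrapped = wrap_like_sketch(reference, torch.rand(3, 16, 16))
assert isinstance(wrapped, features.Image)
assert wrapped.color_space == reference.color_space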

We can estimate the impact of this change on one classification training:

from time import perf_counter_ns

import torch
from torchvision.prototype import features

# @datumbox, @vfdev-5: please let me know if these assumptions don't reflect reality
# This comes from @vfdev-5's benchmarks
num_calls_per_sample = 20
# Number of samples in imagenet training set
num_samples_per_epoch = 1_200_000
num_epochs = 600
# The wrapping happens in the transforms pipeline, i.e. on each worker individually. 
# Thus, each worker only has a fraction of samples to process
num_processes = 8

input = features.Image(torch.rand(3, 512, 512))
output = torch.rand(3, 512, 512)


time_diffs = []
for _ in range(1000):
    time_diff_per_sample = 0
    for _ in range(num_calls_per_sample):
        start = perf_counter_ns()
        # This has to be `.new_like` on `main` and `.wrap_like` on the PR
        features.Image.wrap_like(input, output)
        stop = perf_counter_ns()
        time_diff_per_sample += stop - start
    time_diffs.append(time_diff_per_sample)

overhead_per_sample = float(torch.tensor(time_diffs).to(torch.float64).median()) * 1e-9
print(f"Overhead per sample: {overhead_per_sample*1e6:5.1f} µs")

estimated_overhead_per_training = overhead_per_sample * num_samples_per_epoch * num_epochs / num_processes
print(f"Estimated overhead per training: {estimated_overhead_per_training / 60 / 60:.1f} h")
Overhead per sample:  21.2 µs
Estimated overhead per training: 0.5 h

Although the overhead is quite low at roughly 20 µs per sample, i.e. about 1 µs per call, the enormous number of calls during a full training blows this up into a significant amount of time. However, running the same benchmark on main yields

Overhead per sample: 208.3 µs
Estimated overhead per training: 5.2 h

To put it in words, this PR achieves roughly a 10x reduction of the overhead (208.3 µs vs. 21.2 µs per sample). The only thing we lose is the ability to pass arbitrary data to the wrapping function or to change the dtype and device in the process. We don't do either today, and I currently don't see a use case for it.

Note that this doesn't affect the ability to pass arbitrary data to the constructor; that is still supported. Plus, in contrast to the proposed wrap_like function, the constructor may also process the metadata, e.g. guess the color_space if none is passed to features.Image, whereas wrap_like only accepts metadata of the correct type or takes the value from the reference feature.
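
To make that difference concrete, here is a small usage sketch of the two entry points as described above (an illustration of the intended behaviour, not a test of the final interface):

import torch
from torchvision.prototype import features

data = torch.rand(3, 16, 16)

# The constructor accepts arbitrary data and may infer missing metadata,
# e.g. it guesses the color space from the input if none is passed.
image = features.Image(data)
print(image.color_space)

# wrap_like only accepts a tensor and, if no color_space is passed,
# copies the metadata from the reference feature instead of guessing.
wrapped = features.Image.wrap_like(image, data + 1)
assert wrapped.color_space == image.color_space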

@datumbox (Contributor) left a comment

LGTM, thanks @pmeier! I only have a couple of nit comments but nothing major.

@vfdev-5 my understanding is that it's still worth proceeding with some of the ideas from #6681 to reduce the __torch_function__ calls. Could you confirm that this PR doesn't affect the approach you are favouring to solve this?

@vfdev-5 (Collaborator) left a comment

LGTM, let's move on!
Thanks @pmeier

pmeier added 3 commits October 7, 2022 16:34
Conflicts:
	torchvision/prototype/transforms/_auto_augment.py
	torchvision/prototype/transforms/_color.py
	torchvision/prototype/transforms/functional/_augment.py
@pmeier (Collaborator, Author) commented Oct 7, 2022

As discussed offline, there are multiple ways we could approach the interface design; nothing is set in stone here. We'll move forward with what I have proposed, with the strong possibility of refactoring later. The performance gain is too high to hold this up with bikeshedding.

@datumbox datumbox merged commit 4c049ca into pytorch:main Oct 7, 2022
facebook-github-bot pushed a commit that referenced this pull request Oct 17, 2022
Summary:
* replace new_like with wrap_like

* fix videos

* revert casting in favor of ignoring mypy

Reviewed By: NicolasHug

Differential Revision: D40427465

fbshipit-source-id: 04b854225fe6a886cbe468b1277a0b73ca273885