Remove unnecessary dispatcher/registration code #4205


Closed
NicolasHug wants to merge 3 commits

Conversation

@NicolasHug (Member) commented Jul 23, 2021

I don't really understand the initial goal for these registrations, so I might be missing something. But it looks like we can remove them.

@NicolasHug changed the title from "WIP NOMRG Remove some dead code" to "Remove unnecessary dispatcher/registration code" on Jul 25, 2021
@NicolasHug marked this pull request as ready for review on July 25, 2021 08:44
@datumbox (Contributor) left a comment


These registrations should not be removed; they allow C++ programs to access the backward passes. Please check the original PRs where I introduced them for more context on why they are necessary.

@NicolasHug (Member, Author) commented:
These registrations should not be removed; they allow C++ programs to access the backward passes.

Thanks for the context. If this is a feature we want to explicitly support, it should be documented and, at the very least, tested against. Such docs/tests would have saved me a lot of time.

Also, I might be missing something, but this seems to contradict PR #3143, where it is said that we don't want the backward passes to be public.

Please check the original PRs where I introduced them for more context on why they are necessary

I did, but could not find anything relevant. Would you mind pointing me towards those?

@NicolasHug (Member, Author) commented:
For example, this was added in #2926 for ps_roi_align, but I can't find any reference to it in that PR or in the related issues. Same for DeformConv in #2898.

@datumbox (Contributor) commented:
@NicolasHug Unfortunately I have limited internet access at the moment, and it's a bit hard to find all the PRs/issues where it was decided to support the backward passes. I just saw that this PR removes lots of code which we probably want to keep, and I wanted to flag it so that we don't accidentally merge and break things. If @fmassa is around, perhaps he can give you the background more easily.

Concerning documentation and testing, that's definitely an area where the C++ codebase can be improved. The aforementioned PRs targeted upgrading the ops code to use the latest dispatcher mechanism, so we relied on the existing testing infra. Feel free to propose ways to improve the code.

@fmassa (Member) commented Jul 28, 2021

I had a chat with Nicolas about this earlier today.

Those functions used to be there to properly support double-backwards; see https://github.com/pytorch/vision/pull/2366/files#r447547554 for more context.

Given that we don't provide a double-backwards implementation for our operators, it should be possible for PyTorch to register these functions automatically somehow; that's why there is a TODO in the code.

Unless something has changed in PyTorch since last year, I think those functions might still be needed (at least for double-backwards), although I would have expected that we would get compilation errors without them.

cc @ezyang for more insights on if we still need those for double-backwards or if PyTorch already handles it for us

@NicolasHug (Member, Author) commented:
It's highly likely that I'm missing something, but the only way I'm able to hit the "double backward on ... not supported" error on master is by doing:

import torch
from torch.autograd import grad
import torchvision
from torchvision.ops import roi_align

t = torch.rand(2, 3, 4, 5, requires_grad=True)
boxes = torch.rand(2, 5, requires_grad=True)

out = roi_align(t, boxes, output_size=1)

# First-order gradient w.r.t. the input feature map, keeping the graph
grad_t = grad(out.sum(), t, create_graph=True)[0]
# Second-order: differentiate (d roi_align / d t) w.r.t. the boxes
grad(grad_t.sum(), boxes, allow_unused=True)

i.e. asking for d/dboxes (droi_align / dt), which doesn't seem to make much sense.

If we ask for d/dt (droi_align/dt), which seems to be a more likely use-case:

grad_t = grad(out.sum(), t, create_graph=True)[0]
# Second-order: differentiate (d roi_align / d t) w.r.t. t itself
grad(grad_t.sum(), t, allow_unused=True)

we get None.

In this PR's branch, however, both second-order derivatives will fail with:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

If the original goal was to avoid a segfault in such cases, maybe we could still remove this code: getting an error might actually be better than getting None?

@ezyang (Contributor) commented Jul 30, 2021

OK, so there are a few separate things going on here.

First, you don't get an error for d/dt (droi_align/dt) because you passed allow_unused=True. Of course if you don't pass that kwarg, you get the error that "One of the differentiated Tensors appears to not have been used in the graph."
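
A minimal sketch of the allow_unused behavior, using plain tensors rather than roi_align (the tensor names here are only illustrative):

import torch
from torch.autograd import grad

a = torch.rand(3, requires_grad=True)
b = torch.rand(3, requires_grad=True)
out = (a * 2).sum()  # b is never used in this graph

# Without allow_unused, asking for d(out)/d(b) raises:
#   RuntimeError: One of the differentiated Tensors appears to not have been
#   used in the graph. Set allow_unused=True if this is the desired behavior.
# With allow_unused=True we simply get None back:
print(grad(out, b, allow_unused=True))  # (None,)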

Second, this error is correct. The derivative of roi_align has no dependence on the input in question. Intuitively, you can see this is the case because all roi_align does in the forward is index (OK, interpolate, really) points from within the RoI. The gradient contribution to these points is independent of what the values at those points are (it would depend on them if, for example, we were doing a multiply). So of course autograd is going to complain in this case.
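
A toy sketch of this point, contrasting an op whose gradient depends on its input with a purely linear, "indexing-like" one (this uses plain tensor ops, not roi_align itself):

import torch
from torch.autograd import grad

x = torch.rand(3, requires_grad=True)

# Multiplicative op: d(x*x)/dx = 2*x depends on x, so a second derivative exists.
g = grad((x * x).sum(), x, create_graph=True)[0]
print(grad(g.sum(), x)[0])  # tensor([2., 2., 2.])

# Linear op: d(3*x)/dx = 3 is constant, independent of x.
g2 = grad((3 * x).sum(), x, create_graph=True)[0]
print(g2.requires_grad)  # False: the gradient carries no dependence on x,
# so differentiating g2 again w.r.t. x fails (g2 does not require grad).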

Third, while it's true that d/dboxes (droi_align / dt) probably doesn't make much sense by itself, my favorite "ordinary" use of second-order derivatives is the gradient penalty. So imagine a standard use of MaskRCNN, but with a gradient penalty (let's forget, for a moment, whether or not a gradient penalty is actually useful in this case). The loss will involve computations from roi_align, which will in turn rely on bounding boxes from the Region Proposal Network, which itself contains a bunch of weights. To penalize large gradients on the weights in the RPN, we would have to differentiate through roi_align with respect to the boxes.
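
A generic sketch of the gradient-penalty pattern (the tiny model and the 0.1 penalty weight below are placeholders for illustration, not anything from torchvision or MaskRCNN):

import torch
from torch.autograd import grad

# Placeholder network standing in for an "RPN -> boxes -> roi_align -> loss" pipeline.
model = torch.nn.Sequential(
    torch.nn.Linear(4, 8), torch.nn.Tanh(), torch.nn.Linear(8, 1)
)
x = torch.rand(16, 4)
loss = model(x).sum()

# Gradient penalty: penalize the norm of d(loss)/d(weights).
# create_graph=True so the penalty term can itself be backpropagated,
# which is what requires double-backward through every op in the graph.
grads = grad(loss, tuple(model.parameters()), create_graph=True)
penalty = sum(g.pow(2).sum() for g in grads)

total = loss + 0.1 * penalty  # 0.1 is an arbitrary penalty weight
total.backward()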

Finally, when @soulitzer nails pytorch/pytorch#62032 we will be allowed to get rid of this redundant code; this is not done yet, however!

@NicolasHug (Member, Author) commented:
Thanks for the details @ezyang, and thanks @datumbox and @fmassa for your input.
I'll close this PR then, and keep an eye on pytorch/pytorch#62032 so we can properly get rid of these once there is a better support mechanism.

@NicolasHug NicolasHug closed this Aug 16, 2021