Remove unnecessary dispatcher/registration code #4205
Conversation
These registrations should not be removed; they allow C++ programs to access the backwards passes. Please check the original PRs where I introduced them for more context on why they are necessary.
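For readers less familiar with the dispatcher side of this: ops registered from C++ become addressable by schema name, which is how external C++ code (via torch::Dispatcher) and Python alike can reach them. A minimal sketch of what that looks like from Python; the backward op name below is an assumption based on this thread, not something confirmed here:

```python
import torch
import torchvision  # importing torchvision registers its C++ ops with the dispatcher

# Ops defined in C++ are looked up by name; C++ programs resolve them the same
# way through torch::Dispatcher. The backward name below is an assumption based
# on this discussion -- it is the kind of registration the PR proposes removing.
print(torch.ops.torchvision.roi_align)            # forward op
print(torch.ops.torchvision._roi_align_backward)  # backward op (assumed name)
```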
Thanks for the context. I think this should be documented, and at the very least it should be tested against, if this is a feature we want to explicitly support. Such docs / tests would have saved me a lot of time. Also, I might be missing something, but it seems somewhat contradictory with PR #3143, where it is said that we don't want the backward passes to be public.
I did, but could not find anything relevant. Would you mind pointing me towards those?
@NicolasHug Unfortunately I have limited internet access at the moment and it's a bit hard to find all the PRs/issues where it was decided to support backwards. I just saw that this PR removes lots of code which we probably want to keep, and wanted to flag this so that we don't accidentally merge and break stuff. If @fmassa is around, perhaps he can give you the background more easily. Concerning documentation and testing, that's definitely an area where the C++ codebase can be improved. The aforementioned PRs targeted upgrading the ops code to use the latest dispatcher mechanism, so we used the existing testing infra. Feel free to propose ways to improve the code.
I had a chat with Nicolas about this earlier today. Those functions used to be there to properly support double-backwards, see https://github.com/pytorch/vision/pull/2366/files#r447547554 for more context. Given that we don't provide a double-backwards implementation for our operators, it should be possible for PyTorch to automatically register this function somehow. That's why there is this …

Unless something changed in PyTorch since last year, I think those functions might be needed (at least for double-backwards), although I would have expected that we would get compilation errors without them.

cc @ezyang for more insight on whether we still need those for double-backwards or if PyTorch already handles it for us.
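To make the "we don't provide a double-backwards implementation" point concrete, here is a minimal Python-level sketch (not the C++ dispatcher code this PR touches) of an op whose backward exists but is deliberately marked as only once-differentiable. `ScaleByTwo` is a made-up stand-in, not a torchvision op:

```python
import torch
from torch.autograd.function import once_differentiable

class ScaleByTwo(torch.autograd.Function):
    """Stand-in op: forward multiplies by 2; backward is implemented but is not
    differentiable a second time, mirroring ops without a double-backward formula."""

    @staticmethod
    def forward(ctx, x):
        return x * 2

    @staticmethod
    @once_differentiable  # attempts at double backward raise a clear error
    def backward(ctx, grad_out):
        return grad_out * 2

x = torch.rand(3, requires_grad=True)
y = ScaleByTwo.apply(x)
(g,) = torch.autograd.grad(y.sum(), x, create_graph=True)  # first-order grad is fine
# g.sum().backward()  # raises a RuntimeError: "trying to differentiate twice a
#                     # function that was marked with @once_differentiable"
```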
It's highly likely that I'm missing something, but the only way I'm able to hit the "double backward on ... not supported" error is with:

```python
import torch
from torch.autograd import grad
import torchvision
from torchvision.ops import roi_align

t = torch.rand(2, 3, 4, 5, requires_grad=True)
boxes = torch.rand(2, 5, requires_grad=True)

out = roi_align(t, boxes, output_size=1)
grad_t = grad(out.sum(), t, create_graph=True)[0]
grad(grad_t.sum(), boxes, allow_unused=True)
```

i.e. asking for the derivative of `grad_t` with respect to `boxes`. If we ask for its derivative with respect to `t` instead:

```python
grad_t = grad(out.sum(), t, create_graph=True)[0]
grad(grad_t.sum(), t, allow_unused=True)
```

we get `None`. In this PR's branch, however, both second-order derivatives will fail with an error.
If the original goal was to avoid a segfault in such cases, maybe we could still remove this code: getting an error might actually be better than getting `None`.
OK, so there's like, a bunch of stuff here.

First, you don't get an error for that case; you just get `None`.

Second, this error is correct. The derivative for roi_align has no dependence on the input in question. Intuitively, you can see this is the case because all roi_align's forward is doing is indexing (OK, interpolating, really) out points from within the RoI. The gradient contribution to these points is independent of what the values in the points are (which would have been the case if we were doing a multiply, e.g.). So of course autograd is going to complain in this case.

Third, while it's true that d/dboxes (droi_align / dt) probably doesn't make too much sense by itself, my favorite "ordinary" use of second-order derivatives is gradient penalty. So imagine a standard use of MaskRCNN, but with gradient penalty (let's forget, for a moment, whether or not gradient penalty is actually useful in this case). The loss will involve computations from roi_align, which will in turn rely on bounding boxes from the Region Proposal Network, which itself contains a bunch of weights. To penalize large gradients on the weights in the RPN, we would have to differentiate through roi_align with respect to the boxes.

Finally, when @soulitzer nails pytorch/pytorch#62032 we will be allowed to get rid of this redundant code; this is not done yet, however!
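For context, here is a minimal sketch of the gradient-penalty pattern described above, using a toy linear model rather than MaskRCNN; the model, data, and the 0.1 penalty weight are all placeholders:

```python
import torch

# Toy stand-ins for the MaskRCNN pieces in the comment above.
model = torch.nn.Linear(4, 1)
params = list(model.parameters())
x = torch.rand(8, 4)
target = torch.rand(8, 1)

loss = torch.nn.functional.mse_loss(model(x), target)

# First-order gradients w.r.t. the weights, kept in the graph (create_graph=True)
# so the penalty below can itself be differentiated.
grads = torch.autograd.grad(loss, params, create_graph=True)

# Gradient penalty: backpropagating through it requires second-order derivatives
# of every op in the forward pass -- this is where a missing double-backward
# formula for something like roi_align would surface.
penalty = sum(g.pow(2).sum() for g in grads)
(loss + 0.1 * penalty).backward()
```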
Thanks for the details @ezyang, and @datumbox @fmassa for your input.
I don't really understand the initial goal for these registrations, so I might be missing something. But it looks like we can remove them.