-
Notifications
You must be signed in to change notification settings - Fork 7.1k
Add keypoint transform computation #1118
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi,
Thanks for the PR!
Adding support for keypoints and masks / boxes is a larger discussion altogether, and we have not yet reached a consensus on what the API should look like.
As such, we are not going to be accepting this PR in it's current state.
Ideally, we should first have a discussion on an issue to propose an API, and their drawbacks.
We have done this back and forth for a while now, and there has been not really a great option that stood up as being generic and simpler than the others.
In particular, I think that we should not be specializing for keypoints
/ masks
/ boxes
/ etc in the transform code. This makes maintenance much harder if we want to extend it for more types.
One thing I was leaning towards was to have a base class for data types, that would know how to transform themselves, so that the implementations in transforms
just call the method .resize
of the object, for example, and everything works, similarly to how I did in maskrcnn-benchmark
. This way, all you need to do is to check that your object inherits from the base DataType
. This allows adding support for more data types very easily, and keeps the implementation simple.
But I'm not yet clear on all the details of how things should be implemented, and I haven't had the chance to think about it more yet as I'm working on some other things now (i.e., video).
Thanks again for the PR, but without some more design discussion we will be unable to move forward with this PR.
Codecov Report
@@ Coverage Diff @@
## master #1118 +/- ##
==========================================
- Coverage 64.90% 64.65% -0.26%
==========================================
Files 68 68
Lines 5417 5497 +80
Branches 835 850 +15
==========================================
+ Hits 3516 3554 +38
- Misses 1645 1680 +35
- Partials 256 263 +7
Continue to review full report at Codecov.
|
Hi @fmassa , I understand this kind of functionality needs proper discusion before getting implemented. In the meantime, what is the best way of doing data augmentation for detection dataset (in my case VOCDetection) ? Also, should I close the PR, or can it be useful in the future ? Best, |
Hi Pierre, Currently, data augmentation for detection is currently handled in https://github.com/pytorch/vision/blob/master/references/detection/transforms.py , and is on purpose outside of the importable part of torchvision so that we provide some sort of functionality, but don't commit on an API just yet. You can extend it there for your needs. About the design discussion for the transforms, you can maybe open an issue and we can have the discussion there. I've already exposed some points in my reply to your PR, and there are a number of other things to be discussed as well, so if you could open an issue with a potential proposal and the points we discussed here it would be a great start. |
This is draft for #523
I'm gonna need that for data augmentation on a object detection problem, using Pascal VOC Detection.
This is a first draft, should mostly work, but has not been properly tested yet (just some quick visual check making sure the transformed bounding boxes look ok). I'll try adding some unit testing once I've figured out how to do so in python. I'm just trying to gather some early feedback.
This should not break any interface, so previous code should still work with no difference in behaviour.