Skip to content

Add keypoint transform computation #1118

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

gheaeckkseqrz
Copy link

This is draft for #523
I'm gonna need that for data augmentation on a object detection problem, using Pascal VOC Detection.

This is a first draft, should mostly work, but has not been properly tested yet (just some quick visual check making sure the transformed bounding boxes look ok). I'll try adding some unit testing once I've figured out how to do so in python. I'm just trying to gather some early feedback.

This should not break any interface, so previous code should still work with no difference in behaviour.

Copy link
Member

@fmassa fmassa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi,

Thanks for the PR!

Adding support for keypoints and masks / boxes is a larger discussion altogether, and we have not yet reached a consensus on what the API should look like.

As such, we are not going to be accepting this PR in it's current state.

Ideally, we should first have a discussion on an issue to propose an API, and their drawbacks.
We have done this back and forth for a while now, and there has been not really a great option that stood up as being generic and simpler than the others.

In particular, I think that we should not be specializing for keypoints / masks / boxes / etc in the transform code. This makes maintenance much harder if we want to extend it for more types.

One thing I was leaning towards was to have a base class for data types, that would know how to transform themselves, so that the implementations in transforms just call the method .resize of the object, for example, and everything works, similarly to how I did in maskrcnn-benchmark. This way, all you need to do is to check that your object inherits from the base DataType. This allows adding support for more data types very easily, and keeps the implementation simple.

But I'm not yet clear on all the details of how things should be implemented, and I haven't had the chance to think about it more yet as I'm working on some other things now (i.e., video).

Thanks again for the PR, but without some more design discussion we will be unable to move forward with this PR.

@codecov-io
Copy link

codecov-io commented Jul 15, 2019

Codecov Report

Merging #1118 into master will decrease coverage by 0.25%.
The diff coverage is 72.12%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1118      +/-   ##
==========================================
- Coverage   64.90%   64.65%   -0.26%     
==========================================
  Files          68       68              
  Lines        5417     5497      +80     
  Branches      835      850      +15     
==========================================
+ Hits         3516     3554      +38     
- Misses       1645     1680      +35     
- Partials      256      263       +7     
Impacted Files Coverage Δ
torchvision/transforms/transforms.py 76.91% <72.12%> (-4.63%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2b73a48...aebe983. Read the comment docs.

@gheaeckkseqrz
Copy link
Author

Hi @fmassa ,

I understand this kind of functionality needs proper discusion before getting implemented.
I'd like to get involved in those design discussion if possible. Where / How is that hapening ?

In the meantime, what is the best way of doing data augmentation for detection dataset (in my case VOCDetection) ?

Also, should I close the PR, or can it be useful in the future ?

Best,
Pierre

@fmassa
Copy link
Member

fmassa commented Jul 15, 2019

Hi Pierre,

Currently, data augmentation for detection is currently handled in https://github.com/pytorch/vision/blob/master/references/detection/transforms.py , and is on purpose outside of the importable part of torchvision so that we provide some sort of functionality, but don't commit on an API just yet.

You can extend it there for your needs.

About the design discussion for the transforms, you can maybe open an issue and we can have the discussion there. I've already exposed some points in my reply to your PR, and there are a number of other things to be discussed as well, so if you could open an issue with a potential proposal and the points we discussed here it would be a great start.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants