Negative Samples in Faster RCNN training results in NaN RPN_BOX_REG Loss #2144
Comments
Yes, having degenerate boxes (i.e., boxes for which
If you don't have any boxes in your image, you should instead do the following, as explained in the release notes https://github.com/pytorch/vision/releases/tag/v0.6.0
boxes = torch.empty((0, 4), dtype=torch.float32)
Here is a more complete example:
vision/test/test_models_detection_negative_samples.py Lines 16 to 21 in f9ef235
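The embedded snippet from the test file is not reproduced on this page. As a minimal sketch of the same idea (not the contents of that test file), a target for an image with no objects could be built as below. The helper name `make_negative_target` is hypothetical, and only the `boxes` and `labels` keys are shown; a dataset may need further keys depending on the training and evaluation code around it.

```python
import torch


def make_negative_target():
    """Build a detection target for an image that contains no objects.

    Hypothetical helper: the point is the shapes and dtypes, i.e. an
    empty (0, 4) float box tensor instead of a placeholder box.
    """
    return {
        # No boxes at all: shape (0, 4), not a single all-zero box.
        "boxes": torch.empty((0, 4), dtype=torch.float32),
        # One label per box, so this is empty as well.
        "labels": torch.empty((0,), dtype=torch.int64),
    }


target = make_negative_target()
print(target["boxes"].shape, target["labels"].shape)
```

The key design point is that "no objects" is encoded by the number of rows being zero, so no box with equal corners ever reaches the loss computation.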
Is there a particular reason to set image_id = 4? Or is this arbitrary?
/opt/conda/lib/python3.7/site-packages/torchvision/models/detection/_utils.py in decode(self, rel_codes, boxes)
174 box_sum += val
175 pred_boxes = self.decode_single(
--> 176 rel_codes.reshape(box_sum, -1), concat_boxes
177 )
178 return pred_boxes.reshape(box_sum, -1, 4)
RuntimeError: cannot reshape tensor of 0 elements into shape [0, -1] because the unspecified dimension size -1 can be any value and is ambiguous
@fmassa But it throws this to me...
More closely follows pytorch/vision#2144 and https://github.com/pytorch/vision/releases/tag/v0.6.0 for negative samples.
Overview:
I updated torch and torchvision to the latest builds. A cool update was that now negative samples could be included in RCNN training. However, I end up getting a NaN value for loss_rpn_box_reg when I provide negative samples.
I was training a Pedestrian Detector. Based on my custom dataset input, if a label wasn't provided, I would use it as a negative sample. This is the code snippet I used.
The training seems to work fine if I replace the following line:
with
So I'm guessing it's because xmin equals xmax and ymin equals ymax, which gives the box zero width and height.
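To make that guess concrete, here is a small sketch of a check that separates usable boxes from zero-area ones before they are handed to the model. It assumes boxes in `[xmin, ymin, xmax, ymax]` order and treats a box as degenerate when `xmax <= xmin` or `ymax <= ymin`; the function names are hypothetical and are not part of torchvision.

```python
def is_degenerate(box):
    """Return True if an [xmin, ymin, xmax, ymax] box has no positive area."""
    xmin, ymin, xmax, ymax = box
    return xmax <= xmin or ymax <= ymin


def split_boxes(boxes):
    """Split a list of boxes into (valid, dropped) by the degenerate check."""
    valid = [b for b in boxes if not is_degenerate(b)]
    dropped = [b for b in boxes if is_degenerate(b)]
    return valid, dropped


# A placeholder box with equal corners, a normal box, and a zero-width box.
valid, dropped = split_boxes([[0, 0, 0, 0], [10, 20, 50, 80], [5, 5, 5, 30]])
print(valid)    # boxes that are safe to keep
print(dropped)  # boxes that should not be passed as ground truth
```

If `valid` ends up empty for an image, that image can be treated as a negative sample with an empty box tensor rather than a placeholder box.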
Setup:
Torch: 1.5.0
Torchvision: 0.6.0
Nvidia: 440.33
CUDA: 10.2