
Negative Samples in Faster RCNN training result in NaN RPN_BOX_REG Loss #2144


Closed
praneet195 opened this issue Apr 24, 2020 · 4 comments

praneet195 commented Apr 24, 2020

Overview:
I updated torch and torchvision to the latest builds. A nice addition is that negative samples can now be included in RCNN training. However, I end up getting a NaN value for loss_rpn_box_reg when I provide negative samples.

I was training a pedestrian detector. In my custom dataset, if no label is provided for an image, I treat it as a negative sample. This is the code snippet I used:

    def __getitem__(self, idx):
        # each line of self.imgs is "img_path,x1,y1,x2,y2,label"
        img_path, x1, y1, x2, y2, label = self.imgs[idx].split(",")
        img = Image.open(img_path).convert("RGB")
        boxes = []
        if label:
            # positive sample: build the single pedestrian box
            pos = np.asarray([[y1, y2], [x1, x2]]).astype(float)
            xmin = np.min(pos[1])
            xmax = np.max(pos[1])
            ymin = np.min(pos[0])
            ymax = np.max(pos[0])
            boxes.append([xmin, ymin, xmax, ymax])
            labels = torch.ones((1,), dtype=torch.int64)
            iscrowd = torch.zeros((1,), dtype=torch.int64)
        else:
            # negative sample: dummy zero-size box (this is what triggers the NaN)
            boxes.append([0.0, 0.0, 0.0, 0.0])
            labels = torch.zeros((1,), dtype=torch.int64)
            iscrowd = torch.zeros((0,), dtype=torch.int64)
        # convert everything into a torch.Tensor
        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        image_id = torch.tensor([idx])
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        target["image_id"] = image_id
        target["area"] = area
        target["iscrowd"] = iscrowd
        if self.transforms is not None:
            img, target = self.transforms(img, target)
        return img, target

The training seems to work fine if I replace the following line:

boxes.append([0.0,0.0,0.0,0.0])

with

boxes.append([0.0,0.0,0.1,0.1])

So I'm guessing it's because the box is degenerate: xmin equals xmax and ymin equals ymax, giving it zero width and height.
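To illustrate the guess: the RPN's box coder regresses, among other things, log(gt_width / anchor_width), and a zero-width ground-truth box drives that term to -inf, which the loss then turns into NaN. A minimal sketch of just that term (not torchvision's exact code):

    import torch

    anchor = torch.tensor([[10.0, 10.0, 20.0, 20.0]])  # a well-formed anchor
    gt = torch.tensor([[0.0, 0.0, 0.0, 0.0]])          # the dummy negative box

    gt_w = gt[:, 2] - gt[:, 0]              # width 0.0
    anchor_w = anchor[:, 2] - anchor[:, 0]  # width 10.0
    print(torch.log(gt_w / anchor_w))       # tensor([-inf]) -> NaN loss downstream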

Setup:
Torch: 1.5.0
Torchvision: 0.6.0
NVIDIA driver: 440.33
CUDA: 10.2

fmassa (Member) commented Apr 30, 2020

Yes, having degenerate boxes (i.e., boxes for which xmax <= xmin or ymax <= ymin) does yield NaN during training.

If you don't have any boxes in your image, you should instead do the following, as explained in the release notes: https://github.com/pytorch/vision/releases/tag/v0.6.0

boxes = torch.empty((0, 4), dtype=torch.float32)

Here is a more complete example:

    boxes = torch.zeros((0, 4), dtype=torch.float32)
    negative_target = {"boxes": boxes,
                       "labels": torch.zeros(0, dtype=torch.int64),
                       "image_id": 4,
                       "area": (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0]),
                       "iscrowd": torch.zeros((0,), dtype=torch.int64)}

@ruotaozhang

Is there a particular reason to set image_id = 4? Or is this arbitrary?


LogicNg commented Jul 18, 2021

> If you don't have any boxes in your image, you should instead do the following, as explained in the release notes:
>
>     boxes = torch.empty((0, 4), dtype=torch.float32)

/opt/conda/lib/python3.7/site-packages/torchvision/models/detection/_utils.py in decode(self, rel_codes, boxes)
    174             box_sum += val
    175         pred_boxes = self.decode_single(
--> 176             rel_codes.reshape(box_sum, -1), concat_boxes
    177         )
    178         return pred_boxes.reshape(box_sum, -1, 4)

RuntimeError: cannot reshape tensor of 0 elements into shape [0, -1] because the unspecified dimension size -1 can be any value and is ambiguous

@fmassa But it throws the above error for me...

fmassa (Member) commented Aug 13, 2021

@ruotaozhang

> Is there a particular reason to set image_id = 4? Or is this arbitrary?

It's arbitrary; image_id just needs to identify the image (for example, torch.tensor([idx]) as in the dataset above).

@LogicNg This error should have been fixed in #3205; could you try a newer version of torchvision?
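For reference, a quick way to check which version you are running:

    import torchvision
    print(torchvision.__version__)  # should be recent enough to include the fix from #3205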
