
Negative Samples in Faster RCNN training result in NaN RPN_BOX_REG Loss #2144


Closed
praneet195 opened this issue Apr 24, 2020 · 4 comments

praneet195 commented Apr 24, 2020

Overview:
I updated torch and torchvision to the latest builds. A nice addition is that negative samples can now be included in RCNN training. However, I end up getting a NaN value for loss_rpn_box_reg when I provide negative samples.

I was training a pedestrian detector. In my custom dataset, if no label is provided for an image, I treat it as a negative sample. This is the code snippet I used:

    def __getitem__(self, idx):
        # each line of self.imgs is "img_path,x1,y1,x2,y2,label"
        img_path, x1, y1, x2, y2, label = self.imgs[idx].split(",")
        img = Image.open(img_path).convert("RGB")
        boxes = []
        if label:
            # positive sample: build the single pedestrian box
            pos = np.asarray([[y1, y2], [x1, x2]]).astype(float)
            xmin = np.min(pos[1])
            xmax = np.max(pos[1])
            ymin = np.min(pos[0])
            ymax = np.max(pos[0])
            boxes.append([xmin, ymin, xmax, ymax])
            labels = torch.ones((1,), dtype=torch.int64)
            iscrowd = torch.zeros((1,), dtype=torch.int64)
        else:
            # negative sample: dummy zero-size box (this is what triggers the NaN)
            boxes.append([0.0, 0.0, 0.0, 0.0])
            labels = torch.zeros((1,), dtype=torch.int64)
            iscrowd = torch.zeros((0,), dtype=torch.int64)
        # convert everything into a torch.Tensor
        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        image_id = torch.tensor([idx])
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        target["image_id"] = image_id
        target["area"] = area
        target["iscrowd"] = iscrowd
        if self.transforms is not None:
            img, target = self.transforms(img, target)
        return img, target

The training seems to work fine if I replace the following line:

boxes.append([0.0,0.0,0.0,0.0])

with

boxes.append([0.0,0.0,0.1,0.1])

So I'm guessing it's because the box is degenerate: xmin equals xmax and ymin equals ymax, giving it zero width and height.
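To illustrate the guess: the RPN's box coder regresses, among other things, log(gt_width / anchor_width), and a zero-width ground-truth box drives that term to -inf, which the loss then turns into NaN. A minimal sketch of just that term (not torchvision's exact code):

    import torch

    anchor = torch.tensor([[10.0, 10.0, 20.0, 20.0]])  # a well-formed anchor
    gt = torch.tensor([[0.0, 0.0, 0.0, 0.0]])          # the dummy negative box

    gt_w = gt[:, 2] - gt[:, 0]              # width 0.0
    anchor_w = anchor[:, 2] - anchor[:, 0]  # width 10.0
    print(torch.log(gt_w / anchor_w))       # tensor([-inf]) -> NaN loss downstream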

Setup:
Torch: 1.5.0
Torchvision: 0.6.0
NVIDIA driver: 440.33
CUDA: 10.2

fmassa (Member) commented Apr 30, 2020

Yes, having degenerate boxes (i.e., boxes for which xmax <= xmin or ymax <= ymin) does yield NaN during training.

If you don't have any boxes in your image, you should instead do the following, as explained in the release notes: https://github.com/pytorch/vision/releases/tag/v0.6.0

boxes = torch.empty((0, 4), dtype=torch.float32)

Here is a more complete example:

    boxes = torch.zeros((0, 4), dtype=torch.float32)
    negative_target = {"boxes": boxes,
                       "labels": torch.zeros(0, dtype=torch.int64),
                       "image_id": 4,
                       "area": (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0]),
                       "iscrowd": torch.zeros((0,), dtype=torch.int64)}

@ruotaozhang

Is there a particular reason to set image_id = 4? Or is this arbitrary?


LogicNg commented Jul 18, 2021

> If you don't have any boxes in your image, you should instead do the following, as explained in the release notes:
>
>     boxes = torch.empty((0, 4), dtype=torch.float32)

/opt/conda/lib/python3.7/site-packages/torchvision/models/detection/_utils.py in decode(self, rel_codes, boxes)
    174             box_sum += val
    175         pred_boxes = self.decode_single(
--> 176             rel_codes.reshape(box_sum, -1), concat_boxes
    177         )
    178         return pred_boxes.reshape(box_sum, -1, 4)

RuntimeError: cannot reshape tensor of 0 elements into shape [0, -1] because the unspecified dimension size -1 can be any value and is ambiguous

@fmassa But it throws the above error for me...

fmassa (Member) commented Aug 13, 2021

@ruotaozhang

> Is there a particular reason to set image_id = 4? Or is this arbitrary?

It's arbitrary; image_id just needs to identify the image (for example, torch.tensor([idx]) as in the dataset above).

@LogicNg This error should have been fixed in #3205; could you try a newer version of torchvision?
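For reference, a quick way to check which version you are running:

    import torchvision
    print(torchvision.__version__)  # should be recent enough to include the fix from #3205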
