Add CarlaStereo, Kitti2012Stereo, and Kitti2015Stereo datasets for Stereo(#6269) #6311


Merged
merged 12 commits into main on Aug 2, 2022

Conversation

TeodorPoncu
Contributor

@TeodorPoncu TeodorPoncu commented Jul 25, 2022

Broken-down PR (#6269), as per @NicolasHug's suggestion.

Member

@NicolasHug NicolasHug left a comment

Thanks @TeodorPoncu ,

I have a few comments but they're mostly minor or about conventions. The PR looks great in general, nice job!

Apart from the comments below, we'll need to add these classes to datasets.rst in order to look at the rendered docs.

(
    imgs,
    dsp_maps,
    valid_masks,
) = self.transforms(imgs, dsp_maps, valid_masks)
Member

Following up on our previous discussion #6269 (comment) : I'm still not sure this is the signature we want, but I can see how flattening all of it would be a bit verbose as well.

Perhaps the best way forward here is to keep it as-is for now, and re-evaluate when we look at the implementation of the transforms.

CC @datumbox perhaps you already have an opinion about this?

Contributor Author

I actually mostly wrapped up writing the transform pipeline yesterday, and I have mixed feelings about the signature now. Since tuples are immutable, sometimes you've got to do a lot of unpacking and repacking inside the transform, so I'd actually be more inclined to use the flattened signature, even though it comes at the cost of increased verbosity.

Contributor

@NicolasHug I'm missing context to provide an educated opinion. I understand that you are dealing with the legacy datasets and you have to "hack them together" to progress with the Depth Perception work, so in that sense I agree with your "keep as-is and re-evaluate" proposal.

I suspect that there are some incompatibilities between the current approach and the new Transforms API, but that's not because you have multiple images/masks/maps; it's because the left/right inputs need to be processed together rather than independently. I will probably need to understand better how this works for this task, but it's an interesting use-case that we will need to support, and it might challenge some of the assumptions we made about the transforms, especially the one that says each input can be transformed independently. cc @pmeier @vfdev-5

Contributor Author

@TeodorPoncu TeodorPoncu Jul 28, 2022

To provide some context as to the specific use-case:

from typing import Optional, Tuple

import torch
from torch import Tensor
import torchvision.transforms.functional as F


class RandomHorizontalFlip(torch.nn.Module):
    def __init__(self, p: float = 0.5) -> None:
        super().__init__()
        self.p = p

    def forward(
        self,
        images: Tuple[Tensor, Tensor],
        disparities: Tuple[Tensor, Optional[Tensor]],
        masks: Tuple[Optional[Tensor], Optional[Tensor]],
    ) -> Tuple[Tuple, Tuple, Tuple]:

        img_left, img_right = images
        dsp_left, dsp_right = disparities
        mask_left, mask_right = masks

        # flip with probability p (only possible when the right disparity exists)
        if dsp_right is not None and torch.rand(1) < self.p:
            # horizontal flip at pixel level
            img_left, img_right = F.hflip(img_left), F.hflip(img_right)
            dsp_left, dsp_right = F.hflip(dsp_left), F.hflip(dsp_right)
            # mask flipping flow is heavily skewed towards how we handle
            # generating pseudo-masks via the transform compose
            if mask_left is not None and mask_right is not None:
                mask_left, mask_right = F.hflip(mask_left), F.hflip(mask_right)
            # swap the two stereo channels
            return (
                (img_right, img_left),
                (dsp_right, dsp_left),
                (mask_right, mask_left),
            )

        return images, disparities, masks

The transform requires some inputs that for some datasets might not exist, namely the right disparity map. Because we perform a horizontal pixel flip, the new target is not the transformed version of the original target but the transformed target from the opposite channel. Therefore the transform needs access to all the information.

Were it not for this transform, we could have just followed the previous flow template.
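The channel swap above can be illustrated with a tiny pure-Python sketch (schematic only, not the actual transform; `hflip` here just reverses a 1-D "image"):

```python
# Flipping both views of a stereo pair horizontally turns the left
# camera into the right camera, so the channels must also be swapped.

def hflip(row):
    # stand-in for F.hflip on a 1-D "image"
    return row[::-1]

img_left = [1, 2, 3, 4]
img_right = [5, 6, 7, 8]

# flip at pixel level, then swap the stereo channels
new_left, new_right = hflip(img_right), hflip(img_left)
print(new_left, new_right)  # [8, 7, 6, 5] [4, 3, 2, 1]
```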

Contributor Author

For the bigger picture view, here is some extra context:

class Compose(torch.nn.Module):
    def __init__(self, transforms):
        super().__init__()
        self.transforms = transforms

    def forward(self, images, disparities, masks):
        for t in self.transforms:
            images, disparities, masks = t(images, disparities, masks)
        return images, disparities, masks

Flattening everything would require making transforms that support the following forward scheme:

    def forward(self, img_left, img_right, dsp_left, dsp_right, mask_left, mask_right):
        for t in self.transforms:
            img_left, img_right, dsp_left, dsp_right, mask_left, mask_right = t(img_left, img_right, dsp_left, dsp_right, mask_left, mask_right)
        return img_left, img_right, dsp_left, dsp_right, mask_left, mask_right

Contributor Author

Truthfully, a disparity map is actually just a flow map with a single channel / 0-flow on the y-axis. The same constraints that apply to the transforms in our optical flow reference scripts apply to these as well.

Bilinear resizing will still work as intended because both flows and disparities are measured in pixels and not some other real-world distance metric.
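Since disparities are in pixel units, they scale linearly with image width, so a resize typically also multiplies the values by the width ratio. A toy nearest-neighbour sketch (hypothetical helper, 1-D for brevity; a real pipeline would do the same with bilinear interpolation on 2-D tensors):

```python
def resize_disparity(disp, new_w):
    """Nearest-neighbour resize of a 1-D disparity row, rescaling the
    pixel-valued disparities by the width ratio (illustration only)."""
    scale = new_w / len(disp)
    return [disp[int(i / scale)] * scale for i in range(new_w)]

print(resize_disparity([2.0, 2.0], 4))            # [4.0, 4.0, 4.0, 4.0]
print(resize_disparity([4.0, 4.0, 4.0, 4.0], 2))  # [2.0, 2.0]
```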

Judging by what several other paper repos provide in terms of augmentation pipelines (1, 2), it seems the same approach is being followed.

Typically the same problems are posed in monocular depth estimation, and although Lambertian surfaces are an important issue, the assumption that color shifts should affect the depth map is not made. In both the unsupervised and supervised settings we consider such alterations to be something that should not affect the true world model / estimate; they are solely camera-related / a regularization method.

Contributor

In any case I agree that we can merge this PR to unblock and re-visit this considering the new transforms API.

Contributor

OK, it sounds like the new API in principle supports both multiple inputs (named or not) and special types like disparity maps. I don't immediately see an incompatibility, but we should keep an eye out for similar requests, as this is useful feedback for the new API.

Contributor Author

@datumbox @NicolasHug since this PR is non-blocking for the model training, I could try dog-fooding the new transforms API and include parts of the reference script in this PR, so that we get a clearer idea of how everything would mix with the new API and a better picture of how the dataset should be structured.

Is that ok with you guys?

Contributor

We literally just merged a bunch of updates today in #6305. It might be a bit premature to dogfood it (we'll definitely take you up on your offer a bit later though), but you can already have a look at how it works on the main branch at torchvision.prototype.transforms.

Contributor

@jdsgomes jdsgomes left a comment

LGTM. Regarding the open discussion, I think we agree that this can be merged and revisited. I would just wait for @NicolasHug's approval, since he made most of the comments/suggestions.

Member

@NicolasHug NicolasHug left a comment

Thanks for addressing the comments @TeodorPoncu , I made another pass below

Would you mind adding the datasets to datasets.rst so that we can check the rendered docs as well? Thanks!

Comment on lines +48 to +49
self._images = [] # type: ignore
self._disparities = [] # type: ignore
Member

Do we really need to add type: ignore here? I'd suggest typing the rest, or not typing anything at all (which is by far my preferred way).

Contributor Author

I've left the # type: ignore because mypy starts throwing errors about empty lists, since it cannot infer their element type.

torchvision/datasets/_stereo_matching.py:48: error: Need type annotation for "_images" (hint: "_images: List[<type>] = ...") [var-annotated] self._images = []

torchvision/datasets/_stereo_matching.py:49: error: Need type annotation for "_disparities" (hint: "_disparities: List[<type>] = ...") [var-annotated] self._disparities = []
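The var-annotated errors can alternatively be silenced by annotating the attributes instead of ignoring them; a minimal sketch (class name and `str` element type are assumed here for illustration):

```python
from typing import List


class StereoMatchingDataset:
    def __init__(self) -> None:
        # Annotating the element type lets mypy infer the lists'
        # types, so no "# type: ignore" is needed:
        self._images: List[str] = []
        self._disparities: List[str] = []
```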

Member

@NicolasHug NicolasHug left a comment

Thanks @TeodorPoncu, one last cosmetic comment regarding the docs, but other than that LGTM! I'll approve now so that you can merge once it's addressed. Nice work!

Comment on lines 257 to 260
"""Return example at given index.
Args:
index(int): The index of the example to retrieve
Returns:
Member

This does not render properly in the docs: https://output.circle-artifacts.com/output/job/4dcd20b4-ad55-48a3-a0c9-474bfaae3aa3/artifacts/0/docs/generated/torchvision.datasets.Kitti2015Stereo.html#torchvision.datasets.Kitti2015Stereo.__getitem__

I think you just need to skip lines like so:

        """Return example at given index.

        Args:
            index(int): The index of the example to retrieve

        Returns:

Same for the other datasets :)

Comment on lines 176 to 177
"""Return example at given index.
Args:
Member

The line skip should be between these two as well:

Suggested change:

        """Return example at given index.
        Args:

becomes:

        """Return example at given index.

        Args:

@TeodorPoncu TeodorPoncu merged commit 6bea4ef into main Aug 2, 2022

github-actions bot commented Aug 2, 2022

Hey @TeodorPoncu!

You merged this PR, but no labels were added. The list of valid labels is available at https://github.com/pytorch/vision/blob/main/.github/process_commit.py

@NicolasHug NicolasHug changed the title Splitting Stereo Dataset PR(#6269) Add CarlaStereo, Kitti2012Stereo, and Kitti2015Stereo datasets for Stereo(#6269) Aug 2, 2022
facebook-github-bot pushed a commit that referenced this pull request Aug 2, 2022
Summary:
* Broken down PR(#6269). Added an additional dataset

* Removed some types. Store None instead of "". Merged test util functions.

* minor mypy fixes. minor doc fixes

* reformated docstring

* Added additional line-skips

Reviewed By: NicolasHug

Differential Revision: D38351752

fbshipit-source-id: 376714fcdd49cb474670ce8e6e959507a517ee46