Add crestereo dataset #6269
Conversation
Thanks for the PR @TeodorPoncu , I just gave it a very brief look, I'll do a more in-depth review later
def read_pfm_file(file_path: str) -> np.array:
Could we use or modify our existing https://github.com/pytorch/vision/blob/main/torchvision/datasets/_optical_flow.py#L477:L477 for this?
Well, technically, based on the header information, .pfm files can store 1 or 3 channels of (H, W) data. During some initial tests I saw that they usually store just 1 channel, so the reshape in _optical_flow should be dynamic.
There's also the data slicing. Technically, if correct data is provided and the correct files are linked to the dataset, slicing an extra channel with data[:2, :, :] would cause no issues. However, the default in the test util make_pfm_file implicitly creates .pfm files with 3 channels.
data = data.reshape(h, w, 3).transpose(2, 0, 1) # <--- move to something like data.reshape(h, w, c)
data = np.flip(data, axis=1) # flip on h dimension
data = data[:2, :, :]
return data.astype(np.float32)
I believe we could add an additional argument to the _read_pfm from stereo, something like slice_channels such that data = data[:slice_channels, :, :], and set its default value to 2 so that we do not break backwards compatibility.
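A minimal sketch of what a channel-aware reader with a slice_channels argument could look like. This is illustrative only: the function name, defaults, and the in-memory fake file are my assumptions, not the actual torchvision util; the parsing follows the standard PFM header convention ("PF" = 3 channels, "Pf" = 1).

```python
import io
import numpy as np

def read_pfm(fh, slice_channels: int = 2) -> np.ndarray:
    # "PF" headers store 3 channels per pixel, "Pf" headers store 1
    header = fh.readline().rstrip()
    n_channels = 3 if header == b"PF" else 1
    w, h = map(int, fh.readline().split())
    scale = float(fh.readline())
    endian = "<" if scale < 0 else ">"  # negative scale means little-endian
    data = np.frombuffer(fh.read(), dtype=endian + "f4")
    data = data.reshape(h, w, n_channels).transpose(2, 0, 1)  # (C, H, W)
    data = np.flip(data, axis=1)  # PFM rows are stored bottom-to-top
    return data[:slice_channels].astype(np.float32)

# a fake single-channel 4x2 file, standing in for what make_pfm_file produces
buf = io.BytesIO(b"Pf\n4 2\n-1.0\n" + np.arange(8, dtype="<f4").tobytes())
disparity = read_pfm(buf)
print(disparity.shape)  # (1, 2, 4): the reshape adapts to the header
```

With slice_channels=2 as the default, 3-channel files are cut down exactly as the current [:2, :, :] slice does, so existing callers keep working.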
It seems that we can modify the existing then, if so that would be preferable.
If you go down this path, please do the change in a separate PR. @NicolasHug would it make sense to move the util out of the optical_flow module?
Thanks for the details - I agree, let's just do the necessary changes and move it to dataset/utils.py
As discussed offline with @TeodorPoncu we decided to move the read_pfm_file to datasets.utils in a separate PR and add an extra argument in a BC way to handle the different scenarios. Would you agree with this approach @NicolasHug ?
sure
lol didn't see your comment and basically suggested what you did :D
test/datasets_utils.py
-@test_all_configs
+@ test_all_configs
Looks like there are a few formatting issues like these. We try to avoid formatting changes in PRs like this, because they add noise to git blame.
As you can see there are linting issues: https://app.circleci.com/pipelines/github/pytorch/vision/18822/workflows/40d854f9-fd3b-43cb-9471-9eb90d4c93ac/jobs/1522929
We have instructions for applying formatters here: https://github.com/pytorch/vision/blob/main/CONTRIBUTING.md#formatting
Hi @TeodorPoncu , thanks for the PR, you added quite a lot of stereo datasets, which is nice!
My TLDR for this review is:
- For the download functionality, we may want to only download what is needed, and also check whether the file exists before downloading
- For some of the file paths, we should avoid using replace
- Some suggestions on where we might be able to refactor a bit
test/test_datasets.py
for split in ("tree", "shapenet", "reflective", "hole"):
    with self.create_dataset(split=split) as (dataset, _):
        for left, right, disparity, valid_mask in dataset:
            left_array = np.array(left)
The test logic inside the for loop seems very similar across all the stereo matching datasets; could we have a shared method for this to minimize duplicated code?
Yes, most definitely. Would datasets_utils.py in the test directory be a good place to add that kind of method?
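A hypothetical sketch of such a shared helper, checking the shape contract described in this PR (images match, disparity is (1, H, W), valid mask is (H, W)). The name and exact signature are illustrative, not what was eventually merged.

```python
import numpy as np

def assert_stereo_sample(left, right, disparity, valid_mask):
    # shared shape checks for one stereo sample, reusable across dataset tests
    left, right = np.asarray(left), np.asarray(right)
    assert left.shape == right.shape, "left/right image size mismatch"
    if disparity is not None:
        disparity = np.asarray(disparity)
        assert disparity.ndim == 3 and disparity.shape[0] == 1  # (1, H, W)
    if valid_mask is not None:
        assert np.asarray(valid_mask).ndim == 2  # (H, W)

# usage with fake arrays standing in for a dataset sample
assert_stereo_sample(
    np.zeros((32, 32, 3)), np.zeros((32, 32, 3)),
    np.zeros((1, 32, 32)), np.ones((32, 32), dtype=bool),
)
```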
disparities is a Tuple of (``np.ndarray``, ``np.ndarray``) with shape (1, H, W)
valid_masks is a Tuple of (``np.ndarray``, ``np.ndarray``) with shape (H, W)

In some cases, when a dataset does not provide disparties, the ``disparities`` and
NIT small typo disparties -> disparities
Args:
    root (str): Root directory of the dataset.
    split (str): The split of the dataset to use. One of ``"tree"``, ``"shapenet"``, ``"reflective"``, ``"hole"``
Question: In my experience, the split parameter options are usually train, test, val, or others that indicate the functional split of the dataset. In this case it is more of a filter on the image type, so I'm just wondering whether we should still use the term split or another parameter name.
Any opinion on this @NicolasHug ?
I agree it's best to restrict split to the usual values (test, train, val, ...).
Maybe this question will remove the problem: do we actually need to provide this parameter? Or can we just provide the entire dataset with all "categories"?
As a side note, I can't find a reference to these in the paper or the repo, so I can't suggest a name ATM. Do you have thoughts @TeodorPoncu ?
"""Synthetic dataset used in training the `CREStereo <https://arxiv.org/pdf/2203.11483.pdf>`_ architecture."""
I think it should've been laid out in the first line of the docstring. I don't think we necessarily have to provide it, as the authors use it just for training and nothing more. However, there are some use cases in transfer learning / domain adaptation where I believe people would be interested in finer control over the splits.
There's an "all" split, which basically includes all 4 splits. I believe we could rename that to training if we don't ditch the granular split approach?
My take on all this is that unless we have a clear, specific, and immediate need for a feature in torchvision, we don't need to implement it now. I agree the granularity might be useful eventually, but if we don't need it ATM, it's best to leave that kind of problem for the future (should it ever pop up).
So, in order to simplify this work further, I'd suggest dropping it altogether. At least for now.
I have no strong opinion on whether or not to remove it. But if we don't remove it, I think we should give it a different name like categories or something else, but not split.
raise FileNotFoundError("No images found in {}".format(root))

if split == "train":
    disparity_maps_left = sorted(glob(str(root / "disp_noc" / "*.png")))
Question: Just wondering if we can refactor by having something like get_disparity_map_files in the parent class, so the child class only needs to pass the path.
I definitely think we can do something like that. For images as well. I'll check whether there are any edge cases where that might not work.
I think this commit should solve those concerns @YosuaMichael. It also tackles the .replace cases.
if not os.path.exists(file_path):
    return None, None

# disparity decoding as per Sintel instructions
Could you give some link / reference regarding these instructions?
disparity_maps_left = [file_path.replace(p, "disparity").replace(".png", ".pfm") for file_path in imgs_left]
disparity_maps_right = [
    file_path.replace(p, "disparity").replace(".png", ".pfm") for file_path in imgs_right
I think we should try to avoid replace as much as possible for a path.
Like you suggested above, I think we can wrap that up in the parent class somehow and avoid replace altogether.
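One hedged sketch of how the parent class could pair files without str.replace: scan the image directory and derive each disparity path from directory and suffix arguments. All names here (paired_paths, the directory layout) are hypothetical, not the refactor that was actually merged.

```python
import tempfile
from pathlib import Path

def paired_paths(root, img_dir: str, disp_dir: str, disp_suffix: str = ".pfm"):
    # subclasses pass directory names and a suffix; no string surgery on paths
    root = Path(root)
    imgs = sorted((root / img_dir).glob("*.png"))
    disps = [root / disp_dir / p.with_suffix(disp_suffix).name for p in imgs]
    return imgs, disps

# demo on a throwaway directory layout
root = Path(tempfile.mkdtemp())
(root / "image_left").mkdir()
(root / "image_left" / "0001.png").touch()
imgs, disps = paired_paths(root, "image_left", "disparity")
print(disps[0].name)  # 0001.pfm
```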
intrinsics = json.load(f)
fx = intrinsics["camera_settings"][0]["intrinsic_settings"]["fx"]
# inverse of depth-from-disparity equation
disparity = (fx * 6.0 * 100) / depth.astype(np.float32)
NIT: could we give names to these constants 6.0 and 100? (by assigning variables to them)
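Something along these lines. Note the constant names, and what 6.0 and 100 physically mean, are my guesses from the snippet and should be checked against the FallingThings documentation:

```python
import numpy as np

BASELINE = 6.0     # assumed: stereo baseline of the FallingThings camera rig
UNIT_SCALE = 100.0  # assumed: unit-conversion factor from the original snippet

def depth_to_disparity(depth: np.ndarray, fx: float) -> np.ndarray:
    # inverse of the depth-from-disparity equation: disparity = fx * B / depth
    return (fx * BASELINE * UNIT_SCALE) / depth.astype(np.float32)

print(depth_to_disparity(np.array([600.0]), fx=1.0))  # [1.]
```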
self._images = imgs

disparity_maps_left = list(p.replace("left", "left_disp") for p in imgs_left)
disparity_maps_right = list(p.replace("right", "right_disp") for p in imgs_left)
Similar comment as before: we should avoid using replace as much as possible for file paths.
test/datasets_utils.py
left_array = np.array(left)
right_array = np.array(right)
I think we could rely on transforms.functional.get_dimensions() to avoid converting to numpy arrays. It can handle both tensors and PIL images.
test/datasets_utils.py
assert len(disparity.shape) == 3
assert len(valid_mask.shape) == 2
NIT or just FYI: this is equivalent to array.ndim
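For instance, with arrays shaped like the ones in this dataset:

```python
import numpy as np

disparity = np.zeros((1, 8, 8), dtype=np.float32)
valid_mask = np.zeros((8, 8), dtype=bool)
# ndim is the number of axes, i.e. exactly len(arr.shape)
assert len(disparity.shape) == disparity.ndim == 3
assert len(valid_mask.shape) == valid_mask.ndim == 2
```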
test/datasets_utils.py
left_array = np.array(left)
right_array = np.array(right)
same here regarding get_dimensions()
torchvision/datasets/__init__.py
StereoETH3D,
StereoFallingThings,
StereoKitti2012,
StereoKitti2015,
StereoMiddlebury2014,
StereoSceneFlow,
StereoSintel,
CREStereo,
InStereo2k,
This is something we can bikeshed on at the very end but I'm commenting so that we don't forget:
Historically we had the Kitti dataset (for classification) and then I added KittiFlow for optical flow. Following that (arbitrary) convention, the new datasets should probably be KittiStereo, SintelStereo, etc.
    imgs,
    dsp_maps,
    valid_masks,
) = self.transforms(imgs, dsp_maps, valid_masks)
Are we certain we want the transforms to have this signature? In optical flow, we flatten all the input instead of passing tuples
Flattening everything would result in very verbose function signatures for transforms: (left_img, right_img, left_disp, right_disp, left_mask, right_mask).
I don't have any strong feelings about this. The trade-off for tuples is that you have to do tuple unpacking / manipulation inside the transform.
I think the current transforms will be somewhat temporary (put in references like optical_flow), and I have no strong opinion on this.
For the future transforms API, I think it would be good to package left and right together (either stacked as a tensor or maybe a tuple). The reason is that the new transforms accept *input as shown in https://github.com/pytorch/vision/blob/main/torchvision/prototype/transforms/_transform.py#L22 and in that case we can't apply a transform separately to the left and right image.
img_right = self._read_img(self._images[index][1])

dsp_map_left, valid_mask_left = self._read_disparity(self._disparities[index][0])
dsp_map_right, valid_mask_right = self._read_disparity(self._disparities[index][1])
From our discussions in #6259 (comment) I was under the impression that we don't have a clear idea about the usefulness of the right disparity map and valid_mask. Does it really make sense to keep and handle them here then?
In CREStereo there's an augmentation which horizontally flips the image / disparity pixels and then swaps the left and right channels. So in order to reproduce that augmentation procedure, we'd need access to the right disparity map as well in order to perform the switch.
I made a mock implementation of that transform; if the right mask doesn't exist, the augmentation just returns its inputs.
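A rough numpy sketch of that augmentation as I understand it from the description above. This is not the actual CREStereo code; the no-op behaviour on missing right-view targets follows the mock implementation mentioned.

```python
import numpy as np

def flip_and_swap(left, right, disp_left, disp_right):
    # horizontally flip both views and disparities, then swap left/right;
    # returns the inputs unchanged when the right-view targets are missing
    if disp_right is None:
        return left, right, disp_left, disp_right
    flip = lambda a: np.flip(a, axis=-1)  # flip along the width axis
    return flip(right), flip(left), flip(disp_right), flip(disp_left)

l, r = np.array([[1, 2]]), np.array([[3, 4]])
dl, dr = np.array([[5, 6]]), np.array([[7, 8]])
new_l, new_r, new_dl, new_dr = flip_and_swap(l, r, dl, dr)
print(new_l)  # [[4 3]]: the flipped right view becomes the new left view
```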
    valid_masks,
) = self.transforms(imgs, dsp_maps, valid_masks)

return imgs[0], imgs[1], dsp_maps[0], valid_masks[0]
In optical flow, we don't return the valid_mask if the dataset doesn't have a built-in one (unless it gets generated by a transform).
I'm not saying this is the best design by any means, but there is value in consistency across the library for such things. WDYT?
I believe it would just streamline transform definition / writing for the end user. The input and output signatures stay identical; users only have to handle how the input is processed inside the transform.
For this, I also prefer the dataset to always return the same number of outputs (always have valid_mask, although it can be None). It feels less "unexpected" to me.
One design question I have for valid_mask: when the dataset doesn't provide any valid_mask file, should we generate the valid_mask in _read_disparity_files, or should we just leave it None? (in that case we let the transforms define the valid_mask)
The proposal I've included in the PR is to make a mask full of ones when we don't get any information. Since the mask's job is to indicate which pixels of the disparity map the loss should be computed on, a None would be equivalent to evaluating the loss on the entire output.
One design question I have for valid_mask is when the dataset dont provide any file on valid_mask
I think we should let the transform define it, like we do for OF datasets.
I agree that we should try to let the dataset always return the same number of outputs. This is why we let flow or disparity be None in "test" splits, where it's not available, as it's only available in "train".
However, in the case of valid_mask, which may not always exist, it doesn't seem expected to always return it if it is not defined by the dataset. Taken in isolation, why should a dataset return something it doesn't actually have? This was the reason we decided not to return the valid_flow_masks in datasets that don't provide them (again, I'm not claiming this is a perfect solution).
As pointed out by @TeodorPoncu , it does make the writing of the transforms a tiny bit trickier. But this is why we provide the presets in our training recipes, which handle all of this for the user, so it is mostly transparent and they don't need to write if/else-y code.
I would still err on the side of consistency with the rest of our API, as this is one of the most important aspects of a frictionless user experience.
Thanks @TeodorPoncu , these are all good points TBH.
Speaking from anecdotal experience most of the people that I know who use datasets from torchvision don't even look at the training reference scripts / know that they exist.
Ah, that's a shame. Perhaps this is because we didn't do a great job of documenting these in the past. Hopefully the new doc revamp will help raise awareness of them.
Dataset centric way or Task centric way
I agree with your analysis. For OF datasets we opted for the dataset-centric way. I don't think that there is a clear winner a-priori, but now that we already went this way for OF datasets, honouring consistency is critical.
To me personally, transforms imply modifying / altering data, not necessarily creating or extracting information and as a personal preference I'd avoid using transforms that transform data types / generate information.
I tend to agree as well. We still decided to generate the fake data in the transforms rather than in the datasets, because generating it in the datasets has various drawbacks:
- it binds the user to one specific generation, which may not be the one they want. For example, in optical flow we generate a fake valid_mask depending on a given threshold, but the default threshold may not suit all users (we could make this a parameter of the dataset class, but it's not ideal either)
- it generates a lot of useless data and thus uses memory, while some users may not actually need the valid_mask at all. So it's best to avoid generating it on their behalf.
Another argument I have for consistent output shapes from __getitem__ is that you can guarantee transform associativity as often as possible.
I agree this is important, and we can have that regardless of the design :) In the OF datasets, we don't return (or generate fake) values that don't exist, but we still require all the transforms to have the same signature: img1, img2, flow, valid_mask:
- https://github.com/pytorch/vision/blob/add-crestereo-dataset/torchvision/datasets/_optical_flow.py#L65:L65
vision/torchvision/datasets/_optical_flow.py
Lines 242 to 245 in 1dd1753
transforms (callable, optional): A function/transform that takes in ``img1, img2, flow, valid_flow_mask`` and returns a transformed version. ``valid_flow_mask`` is expected for consistency with other datasets which return a built-in valid mask, such as :class:`~torchvision.datasets.KittiFlow`.
My only comment would be that it would be difficult to wrap multiple datasets together under some sort of wrapper if they don't have a unified return shape.
Multiset training is something that is used in CREStereo. So if users want to use other dataset combinations, we would still force them to generate masks for compatibility with other datasets / loss masking.
I agree that binding users to one way of generating their data is not ideal, however I think that when we force users to make their own data we should also provide out-of-the-box, idiomatic ways to do so.
Another thing that crosses my mind is that by having it in the dataset we provide a "standard" way of interpreting the data, which would allow better reproducibility among users?
Maybe something like the target_transform args in other datasets would be a good fit? In our case mask_generation_transform?
I believe all of these concerns are covered by the existing design we have for optical flow datasets. I provided references above but I'm happy to provide more details offline/in person if you wish.
No, it's not necessary. I do agree that datasets should remain pure and we should not make any assumptions regarding masks. My final question would be w.r.t. datasets (such as Kitti2015) for which some authors (i.e. RAFT-Stereo) have basic methods of creating a mask.
If paper / dataset authors provide a method for computing a valid mask (i.e. CREStereo), should we consider that a valid way of generating masks and add it to our dataset implementations, or, if there's no occlusion / valid map file, should we just not return anything at all?
Both of these mask generations seem to be relevant at the "application/task" level rather than at the dataset level, so I would suggest providing the same logic, but in the transforms of our training references instead (i.e. the dataset would not return it).
We made a similar decision for OF models: in the original code from the authors' repo, the mask was generated within the dataset class, but we decided to move that to the transforms to keep the datasets clear of any task-related assumptions:
vision/references/optical_flow/transforms.py
Lines 29 to 40 in 1dd1753
class MakeValidFlowMask(torch.nn.Module):
    # This transform generates a valid_flow_mask if it doesn't exist.
    # The flow is considered valid if ||flow||_inf < threshold
    # This is a noop for Kitti and HD1K which already come with a built-in flow mask.
    def __init__(self, threshold=1000):
        super().__init__()
        self.threshold = threshold

    def forward(self, img1, img2, flow, valid_flow_mask):
        if flow is not None and valid_flow_mask is None:
            valid_flow_mask = (flow.abs() < self.threshold).all(axis=0)
        return img1, img2, flow, valid_flow_mask
warnings.warn(
    "\nSplit 'test' has only no calibration settings, ignoring calibration argument.", RuntimeWarning
)
else:
    if split != "test":
        calibration = "perfect"
        warnings.warn(
            f"\nSplit '{split}' has calibration settings, however None was provided as an argument."
            f"\nSetting calibration to 'perfect' for split '{split}'. Available calibration settings are: 'perfect', 'imperfect', 'both'.",
            RuntimeWarning,
        )
For these kinds of user mistakes, we typically raise ValueError instead of warning. It's more obvious to users, and it prevents them from doing things they might not understand.
Thanks @TeodorPoncu , I made another pass but I didn't review everything yet (e.g. the tests).
Considering how big this PR is, I wonder if it would make sense to split it into smaller ones to ease reviewing? E.g. perhaps submit a first PR with 1 or 2 datasets and their corresponding tests, that would be representative of the rest of the implementation. Then, reviewing the remaining datasets would be a lot easier.
Also, I just saw that we seem to already have some datasets for stereo: LFWPairs and PhotoTour. Have you looked into the design of these?
Also it looks like the docs are missing: we'll need to add the classes to docs/source/datasets.rst so that we can see the rendered docs.
Thank you!
self._disparities += disparities

def _read_disparity(self, file_path: str) -> Tuple:
    disparity = np.array(Image.open(file_path), dtype=np.float32)
Here and everywhere else where there is a similar use of np.array(pil_img): we should try to use np.asarray so that the input is only copied when needed.
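The difference in one line: np.array copies by default, while np.asarray returns the input itself when it is already an ndarray of a compatible dtype (a PIL image is not an ndarray, so np.asarray still converts it, just without the unconditional copy):

```python
import numpy as np

a = np.zeros((2, 2), dtype=np.float32)
# np.array makes a fresh copy; np.asarray passes the existing array through
assert np.array(a) is not a
assert np.asarray(a) is a
```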
PhotoTour is a slightly different task, but I'll have a look into it. Same for LFWPairs.
At a glance it seems PhotoTour doesn't have to apply any transforms on the target / gt. LFWPairs has a separate transform for targets, which I believe is not necessarily the best way to go for the CREStereo augmentation pipeline requirements.
test/test_datasets.py
pfm_path = os.path.join(scene_dir, "disp0GT.pfm")
datasets_utils.make_fake_pfm_file(h=100, w=100, file_name=pfm_path)
paths.append(pfm_path)
return paths
You don't need these paths, or to return anything.
Just a first skim through the tests
test/test_datasets.py
num_examples = 2 if config["split"] == "train" else 3

split_name = "two_view_training" if config["split"] == "train" else "two_view_test"
split_dir = os.path.join(eth3d_dir, split_name)
Nit: you might consider using pathlib. see example here: https://github.com/pytorch/vision/blob/main/test/test_datasets.py#L119
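For reference, the joins above written with pathlib's / operator (equivalent to the os.path.join version):

```python
import os
from pathlib import Path

eth3d_dir = Path("eth3d")
split_dir = eth3d_dir / "two_view_training"
# Path and os.path.join produce the same separator on each platform
assert str(split_dir) == os.path.join("eth3d", "two_view_training")
```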
Summary:
* Broken down PR (#6269). Added an additional dataset
* Removed some types. Store None instead of "". Merged test util functions.
* Minor mypy fixes. Minor doc fixes
* Reformatted docstring
* Added additional line-skips

Reviewed By: NicolasHug
Differential Revision: D38351752
fbshipit-source-id: 376714fcdd49cb474670ce8e6e959507a517ee46
I'm closing this PR since it has already been replaced by a few smaller PRs.
Added a Stereo Matching Dataset interface similar to the Optical Flow one. I believe we will need a renaming of datasets based on the task, at least for 2.5D-related ones, as we might have naming clashes if we plan on adding Depth Estimation as well, since datasets such as Kitti, FallingThings, or SceneFlow can be used for multiple of these tasks.
I've also set the outputs of the disparity map and valid mask to (1, H, W) and (H, W) respectively, to be aligned with the way the Flow datasets output a flow of shape (2, H, W) and a mask of shape (H, W) where possible.