
add HMDB51 and UCF101 datasets as well as prototype for new style video decoding #5335


pmeier (Collaborator) commented Feb 2, 2022

This adds HMDB51 as the first video dataset (see #4541). It is used to prototype the video decoder datapipes initially implemented by @bjuncek in #4838. You can test it with the following snippet:

from torchvision.prototype import datasets

dataset = datasets.load("hmdb51")
dataset = datasets.utils.KeyframeDecoder(dataset)

for sample in dataset:
    print(sample)
    break
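
To illustrate the wrapping pattern the snippet relies on, here is a minimal, dependency-free sketch of a decoder wrapper. KeyframeDecoderSketch and fake_keyframe_decode are hypothetical names, and the "decoder" only splits bytes on b"|" as a stand-in for real video decoding. This is an assumption about the general shape of such a datapipe (wrap a dataset, yield samples whose encoded video is replaced by decoded frames), not how datasets.utils.KeyframeDecoder is actually implemented.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, List

Sample = Dict[str, Any]


class KeyframeDecoderSketch:
    """Hypothetical decoder wrapper: takes samples carrying encoded video
    bytes and yields copies whose video has been decoded by decode_fn."""

    def __init__(
        self,
        source: Iterable[Sample],
        decode_fn: Callable[[bytes], List[Any]],
        key: str = "video",
    ) -> None:
        self.source = source
        self.decode_fn = decode_fn
        self.key = key

    def __iter__(self) -> Iterator[Sample]:
        for sample in self.source:
            # Shallow copy so the wrapped dataset's samples stay untouched.
            decoded = dict(sample)
            decoded[self.key] = self.decode_fn(sample[self.key])
            yield decoded


def fake_keyframe_decode(data: bytes) -> List[bytes]:
    # Placeholder "decoder": treats each b"|"-separated chunk as one keyframe.
    return data.split(b"|")


dataset = [{"video": b"f0|f1|f2", "label": 3}]
for sample in KeyframeDecoderSketch(dataset, fake_keyframe_decode):
    print(sample)  # {'video': [b'f0', b'f1', b'f2'], 'label': 3}
    break
```

Keeping decoding in a separate wrapper means the dataset itself only has to yield encoded bytes, and different decoding strategies can be swapped in without touching the dataset.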

Apart from that, the biggest change is the new meta dictionary attribute on the EncodedData feature. The rationale is that each dataset may provide very different metadata for each file, which is hard to standardize. We might be able to agree on some common attributes such as path, but I would still leave the option open for arbitrary metadata.
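
A minimal sketch of the idea, assuming a simplified feature class: EncodedDataSketch is hypothetical and not the actual EncodedData implementation, and the file paths and the fps key are made-up examples of per-dataset metadata.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class EncodedDataSketch:
    """Hypothetical simplification of an encoded-data feature: raw bytes
    plus a free-form meta dictionary whose keys may differ per dataset."""

    data: bytes
    # default_factory gives every instance its own dict instead of a shared one.
    meta: Dict[str, Any] = field(default_factory=dict)


# A video dataset might record the file path and the frame rate ...
video = EncodedDataSketch(b"\x00\x01", meta={"path": "videos/clip_001.avi", "fps": 30.0})
# ... while another dataset only knows the path.
image = EncodedDataSketch(b"\xff\xd8", meta={"path": "images/0001.jpg"})

print(video.meta.get("path"))  # a common attribute available on both
print(image.meta.get("fps"))   # a dataset-specific key may be absent, so None
```

Common keys like path can then be read uniformly, while anything dataset-specific is looked up with .get() and tolerated when missing.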

facebook-github-bot commented Feb 2, 2022

💊 CI failures summary and remediations

As of commit 07d78b2 (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build unittest_prototype (1/1)

Step: "Run tests"

============================== 5 ...0.26s ===============================
    raise TypeError(f"{cls} is not a generic class")
TypeError: <class 'torch.utils.data.datapipes.iter.grouping.ShardingFilterIterDataPipe'> is not a generic class
------ generated xml file: /home/circleci/project/test-results/junit.xml -------
=========================== short test summary info ============================
ERROR test/test_prototype_builtin_datasets.py - TypeError: <class 'torch.util...
ERROR test/test_prototype_datasets_api.py - TypeError: <class 'torch.utils.da...
ERROR test/test_prototype_datasets_utils.py - TypeError: <class 'torch.utils....
ERROR test/test_prototype_models.py - TypeError: <class 'torch.utils.data.dat...
ERROR test/test_prototype_transforms_functional.py - TypeError: <class 'torch...
!!!!!!!!!!!!!!!!!!! Interrupted: 5 errors during collection !!!!!!!!!!!!!!!!!!!!
============================== 5 errors in 0.26s ===============================


Exited with code exit status 2


This comment was automatically generated by Dr. CI.

pmeier (Collaborator Author) commented Feb 2, 2022

@bjuncek I've also ported UCF101 from #4848, so you can close that PR and take over this one.

f"{url_root}/hmdb51_org.rar",
sha256="9e714a0d8b76104d76e932764a7ca636f929fff66279cda3f2e326fa912a328e",
)
videos._preprocess = self._extract_videos_archive
Collaborator Author


@NicolasHug The archive is a RAR of RARs, so a single extract=True won't suffice. We need the full extraction, because reading from RAR archives is rather slow and extracting everything up front gives us a significant performance increase.

Another option would be to use this "recursive extraction" by default when setting extract=True.
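
A sketch of such a recursive extraction, under stated assumptions: it uses ZIP archives through Python's standard zipfile module as a stand-in, because the standard library cannot read RAR files (HMDB51 would need a RAR-capable tool instead). extract_recursively is a hypothetical helper, not part of torchvision.

```python
import pathlib
import zipfile


def extract_recursively(archive: pathlib.Path, dest: pathlib.Path) -> None:
    """Extract `archive` into `dest`, then extract every archive found in the
    result into a directory named after it, recursing until none remain."""
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)

    # Snapshot the inner archives present after this level of extraction.
    for inner in sorted(dest.rglob("*.zip")):
        if not inner.exists():
            # Already handled (and deleted) by a recursive call on a parent.
            continue
        inner_dest = inner.with_suffix("")  # e.g. brush_hair.zip -> brush_hair/
        extract_recursively(inner, inner_dest)
        inner.unlink()  # keep only the extracted files on disk
```

Deleting each inner archive after extraction saves disk space at the cost of not being able to re-extract from it later; whether that trade-off is acceptable is a judgment call.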

This was linked to issues Feb 3, 2022
pmeier changed the title from "add HMDB51 dataset and prototype for new style video decoding" to "add HMDB51 and UCF101 datasets as well as prototype for new style video decoding" Feb 7, 2022
pmeier deleted the branch pytorch:revamp-prototype-features-transforms February 11, 2022 14:51
pmeier closed this Feb 11, 2022
bjuncek (Contributor) commented Feb 11, 2022

@pmeier I see that you closed this; should I switch to the new PR?

pmeier (Collaborator Author) commented Feb 11, 2022

This probably happened because I deleted the branch this PR was supposed to be merged into. I'll fix it; keep working on this.

pmeier (Collaborator Author) commented Feb 15, 2022

For whatever reason, GitHub does not let me reopen this. I've re-sent the PR as #5422.

datumbox (Contributor) commented

@pmeier It's because the revamp branch is deleted.

pmeier (Collaborator Author) commented Feb 15, 2022

Yeah, I figured that, but it also doesn't let me select a new merge target such as main, which is possible for open PRs.

Successfully merging this pull request may close these issues: UCF101, HMDB51.