add HMDB51 and UCF101 datasets as well as prototype for new style video decoding #5335
Conversation
💊 CI failures summary and remediations

As of commit 07d78b2 (more details on the Dr. CI page):

🕵️ 1 new failure recognized by patterns. The following CI failures do not appear to be due to upstream breakages:
```python
    f"{url_root}/hmdb51_org.rar",
    sha256="9e714a0d8b76104d76e932764a7ca636f929fff66279cda3f2e326fa912a328e",
)
videos._preprocess = self._extract_videos_archive
```
@NicolasHug The archive is a rar of rars, so a single `extract=True` won't cut it. We need the full extraction, since reading from rar archives is rather slow, and with this we get a significant performance increase.

Another option would be to use this "recursive extraction" by default when setting `extract=True`.
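For illustration, here is a minimal sketch of what such a recursive extraction could look like. It assumes the third-party `rarfile` package; the helper name and directory layout are hypothetical and not torchvision's actual internals.

```python
import pathlib

import rarfile  # third-party package; needs an unrar backend installed


def recursively_extract_rar(archive: pathlib.Path, target_dir: pathlib.Path) -> None:
    """Hypothetical helper: extract a .rar and then every .rar it contained."""
    target_dir.mkdir(parents=True, exist_ok=True)
    rarfile.RarFile(str(archive)).extractall(str(target_dir))

    # HMDB51 ships as a "rar of rars": extract each inner archive as well so the
    # videos can later be read straight from disk instead of from the archives.
    for inner in target_dir.rglob("*.rar"):
        rarfile.RarFile(str(inner)).extractall(str(inner.parent))
```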
@pmeier I see that you closed this; should I switch to the new PR?
This probably happened because I deleted the branch that this was supposed to be merged into. I'll fix it; keep working on this.
For whatever reason, GitHub does not let me re-open this. I've sent the PR again in #5422.
@pmeier It's because the revamp branch is deleted.
Yeah, I figured that, but it doesn't let me select a new merge target like …
This adds HMDB51 as the first video dataset (see #4541). It is used for prototyping the video decoder datapipes initially implemented by @bjuncek in #4838. You can test with the following snippet.
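Something along these lines should work, assuming the prototype datasets API exposes `torchvision.prototype.datasets.load`; the dataset name, `split` option, and sample keys below are assumptions rather than the exact snippet from the original PR:

```python
from torchvision.prototype import datasets

# Load the prototype HMDB51 dataset as a datapipe and look at one sample.
# "hmdb51" and split="train" are assumed registration details.
dataset = datasets.load("hmdb51", split="train")

sample = next(iter(dataset))
print(sample.keys())
```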
Apart from that, the biggest change is the added `meta` dictionary attribute on the `EncodedData` feature. The rationale here is that each dataset might provide very different metadata for each file, and this is hard to standardize. We might be able to have some common attributes like `path`, but I would still leave the option open for arbitrary metadata.
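As a rough sketch of the idea (not the actual torchvision implementation; the class layout and field names here are assumptions), an encoded-data feature carrying an open-ended meta dictionary might look like this:

```python
from typing import Any, Dict, Optional

import torch


class EncodedData(torch.Tensor):
    """Hypothetical illustration of a tensor feature with arbitrary per-file metadata."""

    meta: Dict[str, Any]

    def __new__(cls, data, *, meta: Optional[Dict[str, Any]] = None):
        tensor = torch.as_tensor(data, dtype=torch.uint8).as_subclass(cls)
        tensor.meta = dict(meta) if meta is not None else {}
        return tensor


# A dataset attaches whatever metadata it has for a file, e.g. the path
# plus video-specific information such as the frame rate.
video = EncodedData([0x00, 0x01], meta={"path": "brush_hair/clip_001.avi", "fps": 30})
print(video.meta["fps"])
```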