Add BOBSL, ISL-HS, Sign-BD datasets to the datasets table and references.bi #28

Merged
merged 6 commits into from
Mar 25, 2024

Conversation

cleong110
Contributor

Closes #27 if accepted

"#signers": 37,
"license": "non-commercial authorized academics",
"licenseUrl": "https://www.bbc.co.uk/rd/projects/extol-dataset",
"contact": "Samuel Albanie albanie[AT]robots.ox.ac.uk"
Contributor

I’d change the contact to a real email address.

Contributor Author

Ah yes, I copy-pasted that from the project website. Good suggestion.

address = {Cham},
doi = {10.1007/978-3-031-19833-5_39},
abstract = {Recently, sign language researchers have turned to sign language interpreted TV broadcasts, comprising (i) a video of continuous signing and (ii) subtitles corresponding to the audio content, as a readily available and large-scale source of training data. One key challenge in the usability of such data is the lack of sign annotations. Previous work exploiting such weakly-aligned data only found sparse correspondences between keywords in the subtitle and individual signs. In this work, we propose a simple, scalable framework to vastly increase the density of automatic annotations. Our contributions are the following: (1)~we significantly improve previous annotation methods by making use of synonyms and subtitle-signing alignment; (2)~we show the value of pseudo-labelling from a sign recognition model as a way of sign spotting; (3)~we propose a novel approach for increasing our annotations of known and unknown classes based on in-domain exemplars; (4)~on the BOBSL BSL sign language corpus, we increase the number of confident automatic annotations from 670K to 5M. We make these annotations publicly available to support the sign language research community.},
Contributor

I think we decided not to include abstracts, to keep this file from getting huge.

Contributor Author

Makes sense. Next time I can add that as an "exclude" field in my Better BibTeX plugin for Zotero.
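
For entries that were already exported with abstracts, the same cleanup could be done after the fact. The following is a minimal sketch in plain Python, not a feature of the Zotero plugin and not something this repository is known to ship; the function name `strip_field` and the sample entry (including its citation key and shortened abstract text) are hypothetical. It assumes brace-delimited field values and does not handle escaped braces or quote-delimited values.

```python
import re


def strip_field(bibtex: str, field: str = "abstract") -> str:
    """Return `bibtex` with every `field = {...}` assignment removed.

    Braces are matched by depth so nested groups such as {BOBSL} do not
    end the scan early. This is a sketch, not a full BibTeX parser.
    """
    # Only match the field name at the start of a line, so the word
    # "abstract" inside a title or note is left alone.
    start = re.compile(
        r"^[ \t]*" + re.escape(field) + r"\s*=\s*\{",
        re.IGNORECASE | re.MULTILINE,
    )
    kept = []
    pos = 0
    while True:
        match = start.search(bibtex, pos)
        if match is None:
            kept.append(bibtex[pos:])
            break
        kept.append(bibtex[pos:match.start()])
        # Walk forward until the opening brace is balanced.
        depth, i = 1, match.end()
        while i < len(bibtex) and depth > 0:
            if bibtex[i] == "{":
                depth += 1
            elif bibtex[i] == "}":
                depth -= 1
            i += 1
        # Drop the trailing comma and the rest of the line.
        while i < len(bibtex) and bibtex[i] in ", \t":
            i += 1
        if i < len(bibtex) and bibtex[i] == "\n":
            i += 1
        pos = i
    return "".join(kept)


# Hypothetical entry; only the address and DOI come from this review.
sample = """@inproceedings{hypothetical_key,
  address = {Cham},
  abstract = {On the {BOBSL} corpus, (1)~we increase annotation density.},
  doi = {10.1007/978-3-031-19833-5_39},
}
"""
print(strip_field(sample))
```

Running a script like this over the references file before committing would keep abstracts out regardless of how each contributor's export is configured.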

"video:RGB",
"text:English"
],
"language": "British Sign Language (BSL)",
Contributor

Should be "British"

Contributor Author

fixed

"video:RGB",
"gloss:ISL-HandShapes"
],
"language": "Irish Sign Language (ISL)",
Contributor

Should be "Irish"

Contributor Author

fixed

],
"language": "Irish Sign Language (ISL)",
"#items": 23,
"#samples": "468 videos available, 58,114 images extracted to show 23 handshapes",
Contributor

While I'd normally say that more detail is better, please consider how this is displayed:
[image]
I think less information should be shown here, but if you still want to include the entire text, I'd narrow it to
"468 videos → 58,114 images → 23 handshapes"

Contributor Author

Makes sense, it needs to be concise for the table.
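
The three conventions raised in this review (short language names, a real contact address, concise `#samples` text) could be checked automatically before a dataset entry is submitted. The sketch below is illustrative only: the field names are taken from the snippets quoted above, while the function name `lint_entry` and the 45-character limit are hypothetical choices, not rules this repository is known to enforce.

```python
import re
from typing import Dict, List

# Hypothetical limit, chosen so the reviewer's shortened text passes
# and the original long description does not.
MAX_SAMPLES_CHARS = 45


def lint_entry(entry: Dict[str, object]) -> List[str]:
    """Return human-readable problems found in one dataset entry."""
    problems = []

    # The review asks for "British" / "Irish", not the full name.
    language = str(entry.get("language", ""))
    if "Sign Language" in language or "(" in language:
        problems.append('language: use the short name, e.g. "Irish"')

    # Addresses copied from websites are often obfuscated as "[AT]".
    contact = str(entry.get("contact", ""))
    if re.search(r"\[\s*at\s*\]", contact, re.IGNORECASE):
        problems.append("contact: use a real email address")

    # Long free text does not fit in a table cell.
    samples = entry.get("#samples", "")
    if isinstance(samples, str) and len(samples) > MAX_SAMPLES_CHARS:
        problems.append("#samples: shorten the text for the table")

    return problems


# Values as originally submitted, combined into one entry for illustration.
before = {
    "language": "Irish Sign Language (ISL)",
    "contact": "Samuel Albanie albanie[AT]robots.ox.ac.uk",
    "#samples": "468 videos available, 58,114 images extracted to show 23 handshapes",
}
for problem in lint_entry(before):
    print(problem)
```

With the reviewer's suggestions applied ("Irish" and "468 videos → 58,114 images → 23 handshapes"), the same function returns an empty list.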

Development

Successfully merging this pull request may close these issues.

Document several datasets noticed during my reading: SignBD, Irish Sign Language, ArASL2018, BOBSL
2 participants