Towards Privacy-Aware Sign Language Translation at Scale #11
Previously I wrote this summary:
In Rust et al.'s 2024 work \cite{rustPrivacyAwareSignLanguage2024}, they propose a self-supervised method based on masked autoencoding, as well as a new Linguistic-Supervised Pretraining, which makes no assumptions about model architecture. They use this in conjunction with a Hierarchical transformer, pretrained on a number of large-scale sign language datasets including YouTube-ASL\cite{uthusYouTubeASLLargeScaleOpenDomain2023}, How2Sign\cite{duarteHow2SignLargeScaleMultimodal2021}, and a new dataset they release known as DailyMoth-70h. Results on How2Sign improve significantly over previous SOTA such as \cite{tarres_sign_2023}, \cite{uthusYouTubeASLLargeScaleOpenDomain2023}, and \cite{linGlossFreeEndtoEndSign2023}.
citation: since it's not published yet, arXiv is the way to go.
Let's build our prompt!
Colin's Commentary:
My second version of the summary, without any ChatGPT input.
Conversation with ChatGPT: https://chatgpt.com/share/48910d3d-458a-4602-9bd2-25ea559818c9. It provided some suggestions.
Fixing a few issues (I'm actually not sure they will release models) and synthesizing a bit, we get:
@rust2024PrivacyAwareSign introduce a two-stage privacy-aware method for sign language translation (SLT) at scale, termed Self-Supervised Video Pretraining for Sign Language Translation (SSVP-SLT).
The first stage involves self-supervised pretraining of a Hiera vision transformer on large unannotated video datasets [@ryali2023HieraVisionTransformer; @dataset:uthus2023YoutubeASL].
In the second stage, the vision model's outputs are fed into a multilingual language model (T5) for finetuning on the How2Sign dataset [@raffel2020T5Transformer; @dataset:duarte2020how2sign].
To mitigate privacy risks, the framework employs facial obfuscation.
Additionally, the authors release DailyMoth-70h, a new 70-hour ASL dataset from [The Daily Moth](https://www.dailymoth.com/).
SSVP-SLT achieves state-of-the-art performance on How2Sign [@dataset:duarte2020how2sign].
Merged!
Very interesting paper that does pretrain-then-finetune, with all the benefits that provides: essentially, less need for data/annotations in the target language/task.
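To make the pretrain-then-finetune structure concrete, here is a rough PyTorch sketch of the two-stage recipe described in the merged summary. Everything in it (module names, dimensions, the toy masking scheme) is an illustrative stand-in rather than the authors' released code; the actual system uses a Hiera video encoder, MAE-style pretraining on face-blurred video, and a multilingual T5 for the translation stage.

```python
# Illustrative sketch only: toy modules standing in for the paper's Hiera encoder
# and T5 translation model. Shapes and names are hypothetical.
import torch
import torch.nn as nn

class ToyVideoEncoder(nn.Module):
    """Stand-in for the Hiera encoder: per-frame features -> contextual features."""
    def __init__(self, in_dim=1024, feat_dim=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, feat_dim), nn.GELU(), nn.Linear(feat_dim, feat_dim)
        )

    def forward(self, clips):            # clips: (batch, frames, in_dim)
        return self.net(clips)           # (batch, frames, feat_dim)

def mae_pretrain_step(encoder, decoder, clips, mask_ratio=0.75):
    """Stage 1: masked-autoencoding objective on unannotated video features.

    In the real pipeline the faces are already obfuscated before this step.
    """
    mask = torch.rand(clips.shape[:2], device=clips.device) < mask_ratio
    visible = clips.masked_fill(mask.unsqueeze(-1), 0.0)    # crude frame masking
    recon = decoder(encoder(visible))                        # reconstruct every frame
    return ((recon - clips) ** 2)[mask].mean()               # loss on masked frames only

def slt_finetune_step(encoder, projection, t5, clips, target_token_ids):
    """Stage 2: project encoder features into the LM and finetune on (video, text) pairs.

    `t5` is assumed to be something like
    transformers.T5ForConditionalGeneration.from_pretrained("t5-small"),
    which accepts `inputs_embeds` in place of token ids.
    """
    feats = projection(encoder(clips))                       # (batch, frames, d_model)
    return t5(inputs_embeds=feats, labels=target_token_ids).loss

# Stage-1 smoke test with random features (no pretrained weights needed):
encoder = ToyVideoEncoder()
decoder = nn.Linear(512, 1024)
clips = torch.randn(2, 16, 1024)        # 2 clips, 16 frame features each
mae_pretrain_step(encoder, decoder, clips).backward()
```

The point of the split is the one made above: the expensive, annotation-free stage can consume any large pool of signing video, while only the comparatively small finetuning stage needs aligned translations.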
dataset: Exclude wordy abstracts (the Better BibTeX extension for Zotero can exclude keys in the bibtex).
Writing/style:
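If the Better BibTeX field-exclusion setting isn't used, the same cleanup can be done after export. A minimal sketch, assuming the third-party bibtexparser package (v1-style API); the file names are placeholders, not anything from this repo:

```python
# Sketch: drop bulky abstract fields from an exported .bib file.
# Assumes the `bibtexparser` package (v1 API); file names are hypothetical.
import bibtexparser

with open("refs.bib") as f:
    db = bibtexparser.load(f)

for entry in db.entries:
    entry.pop("abstract", None)   # remove the wordy abstract field if present

with open("refs_clean.bib", "w") as f:
    bibtexparser.dump(db, f)
```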