SignBERT+ (and SignBERT) #14
Here's what the authors have to say about the difference between SignBERT and SignBERT+:
OK, what's the key thing to know about these two papers? Self-supervised pretraining with a sign-language-specific prior, basically; they're incorporating domain knowledge. tl;dr: self-supervised pose-sequence pretraining designed specifically for SLP. Then you can take that pretrained encoder and finetune it on downstream tasks like isolated SLR, continuous SLR, or SLT. They try it out on all three, including Sign2Text. They attribute the “S2T setting” to N. Cihan Camgoz, S. Hadfield, O. Koller, H. Ney, and R. Bowden, “Neural sign language translation,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 7784–7793.
Inputs: 2D pose sequences extracted with MMPose (whole-body, 133 keypoints).
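To make the input concrete, here's a minimal sketch of how such a pose sequence could be represented as arrays. The shapes and the confidence channel are my assumptions for illustration, not the paper's exact layout.

```python
import numpy as np

# Illustrative shapes (assumption): T frames, 133 whole-body keypoints
# (body + face + two hands), each keypoint as (x, y) image coordinates.
T = 30
pose_sequence = np.zeros((T, 133, 2), dtype=np.float32)  # (frames, keypoints, xy)
confidence = np.zeros((T, 133), dtype=np.float32)        # per-keypoint estimator scores
```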
Datasets:
"During the pre-training stage, the utilized data includes the training data from all aforementioned sign datasets, along with other collected data from [84], [85]. In total, the pre-training data volume is 230,246 videos." [84] H. Hu, W. Zhou, J. Pu, and H. Li, “Global-local enhancement network for NMFs-aware sign language recognition,” ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 17, no. 3, pp. 1–18, 2021. [85] A. Duarte, S. Palaskar, L. Ventura, D. Ghadiyaram, K. DeHaan, F. Metze, J. Torres, and X. Giro-i Nieto, “How2sign: a large-scale multimodal dataset for continuous american sign language,” in IEEE/CVF International Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 2735–2744. S. Yuan, Q. Ye, G. Garcia-Hernando, and T.-K. Kim, “The 2017 hands in the million challenge on 3D hand pose estimation,” arXiv, pp. 1–7, 2017. |
Official citation from IEEE, which I am using:
Official citation for the NMFs-CSL dataset, but using our normal key style:
Looking for the official citation for HANDS17 (the S. Yuan et al. arXiv entry above).
Also, HANDS2019 is a thing.
Oh, and here's the official citation for SignBERT, taken from https://openaccess.thecvf.com/content/ICCV2021/html/Hu_SignBERT_Pre-Training_of_Hand-Model-Aware_Representation_for_Sign_Language_Recognition_ICCV_2021_paper.html, which I will use.
Pretraining strategy is in Section 3.2. They randomly pick some portion of the pose tokens and do one of: mask some joints within the token (joint masking), mask the whole token (frame masking), or mask a span of consecutive tokens (clip masking). A simplified sketch follows the token discussion below.
What's a "token"?
Apparently they use the MANO hand model in the decoder? A "hand-model-aware decoder", they say, citing https://dl.acm.org/doi/abs/10.1145/3130800.3130883 (Romero et al., "Embodied Hands").
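For context, here's roughly what evaluating a MANO hand model looks like via the smplx package; it just illustrates the kind of parameters (pose/shape) such a decoder regresses and how 3D joints come back out. It is not the authors' decoder, and the model path is a placeholder (MANO model files must be downloaded separately).

```python
import torch
import smplx  # pip install smplx; MANO model files are obtained separately

# Placeholder path to downloaded MANO model files (assumption; adjust to your setup).
mano = smplx.create(model_path="models", model_type="mano", is_rhand=True, use_pca=False)

out = mano(
    global_orient=torch.zeros(1, 3),   # wrist rotation (axis-angle)
    hand_pose=torch.zeros(1, 45),      # 15 hand joints x 3 axis-angle params
    betas=torch.zeros(1, 10),          # shape coefficients
)
print(out.joints.shape, out.vertices.shape)  # 3D joints and mesh vertices
```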
As far as I can tell they treat each pose in the sequence as a token, so if the pose estimation gives them 30 poses for 30 frames, that's 30 tokens.
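To make the masked-modeling idea concrete, here's a small sketch of frame-level token masking for a reconstruction objective. This is my simplification (the papers also mask at the joint and clip level and reconstruct through the hand-model-aware decoder), and the names here are made up.

```python
import torch

def mask_pose_tokens(pose_seq: torch.Tensor, mask_ratio: float = 0.3):
    """Randomly mask a fraction of frame-level pose tokens.

    pose_seq: (T, K, 2) 2D keypoints per frame (e.g. K=133 from MMPose whole-body).
    Returns the corrupted sequence and a boolean mask over frames to reconstruct.
    """
    T = pose_seq.shape[0]
    num_masked = max(1, int(T * mask_ratio))
    mask = torch.zeros(T, dtype=torch.bool)
    mask[torch.randperm(T)[:num_masked]] = True
    corrupted = pose_seq.clone()
    corrupted[mask] = 0.0  # zero out masked frames; one simple corruption choice
    return corrupted, mask

# Usage idea: train an encoder-decoder to reconstruct only the masked tokens, e.g.
#   corrupted, mask = mask_pose_tokens(pose_seq)
#   loss = F.mse_loss(decoder(encoder(corrupted))[mask], pose_seq[mask])
```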
OK, I think it's time. Let's build our initial summary and prompt ChatGPT for help. Here's my initial version, which I add to the "pose-to-text" section.
Building my prompt:
Resulting ChatGPT conversation: https://chatgpt.com/share/1cf76e17-b778-49c4-9887-d12770fa922a. Main gist of the suggestions is to
Metrics:
OK, rewriting/synthesizing...
merged |
Given that SignBERT+ is a direct "sequel" to SignBERT, I think it could be good to do them as one PR.
SignBERT+: https://ieeexplore.ieee.org/document/10109128
SignBERT: https://openaccess.thecvf.com/content/ICCV2021/html/Hu_SignBERT_Pre-Training_of_Hand-Model-Aware_Representation_for_Sign_Language_Recognition_ICCV_2021_paper.html
Checklist

dataset:
- Exclude wordy abstracts. (Better BibTeX extension for Zotero can exclude keys) {} in the bibtex

PR:
- git merge master on branch

Writing/style:

Additional: