Update BosphorusSign #49
Merged: AmitMY merged 4 commits into sign-language-processing:master from cleong110:dataset/BosphorusSign_update on May 29, 2024
Commits
7446e7d  CDL: updating BosphorusSign details (cleong110)
b2253a9  Merge branch 'master' into dataset/BosphorusSign_update (cleong110)
19be425  Merge branch 'master' into dataset/BosphorusSign_update (cleong110)
f24c113  CDL: filling out BosphorusSign items, samples, license, contact (cleong110)
https://aclanthology.org/L16-1220.pdf
Where is this info from? This is the number for BosphorusSign22k, no?
Table 3 of the 22k paper lists statistics for both datasets.

Edit: https://arxiv.org/pdf/2004.01283 is the link for 22k
I'm frankly not sure where the original figures of a 636-sign lexicon and 24,161 clips come from, so I went with the info from the updated citation. Presumably, if we went through the dataset access process now and specifically asked for BosphorusSign, not BosphorusSign22k, this is what we'd get?
In the original BosphorusSign citation the number given is 855, not 636 or 595. We have:
formed by 10 signers, giving a wide variance to the data."
What I presume happened is that between the two papers they decided to trim down the "publicly available" data to 595 signs.
Edit: and of course 855 is listed in table 3 of the BosphorusSign22k paper as well, as the overall lexicon size rather than the publicly available subset.
It also seems that, for whatever reason, the "when completed... 10 signers" plan did not happen, as the newer citation lists only 6 and has this to say:
I interpreted "this dataset" to mean BosphorusSign, meaning that both BosphorusSign and BosphorusSign22k have the same number of signers, namely 6.
So the question here is whether to go with overall stats, or stats for the "publicly available" subset I suppose.
I think overall stats are more "correct" to use. Thanks for checking!
Hmmm, then in that case I'm not sure what to put for "number of clips", because Table 3 only has "-" for that. Looking through both papers, here are the candidates:
I think I will just compromise and list it in the JSON with a little note?
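For illustration only, here is a rough sketch of what "list it in the JSON with a little note" might look like. The field names (`lexicon_size`, `signers`, `clips`, `clips_note`) are hypothetical and are not taken from the repository's actual schema; the values are the overall figures discussed in this thread, with the clip count left unset because Table 3 only has "-" for it.

```python
import json

# Hypothetical field names; the repository's real JSON schema may differ.
entry = {
    "name": "BosphorusSign",
    "lexicon_size": 855,  # overall lexicon size discussed in this thread
    "signers": 6,         # as read from the newer citation
    "clips": None,        # no overall clip count is given, so leave it unset
    "clips_note": "Table 3 of the BosphorusSign22k paper lists '-' for the number of clips.",
}

# None is serialized as JSON null, so the missing figure stays explicit.
serialized = json.dumps(entry, indent=2)
print(serialized)
```

Keeping the count as `null` next to a free-text note avoids committing to any one candidate figure while still recording why the value is absent.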