Update BosphorusSign #49

Merged
8 changes: 4 additions & 4 deletions src/datasets/BosphorusSign.json
@@ -3,13 +3,13 @@
"name": "BosphorusSign",
"year": 2016,
"publication": "dataset:camgoz-etal-2016-bosphorussign",
-"url": "https://www.cmpe.boun.edu.tr/pilab/BosphorusSign/bosphorusSign_en.html"
+"url": "https://ogulcanozdemir.github.io/bosphorussign22k/"
},
-"features": [],
+"features": ["video:RGBD", "pose:Kinectv2", "gloss"],
"notes": "Kinect v2",
"language": "Turkish",
-"#items": 636,
-"#samples": "24,161 Samples",
+"#items": 595,
+"#samples": "22,670 Samples",
Contributor

BosphorusSign Turkish Sign Language corpus, which consists of 855 sign and p

https://aclanthology.org/L16-1220.pdf

Where is this info from? This is the number for BosphorusSign22k, no?

Contributor Author

@cleong110 cleong110 May 27, 2024

Table 3 of the 22k paper lists statistics for both datasets.

Edit: https://arxiv.org/pdf/2004.01283 is the link for 22k

Contributor Author

I'm frankly not sure where the original figures of a 636-sign lexicon and 24,161 clips come from, so I went with the info from the updated citation. Presumably, if we went through the dataset access process now and specifically asked for BosphorusSign, not BosphorusSign22k, this is what we'd get?

Contributor Author

@cleong110 cleong110 May 27, 2024

In the original BosphorusSign citation the number given is 855, not 636 or 595. We have:

  • "The corpus contains 855 signs" in the conclusion section
  • Table 2 talks about modalities/features
  • Table 1 talks about other datasets
  • "We have collected 855 signs and phrase samples..." in the introduction section
  • "When completed, the corpus will have at least six repetitions of each sign performed by 10 signers, giving a wide variance to the data."

What I presume happened is that between the two papers they decided to trim down the "publicly available" data to 595 signs.

Edit: and of course 855 is listed in Table 3 of the BosphorusSign22k paper as well, as the overall lexicon size rather than the publicly available subset.

Contributor Author

It also seems that, for whatever reason, the "when completed ... 10 signers" plan did not happen, as the newer citation lists only 6 signers and has this to say:

"Our dataset is based on the BosphorusSign (Camgoz et al., 2016c) corpus which was collected with the purpose of helping both linguistic and computer science communities. It contains isolated videos of Turkish Sign Language glosses from three different domains: Health, finance and commonly used everyday signs. Videos in this dataset were performed by six native signers, as shown in Figure 1, which makes this dataset valuable for user independent sign language studies."

I interpreted "this dataset" to mean BosphorusSign, which would mean that both BosphorusSign and BosphorusSign22k have the same number of signers, namely 6.

Contributor Author

So the question here is whether to go with overall stats, or stats for the "publicly available" subset I suppose.

Contributor

I think overall stats are more "correct" to use. Thanks for checking!

Contributor Author

Hmmm, then in that case I'm not sure what to put for "number of clips", because Table 3 only has "-" for that. Looking through both papers, here are the candidates:

  • 1257, the figure directly above in the table, from HospiSign. That seems unlikely. This dataset has way more signs, signers, etc.
  • 22670, the figure directly below. But that's the reduced publicly available set.
  • 855 signs * 6 signers/sign * 4 repetitions/signer = 20520?

I think I will just compromise and list it in the JSON with a little note?
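
The multiplication behind the 20520 candidate, and one way a noted value could look in the JSON, can be sketched in Python. This is only an illustration under stated assumptions: the 4 repetitions per signer is the guess used in the estimate, not a figure confirmed by either paper, and the `entry_fragment` name and the wording of the note are hypothetical, not necessarily what ends up in the merged file.

```python
import json

# Figures quoted in this thread.
LEXICON_SIZE = 855           # overall number of signs in the BosphorusSign corpus
SIGNERS = 6                  # signers listed in the BosphorusSign22k paper
REPETITIONS_PER_SIGNER = 4   # ASSUMPTION from the estimate in this thread


def estimate_clip_count(signs: int, signers: int, repetitions: int) -> int:
    """Naive estimate: every signer performs every sign exactly `repetitions` times."""
    return signs * signers * repetitions


estimated_clips = estimate_clip_count(LEXICON_SIZE, SIGNERS, REPETITIONS_PER_SIGNER)

# HYPOTHETICAL fragment: the key names mirror the diff, but the values and the
# note text are illustrative only.
entry_fragment = {
    "#items": LEXICON_SIZE,
    "#samples": f"~{estimated_clips:,} (estimate; Table 3 lists '-')",
    "#signers": SIGNERS,
}

print(json.dumps(entry_fragment, indent=2))
```

Keeping the estimate as a string with an explicit caveat avoids presenting a derived number as if it were reported by the authors.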

"#signers": 6,
"license": "Not Published",
"licenseUrl": null