Stt adapter awstranscribe #110
Awesome, thanks @Gribbs! For now I'd say:
|
Hey @Gribbs - I'll take a skim through the implementation too, but having run it locally with a test video / transcript from AWS, it seems to be working really well, nice one! I don't seem to be able to download the resultant edited transcript, but I'm guessing that's either a bug with Firefox, or something else, not specific to this code change. cc @pietrop are you aware of anything? I'll do further digging if not and file a separate issue. |
Thanks @chrishutchinson, I haven't had a chance to try it in the PR yet, but I did a quick test in the demo app at https://bbc.github.io/react-transcript-editor, and I am able to download the draftJs and plain-text exports of the current demo in Chrome, but not in Firefox. The Firefox console says
which is something to do with
which at the moment is implemented this way:
```javascript
getEditorContent = (exportFormat) => {
  return this.refs.timedTextEditor.getEditorContent(exportFormat);
}
```
But could it be refactored as a callback, rather than using this.refs? Do you get the same error? (I am aware the demo is a different version number compared to master, and compared to the PR.) |
Thanks for this PR @Gribbs!
Test
In
```
"entityRanges": [{
  "start": 13.03,
  "end": 13.22,
  "confidence": 1,
  "text": "There",
  "offset": 0,
  "length": 5,
  "key": "ico58i6"
}, {
```
and because the key is randomly generated, it cannot be deterministically tested by Jest.
```javascript
"key": expect.any(String)
```
As explained in the guide, you could do a find and replace in your code editor, if it supports regex, for
You should be able to see the detailed Travis CI log of the test failing here
Your tests should then pass.
Console errors
I get this error in the console when adding an AWS transcript
As a quick fix, it might be enough to do some type casting and add a
```jsx
<span
  data-start={ data.start.toString() }
  data-end={ data.end.toString() }
```
Adding transcript first and then media - error
Although, to be fair, it might not be specific to this branch: I just tested on master and got the same. I think it might be connected with issue #87 and PR #88, and most likely the bug has been re-introduced since. But since it's affecting master as well, I wouldn't necessarily aim to figure this out as part of this PR; it can be a separate issue/PR.
example usage
```javascript
import amazonTranscribeToDraft from './index';
import amazonTranscribeTedTalkTranscript from './sample/autoEdit2TedTalkTranscript.sample.json';

console.log(amazonTranscribeToDraft(amazonTranscribeTedTalkTranscript));
```
eslint
TL;DR: would you be able to:
The other errors, since they occur across the master branch, I can look into with @jamesdools |
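A minimal sketch of the regex find-and-replace suggested in the Test section, written as a small Node script instead of an editor search. The pattern and the `replaceRandomKeys` name are assumptions (the regex from the guide is not reproduced in this thread), and the rewritten text is only valid inside a JavaScript test file, since `expect.any(String)` is not JSON.

```javascript
// Hypothetical pattern: matches entries like `"key": "ico58i6"`.
const KEY_PATTERN = /"key":\s*"[^"]*"/g;

// Rewrites every randomly generated entity key in an expected-output fixture
// so that Jest's expect.any(String) matcher accepts whatever key is generated.
function replaceRandomKeys(fixtureSource) {
  return fixtureSource.replace(KEY_PATTERN, '"key": expect.any(String)');
}

const fixture = `"entityRanges": [{
  "text": "There",
  "offset": 0,
  "length": 5,
  "key": "ico58i6"
}]`;

console.log(replaceRandomKeys(fixture));
```

The remaining fields (`start`, `end`, `offset`, `length`) are deterministic, so they can stay as literal values in the expected output.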
Thanks @pietrop. I've added eslint back into the dev dependencies. I was also getting sporadic "Received NaN for the
compared to bbckaldi, for example, which looks like:
and thought this might have been causing it originally, even though I was casting to a number. I've had to do some awkward processing of punctuation start and end times, since Transcribe does not provide start and end times for punctuation. I've now added an additional .toFixed() call so my decimal places don't go on too long:
For now I'm not getting the error. For the testing, I've added the
and
and
But nothing seems to give me an indication of what the problem is. If you can help with that, it would be appreciated. |
Thanks for the changes @Gribbs! eslint: I had missed that in the top part of the error message, and noticed it in the Travis error message.
tests |
Alright sounds good! Thanks again for your help! |
I noticed a bug: it seems like punctuation such as commas is added as its own word. It might be more straightforward to append it to the previous word, as it doesn't come with start and end time information, e.g.
```
{
  "alternatives": [
    {
      "confidence": null,
      "content": ","
    }
  ],
  "type": "punctuation"
},
```
If double-clicking the punctuation, e.g.
and, just to confirm, I can see it gets rendered with start and end times as
```html
<span data-start="NaN" data-end="NaN" data-confidence="high" data-prev-times="" data-entity-key="c0a6o9g" class="Word">
  <span data-offset-key="64k16-22-0"><span data-text="true">,</span>
</span>
```
What could be an easy fix for this, @Gribbs? |
I thought I’d tested that. Maybe my last change with toFixed() has caused it? I was accounting for this by using the previous word's end time, plus a tiny amount of time, as the next punctuation word's start time. |
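One plausible cause, offered as an assumption rather than a confirmed diagnosis: `Number.prototype.toFixed()` returns a string, so adding an offset to its result concatenates instead of adding, and converting that back to a number yields NaN. The function names and the 0.01 offset below are hypothetical.

```javascript
// Hypothetical offset used to nudge a punctuation item past the previous word.
const TINY_OFFSET = 0.01;

// Buggy order: toFixed() returns a string, so `+` concatenates.
function punctuationStartBuggy(previousEnd) {
  const rounded = previousEnd.toFixed(2); // 13.22 -> "13.22" (a string)
  return Number(rounded + TINY_OFFSET); // Number("13.220.01") -> NaN
}

// Safer order: add as numbers, round last, then convert back to a number.
function punctuationStartFixed(previousEnd) {
  return Number((previousEnd + TINY_OFFSET).toFixed(2));
}

console.log(punctuationStartBuggy(13.22)); // NaN
console.log(punctuationStartFixed(13.22)); // 13.23
```

The same care applies to the raw Transcribe times, which are stored as strings in its JSON output: parsing them with parseFloat() before doing any arithmetic avoids the same concatenation problem.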
I’ll take another look. I’ve just left the house so will look at it again this evening |
OK, no worries. With the change below, commas are also given start and end times:
```javascript
// normalizedWord(word, previousWord) is assumed to be the adapter's existing helper (not shown here)
const groupWordsInParagraphs = (words) => {
  const results = [];
  let paragraph = {
    words: [],
    text: []
  };

  words.forEach((word, index) => {
    const content = word.alternatives[0].content;
    // Punctuation has no start/end times, so pass the previous word for timing
    // (assuming here the very first word is never punctuation).
    const previousWord = word.type === 'punctuation' ? words[index - 1] : {};

    paragraph.words.push(normalizedWord(word, previousWord));
    paragraph.text.push(content);

    // Sentence-ending punctuation closes the current paragraph
    if (word.type === 'punctuation' && /[.?!]/.test(content)) {
      results.push(paragraph);
      // reset paragraph
      paragraph = {
        words: [],
        text: []
      };
    }
  });

  // Keep trailing words that were not followed by . ? or !
  if (paragraph.words.length > 0) {
    results.push(paragraph);
  }

  return results;
};
```
Although it looks a little off: because they are treated like words, there are extra spaces before commas and full stops, so there might be a better way to do it, such as appending punctuation to the previous word. |
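Appending punctuation to the previous word, as suggested above, might look roughly like the following sketch. It assumes Amazon Transcribe items have the shape shown in this thread (`type`, `alternatives[0].content`) plus string `start_time`/`end_time` fields on pronunciation items; the function name and the sample timings for "we" are made up for illustration, and confidence is ignored for brevity.

```javascript
// Hypothetical sketch: fold Amazon Transcribe punctuation items into the
// preceding word so they are not rendered as separate, untimed words.
function appendPunctuationToPreviousWord(items) {
  const words = [];

  items.forEach((item) => {
    const content = item.alternatives[0].content;

    if (item.type === 'punctuation') {
      const previous = words[words.length - 1];
      // Punctuation has no timing of its own; it extends the previous
      // word's text and keeps that word's start/end times.
      if (previous) {
        previous.text += content;
      }
      // A punctuation item with no preceding word is dropped in this sketch.
      return;
    }

    // Transcribe stores times as strings, so parse them into numbers.
    words.push({
      start: parseFloat(item.start_time),
      end: parseFloat(item.end_time),
      text: content
    });
  });

  return words;
}

// Sample input (illustrative values only)
const items = [
  { type: 'pronunciation', start_time: '13.03', end_time: '13.22', alternatives: [{ confidence: '1.0', content: 'There' }] },
  { type: 'punctuation', alternatives: [{ confidence: null, content: ',' }] },
  { type: 'pronunciation', start_time: '13.25', end_time: '13.40', alternatives: [{ confidence: '0.98', content: 'we' }] },
  { type: 'punctuation', alternatives: [{ confidence: null, content: '.' }] }
];

console.log(appendPunctuationToPreviousWord(items));
```

With these sample items the result contains two words, "There," and "we.", each keeping its own timing, so no punctuation span is rendered with NaN start or end times and no extra spaces appear before commas and full stops.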
I am going to have a closer look next week |
I think you're right. The spaces around the punctuation have been annoying me too. I’ll work on appending it to the previous word; that should simplify the code a bit too |
Hi @pietrop, I've completed the work to append punctuation items to the previous word, which seems to work well. I've pushed the changes to this branch. I've also added some additional tests in the index.test.js file, so it should pass OK. I'm skipping the original problematic test for now with describe.skip until I can work out what the issue with that test was. |
Oh! I see this pull request has now been closed. Would you like me to put it in a separate pull request? |
Hi @Gribbs, yes, was reviewing it in branch |
Is your Pull Request related to another issue in this repository?
Yes. Original issue #108
Describe what the PR does
The PR adds an adapter to support Amazon Transcribe https://aws.amazon.com/transcribe/
State whether the PR is ready for review or whether it needs extra work
Ready to review for merging.
Additional context
The PR doesn't cater for Speakers at this point (although the Transcribe service does support Speakers).
I haven't done an example-usage.js file in there or a .test.js, but I do have an example Transcribe JSON file in there.
Steps to test the changes:
yarn start or npm start. I have added the option 'Amazon Transcribe' in src/select-stt-json-type.js