Integration with Speechmatics #38
Comments
Thanks @rkpujari. The idea is to make it easier to create adapters and converters to import and export data into the … We are planning to write some more documentation and guides on how to do that. Should have something in the next couple of weeks. For now the adapter folder is in … But, yes, will update this issue with more info soon.
Hi,
Thanks for the update. The documentation will be very useful for me. I have already added the adapter for Speechmatics, though I need to make some enhancements to the editor and am working on those. I need to show speaker names (with an option to turn them on/off), readjust timestamps, highlight words with low "confidence", export the transcript to Word format, change the font size, add spell check (with suggestions for alternate words) for newly added text in the transcript, etc. Thanks for your help.
Regards,
Rama
…On Fri, 14 Dec 2018 at 20:46, Pietro ***@***.***> wrote:
@rkpujari <https://github.com/rkpujari> we have a first draft of a PR #51
<#51> with some
instructions on how to create adapters for various STT services.
We are still tweaking it and reviewing it for clarity etc.. but feedback
welcome if this is something you are currently looking at.
If you want to do a Pull Request with the Speechmatics adapter we can add it to the main component. We are also working on …
Good to know that you are working on those features (interested to know whether they can be expected very soon or may take time). Regarding the pull request for the Speechmatics adapter, I am not an expert React developer. I copied the BBC Kaldi adapter and made minor changes to it to get it working. I have attached the index.js file (converted to a text file to stop mail scanning).
Thanks,
Rama
On Tue, 18 Dec 2018 at 15:01, Pietro ***@***.***> wrote:
If you want to do a Pull Request with the Speechmatics adapter we can add
it to the main component.
We are also working
- on having the possibility to turn speaker names and time stamps on or off
- readjust time stamps (word timings after text has been edited)
- highlight words having low confidence (below 0.6)
- export transcript (for now as plain text)
…
/**
* Convert Speechmatics JSON (note: the sample below is BBC Kaldi output, left over from the adapter this file was copied from)
```
{
"action": "audio-transcribe",
"retval": {
"status": true,
"wonid": "octo:2692ea33-d595-41d8-bfd5-aa7f2d2f89ee",
"punct": "There is a day. About ten years ago when ...",
"words": [
{
"start": 13.02,
"confidence": 0.68,
"end": 13.17,
"word": "there",
"punct": "There",
"index": 0
},
{
"start": 13.17,
"confidence": 0.61,
"end": 13.38,
"word": "is",
"punct": "is",
"index": 1
},
```
*
* into
*
```
const blocks = [
{
text: 'Hello',
type: 'paragraph',
data: {
speaker: 'Foo',
},
entityRanges: [],
},
{
text: 'World',
type: 'paragraph',
data: {
speaker: 'Bar',
},
entityRanges: [],
},
];
```
*
*/
import generateEntitiesRanges from '../generate-entities-ranges/index.js';
/**
* Groups the words list from the Speechmatics transcript into paragraphs based on punctuation.
* @todo To be more accurate, should introduce an honorifics library to do the splitting of the words.
* @param {array} words - array of word objects from the Speechmatics transcript
*/
const groupWordsInParagraphs = (words) => {
const results = [];
let paragraph = { words: [], text: [] };
words.forEach((word) => {
// Speechmatics v1 returns time/duration as strings; parse as floats
// (parseInt would truncate fractional seconds, e.g. "13.02" -> 13)
const endTime = parseFloat(word.time) + parseFloat(word.duration);
const tmpWord = {
text: word.name,
start: parseFloat(word.time),
end: endTime,
};
// if word contains punctuation
if (/[.?!]/.test(word.name)) {
paragraph.words.push(tmpWord);
paragraph.text.push(word.name);
results.push(paragraph);
// reset paragraph
paragraph = { words: [], text: [] };
} else {
paragraph.words.push(tmpWord);
paragraph.text.push(word.name);
}
});
return results;
};
const SpeechmaticsToDraft = (SpeechmaticsJson) => {
const results = [];
const tmpWords = SpeechmaticsJson.words;
const wordsByParagraphs = groupWordsInParagraphs(tmpWords);
wordsByParagraphs.forEach((paragraph) => {
const draftJsContentBlockParagraph = {
text: paragraph.text.join(' '),
type: 'paragraph',
data: {
speaker: 'TBC',
},
// the entities as ranges are each word in the space-joined text,
// so for each word we need to compute the offset from the beginning of the paragraph, and the length
entityRanges: generateEntitiesRanges(paragraph.words, 'text'), // wordAttributeName
};
// console.log(JSON.stringify(draftJsContentBlockParagraph,null,2))
results.push(draftJsContentBlockParagraph);
});
return results;
};
export default SpeechmaticsToDraft;
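For anyone trying the adapter above, here is a rough, self-contained sketch of how the grouping step behaves on Speechmatics v1-style words. The inline `generateEntitiesRanges` stand-in below is a simplified assumption for illustration, not the component's actual helper:

```javascript
// Simplified stand-in for generateEntitiesRanges (assumption: the real
// helper computes each word's offset/length within the space-joined text).
const generateEntitiesRanges = (words, attr) => {
  let offset = 0;
  return words.map((word) => {
    const range = {
      start: word.start,
      end: word.end,
      text: word[attr],
      offset,
      length: word[attr].length,
    };
    offset += word[attr].length + 1; // +1 for the joining space
    return range;
  });
};

// Same paragraph-splitting rule as the adapter: break on . ? !
const groupWordsInParagraphs = (words) => {
  const results = [];
  let paragraph = { words: [], text: [] };
  words.forEach((word) => {
    const end = parseFloat(word.time) + parseFloat(word.duration);
    paragraph.words.push({ text: word.name, start: parseFloat(word.time), end });
    paragraph.text.push(word.name);
    if (/[.?!]/.test(word.name)) {
      results.push(paragraph);
      paragraph = { words: [], text: [] };
    }
  });
  return results;
};

// Speechmatics v1-style word objects (name/time/duration are strings)
const sample = [
  { name: 'Hello', time: '0.10', duration: '0.30' },
  { name: 'world.', time: '0.45', duration: '0.40' },
  { name: 'Bye.', time: '1.00', duration: '0.20' },
];

const blocks = groupWordsInParagraphs(sample).map((p) => ({
  text: p.text.join(' '),
  type: 'paragraph',
  data: { speaker: 'TBC' },
  entityRanges: generateEntitiesRanges(p.words, 'text'),
}));

console.log(blocks[0].text); // 'Hello world.'
console.log(blocks.length);  // 2
```

This should make it easier to eyeball the output shape against the `const blocks = [...]` target format in the doc comment.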
Ok, that's great, thanks for sharing this. I think with your code example and the docs from Speechmatics it should be possible to add it as a separate adapter, following the guide in PR #51 - Guide: How to Create an Adapter - Draft
In terms of time estimate @rkpujari, as an update we just got the hide/show time-codes and speaker label functionality working in PR #56, and we should be able to add that to master soon. For re-aligning text I reckon end of January / February, to give a conservative estimate 🤞 Although this might not be needed for most use cases, where users are just correcting and exporting text (?)
Thanks a lot for your help. Looking forward to the latest version.
Regarding realigning the text: sometimes particular sections or words of the transcript could be inaccurate due to a poor-quality audio segment (or due to an issue with the speech conversion tool). In that case, if the user manually replaces the transcript with new text, they should be able to associate timestamps with the newly added text (so that it gets highlighted when the audio is played, and vice versa). I don't have such audio, or a sample transcript for it, so I am not sure how the transcript would look in that case or how the editor would behave. I understand your comment that this use case may not be needed, since users will mostly be correcting and exporting text; I need to speak with my product owner to get more details. I have seen the new UI for the editor; it looks very good and there is more space for the transcript now. Thanks once again.
Regards,
Rama
…On Thu, 20 Dec 2018 at 19:54, Pietro ***@***.***> wrote:
In terms of time estimate @rkpujari <https://github.com/rkpujari> as an
update we just got the hide/show time-codes and speaker label functionality
working in PR #56 <#56>
and we should be able to add that to master soon.
re-aligning text I reckon end of January / February, to give a
conservative estimate🤞
*Altho this might not be needed for most use cases, where users are just
correcting and exporting text (?)*
Thanks @rkpujari. Yes, that is correct: if the user changes incorrect words (especially if they completely delete a word or a paragraph), then the time-codes will no longer be associated with the text, and they can no longer click on that text to jump to the corresponding point in the media. And just to clarify, we are exploring ways to realign text. I've written some notes on the progress we have made on that front in #30. We should be able to try this out within the editor in the new year, to run some more comprehensive tests and see if it's a valid solution.
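To make the realignment problem concrete: one naive approach is to spread the edited span's original time interval across the replacement words, proportional to each word's length. This is purely an illustrative sketch under that assumption, not the alignment approach being explored in #30:

```javascript
// Naive re-timing sketch: distribute a replaced span's [start, end]
// interval across the new words, proportional to character length.
const interpolateTimings = (newWords, spanStart, spanEnd) => {
  const totalChars = newWords.reduce((sum, w) => sum + w.length, 0);
  const duration = spanEnd - spanStart;
  let cursor = spanStart;
  return newWords.map((word) => {
    const share = (word.length / totalChars) * duration;
    const timed = { text: word, start: cursor, end: cursor + share };
    cursor += share;
    return timed;
  });
};

// e.g. the user retyped a garbled 2-second region as three new words
const retimed = interpolateTimings(['a', 'quick', 'fix'], 10.0, 12.0);
console.log(retimed[0].start); // 10
```

Real forced alignment (re-running the audio against the corrected text) would be far more accurate, but even this crude interpolation keeps click-to-seek roughly usable after an edit.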
I'm also very interested in this project and in the Speechmatics adapter for this tool. What are your current plans for officially adding the support? I've already written a Speechmatics adapter while getting to know the code. I would happily share it and provide a pull request if you are interested.
Thanks for reaching out @murezzda, yes a PR for the Speechmatics adapter would be great, thanks! You can see this guide for how to add a new adapter, as well as the contributing section. And feel free to reach out with any questions you might have.
Addressed in PR #94
from version …
The editor looks great, thanks for this. I am looking for Speechmatics integration. Any plans to add that? If not, some documentation on how to integrate it into this editor would be helpful. Thanks in advance.