Speechmatics adapter #94

Merged
merged 6 commits into from
Feb 13, 2019
murezzda
Contributor

@murezzda murezzda commented Feb 8, 2019

Is your Pull Request related to another issue in this repository?
This is related to #38.

Describe what the PR does
Adds an STT adapter for Speechmatics.

State whether the PR is ready for review or whether it needs extra work
Tests are not implemented yet.

Additional context
Added a sample Speechmatics transcript of the demo TED Talk.

@murezzda
Contributor Author

PR is ready for review.

Added the following functionality:

- Speechmatics transcripts can now be loaded.
- Speechmatics speakers are added to paragraphs.

@pietrop
Contributor

pietrop commented Feb 13, 2019

Thanks @murezzda. A question about Speechmatics: is speaker diarization always provided, or is it optional (e.g. does it cost extra)?

I was looking at their API docs, but it's unclear.

Contributor

@pietrop pietrop left a comment

Looks good!

@pietrop
Contributor

pietrop commented Feb 13, 2019

Looks good. As agreed, you can leave grouping the text by speaker segments for another PR.

closes #38

@pietrop pietrop merged commit 80da1f6 into bbc:master Feb 13, 2019
@murezzda
Contributor Author

Hi @pietrop

Thanks @murezzda. A question about Speechmatics: is speaker diarization always provided, or is it optional (e.g. does it cost extra)?

I was looking at their API docs, but it's unclear.

Their cloud service provides diarization by default, and it does not cost extra, but it can be disabled with an argument. Good point: maybe we should check for this in the next PR.

@murezzda murezzda deleted the speechmatics-adapter branch February 14, 2019 08:17
@pietrop
Contributor

pietrop commented Feb 14, 2019

Yes, I think it would be good to make it optional (in a separate PR).

E.g. in the BBC Kaldi adapter, speaker diarization is a separate, optional attribute, so the adapter checks whether the segments have been provided.
If they have, it uses them. If not, it falls back to parsing without them.

I reckon this might be a good approach, as it could make the adapter component a bit more flexible. E.g. someone may want to use it in a context where they have already run transcriptions through STT but didn't think of getting the speaker diarization info.
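
The check-then-fall-back approach described above could be sketched roughly as follows. This is only an illustration, not the adapter's actual code: the helper names, the `UNKNOWN_SPEAKER` label, the paragraph shape (`{ start, text }`) and the assumed V1-style `speakers` array (`{ name, time, duration }` with numeric strings) are all assumptions.

```javascript
// Hypothetical placeholder label used when no diarization data is available.
const UNKNOWN_SPEAKER = 'UNKNOWN';

// Hypothetical helper: find the speaker segment that contains a given time.
// Assumes each segment looks like { name, time, duration } with numeric strings.
const findSpeakerForTime = (speakers, time) => {
  const match = speakers.find((segment) => {
    const start = parseFloat(segment.time);
    const end = start + parseFloat(segment.duration);
    return time >= start && time < end;
  });
  return match ? match.name : null;
};

// Use speaker segments when they are present; otherwise fall back to a
// placeholder label so parsing still succeeds without diarization.
const addSpeakersIfAvailable = (paragraphs, speechmaticsJson) => {
  const speakers = speechmaticsJson.speakers;
  const hasSpeakers = Array.isArray(speakers) && speakers.length > 0;
  return paragraphs.map((paragraph) => ({
    ...paragraph,
    speaker: hasSpeakers
      ? findSpeakerForTime(speakers, paragraph.start) || UNKNOWN_SPEAKER
      : UNKNOWN_SPEAKER
  }));
};
```

With this shape, a transcript produced with diarization disabled simply yields paragraphs labelled with the placeholder instead of throwing.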

@Shizen-no-ko

Hello, I would like to ask about the Speechmatics adapter. I believe the JSON that Speechmatics returns these days may differ from what this adapter was written for. It errors immediately on tmpWords = curatePunctuation(speechmaticsJson.words); because the Speechmatics JSON no longer contains a "words" key. I presume the best approach would be to pre-filter the JSON I get back, with its array of "alternatives", and create a "words" key?
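
A pre-filter of the kind suggested here might look like the sketch below. It rests on assumptions about both formats that should be checked against real responses: that the newer JSON has a `results` array whose items carry `start_time`, `end_time` and an `alternatives` array with `content` and `confidence`, and that the adapter expects `words` entries of the form `{ name, time, duration, confidence }` with numeric strings. The function name `convertV2ToV1Words` is hypothetical.

```javascript
// Sketch only: both the input (V2-style) and output (V1-style) shapes are
// assumptions; verify them against the JSON you actually receive.
const convertV2ToV1Words = (v2Json) => {
  const results = Array.isArray(v2Json.results) ? v2Json.results : [];
  const words = results
    // Skip items that have no alternative to take the text from.
    .filter((item) => Array.isArray(item.alternatives) && item.alternatives.length > 0)
    .map((item) => {
      const best = item.alternatives[0]; // take the first alternative
      return {
        name: best.content,
        time: String(item.start_time),
        duration: (item.end_time - item.start_time).toFixed(2),
        confidence: String(best.confidence)
      };
    });
  // Keep the original fields and add the "words" key the adapter reads.
  return { ...v2Json, words };
};
```

The result can then be passed to the adapter in place of the raw response. Punctuation items are kept as zero-duration entries, so they still reach the adapter's punctuation handling.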

@Shizen-no-ko

Hello,
I have created a function to convert Speechmatics V2 SaaS JSON back into V1 JSON. I had to use the npm module rather than the repo, as many components/modules are deprecated. If I leave punctuation marks in my conversion, the Speechmatics adapter errors out on the regex test, but if I change my function to remove all punctuation marks, the adapter works. Would you have any idea why? I'm stuck.

const curatePunctuation = (words) => {
  const curatedWords = [];
  words.forEach((word) => {
    if (/[.?!]/.test(word.name)) { // <------- Here

Also, this index.js file is presumably created by the minified code in the npm module. Is there any way of accessing it to add logs or make edits?

Sorry for the newbie questions.
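
Only the first lines of curatePunctuation are quoted above, so what happens inside the `if` branch is not known from this thread. If, as an assumption, the branch appends the punctuation mark to the previously curated word, it would fail whenever there is no previous word (for example, when the converted list starts with a punctuation item). A defensive variant under that assumption might look like this; `curatePunctuationSafely` is a hypothetical name, not part of the adapter.

```javascript
// Sketch under the assumption that punctuation is merged into the previous
// word. Uses the same regex as the quoted snippet.
const curatePunctuationSafely = (words) => {
  const curatedWords = [];
  words.forEach((word) => {
    const isPunctuation = typeof word.name === 'string' && /[.?!]/.test(word.name);
    const previous = curatedWords[curatedWords.length - 1];
    if (isPunctuation && previous) {
      // Attach the mark to the preceding word.
      previous.name += word.name;
    } else {
      // No previous word (or not punctuation): keep the item as its own entry.
      // Copy the object so the input array is not mutated by later merges.
      curatedWords.push({ ...word });
    }
  });
  return curatedWords;
};
```

Running the converted words through a function like this before handing them to the adapter, and logging the item at which the original fails, may help narrow down whether the order or shape of the punctuation items is the cause.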

3 participants