Adding speech to text adapter for Google cloud platform #167
packages/stt-adapters/gcp/index.js
Outdated
 * @param nanoSecond
 * @returns {number}
 */
const computeTimeInSeconds = (startSecond, nanoSecond) => {
More of a question, as I haven't seen the spec of the Google STT schema. In a word like
"startTime": {
"seconds": "24",
"nanos": 600000000
},
I am assuming `seconds` and `nanos` need to be combined to get, in this case, the `startTime`?
Yes @pietrop, GCP provides the time in that format, and if a word starts at exactly 24.00 seconds it returns nothing for the `nanos` attribute. The method handles this case and computes the exact time.
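For anyone reading along, here is a minimal sketch of the combination discussed above. It is not the code from this PR: the helper name `secondsFromTimestamp` is hypothetical, and it assumes only what the sample shows, namely that `seconds` arrives as a string and that `nanos` may be absent when the value is a whole second.

```javascript
// Hypothetical helper, not the PR's computeTimeInSeconds.
// Assumes a timestamp shaped like { seconds: "24", nanos: 600000000 },
// where either field may be missing.
const NANOS_PER_SECOND = 1e9;

const secondsFromTimestamp = ({ seconds = 0, nanos = 0 } = {}) => {
  // `seconds` is a string in the sample JSON, so convert it explicitly.
  const wholeSeconds = Number(seconds);
  // A missing `nanos` defaults to 0, which covers the "exactly 24.00" case.
  const fraction = Number(nanos) / NANOS_PER_SECOND;
  return wholeSeconds + fraction;
};

console.log(secondsFromTimestamp({ seconds: "24" })); // 24
console.log(secondsFromTimestamp({ seconds: "24", nanos: 600000000 })); // roughly 24.6
```

Defaulting the missing field to 0 keeps callers free of null checks; the result is a floating-point number, so comparisons against literal values should allow a small tolerance.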
Thanks for the PR @sshniro
Good shout on breaking the text using the grouping(?) from the API, since punctuation is an optional param.
- I've run the tests and all is good ✅
- I've also tried it locally, importing the sample JSON ✅
Minor tweaks (I've also left some comments in the code):
- It would be good to rename `gcp` to something more consistent with the other adapters, e.g. `google-stt`, `google-cloud-stt`, `google-cloud-platform`, etc.
- In `gcpStt.sample.js`, adding draftJs to the name makes it easier to spot at a glance that it's the draftJs data structure for the tests, e.g. something like `googleSttToDraftJs.sample.js`
Other than that, it's looking good!
Co-Authored-By: Pietro <[email protected]>
@pietrop I have made the changes requested in the comment above.
Awesome, thanks @sshniro! Out of curiosity, what's your use case for this component?
Hi @pietrop :) I was inspired by the following paper. I have a problem with using so many filler words (e.g. so, and, ehh) during screencasts and video tutorials, so I wanted to build an open-source editor for voice. I did a basic search but couldn't find an open-source equivalent, so I decided to create one. From some initial research I found that Google is better at transcribing audio than its open-source counterparts. So the idea is to automatically transcribe the video content and let the user crop or replace words in the editor. The removed text should automatically be removed from the audio content as well. Replacing or rearranging a word is fairly easy. The paper talks about using phoneme-based speech synthesis to modify the words completely. If time permits, I'm planning to attempt it and see.
Very interesting. In a similar domain, we are also working on a tool to edit audio/video interviews. At the moment it is more about generating rough cuts, rather than removing filler words, but it sounds like there might be some overlap. https://github.com/bbc/digital-paper-edit-client you can see the demo here https://bbc.github.io/digital-paper-edit-client The idea is that
if that makes sense?
Oh, this is so cool! Thanks for pointing me to this repository. I will go through the issues and see if I can contribute to some features. Judging by the issues, it's geared more towards AWS, I presume?
It follows a modular architecture, so there's a React client that in theory is not super opinionated about the backend. And the backend can be an API server or wrapped inside an Electron app to package for Mac, Linux, and Windows as a desktop app. The
Is your Pull Request related to another issue in this repository?
Fix for #152
Describe what the PR does
The PR converts the Google Speech-to-Text response to the Draft.js format.
State whether the PR is ready for review or whether it needs extra work
Completed
Additional Context
Google's STT response is similar to IBM's response format, so this PR follows a similar pattern for formatting the text. For example, the text is broken into smaller chunks, each of which mostly resembles a full sentence. Therefore the content is not broken up by punctuation, as punctuation is an additional attribute that must be specifically requested from the API.
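As a rough illustration of the chunk-based grouping described above (not the adapter's actual code), the sketch below turns each chunk of the response into one paragraph with word timings. It assumes the response has the shape `results[].alternatives[].words[]` with `word`, `startTime`, and `endTime` fields; the names `groupByResult` and `toSeconds`, and the output shape, are hypothetical and simpler than the real Draft.js structure.

```javascript
// Hypothetical sketch: one paragraph per result chunk, no punctuation needed.
// Assumed input shape: { results: [{ alternatives: [{ words: [...] }] }] }.
const toSeconds = ({ seconds = 0, nanos = 0 } = {}) =>
  Number(seconds) + Number(nanos) / 1e9;

const groupByResult = (response) =>
  response.results.map((result) => {
    // Take the first alternative of each chunk.
    const best = result.alternatives[0];
    const words = best.words.map((w) => ({
      text: w.word,
      start: toSeconds(w.startTime),
      end: toSeconds(w.endTime),
    }));
    return {
      // Paragraph text is rebuilt from the words so it stays aligned with timings.
      text: words.map((w) => w.text).join(' '),
      start: words.length > 0 ? words[0].start : 0,
      words,
    };
  });
```

Each returned paragraph could then be mapped to an editor block; that last step depends on the Draft.js data structure the other adapters produce and is left out here.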