Adding speech to text adapter for Google cloud platform #167

sshniro · 2019-07-18T09:31:55Z

Is your Pull Request related to another issue in this repository?
Fix for #152

Describe what the PR does
This PR converts the Google Speech-to-Text response to the Draft.js format.

State whether the PR is ready for review or whether it needs extra work
Completed

Additional Context
Google's STT response format is similar to IBM's, so this PR follows a similar pattern for formatting the text. The API already breaks the text into smaller chunks, each of which mostly resembles a full sentence. Therefore the content is split by these chunks rather than by punctuation, since punctuation is an additional attribute that has to be explicitly requested from the API.
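A minimal sketch of the chunk-based conversion described above. It assumes the response shape returned by Google's Node.js client (results[].alternatives[].words[], with startTime/endTime given as { seconds, nanos } objects, as in the example quoted in the review below). The function name gcpToDraftSketch, the 'paragraph' block type, the 'TBC' speaker placeholder and the data fields are illustrative assumptions, not the adapter's actual output:

```javascript
// Combine a { seconds, nanos } duration into seconds. Either field may be
// missing (e.g. nanos is omitted on a whole second), so both default to 0.
const toSeconds = ({ seconds = 0, nanos = 0 } = {}) => Number(seconds) + nanos / 1e9;

// Hypothetical converter: one Draft.js-style block per result chunk, so blocks
// follow the API's own grouping rather than punctuation.
const gcpToDraftSketch = (gcpJson) => {
  const blocks = (gcpJson.results || [])
    // Use the top-ranked alternative of each result.
    .map((result) => (result.alternatives || [])[0])
    // Skip chunks without word timings.
    .filter((best) => best && Array.isArray(best.words) && best.words.length > 0)
    .map((best, index) => {
      const words = best.words.map((w) => ({
        text: w.word,
        start: toSeconds(w.startTime),
        end: toSeconds(w.endTime),
      }));
      return {
        key: `block-${index}`,
        text: words.map((w) => w.text).join(' '),
        type: 'paragraph', // assumed block type
        depth: 0,
        inlineStyleRanges: [],
        entityRanges: [], // left empty in this sketch
        data: { speaker: 'TBC', start: words[0].start, words }, // 'TBC' is a placeholder speaker
      };
    });
  return { blocks, entityMap: {} };
};
```

Using the API's result chunks as paragraph boundaries means the conversion works whether or not automatic punctuation was requested.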

demo/select-stt-json-type.js

packages/stt-adapters/gcp/index.js

pietrop · 2019-07-19T13:44:06Z

packages/stt-adapters/gcp/index.js

+ * @param nanoSecond
+ * @returns {number}
+ */
+const computeTimeInSeconds = (startSecond, nanoSecond) => {


More of a question: I haven't seen the spec of the Google STT schema. In a word like

"startTime": { "seconds": "24", "nanos": 600000000 },

I am assuming seconds and nanos need to be combined to get, in this case, the startTime?

Yes @pietrop, GCP provides the time in this format, and if the word starts at exactly 24.00 seconds it returns nothing for the nanos attribute. The method handles this case and computes the exact time.
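A sketch of what a helper with the computeTimeInSeconds(startSecond, nanoSecond) signature from the diff needs to do. The body below is an assumption, not the code from the PR; it treats a missing nanos field as zero and, by the same reasoning, a missing seconds field too:

```javascript
// Hypothetical body for the computeTimeInSeconds signature shown in the diff.
// seconds arrives as a string (e.g. "24"); nanos is omitted on whole seconds.
const computeTimeInSeconds = (startSecond, nanoSecond) => {
  const seconds = parseInt(startSecond, 10) || 0; // missing seconds -> 0
  const nanos = nanoSecond || 0; // missing nanos -> 0
  // 1 second = 1e9 nanoseconds.
  return seconds + nanos / 1e9;
};

console.log(computeTimeInSeconds('24', 600000000)); // 24.6
console.log(computeTimeInSeconds('24')); // 24
```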

pietrop

Thanks for the PR @sshniro
Good shout breaking the text using the text grouping(?) from the API, since punctuation is an optional param.

I've run the tests and all is good ✅
I've also tried it locally by importing the sample JSON ✅

A few minor tweaks (I've also left some comments in the code):

It would be good to rename gcp to something more consistent with the other adapters, e.g. google-stt, google-cloud-stt, google-cloud-platform, etc.
In gcpStt.sample.js, adding draftJs to the name makes it easier to spot at a glance that it's the Draft.js data structure for the tests, e.g. something like googleSttToDraftJs.sample.js.

Other than that, it's looking good!

packages/stt-adapters/index.js

Co-Authored-By: Pietro

…raftJs.sample.js

sshniro · 2019-07-19T17:17:25Z

@pietrop I have made the changes requested in the following comment:
#167 (review)

pietrop · 2019-07-19T17:20:45Z

Awesome, thanks @sshniro !

Out of curiosity, what's your use case for this component?

sshniro · 2019-07-19T17:36:28Z

Hi @pietrop :)

I was inspired by the following paper:
https://gfx.cs.princeton.edu/pubs/Jin_2017_VTI/Jin2017-VoCo-paper.pdf

I have a habit of using a lot of filler words (e.g. so, and, ehh) in screencasts/video tutorials, so I wanted to build an open-source editor for voice. A basic search didn't turn up an open-source equivalent, so I decided to create one. From some initial research, I also found that Google is noticeably better at transcribing audio than its open-source counterparts.

So the idea is to automatically transcribe the video content and let the user crop/replace words in the editor. Any text removed in the editor should be automatically removed from the audio as well.
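A hypothetical sketch of the "remove it from the text, remove it from the audio" idea: given word timings (in seconds) from the transcript and the indices of the words deleted in the editor, compute the stretches of audio to keep. The function name keepRanges and its input shape are assumptions for illustration:

```javascript
// Hypothetical helper: words is [{ start, end }] in seconds, deletedIndices
// lists the words removed in the editor. Returns the audio ranges to keep,
// merging consecutive kept words into a single range.
const keepRanges = (words, deletedIndices) => {
  const deleted = new Set(deletedIndices);
  const ranges = [];
  words.forEach((word, i) => {
    if (deleted.has(i)) return; // drop the audio for deleted words
    const last = ranges[ranges.length - 1];
    if (last && !deleted.has(i - 1)) {
      // Previous word was kept: extend its range (keeps the natural pause).
      last.end = word.end;
    } else {
      // First kept word, or the one right after a deletion: start a new range.
      ranges.push({ start: word.start, end: word.end });
    }
  });
  return ranges;
};
```

Because a new range starts at the next kept word's start time, the pauses on either side of a deleted word are dropped as well; a real editor might want to keep a short pad there instead.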

Replacing/re-arranging words is relatively easy. The paper also talks about speech synthesis using phonemes to completely modify words. If time permits, I'm planning to attempt that and see.

pietrop · 2019-07-19T17:51:27Z

Very interesting! In a similar domain, we are also working on a tool to edit audio/video interviews. At the moment it is more about generating rough cuts than removing filler words, but it sounds like there might be some overlap.

https://github.com/bbc/digital-paper-edit-client

You can see the demo here: https://bbc.github.io/digital-paper-edit-client

The idea is that:

You can create an automatically generated transcript
Correct it if needed using @bbc/react-transcript-editor (transcript correction example in the demo)
You can then create a programme script, highlight/annotate your material, and/or use text selection to assemble a new programme/story/paper edit (programme script/paper-editing example in the demo)
Quickly review an audio/video version without needing to export
When done, export to editing software to continue with your edit.

Does that make sense?
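To make the paper-edit step above a bit more concrete, here is a hypothetical sketch: a paper edit as an ordered list of transcript selections (a media id plus in/out points in seconds), laid end to end to get each clip's position in the new programme, roughly the information an EDL carries. The function name and field names are assumptions, not the digital-paper-edit data model:

```javascript
// Hypothetical: paperEdit is an ordered list of selections
// [{ mediaId, start, end }] (source in/out in seconds). Each event gets a
// record in/out: where it lands in the assembled programme.
const toSequence = (paperEdit) => {
  let recordIn = 0;
  return paperEdit.map(({ mediaId, start, end }) => {
    const duration = end - start;
    const event = {
      mediaId,
      sourceIn: start,
      sourceOut: end,
      recordIn,
      recordOut: recordIn + duration,
    };
    recordIn += duration; // next clip starts where this one ends
    return event;
  });
};
```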

sshniro · 2019-07-19T18:01:29Z

Oh, this is so cool! Thanks for pointing me to this repository. I will go through the issues and see if I can contribute to some features. Judging by the issues, it's geared more towards AWS, I presume?

pietrop · 2019-07-19T18:28:30Z

It follows a modular architecture: there's a React client that, in theory, is not very opinionated about the backend, and the backend can be an API server or be wrapped inside an Electron app to package it as a desktop app for macOS, Linux, and Windows. The Project Architecture section of the README does a better job of describing this. So yeah, it's a larger project, with a variety of different kinds of issues/tickets.
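One way to picture a client that is not opinionated about its backend is to have the UI depend only on a small async interface and to choose the concrete backend at start-up. This is a hypothetical sketch; every function and method name here is illustrative and not taken from digital-paper-edit-client:

```javascript
// Hypothetical backend that talks to an API server. fetchFn is injected
// (e.g. the global fetch) so it can be swapped for a stub.
const createHttpBackend = (baseUrl, fetchFn) => ({
  async getTranscripts(projectId) {
    const res = await fetchFn(`${baseUrl}/projects/${projectId}/transcripts`);
    return res.json();
  },
});

// Hypothetical in-process backend, standing in for a local/desktop store.
const createLocalBackend = (store) => ({
  async getTranscripts(projectId) {
    return store[projectId] || [];
  },
});

// The client only relies on the shared getTranscripts interface.
const createClient = (backend) => ({
  async transcriptTitles(projectId) {
    const transcripts = await backend.getTranscripts(projectId);
    return transcripts.map((t) => t.title);
  },
});
```

Because the client never imports a specific backend, the same UI code can run against an API server or a packaged desktop build.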

Adding speech to text adapter for Google cloud platform

1174c8b