PSA: Realtime audio frontend demo for macOS

This is a demo of realtime speech recognition on macOS with wav2letter++ that works out of the box.

This is based on my C API in https://github.com/facebookresearch/wav2letter/issues/326
There's a `src` dir in the w2l_cli tarball with the frontend source (`w2l_cli.cpp`) and scripts/instructions for building this all from scratch.

To install:
```
wget https://talonvoice.com/research/w2l_cli.tar.gz
tar -xf w2l_cli.tar.gz && rm w2l_cli.tar.gz
cd w2l_cli
wget https://talonvoice.com/research/epoch186-ls3_14.tar.gz
tar -xf epoch186-ls3_14.tar.gz && rm epoch186-ls3_14.tar.gz
```

To run:
```
./bin/w2l emit epoch186-ls3_14/model.bin epoch186-ls3_14/tokens.txt
```

Then speak. The emissions (letter predictions) should appear in the terminal after you speak, for example:

```
$ ./bin/w2l emit epoch186-ls3_14/model.bin epoch186-ls3_14/tokens.txt 
helow|world
this|is|a|test|of|wave|to|leter
```
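
If you want to consume the emissions from another program, here is a rough Python sketch. It is not part of the tarball. It assumes `|` is the word separator (which is what the sample output above suggests) and that the frontend prints one emission per line on stdout. The names `emission_to_text` and `stream_transcripts` are made up for this example.

```python
import subprocess
from typing import Iterator, List

# Assumption: "|" marks word boundaries, as in the sample output above.
WORD_SEP = "|"


def emission_to_text(line: str) -> str:
    """Turn one emission line such as 'helow|world' into 'helow world'."""
    words = [w for w in line.strip().split(WORD_SEP) if w]
    return " ".join(words)


def stream_transcripts(cmd: List[str]) -> Iterator[str]:
    """Run cmd and yield one cleaned-up string per non-empty stdout line."""
    with subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True, bufsize=1) as proc:
        assert proc.stdout is not None
        for line in proc.stdout:
            text = emission_to_text(line)
            if text:
                yield text
```

To use it with the demo, iterate over `stream_transcripts(["./bin/w2l", "emit", "epoch186-ls3_14/model.bin", "epoch186-ls3_14/tokens.txt"])` and print each item. One caveat: if the binary block-buffers its stdout when writing to a pipe, lines may show up later than they do in the terminal.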

Language model decoding is also wired up via `./bin/w2l decode am tokens lm lexicon`, but as per #326 it segfaults right now when setting up the Trie.

There are more pretrained English acoustic models at https://talonvoice.com/research/ that you can try as well.
