PSA: Realtime audio frontend demo for macOS #327

@lunixbochs

This is a demo of realtime speech recognition on macOS with wav2letter++ that works out of the box.

This is based on my C API in #326.
The w2l_cli tarball includes a src dir with the frontend source (w2l_cli.cpp) and scripts/instructions for building everything from scratch.

To install:

wget https://talonvoice.com/research/w2l_cli.tar.gz
tar -xf w2l_cli.tar.gz && rm w2l_cli.tar.gz
cd w2l_cli
wget https://talonvoice.com/research/epoch186-ls3_14.tar.gz
tar -xf epoch186-ls3_14.tar.gz && rm epoch186-ls3_14.tar.gz

To run:
./bin/w2l emit epoch186-ls3_14/model.bin epoch186-ls3_14/tokens.txt

Then speak, and you should see emissions (letter predictions) printed in the terminal after you finish speaking, for example:

$ ./bin/w2l emit epoch186-ls3_14/model.bin epoch186-ls3_14/tokens.txt 
helow|world
this|is|a|test|of|wave|to|leter
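
In the sample output above, words appear to be separated by `|`. If you want plain text instead, a small shell filter is enough. This is only a sketch: `emissions_to_text` is a hypothetical helper name, not part of w2l_cli, and it assumes one utterance per line with `|` as the only separator.

```shell
# Hypothetical helper (not part of w2l_cli): turn emission lines into plain text.
# Assumes "|" separates words and each utterance is on its own line,
# as in the sample output above.
emissions_to_text() {
    tr '|' ' '
}

# Try it on the sample lines from above:
printf 'helow|world\nthis|is|a|test|of|wave|to|leter\n' | emissions_to_text
```

Assuming the binary writes emissions to stdout, the same filter could be appended to the run command with a pipe (`... | emissions_to_text`). Output may then arrive in chunks rather than line by line if the program buffers when writing to a pipe.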

Language model decoding is also wired up via ./bin/w2l decode am tokens lm lexicon, but as per #326 it segfaults right now when setting up the Trie.

More pretrained English acoustic models are available at https://talonvoice.com/research/ if you want to try them.
