Whisper Real-Time Speech-to-Text

A simple terminal-based application that transcribes your speech in real-time using OpenAI's Whisper model. No API key needed - runs completely locally!

Features

🎤 Real-time speech transcription using Whisper
📝 Local processing - no internet required after the initial model download
🔊 Automatic speech detection
🌍 English language support
💻 Cross-platform (Windows, macOS, Linux)
⚡ GPU acceleration support (if CUDA available)
🎯 Multiple model sizes (tiny to large)

Prerequisites

Python 3.8 or higher
A working microphone
~1GB RAM (for tiny model), more for larger models

Installation

Navigate to the project directory:

cd "simple stt"

Install dependencies:

pip install -r requirements.txt

The first run will download the Whisper model (~150MB for tiny model).

Usage

Run the application:

python app.py

The application will:

Load the Whisper model (first run downloads it)
Activate your microphone
Display transcription as you speak
Update every few seconds with processed audio chunks
Show the complete transcript when you stop

Press Ctrl+C to stop transcribing.
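The stop behaviour described above can be sketched as a loop that treats Ctrl+C as the normal way to end a session. This is only an illustration: `run_session`, `chunks`, and `transcribe` are hypothetical names, not taken from app.py, and `transcribe` stands in for whatever call actually runs Whisper.

```python
def run_session(chunks, transcribe):
    """Transcribe chunks until they run out or the user presses Ctrl+C."""
    transcript = []
    try:
        for chunk in chunks:
            text = transcribe(chunk).strip()
            if text:
                print(text, flush=True)   # show each piece as soon as it is ready
                transcript.append(text)
    except KeyboardInterrupt:
        pass  # Ctrl+C ends the session; fall through and return what we have
    return " ".join(transcript)


# Stand-in data: plain strings play the role of audio chunks here.
full = run_session(["hello ", "", "world"], lambda chunk: chunk)
print("Complete transcript:", full)
```

Catching `KeyboardInterrupt` around the whole loop means the text collected so far is still available for the final summary.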

Model Sizes

Choose the model that works best for you:

Model	Speed	Accuracy	VRAM
tiny	⚡⚡⚡	Good	1GB
base	⚡⚡	Better	1GB
small	⚡	Great	2GB
medium	~1x	Excellent	5GB
large	0.5x	Best	10GB
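If you are unsure which size to pick, one rough approach is to take the most accurate model whose VRAM estimate fits your GPU. The sketch below simply encodes the table above; `largest_model_that_fits` is a hypothetical helper, not part of app.py, and the figures are approximate.

```python
# Approximate VRAM needs in GB, copied from the table above.
MODEL_VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}


def largest_model_that_fits(available_gb: float) -> str:
    """Return the most accurate model whose VRAM estimate fits the budget."""
    # Dicts preserve insertion order, so the last fitting entry is the most accurate.
    fitting = [name for name, need in MODEL_VRAM_GB.items() if need <= available_gb]
    if not fitting:
        raise ValueError(f"No model fits in {available_gb} GB")
    return fitting[-1]


print(largest_model_that_fits(4))
```

With 4 GB available this selects "small", since "medium" is listed at 5GB.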

To change model size, edit line 145 in app.py:

model_size = "tiny"  # Change to "base", "small", etc.

How It Works

Audio Capture: Captures 16-bit PCM audio from your microphone at 16kHz
Buffering: Accumulates audio in chunks
Processing: Sends ~3-second chunks to Whisper for transcription
Display: Shows transcribed text in real-time
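The buffering step can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in app.py: `ChunkBuffer` and `pcm16_to_floats` are hypothetical names, the chunk length is fixed at exactly 3 seconds, and only the standard library is used to keep the example dependency-free. Scaling 16-bit samples by 32768 to get floats in [-1.0, 1.0) is a common way to prepare PCM audio for a speech model.

```python
import array
import sys

SAMPLE_RATE = 16_000      # samples per second, as stated above
BYTES_PER_SAMPLE = 2      # 16-bit PCM
CHUNK_SECONDS = 3         # assumed exact; the text says "~3-second chunks"
CHUNK_BYTES = SAMPLE_RATE * BYTES_PER_SAMPLE * CHUNK_SECONDS


class ChunkBuffer:
    """Accumulates raw PCM bytes and hands back fixed-size chunks."""

    def __init__(self, chunk_bytes: int = CHUNK_BYTES) -> None:
        self.chunk_bytes = chunk_bytes
        self._pending = bytearray()

    def feed(self, data: bytes) -> list:
        """Add microphone bytes; return every complete chunk now available."""
        self._pending.extend(data)
        chunks = []
        while len(self._pending) >= self.chunk_bytes:
            chunks.append(bytes(self._pending[: self.chunk_bytes]))
            del self._pending[: self.chunk_bytes]
        return chunks


def pcm16_to_floats(chunk: bytes) -> list:
    """Convert little-endian 16-bit PCM to floats in [-1.0, 1.0)."""
    samples = array.array("h")
    samples.frombytes(chunk)
    if sys.byteorder == "big":
        samples.byteswap()  # the input is little-endian; fix up on big-endian hosts
    return [s / 32768.0 for s in samples]


buf = ChunkBuffer()
ready = buf.feed(b"\x00\x00" * SAMPLE_RATE * 4)   # 4 seconds of silence
print(len(ready), "chunk(s) ready,", len(ready[0]), "bytes each")
```

Leftover bytes stay in the buffer, so the extra second of audio in this example is carried over into the next chunk rather than dropped.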

Troubleshooting

"No module named pyaudio"

On Windows:

pip install pipwin
pipwin install pyaudio

On macOS:

brew install portaudio
pip install pyaudio

On Linux (Ubuntu/Debian):

sudo apt-get install portaudio19-dev python3-dev
pip install pyaudio

"No module named torch"

Install PyTorch (whether the default build includes CUDA support depends on your platform; a CUDA-specific install is covered under GPU Acceleration):

pip install torch torchaudio

Microphone not detected

List available audio devices:

python -c "import pyaudio; p = pyaudio.PyAudio(); [print(f'[{i}] {p.get_device_info_by_index(i)[\"name\"]}') for i in range(p.get_device_count())]"
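The one-liner above lists every audio device, including output-only ones. A slightly longer sketch that shows only devices able to record might look like this. The helper names are hypothetical; the PyAudio calls are the same ones used above, plus `terminate()` and the `maxInputChannels` and `index` fields of the device info dictionary.

```python
def input_devices(device_infos):
    """Keep only devices that can record, as (index, name) pairs."""
    return [
        (info["index"], info["name"])
        for info in device_infos
        if info.get("maxInputChannels", 0) > 0
    ]


def print_input_devices():
    """Query PyAudio and print each input-capable device."""
    import pyaudio  # imported here so the helper above works without PyAudio

    p = pyaudio.PyAudio()
    try:
        infos = [p.get_device_info_by_index(i) for i in range(p.get_device_count())]
    finally:
        p.terminate()  # always release PortAudio resources
    for index, name in input_devices(infos):
        print(f"[{index}] {name}")
```

Call `print_input_devices()` from a Python prompt. If your microphone is missing from the output, the operating system is probably not exposing it as an input device.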

Slow transcription

The first run is slower (model is downloaded and loaded)
Use the "tiny" model for faster processing (or "base" for better quality)
If you have a CUDA-capable GPU, ensure PyTorch is installed with CUDA support

GPU Acceleration

If you have an NVIDIA GPU with CUDA, install a CUDA-enabled PyTorch build (this example targets CUDA 11.8):

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

The app will automatically detect and use CUDA if available.
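To check which device PyTorch would use on your machine, a short sketch like the one below can help. `pick_device` is a hypothetical helper and may differ from how app.py performs its detection; `torch.cuda.is_available()` is the standard PyTorch check.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a CUDA GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch installed, so nothing can run on the GPU
    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"Using device: {pick_device()}")
```

If this prints "cpu" on a machine with an NVIDIA GPU, the installed PyTorch build most likely lacks CUDA support.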

License

MIT. This project uses OpenAI's Whisper model, which is also available under the MIT license.
