A simple terminal-based application that transcribes your speech in real-time using OpenAI's Whisper model. No API key needed - runs completely locally!
- 🎤 Real-time speech transcription using Whisper
- 📝 Local processing - no internet required
- 🔊 Automatic speech detection
- 🌍 English language support
- 💻 Cross-platform (Windows, macOS, Linux)
- ⚡ GPU acceleration support (if CUDA available)
- 🎯 Multiple model sizes (tiny to large)
- Python 3.8 or higher
- A working microphone
- ~1GB RAM (for tiny model), more for larger models
- Navigate to the project directory:
```bash
cd simple\ stt
```

- Install dependencies:

```bash
pip install -r requirements.txt
```

The first run will download the Whisper model (~150MB for the tiny model).
Run the application:

```bash
python app.py
```

The application will:
- Load the Whisper model (first run downloads it)
- Activate your microphone
- Display transcription as you speak
- Update every few seconds with processed audio chunks
- Show the complete transcript when you stop
Press Ctrl+C to stop transcribing.
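The stop-on-Ctrl+C behaviour comes down to catching `KeyboardInterrupt` around the main loop. A minimal sketch of that pattern (the `chunks` source and `transcribe_chunk` callback here are hypothetical stand-ins for the app's audio pipeline):

```python
def run(chunks, transcribe_chunk):
    """Consume audio chunks until interrupted, then return the full transcript."""
    pieces = []
    try:
        for chunk in chunks:
            # In the real app each piece is also printed to the terminal as it arrives.
            pieces.append(transcribe_chunk(chunk))
    except KeyboardInterrupt:
        pass  # Ctrl+C lands here; fall through and return what we have
    return " ".join(pieces)
```

Because the interrupt is caught rather than propagated, the complete transcript can still be assembled and shown after you stop.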
Choose the model that works best for you:
| Model | Speed | Accuracy | VRAM |
|---|---|---|---|
| tiny | ⚡⚡⚡ | Good | 1GB |
| base | ⚡⚡ | Better | 1GB |
| small | ⚡ | Great | 2GB |
| medium | ~1x | Excellent | 5GB |
| large | 0.5x | Best | 10GB |
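If you want to choose programmatically, the table above maps directly to a small lookup. This helper is purely illustrative (it does not exist in app.py) and uses the approximate VRAM figures from the table:

```python
# Approximate VRAM requirements (GB) per model, mirroring the table above.
MODEL_VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}

def pick_model(vram_gb: float) -> str:
    """Return the most accurate model whose VRAM requirement fits the budget."""
    for name in ("large", "medium", "small", "base", "tiny"):
        if MODEL_VRAM_GB[name] <= vram_gb:
            return name
    return "tiny"  # smallest model as a last resort
```

For example, a 4GB GPU would get "small", since "medium" (5GB) and "large" (10GB) both exceed the budget.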
To change model size, edit line 145 in app.py:

```python
model_size = "tiny"  # Change to "base", "small", etc.
```

- Audio Capture: Captures 16-bit PCM audio from your microphone at 16kHz
- Buffering: Accumulates audio in chunks
- Processing: Sends ~3-second chunks to Whisper for transcription
- Display: Shows transcribed text in real-time
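The buffering step above is simple arithmetic: at 16kHz mono with 16-bit (2-byte) samples, a ~3-second chunk is a fixed number of bytes. A sketch of that calculation, using the figures stated above:

```python
SAMPLE_RATE = 16_000   # Hz - the rate Whisper expects
SAMPLE_WIDTH = 2       # bytes per sample (16-bit PCM)
CHUNK_SECONDS = 3      # approximate length of each chunk sent to Whisper

def chunk_size_bytes(seconds: int = CHUNK_SECONDS) -> int:
    """Bytes of mono 16-bit PCM audio accumulated before each transcription pass."""
    return SAMPLE_RATE * SAMPLE_WIDTH * seconds
```

So each 3-second chunk is 96,000 bytes of raw audio; the buffer fills to that size before a chunk is handed off for transcription.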
On Windows:

```bash
pip install pipwin
pipwin install pyaudio
```

On macOS:

```bash
brew install portaudio
pip install pyaudio
```

On Linux (Ubuntu/Debian):

```bash
sudo apt-get install portaudio19-dev python3-dev
pip install pyaudio
```

Install PyTorch (includes CUDA support if available):

```bash
pip install torch torchaudio
```

List available audio devices:

```bash
python -c "import pyaudio; p = pyaudio.PyAudio(); [print(f'[{i}] {p.get_device_info_by_index(i)[\"name\"]}') for i in range(p.get_device_count())]"
```

- The first run is slower (model is downloaded and loaded)
- Use the "tiny" model for faster processing (or "base" for better quality)
- If you have CUDA/GPU, ensure PyTorch is installed with CUDA support
If you have an NVIDIA GPU with CUDA:

```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```

The app will automatically detect and use CUDA if available.
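The automatic detection boils down to asking PyTorch whether a CUDA device is visible and falling back to the CPU otherwise. A minimal sketch of that check (written defensively so it also works when torch is not installed):

```python
# Pick the compute device: CUDA if torch is installed and sees a GPU, else CPU.
try:
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:
    device = "cpu"  # torch missing entirely - CPU is the only option

print(f"Transcribing on: {device}")
```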
MIT - Uses OpenAI's Whisper model which is available under MIT license