Parswanadh/simple-stt

Whisper Real-Time Speech-to-Text

A simple terminal-based application that transcribes your speech in real-time using OpenAI's Whisper model. No API key needed - runs completely locally!

Features

  • 🎤 Real-time speech transcription using Whisper
  • 📝 Local processing - no internet required after the initial model download
  • 🔊 Automatic speech detection
  • 🌍 English language support
  • 💻 Cross-platform (Windows, macOS, Linux)
  • ⚡ GPU acceleration support (if CUDA available)
  • 🎯 Multiple model sizes (tiny to large)

Prerequisites

  • Python 3.8 or higher
  • A working microphone
  • ~1GB RAM (for tiny model), more for larger models

Installation

  1. Navigate to the project directory:
cd simple-stt
  2. Install dependencies:
pip install -r requirements.txt

The first run will download the Whisper model (~150MB for tiny model).

Usage

Run the application:

python app.py

The application will:

  1. Load the Whisper model (first run downloads it)
  2. Activate your microphone
  3. Display transcription as you speak
  4. Update every few seconds with processed audio chunks
  5. Show the complete transcript when you stop

Press Ctrl+C to stop transcribing.

Model Sizes

Choose the model that works best for you:

| Model  | Speed | Accuracy  | VRAM |
|--------|-------|-----------|------|
| tiny   | ⚡⚡⚡   | Good      | 1GB  |
| base   | ⚡⚡    | Better    | 1GB  |
| small  |       | Great     | 2GB  |
| medium | ~1x   | Excellent | 5GB  |
| large  | 0.5x  | Best      | 10GB |

To change the model size, edit line 145 in app.py:

model_size = "tiny"  # Change to "base", "small", etc.
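
If you would rather derive the model size from the memory you have available, the table above can be turned into a small lookup. This is only a sketch: `MODEL_VRAM_GB` and `pick_model_size` are hypothetical names that do not exist in app.py, and the figures are simply the VRAM column from the table.

```python
# VRAM figures copied from the model table (in GB), ordered smallest to largest.
MODEL_VRAM_GB = {
    "tiny": 1,
    "base": 1,
    "small": 2,
    "medium": 5,
    "large": 10,
}


def pick_model_size(available_gb: float) -> str:
    """Return the largest model whose listed VRAM fits in available_gb.

    Hypothetical helper: falls back to "tiny" when nothing fits.
    """
    best = "tiny"
    for name, needed_gb in MODEL_VRAM_GB.items():
        if needed_gb <= available_gb:
            best = name  # dicts keep insertion order, so later entries are larger
    return best


model_size = pick_model_size(4)  # a 4 GB budget selects "small"
```

When two models list the same requirement (tiny and base both need 1GB), the sketch prefers the more accurate one.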

How It Works

  1. Audio Capture: Captures 16-bit PCM audio from your microphone at 16kHz
  2. Buffering: Accumulates audio in chunks
  3. Processing: Sends ~3-second chunks to Whisper for transcription
  4. Display: Shows transcribed text in real-time
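
The buffering and chunking steps above can be sketched roughly as follows. This is an illustration using only the standard library, not the code in app.py: `ChunkBuffer` and `pcm16_to_float` are hypothetical names, the 3-second chunk length is taken from the description above, and the conversion to floats in [-1.0, 1.0) is an assumption about how the samples are prepared for the model (the real app may well use NumPy for this).

```python
from array import array

SAMPLE_RATE = 16000    # 16 kHz, as described above
BYTES_PER_SAMPLE = 2   # 16-bit PCM
CHUNK_SECONDS = 3      # approximate chunk length from the description
CHUNK_BYTES = SAMPLE_RATE * BYTES_PER_SAMPLE * CHUNK_SECONDS


def pcm16_to_float(raw: bytes) -> list:
    """Convert native-endian 16-bit PCM bytes to floats in [-1.0, 1.0)."""
    samples = array("h")
    samples.frombytes(raw)
    return [s / 32768.0 for s in samples]


class ChunkBuffer:
    """Accumulates raw PCM bytes and hands back fixed-length chunks."""

    def __init__(self, chunk_bytes: int = CHUNK_BYTES):
        self.chunk_bytes = chunk_bytes
        self._pending = bytearray()

    def add(self, frames: bytes) -> list:
        """Append microphone frames; return any complete chunks as float lists."""
        self._pending.extend(frames)
        chunks = []
        while len(self._pending) >= self.chunk_bytes:
            raw = bytes(self._pending[:self.chunk_bytes])
            del self._pending[:self.chunk_bytes]
            chunks.append(pcm16_to_float(raw))
        return chunks
```

Each list returned by `add` holds one chunk that is ready to pass to the transcription step; leftover bytes stay in the buffer until enough audio has arrived.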

Troubleshooting

"No module named pyaudio"

On Windows:

pip install pipwin
pipwin install pyaudio

On macOS:

brew install portaudio
pip install pyaudio

On Linux (Ubuntu/Debian):

sudo apt-get install portaudio19-dev python3-dev
pip install pyaudio

"No module named torch"

Install PyTorch (includes CUDA support if available):

pip install torch torchaudio

Microphone not detected

List available audio devices:

python -c "import pyaudio; p = pyaudio.PyAudio(); [print(f'[{i}] {p.get_device_info_by_index(i)[\"name\"]}') for i in range(p.get_device_count())]"
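
The one-liner above lists every device, including outputs such as speakers. A slightly longer sketch that keeps only devices with input channels might look like this. `list_input_devices` and `print_input_devices` are hypothetical helpers, not part of app.py; the `maxInputChannels` and `name` keys are fields of the device info dictionary that PyAudio returns.

```python
def list_input_devices(pa) -> list:
    """Return (index, name) pairs for devices with at least one input channel.

    `pa` is expected to be a pyaudio.PyAudio instance.
    """
    devices = []
    for i in range(pa.get_device_count()):
        info = pa.get_device_info_by_index(i)
        if info.get("maxInputChannels", 0) > 0:
            devices.append((i, info["name"]))
    return devices


def print_input_devices() -> None:
    """Print input devices using the real PyAudio backend."""
    import pyaudio  # imported here so the helper above has no hard dependency

    p = pyaudio.PyAudio()
    try:
        for index, name in list_input_devices(p):
            print(f"[{index}] {name}")
    finally:
        p.terminate()  # release PortAudio resources
```

Call `print_input_devices()` from a Python shell. If the list comes back empty, the operating system is not exposing any microphone to PortAudio.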

Slow transcription

  • The first run is slower (model is downloaded and loaded)
  • Use the "tiny" model for faster processing (or "base" for better quality)
  • If you have CUDA/GPU, ensure PyTorch is installed with CUDA support

GPU Acceleration

If you have an NVIDIA GPU with CUDA:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

The app will automatically detect and use CUDA if available.
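
To check which device PyTorch would select on your machine, a quick test along these lines can help. It is a sketch of the usual detection pattern, not necessarily the exact logic in app.py, and `pick_device` is a hypothetical name.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed at all
    return "cuda" if torch.cuda.is_available() else "cpu"


print(f"Using device: {pick_device()}")
```

If this prints `cpu` on a machine with an NVIDIA GPU, the installed PyTorch build most likely lacks CUDA support, and reinstalling with the command above is worth trying.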

License

MIT. This project uses OpenAI's Whisper model, which is also available under the MIT license.
