This project is an open-source, modular system designed to continuously capture audio, transcribe it using a local speech-to-text engine, store the transcripts, and provide a web interface for browsing and future AI-driven analysis.
The system is 3 different components that work together:
graph TD
A[Microphone] --> LA[LiveKit Agent]
LA -- Audio Stream --> VB[VoxBox Transcription API]
VB -- Transcribed Text --> LA
LA -- Stores Transcript --> DB[SQLite Database]
FS[Flask Web Server] -- Reads Transcripts --> DB
FS -- Serves UI --> UserBrowser[User's Browser]
FS -- Summarization --> OLLAMA[Ollama]
subgraph Audio_Input_Processing
A
LA
VB
end
subgraph Data_Storage_Access
DB
FS
end
subgraph User_Interface
UserBrowser
end
subgraph AI_Analysis
OLLAMA
end
style LA fill:#f9f,stroke:#333,stroke-width:2px
style VB fill:#ccf,stroke:#333,stroke-width:2px
style DB fill:#cfc,stroke:#333,stroke-width:2px
style FS fill:#ffc,stroke:#333,stroke-width:2px
style OLLAMA fill:#fcc,stroke:#333,stroke-width:2px
- File:
agent.py
- Functionality:
- Captures live audio from the environment using a LiveKit Agent.
- Utilizes Silero VAD for voice activity detection.
- Streams audio to the local VoxBox service for transcription.
- Receives transcribed text from VoxBox.
- Saves finalized transcripts to the SQLite database.
- Manages session information in the database to keep transcripts separate.
- Directory:
vox-box/
- Functionality:
- A separate Python service (installed via
pip install vox-box
). - Hosts a local instance of the
Systran/faster-whisper-small
model (via Hugging Face). - Exposes an OpenAI-compatible API endpoint (typically
http://localhost:5002/v1
) that the LiveKit Agent sends audio to. - Returns transcribed text to the agent.
- Requires its own virtual environment due to dependency conflicts.
- A separate Python service (installed via
- Default File:
voice_transcripts.db
- Schema & ORM:
app/database/models.py
(SQLAlchemy models:User
,Session
,Transcript
) - Functionality:
- Stores all user information, recording sessions, and transcription results.
- Uses UUIDs as primary keys for all tables.
- Includes support for metadata such as confidence scores, timestamps, language, and custom JSON fields.
- Managed via SQLAlchemy ORM, with helper functions in
app/database/helpers.py
andapp/database/__init__.py
.
- File:
app/web_app.py
(run viarun_webapp.py
or directly) - Functionality:
- Provides a web interface (HTML templates in
app/templates/
) to:- Browse all saved recording sessions (
/
). - View detailed transcripts for a specific session (
/session/<session_id>
). - Copy/paste transcript text.
- Browse all saved recording sessions (
- Includes an analysis page (
/analyze/<session_id>
) that:- Retrieves the combined transcript text for a session.
- Checks if a summary already exists for the session in the
Session.summary
database field. - If a cached summary exists, it's displayed directly.
- If no cached summary is found, it uses a local Ollama instance (via the
openai
library) to generate a new summary. This new summary is then saved to theSession.summary
field in the database for future requests and displayed.
- Exposes an API endpoint (
/api/transcripts/<session_id>
) to fetch transcripts for a session in JSON format.
- Provides a web interface (HTML templates in
- Service: Uses a locally running Ollama instance.
- Connection: The Flask Web Server connects to Ollama using the
openai
Python client.- Default Ollama API Base URL:
http://localhost:11434/v1
(configurable viaOLLAMA_BASE_URL
). - API Key: Required by the
openai
client, can be any string for local Ollama (configurable viaOLLAMA_API_KEY
, defaults to "ollama").
- Default Ollama API Base URL:
- Functionality:
- Provides text summarization for transcripts on the
/analyze
page. - Caching: Generated summaries are cached in the
Session.summary
field in the database. If a summary for a session is requested, the cached version is used if available; otherwise, a new summary is generated by Ollama and then stored for subsequent requests. This reduces redundant LLM processing. - Default Model:
gemma3:4b
(configurable viaOLLAMA_MODEL
). - The system is designed to be extensible for other AI-driven insights in the future.
- Provides text summarization for transcripts on the
scripts/manage_db.py
:- A command-line utility for database operations:
create
: Initializes the database schema (usesapp/database/create_tables.py
).reset
: Drops all existing tables and recreates the schema.seed
: Populates the database with sample data for testing.
- A command-line utility for database operations:
app/database/create_tables.py
:- Contains the core logic to create database tables based on SQLAlchemy models.
- Environment Configuration:
- Uses a
.env
file (see.env.example
) for settings likeDATABASE_URL
,OLLAMA_BASE_URL
,OLLAMA_API_KEY
, andOLLAMA_MODEL
.
- Uses a
- Python 3.10+
- SQLite (comes with Python)
- Ollama installed and running (see Ollama website). Ensure the desired model (e.g.,
gemma3:4b
) is pulled:ollama pull gemma3:4b
. - Separate virtual environments for the main application and VoxBox are recommended.
# Clone the repository (if you haven't already)
# git clone <repository-url>
# cd <repository-directory>
# Create and activate a virtual environment
python3 -m venv .venv
source .venv/bin/activate
# Install dependencies (includes 'openai' for Ollama)
pip install -r requirements.txt
# Copy and configure environment variables
cp .env.example .env
# Edit .env to set DATABASE_URL (default is sqlite:///voice_transcripts.db)
# and Ollama settings if needed:
# OLLAMA_BASE_URL=http://localhost:11434/v1
# OLLAMA_API_KEY=ollama
# OLLAMA_MODEL=gemma3:4b
# Create database tables
python scripts/manage_db.py create
# Optionally seed the database
python scripts/manage_db.py seed
cd vox-box
# Create and activate a separate virtual environment
python3 -m venv venv
source venv/bin/activate
# Install VoxBox
pip install -r requirements.txt # This installs the 'vox-box' package
cd .. # Return to project root
-
Start Ollama: Ensure your Ollama service is running. If you installed Ollama as a desktop application, it might start automatically. Otherwise, you may need to start it via the command line:
ollama serve
Ensure the model you intend to use (e.g.,
gemma3:4b
) is available:ollama list
. If not, pull it:ollama pull gemma3:4b
. Keep this service running. -
Start VoxBox: Open a new terminal, navigate to the
vox-box
directory, activate its virtual environment, and run:# In vox-box/ directory, with vox-box venv active vox-box start --huggingface-repo-id Systran/faster-whisper-small --data-dir ./cache/data-dir --host 0.0.0.0 --port 5002
Keep this service running.
-
Start the LiveKit Agent: In another terminal (at the project root, with the main application's venv active):
python agent.py console
-
Start the Flask Web Server: In yet another terminal (at the project root, with the main application's venv active):
python run_webapp.py
Or for development mode:
flask --app app.web_app run --debug --port 5000
Access the UI at
http://localhost:5000
.
agent.py
: Core logic for audio capture and initial transcript handling.vox-box/
: Contains setup for the local transcription server.app/database/
: SQLAlchemy models, database initialization, and helper functions.app/web_app.py
: Flask application, routes, and UI logic.app/templates/
: HTML templates for the web interface.scripts/
: Database management scripts.