A Python application to fetch, index, and search through your Gmail newsletters using semantic search and natural language queries.
Once you have fetched your emails, the program can run entirely locally on a laptop, with no external connections.
This application consists of two main flows:
- Email Processing Pipeline:
  - Authenticates with Gmail
  - Fetches newsletters
  - Extracts and cleans text content
  - Generates embeddings using Ollama
  - Stores in ChromaDB for semantic search
- Search Interface:
  - Natural language query understanding
  - Semantic search in the email database
  - Smart response generation
  - Interactive command-line interface
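As an illustration of the "extracts and cleans text content" step, here is a minimal sketch that turns an HTML newsletter body into plain text using only the standard library. The names `html_to_text` and `_TextExtractor` are hypothetical and are not taken from this repository; the project's actual cleaning logic may differ.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if not self._skip_depth:
            self.chunks.append(data)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.chunks)).strip()


if __name__ == "__main__":
    sample = "<html><body><h1>Weekly  AI</h1><p>Hello &amp; welcome</p></body></html>"
    print(html_to_text(sample))
```

The cleaned string is the kind of input that would then be passed to the embedding model before storage.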
Prerequisites:
- Python 3.10+
- Docker and Docker Compose
- Ollama
- Google Cloud Platform account
- Tested on macOS only
- Clone the repository
    git clone https://github.com/gtaverne/newsletterz
    cd newsletterz
- Install Python dependencies
    pip install -r requirements.txt
- Start ChromaDB
    cd docker
    docker-compose up -d
- Install Ollama and required models
    # Install Ollama from https://ollama.ai
    # Pull required models
    ollama pull llama3
    ollama pull qwen2.5-coder:32b
    ollama pull mxbai-embed-large
- Set up Google Cloud Platform
  - Create a new project in GCP Console
  - Enable the Gmail API
  - Create OAuth 2.0 credentials
  - Download the credentials JSON file
  - Place it in secrets/credentials.json
- Environment Variables
    cp .env.template .env
    # Edit .env with your configuration
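Before the first run, it can help to confirm that the two configuration files from the steps above are in place. The following is a minimal preflight sketch using only the standard library. The helper names `load_env` and `credentials_look_valid` are hypothetical, and the sketch assumes `.env` holds plain `KEY=VALUE` lines; the actual keys are whatever `.env.template` defines.

```python
import json
from pathlib import Path


def load_env(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def credentials_look_valid(path: str = "secrets/credentials.json") -> bool:
    """Return True if the file is JSON with an OAuth client section."""
    try:
        data = json.loads(Path(path).read_text(encoding="utf-8"))
    except (OSError, ValueError):
        return False
    # Google OAuth client files use a top-level "installed" (desktop app)
    # or "web" key.
    return isinstance(data, dict) and any(k in data for k in ("installed", "web"))


if __name__ == "__main__":
    if Path(".env").exists():
        print("Keys found in .env:", sorted(load_env()))
    else:
        print(".env is missing; copy it from .env.template")
    print("credentials.json looks valid:", credentials_look_valid())
```

This only checks the shape of the files; it does not contact Google or verify that the credentials are authorised for the Gmail API.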
- First-time setup: Fetch and index emails
    python -m src.email.email_processor
  This will:
  - Authenticate with Gmail
  - Fetch your newsletters
  - Process and store them in ChromaDB
- Search your emails
    python -m src.interface.dialog_interface
  Example queries:
  - "What are the latest AI trends from McKinsey?"
  - "Show me cloud computing articles from big tech companies"
  - "Summarize what consulting firms say about digital transformation"
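To give an intuition for how queries like these are matched semantically rather than by keyword, here is a toy sketch of similarity ranking over embedding vectors. In the application this lookup is delegated to ChromaDB and the vectors come from the Ollama embedding model; the three-dimensional vectors, the document ids, the choice of cosine similarity, and the function names below are all illustrative assumptions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(
    query_vec: list[float], docs: dict[str, list[float]], top_k: int = 2
) -> list[tuple[str, float]]:
    """Return the top_k (doc_id, score) pairs, most similar first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


if __name__ == "__main__":
    # Toy embeddings; real ones have hundreds of dimensions or more.
    docs = {
        "ai-trends": [0.9, 0.1, 0.0],
        "cloud": [0.1, 0.9, 0.1],
        "recipes": [0.0, 0.1, 0.9],
    }
    query = [1.0, 0.2, 0.0]  # stands in for an embedded "latest AI trends" query
    for doc_id, score in rank(query, docs):
        print(f"{doc_id}: {score:.3f}")
```

The documents whose vectors point in nearly the same direction as the query vector rank highest, which is why a query can match a newsletter that shares no exact words with it.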
The application is structured into several key components:
- src/email/: Email fetching and processing
- src/search/: Search and query processing
- src/interface/: CLI interface
- tests/: Test suites
- docker/: Docker configuration for ChromaDB
- Running tests
    pytest tests/
- Code style
    black .
    flake8
- ChromaDB Connection
  - Ensure Docker is running
  - Check if ChromaDB container is healthy
  - Default port is 8183
- Gmail Authentication
  - First-time auth requires browser access
  - Token is stored locally for future use
  - Check credentials.json path
- Ollama
  - Ensure Ollama service is running
  - Check model downloads
  - Default port is 11434
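The connectivity checks above can be scripted. This sketch, using only the standard library, tests whether the ChromaDB and Ollama ports accept connections and asks Ollama which models are installed through its `/api/tags` endpoint. The helper names are hypothetical, and the sketch assumes both services run on `localhost` with the default ports listed above.

```python
import json
import socket
import urllib.request


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def missing_models(tags: dict, required: list[str]) -> list[str]:
    """Return the required models absent from an Ollama /api/tags response."""
    installed = {m.get("name", "") for m in tags.get("models", [])}
    # Ollama reports names with a tag suffix, e.g. "llama3:latest".
    return [
        name
        for name in required
        if (name if ":" in name else f"{name}:latest") not in installed
    ]


def fetch_ollama_tags(base_url: str = "http://localhost:11434") -> dict:
    """Fetch the list of locally installed models from Ollama."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print("ChromaDB reachable:", port_open("localhost", 8183))
    if port_open("localhost", 11434):
        required = ["llama3", "qwen2.5-coder:32b", "mxbai-embed-large"]
        print("Missing Ollama models:", missing_models(fetch_ollama_tags(), required))
    else:
        print("Ollama is not reachable on port 11434")
```

An open port only shows that something is listening; if ChromaDB still fails, inspect the container status and logs with Docker.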
License: MIT