A RAG-based chatbot with memory for the openBIS documentation, powered by LangGraph and Ollama.
This project provides an intelligent chatbot that can answer questions about openBIS using Retrieval Augmented Generation (RAG) with conversation memory. The chatbot remembers previous interactions within a session and provides contextually aware responses.
- RAG-powered responses: Uses openBIS documentation for accurate, up-to-date answers
- Conversation memory: Remembers user names, previous questions, and context using LangGraph
- Session management: Maintains separate conversations with unique session IDs
- Clean responses: Filters out internal reasoning for user-friendly output
- Multi-interface: Available as both CLI and web interface
- Persistent storage: Conversation history stored in SQLite database
The project has the following main components:
- Scraper: Scrapes content from the openBIS documentation website
- Processor: Processes the scraped content for use in RAG
- Conversation Engine: LangGraph-based engine with memory and RAG integration
- Web Interface: Browser-based chat interface with session management
- CLI Interface: Command-line chat interface with memory
Prerequisites:
- Python 3.8 or higher
- Ollama with the following models:
  - `nomic-embed-text` (for embeddings)
  - `qwen3` (for chat)
The project uses the following key dependencies:
- LangGraph: For conversation flow and memory management
- LangChain: For LLM integration and message handling
- Flask: For the web interface
- SQLite: For persistent conversation storage
- Ollama: For local LLM inference
To install from source:

```bash
git clone https://github.com/yourusername/openbis-chatbot.git
cd openbis-chatbot
pip install -e .
```
This installs the package in development mode, allowing you to make changes to the code and have them reflected immediately.
Alternatively, install from PyPI:

```bash
pip install openbis-chatbot
```
Note: This option will be available once the package is published to PyPI.
The simplest way to use the chatbot is with a single command:
```bash
python -m openbis_chatbot
```
This will:
- Check if processed data already exists in the `data/processed` directory
- If it exists, start the chatbot with that data
- If not, automatically scrape the openBIS documentation, process it, and then start the chatbot
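A minimal sketch of that dispatch logic, assuming processed data is detected by the presence of files in `data/processed` (the package's actual check may differ):

```python
from pathlib import Path

PROCESSED_DIR = Path("data/processed")

def has_processed_data(path: Path) -> bool:
    """True if the directory exists and contains at least one entry."""
    return path.is_dir() and any(path.iterdir())

if has_processed_data(PROCESSED_DIR):
    print("Found processed data -- starting the chatbot.")
else:
    # Mirrors the separate scrape/process/query commands shown below.
    print("No processed data -- scraping and processing first.")
```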
If you need more control, you can still run each component separately:
```bash
python -m openbis_chatbot scrape --url https://openbis.readthedocs.io/en/latest/ --output ./data/raw
python -m openbis_chatbot process --input ./data/raw --output ./data/processed
python -m openbis_chatbot query --data ./data/processed
```
The CLI now includes conversation memory features:
- Remembers your name and previous questions within a session
- Type `clear` to start a new conversation
- Type `exit` or `quit` to end the session
- Use `--session-id <id>` to continue a previous conversation
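For example, to resume an earlier conversation (`my-session` is a hypothetical session ID):

```bash
python -m openbis_chatbot query --data ./data/processed --session-id my-session
```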
```bash
python -m openbis_chatbot --web
```
This will start a web server on http://localhost:5000 where you can interact with the chatbot through a browser.
The web interface includes:
- Session persistence: Conversations continue across page refreshes
- Clear chat button: Start fresh conversations anytime
- Memory indicators: See conversation length and token usage in browser console
- Responsive design: Works on desktop and mobile devices
Alternatively, you can use the provided script:
```bash
python scripts/run_web.py
```
Or customize the web interface with additional parameters:
```bash
python -m openbis_chatbot.web.cli --data ./data/processed --host 127.0.0.1 --port 5000
```
Options for the `scrape` command:

```
--url URL               The base URL of the ReadtheDocs site
--output OUTPUT         The directory to save the scraped content to
--version VERSION       The specific version to scrape (e.g., 'en/latest')
--delay DELAY           The delay between requests in seconds (default: 0.5)
--max-pages MAX_PAGES   The maximum number of pages to scrape
--verbose               Enable verbose logging
```
Options for the `process` command:

```
--input INPUT                     The directory containing the scraped content
--output OUTPUT                   The directory to save the processed content to
--min-chunk-size MIN_CHUNK_SIZE   The minimum size of a chunk in characters (default: 100)
--max-chunk-size MAX_CHUNK_SIZE   The maximum size of a chunk in characters (default: 1000)
--chunk-overlap CHUNK_OVERLAP     The overlap between chunks in characters (default: 50)
--verbose                         Enable verbose logging
```
Options for the `query` command:

```
--data DATA        The directory containing the processed content
--model MODEL      The Ollama model to use for chat (default: qwen3)
--memory-db PATH   Path to the SQLite database for conversation memory
--session-id ID    Session ID to continue a previous conversation
--verbose          Enable verbose logging
```
Options for the web interface:

```
--data DATA     The directory containing the processed content (default: ./data/processed)
--host HOST     The host to run the web interface on (default: 0.0.0.0)
--port PORT     The port to run the web interface on (default: 5000)
--model MODEL   The Ollama model to use for chat (default: qwen3)
--top-k TOP_K   The number of chunks to retrieve (default: 5)
--debug         Enable debug mode
```
The scraper works by:
- Starting from the base URL of the openBIS documentation site
- Downloading the HTML content of each page
- Extracting links to other pages on the same domain
- Following those links to scrape more pages
- Saving the content of each page to a text file
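A condensed sketch of this crawl loop, assuming `requests` and `beautifulsoup4` are available (the packaged scraper is more thorough, e.g. in how it extracts and stores page content):

```python
import time
from collections import deque
from urllib.parse import urljoin, urlparse

import requests
from bs4 import BeautifulSoup

BASE_URL = "https://openbis.readthedocs.io/en/latest/"
DELAY = 0.5       # matches the scraper's default request delay
MAX_PAGES = 10    # kept small for the sketch

def scrape(base_url):
    """Breadth-first crawl of one domain, returning page text keyed by URL."""
    domain = urlparse(base_url).netloc
    queue, seen, pages = deque([base_url]), {base_url}, {}
    while queue and len(pages) < MAX_PAGES:
        url = queue.popleft()
        soup = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")
        pages[url] = soup.get_text(" ", strip=True)    # save the page text
        for a in soup.find_all("a", href=True):        # follow same-domain links
            link = urljoin(url, a["href"]).split("#")[0]
            if urlparse(link).netloc == domain and link not in seen:
                seen.add(link)
                queue.append(link)
        time.sleep(DELAY)                              # be polite to the server
    return pages

for url in scrape(BASE_URL):
    print(url)
```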
The processor works by:
- Reading the scraped content from text files
- Chunking the content into smaller pieces
- Generating embeddings for each chunk using Ollama's embedding model
- Saving the chunks and their embeddings to JSON and CSV files
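In outline, the chunking and embedding steps look roughly like this, assuming Ollama is serving its REST API on the default port (the file paths are illustrative):

```python
import json

import requests

OLLAMA_URL = "http://localhost:11434/api/embeddings"

def chunk_text(text, max_size=1000, overlap=50):
    """Split text into overlapping character chunks (defaults match the CLI)."""
    step = max_size - overlap
    return [text[i:i + max_size] for i in range(0, len(text), step)]

def embed(chunk):
    """Embed one chunk with the nomic-embed-text model via Ollama."""
    resp = requests.post(OLLAMA_URL, json={"model": "nomic-embed-text", "prompt": chunk})
    resp.raise_for_status()
    return resp.json()["embedding"]

text = open("data/raw/example_page.txt").read()   # hypothetical scraped file
records = [{"text": c, "embedding": embed(c)} for c in chunk_text(text)]
with open("data/processed/chunks.json", "w") as f:
    json.dump(records, f)
```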
The conversation engine works by:
- State Management: Maintains conversation state using LangGraph's StateGraph
- Memory Persistence: Stores conversation history in SQLite using LangGraph checkpoints
- RAG Integration: Retrieves relevant chunks based on user queries
- Context Assembly: Combines conversation history, RAG context, and current query
- Response Generation: Uses Ollama's chat model with full conversation context
- Response Cleaning: Removes internal reasoning tags for clean user output
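The RAG integration and context assembly steps above boil down to ranking stored chunk embeddings by similarity to the query embedding; a minimal cosine-similarity sketch, assuming NumPy (the engine's actual ranking code may differ):

```python
import numpy as np

def top_k_chunks(query_embedding, chunk_embeddings, k=5):
    """Return indices of the k most similar chunks (default k matches --top-k)."""
    q = np.asarray(query_embedding, dtype=float)
    m = np.asarray(chunk_embeddings, dtype=float)
    sims = m @ q / (np.linalg.norm(m, axis=1) * np.linalg.norm(q))
    return np.argsort(sims)[::-1][:k]
```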
Memory features:
- Session Management: Maintains separate conversations with unique session IDs
- Conversation History: Remembers both user messages and assistant responses
- Session Isolation: Different sessions don't share memory
- Token Management: Automatically limits conversation length (20 messages max)
- Persistent Storage: Conversations survive application restarts
- Context Awareness: Assistant remembers its own previous offers and responses
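A bare-bones sketch of this LangGraph setup, assuming the `langgraph`, `langgraph-checkpoint-sqlite`, and `langchain-ollama` packages (the real engine adds RAG retrieval, history trimming, and response cleaning on top):

```python
import sqlite3

from langchain_core.messages import HumanMessage
from langchain_ollama import ChatOllama
from langgraph.checkpoint.sqlite import SqliteSaver
from langgraph.graph import START, MessagesState, StateGraph

llm = ChatOllama(model="qwen3")

def chat(state: MessagesState):
    # The full message history for this thread is replayed into the model,
    # which is what gives the assistant its memory of earlier turns.
    return {"messages": [llm.invoke(state["messages"])]}

builder = StateGraph(MessagesState)
builder.add_node("chat", chat)
builder.add_edge(START, "chat")

# Checkpoints persist per-thread state to SQLite, so sessions survive restarts.
saver = SqliteSaver(sqlite3.connect("memory.db", check_same_thread=False))
graph = builder.compile(checkpointer=saver)

config = {"configurable": {"thread_id": "session-123"}}  # one thread per session ID
reply = graph.invoke({"messages": [HumanMessage("What is openBIS?")]}, config)
print(reply["messages"][-1].content)
```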
The web interface works by:
- Starting a Flask web server
- Serving a responsive HTML/CSS/JavaScript chat interface
- Handling API requests from the frontend
- Using the query engine to generate responses
- Returning the responses to the frontend in JSON format
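Structurally, the server side reduces to something like the following Flask sketch; the `/api/chat` route and payload fields here are hypothetical stand-ins for the app's actual API:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/api/chat", methods=["POST"])  # hypothetical route name
def chat():
    payload = request.get_json()
    session_id = payload.get("session_id", "default")
    # In the real app, the conversation engine produces this answer.
    answer = f"(echo for session {session_id}) {payload['message']}"
    return jsonify({"response": answer, "session_id": session_id})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```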
```
chatBIS/
├── src/chatBIS/                     # Main package
│   ├── scraper/                     # Web scraping components
│   ├── processor/                   # Content processing components
│   ├── query/                       # Query and conversation engine
│   │   ├── conversation_engine.py   # LangGraph-based conversation engine
│   │   ├── query.py                 # RAG query engine
│   │   └── cli.py                   # CLI interface with memory
│   ├── web/                         # Web interface
│   └── utils/                       # Utility functions
├── tests/                           # Test suite
├── scripts/                         # Utility scripts
├── data/                            # Data directory
│   ├── raw/                         # Scraped content
│   └── processed/                   # Processed chunks and embeddings
├── docs/                            # Documentation
│   └── presentations/               # Project presentations
└── requirements.txt                 # Python dependencies
```
Typical resource usage:
- Average tokens per exchange: ~800-900 tokens
- Memory overhead: ~100-200 tokens for conversation history
- RAG context: ~500-600 tokens per query
- Conversation limit: 20 messages (10 exchanges) per session
- Storage: SQLite database for persistent conversation history
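The 20-message cap can be pictured as a simple history trim before each model call (a sketch; the engine's actual bookkeeping may be more involved):

```python
MAX_MESSAGES = 20  # documented limit: 20 messages = 10 exchanges

def trim_history(messages):
    """Keep only the most recent messages to bound prompt size."""
    return messages[-MAX_MESSAGES:]
```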
This project is licensed under the MIT License - see the LICENSE file for details.