A secure, vector-based memory server for Claude Desktop using sqlite-vec and sentence-transformers. This MCP server provides persistent semantic memory capabilities that enhance AI coding assistants by remembering and retrieving relevant coding experiences, solutions, and knowledge.
- Semantic Search: Vector-based similarity search using 384-dimensional embeddings
- Persistent Storage: SQLite database with vector indexing via sqlite-vec
- Smart Organization: Categories and tags for better memory organization
- Security First: Input validation, path sanitization, and resource limits
- High Performance: Fast embedding generation with sentence-transformers
- Auto-Cleanup: Intelligent memory management and cleanup tools
- Rich Statistics: Comprehensive memory database analytics
- Well Tested: 95%+ test coverage with comprehensive test suite
Component | Technology | Purpose |
---|---|---|
Vector DB | sqlite-vec | Vector storage and similarity search |
Embeddings | sentence-transformers/all-MiniLM-L6-v2 | 384D text embeddings |
MCP Framework | FastMCP | High-level tools-only server |
Dependencies | uv script headers | Self-contained deployment |
Security | Custom validation | Path/input sanitization |
Testing | pytest + coverage | Comprehensive test suite |
vector-memory-mcp/
├── main.py                  # Main MCP server entry point
├── run_tests.py             # Primary test runner
├── README.md                # This documentation
│
├── src/                     # Core package modules
│   ├── __init__.py          # Package initialization & exports
│   ├── models.py            # Data models & configuration
│   ├── security.py          # Security validation & sanitization
│   ├── embeddings.py        # Sentence-transformers wrapper
│   └── memory_store.py      # SQLite-vec operations
│
└── .gitignore               # Git exclusions
This project is organized for clarity and ease of use:
- main.py: Start here! Main server entry point
- src/: Core implementation (security, embeddings, memory store)
New here? Start with main.py and examples/claude_desktop_config.example.json
- Python 3.8+
- uv package manager
- Claude Desktop app
- Download the server:
    # Clone or download the project
    git clone <repository-url>
    cd vector-memory-mcp
    # Make main script executable (Unix/macOS)
    chmod +x main.py
- Test the server:
    # Run comprehensive tests
    python run_tests.py
    # Test with sample working directory
    python main.py --working-dir ./test-memory
- Configure Claude Desktop:
  Open Claude Desktop Settings → Developer → Edit Config, and add:
    {
      "mcpServers": {
        "vector-memory": {
          "command": "python",
          "args": ["/absolute/path/to/main.py", "--working-dir", "/your/project/path"]
        }
      }
    }
- Restart Claude Desktop and look for the MCP integration icon.
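To avoid hand-typing absolute paths, the config entry from the steps above can also be generated. This is a minimal sketch: `build_server_entry` is a hypothetical helper, not part of this project, and it only prints JSON for you to paste into the Claude Desktop config.

```python
import json
from pathlib import Path


def build_server_entry(main_py: str, working_dir: str) -> dict:
    """Build an mcpServers entry with absolute paths (hypothetical helper)."""
    return {
        "mcpServers": {
            "vector-memory": {
                "command": "python",
                "args": [
                    str(Path(main_py).resolve()),      # absolute path to main.py
                    "--working-dir",
                    str(Path(working_dir).resolve()),  # absolute project path
                ],
            }
        }
    }


# Run from the cloned vector-memory-mcp directory
entry = build_server_entry("main.py", ".")
print(json.dumps(entry, indent=2))
```

Resolving the paths up front helps with the most common setup problem listed under troubleshooting: relative paths in the configuration.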
Store coding experiences, solutions, and insights:
Please store this memory:
Content: "Fixed React useEffect infinite loop by adding dependency array with [userId, apiKey]. The issue was that the effect was recreating the API call function on every render."
Category: bug-fix
Tags: ["react", "useEffect", "infinite-loop", "hooks"]
Find relevant memories using natural language:
Search for: "React hook dependency issues"
See what you've stored recently:
Show me my 10 most recent memories
View memory database statistics:
Show memory database statistics
Clean up old, unused memories:
Clear memories older than 30 days, keep max 1000 total
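As a rough illustration of what an age-and-count cleanup like the request above involves, here is a sketch against a plain SQLite table. The `memories` table layout, the `prune_memories` name, and the deletion order are assumptions made for this example; the server's real schema and cleanup rules may differ.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def prune_memories(conn, days_old=30, max_to_keep=1000, now=None):
    """Delete rows older than days_old, then trim to the newest max_to_keep."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=days_old)).isoformat()
    # Step 1: drop everything older than the cutoff (parameterized query)
    deleted = conn.execute(
        "DELETE FROM memories WHERE created_at < ?", (cutoff,)
    ).rowcount
    # Step 2: keep only the newest max_to_keep rows
    deleted += conn.execute(
        "DELETE FROM memories WHERE id NOT IN ("
        "SELECT id FROM memories ORDER BY created_at DESC LIMIT ?)",
        (max_to_keep,),
    ).rowcount
    conn.commit()
    return deleted


# Demo with a hypothetical table and five memories of different ages
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE memories (id INTEGER PRIMARY KEY, content TEXT, created_at TEXT)"
)
now = datetime(2025, 1, 31, tzinfo=timezone.utc)
for age_days in (0, 10, 20, 40, 50):
    conn.execute(
        "INSERT INTO memories (content, created_at) VALUES (?, ?)",
        (f"note from {age_days} days ago", (now - timedelta(days=age_days)).isoformat()),
    )
removed = prune_memories(conn, days_old=30, max_to_keep=2, now=now)
remaining = conn.execute("SELECT COUNT(*) FROM memories").fetchone()[0]
print(removed, remaining)
```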
Category | Use Cases |
---|---|
code-solution | Working code snippets, implementations |
bug-fix | Bug fixes and debugging approaches |
architecture | System design decisions and patterns |
learning | New concepts, tutorials, insights |
tool-usage | Tool configurations, CLI commands |
debugging | Debugging techniques and discoveries |
performance | Optimization strategies and results |
security | Security considerations and fixes |
other | Everything else |
The server supports these configuration options:
# Example usage with custom settings
python main.py --working-dir /path/to/project
your-project/
├── memory/
│   └── vector_memory.db    # SQLite database with vectors
├── src/                    # Your project files
└── other-files...
- Max memory content: 10,000 characters
- Max total memories: 10,000 entries
- Max search results: 50 per query
- Max tags per memory: 10 tags
- Path validation: Blocks suspicious characters
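These limits can be pictured as simple checks applied before anything is written. The sketch below is illustrative only: the constant and function names are hypothetical, and whether the server rejects or truncates oversized input is an assumption (this version rejects).

```python
# Limits taken from the list above; names are hypothetical
MAX_CONTENT_CHARS = 10_000
MAX_TAGS = 10
MAX_SEARCH_RESULTS = 50


def check_memory(content, tags):
    """Return a list of limit violations (empty list means the memory is OK)."""
    errors = []
    if not content.strip():
        errors.append("content is empty")
    if len(content) > MAX_CONTENT_CHARS:
        errors.append(f"content exceeds {MAX_CONTENT_CHARS} characters")
    if len(tags) > MAX_TAGS:
        errors.append(f"more than {MAX_TAGS} tags")
    return errors


def clamp_result_limit(requested):
    """Keep a requested search limit within 1..MAX_SEARCH_RESULTS."""
    return max(1, min(requested, MAX_SEARCH_RESULTS))


print(check_memory("Fixed useEffect loop", ["react", "hooks"]))
print(check_memory("x" * 10_001, ["tag"] * 11))
print(clamp_result_limit(500))
```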
# Store a useful code pattern
"Implemented JWT refresh token logic using axios interceptors"
# Store a debugging discovery
"Memory leak in React was caused by missing cleanup in useEffect"
# Store architecture decisions
"Chose Redux Toolkit over Context API for complex state management because..."
# Store team conventions
"Team coding style: always use async/await instead of .then() chains"
# Store deployment procedures
"Production deployment requires running migration scripts before code deploy"
# Store infrastructure knowledge
"AWS RDS connection pooling settings for high-traffic applications"
# Store learning insights
"Understanding JavaScript closures: inner functions have access to outer scope"
# Store performance discoveries
"Using React.memo reduced re-renders by 60% in the dashboard component"
# Store security learnings
"OWASP Top 10: Always sanitize user input to prevent XSS attacks"
The server uses sentence-transformers to convert your memories into 384-dimensional vectors that capture semantic meaning:
Query | Finds Memories About |
---|---|
"authentication patterns" | JWT, OAuth, login systems, session management |
"database performance" | SQL optimization, indexing, query tuning, caching |
"React state management" | useState, Redux, Context API, state patterns |
"API error handling" | HTTP status codes, retry logic, error responses |
- 0.9+ similarity: Extremely relevant, almost exact matches
- 0.8-0.9: Highly relevant, strong semantic similarity
- 0.7-0.8: Moderately relevant, good contextual match
- 0.6-0.7: Somewhat relevant, might be useful
- <0.6: Low relevance, probably not helpful
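To make the score bands concrete, here is a small sketch that ranks stored vectors against a query and labels each hit. It assumes the score is cosine similarity and uses toy 3-dimensional vectors in place of real 384-dimensional embeddings. The function names and the handling of exact boundary values (0.8 counts as "Highly relevant", for example) are also assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def relevance_label(score):
    """Map a similarity score to the bands listed above."""
    if score >= 0.9:
        return "Extremely relevant"
    if score >= 0.8:
        return "Highly relevant"
    if score >= 0.7:
        return "Moderately relevant"
    if score >= 0.6:
        return "Somewhat relevant"
    return "Low relevance"


def search(query_vec, memories, top_k=3):
    """Rank (text, vector) pairs by similarity to query_vec, best first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in memories]
    scored.sort(reverse=True)
    return [(text, score, relevance_label(score)) for score, text in scored[:top_k]]


# Toy vectors standing in for sentence-transformers embeddings
memories = [
    ("JWT refresh logic with axios interceptors", [0.9, 0.1, 0.0]),
    ("SQL index tuning for slow queries", [0.0, 0.2, 0.9]),
]
results = search([1.0, 0.0, 0.0], memories)
for text, score, label in results:
    print(f"{score:.3f}  {label:<20} {text}")
```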
The get_memory_stats tool provides comprehensive insights:
{
  "total_memories": 247,
  "memory_limit": 10000,
  "usage_percentage": 2.5,
  "categories": {
    "code-solution": 89,
    "bug-fix": 67,
    "learning": 45,
    "architecture": 23,
    "debugging": 18,
    "other": 5
  },
  "recent_week_count": 12,
  "database_size_mb": 15.7,
  "health_status": "Healthy"
}
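The derived fields in this output follow from the raw counts. Below is a minimal sketch, assuming `usage_percentage` is the total divided by the limit, rounded to one decimal place. The `summarize` helper is hypothetical, and `health_status` is left out because its rules are not documented here.

```python
def summarize(categories, memory_limit=10_000):
    """Derive total and usage percentage from per-category counts."""
    total = sum(categories.values())
    return {
        "total_memories": total,
        "memory_limit": memory_limit,
        "usage_percentage": round(total / memory_limit * 100, 1),
    }


# Category counts copied from the sample output above
categories = {
    "code-solution": 89,
    "bug-fix": 67,
    "learning": 45,
    "architecture": 23,
    "debugging": 18,
    "other": 5,
}
summary = summarize(categories)
print(summary)
```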
- Sanitizes all user input to prevent injection attacks
- Removes control characters and null bytes
- Enforces length limits on all content
- Validates and normalizes all file paths
- Prevents directory traversal attacks
- Blocks suspicious character patterns
- Limits total memory count and individual memory size
- Prevents database bloat and memory exhaustion
- Implements cleanup mechanisms for old data
- Uses parameterized queries exclusively
- No dynamic SQL construction from user input
- SQLite WAL mode for safe concurrent access
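A few of these measures can be sketched with the standard library alone. The helper names and the `memories` table below are hypothetical. They illustrate the general techniques (control-character stripping, path containment checks, parameterized queries) and are not the contents of `src/security.py`.

```python
import sqlite3
from pathlib import Path


def strip_control_chars(text):
    """Drop null bytes and control characters, keeping newlines and tabs."""
    return "".join(
        ch for ch in text if ch in "\n\t" or (ord(ch) >= 32 and ord(ch) != 127)
    )


def is_inside(base_dir, relative_path):
    """True if relative_path stays within base_dir after resolving '..' parts."""
    base = Path(base_dir).resolve()
    target = (base / relative_path).resolve()
    return target == base or base in target.parents


# Parameterized insert: hostile text is stored as data, never executed as SQL
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE memories (id INTEGER PRIMARY KEY, content TEXT)")
hostile = "x'); DROP TABLE memories;--\x00"
conn.execute(
    "INSERT INTO memories (content) VALUES (?)", (strip_control_chars(hostile),)
)
stored = conn.execute("SELECT content FROM memories").fetchone()[0]

print(repr(stored))
print(is_inside("/tmp/project", "memory/vector_memory.db"))
print(is_inside("/tmp/project", "../etc/passwd"))
```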
# Check if uv is installed
uv --version
# Check dependencies
python main.py --working-dir ./test
# Check permissions
chmod +x main.py
- Verify absolute paths in configuration
- Check Claude Desktop logs: ~/Library/Logs/Claude/
- Restart Claude Desktop after config changes
- Test server manually before configuring Claude
- Verify sentence-transformers model downloaded successfully
- Check database file permissions in memory/ directory
- Try broader search terms
- Review memory content for relevance
- Run get_memory_stats to check database health
- Use clear_old_memories to clean up old entries
- Consider increasing hardware resources for embedding generation
Run the server manually to see detailed logs:
python main.py --working-dir ./debug-test
Store multiple related memories efficiently:
# Store a complete debugging session
memories = [
    "Bug: React component not re-rendering when props change",
    "Solution: Props were objects, needed deep comparison or React.memo",
    "Root cause: JavaScript object reference equality vs value equality",
    "Prevention: Use primitive props or implement proper memoization"
]
# Store each with appropriate tags
for memory in memories:
    store_memory(memory, "debugging", ["react", "performance", "props"])
Use tags to organize by project:
["project-alpha", "frontend", "react"]
["project-beta", "backend", "node"]
["project-gamma", "devops", "docker"]
["javascript", "react", "hooks"]
["python", "django", "orm"]
["aws", "lambda", "serverless"]
["authentication", "security", "jwt"]
["performance", "optimization", "caching"]
["testing", "unit-tests", "mocking"]
"Code review insight: Extract validation logic into separate functions for better testability and reusability"
"Sprint retrospective: Using feature flags reduced deployment risk and enabled faster rollbacks"
"Technical debt: UserService class has grown too large, needs refactoring into smaller domain-specific services"
Based on testing with various dataset sizes:
Memory Count | Search Time | Storage Size | RAM Usage |
---|---|---|---|
1,000 | <50ms | ~5MB | ~100MB |
5,000 | <100ms | ~20MB | ~200MB |
10,000 | <200ms | ~40MB | ~300MB |
Tested on MacBook Air M1 with sentence-transformers/all-MiniLM-L6-v2
This is a standalone MCP server designed for personal/team use. For improvements:
- Fork the server.py file
- Modify as needed for your use case
- Test thoroughly with your specific requirements
- Share improvements via GitHub discussions
This project is released under the MIT License. See the embedded license in the server.py file for details.
- sqlite-vec: Alex Garcia's excellent SQLite vector extension
- sentence-transformers: Nils Reimers' semantic embedding library
- FastMCP: Anthropic's high-level MCP framework
- Claude Desktop: For providing the MCP integration platform
Built for developers who want persistent AI memory without the complexity of dedicated vector databases.