Overview
Create a comprehensive MCP server for IBM Granite embedding models, supporting semantic search, document retrieval, similarity analysis, and vector database integration, with enterprise-grade capabilities for understanding user intent.
Server Specifications
Server Details
- Name: granite-embedding-server
- Language: Python 3.11+
- Location: mcp-servers/python/granite_embedding_server/
- Purpose: Provide semantic embedding capabilities via IBM Granite embedding models
Supported Models
From IBM Granite Embedding Models on Hugging Face:
- Text Embeddings: granite-embedding-768, granite-embedding-1024
- Code Embeddings: granite-code-embedding-v1, granite-multilingual-code
- Document Embeddings: granite-document-embedding, granite-legal-embedding
- Multilingual: granite-multilingual-embedding, granite-cross-lingual
Provider Support
- Ollama: Local inference with embedding models
- watsonx.ai: IBM's enterprise embedding services
- Hugging Face: Direct access to Granite embedding models
- Custom Endpoints: Integration with vector databases and search engines
Vector Database Integration
- ChromaDB: Open-source vector database
- Pinecone: Managed vector database service
- Weaviate: Open-source vector search engine
- Qdrant: Vector similarity search engine
- Elasticsearch: Full-text and vector search
- Custom: Generic vector database integration
Tools Provided
1. generate_embeddings
Generate high-quality semantic embeddings for text and documents
@dataclass
class EmbeddingGenerationRequest:
    texts: List[str]                      # Text content to embed
    model: str = "granite-embedding-768"
    provider: str = "ollama"
    embedding_type: str = "semantic"      # semantic, code, document, multilingual
    normalize_embeddings: bool = True
    batch_size: int = 32
    pooling_strategy: str = "mean"        # mean, cls, max
    output_format: str = "numpy"          # numpy, list, tensor
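The pooling_strategy and normalize_embeddings options could be implemented roughly as follows. This is a minimal NumPy sketch, not part of the spec; the function name is illustrative:

```python
import numpy as np

def pool_and_normalize(token_embeddings: np.ndarray, strategy: str = "mean",
                       normalize: bool = True) -> np.ndarray:
    """Collapse per-token vectors (seq_len x dim) into one sentence vector."""
    if strategy == "mean":
        vec = token_embeddings.mean(axis=0)
    elif strategy == "cls":
        vec = token_embeddings[0]        # first ([CLS]) token vector
    elif strategy == "max":
        vec = token_embeddings.max(axis=0)
    else:
        raise ValueError(f"unknown pooling strategy: {strategy}")
    if normalize:
        # unit length, so cosine similarity reduces to a dot product
        vec = vec / np.linalg.norm(vec)
    return vec
```

Normalizing at generation time keeps downstream similarity and search code cheap, which is why normalize_embeddings defaults to True.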
2. semantic_search
Perform semantic search across document collections
@dataclass
class SemanticSearchRequest:
    query: str
    collection_id: str                    # Vector database collection
    model: str = "granite-embedding-768"
    provider: str = "watsonx"
    top_k: int = 10
    similarity_threshold: float = 0.7
    include_scores: bool = True
    include_metadata: bool = True
    rerank_results: bool = False
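The top_k and similarity_threshold parameters combine as in this sketch, which brute-forces a normalized document matrix (an ANN index would replace this at scale; the function name is illustrative):

```python
import numpy as np

def top_k_search(query_vec: np.ndarray, doc_matrix: np.ndarray,
                 top_k: int = 10, threshold: float = 0.7):
    """Return (index, score) pairs for the top_k most similar documents.

    Assumes query and document vectors are L2-normalized, so the dot
    product equals cosine similarity.
    """
    scores = doc_matrix @ query_vec              # one dot product per document
    order = np.argsort(scores)[::-1][:top_k]     # best scores first
    return [(int(i), float(scores[i])) for i in order if scores[i] >= threshold]
```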
3. compute_similarity
Calculate semantic similarity between texts
@dataclass
class SimilarityRequest:
    text_pairs: List[Tuple[str, str]]     # Pairs of texts to compare
    model: str = "granite-embedding-1024"
    provider: str = "huggingface"
    similarity_metric: str = "cosine"     # cosine, euclidean, dot_product
    batch_processing: bool = True
    include_matrix: bool = False          # Full similarity matrix
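The three similarity_metric options differ in scale and direction; a small NumPy sketch (function name illustrative) makes the contract concrete:

```python
import numpy as np

def similarity(a: np.ndarray, b: np.ndarray, metric: str = "cosine") -> float:
    """Score a pair of embedding vectors with the requested metric."""
    if metric == "cosine":
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    if metric == "dot_product":
        return float(a @ b)
    if metric == "euclidean":
        # returned as a distance: smaller means more similar
        return float(np.linalg.norm(a - b))
    raise ValueError(f"unknown metric: {metric}")
```

Note that euclidean is a distance while the other two are similarities, so result consumers need to know which direction "better" points.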
4. cluster_documents
Document clustering based on semantic similarity
@dataclass
class DocumentClusteringRequest:
    documents: List[str]
    model: str = "granite-document-embedding"
    provider: str = "ollama"
    num_clusters: Optional[int] = None    # Auto-determine if None
    clustering_algorithm: str = "kmeans"  # kmeans, hierarchical, dbscan
    min_cluster_size: int = 2
    include_cluster_summaries: bool = True
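The spec leaves the num_clusters=None auto-determination open; one possible fallback (alongside silhouette-scored k-means) is greedy similarity-threshold clustering, sketched here in plain NumPy with illustrative names and an assumed threshold:

```python
import numpy as np

def greedy_cluster(embeddings: np.ndarray, threshold: float = 0.8,
                   min_cluster_size: int = 2):
    """Assign each vector to the first cluster whose centroid is similar
    enough, otherwise start a new cluster. Vectors assumed L2-normalized."""
    centroids, members = [], []
    for i, vec in enumerate(embeddings):
        for c, centroid in enumerate(centroids):
            if vec @ centroid >= threshold:
                members[c].append(i)
                # update the running centroid and re-normalize it
                updated = np.mean(embeddings[members[c]], axis=0)
                centroids[c] = updated / np.linalg.norm(updated)
                break
        else:
            centroids.append(vec)
            members.append([i])
    return [m for m in members if len(m) >= min_cluster_size]
```

The cluster count falls out of the data rather than being fixed up front, which is the behavior num_clusters=None implies.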
5. find_duplicates
Detect near-duplicate documents using embeddings
@dataclass
class DuplicateDetectionRequest:
    documents: List[str]
    model: str = "granite-embedding-768"
    provider: str = "watsonx"
    similarity_threshold: float = 0.95
    deduplication_strategy: str = "threshold"  # threshold, clustering, hierarchical
    preserve_originals: bool = True
    include_similarity_scores: bool = True
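The default "threshold" strategy amounts to flagging every pair above similarity_threshold. A minimal NumPy sketch (illustrative name; a production server would use an ANN index such as FAISS rather than the quadratic scan shown here):

```python
import numpy as np

def find_near_duplicates(embeddings: np.ndarray, threshold: float = 0.95):
    """Return (i, j, score) for every document pair at or above the
    similarity threshold. Embeddings assumed L2-normalized."""
    sims = embeddings @ embeddings.T      # full pairwise cosine matrix
    n = len(embeddings)
    return [(i, j, float(sims[i, j]))
            for i in range(n) for j in range(i + 1, n)
            if sims[i, j] >= threshold]
```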
6. build_vector_index
Create and manage vector database indexes
@dataclass
class VectorIndexRequest:
    documents: List[Dict[str, Any]]       # Documents with metadata
    index_name: str
    metadata_fields: List[str]
    model: str = "granite-embedding-1024"
    provider: str = "huggingface"
    vector_db: str = "chromadb"           # chromadb, pinecone, weaviate, qdrant
    batch_size: int = 100
    create_if_not_exists: bool = True
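The vector_db switch implies a shared abstraction that base_client.py in the proposed layout could define; a hedged sketch of that interface (method names are illustrative, not a spec):

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

class BaseVectorDBClient(ABC):
    """Interface the per-database clients (chromadb_client.py,
    pinecone_client.py, ...) would implement."""

    @abstractmethod
    def create_index(self, name: str, dimension: int) -> None: ...

    @abstractmethod
    def upsert(self, index: str, ids: List[str],
               vectors: List[List[float]],
               metadata: List[Dict[str, Any]]) -> int:
        """Insert or update vectors; returns the number written."""

    @abstractmethod
    def query(self, index: str, vector: List[float],
              top_k: int = 10) -> List[Dict[str, Any]]: ...
```

Keeping the tool layer against this interface means adding a new backend is one new module, not a change to every tool.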
7. recommend_content
Content recommendation based on user preferences
@dataclass
class ContentRecommendationRequest:
    user_profile: str                     # User preferences or history
    candidate_items: List[str]            # Items to recommend from
    model: str = "granite-embedding-768"
    provider: str = "ollama"
    recommendation_strategy: str = "similarity"  # similarity, collaborative, hybrid
    top_k: int = 10
    diversity_factor: float = 0.2         # Balance between relevance and diversity
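A standard way to realize diversity_factor is Maximal Marginal Relevance (MMR), which trades relevance to the profile against redundancy with items already chosen. A NumPy sketch with illustrative names:

```python
import numpy as np

def mmr_recommend(profile_vec: np.ndarray, item_vecs: np.ndarray,
                  top_k: int = 10, diversity_factor: float = 0.2):
    """Greedy MMR selection; all vectors assumed L2-normalized."""
    relevance = item_vecs @ profile_vec
    selected: list[int] = []
    candidates = list(range(len(item_vecs)))
    while candidates and len(selected) < top_k:
        if selected:
            # highest similarity to anything already picked
            redundancy = np.max(item_vecs[candidates] @ item_vecs[selected].T, axis=1)
        else:
            redundancy = np.zeros(len(candidates))
        mmr = (1 - diversity_factor) * relevance[candidates] - diversity_factor * redundancy
        best = candidates[int(np.argmax(mmr))]
        selected.append(best)
        candidates.remove(best)
    return selected
```

With diversity_factor=0 this degenerates to plain similarity ranking; raising it pushes later picks away from earlier ones.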
8. analyze_text_relationships
Analyze semantic relationships within text collections
@dataclass
class TextRelationshipRequest:
    texts: List[str]
    model: str = "granite-multilingual-embedding"
    provider: str = "watsonx"
    relationship_types: List[str] = field(
        default_factory=lambda: ["similarity", "clustering", "outliers"])
    visualization: bool = True
    include_statistics: bool = True
    output_format: str = "graph"          # graph, matrix, hierarchical
Implementation Requirements
Directory Structure
mcp-servers/python/granite_embedding_server/
├── src/
│ └── granite_embedding_server/
│ ├── __init__.py
│ ├── server.py
│ ├── providers/
│ │ ├── __init__.py
│ │ ├── ollama_embedding.py
│ │ ├── watsonx_embedding.py
│ │ ├── huggingface_embedding.py
│ │ └── custom_endpoint.py
│ ├── models/
│ │ ├── __init__.py
│ │ ├── granite_embedding_models.py
│ │ └── model_capabilities.py
│ ├── vector_db/
│ │ ├── __init__.py
│ │ ├── chromadb_client.py
│ │ ├── pinecone_client.py
│ │ ├── weaviate_client.py
│ │ ├── qdrant_client.py
│ │ └── base_client.py
│ ├── processing/
│ │ ├── __init__.py
│ │ ├── embedding_processor.py
│ │ ├── similarity_calculator.py
│ │ ├── clustering.py
│ │ └── search_engine.py
│ ├── tools/
│ │ ├── __init__.py
│ │ ├── embedding_tools.py
│ │ ├── search_tools.py
│ │ ├── similarity_tools.py
│ │ └── recommendation_tools.py
│ └── utils/
│ ├── __init__.py
│ ├── text_preprocessing.py
│ ├── vector_operations.py
│ └── visualization.py
├── tests/
├── requirements.txt
├── README.md
└── examples/
├── semantic_search.py
├── document_clustering.py
└── recommendation_system.py
Dependencies
# requirements.txt
mcp>=1.0.0
transformers>=4.35.0
torch>=2.1.0
sentence-transformers>=2.2.2
numpy>=1.24.0
scipy>=1.11.0
scikit-learn>=1.3.0
chromadb>=0.4.0
pinecone-client>=2.2.0
weaviate-client>=3.25.0
qdrant-client>=1.7.0
elasticsearch>=8.11.0
faiss-cpu>=1.7.4
plotly>=5.17.0
umap-learn>=0.5.4
requests>=2.31.0
pydantic>=2.5.0
ollama>=0.1.7
ibm-watson-machine-learning>=1.0.325
Configuration
# config.yaml
providers:
  ollama:
    base_url: "http://localhost:11434"
    embedding_models_enabled: true
    timeout: 180
  watsonx:
    url: "https://us-south.ml.cloud.ibm.com"
    apikey: "${WATSONX_API_KEY}"
    project_id: "${WATSONX_PROJECT_ID}"
    embedding_endpoint: "/ml/v1/embeddings"
  huggingface:
    api_key: "${HF_API_KEY}"
    cache_dir: "./hf_embedding_cache"
    device: "auto"

models:
  default_text: "granite-embedding-768"
  default_code: "granite-code-embedding-v1"
  default_document: "granite-document-embedding"
  default_multilingual: "granite-multilingual-embedding"

vector_databases:
  chromadb:
    persist_directory: "./chromadb_data"
    collection_prefix: "granite_"
  pinecone:
    api_key: "${PINECONE_API_KEY}"
    environment: "${PINECONE_ENVIRONMENT}"
    index_dimension: 768
  weaviate:
    url: "${WEAVIATE_URL}"
    api_key: "${WEAVIATE_API_KEY}"
  qdrant:
    host: "${QDRANT_HOST}"
    port: 6333
    api_key: "${QDRANT_API_KEY}"

processing:
  max_batch_size: 64
  max_text_length: 8192
  default_similarity_threshold: 0.7
  enable_gpu: true

search:
  default_top_k: 10
  max_results: 1000
  enable_reranking: true

clustering:
  algorithms: ["kmeans", "hierarchical", "dbscan", "spectral"]
  max_clusters: 100
  min_cluster_size: 2
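The ${VAR} placeholders in the config would be resolved at load time against the environment; a stdlib-only sketch (helper name illustrative):

```python
import os
import re

_VAR = re.compile(r"\$\{([A-Z0-9_]+)\}")

def expand_env(value):
    """Recursively replace ${VAR} placeholders in a parsed config tree
    with environment values (empty string if unset)."""
    if isinstance(value, dict):
        return {k: expand_env(v) for k, v in value.items()}
    if isinstance(value, list):
        return [expand_env(v) for v in value]
    if isinstance(value, str):
        return _VAR.sub(lambda m: os.environ.get(m.group(1), ""), value)
    return value
```

Applied to the dict yaml.safe_load produces, this keeps secrets out of the checked-in config file.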
Usage Examples
Generate Document Embeddings
# Generate embeddings for a collection of documents
result = await mcp_client.call_tool("generate_embeddings", {
    "texts": [
        "Artificial intelligence is transforming healthcare",
        "Machine learning algorithms improve diagnostic accuracy",
        "Deep learning models process medical imaging data"
    ],
    "model": "granite-embedding-768",
    "provider": "ollama",
    "embedding_type": "document",
    "normalize_embeddings": True,
    "output_format": "numpy"
})
Semantic Search
# Search for relevant documents
result = await mcp_client.call_tool("semantic_search", {
    "query": "AI applications in medical diagnosis",
    "model": "granite-embedding-1024",
    "provider": "watsonx",
    "collection_id": "medical_papers",
    "top_k": 15,
    "similarity_threshold": 0.75,
    "include_scores": True,
    "rerank_results": True
})
Document Similarity Analysis
# Compare document similarity
result = await mcp_client.call_tool("compute_similarity", {
    "text_pairs": [
        ("Research paper on neural networks", "Deep learning study"),
        ("Climate change analysis", "Weather prediction model"),
        ("Financial market trends", "Stock price forecasting")
    ],
    "model": "granite-embedding-768",
    "provider": "huggingface",
    "similarity_metric": "cosine",
    "batch_processing": True
})
Document Clustering
# Cluster related documents
result = await mcp_client.call_tool("cluster_documents", {
    "documents": [
        "Introduction to machine learning algorithms",
        "Neural network architecture design",
        "Climate change impact studies",
        "Weather forecasting techniques",
        "Financial risk assessment models",
        "Investment portfolio optimization"
    ],
    "model": "granite-document-embedding",
    "provider": "ollama",
    "clustering_algorithm": "kmeans",
    "num_clusters": 3,
    "include_cluster_summaries": True
})
Build Vector Index
# Create vector database index
result = await mcp_client.call_tool("build_vector_index", {
    "documents": [
        {
            "text": "Research paper content...",
            "title": "AI in Healthcare",
            "author": "Dr. Smith",
            "category": "medical"
        },
        # ... more documents
    ],
    "model": "granite-embedding-1024",
    "provider": "watsonx",
    "vector_db": "chromadb",
    "index_name": "research_papers",
    "metadata_fields": ["title", "author", "category"],
    "batch_size": 50
})
Content Recommendation
# Recommend content based on user profile
result = await mcp_client.call_tool("recommend_content", {
    "user_profile": "Interested in AI, machine learning, and data science applications",
    "candidate_items": [
        "Advanced neural network architectures",
        "Statistical analysis methods",
        "Computer vision applications",
        "Natural language processing",
        "Database optimization techniques"
    ],
    "model": "granite-embedding-768",
    "provider": "huggingface",
    "top_k": 3,
    "diversity_factor": 0.3
})
Duplicate Detection
# Find near-duplicate documents
result = await mcp_client.call_tool("find_duplicates", {
    "documents": [
        "Machine learning is a subset of artificial intelligence",
        "ML represents a branch of AI technology",
        "Natural language processing enables human-computer interaction",
        "NLP allows computers to understand human language",
        "Deep learning uses neural networks for pattern recognition"
    ],
    "model": "granite-embedding-768",
    "provider": "ollama",
    "similarity_threshold": 0.85,
    "include_similarity_scores": True
})
Advanced Features
- Multi-modal Embeddings: Support for text, code, and document embeddings
- Cross-lingual Search: Multilingual semantic understanding
- Hierarchical Clustering: Multi-level document organization
- Temporal Embeddings: Time-aware semantic representations
- Domain Adaptation: Fine-tuned embeddings for specific domains
- Federated Search: Search across multiple vector databases
Enterprise Features
- High-Performance Computing: GPU acceleration for large-scale embedding
- Scalable Architecture: Handle millions of documents
- API Integration: RESTful endpoints for enterprise applications
- Security: Encrypted embedding storage and transmission
- Monitoring: Performance metrics and usage analytics
- Compliance: Data governance and audit trails
Vector Database Ecosystem
- Multi-database Support: Work with various vector database platforms
- Index Optimization: Efficient vector storage and retrieval
- Distributed Storage: Scale across multiple database instances
- Backup and Recovery: Data persistence and disaster recovery
- Migration Tools: Move embeddings between database systems
Performance Optimizations
- Batch Processing: Efficient bulk embedding generation
- Caching: Intelligent caching of embeddings and search results
- Model Quantization: Optimized models for deployment
- Parallel Processing: Multi-threaded embedding computation
- Memory Management: Efficient memory usage for large datasets
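The caching bullet above could key entries by model plus a content hash, so identical texts are embedded once per model. A stdlib-only sketch (class name illustrative; a production server might back this with Redis or an LRU bound):

```python
import hashlib

class EmbeddingCache:
    """In-memory cache keyed by (model, sha256(text))."""

    def __init__(self):
        self._store = {}

    def _key(self, model: str, text: str) -> tuple:
        return (model, hashlib.sha256(text.encode("utf-8")).hexdigest())

    def get_or_compute(self, model: str, text: str, embed_fn):
        """Return the cached vector, calling embed_fn only on a miss."""
        key = self._key(model, text)
        if key not in self._store:
            self._store[key] = embed_fn(text)
        return self._store[key]
```

Hashing the text rather than storing it as the key keeps memory bounded by vector size even for long documents.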
Acceptance Criteria
- Python MCP server with 8+ Granite embedding model tools
- Support for all major Granite embedding models
- Multi-provider integration (Ollama, watsonx.ai, Hugging Face)
- Semantic search and similarity analysis capabilities
- Document clustering and duplicate detection
- Vector database integration (ChromaDB, Pinecone, Weaviate, Qdrant)
- Content recommendation system
- Batch embedding generation and processing
- Multi-format output support (NumPy, lists, tensors)
- Performance optimization for large-scale processing
- Comprehensive test suite with sample documents (>90% coverage)
- Complete documentation with semantic search examples
Priority
High - Essential for modern AI applications requiring semantic understanding
Use Cases
- Semantic search and information retrieval
- Document similarity and clustering
- Content recommendation systems
- Duplicate detection and deduplication
- Knowledge base organization
- Customer support automation
- Legal document analysis
- Research paper categorization
- Code similarity and search
- Multi-language content processing
- E-commerce product matching
- Enterprise knowledge management