Create IBM Granite Embedding Models MCP Server #911

@crivetimihai

Description

Overview

Create a comprehensive MCP server for IBM Granite embedding models that supports semantic search, document retrieval, similarity analysis, and vector database integration, with enterprise-grade capabilities for understanding user intent.

Server Specifications

Server Details

  • Name: granite-embedding-server
  • Language: Python 3.11+
  • Location: mcp-servers/python/granite_embedding_server/
  • Purpose: Provide semantic embedding capabilities via IBM Granite embedding models

Supported Models

From IBM Granite Embedding Models on Hugging Face:

  • Text Embeddings: granite-embedding-768, granite-embedding-1024
  • Code Embeddings: granite-code-embedding-v1, granite-multilingual-code
  • Document Embeddings: granite-document-embedding, granite-legal-embedding
  • Multilingual: granite-multilingual-embedding, granite-cross-lingual

Provider Support

  • Ollama: Local inference with embedding models
  • watsonx.ai: IBM's enterprise embedding services
  • Hugging Face: Direct access to Granite embedding models
  • Custom Endpoints: Integration with vector databases and search engines
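
A minimal sketch of the shared provider abstraction these backends would sit behind (class and method names are illustrative, not an existing API; the Ollama call assumes its standard /api/embeddings endpoint):

# providers/base — illustrative provider interface with one concrete backend
from abc import ABC, abstractmethod
from typing import List

import requests


class EmbeddingProvider(ABC):
    """Interface each backend (Ollama, watsonx.ai, Hugging Face, custom) would implement."""

    @abstractmethod
    def embed(self, texts: List[str], model: str) -> List[List[float]]:
        """Return one embedding vector per input text."""


class OllamaEmbeddingProvider(EmbeddingProvider):
    def __init__(self, base_url: str = "http://localhost:11434", timeout: int = 180):
        self.base_url = base_url
        self.timeout = timeout

    def embed(self, texts: List[str], model: str) -> List[List[float]]:
        vectors = []
        for text in texts:
            resp = requests.post(
                f"{self.base_url}/api/embeddings",
                json={"model": model, "prompt": text},
                timeout=self.timeout,
            )
            resp.raise_for_status()
            vectors.append(resp.json()["embedding"])
        return vectors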

Vector Database Integration

  • ChromaDB: Open-source vector database
  • Pinecone: Managed vector database service
  • Weaviate: Open-source vector search engine
  • Qdrant: Vector similarity search engine
  • Elasticsearch: Full-text and vector search
  • Custom: Generic vector database integration
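
Each backend would plug in behind a common client interface, roughly what base_client.py in the directory layout below would define (names are illustrative):

# vector_db/base_client — illustrative interface shared by the backend adapters
from abc import ABC, abstractmethod
from typing import Any, Dict, List


class VectorDBClient(ABC):
    @abstractmethod
    def create_collection(self, name: str, dimension: int) -> None: ...

    @abstractmethod
    def upsert(self, collection: str, ids: List[str],
               vectors: List[List[float]], metadata: List[Dict[str, Any]]) -> None: ...

    @abstractmethod
    def query(self, collection: str, vector: List[float],
              top_k: int = 10) -> List[Dict[str, Any]]: ...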

Tools Provided

1. generate_embeddings

Generate high-quality semantic embeddings for text and documents

# Shared imports for the request dataclasses below
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple


@dataclass
class EmbeddingGenerationRequest:
    texts: List[str]  # Text content to embed
    model: str = "granite-embedding-768"
    provider: str = "ollama"
    embedding_type: str = "semantic"  # semantic, code, document, multilingual
    normalize_embeddings: bool = True
    batch_size: int = 32
    pooling_strategy: str = "mean"  # mean, cls, max
    output_format: str = "numpy"  # numpy, list, tensor
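
The pooling_strategy field controls how per-token model outputs are reduced to a single vector. A rough sketch of what each option would mean (the helper name is illustrative, not part of the server):

# Illustrative pooling over per-token embeddings (shape: tokens x hidden_dim)
import numpy as np


def pool_tokens(token_embeddings: np.ndarray, strategy: str = "mean") -> np.ndarray:
    if strategy == "mean":   # average across all tokens
        return token_embeddings.mean(axis=0)
    if strategy == "cls":    # take the first ([CLS]) token's embedding
        return token_embeddings[0]
    if strategy == "max":    # element-wise maximum across tokens
        return token_embeddings.max(axis=0)
    raise ValueError(f"unknown pooling strategy: {strategy}")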

2. semantic_search

Perform semantic search across document collections

@dataclass
class SemanticSearchRequest:
    query: str
    collection_id: str  # Vector database collection
    model: str = "granite-embedding-768"
    provider: str = "watsonx"
    top_k: int = 10
    similarity_threshold: float = 0.7
    include_scores: bool = True
    include_metadata: bool = True
    rerank_results: bool = False

3. compute_similarity

Calculate semantic similarity between texts

@dataclass
class SimilarityRequest:
    text_pairs: List[Tuple[str, str]]  # Pairs of texts to compare
    model: str = "granite-embedding-1024"
    provider: str = "huggingface"
    similarity_metric: str = "cosine"  # cosine, euclidean, dot_product
    batch_processing: bool = True
    include_matrix: bool = False  # Full similarity matrix
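
For reference, the three similarity_metric options reduce to simple vector operations; a sketch, not the server's actual implementation:

# Illustrative similarity metrics over two embedding vectors
import numpy as np


def similarity(a: np.ndarray, b: np.ndarray, metric: str = "cosine") -> float:
    if metric == "cosine":
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    if metric == "dot_product":
        return float(np.dot(a, b))
    if metric == "euclidean":
        # returned as a distance; smaller values mean more similar texts
        return float(np.linalg.norm(a - b))
    raise ValueError(f"unknown metric: {metric}")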

4. cluster_documents

Cluster documents based on semantic similarity

@dataclass
class DocumentClusteringRequest:
    documents: List[str]
    model: str = "granite-document-embedding"
    provider: str = "ollama"
    num_clusters: Optional[int] = None  # Auto-determine if None
    clustering_algorithm: str = "kmeans"  # kmeans, hierarchical, dbscan
    min_cluster_size: int = 2
    include_cluster_summaries: bool = True
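
The kmeans path is essentially a thin wrapper over scikit-learn, which is already in the dependency list; a minimal sketch with placeholder embeddings:

# Cluster precomputed document embeddings with scikit-learn KMeans (placeholder data)
import numpy as np
from sklearn.cluster import KMeans

embeddings = np.random.rand(6, 768)      # stand-in for vectors from generate_embeddings
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(embeddings)  # one cluster id per document
print(labels)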

5. find_duplicates

Detect near-duplicate documents using embeddings

@dataclass
class DuplicateDetectionRequest:
    documents: List[str]
    model: str = "granite-embedding-768"
    provider: str = "watsonx"
    similarity_threshold: float = 0.95
    deduplication_strategy: str = "threshold"  # threshold, clustering, hierarchical
    preserve_originals: bool = True
    include_similarity_scores: bool = True
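
The default "threshold" strategy amounts to flagging document pairs whose cosine similarity exceeds similarity_threshold; a sketch over precomputed embeddings (helper name is illustrative):

# Illustrative threshold-based duplicate detection
import numpy as np


def find_duplicate_pairs(embeddings: np.ndarray, threshold: float = 0.95):
    # normalize rows so the dot product equals cosine similarity
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed.T
    pairs = []
    for i in range(len(sims)):
        for j in range(i + 1, len(sims)):
            if sims[i, j] >= threshold:
                pairs.append((i, j, float(sims[i, j])))
    return pairs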

6. build_vector_index

Create and manage vector database indexes

@dataclass
class VectorIndexRequest:
    documents: List[Dict[str, Any]]  # Documents with metadata
    index_name: str
    metadata_fields: List[str]
    model: str = "granite-embedding-1024"
    provider: str = "huggingface"
    vector_db: str = "chromadb"  # chromadb, pinecone, weaviate, qdrant
    batch_size: int = 100
    create_if_not_exists: bool = True
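
Against ChromaDB, index building could look roughly like the following; the collection name and metadata values are placeholders, and other backends would go through their own clients:

# Hedged sketch of adding one embedded document to a ChromaDB collection
import chromadb

client = chromadb.PersistentClient(path="./chromadb_data")
collection = client.get_or_create_collection(name="granite_research_papers")
collection.add(
    ids=["doc-1"],
    embeddings=[[0.1] * 1024],            # vector produced by generate_embeddings
    documents=["Research paper content..."],
    metadatas=[{"title": "AI in Healthcare", "author": "Dr. Smith", "category": "medical"}],
)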

7. recommend_content

Recommend content based on user preferences

@dataclass
class ContentRecommendationRequest:
    user_profile: str  # User preferences or history
    candidate_items: List[str]  # Items to recommend from
    model: str = "granite-embedding-768"
    provider: str = "ollama"
    recommendation_strategy: str = "similarity"  # similarity, collaborative, hybrid
    top_k: int = 10
    diversity_factor: float = 0.2  # Balance between relevance and diversity
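
One plausible reading of diversity_factor is a maximal-marginal-relevance style selection that trades relevance to the user profile against redundancy among already-picked items; a sketch under that assumption (function name is hypothetical):

# MMR-style selection: higher diversity_factor penalizes items similar to ones already chosen
import numpy as np


def mmr_select(profile_vec, item_vecs, top_k=3, diversity_factor=0.2):
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    relevance = [cos(profile_vec, v) for v in item_vecs]
    selected, remaining = [], list(range(len(item_vecs)))
    while remaining and len(selected) < top_k:
        def score(i):
            redundancy = max((cos(item_vecs[i], item_vecs[j]) for j in selected), default=0.0)
            return (1 - diversity_factor) * relevance[i] - diversity_factor * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected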

8. analyze_text_relationships

Analyze semantic relationships within text collections

@dataclass
class TextRelationshipRequest:
    texts: List[str]
    model: str = "granite-multilingual-embedding"
    provider: str = "watsonx"
    relationship_types: List[str] = field(default_factory=lambda: ["similarity", "clustering", "outliers"])
    visualization: bool = True
    include_statistics: bool = True
    output_format: str = "graph"  # graph, matrix, hierarchical

Implementation Requirements

Directory Structure

mcp-servers/python/granite_embedding_server/
├── src/
│   └── granite_embedding_server/
│       ├── __init__.py
│       ├── server.py
│       ├── providers/
│       │   ├── __init__.py
│       │   ├── ollama_embedding.py
│       │   ├── watsonx_embedding.py
│       │   ├── huggingface_embedding.py
│       │   └── custom_endpoint.py
│       ├── models/
│       │   ├── __init__.py
│       │   ├── granite_embedding_models.py
│       │   └── model_capabilities.py
│       ├── vector_db/
│       │   ├── __init__.py
│       │   ├── chromadb_client.py
│       │   ├── pinecone_client.py
│       │   ├── weaviate_client.py
│       │   ├── qdrant_client.py
│       │   └── base_client.py
│       ├── processing/
│       │   ├── __init__.py
│       │   ├── embedding_processor.py
│       │   ├── similarity_calculator.py
│       │   ├── clustering.py
│       │   └── search_engine.py
│       ├── tools/
│       │   ├── __init__.py
│       │   ├── embedding_tools.py
│       │   ├── search_tools.py
│       │   ├── similarity_tools.py
│       │   └── recommendation_tools.py
│       └── utils/
│           ├── __init__.py
│           ├── text_preprocessing.py
│           ├── vector_operations.py
│           └── visualization.py
├── tests/
├── requirements.txt
├── README.md
└── examples/
    ├── semantic_search.py
    ├── document_clustering.py
    └── recommendation_system.py

Dependencies

# requirements.txt
mcp>=1.0.0
transformers>=4.35.0
torch>=2.1.0
sentence-transformers>=2.2.2
numpy>=1.24.0
scipy>=1.11.0
scikit-learn>=1.3.0
chromadb>=0.4.0
pinecone-client>=2.2.0
weaviate-client>=3.25.0
qdrant-client>=1.7.0
elasticsearch>=8.11.0
faiss-cpu>=1.7.4
plotly>=5.17.0
umap-learn>=0.5.4
requests>=2.31.0
pydantic>=2.5.0
ollama>=0.1.7
ibm-watson-machine-learning>=1.0.325

Configuration

# config.yaml
providers:
  ollama:
    base_url: "http://localhost:11434"
    embedding_models_enabled: true
    timeout: 180
    
  watsonx:
    url: "https://us-south.ml.cloud.ibm.com"
    apikey: "${WATSONX_API_KEY}"
    project_id: "${WATSONX_PROJECT_ID}"
    embedding_endpoint: "/ml/v1/embeddings"
    
  huggingface:
    api_key: "${HF_API_KEY}"
    cache_dir: "./hf_embedding_cache"
    device: "auto"

models:
  default_text: "granite-embedding-768"
  default_code: "granite-code-embedding-v1"
  default_document: "granite-document-embedding"
  default_multilingual: "granite-multilingual-embedding"

vector_databases:
  chromadb:
    persist_directory: "./chromadb_data"
    collection_prefix: "granite_"
    
  pinecone:
    api_key: "${PINECONE_API_KEY}"
    environment: "${PINECONE_ENVIRONMENT}"
    index_dimension: 768
    
  weaviate:
    url: "${WEAVIATE_URL}"
    api_key: "${WEAVIATE_API_KEY}"
    
  qdrant:
    host: "${QDRANT_HOST}"
    port: 6333
    api_key: "${QDRANT_API_KEY}"

processing:
  max_batch_size: 64
  max_text_length: 8192
  default_similarity_threshold: 0.7
  enable_gpu: true
  
search:
  default_top_k: 10
  max_results: 1000
  enable_reranking: true
  
clustering:
  algorithms: ["kmeans", "hierarchical", "dbscan", "spectral"]
  max_clusters: 100
  min_cluster_size: 2
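
The ${VAR} placeholders above imply environment-variable expansion when the configuration is loaded; a minimal sketch of that step, assuming PyYAML is available (helper name is illustrative):

# Expand ${VAR} placeholders from the environment, then parse the YAML
import os
import re

import yaml


def load_config(path: str = "config.yaml") -> dict:
    with open(path) as fh:
        raw = fh.read()
    expanded = re.sub(r"\$\{(\w+)\}", lambda m: os.environ.get(m.group(1), ""), raw)
    return yaml.safe_load(expanded)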

Usage Examples

Generate Document Embeddings

# Generate embeddings for a collection of documents
result = await mcp_client.call_tool("generate_embeddings", {
    "texts": [
        "Artificial intelligence is transforming healthcare",
        "Machine learning algorithms improve diagnostic accuracy", 
        "Deep learning models process medical imaging data"
    ],
    "model": "granite-embedding-768",
    "provider": "ollama",
    "embedding_type": "document",
    "normalize_embeddings": True,
    "output_format": "numpy"
})

Semantic Search

# Search for relevant documents
result = await mcp_client.call_tool("semantic_search", {
    "query": "AI applications in medical diagnosis",
    "model": "granite-embedding-1024",
    "provider": "watsonx",
    "collection_id": "medical_papers",
    "top_k": 15,
    "similarity_threshold": 0.75,
    "include_scores": True,
    "rerank_results": True
})

Document Similarity Analysis

# Compare document similarity
result = await mcp_client.call_tool("compute_similarity", {
    "text_pairs": [
        ("Research paper on neural networks", "Deep learning study"),
        ("Climate change analysis", "Weather prediction model"),
        ("Financial market trends", "Stock price forecasting")
    ],
    "model": "granite-embedding-768",
    "provider": "huggingface",
    "similarity_metric": "cosine",
    "batch_processing": True
})

Document Clustering

# Cluster related documents
result = await mcp_client.call_tool("cluster_documents", {
    "documents": [
        "Introduction to machine learning algorithms",
        "Neural network architecture design",
        "Climate change impact studies",
        "Weather forecasting techniques",
        "Financial risk assessment models",
        "Investment portfolio optimization"
    ],
    "model": "granite-document-embedding",
    "provider": "ollama",
    "clustering_algorithm": "kmeans",
    "num_clusters": 3,
    "include_cluster_summaries": True
})

Build Vector Index

# Create vector database index
result = await mcp_client.call_tool("build_vector_index", {
    "documents": [
        {
            "text": "Research paper content...",
            "title": "AI in Healthcare",
            "author": "Dr. Smith",
            "category": "medical"
        },
        # ... more documents
    ],
    "model": "granite-embedding-1024",
    "provider": "watsonx",
    "vector_db": "chromadb",
    "index_name": "research_papers",
    "metadata_fields": ["title", "author", "category"],
    "batch_size": 50
})

Content Recommendation

# Recommend content based on user profile
result = await mcp_client.call_tool("recommend_content", {
    "user_profile": "Interested in AI, machine learning, and data science applications",
    "candidate_items": [
        "Advanced neural network architectures",
        "Statistical analysis methods", 
        "Computer vision applications",
        "Natural language processing",
        "Database optimization techniques"
    ],
    "model": "granite-embedding-768",
    "provider": "huggingface",
    "top_k": 3,
    "diversity_factor": 0.3
})

Duplicate Detection

# Find near-duplicate documents
result = await mcp_client.call_tool("find_duplicates", {
    "documents": [
        "Machine learning is a subset of artificial intelligence",
        "ML represents a branch of AI technology",
        "Natural language processing enables human-computer interaction",
        "NLP allows computers to understand human language",
        "Deep learning uses neural networks for pattern recognition"
    ],
    "model": "granite-embedding-768",
    "provider": "ollama",
    "similarity_threshold": 0.85,
    "include_similarity_scores": True
})
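
Text Relationship Analysis

For completeness, an illustrative call to the remaining tool, analyze_text_relationships; the text values are placeholders and the call follows the same pattern as the examples above.

# Analyze semantic relationships across a text collection
result = await mcp_client.call_tool("analyze_text_relationships", {
    "texts": [
        "Transformer models dominate natural language processing",
        "Attention mechanisms power modern language models",
        "Solar panel efficiency continues to improve"
    ],
    "model": "granite-multilingual-embedding",
    "provider": "watsonx",
    "relationship_types": ["similarity", "clustering", "outliers"],
    "output_format": "graph"
})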

Advanced Features

  • Multi-modal Embeddings: Support for text, code, and document embeddings
  • Cross-lingual Search: Multilingual semantic understanding
  • Hierarchical Clustering: Multi-level document organization
  • Temporal Embeddings: Time-aware semantic representations
  • Domain Adaptation: Fine-tuned embeddings for specific domains
  • Federated Search: Search across multiple vector databases

Enterprise Features

  • High-Performance Computing: GPU acceleration for large-scale embedding
  • Scalable Architecture: Handle millions of documents
  • API Integration: RESTful endpoints for enterprise applications
  • Security: Encrypted embedding storage and transmission
  • Monitoring: Performance metrics and usage analytics
  • Compliance: Data governance and audit trails

Vector Database Ecosystem

  • Multi-database Support: Work with various vector database platforms
  • Index Optimization: Efficient vector storage and retrieval
  • Distributed Storage: Scale across multiple database instances
  • Backup and Recovery: Data persistence and disaster recovery
  • Migration Tools: Move embeddings between database systems

Performance Optimizations

  • Batch Processing: Efficient bulk embedding generation
  • Caching: Intelligent caching of embeddings and search results
  • Model Quantization: Optimized models for deployment
  • Parallel Processing: Multi-threaded embedding computation
  • Memory Management: Efficient memory usage for large datasets
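
As a concrete illustration of the caching idea, embeddings could be keyed by a hash of model and input text so repeated requests skip recomputation; a minimal in-memory sketch (a production server might use disk or a shared cache instead):

# Cache embeddings by (model, text) hash; 'provider' follows the interface sketched earlier
import hashlib

_cache: dict = {}


def cached_embed(provider, model: str, text: str):
    key = hashlib.sha256(f"{model}:{text}".encode()).hexdigest()
    if key not in _cache:
        _cache[key] = provider.embed([text], model)[0]
    return _cache[key]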

Acceptance Criteria

  • Python MCP server with 8+ Granite embedding model tools
  • Support for all major Granite embedding models
  • Multi-provider integration (Ollama, watsonx.ai, Hugging Face)
  • Semantic search and similarity analysis capabilities
  • Document clustering and duplicate detection
  • Vector database integration (ChromaDB, Pinecone, Weaviate, Qdrant)
  • Content recommendation system
  • Batch embedding generation and processing
  • Multi-format output support (NumPy, lists, tensors)
  • Performance optimization for large-scale processing
  • Comprehensive test suite with sample documents (>90% coverage)
  • Complete documentation with semantic search examples

Priority

High - Essential for modern AI applications requiring semantic understanding

Use Cases

  • Semantic search and information retrieval
  • Document similarity and clustering
  • Content recommendation systems
  • Duplicate detection and deduplication
  • Knowledge base organization
  • Customer support automation
  • Legal document analysis
  • Research paper categorization
  • Code similarity and search
  • Multi-language content processing
  • E-commerce product matching
  • Enterprise knowledge management
