
Commit 78b94fc

text generation, embedding and reranking with ovms (opea-project#1318)
Signed-off-by: Dariusz Trawinski <[email protected]>
1 parent d5cd3e4 commit 78b94fc

File tree

23 files changed: +1228 -16 lines changed

comps/embeddings/deployment/docker_compose/compose.yaml

Lines changed: 16 additions & 0 deletions
@@ -3,6 +3,7 @@

 include:
   - ../../../third_parties/tei/deployment/docker_compose/compose.yaml
+  - ../../../third_parties/ovms/deployment/docker_compose/compose.yaml
   - ../../../third_parties/bridgetower/deployment/docker_compose/compose.yaml
   - ../../../third_parties/clip/deployment/docker_compose/compose_intel_cpu.yaml

@@ -40,6 +41,21 @@ services:
         condition: service_healthy
     restart: unless-stopped

+  ovms-embedding-server:
+    image: ${REGISTRY:-opea}/embedding:${TAG:-latest}
+    container_name: ovms-embedding-server
+    ports:
+      - "${EMBEDDER_PORT:-10201}:6000"
+    ipc: host
+    environment:
+      no_proxy: ${no_proxy}
+      http_proxy: ${http_proxy}
+      https_proxy: ${https_proxy}
+      OVMS_EMBEDDING_ENDPOINT: ${OVMS_EMBEDDING_ENDPOINT}
+      EMBEDDING_COMPONENT_NAME: "OPEA_OVMS_EMBEDDING"
+      MODEL_ID: ${MODEL_ID}
+    restart: unless-stopped
+
   pg-embedding-server:
     image: ${REGISTRY:-opea}/embedding:${TAG:-latest}
     container_name: pg-embedding-server
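Once this wrapper and its OVMS backend are running, a quick probe of the mapped host port confirms the service is healthy. A minimal Python sketch, assuming the default `EMBEDDER_PORT` of 10201 from the mapping above and the `requests` package:

```python
# Probe the ovms-embedding-server wrapper through the host port mapped above.
# Assumes the compose default EMBEDDER_PORT=10201; adjust if you override it.
import requests

base = "http://localhost:10201"
health = requests.get(f"{base}/v1/health_check", timeout=10)
print(health.status_code)  # expect 200 when the wrapper is up

resp = requests.post(f"{base}/v1/embeddings", json={"input": "Hello, world!"}, timeout=30)
resp.raise_for_status()
print(resp.json()["data"][0]["embedding"][:5])  # first few values of the embedding vector
```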

comps/embeddings/src/README.md

Lines changed: 5 additions & 1 deletion
@@ -12,7 +12,11 @@ Key Features:

 **Customizable**: Supports configuration and customization to meet specific use case requirements, including different embedding models and preprocessing techniques.

-Users are albe to configure and build embedding-related services according to their actual needs.
+Users are able to configure and build embedding-related services according to their actual needs.
+
+## Embeddings Microservice with OVMS
+
+For details, please refer to [readme](./README_ovms.md).

 ## Embeddings Microservice with TEI


comps/embeddings/src/README_ovms.md

Lines changed: 149 additions & 0 deletions
# 🌟 Embedding Microservice with OpenVINO Model Server

This guide walks you through starting, deploying, and consuming the **OVMS Embeddings Microservice**. 🚀
It is a highly optimized Intel serving solution that uses the OpenVINO Runtime for fast inference on CPU.

---

## 📦 1. Start Microservice with `docker run`

### 🔹 1.1 Start Embedding Service with OVMS

1. Prepare the model in the model repository.
   This step exports the model from the HuggingFace Hub to the local model repository, converting it to IR format and optionally quantizing it along the way.
   It speeds up starting the service and avoids downloading the model from the Internet each time the container starts.

   ```
   pip3 install -r https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2025/0/demos/common/export_models/requirements.txt
   curl https://raw.githubusercontent.com/openvinotoolkit/model_server/refs/heads/releases/2025/0/demos/common/export_models/export_model.py -o export_model.py
   mkdir models
   python export_model.py embeddings --source_model BAAI/bge-large-en-v1.5 --weight-format int8 --config_file_path models/config_embeddings.json --model_repository_path models --target_device CPU
   ```

2. **Start the OVMS service**:
   Run the following command to start the OVMS container.

   ```bash
   your_port=8090
   docker run -p $your_port:8000 -v ./models:/models --name ovms-embedding-serving \
   openvino/model_server:2025.0 --port 8000 --config_path /models/config_embeddings.json
   ```

3. **Test the OVMS service**:
   Run the following command to check if the service is up and running; an equivalent Python call is sketched after this list.

   ```bash
   curl http://localhost:$your_port/v3/embeddings \
     -X POST \
     -H 'Content-Type: application/json' \
     -d '{
       "model": "BAAI/bge-large-en-v1.5",
       "input": "What is Deep Learning?"
     }'
   ```
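The same check can be scripted. A minimal Python sketch, assuming the port and model name exported above and the `requests` package:

```python
# Query the OVMS OpenAI-compatible embeddings endpoint directly.
# Assumes OVMS listens on localhost:8090 and serves BAAI/bge-large-en-v1.5.
import requests

payload = {"model": "BAAI/bge-large-en-v1.5", "input": "What is Deep Learning?"}
resp = requests.post("http://localhost:8090/v3/embeddings", json=payload, timeout=30)
resp.raise_for_status()
embedding = resp.json()["data"][0]["embedding"]
print(f"Got an embedding vector of length {len(embedding)}")
```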
### 🔹 1.2 Build Docker Image and Run Docker with CLI

1. Build the Docker image for the embedding microservice:

   ```bash
   cd ../../../
   docker build -t opea/embedding:latest \
     --build-arg https_proxy=$https_proxy --build-arg http_proxy=$http_proxy \
     -f comps/embeddings/src/Dockerfile .
   ```

2. Run the embedding microservice and connect it to the OVMS service:

   ```bash
   docker run -d --name="embedding-ovms-server" \
     -p 6000:6000 \
     --ipc=host \
     -e OVMS_EMBEDDING_ENDPOINT=$OVMS_EMBEDDING_ENDPOINT \
     -e MODEL_ID=$MODEL_ID \
     -e EMBEDDING_COMPONENT_NAME="OPEA_OVMS_EMBEDDING" \
     opea/embedding:latest
   ```

## 📦 2. Start Microservice with docker compose

Deploy both the OVMS Embedding Service and the Embedding Microservice using Docker Compose.

🔹 Steps:

1. Set environment variables:

   ```bash
   export host_ip=${your_ip_address}
   export MODEL_ID="BAAI/bge-large-en-v1.5"
   export OVMS_EMBEDDER_PORT=8090
   export EMBEDDER_PORT=6000
   export OVMS_EMBEDDING_ENDPOINT="http://${host_ip}:${OVMS_EMBEDDER_PORT}"
   ```

2. Navigate to the Docker Compose directory:

   ```bash
   cd comps/embeddings/deployment/docker_compose/
   ```

3. Start the services:

   ```bash
   docker compose up ovms-embedding-server -d
   ```

## 📦 3. Consume Embedding Service

### 🔹 3.1 Check Service Status

Verify the embedding service is running:

```bash
curl http://localhost:6000/v1/health_check \
  -X GET \
  -H 'Content-Type: application/json'
```

### 🔹 3.2 Use the Embedding Service API

The API is compatible with the [OpenAI API](https://platform.openai.com/docs/api-reference/embeddings); a Python sketch using the same endpoint follows the examples below.

1. Single Text Input

   ```bash
   curl http://localhost:6000/v1/embeddings \
     -X POST \
     -d '{"input":"Hello, world!"}' \
     -H 'Content-Type: application/json'
   ```

2. Multiple Text Inputs with Parameters

   ```bash
   curl http://localhost:6000/v1/embeddings \
     -X POST \
     -d '{"input":["Hello, world!","How are you?"], "dimensions":100}' \
     -H 'Content-Type: application/json'
   ```
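The same requests can be issued from Python. A minimal sketch, assuming the microservice is reachable on `localhost:6000` as configured above and `requests` is installed:

```python
# Call the OPEA embedding microservice's OpenAI-compatible endpoint.
import requests

resp = requests.post(
    "http://localhost:6000/v1/embeddings",
    json={"input": ["Hello, world!", "How are you?"], "dimensions": 100},
    timeout=30,
)
resp.raise_for_status()
for item in resp.json()["data"]:
    print(len(item["embedding"]))  # one vector per input text
```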
## ✨ Tips for Better Understanding:

1. Port Mapping:
   Ensure the ports are correctly mapped to avoid conflicts with other services.

2. Model Selection:
   Choose a model appropriate for your use case, such as "BAAI/bge-large-en-v1.5" or "BAAI/bge-base-en-v1.5".
   The model should be exported to the model repository and set in the `MODEL_ID` environment variable of the OPEA API wrapper deployment.

3. Model Repository Volume:
   The `-v ./models:/models` flag ensures the model repository is correctly mounted.

4. Select the Correct Configuration JSON File:
   A model repository can host multiple models. Choose the models to be served by selecting the right configuration file, `config_embeddings.json` in the example above.

5. Upload the Models to a Persistent Volume Claim in Kubernetes:
   The model repository, including the configuration JSON file, is mounted into the OVMS containers when deployed via the [helm chart](../../third_parties/ovms/deployment/kubernetes/README.md).

6. Learn more about the [OVMS embeddings API](https://docs.openvino.ai/2025/model-server/ovms_docs_rest_api_embeddings.html).
comps/embeddings/src/integrations/ovms.py

Lines changed: 100 additions & 0 deletions
@@ -0,0 +1,100 @@
1+
# Copyright (C) 2024 Intel Corporation
2+
# SPDX-License-Identifier: Apache-2.0
3+
4+
import json
5+
import os
6+
from typing import List, Union
7+
8+
import requests
9+
from huggingface_hub import AsyncInferenceClient
10+
11+
from comps import CustomLogger, OpeaComponent, OpeaComponentRegistry, ServiceType
12+
from comps.cores.mega.utils import get_access_token
13+
from comps.cores.proto.api_protocol import EmbeddingRequest, EmbeddingResponse
14+
15+
logger = CustomLogger("opea_ovms_embedding")
16+
logflag = os.getenv("LOGFLAG", False)
17+
TOKEN_URL = os.getenv("TOKEN_URL")
18+
CLIENTID = os.getenv("CLIENTID")
19+
CLIENT_SECRET = os.getenv("CLIENT_SECRET")
20+
MODEL_ID = os.getenv("MODEL_ID")
21+
22+
23+
@OpeaComponentRegistry.register("OPEA_OVMS_EMBEDDING")
24+
class OpeaOVMSEmbedding(OpeaComponent):
25+
"""A specialized embedding component derived from OpeaComponent for OVMS embedding services.
26+
27+
Attributes:
28+
client (AsyncInferenceClient): An instance of the async client for embedding generation.
29+
model_name (str): The name of the embedding model used.
30+
"""
31+
32+
def __init__(self, name: str, description: str, config: dict = None):
33+
super().__init__(name, ServiceType.EMBEDDING.name.lower(), description, config)
34+
self.base_url = os.getenv("OVMS_EMBEDDING_ENDPOINT", "http://localhost:8080")
35+
self.client = self._initialize_client()
36+
37+
health_status = self.check_health()
38+
if not health_status:
39+
logger.error("OpeaOVMSEmbedding health check failed.")
40+
41+
def _initialize_client(self) -> AsyncInferenceClient:
42+
"""Initializes the AsyncInferenceClient."""
43+
access_token = (
44+
get_access_token(TOKEN_URL, CLIENTID, CLIENT_SECRET) if TOKEN_URL and CLIENTID and CLIENT_SECRET else None
45+
)
46+
headers = {"Authorization": f"Bearer {access_token}"} if access_token else {}
47+
return AsyncInferenceClient(
48+
model=MODEL_ID,
49+
token=os.getenv("HUGGINGFACEHUB_API_TOKEN"),
50+
headers=headers,
51+
)
52+
53+
async def invoke(self, input: EmbeddingRequest) -> EmbeddingResponse:
54+
"""Invokes the embedding service to generate embeddings for the provided input.
55+
56+
Args:
57+
input (EmbeddingRequest): The input in OpenAI embedding format, including text(s) and optional parameters like model.
58+
59+
Returns:
60+
EmbeddingResponse: The response in OpenAI embedding format, including embeddings, model, and usage information.
61+
"""
62+
# Parse input according to the EmbeddingRequest format
63+
if isinstance(input.input, str):
64+
texts = [input.input.replace("\n", " ")]
65+
elif isinstance(input.input, list):
66+
if all(isinstance(item, str) for item in input.input):
67+
texts = [text.replace("\n", " ") for text in input.input]
68+
else:
69+
raise ValueError("Invalid input format: Only string or list of strings are supported.")
70+
else:
71+
raise TypeError("Unsupported input type: input must be a string or list of strings.")
72+
response = await self.client.post(
73+
json={
74+
"input": texts,
75+
"encoding_format": input.encoding_format,
76+
"model": self.client.model,
77+
"user": input.user,
78+
},
79+
model=f"{self.base_url}/v3/embeddings",
80+
task="text-embedding",
81+
)
82+
embeddings = json.loads(response.decode())
83+
return EmbeddingResponse(**embeddings)
84+
85+
def check_health(self) -> bool:
86+
"""Checks the health of the embedding service.
87+
88+
Returns:
89+
bool: True if the service is reachable and healthy, False otherwise.
90+
"""
91+
try:
92+
response = requests.get(f"{self.base_url}/v2/health/ready")
93+
if response.status_code == 200:
94+
return True
95+
else:
96+
return False
97+
except Exception as e:
98+
# Handle connection errors, timeouts, etc.
99+
logger.error(f"Health check failed: {e}")
100+
return False
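For reference, a minimal sketch of exercising this component standalone, outside the registry-driven microservice. It assumes `OVMS_EMBEDDING_ENDPOINT` and `MODEL_ID` are exported, an OVMS instance is already serving the model, the script runs from `comps/embeddings/src` (so the `integrations` package is importable), and that `EmbeddingRequest` only requires `input`:

```python
# Hypothetical standalone usage of OpeaOVMSEmbedding; in the microservice this
# component is created through OpeaComponentRegistry instead.
import asyncio

from comps.cores.proto.api_protocol import EmbeddingRequest
from integrations.ovms import OpeaOVMSEmbedding


async def main():
    embedder = OpeaOVMSEmbedding(
        name="OPEA_OVMS_EMBEDDING",
        description="OVMS embedding component",
        config={},
    )
    request = EmbeddingRequest(input=["What is Deep Learning?", "Hello, world!"])
    response = await embedder.invoke(request)
    # Each entry in response.data carries one embedding vector.
    print(len(response.data), len(response.data[0].embedding))


asyncio.run(main())
```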

comps/embeddings/src/opea_embedding_microservice.py

Lines changed: 1 addition & 0 deletions
@@ -5,6 +5,7 @@
 import time

 from integrations.clip import OpeaClipEmbedding
+from integrations.ovms import OpeaOVMSEmbedding
 from integrations.predictionguard import PredictionguardEmbedding
 from integrations.tei import OpeaTEIEmbedding


comps/llms/deployment/docker_compose/compose_text-generation.yaml

Lines changed: 12 additions & 0 deletions
@@ -5,6 +5,7 @@ include:
   - ../../../third_parties/tgi/deployment/docker_compose/compose.yaml
   - ../../../third_parties/vllm/deployment/docker_compose/compose.yaml
   - ../../../third_parties/ollama/deployment/docker_compose/compose.yaml
+  - ../../../third_parties/ovms/deployment/docker_compose/compose.yaml

 services:
   textgen:
@@ -100,6 +101,17 @@ services:
     environment:
       LLM_COMPONENT_NAME: ${LLM_COMPONENT_NAME:-OpeaTextGenNative}

+  textgen-service-ovms:
+    extends: textgen
+    container_name: textgen-service-ovms
+    environment:
+      LLM_COMPONENT_NAME: ${LLM_COMPONENT_NAME:-OpeaTextGenService}
+      OVMS_LLM_ENDPOINT: ${OVMS_LLM_ENDPOINT}
+      MODEL_ID: ${MODEL_ID}
+    depends_on:
+      ovms-llm-serving:
+        condition: service_healthy
+
 networks:
   default:
     driver: bridge
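Once `textgen-service-ovms` is up, it can be queried like the other text-generation backends. A hypothetical smoke test, assuming the textgen wrapper's usual port 9000 and OpenAI-style `/v1/chat/completions` route, with the model name matching the `MODEL_ID` served by OVMS:

```python
# Hypothetical request to the textgen-service-ovms wrapper; port and route are
# the usual textgen defaults and may differ in your deployment.
import requests

payload = {
    "model": "meta-llama/Meta-Llama-3-8B-Instruct",  # must match the MODEL_ID exported to OVMS
    "messages": [{"role": "user", "content": "What is Deep Learning?"}],
    "max_tokens": 64,
    "stream": False,
}
resp = requests.post("http://localhost:9000/v1/chat/completions", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```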

comps/llms/src/text-generation/Dockerfile

Lines changed: 2 additions & 1 deletion
@@ -11,11 +11,12 @@ RUN useradd -m -s /bin/bash user && \
     mkdir -p /home/user && \
     chown -R user /home/user/

-COPY comps /home/user/comps
+COPY comps/llms/src/text-generation/requirements.txt /home/user/comps/llms/src/text-generation/requirements.txt

 RUN pip install --no-cache-dir --upgrade pip setuptools && \
     pip install --no-cache-dir -r /home/user/comps/llms/src/text-generation/requirements.txt

+COPY comps /home/user/comps
 ENV PYTHONPATH=$PYTHONPATH:/home/user

 USER user

comps/llms/src/text-generation/README.md

Lines changed: 12 additions & 12 deletions
@@ -8,18 +8,18 @@ Overall, this microservice offers a streamlined way to integrate large language

 ## Validated LLM Models

-| Model                                       | TGI-Gaudi | vLLM-CPU | vLLM-Gaudi |
-| ------------------------------------------- | --------- | -------- | ---------- |
-| [Intel/neural-chat-7b-v3-3]                 | ✓         | ✓        | ✓          |
-| [meta-llama/Llama-2-7b-chat-hf]             | ✓         | ✓        | ✓          |
-| [meta-llama/Llama-2-70b-chat-hf]            | ✓         | -        | ✓          |
-| [meta-llama/Meta-Llama-3-8B-Instruct]       | ✓         | ✓        | ✓          |
-| [meta-llama/Meta-Llama-3-70B-Instruct]      | ✓         | -        | ✓          |
-| [Phi-3]                                     | x         | Limit 4K | Limit 4K   |
-| [deepseek-ai/DeepSeek-R1-Distill-Llama-70B] | ✓         | -        | ✓          |
-| [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B]  | ✓         | -        | ✓          |
-| [mistralai/Mistral-Small-24B-Instruct-2501] | ✓         | -        | ✓          |
-| [mistralai/Mistral-Large-Instruct-2411]     | x         | -        | ✓          |
+| Model                                       | TGI-Gaudi | vLLM-CPU | vLLM-Gaudi | OVMS     |
+| ------------------------------------------- | --------- | -------- | ---------- | -------- |
+| [Intel/neural-chat-7b-v3-3]                 | ✓         | ✓        | ✓          | ✓        |
+| [meta-llama/Llama-2-7b-chat-hf]             | ✓         | ✓        | ✓          | ✓        |
+| [meta-llama/Llama-2-70b-chat-hf]            | ✓         | -        | ✓          | -        |
+| [meta-llama/Meta-Llama-3-8B-Instruct]       | ✓         | ✓        | ✓          | ✓        |
+| [meta-llama/Meta-Llama-3-70B-Instruct]      | ✓         | -        | ✓          | -        |
+| [Phi-3]                                     | x         | Limit 4K | Limit 4K   | Limit 4K |
+| [deepseek-ai/DeepSeek-R1-Distill-Llama-70B] | ✓         | -        | ✓          | -        |
+| [deepseek-ai/DeepSeek-R1-Distill-Qwen-32B]  | ✓         | -        | ✓          | -        |
+| [mistralai/Mistral-Small-24B-Instruct-2501] | ✓         | -        | ✓          | -        |
+| [mistralai/Mistral-Large-Instruct-2411]     | x         | -        | ✓          | -        |

 ### System Requirements for LLM Models
0 commit comments
