Description
Hi maintainers,
I followed the README at https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/docker/gpu and tried to deploy ChatQnA with docker-compose. However, the tei-reranking-server and tei-embedding-server containers failed to start. Here are the error logs:
$ docker logs tei-reranking-server
2024-07-23T04:25:26.629348Z INFO text_embeddings_router: router/src/main.rs:140: Args { model_id: "BAA*/***-********-*ase", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, hf_api_token: None, hostname: "427706bc91b6", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, cors_allow_origin: None }
2024-07-23T04:25:26.635961Z INFO hf_hub: /root/.cargo/git/checkouts/hf-hub-1aadb4c6e2cbe1ba/b167f69/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"
2024-07-23T04:25:28.061305Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:20: Starting download
2024-07-23T04:25:28.061547Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:37: Model artifacts downloaded in 254.674µs
2024-07-23T04:25:28.900779Z WARN text_embeddings_router: router/src/lib.rs:165: Could not find a Sentence Transformers config
2024-07-23T04:25:28.900827Z INFO text_embeddings_router: router/src/lib.rs:169: Maximum number of tokens per request: 512
2024-07-23T04:25:28.925017Z INFO text_embeddings_core::tokenization: core/src/tokenization.rs:23: Starting 112 tokenization workers
2024-07-23T04:25:51.644147Z INFO text_embeddings_router: router/src/lib.rs:194: Starting model backend
Error: Could not create backend
Caused by:
Could not start backend: Runtime compute cap 90 is not compatible with compile time compute cap 80
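For context on the error: the "compute cap" values TEI logs are CUDA compute capabilities multiplied by 10, so 80 means Ampere (e.g. A100) and 90 means Hopper (e.g. H100). A tiny illustrative mapping (my own helper for reasoning about the error, not part of TEI):

```python
# Illustrative mapping from CUDA compute capability (x10, as TEI logs it)
# to GPU architecture. Hypothetical helper, not part of TEI.
ARCH_BY_COMPUTE_CAP = {
    75: "Turing",
    80: "Ampere (A100)",
    86: "Ampere (consumer)",
    89: "Ada Lovelace",
    90: "Hopper (H100)",
}

def arch_for(cap: int) -> str:
    """Return the architecture name for a TEI-style compute cap value."""
    return ARCH_BY_COMPUTE_CAP.get(cap, f"unknown (compute cap {cap})")
```

So "Runtime compute cap 90 is not compatible with compile time compute cap 80" means the image's CUDA kernels were compiled for Ampere, but the runtime GPU is Hopper.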
I root-caused the issue: the image ghcr.io/huggingface/text-embeddings-inference:1.2 used by the reranking and embedding services is incompatible with some GPUs. For example, my H100 card is Hopper architecture, so it should use the image ghcr.io/huggingface/text-embeddings-inference:hopper-1.5. See the compatibility table at https://github.com/huggingface/text-embeddings-inference/tree/main.
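For reference, the fix amounts to swapping the image tag on the two TEI services in the compose file. A minimal sketch, assuming service names and layout similar to the ChatQnA compose file (the exact service definitions may differ):

```yaml
# Hypothetical docker-compose excerpt for Hopper GPUs (H100).
# Only the image tag change matters; service names are illustrative.
services:
  tei-embedding-server:
    # was: ghcr.io/huggingface/text-embeddings-inference:1.2 (compiled for compute cap 80 / Ampere)
    image: ghcr.io/huggingface/text-embeddings-inference:hopper-1.5
  tei-reranking-server:
    image: ghcr.io/huggingface/text-embeddings-inference:hopper-1.5
```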
I filed a PR to fix this issue. Please correct me if I am wrong, or let me know if you have a better fix.