Commit 574fadb

Add vllm Arc Dockerfile support
Support vllm inference on Intel ARC GPU

Signed-off-by: Li Gang <[email protected]>
Co-authored-by: Chen, Hu1 <[email protected]>
1 parent 9f68bd3 commit 574fadb

File tree

2 files changed: +28 -0 lines changed
Dockerfile — 10 additions & 0 deletions

```dockerfile
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

FROM intelanalytics/ipex-llm-serving-vllm-xpu-experiment:2.1.0b2

COPY comps/llms/text-generation/vllm/vllm_arc.sh /llm

RUN chmod +x /llm/vllm_arc.sh

ENTRYPOINT ["/llm/vllm_arc.sh"]
```
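The Dockerfile only packages the launcher script on top of the IPEX-LLM serving base image. A sketch of how the image might be built and run — the image tag and Dockerfile path are placeholders, not part of this commit; `/dev/dri` is the standard device node for exposing an Intel GPU to a container:

```shell
# Build the image (the -f path and tag are hypothetical examples).
docker build -f Dockerfile -t vllm-arc:latest .

# Run it, passing the Arc GPU through via /dev/dri and publishing
# port 9009, which the launcher script listens on. LLM_MODEL_ID is
# optional; the script falls back to Intel/neural-chat-7b-v3-3.
docker run --rm \
  --device /dev/dri \
  -p 9009:9009 \
  -e LLM_MODEL_ID=Intel/neural-chat-7b-v3-3 \
  vllm-arc:latest
```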
comps/llms/text-generation/vllm/vllm_arc.sh — 18 additions & 0 deletions

```bash
#!/bin/bash

# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

# Default model; override via the LLM_MODEL_ID environment variable.
LLM_MODEL_ID="${LLM_MODEL_ID:-Intel/neural-chat-7b-v3-3}"

# Set up the oneAPI and oneCCL environments.
source /opt/intel/oneapi/setvars.sh
source /opt/intel/1ccl-wks/setvars.sh

# Launch the IPEX-LLM vLLM OpenAI-compatible API server on the XPU,
# forwarding any extra arguments to the server.
python -m ipex_llm.vllm.xpu.entrypoints.openai.api_server \
    --port 9009 \
    --model "${LLM_MODEL_ID}" \
    --trust-remote-code \
    --gpu-memory-utilization 0.9 \
    --device xpu \
    --enforce-eager \
    "$@"
```
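The script's first non-comment line uses shell default expansion so the model can be overridden from the environment (for example via `docker run -e LLM_MODEL_ID=...`). A minimal sketch of that behavior — the override value `my-org/my-model` is a made-up example:

```shell
# ${VAR:-default} substitutes the default only when VAR is unset or empty.
unset LLM_MODEL_ID
LLM_MODEL_ID="${LLM_MODEL_ID:-Intel/neural-chat-7b-v3-3}"
echo "$LLM_MODEL_ID"   # prints the default: Intel/neural-chat-7b-v3-3

LLM_MODEL_ID="my-org/my-model"   # hypothetical user override
LLM_MODEL_ID="${LLM_MODEL_ID:-Intel/neural-chat-7b-v3-3}"
echo "$LLM_MODEL_ID"   # prints the override: my-org/my-model
```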
