[ChatQnA] Switch to vLLM as default llm backend on Xeon #1403

Merged
24 commits merged into opea-project:main on Jan 17, 2025

Conversation

Collaborator

@wangkl2 wangkl2 commented Jan 16, 2025

Description

Switch to vLLM as the default LLM backend on Xeon for the ChatQnA pipeline.

Switching from TGI to vLLM as the default LLM serving backend on Xeon for the ChatQnA example to improve performance. Benchmarking the LLM component on a Xeon server with both vLLM and TGI backends, across different ISL/OSL combinations and various numbers of queries and concurrency levels, shows that the geometric mean of the measured LLM serving performance on a 7B model improves with vLLM over TGI on several metrics, including average total latency, average TTFT, average TPOT, and throughput. TGI is still offered as a deployment option for LLM serving. In addition, vLLM replaces TGI for the other provided end-to-end ChatQnA pipelines, including the without-rerank pipeline and the pipelines using Pinecone or Qdrant as the vector DB. This PR also aligns the LLM service parameters in all ChatQnA test scripts with those in the README.
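
Because vLLM exposes an OpenAI-compatible API, a quick way to confirm the swapped-in backend is serving requests is to hit its /v1 endpoints directly. The sketch below is illustrative only: the host port (9009) and model ID are assumptions, so adjust them to whatever your compose/.env settings actually export.

```python
# Minimal smoke test against the vLLM serving endpoint started by the Xeon compose file.
# The host port and model ID below are placeholders; match them to your deployment.
import requests

LLM_ENDPOINT = "http://localhost:9009"              # assumed host:port mapping
MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"    # assumed default model

# vLLM serves an OpenAI-compatible API, so /v1/models and /v1/chat/completions apply.
models = requests.get(f"{LLM_ENDPOINT}/v1/models", timeout=10).json()
print("served models:", [m["id"] for m in models.get("data", [])])

resp = requests.post(
    f"{LLM_ENDPOINT}/v1/chat/completions",
    json={
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": "What is OPEA?"}],
        "max_tokens": 64,
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```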

Issues

#1213

Type of change

  • New feature (non-breaking change which adds new functionality)
  • Others (enhancement, documentation, validation, etc.)

Dependencies

n/a

Tests

TGI version: 2.4.0
vLLM version: 0.6.6.post2.dev151+gbd828722

Benchmarked and compared the LLM serving performance on a GNR server with out-of-the-box vLLM and out-of-the-box TGI backends via GenAIEval. The geometric-mean performance of vLLM is better than that of TGI for average total latency, average TTFT, average TPOT, and throughput on a 7B LLM across 4 ISL/OSL combinations (128/128, 128/1024, 1024/128, 1024/1024), measured at different num_queries/concurrency settings, including 32/8 and 128/32.
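
For reference, the comparison above summarizes per-configuration results with a geometric mean, e.g. over the vLLM/TGI ratios for each metric. The sketch below uses hypothetical placeholder numbers, not the measured results, purely to show the calculation.

```python
# Illustration of a geometric-mean comparison across ISL/OSL configurations.
# The latency values below are hypothetical placeholders, NOT measured results.
from math import prod

def geomean(values):
    """Geometric mean of a list of positive numbers."""
    return prod(values) ** (1.0 / len(values))

# Hypothetical average total latency (seconds) per ISL/OSL configuration.
tgi_latency  = {"128/128": 2.0, "128/1024": 9.0, "1024/128": 3.0, "1024/1024": 11.0}
vllm_latency = {"128/128": 1.6, "128/1024": 7.5, "1024/128": 2.4, "1024/1024": 9.0}

# A ratio below 1.0 means vLLM is faster for that configuration;
# the geomean condenses all configurations into a single summary figure.
ratios = [vllm_latency[cfg] / tgi_latency[cfg] for cfg in tgi_latency]
print(f"geomean vLLM/TGI latency ratio: {geomean(ratios):.3f}")
```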

wangkl2 and others added 15 commits January 15, 2025 06:20
Switch from TGI to vLLM as the default LLM serving backend on Xeon for the ChatQnA example to improve performance. Benchmarking the LLM component on a Xeon server with both backends, across different ISL/OSL combinations and various query counts and concurrency levels, shows that the geometric mean of the measured LLM serving performance on a 7B model improves with vLLM over TGI on several metrics, including average total latency, average TTFT, average TPOT, and throughput. TGI remains available as a deployment option, and vLLM also replaces TGI in the other provided end-to-end ChatQnA pipelines, including the without-rerank pipeline and the pipelines using Pinecone or Qdrant as the vector DB.

Implement opea-project#1213

Signed-off-by: Wang, Kai Lawrence <[email protected]>

github-actions bot commented Jan 16, 2025

Dependency Review

✅ No vulnerabilities or license issues found.

@wangkl2 wangkl2 requested review from chensuyue and XinyaoWa January 16, 2025 15:44
@joshuayao joshuayao requested a review from yao531441 January 17, 2025 08:13
@chensuyue chensuyue merged commit 742cb6d into opea-project:main Jan 17, 2025
18 checks passed
chyundunovDatamonsters pushed a commit to chyundunovDatamonsters/OPEA-GenAIExamples that referenced this pull request Mar 4, 2025
…#1403)

Switching from TGI to vLLM as the default LLM serving backend on Xeon for the ChatQnA example to improve performance.

opea-project#1213
Signed-off-by: Wang, Kai Lawrence <[email protected]>
Signed-off-by: Chingis Yundunov <[email protected]>
@joshuayao joshuayao added this to the v1.3 milestone Mar 7, 2025