
Commit ef90fbb

Authored by yinghu5, pre-commit-ci[bot], and mkbhanda
Update README.md for Multiplatforms (#707)
* Update README.md for Multiplatforms

  Update README.md for Gaudi & Multiplatforms

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

* Update comps/llms/text-generation/vllm/ray/README.md

  Co-authored-by: Malini Bhandaru <[email protected]>

* Update comps/llms/text-generation/vllm/ray/README.md

  Co-authored-by: Malini Bhandaru <[email protected]>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

  for more information, see https://pre-commit.ci

* Update comps/llms/text-generation/vllm/ray/README.md

  Co-authored-by: Malini Bhandaru <[email protected]>

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Malini Bhandaru <[email protected]>
1 parent 3a31295 commit ef90fbb

File tree: 1 file changed (+3, −3 lines)


comps/llms/text-generation/vllm/ray/README.md

Lines changed: 3 additions & 3 deletions
```diff
@@ -1,10 +1,10 @@
 # VLLM-Ray Endpoint Service
 
-[Ray](https://docs.ray.io/en/latest/serve/index.html) is an LLM serving solution that makes it easy to deploy and manage a variety of open source LLMs, built on [Ray Serve](https://docs.ray.io/en/latest/serve/index.html), has native support for autoscaling and multi-node deployments, which is easy to use for LLM inference serving on Intel Gaudi2 accelerators. The Intel Gaudi2 accelerator supports both training and inference for deep learning models in particular for LLMs. Please visit [Habana AI products](<(https://habana.ai/products)>) for more details.
+[Ray](https://docs.ray.io/en/latest/serve/index.html) is an LLM serving solution that makes it easy to deploy and manage a variety of open source LLMs. Built on [Ray Serve](https://docs.ray.io/en/latest/serve/index.html) it has native support for autoscaling and multi-node deployments, and is easy to use for LLM inference serving across multiple platforms.
 
-[vLLM](https://github.com/vllm-project/vllm) is a fast and easy-to-use library for LLM inference and serving, it delivers state-of-the-art serving throughput with a set of advanced features such as PagedAttention, Continuous batching and etc.. Besides GPUs, vLLM already supported [Intel CPUs](https://www.intel.com/content/www/us/en/products/overview.html) and [Gaudi accelerators](https://habana.ai/products).
+[vLLM](https://github.com/vllm-project/vllm) is a fast and easy-to-use library for LLM inference and serving, it delivers state-of-the-art serving throughput with a set of advanced features such as PagedAttention and Continuous Batching among others. Besides GPUs, vLLM supports [Intel CPUs](https://www.intel.com/content/www/us/en/products/overview.html) and [Intel Gaudi accelerators](https://habana.ai/products).
 
-This guide provides an example on how to launch vLLM with Ray serve endpoint on Gaudi accelerators.
+This guide provides an example on how to launch vLLM with Ray serve endpoint on [Intel Gaudi2 Accelerator](https://www.intel.com/content/www/us/en/products/details/processors/ai-accelerators/gaudi-overview.html).
 
 ## Set up environment
 
```
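The updated paragraphs above describe vLLM served through Ray as an inference endpoint. vLLM deployments typically expose an OpenAI-compatible HTTP API; assuming the vLLM-Ray service here does the same, the sketch below shows how such an endpoint is commonly queried. The base URL, port, and model name are illustrative assumptions, not values confirmed by this commit; see comps/llms/text-generation/vllm/ray/README.md for the actual launch and query instructions.

```python
# Minimal sketch of querying a vLLM-on-Ray endpoint through its
# OpenAI-compatible API. The base URL, port, and model name below are
# illustrative assumptions; check the component README for the real values.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8006/v1",  # assumed host/port of the Ray Serve endpoint
    api_key="EMPTY",  # vLLM's OpenAI-compatible server does not validate the key
)

response = client.chat.completions.create(
    model="meta-llama/Llama-2-7b-chat-hf",  # assumed model; use whatever the service loaded
    messages=[{"role": "user", "content": "What is Ray Serve?"}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```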
