Easy, advanced inference platform for large language models on Kubernetes
> 🌱 llmaz is alpha now, so the API may change before graduating to Beta.
## Architecture


## Features Overview
- **Ease of Use**: People can quickly deploy an LLM service with minimal configuration.
- **Broad Backend Support**: llmaz supports a wide range of advanced inference backends for different scenarios, such as [vLLM](https://github.com/vllm-project/vllm), [SGLang](https://github.com/sgl-project/sglang), and [llama.cpp](https://github.com/ggerganov/llama.cpp). Find the full list of supported backends [here](./docs/support-backends.md).
- **Various Model Providers**: llmaz supports a wide range of model providers, such as [HuggingFace](https://huggingface.co/), [ModelScope](https://www.modelscope.cn), and object stores (Aliyun OSS, with more on the way). llmaz handles model loading automatically, requiring no effort from users.
- **Multi-Host Support**: llmaz supports both single-host and multi-host scenarios with [LWS](https://github.com/kubernetes-sigs/lws) from day one.
## Quick Start
### Installation
Read the [Installation](./docs/installation.md) for guidance.
### Deploy
Here's a toy example of deploying `facebook/opt-125m`; all you need to do
is to apply a `Model` and a `Playground`.
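As a hedged sketch of what those two manifests might look like (the API groups, versions, and field names below are assumptions for illustration only; the manifests under **[examples](/docs/examples/README.md)** are authoritative):

```yaml
# Illustrative sketch -- apiVersion values and spec fields are assumptions,
# not the confirmed CRD schema. See docs/examples for real manifests.
apiVersion: llmaz.io/v1alpha1
kind: OpenModel
metadata:
  name: opt-125m
spec:
  familyName: opt
  source:
    modelHub:
      modelID: facebook/opt-125m    # pulled automatically by the model loader
---
apiVersion: inference.llmaz.io/v1alpha1
kind: Playground
metadata:
  name: opt-125m
spec:
  replicas: 1
  modelClaim:
    modelName: opt-125m             # references the OpenModel above
```

Applying a file like this with `kubectl apply -f <file>` should bring up the inference workload.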
Please refer to **[examples](/docs/examples/README.md)** to learn more.
A development guide for people who want to learn more about this project.
## Project Structure
```structure
llmaz # root
├── llmaz # where the model loader logic lives
├── pkg # where the main logic for the Kubernetes controllers lives
```
## API Design
### Core APIs
**OpenModel**: `OpenModel` mostly stores open-sourced models as a cluster-scoped object. We may need namespaced models in the future for tenant isolation. Usually, the cloud provider or model provider should set this object, because they know the models well, e.g. the required accelerators or the scaling primitives.
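For illustration, a provider-set `OpenModel` might look like the sketch below. The API version and the flavor fields are assumptions, not the confirmed schema; they only show where accelerator hints could live:

```yaml
# Illustrative only: apiVersion and the inferenceFlavors fields are assumptions.
apiVersion: llmaz.io/v1alpha1
kind: OpenModel            # cluster-scoped, so no namespace is set
metadata:
  name: llama2-7b
spec:
  familyName: llama2
  source:
    modelHub:
      modelID: meta-llama/Llama-2-7b-hf
  inferenceFlavors:        # hypothetical accelerator hints preset by the provider
    - name: a100
      requests:
        nvidia.com/gpu: 1
```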
### Inference APIs
**Playground**: `Playground` is for ease of use; people with little cloud knowledge can quickly deploy a large language model with minimal configuration. `Playground` is already integrated with SOTA inference engines, such as vLLM.
**Service**: `Service` is the real inference workload. People with advanced configuration requirements can deploy with `Service` directly if `Playground` cannot meet their demands, e.g. when they use a customized inference engine that hasn't been integrated with llmaz yet, or when they have different topology requirements for aligning the Pods.
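As a very rough sketch of that escape hatch (every field below is an assumption; consult the actual `Service` CRD before relying on any of it), a `Service` could claim a model while supplying a custom engine image:

```yaml
# Hypothetical sketch -- the Service schema shown here is assumed, not confirmed.
apiVersion: inference.llmaz.io/v1alpha1
kind: Service
metadata:
  name: custom-engine
spec:
  modelClaims:
    models:
      - name: opt-125m              # references an existing OpenModel
  workloadTemplate:                 # full Pod-level control for advanced users
    spec:
      containers:
        - name: inference
          image: registry.example.com/custom-engine:latest  # engine not yet integrated with llmaz
```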