Easy, advanced inference platform for large language models on Kubernetes
> 🌱 llmaz is in alpha now, so the API may change before graduating to Beta.
## Architecture
## Features Overview
- **Ease of Use**: People can quickly deploy an LLM service with minimal configurations.
- **Broad Backend Support**: llmaz supports a wide range of advanced inference backends for different scenarios, like [vLLM](https://github.com/vllm-project/vllm), [SGLang](https://github.com/sgl-project/sglang), [llama.cpp](https://github.com/ggerganov/llama.cpp). Find the full list of supported backends [here](./docs/support-backends.md).
Read the [Installation](./docs/installation.md) guide for setup instructions.
### Deploy
Here's a toy sample for deploying `facebook/opt-125m`; all you need to do is apply a `Model` and a `Playground`.
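A hedged sketch of what those two manifests might look like is below. The API group/version and field names (`familyName`, `modelHub.modelID`, `modelClaim.modelName`) are assumptions for illustration, not the authoritative schema, so treat the examples directory as the source of truth:

```yaml
# Illustrative only: group/version and field names are assumptions,
# not the authoritative schema -- see docs/examples for real manifests.
apiVersion: llmaz.io/v1alpha1
kind: OpenModel
metadata:
  name: opt-125m                      # cluster-scoped, so no namespace
spec:
  familyName: opt                     # assumed field: the model family
  source:
    modelHub:
      modelID: facebook/opt-125m      # assumed field: Hugging Face model ID
---
apiVersion: inference.llmaz.io/v1alpha1
kind: Playground
metadata:
  name: opt-125m
spec:
  replicas: 1
  modelClaim:
    modelName: opt-125m               # assumed field: binds to the OpenModel above
```

With manifests in this shape, a single `kubectl apply -f` of both objects would be the whole deployment step, with the inference backend left to the `Playground` defaults.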
Please refer to **[examples](/docs/examples/README.md)** to learn more.
A development guide for people who want to learn more about this project.
## Project Structure
```structure
llmaz      # root
├── llmaz  # where the model loader logic lives
├── pkg    # where the main logic for the Kubernetes controllers lives
```
## API Design
### Core APIs
**OpenModel**: `OpenModel` stores open-sourced models as a cluster-scope object. We may need namespaced models in the future for tenant isolation. Usually, the cloud provider or the model provider should set this object, because they know the models well, e.g. the suitable accelerators or the scaling primitives.
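As a rough illustration of the cluster-scoped shape described above (field names are assumptions, not the authoritative schema):

```yaml
# Hypothetical OpenModel sketch; note there is no metadata.namespace,
# since OpenModel is cluster-scoped. All spec field names are assumed.
apiVersion: llmaz.io/v1alpha1
kind: OpenModel
metadata:
  name: opt-125m
spec:
  familyName: opt                     # assumed field: the model family
  source:
    modelHub:
      modelID: facebook/opt-125m      # assumed field: where to pull the weights from
  # a provider could also declare accelerator requirements or scaling
  # primitives here, per the description above
```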
### Inference APIs
**Playground**: `Playground` is for easy usage; people who have little knowledge about the cloud can quickly deploy a large language model with minimal configurations. `Playground` is already integrated with the SOTA inference engines, like vLLM.
**Service**: `Service` is the real inference workload. People with advanced configuration requirements can deploy with `Service` directly when `Playground` cannot meet their demands, e.g. they have a customized inference engine that hasn't been integrated with llmaz yet, or they have different topology requirements for the Pods.