
Commit 9307e33

Merge pull request #129 from kerthcet/doc/add-architecture
Add Architecture diagram
2 parents: fcc3543 + 920f586

File tree

3 files changed: +35 -13 lines changed

README.md

Lines changed: 12 additions & 13 deletions
@@ -1,7 +1,7 @@
 <p align="center">
   <picture>
     <source media="(prefers-color-scheme: dark)" srcset="./docs/assets/logo.png">
-    <img alt="llmaz" src="https://github.com/InftyAI/llmaz/blob/main/docs/assets/logo.png" width=55%>
+    <img alt="llmaz" src="./docs/assets/logo.png" width=55%>
   </picture>
 </p>

@@ -20,11 +20,11 @@ Easy, advanced inference platform for large language models on Kubernetes

 > 🌱 llmaz is alpha now, so the API may change before graduating to Beta.

-## Concept
+## Architecture

-![image](./docs/assets/overview.png)
+![image](./docs/assets/arch.png)

-## Feature Overview
+## Features Overview

 - **Ease of Use**: People can quickly deploy an LLM service with minimal configurations.
 - **Broad Backend Support**: llmaz supports a wide range of advanced inference backends for different scenarios, like [vLLM](https://github.com/vllm-project/vllm), [SGLang](https://github.com/sgl-project/sglang), [llama.cpp](https://github.com/ggerganov/llama.cpp). Find the full list of supported backends [here](./docs/support-backends.md).
@@ -42,7 +42,7 @@ Read the [Installation](./docs/installation.md) for guidance.

 ### Deploy

-Here's a simplest sample for deploying `facebook/opt-125m`, all you need to do
+Here's a toy sample for deploying `facebook/opt-125m`; all you need to do
 is to apply a `Model` and a `Playground`.

 Please refer to **[examples](/docs/examples/README.md)** to learn more.
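For orientation, the `Model`-plus-`Playground` pair this Deploy step applies might look roughly like the sketch below. This is a hedged illustration, not taken from the commit: the API groups, kinds (the README hunk says `Model`, while the new develop.md says `OpenModel`), and field names such as `familyName`, `modelHub`, `modelClaim`, and `replicas` are assumptions modeled on typical llmaz examples; the manifests under docs/examples are authoritative.

```yaml
# Hypothetical sketch of the two objects the Deploy step refers to.
# API versions, kinds, and field names are assumptions, not confirmed by this diff.
apiVersion: llmaz.io/v1alpha1
kind: OpenModel                      # the README hunk calls this `Model`
metadata:
  name: opt-125m
spec:
  familyName: opt                    # model family, e.g. opt, llama
  source:
    modelHub:
      modelID: facebook/opt-125m     # model pulled by the model loader
---
apiVersion: inference.llmaz.io/v1alpha1
kind: Playground
metadata:
  name: opt-125m
spec:
  replicas: 1
  modelClaim:
    modelName: opt-125m              # references the model object above
```

Under these assumptions, applying both objects (e.g. `kubectl apply -f playground.yaml`) is the entire deploy step; the Playground controller turns the claim into a running inference backend.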
@@ -107,6 +107,10 @@ curl http://localhost:8080/v1/completions \
   }'
 ```

+### More than QuickStart
+
+If you want to learn more about this project, please refer to [develop.md](./docs/develop.md).
+
 ## Roadmap

 - Gateway support for traffic routing
@@ -115,17 +119,12 @@ curl http://localhost:8080/v1/completions \
 - CLI tool support
 - Model training, fine tuning in the long-term

-## Project Structures
-
-```structure
-llmaz # root
-├── llmaz # where the model loader logic locates
-├── pkg # where the main logic for Kubernetes controllers locates
-```

 ## Contributions

-🚀 All kinds of contributions are welcomed ! Please follow [Contributing](./CONTRIBUTING.md). Thanks to all these contributors.
+🚀 All kinds of contributions are welcome! Please follow [CONTRIBUTING.md](./CONTRIBUTING.md).
+
+**🎉 Thanks to all these contributors!**

 <a href="https://github.com/inftyai/llmaz/graphs/contributors">
   <img src="https://contrib.rocks/image?repo=inftyai/llmaz" />

docs/assets/arch.png

109 KB

docs/develop.md

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+# Develop Guidance
+
+A development guide for people who want to learn more about this project.
+
+## Project Structure
+
+```structure
+llmaz # root
+├── llmaz # where the model loader logic lives
+├── pkg # where the main logic for the Kubernetes controllers lives
+```
+
+## API design
+
+### Core APIs
+
+**OpenModel**: `OpenModel` is mainly used to store open-sourced models as a cluster-scope object. We may need namespaced models in the future for tenant isolation. Usually, the cloud provider or the model provider should set this object, because they know the models well, e.g. the required accelerators or the scaling primitives.
+
+### Inference APIs
+
+**Playground**: `Playground` is for ease of use: people who have little knowledge about the cloud can quickly deploy a large language model with minimal configurations. `Playground` is already integrated with the SOTA inference engines, like vLLM.
+
+**Service**: `Service` is the real inference workload. People with advanced configuration requirements can deploy with `Service` directly when `Playground` cannot meet their demands, e.g. when they have a customized inference engine that hasn't been integrated with llmaz yet, or different topology requirements for aligning the Pods.
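To make the `Playground`-versus-`Service` split above concrete, here is a hedged sketch of reaching for `Service` with a customized engine. The shape shown (a model claim plus an inlined workload template) is an assumption for illustration only; the real `Service` schema is defined by the llmaz APIs, not by this commit.

```yaml
# Hypothetical sketch only: every field name below is an assumption used to
# illustrate the Playground vs. Service distinction described above.
apiVersion: inference.llmaz.io/v1alpha1
kind: Service
metadata:
  name: opt-125m-custom
spec:
  modelClaim:
    modelName: opt-125m              # same cluster-scope model object
  workloadTemplate:                  # full control over the Pod topology
    replicas: 1
    template:
      spec:
        containers:
          - name: my-engine          # a custom engine llmaz hasn't integrated
            image: example.com/my-engine:latest   # hypothetical image
            args: ["--model", "facebook/opt-125m"]
```

In this reading, `Playground` trades flexibility for sensible defaults, while `Service` exposes the underlying workload so teams can wire in engines or topologies llmaz does not know about yet.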
