Easy, advanced inference platform for large language models on Kubernetes
> 🌱 llmaz is alpha now, so the API may change before graduating to Beta.
## Architecture


## Features Overview
- **Ease of Use**: People can quickly deploy an LLM service with minimal configuration.
- **Broad Backend Support**: llmaz supports a wide range of advanced inference backends for different scenarios, such as [vLLM](https://github.com/vllm-project/vllm), [SGLang](https://github.com/sgl-project/sglang), and [llama.cpp](https://github.com/ggerganov/llama.cpp). Find the full list of supported backends [here](./docs/support-backends.md).
- **Various Model Providers**: llmaz supports a wide range of model providers, such as [HuggingFace](https://huggingface.co/), [ModelScope](https://www.modelscope.cn), and object stores (Aliyun OSS, with more on the way). llmaz handles model loading automatically, requiring no effort from users.
- **Multi-Host Support**: llmaz supports both single-host and multi-host scenarios with [LWS](https://github.com/kubernetes-sigs/lws) from day one.
## Quick Start
### Installation
Read the [Installation](./docs/installation.md) for guidance.
### Deploy
Here's a toy example of deploying `facebook/opt-125m`; all you need to do
is to apply a `Model` and a `Playground`.
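As a hedged sketch of what those two manifests might look like (the API groups, versions, and field names below are assumptions for illustration only; the manifests under **[examples](/docs/examples/README.md)** are authoritative):

```yaml
# Illustrative sketch -- apiVersion values and spec fields are assumptions,
# not the confirmed CRD schema. See docs/examples for real manifests.
apiVersion: llmaz.io/v1alpha1
kind: OpenModel
metadata:
  name: opt-125m
spec:
  familyName: opt
  source:
    modelHub:
      modelID: facebook/opt-125m    # pulled automatically by the model loader
---
apiVersion: inference.llmaz.io/v1alpha1
kind: Playground
metadata:
  name: opt-125m
spec:
  replicas: 1
  modelClaim:
    modelName: opt-125m             # references the OpenModel above
```

Applying a file like this with `kubectl apply -f <file>` should bring up the inference workload.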
Please refer to **[examples](/docs/examples/README.md)** to learn more.
A development guide for people who want to learn more about this project.
## Project Structure
```structure
llmaz # root
├── llmaz # where the model loader logic lives
├── pkg # where the main logic for the Kubernetes controllers lives
```
## API Design
### Core APIs
**OpenModel**: `OpenModel` mostly stores open-sourced models as a cluster-scoped object. We may need namespaced models in the future for tenant isolation. Usually, the cloud provider or model provider should set this object, because they know the models well, e.g. the required accelerators or the scaling primitives.
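For illustration, a provider-set `OpenModel` might look like the sketch below. The API version and the flavor fields are assumptions, not the confirmed schema; they only show where accelerator hints could live:

```yaml
# Illustrative only: apiVersion and the inferenceFlavors fields are assumptions.
apiVersion: llmaz.io/v1alpha1
kind: OpenModel            # cluster-scoped, so no namespace is set
metadata:
  name: llama2-7b
spec:
  familyName: llama2
  source:
    modelHub:
      modelID: meta-llama/Llama-2-7b-hf
  inferenceFlavors:        # hypothetical accelerator hints preset by the provider
    - name: a100
      requests:
        nvidia.com/gpu: 1
```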
### Inference APIs
**Playground**: `Playground` is for ease of use; people with little cloud knowledge can quickly deploy a large language model with minimal configuration. `Playground` is already integrated with SOTA inference engines, such as vLLM.
**Service**: `Service` is the real inference workload. People with advanced configuration requirements can deploy with `Service` directly if `Playground` cannot meet their demands, e.g. when they use a customized inference engine that hasn't been integrated with llmaz yet, or when they have different topology requirements for aligning the Pods.
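As a very rough sketch of that escape hatch (every field below is an assumption; consult the actual `Service` CRD before relying on any of it), a `Service` could claim a model while supplying a custom engine image:

```yaml
# Hypothetical sketch -- the Service schema shown here is assumed, not confirmed.
apiVersion: inference.llmaz.io/v1alpha1
kind: Service
metadata:
  name: custom-engine
spec:
  modelClaims:
    models:
      - name: opt-125m              # references an existing OpenModel
  workloadTemplate:                 # full Pod-level control for advanced users
    spec:
      containers:
        - name: inference
          image: registry.example.com/custom-engine:latest  # engine not yet integrated with llmaz
```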