diff --git a/README.md b/README.md
index b3950bf2..ac97c934 100644
--- a/README.md
+++ b/README.md
@@ -1,7 +1,7 @@

- llmaz
+ llmaz

@@ -20,11 +20,11 @@ Easy, advanced inference platform for large language models on Kubernetes
 
 > 🌱 llmaz is alpha now, so API may change before graduating to Beta.
 
-## Concept
+## Architecture
 
-![image](./docs/assets/overview.png)
+![image](./docs/assets/arch.png)
 
-## Feature Overview
+## Features Overview
 
 - **Easy of Use**: People can quick deploy a LLM service with minimal configurations.
 - **Broad Backend Support**: llmaz supports a wide range of advanced inference backends for different scenarios, like [vLLM](https://github.com/vllm-project/vllm), [SGLang](https://github.com/sgl-project/sglang), [llama.cpp](https://github.com/ggerganov/llama.cpp). Find the full list of supported backends [here](./docs/support-backends.md).
@@ -42,7 +42,7 @@ Read the [Installation](./docs/installation.md) for guidance.
 
 ### Deploy
 
-Here's a simplest sample for deploying `facebook/opt-125m`, all you need to do
+Here's a toy example for deploying `facebook/opt-125m`; all you need to do
 is to apply a `Model` and a `Playground`.
 
 Please refer to **[examples](/docs/examples/README.md)** to learn more.
@@ -107,6 +107,10 @@ curl http://localhost:8080/v1/completions \
   }'
 ```
 
+### Beyond the QuickStart
+
+If you want to learn more about this project, please refer to [develop.md](./docs/develop.md).
+
 ## Roadmap
 
 - Gateway support for traffic routing
@@ -115,17 +119,12 @@ curl http://localhost:8080/v1/completions \
 - CLI tool support
 - Model training, fine tuning in the long-term
 
-## Project Structures
-
-```structure
-llmaz # root
-├── llmaz # where the model loader logic locates
-├── pkg # where the main logic for Kubernetes controllers locates
-```
-
 ## Contributions
 
-🚀 All kinds of contributions are welcomed ! Please follow [Contributing](./CONTRIBUTING.md). Thanks to all these contributors.
+🚀 All kinds of contributions are welcome! Please follow [CONTRIBUTING.md](./CONTRIBUTING.md).
+
+**🎉 Thanks to all these contributors!**
diff --git a/docs/assets/arch.png b/docs/assets/arch.png
new file mode 100644
index 00000000..2dff5e32
Binary files /dev/null and b/docs/assets/arch.png differ
diff --git a/docs/develop.md b/docs/develop.md
new file mode 100644
index 00000000..70f95082
--- /dev/null
+++ b/docs/develop.md
@@ -0,0 +1,23 @@
+# Development Guide
+
+A development guide for people who want to learn more about this project.
+
+## Project Structure
+
+```structure
+llmaz     # root
+├── llmaz # where the model loader logic lives
+├── pkg   # where the main logic for the Kubernetes controllers lives
+```
+
+## API Design
+
+### Core APIs
+
+**OpenModel**: `OpenModel` stores open-sourced models as cluster-scoped objects. We may need namespaced models in the future for tenant isolation. Usually, the cloud provider or the model provider should create this object because they know the models well, e.g. the required accelerators or the scaling primitives.
+
+### Inference APIs
+
+**Playground**: `Playground` is for ease of use: people with little cloud knowledge can quickly deploy a large language model with minimal configurations. `Playground` is already integrated with SOTA inference engines, like vLLM.
+
+**Service**: `Service` is the real inference workload. People with advanced configuration requirements can deploy with `Service` directly if `Playground` cannot meet their demands, e.g. they have a customized inference engine that hasn't been integrated with llmaz yet, or they have different topology requirements for the Pods.
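The README hunk above says a deployment is just a `Model` plus a `Playground`. For reviewers unfamiliar with the project, here is a rough sketch of what such a pair of manifests could look like. This is a hypothetical illustration only: the API groups, versions, and field names below are assumptions, not the authoritative llmaz schema — the real manifests live in `docs/examples`.

```yaml
# Hypothetical sketch only: apiVersion, kind, and field names are
# assumptions for illustration, not the authoritative llmaz schema.
# See docs/examples in the repository for real manifests.
apiVersion: llmaz.io/v1alpha1
kind: OpenModel
metadata:
  name: opt-125m            # cluster-scoped, per the develop.md description
spec:
  familyName: opt           # assumed field: model family
  source:
    modelHub:
      modelID: facebook/opt-125m   # assumed field: model pulled from a model hub
---
apiVersion: inference.llmaz.io/v1alpha1
kind: Playground
metadata:
  name: opt-125m
spec:
  replicas: 1
  modelClaim:
    modelName: opt-125m     # assumed field: references the OpenModel above
```

Applying both objects with `kubectl apply -f` would then stand up the inference workload that the `curl` example in the QuickStart queries.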