25 changes: 12 additions & 13 deletions README.md
@@ -1,7 +1,7 @@
<p align="center">
<picture>
<source media="(prefers-color-scheme: dark)" srcset="./docs/assets/logo.png">
-<img alt="llmaz" src="https://github.com/InftyAI/llmaz/blob/main/docs/assets/logo.png" width=55%>
+<img alt="llmaz" src="./docs/assets/logo.png" width=55%>
</picture>
</p>

@@ -20,11 +20,11 @@ Easy, advanced inference platform for large language models on Kubernetes

> 🌱 llmaz is alpha now, so API may change before graduating to Beta.

-## Concept
+## Architecture

-![image](./docs/assets/overview.png)
+![image](./docs/assets/arch.png)

-## Feature Overview
+## Features Overview

- **Ease of Use**: People can quickly deploy an LLM service with minimal configuration.
- **Broad Backend Support**: llmaz supports a wide range of advanced inference backends for different scenarios, like [vLLM](https://github.com/vllm-project/vllm), [SGLang](https://github.com/sgl-project/sglang), [llama.cpp](https://github.com/ggerganov/llama.cpp). Find the full list of supported backends [here](./docs/support-backends.md).
@@ -42,7 +42,7 @@ Read the [Installation](./docs/installation.md) for guidance.

### Deploy

-Here's a simplest sample for deploying `facebook/opt-125m`, all you need to do
+Here's a toy sample for deploying `facebook/opt-125m`; all you need to do
is to apply a `Model` and a `Playground`.

Please refer to **[examples](/docs/examples/README.md)** to learn more.
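
For orientation, the pair of manifests might look like the sketch below. The field names (`familyName`, `modelHub`, `modelClaim`) and API groups are assumptions drawn from typical llmaz examples, not the confirmed schema; the manifests under [examples](/docs/examples/README.md) are authoritative.

```yaml
# A minimal sketch of a Model + Playground pair. Field names and API
# versions are illustrative assumptions; see docs/examples for real schemas.
apiVersion: llmaz.io/v1alpha1
kind: Model
metadata:
  name: opt-125m
spec:
  familyName: opt
  source:
    modelHub:              # fetch the weights from a model hub, e.g. Hugging Face
      modelID: facebook/opt-125m
---
apiVersion: inference.llmaz.io/v1alpha1
kind: Playground
metadata:
  name: opt-125m
spec:
  replicas: 1
  modelClaim:
    modelName: opt-125m    # points at the Model defined above
```

Applying both with `kubectl apply -f` should be enough to bring the service up.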
@@ -107,6 +107,10 @@ curl http://localhost:8080/v1/completions \
}'
```

+### Beyond the QuickStart
+
+If you want to learn more about this project, please refer to [develop.md](./docs/develop.md).

## Roadmap

- Gateway support for traffic routing
@@ -115,17 +119,12 @@ curl http://localhost:8080/v1/completions \
- CLI tool support
- Model training and fine-tuning in the long term

-## Project Structures
-
-```structure
-llmaz # root
-├── llmaz # where the model loader logic locates
-├── pkg # where the main logic for Kubernetes controllers locates
-```

## Contributions

-🚀 All kinds of contributions are welcomed ! Please follow [Contributing](./CONTRIBUTING.md). Thanks to all these contributors.
+🚀 All kinds of contributions are welcome! Please follow [CONTRIBUTING.md](./CONTRIBUTING.md).

+**🎉 Thanks to all these contributors!**

<a href="https://github.com/inftyai/llmaz/graphs/contributors">
<img src="https://contrib.rocks/image?repo=inftyai/llmaz" />
Binary file added docs/assets/arch.png
23 changes: 23 additions & 0 deletions docs/develop.md
@@ -0,0 +1,23 @@
# Development Guide

A development guide for people who want to learn more about this project.

## Project Structure

```structure
llmaz # root
├── llmaz # where the model loader logic lives
├── pkg # where the main logic for the Kubernetes controllers lives
```

## API Design

### Core APIs

**OpenModel**: `OpenModel` mostly serves to store open-sourced models as a cluster-scoped object. We may need namespaced models in the future for tenant isolation. Usually, the cloud provider or the model provider should create this object, since they know the models well, e.g. the required accelerators or the scaling primitives.
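
As a sketch of what such a provider-managed object might carry (the `inferenceFlavors` block below is an assumed shape for the accelerator and scaling hints, not the confirmed API):

```yaml
# Hypothetical OpenModel sketch; the flavor/accelerator fields are
# assumptions for illustration. Check the Go API types for the real schema.
apiVersion: llmaz.io/v1alpha1
kind: OpenModel
metadata:
  name: llama2-7b          # cluster-scoped, so no namespace
spec:
  familyName: llama2
  source:
    modelHub:
      modelID: meta-llama/Llama-2-7b-hf
  inferenceFlavors:        # provider-set hints: which accelerators suit this model
    - name: a10
      requests:
        nvidia.com/gpu: 1
```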

### Inference APIs

**Playground**: `Playground` is built for ease of use: people with little cloud knowledge can quickly deploy a large language model with minimal configuration. `Playground` is already integrated with SOTA inference engines like vLLM.

**Service**: `Service` is the real inference workload. People with advanced configuration requirements can deploy a `Service` directly when `Playground` cannot meet their demands, e.g. they run a customized inference engine that hasn't been integrated with llmaz yet, or they have specific topology requirements for the Pods.
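
To make the split concrete, here is a hedged sketch contrasting the two entry points; the `Service` shape below, a LeaderWorkerSet-style `workloadTemplate`, is an assumption for illustration rather than the settled API:

```yaml
# Playground: minimal knobs, an integrated engine (e.g. vLLM) is picked for you.
apiVersion: inference.llmaz.io/v1alpha1
kind: Playground
metadata:
  name: llama2-7b
spec:
  replicas: 1
  modelClaim:
    modelName: llama2-7b
---
# Service: full workload control for advanced cases, e.g. a customized engine
# not yet integrated with llmaz. Field names below are assumptions.
apiVersion: inference.llmaz.io/v1alpha1
kind: Service
metadata:
  name: llama2-7b-custom
spec:
  modelClaims:
    models:
      - name: llama2-7b
  workloadTemplate:                    # assumed LeaderWorkerSet-style template
    replicas: 1
    leaderWorkerTemplate:
      workerTemplate:
        spec:
          containers:
            - name: engine             # hypothetical custom engine image
              image: example.com/custom-inference-engine:v0.1
              args: ["--model-id", "llama2-7b"]
```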