Commit 09332c8

Add Karpenter integration docs (#448)
Signed-off-by: carlory <[email protected]>
1 parent 9a3167e commit 09332c8

9 files changed (+296, -7 lines)
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
---
title: Features
weight: 2
description: >
  This section contains the advanced features of llmaz.
---
Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
---
title: Heterogeneous Cluster Support
weight: 1
---

A `llama2-7B` model can run on a __1xA100__ GPU, on a __1xA10__ GPU, or even on a __1x4090__, as well as a variety of other GPU types; this is what we call resource fungibility. In practice we often have a heterogeneous cluster with different GPU types, and high-end GPUs are frequently out of stock, so to meet both the service SLOs and the cost targets we need to be able to schedule workloads across different GPU types. With the [ResourceFungibility](https://github.com/InftyAI/scheduler-plugins/blob/main/pkg/plugins/resource_fungibility) plugin in the InftyAI scheduler, we can achieve this with at most 8 alternative GPU types.

## How to use

### Enable InftyAI scheduler

Edit the `values.global.yaml` file to modify the following values:

```yaml
kube-scheduler:
  enabled: true

globalConfig:
  configData: |-
    scheduler-name: inftyai-scheduler
```

Then run `make helm-upgrade` to install or upgrade llmaz.
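
With the InftyAI scheduler enabled, the alternative GPU types a model can run on are declared as flavors on the model (the Karpenter integration page shows the same pattern in full). The sketch below is illustrative only: the model, the flavor names, and the `nvidia.com/gpu.product` label values are assumptions, so replace them with the GPU types and node labels that actually exist in your cluster.

```shell
cat <<EOF | kubectl apply -f -
apiVersion: llmaz.io/v1alpha1
kind: OpenModel
metadata:
  name: llama2-7b
spec:
  familyName: llama2
  source:
    modelHub:
      modelID: meta-llama/Llama-2-7b-hf
  inferenceConfig:
    flavors:
    # Flavor names and node labels below are only examples; point them at the
    # GPU types your heterogeneous cluster actually provides.
    - name: a100
      limits:
        nvidia.com/gpu: 1
      nodeSelector:
        nvidia.com/gpu.product: NVIDIA-A100-SXM4-80GB
    - name: a10
      limits:
        nvidia.com/gpu: 1
      nodeSelector:
        nvidia.com/gpu.product: NVIDIA-A10
EOF
```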
Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
---
title: Getting Started
-weight: 2
+weight: 1
description: >
  This section contains the tutorials for llmaz.
---

site/content/en/docs/getting-started/installation.md

Lines changed: 1 addition & 2 deletions
@@ -38,7 +38,6 @@ If you want to change the default configurations, please change the values in [v

**Do not change** the values in _values.yaml_ because it's auto-generated and will be overwritten.

-
### Install

```cmd
@@ -70,4 +69,4 @@ Once you changed your code, run the command to upgrade the controller:

```cmd
IMG=<image-registry>:<tag> make helm-upgrade
-```
+```
Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
---
title: Integrations
-weight: 2
+weight: 3
description: >
  This section contains the llmaz integration information.
---
Lines changed: 261 additions & 0 deletions
@@ -0,0 +1,261 @@
---
title: Karpenter
weight: 2
---

[Karpenter](https://github.com/kubernetes-sigs/karpenter) automatically launches just the right compute resources to handle your cluster's applications, but it is built to adhere to the scheduling decisions of kube-scheduler, so it is certainly possible that we will run into cases where Karpenter makes incorrect decisions when the InftyAI scheduler is in the mix.

We forked the Karpenter project and rebuilt the Karpenter image for cloud providers like AWS; you can find the details in [this proposal](https://github.com/InftyAI/llmaz/blob/main/docs/proposals/106-spot-instance-karpenter/README.md). This document provides the deployment steps to install and configure the customized Karpenter in an EKS cluster.

## How to use

### Set environment variables

```shell
export KARPENTER_NAMESPACE="kube-system"
export KARPENTER_VERSION="1.5.0"
export K8S_VERSION="1.32"

export AWS_PARTITION="aws" # if you are not using standard partitions, you may need to set this to aws-cn / aws-us-gov
export CLUSTER_NAME="${USER}-karpenter-demo"
export AWS_DEFAULT_REGION="us-west-2"
export AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query Account --output text)"
export TEMPOUT="$(mktemp)"
export ALIAS_VERSION="$(aws ssm get-parameter --name "/aws/service/eks/optimized-ami/${K8S_VERSION}/amazon-linux-2023/x86_64/standard/recommended/image_id" --query Parameter.Value | xargs aws ec2 describe-images --query 'Images[0].Name' --image-ids | sed -r 's/^.*(v[[:digit:]]+).*$/\1/')"
```

If you open a new shell to run steps in this procedure, you need to set some or all of the environment variables again. To remind yourself of these values, type:

```shell
echo "${KARPENTER_NAMESPACE}" "${KARPENTER_VERSION}" "${K8S_VERSION}" "${CLUSTER_NAME}" "${AWS_DEFAULT_REGION}" "${AWS_ACCOUNT_ID}" "${TEMPOUT}" "${ALIAS_VERSION}"
```

### Create a cluster and add Karpenter

Please refer to [Getting Started with Karpenter](https://docs.aws.amazon.com/eks/latest/userguide/getting-started-eksctl.html) to create a cluster and add Karpenter to it.
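
Before moving on, it can help to confirm that the cluster exists and that the stock Karpenter controller is healthy. This is only a suggested sanity check; the label selector assumes the default labels of the Karpenter Helm chart:

```shell
# Confirm the EKS cluster is up
eksctl get cluster --name "${CLUSTER_NAME}" --region "${AWS_DEFAULT_REGION}"

# Confirm the Karpenter controller pods are running
kubectl get pods -n "${KARPENTER_NAMESPACE}" -l app.kubernetes.io/name=karpenter
```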

### Install the GPU operator

```shell
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
  && helm repo update
helm install --wait --generate-name \
  -n gpu-operator --create-namespace \
  nvidia/gpu-operator \
  --version=v25.3.0
```
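
The GPU operator rolls out several components; before continuing, you may want to check that they are healthy (some run as jobs and will show as Completed):

```shell
kubectl get pods -n gpu-operator
```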

### Install llmaz with the InftyAI scheduler enabled

Please refer to [heterogeneous cluster support](../features/heterogeneous-cluster-support.md).

### Configure Karpenter with the customized image

We need to grant the `karpenter-core-llmaz` cluster role to the `karpenter` service account and update the Karpenter image to the customized one.

```shell
cat <<EOF | envsubst | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: karpenter-core-llmaz
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: karpenter-core-llmaz
subjects:
- kind: ServiceAccount
  name: karpenter
  namespace: ${KARPENTER_NAMESPACE}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: karpenter-core-llmaz
rules:
- apiGroups: ["llmaz.io"]
  resources: ["openmodels"]
  verbs: ["get", "list", "watch"]
EOF

helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter --version "${KARPENTER_VERSION}" --namespace "${KARPENTER_NAMESPACE}" --create-namespace \
  --set "settings.clusterName=${CLUSTER_NAME}" \
  --set "settings.interruptionQueue=${CLUSTER_NAME}" \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --set controller.resources.limits.cpu=1 \
  --set controller.resources.limits.memory=1Gi \
  --wait \
  --set controller.image.repository=inftyai/aws-karpenter \
  --set "controller.image.tag=${KARPENTER_VERSION}" \
  --set controller.image.digest=""
```
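
After the upgrade you can optionally confirm that the controller is now running the customized image (the deployment name assumes the `karpenter` release name used above):

```shell
kubectl -n "${KARPENTER_NAMESPACE}" get deployment karpenter \
  -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'
# Expect an image from the inftyai/aws-karpenter repository
```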

## Basic Example

1. Create a GPU node pool

```shell
cat <<EOF | envsubst | kubectl apply -f -
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: llmaz-demo # you can change the name to a more meaningful one, but keep it aligned with the node pool's nodeClassRef.
spec:
  amiSelectorTerms:
  - alias: al2023@${ALIAS_VERSION}
  blockDeviceMappings:
  # The default volume size of the selected AMI is 20Gi, which is not enough for the kubelet
  # to pull the images and run the workloads, so we map a larger volume to the root device.
  # You can change the volume size to a larger value according to your actual needs.
  - deviceName: /dev/xvda
    ebs:
      deleteOnTermination: true
      volumeSize: 50Gi
      volumeType: gp3
  role: KarpenterNodeRole-${CLUSTER_NAME} # replace with your cluster name
  securityGroupSelectorTerms:
  - tags:
      karpenter.sh/discovery: ${CLUSTER_NAME} # replace with your cluster name
  subnetSelectorTerms:
  - tags:
      karpenter.sh/discovery: ${CLUSTER_NAME} # replace with your cluster name
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: llmaz-demo-gpu-nodepool # you can change the name to a more meaningful one.
spec:
  disruption:
    budgets:
    - nodes: 10%
    consolidateAfter: 5m
    consolidationPolicy: WhenEmptyOrUnderutilized
  limits: # You can change the limits to match your actual needs.
    cpu: 1000
  template:
    spec:
      expireAfter: 720h
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: llmaz-demo
      requirements:
      - key: kubernetes.io/arch
        operator: In
        # Note: g5g instances are Graviton-based (arm64), so add arm64 below if you
        # want the g5g family listed further down to actually be provisioned.
        values:
        - amd64
      - key: kubernetes.io/os
        operator: In
        values:
        - linux
      - key: karpenter.sh/capacity-type
        operator: In
        values:
        - spot
      - key: karpenter.k8s.aws/instance-family
        operator: In
        values: # replace with your instance families with GPU support
        - g4dn
        - g5g
      taints:
      - effect: NoSchedule
        key: nvidia.com/gpu
        value: "true"
EOF
```

2. Deploy a model with flavors

```shell
cat <<EOF | kubectl apply -f -
apiVersion: llmaz.io/v1alpha1
kind: OpenModel
metadata:
  name: qwen2-0--5b
spec:
  familyName: qwen2
  source:
    modelHub:
      modelID: Qwen/Qwen2-0.5B-Instruct
  inferenceConfig:
    flavors:
    # The g5g instance family in the AWS cloud provides the t4g GPU type.
    # We define the instance family in the node pool, such as llmaz-demo-gpu-nodepool.
    - name: t4g
      limits:
        nvidia.com/gpu: 1
      # The flavorName is not recognized by Karpenter, so we need to specify the
      # instance-gpu-name via nodeSelector to match the t4g GPU type when the node is
      # provisioned by Karpenter from multiple node pools.
      #
      # When you only have a single node pool to provision the GPU instance and the node pool
      # only has one GPU type, it is okay not to specify the nodeSelector. But in practice,
      # it is better to specify the nodeSelector to make the provisioned node more predictable.
      #
      # The available node labels for selecting the target GPU device are listed below:
      #   karpenter.k8s.aws/instance-gpu-count
      #   karpenter.k8s.aws/instance-gpu-manufacturer
      #   karpenter.k8s.aws/instance-gpu-memory
      #   karpenter.k8s.aws/instance-gpu-name
      nodeSelector:
        karpenter.k8s.aws/instance-gpu-name: t4g
    # The g4dn instance family in the AWS cloud provides the t4 GPU type.
    # We define the instance family in the node pool, such as llmaz-demo-gpu-nodepool.
    - name: t4
      limits:
        nvidia.com/gpu: 1
      # The flavorName is not recognized by Karpenter, so we need to specify the
      # instance-gpu-name via nodeSelector to match the t4 GPU type when the node is
      # provisioned by Karpenter from multiple node pools.
      #
      # When you only have a single node pool to provision the GPU instance and the node pool
      # only has one GPU type, it is okay not to specify the nodeSelector. But in practice,
      # it is better to specify the nodeSelector to make the provisioned node more predictable.
      #
      # The available node labels for selecting the target GPU device are listed below:
      #   karpenter.k8s.aws/instance-gpu-count
      #   karpenter.k8s.aws/instance-gpu-manufacturer
      #   karpenter.k8s.aws/instance-gpu-memory
      #   karpenter.k8s.aws/instance-gpu-name
      nodeSelector:
        karpenter.k8s.aws/instance-gpu-name: t4
---
# Currently, the Playground resource type does not support configuring tolerations
# for the generated pods. But luckily, when a pod with the `nvidia.com/gpu` resource
# is created on the EKS cluster, the generated pod will be tweaked with the following
# tolerations:
# - effect: NoExecute
#   key: node.kubernetes.io/not-ready
#   operator: Exists
#   tolerationSeconds: 300
# - effect: NoExecute
#   key: node.kubernetes.io/unreachable
#   operator: Exists
#   tolerationSeconds: 300
# - effect: NoSchedule
#   key: nvidia.com/gpu
#   operator: Exists
apiVersion: inference.llmaz.io/v1alpha1
kind: Playground
metadata:
  labels:
    llmaz.io/model-name: qwen2-0--5b
  name: qwen2-0--5b
spec:
  backendRuntimeConfig:
    backendName: tgi
    # Due to the limitation of our AWS account, we have to decrease the resources to match
    # the available instance type, which is g4dn.xlarge. If your account has no such limitation,
    # you can remove the custom resources settings below.
    resources:
      limits:
        cpu: "2"
        memory: 4Gi
      requests:
        cpu: "2"
        memory: 4Gi
  modelClaim:
    modelName: qwen2-0--5b
  replicas: 1
EOF
```
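
To watch resource fungibility and Karpenter work together, you can follow the node claims and the inference pod as they come up. These commands are only a suggested check; the pod label selector assumes llmaz propagates the `llmaz.io/model-name` label to the generated pods, so list all pods in the namespace if it does not:

```shell
# Karpenter should create a NodeClaim for one of the allowed GPU instance families
kubectl get nodeclaims -o wide

# The provisioned node carries the node pool and GPU labels
kubectl get nodes -L karpenter.sh/nodepool -L karpenter.k8s.aws/instance-gpu-name

# The inference pod should be scheduled onto the provisioned GPU node
kubectl get pods -l llmaz.io/model-name=qwen2-0--5b -o wide
```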

site/content/en/docs/integrations/open-webui.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
---
title: Open-WebUI
-weight: 2
+weight: 3
---

[Open WebUI](https://github.com/open-webui/open-webui) is a user-friendly AI interface with OpenAI-compatible APIs, serving as the default chatbot for llmaz.

site/content/en/docs/integrations/prometheus-operator.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
---
title: Prometheus Operator
-weight: 3
+weight: 4
---

This document provides deployment steps to install and configure Prometheus Operator in a Kubernetes cluster.

site/content/en/docs/integrations/support-backends.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
---
title: Supported Inference Backends
-weight: 4
+weight: 5
---

If you want to integrate more backends into llmaz, please refer to this [PR](https://github.com/InftyAI/llmaz/pull/182). Contributions are always welcome.
