---
title: Karpenter
weight: 2
---

[Karpenter](https://github.com/kubernetes-sigs/karpenter) automatically launches just the right compute resources to handle your cluster's applications. However, because it is built to follow the scheduling decisions of kube-scheduler, Karpenter can make incorrect provisioning decisions when the InftyAI scheduler is in the mix.

We therefore forked the Karpenter project and re-compiled the Karpenter image for cloud providers such as AWS; the details are described in [this proposal](https://github.com/InftyAI/llmaz/blob/main/docs/proposals/106-spot-instance-karpenter/README.md). This document provides the steps to install and configure the customized Karpenter in an EKS cluster.

## How to use

### Set environment variables

```shell
export KARPENTER_NAMESPACE="kube-system"
export KARPENTER_VERSION="1.5.0"
export K8S_VERSION="1.32"

export AWS_PARTITION="aws" # if you are not using the standard partition, set this to aws-cn or aws-us-gov
export CLUSTER_NAME="${USER}-karpenter-demo"
export AWS_DEFAULT_REGION="us-west-2"
export AWS_ACCOUNT_ID="$(aws sts get-caller-identity --query Account --output text)"
export TEMPOUT="$(mktemp)"
export ALIAS_VERSION="$(aws ssm get-parameter --name "/aws/service/eks/optimized-ami/${K8S_VERSION}/amazon-linux-2023/x86_64/standard/recommended/image_id" --query Parameter.Value | xargs aws ec2 describe-images --query 'Images[0].Name' --image-ids | sed -r 's/^.*(v[[:digit:]]+).*$/\1/')"
```

If you open a new shell to run steps in this procedure, you will need to set some or all of these environment variables again. To remind yourself of the current values, run:

```shell
echo "${KARPENTER_NAMESPACE}" "${KARPENTER_VERSION}" "${K8S_VERSION}" "${CLUSTER_NAME}" "${AWS_DEFAULT_REGION}" "${AWS_ACCOUNT_ID}" "${TEMPOUT}" "${ALIAS_VERSION}"
```

### Create a cluster and add Karpenter

Please refer to the [Getting Started with Karpenter](https://docs.aws.amazon.com/eks/latest/userguide/getting-started-eksctl.html) guide to create a cluster and add Karpenter.
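
If you are starting from scratch, the sketch below shows roughly what an `eksctl` cluster configuration looks like. It reuses the environment variables set earlier; the managed node group name and sizing are illustrative assumptions, and the guide's additional steps (Karpenter IAM roles, interruption queue, subnet and security group tagging) still apply.

```shell
# Minimal sketch only: follow the linked guide for the Karpenter IAM roles,
# interruption queue, and discovery tags that a real installation needs.
cat <<EOF | eksctl create cluster -f -
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: ${CLUSTER_NAME}
  region: ${AWS_DEFAULT_REGION}
  version: "${K8S_VERSION}"
  tags:
    karpenter.sh/discovery: ${CLUSTER_NAME}
iam:
  withOIDC: true
managedNodeGroups:
  - name: system-nodes # small on-demand group for system pods; the sizing is an assumption
    instanceType: m5.large
    desiredCapacity: 2
    minSize: 2
    maxSize: 3
EOF
```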

### Install the GPU operator

```shell
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia \
  && helm repo update
helm install --wait --generate-name \
  -n gpu-operator --create-namespace \
  nvidia/gpu-operator \
  --version=v25.3.0
```
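
Before moving on, it is worth checking that the operator components came up cleanly; once Karpenter later provisions GPU nodes, the operator will install the NVIDIA driver and device plugin on them.

```shell
# All pods in the gpu-operator namespace should end up Running or Completed.
kubectl get pods -n gpu-operator
```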

### Install llmaz with InftyAI scheduler enabled

Please refer to [heterogeneous cluster support](../features/heterogeneous-cluster-support.md).

### Configure Karpenter with the customized image

We need to assign the `karpenter-core-llmaz` cluster role to the `karpenter` service account and update the Karpenter image to the customized one.

```shell
cat <<EOF | envsubst | kubectl apply -f -
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: karpenter-core-llmaz
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: karpenter-core-llmaz
subjects:
- kind: ServiceAccount
  name: karpenter
  namespace: ${KARPENTER_NAMESPACE}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: karpenter-core-llmaz
rules:
- apiGroups: ["llmaz.io"]
  resources: ["openmodels"]
  verbs: ["get", "list", "watch"]
EOF

helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter --version "${KARPENTER_VERSION}" --namespace "${KARPENTER_NAMESPACE}" --create-namespace \
  --set "settings.clusterName=${CLUSTER_NAME}" \
  --set "settings.interruptionQueue=${CLUSTER_NAME}" \
  --set controller.resources.requests.cpu=1 \
  --set controller.resources.requests.memory=1Gi \
  --set controller.resources.limits.cpu=1 \
  --set controller.resources.limits.memory=1Gi \
  --wait \
  --set controller.image.repository=inftyai/aws-karpenter \
  --set "controller.image.tag=${KARPENTER_VERSION}" \
  --set controller.image.digest=""
```
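
A quick sanity check (assuming the default object names and labels from the Karpenter chart) confirms that the controller is running and actually uses the customized image:

```shell
kubectl get pods -n "${KARPENTER_NAMESPACE}" -l app.kubernetes.io/name=karpenter
kubectl get deployment karpenter -n "${KARPENTER_NAMESPACE}" \
  -o jsonpath='{.spec.template.spec.containers[0].image}{"\n"}'
# The reported image should be inftyai/aws-karpenter:${KARPENTER_VERSION}.
```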

## Basic Example

1. Create a GPU node pool

```shell
cat <<EOF | envsubst | kubectl apply -f -
apiVersion: karpenter.k8s.aws/v1
kind: EC2NodeClass
metadata:
  name: llmaz-demo # you can change the name to a more meaningful one; keep it aligned with the node pool's nodeClassRef.
spec:
  amiSelectorTerms:
    - alias: al2023@${ALIAS_VERSION}
  blockDeviceMappings:
    # The default volume size of the selected AMI is 20Gi, which is not enough for kubelet
    # to pull the images and run the workloads, so we map a larger volume to the root device.
    # You can increase the volume size according to your actual needs.
    - deviceName: /dev/xvda
      ebs:
        deleteOnTermination: true
        volumeSize: 50Gi
        volumeType: gp3
  role: KarpenterNodeRole-${CLUSTER_NAME} # replace with your cluster name
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: ${CLUSTER_NAME} # replace with your cluster name
  subnetSelectorTerms:
    - tags:
        karpenter.sh/discovery: ${CLUSTER_NAME} # replace with your cluster name
---
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: llmaz-demo-gpu-nodepool # you can change the name to a more meaningful one.
spec:
  disruption:
    budgets:
      - nodes: 10%
    consolidateAfter: 5m
    consolidationPolicy: WhenEmptyOrUnderutilized
  limits: # You can change the limits to match your actual needs.
    cpu: 1000
  template:
    spec:
      expireAfter: 720h
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: llmaz-demo
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values:
            # Note: g4dn instances are amd64 (x86_64) while g5g instances are arm64 (Graviton).
            # With only amd64 listed here, this node pool will never launch g5g nodes;
            # add arm64 if you also want the t4g flavor defined in the next step to be provisionable.
            - amd64
        - key: kubernetes.io/os
          operator: In
          values:
            - linux
        - key: karpenter.sh/capacity-type
          operator: In
          values:
            - spot
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values: # replace with instance families that provide the GPUs you need
            - g4dn
            - g5g
      taints:
        - effect: NoSchedule
          key: nvidia.com/gpu
          value: "true"
EOF
```
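
Once applied, you can confirm that both objects were admitted (fully qualified resource names are used here to avoid short-name ambiguity); the EC2NodeClass should eventually report `Ready`:

```shell
kubectl get nodepools.karpenter.sh
kubectl get ec2nodeclasses.karpenter.k8s.aws
```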

2. Deploy a model with flavors

```shell
cat <<EOF | kubectl apply -f -
apiVersion: llmaz.io/v1alpha1
kind: OpenModel
metadata:
  name: qwen2-0--5b
spec:
  familyName: qwen2
  source:
    modelHub:
      modelID: Qwen/Qwen2-0.5B-Instruct
  inferenceConfig:
    flavors:
      # The g5g instance family on AWS provides the t4g GPU type. We include this
      # instance family in the node pool, e.g. llmaz-demo-gpu-nodepool.
      - name: t4g
        limits:
          nvidia.com/gpu: 1
        # The flavorName is not recognized by Karpenter, so we specify the
        # instance-gpu-name via nodeSelector to match the t4g GPU type when the node is
        # provisioned by Karpenter from multiple node pools.
        #
        # When you only have a single node pool to provision the GPU instances and that
        # node pool offers only one GPU type, it is okay to omit the nodeSelector. In
        # practice, though, specifying it makes the provisioned node more predictable.
        #
        # The available node labels for selecting the target GPU device are listed below:
        #   karpenter.k8s.aws/instance-gpu-count
        #   karpenter.k8s.aws/instance-gpu-manufacturer
        #   karpenter.k8s.aws/instance-gpu-memory
        #   karpenter.k8s.aws/instance-gpu-name
        nodeSelector:
          karpenter.k8s.aws/instance-gpu-name: t4g
      # The g4dn instance family on AWS provides the t4 GPU type. We include this
      # instance family in the node pool, e.g. llmaz-demo-gpu-nodepool.
      - name: t4
        limits:
          nvidia.com/gpu: 1
        # As above, the flavorName is not recognized by Karpenter, so we specify the
        # instance-gpu-name via nodeSelector to match the t4 GPU type.
        nodeSelector:
          karpenter.k8s.aws/instance-gpu-name: t4
---
# Currently, the Playground resource type does not support configuring tolerations
# for the generated pods. Luckily, when a pod requesting the `nvidia.com/gpu` resource
# is created on the EKS cluster, the generated pod is automatically given the following
# tolerations:
# - effect: NoExecute
#   key: node.kubernetes.io/not-ready
#   operator: Exists
#   tolerationSeconds: 300
# - effect: NoExecute
#   key: node.kubernetes.io/unreachable
#   operator: Exists
#   tolerationSeconds: 300
# - effect: NoSchedule
#   key: nvidia.com/gpu
#   operator: Exists
apiVersion: inference.llmaz.io/v1alpha1
kind: Playground
metadata:
  labels:
    llmaz.io/model-name: qwen2-0--5b
  name: qwen2-0--5b
spec:
  backendRuntimeConfig:
    backendName: tgi
    # Due to quota limitations on our AWS account, we decrease the resources to match
    # the available instance type, which is g4dn.xlarge. If your account has no such
    # limitation, you can remove the custom resources settings below.
    resources:
      limits:
        cpu: "2"
        memory: 4Gi
      requests:
        cpu: "2"
        memory: 4Gi
  modelClaim:
    modelName: qwen2-0--5b
  replicas: 1
EOF
```
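
After the Playground is created, Karpenter should react to the pending pod by launching a matching GPU node. The commands below are one way to watch this happen; the label selector assumes the `llmaz.io/model-name` label is propagated to the generated pods, so fall back to listing all pods in the namespace if it is not.

```shell
# Karpenter registers provisioned capacity as NodeClaims.
kubectl get nodeclaims.karpenter.sh
# The inference pod should go Running once the node is ready and the GPU driver is installed.
kubectl get pods -l llmaz.io/model-name=qwen2-0--5b -o wide
```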