Proposal: add proposal for multi tree quota (#1902)
Signed-off-by: chuanyun.lcy <chuanyun.lcy@alibaba-inc.com> Co-authored-by: chuanyun.lcy <chuanyun.lcy@alibaba-inc.com>
---
title: Multi Tree Elastic Quota
authors:
  - "@shaloulcy"
reviewers:
  - "@hormes"
  - "@buptcozy"
  - "@eahydra"
creation-date: 2024-02-06
last-updated: 2024-02-06
status: provisional
---

# Multi Tree Elastic Quota

## Table of Contents

- [Multi Tree Elastic Quota](#multi-tree-elastic-quota)
  - [Table of Contents](#table-of-contents)
  - [Summary](#summary)
  - [Motivation](#motivation)
    - [Goals](#goals)
    - [Non-Goals/Future Work](#non-goalsfuture-work)
  - [Proposal](#proposal)
    - [User Stories](#user-stories)
      - [Story 1](#story-1)
      - [Story 2](#story-2)
    - [Architecture](#architecture)
    - [Implementation Details](#implementation-details)
      - [API](#api)
      - [Koord-manager](#koord-manager)
      - [Koord-scheduler](#koord-scheduler)
  - [Implementation History](#implementation-history)

## Summary

This proposal provides a mechanism to construct multiple elastic quota trees, each of which can include a different set of nodes. This helps cluster administrators manage node resources at a finer granularity.

## Motivation

[Elastic Quota](https://github.com/koordinator-sh/koordinator/blob/main/docs/proposals/scheduling/20220722-multi-hierarchy-elastic-quota-management.md) provides a balance between enforcing limits to prevent resource exhaustion and allowing efficient use of resources. Currently, there is only one quota tree in a cluster. Its root quota is `koordinator-root-quota`, and all other quotas are descendants of this root quota. In addition, the resources of all nodes belong to this single quota tree.

However, the Kubernetes products offered by cloud providers (ACK/EKS/GKE) usually give users the concept of node pools to help them manage different workloads. Users can create multiple node pools to deploy different workloads, and a workload cannot spread across node pools. Because the single elastic quota tree includes all nodes, it does not accurately reflect the resources actually available to the workloads of a given node pool.

In this case, we should build a separate quota tree for every node pool.

### Goals

- Define an API to construct multiple quota trees.
- Enhance elastic quota to help cluster administrators manage node resources at a finer granularity.

### Non-Goals/Future Work

- Improve cluster resource utilization.

## Proposal

### User Stories

#### Story 1

A cluster administrator divides nodes into node pools based on their CPU architecture: an amd64 node pool that includes the amd64 nodes, and an arm64 node pool that includes the arm64 nodes. We construct two quota trees based on these node pools.

A user creates child quotas under the amd64 quota tree and associates pods with them. These pods will be scheduled to the amd64 nodes.

Resources can be shared within a single quota tree, but not between different trees.

#### Story 2

In certain AI scenarios, jobs must not be deployed across availability zones, in order to reduce network latency. We should construct multiple quota trees according to availability zones: every availability zone has a quota tree that includes the nodes in that zone.

### Architecture

![multi-quota-tree](../../images/multi-quota-tree.jpg)

### Implementation Details

#### API

##### ElasticQuotaProfile

A Custom Resource Definition (CRD) named `ElasticQuotaProfile` is proposed to describe a quota tree.

```go
import (
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

type ElasticQuotaProfile struct {
	metav1.TypeMeta   `json:",inline"`
	metav1.ObjectMeta `json:"metadata,omitempty"`

	Spec   ElasticQuotaProfileSpec   `json:"spec,omitempty"`
	Status ElasticQuotaProfileStatus `json:"status,omitempty"`
}

type ElasticQuotaProfileSpec struct {
	// QuotaName defines the name of the root quota associated with the profile.
	// +required
	QuotaName string `json:"quotaName"`
	// QuotaLabels defines the labels of the root quota.
	QuotaLabels map[string]string `json:"quotaLabels,omitempty"`
	// ResourceRatio is a ratio used to mitigate resource fragmentation.
	// If the total resource is 100 and the resource ratio is 0.9, the allocatable resource is 100*0.9=90.
	ResourceRatio *string `json:"resourceRatio,omitempty"`
	// NodeSelector defines a label selector that selects the nodes of the quota tree.
	// +required
	NodeSelector *metav1.LabelSelector `json:"nodeSelector"`
}

type ElasticQuotaProfileStatus struct {
}
```

* `NodeSelector`: defines a label selector that selects particular nodes. The matched nodes belong to the quota tree.
* `QuotaName`: defines the name of the root quota of the quota tree. The operator will create the root quota if it does not exist.
* `QuotaLabels`: defines the labels that will be applied to the root quota.
* `ResourceRatio`: controls the reported total resource. If the total resource is 100 and the resource ratio is 0.9, the allocatable resource is 100*0.9=90.
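
To make the `ResourceRatio` arithmetic concrete, here is a minimal sketch that scales a total-resource map by the ratio string. It assumes resources are already expressed as integer base units (millicores, bytes, counts), parses the ratio exactly with `math/big` so that "0.9" becomes 9/10 without floating-point drift, and rounds down so the reported total never exceeds the real one. The helper `applyResourceRatio` is hypothetical and not part of the proposed API; rejecting non-positive ratios is also an assumption.

```go
package main

import (
	"fmt"
	"math/big"
)

// applyResourceRatio is a hypothetical helper, not part of the proposed API.
// It scales every entry of total (integer base units such as millicores or
// bytes) by ratio, the raw ElasticQuotaProfileSpec.ResourceRatio string.
// A nil ratio leaves the total unchanged.
func applyResourceRatio(total map[string]int64, ratio *string) (map[string]int64, error) {
	if ratio == nil {
		return total, nil
	}
	// big.Rat parses decimals such as "0.9" exactly (as 9/10).
	r, ok := new(big.Rat).SetString(*ratio)
	if !ok {
		return nil, fmt.Errorf("invalid resourceRatio %q", *ratio)
	}
	// Assumption: only positive ratios are meaningful.
	if r.Sign() <= 0 {
		return nil, fmt.Errorf("resourceRatio %q must be positive", *ratio)
	}
	scaled := make(map[string]int64, len(total))
	for name, v := range total {
		// v * num / denom, truncated toward zero (rounds down for v >= 0).
		prod := new(big.Int).Mul(big.NewInt(v), r.Num())
		scaled[name] = prod.Quo(prod, r.Denom()).Int64()
	}
	return scaled, nil
}
```

With `resourceRatio: "0.9"`, a total of 100 becomes 90 and 276480 millicores becomes 248832 millicores.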

##### ElasticQuota

We add some labels to elastic quotas to construct multiple quota trees:

* `quota.scheduling.koordinator.sh/tree-id`: every quota tree has a unique tree id, and the elastic quotas in the same tree share the same tree id. Users don't need to add the tree-id label when they create an elastic quota; instead, the webhook injects the tree id if the quota is a descendant of a root quota.
* `quota.scheduling.koordinator.sh/is-root`: the controller adds this label to the root quota of each quota tree to mark it as the root.
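
A minimal sketch of how the tree id could be propagated, assuming the webhook only needs to copy the parent quota's tree id onto a new child (the parent already carries it, either because the controller created it as a root or because it was mutated the same way). The `quota` type and `injectTreeID` function are hypothetical stand-ins; the tree id is treated as a label, matching the example root quota later in this proposal. The conflict check reflects the rule that a quota's tree id must not change.

```go
package main

import "fmt"

const labelTreeID = "quota.scheduling.koordinator.sh/tree-id"

// quota is a hypothetical, trimmed-down stand-in for an ElasticQuota object.
type quota struct {
	Name   string
	Labels map[string]string
}

// injectTreeID copies the parent's tree id onto child, mimicking the
// mutating webhook. It returns an error if child already carries a
// different tree id, because the tree id of a quota must not change.
func injectTreeID(child, parent *quota) error {
	parentID, ok := parent.Labels[labelTreeID]
	if !ok {
		// The parent is not part of any profile-managed tree.
		return nil
	}
	if child.Labels == nil {
		child.Labels = map[string]string{}
	}
	if cur, ok := child.Labels[labelTreeID]; ok && cur != parentID {
		return fmt.Errorf("quota %s: tree id %q conflicts with parent tree id %q", child.Name, cur, parentID)
	}
	child.Labels[labelTreeID] = parentID
	return nil
}
```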

#### Koord-manager

##### elastic-quota-profile-controller

We add a controller named elastic-quota-profile-controller. The controller watches `ElasticQuotaProfile` objects and does the following:

* creates the root quota if it does not exist, and generates the tree id;
* summarizes the total resource: it periodically sums the resources of the matched nodes and writes the result to the root quota as the annotation `quota.scheduling.koordinator.sh/total-resource`.
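
The total-resource summary could look roughly like the sketch below. Several assumptions apply: only `matchLabels` is evaluated (the example profile uses nothing else), nodes are reduced to a hypothetical `node` struct whose allocatable values are integers (millicores for `cpu`, plain units otherwise), summing allocatable rather than capacity is a guess (the proposal only says "the resource of matched nodes"), and the periodic loop, API calls, and `ResourceRatio` are left out. Real code would use `resource.Quantity` and `labels.Selector` from the Kubernetes libraries instead of these stand-ins.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// node is a hypothetical, trimmed-down stand-in for a Kubernetes Node.
// Allocatable values are integer base units: millicores for "cpu",
// plain counts or bytes for everything else.
type node struct {
	Labels      map[string]string
	Allocatable map[string]int64
}

// matchesLabels reports whether labels contain every key/value pair in
// matchLabels (the matchLabels part of a LabelSelector only).
func matchesLabels(labels, matchLabels map[string]string) bool {
	for k, v := range matchLabels {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// totalResourceAnnotation sums the allocatable resources of the nodes
// selected by matchLabels and renders them as a JSON object, in the shape
// of the quota.scheduling.koordinator.sh/total-resource annotation value.
func totalResourceAnnotation(nodes []node, matchLabels map[string]string) (string, error) {
	sum := map[string]int64{}
	for _, n := range nodes {
		if !matchesLabels(n.Labels, matchLabels) {
			continue
		}
		for name, v := range n.Allocatable {
			sum[name] += v
		}
	}
	rendered := make(map[string]string, len(sum))
	for name, v := range sum {
		if name == "cpu" {
			rendered[name] = fmt.Sprintf("%dm", v) // millicores, e.g. "276480m"
		} else {
			rendered[name] = fmt.Sprintf("%d", v)
		}
	}
	out, err := json.Marshal(rendered) // map keys are marshalled in sorted order
	if err != nil {
		return "", err
	}
	return string(out), nil
}
```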

Example:

```yaml
apiVersion: quota.koordinator.sh/v1alpha1
kind: ElasticQuotaProfile
metadata:
  labels:
    kubernetes.io/arch: amd64
    topology.kubernetes.io/region: cn-hangzhou
    topology.kubernetes.io/zone: cn-hangzhou-k
  name: cn-hangzhou-k-amd64-standard
  namespace: kube-system
spec:
  nodeSelector:
    matchLabels:
      kubernetes.io/arch: amd64
      topology.kubernetes.io/region: cn-hangzhou
      topology.kubernetes.io/zone: cn-hangzhou-k
  quotaLabels:
    kubernetes.io/arch: amd64
    quota.scheduling.koordinator.sh/is-parent: "true"
    topology.kubernetes.io/region: cn-hangzhou
    topology.kubernetes.io/zone: cn-hangzhou-k
  quotaName: cn-hangzhou-k-amd64-root-quota
  resourceRatio: "0.9"
```

The controller will create the root quota of the quota tree:

```yaml
apiVersion: scheduling.sigs.k8s.io/v1alpha1
kind: ElasticQuota
metadata:
  annotations:
    quota.scheduling.koordinator.sh/total-resource: '{"cpu":"276480m","ephemeral-storage":"5257857641983","memory":"1894530432614","pods":"360"}'
  labels:
    kubernetes.io/arch: amd64
    quota.scheduling.koordinator.sh/is-parent: "true"
    quota.scheduling.koordinator.sh/is-root: "true"
    quota.scheduling.koordinator.sh/parent: koordinator-root-quota
    quota.scheduling.koordinator.sh/profile: cn-hangzhou-k-amd64
    quota.scheduling.koordinator.sh/tree-id: "9066322268974913314"
    topology.kubernetes.io/region: cn-hangzhou
    topology.kubernetes.io/zone: cn-hangzhou-k
  name: cn-hangzhou-k-amd64-root-quota
  namespace: kube-system
spec:
  max:
    cpu: "4611686018427387"
    memory: "4611686018427387"
  min:
    cpu: 276480m
    memory: "1894530432614"
```

##### webhook

The webhook handles pods and elastic quotas as follows:

* `pod node affinity`: because an elastic quota profile selects particular nodes, pods associated with a quota in that tree must be restricted to the matched nodes. The webhook therefore injects the corresponding node affinity into those pods.
* `elastic quota tree id`: if a user creates an elastic quota as a descendant of a quota tree, the webhook injects the corresponding tree id into the quota. In addition, the webhook forbids changing a quota's tree id, to reduce complexity.
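
One way to derive the injected node affinity from a profile, sketched under assumptions: only the profile's `matchLabels` are translated (each becomes an `In` requirement with a single value), and local stand-in types replace the `k8s.io/api/core/v1` ones so the block has no dependencies. Kubernetes ORs the `nodeSelectorTerms` of required node affinity and ANDs the requirements within a term, so this sketch appends the requirements to every existing term instead of adding a new term; a separate term would let the pod escape the tree's nodes through any of its other terms. All names here are hypothetical.

```go
package main

import "sort"

// nodeSelectorRequirement mirrors the shape of Kubernetes'
// NodeSelectorRequirement (key, operator, values); it is a local
// stand-in so the sketch has no external dependencies.
type nodeSelectorRequirement struct {
	Key      string
	Operator string
	Values   []string
}

// affinityFromMatchLabels turns a profile's nodeSelector.matchLabels into
// "In" requirements, sorted by key for a deterministic result.
func affinityFromMatchLabels(matchLabels map[string]string) []nodeSelectorRequirement {
	keys := make([]string, 0, len(matchLabels))
	for k := range matchLabels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	reqs := make([]nodeSelectorRequirement, 0, len(keys))
	for _, k := range keys {
		reqs = append(reqs, nodeSelectorRequirement{Key: k, Operator: "In", Values: []string{matchLabels[k]}})
	}
	return reqs
}

// injectIntoTerms ANDs reqs into each existing term (terms are ORed by the
// scheduler), or creates a single term when the pod has none. The input
// terms are not modified.
func injectIntoTerms(terms [][]nodeSelectorRequirement, reqs []nodeSelectorRequirement) [][]nodeSelectorRequirement {
	if len(terms) == 0 {
		return [][]nodeSelectorRequirement{reqs}
	}
	out := make([][]nodeSelectorRequirement, len(terms))
	for i, t := range terms {
		out[i] = append(append([]nodeSelectorRequirement{}, t...), reqs...)
	}
	return out
}
```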

#### Koord-scheduler

##### elastic quota plugin

The elastic quota plugin handles elastic quotas. Its `groupQuotaManager` manages all elastic quotas and calculates the runtime quota.

```go
type Plugin struct {
	handle            framework.Handle
	client            versioned.Interface
	pluginArgs        *config.ElasticQuotaArgs
	quotaLister       v1alpha1.ElasticQuotaLister
	quotaInformer     cache.SharedIndexInformer
	podLister         v1.PodLister
	pdbLister         policylisters.PodDisruptionBudgetLister
	nodeLister        v1.NodeLister
	groupQuotaManager *core.GroupQuotaManager
}
```

To implement multiple quota trees, we will construct a `groupQuotaManager` for every quota tree.

We add two fields:

* `groupQuotaManagersForQuotaTree`: stores multiple GroupQuotaManagers. The key is the tree id, and the value is the corresponding groupQuotaManager.
* `quotaToTreeMap`: stores the mapping from quotas to trees. The key is the quota name, and the value is the tree id.

```go
type Plugin struct {
	...

	quotaManagerLock sync.RWMutex
	// groupQuotaManagersForQuotaTree stores the GroupQuotaManager of every quota tree. The key is the quota tree id.
	groupQuotaManagersForQuotaTree map[string]*core.GroupQuotaManager

	quotaToTreeMapLock sync.RWMutex
	// quotaToTreeMap stores the relationship between quotas and quota trees.
	// The key is the quota name, and the value is the tree id.
	quotaToTreeMap map[string]string
}
```
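
A sketch of how a quota could be routed to its tree's manager with the two new maps. Assumptions not stated in the proposal: `groupQuotaManager` below is a stand-in for `core.GroupQuotaManager`, quotas without a tree id fall back to the plugin's existing single `groupQuotaManager`, and a tree's manager is created the first time one of its quotas is added. Each lock is held only around the map it guards, so the two locks are never nested and cannot deadlock.

```go
package main

import "sync"

// groupQuotaManager is a hypothetical stand-in for core.GroupQuotaManager.
type groupQuotaManager struct {
	treeID string
}

// plugin keeps only the fields relevant to multiple quota trees.
type plugin struct {
	// groupQuotaManager serves quotas outside any profile tree (assumption).
	groupQuotaManager *groupQuotaManager

	quotaManagerLock               sync.RWMutex
	groupQuotaManagersForQuotaTree map[string]*groupQuotaManager

	quotaToTreeMapLock sync.RWMutex
	quotaToTreeMap     map[string]string
}

func newPlugin() *plugin {
	return &plugin{
		groupQuotaManager:              &groupQuotaManager{},
		groupQuotaManagersForQuotaTree: map[string]*groupQuotaManager{},
		quotaToTreeMap:                 map[string]string{},
	}
}

// addQuota ensures the tree's manager exists, then records quota -> tree.
func (p *plugin) addQuota(quotaName, treeID string) {
	if treeID != "" {
		p.quotaManagerLock.Lock()
		if _, ok := p.groupQuotaManagersForQuotaTree[treeID]; !ok {
			p.groupQuotaManagersForQuotaTree[treeID] = &groupQuotaManager{treeID: treeID}
		}
		p.quotaManagerLock.Unlock()
	}
	p.quotaToTreeMapLock.Lock()
	p.quotaToTreeMap[quotaName] = treeID
	p.quotaToTreeMapLock.Unlock()
}

// managerForQuota resolves quota name -> tree id -> manager.
func (p *plugin) managerForQuota(quotaName string) *groupQuotaManager {
	p.quotaToTreeMapLock.RLock()
	treeID, ok := p.quotaToTreeMap[quotaName]
	p.quotaToTreeMapLock.RUnlock()
	if !ok || treeID == "" {
		return p.groupQuotaManager
	}
	p.quotaManagerLock.RLock()
	defer p.quotaManagerLock.RUnlock()
	return p.groupQuotaManagersForQuotaTree[treeID]
}
```

Creating the manager before publishing the quota-to-tree mapping means a concurrent lookup never sees a tree id without a manager behind it.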

##### runtime quota calculation

Previously, the resources of all nodes were summed to calculate the runtime quota: the plugin watches nodes and applies the resource delta whenever a node changes.

When multiple quota trees are constructed, the total resource of each tree can instead be read from the root quota annotation `quota.scheduling.koordinator.sh/total-resource`, which is updated by the elastic-quota-profile-controller.
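
A sketch of reading a tree's total resource on the scheduler side, assuming the plugin only needs to decode the JSON object stored in the root quota's `quota.scheduling.koordinator.sh/total-resource` annotation. Values stay as Kubernetes quantity strings here; real code would convert them with `resource.ParseQuantity` from `k8s.io/apimachinery/pkg/api/resource`. The function name is hypothetical, and what to do when the annotation is missing (for example, before the controller's first report) is left to the caller.

```go
package main

import (
	"encoding/json"
	"fmt"
)

const annotationTotalResource = "quota.scheduling.koordinator.sh/total-resource"

// totalResourceForTree reads the total resource of a quota tree from the
// annotations of its root quota. The bool result is false when the
// annotation is absent; an error is returned only for malformed JSON.
func totalResourceForTree(rootAnnotations map[string]string) (map[string]string, bool, error) {
	raw, found := rootAnnotations[annotationTotalResource]
	if !found {
		return nil, false, nil
	}
	var total map[string]string
	if err := json.Unmarshal([]byte(raw), &total); err != nil {
		return nil, false, fmt.Errorf("parse %s: %w", annotationTotalResource, err)
	}
	return total, true, nil
}
```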

## Implementation History

- [ ] 2024-02-06: Open proposal PR
