Commit f4e3e58

rework device sharing in volcano (volcano-sh#2643)
* Signed-off-by: limengxuan <391013634@qq.com> Rework device-sharing mechanism to volcano
* Signed-off-by: limengxuan <391013634@qq.com> after review #1
1 parent e145aff commit f4e3e58

15 files changed

Lines changed: 774 additions & 512 deletions

docs/design/device-sharing.md

Lines changed: 120 additions & 0 deletions
@@ -0,0 +1,120 @@
# Sharing devices in Volcano

## Introduction

We implement a common interface for shareable devices (GPU, NPU, FPGA, ...) called Devices, and use it to reimplement the current gpu-share mechanism. The goal is to make device sharing easy to implement and better organised. If you wish to give vc-scheduler the ability to share another device, all you need to do is implement the methods of Devices and place your logic under pkg/scheduler/api/devices.

## Background

We intend to give Volcano the ability to share third-party resources like GPU, NPU, etc. in the near future. At first, I tried to implement this logic on top of predicate.gpushare, but I soon realised that the logic is scattered across device_info.go, node_info.go, pod_info.go, and the whole predicate folder. If I followed the implementation of predicate.gpushare, I would have no choice but to hack deeply into the vc-scheduler api. Sooner or later the vc-scheduler api would be crowded with various device-sharing logic, which is probably not what we want.

## Implementation

### Interface Devices design

The design of Devices is shown below:

```
// v1 is k8s.io/api/core/v1; kubernetes is k8s.io/client-go/kubernetes.
type Devices interface {
	// The following two functions are used in node_info.
	// AddResource adds the corresponding device resource of this 'pod' into the current scheduler cache.
	AddResource(pod *v1.Pod)
	// SubResource subtracts the corresponding device resource of this 'pod' from the current scheduler cache.
	SubResource(pod *v1.Pod)

	// The following four functions are used in predicate.
	// HasDeviceRequest checks whether the 'pod' requests this device.
	HasDeviceRequest(pod *v1.Pod) bool
	// FilterNode checks whether the 'pod' fits in the current node.
	FilterNode(pod *v1.Pod) (bool, error)
	// Allocate action in predicate.
	Allocate(kubeClient kubernetes.Interface, pod *v1.Pod) error
	// Release action in predicate.
	Release(kubeClient kubernetes.Interface, pod *v1.Pod) error

	// GetStatus is used for debugging and monitoring.
	GetStatus() string
}
```

The first two methods are used by node_info to update the cluster status. The following four methods are used in predicate, where allocation and deallocation actually take place. Finally, there is a monitoring method for debugging.
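
To make the division of labour between these methods concrete, here is a minimal sketch of a type that satisfies an interface of the same shape. It is an illustration under stated assumptions, not the gpushare implementation: `Pod` is a hypothetical stand-in for `*v1.Pod`, the `kubeClient` parameter is dropped so that only the standard library is needed, and `toyDevice` with its capacity model is invented for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// Pod is a hypothetical stand-in for *v1.Pod (assumption for this sketch).
type Pod struct {
	Name    string
	Request int // device units requested by the pod (assumed field)
}

// Devices mirrors the shape of the interface above, without kubeClient.
type Devices interface {
	AddResource(pod *Pod)
	SubResource(pod *Pod)
	HasDeviceRequest(pod *Pod) bool
	FilterNode(pod *Pod) (bool, error)
	Allocate(pod *Pod) error
	Release(pod *Pod) error
	GetStatus() string
}

// toyDevice is an invented device pool with a fixed number of units.
type toyDevice struct {
	capacity int
	used     map[string]int // pod name -> units in use
}

func newToyDevice(capacity int) *toyDevice {
	return &toyDevice{capacity: capacity, used: map[string]int{}}
}

// free returns the number of units not held by any pod.
func (d *toyDevice) free() int {
	total := 0
	for _, u := range d.used {
		total += u
	}
	return d.capacity - total
}

// AddResource and SubResource keep the cached usage in sync with pods.
func (d *toyDevice) AddResource(pod *Pod) {
	if d.HasDeviceRequest(pod) {
		d.used[pod.Name] = pod.Request
	}
}

func (d *toyDevice) SubResource(pod *Pod) { delete(d.used, pod.Name) }

func (d *toyDevice) HasDeviceRequest(pod *Pod) bool { return pod.Request > 0 }

// FilterNode reports whether the request fits into the remaining units.
func (d *toyDevice) FilterNode(pod *Pod) (bool, error) {
	if pod.Request > d.free() {
		return false, errors.New("not enough free device units")
	}
	return true, nil
}

// Allocate re-checks the fit and then records the usage.
func (d *toyDevice) Allocate(pod *Pod) error {
	if ok, err := d.FilterNode(pod); !ok {
		return err
	}
	d.AddResource(pod)
	return nil
}

func (d *toyDevice) Release(pod *Pod) error {
	d.SubResource(pod)
	return nil
}

func (d *toyDevice) GetStatus() string {
	return fmt.Sprintf("used %d of %d", d.capacity-d.free(), d.capacity)
}

func main() {
	var dev Devices = newToyDevice(4)
	err := dev.Allocate(&Pod{Name: "a", Request: 3})
	fmt.Println(err, dev.GetStatus())
}
```

A real implementation would read the request from the pod's resource limits or annotations and would use the Kubernetes client in Allocate and Release; those parts are deliberately left out here.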

### Create a separate package for gpushare-related methods, and use the Devices methods to reimplement it

There are two steps. First, we create a new package in "pkg/scheduler/api/devices/nvidia/gpushare" and implement the Devices methods in it. Then we separate the gpushare-related logic from "scheduler.api" and the "predicate plugin", and move it into the package "pkg/scheduler/api/devices/nvidia/gpushare". The package contains the following files: device.go (which implements the SharedDevicePool interface methods), share.go (which contains private methods for device.go), and type.go (which contains constant values and definitions).

Details of the method mapping are shown in the table below:

| Original file | Corresponding file(s) in new package |
| ------------- | ------------- |
| pkg/scheduler/api/node_info.go | pkg/scheduler/api/devices/nvidia/gpushare/device_info.go, pkg/scheduler/api/devices/nvidia/gpushare/share.go |
| pkg/scheduler/api/device_info.go | pkg/scheduler/api/devices/nvidia/gpushare/device_info.go, pkg/scheduler/api/devices/nvidia/gpushare/share.go |
| pkg/scheduler/api/pod_info.go | pkg/scheduler/api/devices/nvidia/gpushare/share.go |
| pkg/scheduler/plugins/predicates/predicates.go | pkg/scheduler/api/devices/nvidia/gpushare/device_info.go |
| pkg/scheduler/plugins/predicates/gpu.go | pkg/scheduler/api/devices/nvidia/gpushare/share.go |

## How to add a new device-share policy

### 1. Define your device in /pkg/scheduler/api/shared_device_pool.go

Name your policy and put it in shared_device_pool.go as follows:

```
const (
	GPUSharingDevice        = "GpuShare"
	Your_new_sharing_policy = "xxxxx"
)
```

### 2. Create a new package in /pkg/scheduler/api/devices/"your device name"/"your policy name"

For example, if you want to implement an NPU share policy, we recommend creating a package in /pkg/scheduler/api/devices/ascend/npushare

### 3. Implement the methods of interface shared_device_pool, and put them in your new package

Note that you can't refer to any struct or method in scheduler.api, to avoid cyclic imports. If there is anything in scheduler.api that you *must* use, you should modify the SharedDevicePool interface to pass it in.

The methods defined in the SharedDevicePool interface and their descriptions are shown in the table below:

| Interface | Invoker file | Information |
| ------------- | ------------ | ------------- |
| AddResource(pod *v1.Pod) | pkg/scheduler/api/node_info.go | Add the 'pod' and its resources into the scheduler cache |
| SubResource(pod *v1.Pod) | pkg/scheduler/api/node_info.go | Delete the 'pod' and subtract its resources from the scheduler cache |
| HasDeviceRequest(pod *v1.Pod) bool | pkg/scheduler/plugins/predicates/predicates.go | Check whether this 'pod' requests a portion of this device |
| FilterNode(pod *v1.Pod) (bool, error) | pkg/scheduler/plugins/predicates/predicates.go | Check whether the portion of the device this pod requests can fit in the current node |
| Allocate(kubeClient kubernetes.Interface, pod *v1.Pod) error | pkg/scheduler/plugins/predicates/predicates.go | Allocate the portion of this device from the current node to this pod |
| Release(kubeClient kubernetes.Interface, pod *v1.Pod) error | pkg/scheduler/plugins/predicates/predicates.go | Deallocate the portion of this device from this pod |
| GetStatus() string | none | Used for debugging and monitoring |
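
The table implies a calling order on the predicate side: check whether the pod asks for the device, check whether it fits, then allocate. The following is a rough sketch of that order, assuming a hypothetical driver function `schedule`, a stand-in `Pod` type instead of `*v1.Pod`, and an invented `counter` device. It is not the code in the predicates plugin.

```go
package main

import (
	"errors"
	"fmt"
)

// Pod is a hypothetical stand-in for *v1.Pod (assumption for this sketch).
type Pod struct {
	Name    string
	Request int
}

// predicateDevice lists only the methods the table attributes to the
// predicate plugin; the name is invented for this example.
type predicateDevice interface {
	HasDeviceRequest(pod *Pod) bool
	FilterNode(pod *Pod) (bool, error)
	Allocate(pod *Pod) error
}

// counter is an invented device that only tracks free units.
type counter struct{ free int }

func (c *counter) HasDeviceRequest(p *Pod) bool { return p.Request > 0 }

func (c *counter) FilterNode(p *Pod) (bool, error) {
	if p.Request > c.free {
		return false, errors.New("insufficient device capacity")
	}
	return true, nil
}

func (c *counter) Allocate(p *Pod) error {
	if p.Request > c.free {
		return errors.New("insufficient device capacity")
	}
	c.free -= p.Request
	return nil
}

// schedule filters against every registered device first and allocates
// only if all of them accept the pod.
func schedule(devices map[string]predicateDevice, pod *Pod) error {
	for name, dev := range devices {
		if !dev.HasDeviceRequest(pod) {
			continue // this policy is not involved for this pod
		}
		ok, err := dev.FilterNode(pod)
		if err != nil {
			return fmt.Errorf("device %s: %w", name, err)
		}
		if !ok {
			return fmt.Errorf("device %s: pod %s does not fit", name, pod.Name)
		}
	}
	for name, dev := range devices {
		if !dev.HasDeviceRequest(pod) {
			continue
		}
		if err := dev.Allocate(pod); err != nil {
			return fmt.Errorf("device %s: %w", name, err)
		}
	}
	return nil
}

func main() {
	devs := map[string]predicateDevice{"toy": &counter{free: 2}}
	fmt.Println(schedule(devs, &Pod{Name: "a", Request: 2}))
	fmt.Println(schedule(devs, &Pod{Name: "b", Request: 1}))
}
```

Running the filter pass over all devices before allocating anything is a design choice of this sketch; it avoids a partial allocation when a pod requests several device types and one of them does not fit.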

### 4. Add your initialization code in /pkg/scheduler/api/node_info.go

This is the *only* place where you hack into scheduler.api: you have to register your policy during the initialization of the node struct.

```
// setNodeOthersResource initializes sharable devices
func (ni *NodeInfo) setNodeOthersResource(node *v1.Node) {
	ni.Others[GPUSharingDevice] = gpushare.NewGPUDevices(ni.Name, node)
	//ni.Others["your device sharing policy name"] = your device sharing package initialization method
}
```
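
The registration pattern above amounts to a map keyed by policy name. Below is a small self-contained sketch of that idea. The `NodeInfo` struct here is a reduced stand-in, the type of `Others` is assumed to be `map[string]interface{}`, and `NPUSharingDevice`, `gpuDevices` and `npuDevices` are hypothetical names that do not come from the Volcano code base.

```go
package main

import "fmt"

const (
	GPUSharingDevice = "GpuShare"
	NPUSharingDevice = "NpuShare" // hypothetical second policy
)

// statusReporter is the small slice of the device interface used here.
type statusReporter interface {
	GetStatus() string
}

type gpuDevices struct{ node string }

func (g *gpuDevices) GetStatus() string { return "gpu devices on " + g.node }

type npuDevices struct{ node string }

func (n *npuDevices) GetStatus() string { return "npu devices on " + n.node }

// NodeInfo is a reduced stand-in for the scheduler's node struct.
type NodeInfo struct {
	Name   string
	Others map[string]interface{} // assumed type of the registry
}

// setNodeOthersResource registers one entry per sharing policy.
func (ni *NodeInfo) setNodeOthersResource() {
	ni.Others = map[string]interface{}{}
	ni.Others[GPUSharingDevice] = &gpuDevices{node: ni.Name}
	ni.Others[NPUSharingDevice] = &npuDevices{node: ni.Name}
}

// status looks a policy up by name and reports whether it is registered.
func status(ni *NodeInfo, policy string) (string, bool) {
	dev, ok := ni.Others[policy].(statusReporter)
	if !ok {
		return "", false
	}
	return dev.GetStatus(), true
}

func main() {
	ni := &NodeInfo{Name: "node-1"}
	ni.setNodeOthersResource()
	fmt.Println(status(ni, NPUSharingDevice))
}
```

Because callers look devices up by the policy constant, adding a policy touches only the constant list and this one registration function.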

### 5. Check if your policy is enabled in /pkg/scheduler/plugins/predicates/predicates.go

This is the *only* place where you hack into predicates.go: here the scheduler checks whether your policy is enabled in the scheduler configuration.

predicates.go:

```
...
// Checks whether predicate.GPUSharingEnable is provided or not; if given, modifies the value in the predicateEnable struct.
args.GetBool(&gpushare.GpuSharingEnable, GPUSharingPredicate)
args.GetBool(&gpushare.GpuNumberEnable, GPUNumberPredicate)
args.GetBool(&gpushare.NodeLockEnable, NodeLockEnable)
args.GetBool("your policy enable variable", "your policy enable parameter")
...
```
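
The calls above pass a pointer to a package-level flag and a configuration key, and the flag is overwritten only when the key is present. The sketch below shows that pattern with a simplified stand-in: `Arguments` is modelled here as `map[string]string`, which is an assumption rather than the scheduler framework's type, and `NpuSharingEnable` and `NPUSharingPredicate` are hypothetical names for a new policy.

```go
package main

import (
	"fmt"
	"strconv"
)

// Arguments is a simplified stand-in for the plugin arguments (assumption).
type Arguments map[string]string

// GetBool overwrites *ptr only when key is present and parses as a bool;
// otherwise the existing default is kept.
func (a Arguments) GetBool(ptr *bool, key string) {
	raw, ok := a[key]
	if !ok || ptr == nil {
		return
	}
	if v, err := strconv.ParseBool(raw); err == nil {
		*ptr = v
	}
}

// Flags default to false, meaning the policy is disabled.
var (
	GpuSharingEnable bool
	NpuSharingEnable bool // hypothetical flag for a new policy
)

const (
	GPUSharingPredicate = "predicate.GPUSharingEnable"
	NPUSharingPredicate = "predicate.NPUSharingEnable" // hypothetical key
)

// loadFlags reads one enable flag per sharing policy.
func loadFlags(args Arguments) {
	args.GetBool(&GpuSharingEnable, GPUSharingPredicate)
	args.GetBool(&NpuSharingEnable, NPUSharingPredicate)
}

func main() {
	loadFlags(Arguments{GPUSharingPredicate: "true"})
	fmt.Println(GpuSharingEnable, NpuSharingEnable)
}
```

Keeping the defaults when a key is missing or malformed means a policy stays disabled unless the configuration turns it on explicitly.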

go.mod

Lines changed: 1 addition & 1 deletion
@@ -13,6 +13,7 @@ require (
 	github.com/mitchellh/mapstructure v1.5.0
 	github.com/onsi/ginkgo/v2 v2.3.0
 	github.com/onsi/gomega v1.21.1
+	github.com/pkg/errors v0.9.1
 	github.com/prometheus/client_golang v1.12.1
 	github.com/prometheus/common v0.32.1
 	github.com/spf13/cobra v1.4.0
@@ -72,7 +73,6 @@ require (
 	github.com/munnerz/goautoneg v0.0.0-20191010083416-a7dc8b61c822 // indirect
 	github.com/opencontainers/go-digest v1.0.0 // indirect
 	github.com/opencontainers/selinux v1.10.0 // indirect
-	github.com/pkg/errors v0.9.1 // indirect
 	github.com/prometheus/client_model v0.2.0 // indirect
 	github.com/prometheus/procfs v0.7.3 // indirect
 	golang.org/x/mod v0.6.0-dev.0.20220419223038-86c51ed26bb4 // indirect

pkg/scheduler/api/device_info.go

Lines changed: 0 additions & 119 deletions
This file was deleted.
