# Sharing devices in Volcano

## Introduction

We implement a common interface for shareable devices (GPU, NPU, FPGA, ...) called Devices, and use it to reimplement the current gpu-share mechanism. The goal is to make device sharing easy to implement and better organised. If you wish to grant vc-scheduler the ability to share another device, all you need to do is implement these methods in Devices and place your logic under pkg/scheduler/api/devices.

## Background

We intend to give Volcano the ability to share third-party resources like GPU, NPU, etc. in the near future. At first, I tried to implement this logic based on predicate.gpushare, but I soon realised that the logic was scattered across device_info.go, node_info.go, pod_info.go, and the whole predicate folder. If I followed the implementation of predicate.gpushare, I would have no choice but to hack deeply into the vc-scheduler API. Sooner or later the vc-scheduler API would be crowded with various device-sharing logic, which is probably not what we want.

## Implementation

### Interface Devices design

The design of Devices is shown below:

```go
type Devices interface {
	// The following two methods are used in node_info.
	// AddResource adds the corresponding device resource of this 'pod' into the current scheduler cache.
	AddResource(pod *v1.Pod)
	// SubResource subtracts the corresponding device resource of this 'pod' from the current scheduler cache.
	SubResource(pod *v1.Pod)

	// The following four methods are used in predicate.
	// HasDeviceRequest checks if the 'pod' requests this device.
	HasDeviceRequest(pod *v1.Pod) bool
	// FilterNode checks if the 'pod' fits on the current node.
	FilterNode(pod *v1.Pod) (bool, error)
	// Allocate action in predicate.
	Allocate(kubeClient kubernetes.Interface, pod *v1.Pod) error
	// Release action in predicate.
	Release(kubeClient kubernetes.Interface, pod *v1.Pod) error

	// GetStatus is used for debugging and monitoring.
	GetStatus() string
}
```

The first two methods are used by node_info to update cluster status. The following four methods are used in predicate, where allocation and deallocation actually take place. Finally, GetStatus is a monitoring method for debugging.

### Create a separate package for gpushare-related methods, and use Devices methods to reimplement it

There are two steps. First, create a new package in "pkg/scheduler/api/devices/nvidia/gpushare" and implement the Devices methods in it. Then, separate the gpushare-related logic from "scheduler.api" and the "predicate plugin", and move it into package "pkg/scheduler/api/devices/nvidia/gpushare". The package contains the following files: device_info.go (which implements the SharedDevicePool interface methods), share.go (which contains private methods for device_info.go), and type.go (which contains constant values and definitions).

Details of the method mapping are shown in the table below:

| origin file | corresponding file(s) in new package |
| ------------- | ------------- |
| pkg/scheduler/api/node_info.go | pkg/scheduler/api/devices/nvidia/gpushare/device_info.go, pkg/scheduler/api/devices/nvidia/gpushare/share.go |
| pkg/scheduler/api/device_info.go | pkg/scheduler/api/devices/nvidia/gpushare/device_info.go, pkg/scheduler/api/devices/nvidia/gpushare/share.go |
| pkg/scheduler/api/pod_info.go | pkg/scheduler/api/devices/nvidia/gpushare/share.go |
| pkg/scheduler/plugins/predicates/predicates.go | pkg/scheduler/api/devices/nvidia/gpushare/device_info.go |
| pkg/scheduler/plugins/predicates/gpu.go | pkg/scheduler/api/devices/nvidia/gpushare/share.go |

## How to add a new device-share policy

### 1. Define your device in /pkg/scheduler/api/shared_device_pool.go

Name your policy and put it in shared_device_pool.go as follows:

```go
const (
	GPUSharingDevice = "GpuShare"
	Your_new_sharing_policy = "xxxxx"
)
```

### 2. Create a new package in /pkg/scheduler/api/devices/"your device name"/"your policy name"

For example, if you want to implement an NPU share policy, you are recommended to create a package in /pkg/scheduler/api/devices/ascend/npushare.

### 3. Implement the methods of interface SharedDevicePool, and put them in your new package

Note that you can't refer to any structs or methods in scheduler.api, to avoid cyclic imports. If there is anything in scheduler.api you *must* have, then you should modify the SharedDevicePool interface to pass it in.
The methods defined in the SharedDevicePool interface and their descriptions are shown in the table below:

| interface | invoker file | information |
| ------------- | ------------ | ------------- |
| AddResource(pod *v1.Pod) | pkg/scheduler/api/node_info.go | Add the 'pod' and its resources into the scheduler cache |
| SubResource(pod *v1.Pod) | pkg/scheduler/api/node_info.go | Delete the 'pod' and subtract its resources from the scheduler cache |
| HasDeviceRequest(pod *v1.Pod) bool | pkg/scheduler/plugins/predicates/predicate.go | Check whether this 'pod' requests a portion of this device |
| FilterNode(pod *v1.Pod) | pkg/scheduler/plugins/predicates/predicate.go | Check whether the portion of the device this pod requests can fit on the current node |
| Allocate(kubeClient kubernetes.Interface, pod *v1.Pod) error | pkg/scheduler/plugins/predicates/predicate.go | Allocate the portion of this device from the current node to this pod |
| Release(kubeClient kubernetes.Interface, pod *v1.Pod) error | pkg/scheduler/plugins/predicates/predicate.go | Deallocate the portion of this device from this pod |
| GetStatus() string | none | Used for debugging and monitoring |

### 4. Add your initialization code in /pkg/scheduler/api/node_info.go

This is the *only* place you hack into scheduler.api: you have to register your policy during the initialization of the node struct.

```go
// setNodeOthersResource initializes shareable devices
func (ni *NodeInfo) setNodeOthersResource(node *v1.Node) {
	ni.Others[GPUSharingDevice] = gpushare.NewGPUDevices(ni.Name, node)
	//ni.Others["your device sharing policy name"] = your device sharing package initialization method
}
```

### 5. Check if your policy is enabled in /pkg/scheduler/plugins/predicate/predicates.go

This is the *only* place you hack into predicates.go: the scheduler checks whether your policy is enabled in the scheduler configuration.

predicates.go:

```
...
// Checks whether predicate.GPUSharingEnable is provided or not; if given, modifies the value in the predicateEnable struct.
args.GetBool(&gpushare.GpuSharingEnable, GPUSharingPredicate)
args.GetBool(&gpushare.GpuNumberEnable, GPUNumberPredicate)
args.GetBool(&gpushare.NodeLockEnable, NodeLockEnable)
args.GetBool("your policy enable variable", "your policy enable parameter")
...
```
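
For context, the enable flags read above come from the predicates plugin arguments in the scheduler ConfigMap. A minimal sketch of such a configuration is shown below; `predicate.GPUSharingEnable` is the existing gpu-share knob, while the last argument is a hypothetical placeholder for your new policy (the exact key is whatever name you register in step 5):

```yaml
actions: "enqueue, allocate, backfill"
tiers:
- plugins:
  - name: predicates
    arguments:
      predicate.GPUSharingEnable: true
      # predicate.YourPolicyEnable: true   # hypothetical flag for your new policy
```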