
koord-descheduler: implement PodMigrationJob controller#404

Merged
koordinator-bot[bot] merged 1 commit into koordinator-sh:main from eahydra:implement_podmigrationjob_controller on Aug 2, 2022

eahydra (Member) commented Jul 25, 2022

Signed-off-by: Joseph <joseph.t.lee@outlook.com>

Ⅰ. Describe what this PR does

The current version implements a simple reservation-based migration capability. Future versions will implement the PodMigrationJob arbitration mechanism.
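The reservation-based flow can be pictured as an ordered sequence of stages that must each complete before the job succeeds. The Go sketch below is illustrative only and is not the controller from this PR: the stage names come from the condition types in the test output (ReservationCreated, ReservationScheduled, Eviction, PodBoundReservation), while `migrationStep`, `nextStep` and the map-of-booleans representation are hypothetical simplifications.

```go
package main

import "fmt"

// migrationStep names one stage of a ReservationFirst migration. The values
// mirror condition types seen in the PodMigrationJob status; the type itself
// is hypothetical.
type migrationStep string

const (
	stepReservationCreated   migrationStep = "ReservationCreated"
	stepReservationScheduled migrationStep = "ReservationScheduled"
	stepEviction             migrationStep = "Eviction"
	stepPodBoundReservation  migrationStep = "PodBoundReservation"
)

// reservationFirstSteps lists the stages in the order they appear in the
// job's events and conditions.
var reservationFirstSteps = []migrationStep{
	stepReservationCreated,
	stepReservationScheduled,
	stepEviction,
	stepPodBoundReservation,
}

// nextStep returns the first stage whose condition is not yet true, or
// done=true once every stage has completed (the point at which the job
// would report phase Succeed / status Complete).
func nextStep(conditions map[migrationStep]bool) (step migrationStep, done bool) {
	for _, s := range reservationFirstSteps {
		if !conditions[s] {
			return s, false
		}
	}
	return "", true
}

func main() {
	// Reservation created and scheduled, pod not yet evicted.
	conds := map[migrationStep]bool{
		stepReservationCreated:   true,
		stepReservationScheduled: true,
	}
	step, done := nextStep(conds)
	fmt.Println(step, done) // Eviction false
}
```

Keeping the order in one slice makes the "reservation before eviction" guarantee explicit: the source pod is never evicted until a reservation has been placed on a node.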

Ⅱ. Does this pull request fix one issue?

Implements #196 and #214.

Ⅲ. Describe how to verify it

  1. Create a test Pod managed by a StatefulSet or Deployment.
  2. Apply a PodMigrationJob CR:
apiVersion: scheduling.koordinator.sh/v1alpha1
kind: PodMigrationJob
metadata:
  name: test
spec:
  paused: false
  ttl: 30m
  mode: ReservationFirst
  podRef: 
    namespace: default
    name: default--6a049230-ce77-40d8-beb2-4a8898e595ed
status:
  phase: Pending
  3. Watch the status change:
$ kubectl get podmigrationjob test
NAME   PHASE     STATUS     AGE   NODE                     RESERVATION                            PODNAMESPACE   POD                                             NEWPOD                                          TTL
test   Succeed   Complete   24s   i-8vbh1xk320qazk24bkd8   18504799-d13a-4588-8790-f30fd841af79   default        default--72206752-55a2-49df-a131-1de32eb5424c   default--c32a94c8-b16e-40ad-8a21-2c7048a61292   30m0s

$ kubectl describe podmigrationjob test
...
Events:
  Type    Reason                Age   From                     Message
  ----    ------                ----  ----                     -------
  Normal  ReservationCreated    65s   MigrationJobController   Successfully create Reservation "default/18504799-d13a-4588-8790-f30fd841af79"
  Normal  ReservationScheduled  65s   MigrationJobController   Assigned Reservation "default/18504799-d13a-4588-8790-f30fd841af79" to node "i-8vbh1xk320qazk24bkd8"
  Normal  Evicting              65s   MigrationJobController   Try to evict Pod "default/default--72206752-55a2-49df-a131-1de32eb5424c"
  Normal  EvictComplete         55s   MigrationJobController   Pod "default/default--72206752-55a2-49df-a131-1de32eb5424c" has been evicted
  Normal  Complete              49s   MigrationJobController   Bind Pod "default/default--c32a94c8-b16e-40ad-8a21-2c7048a61292" in Reservation "default/18504799-d13a-4588-8790-f30fd841af79"

$ kubectl get podmigrationjob test -o yaml
apiVersion: scheduling.koordinator.sh/v1alpha1
kind: PodMigrationJob
metadata:
  annotations:
  creationTimestamp: "2022-07-25T13:41:36Z"
  generation: 2
  name: test
  resourceVersion: "7857214200"
  uid: 18504799-d13a-4588-8790-f30fd841af79
spec:
  mode: ReservationFirst
  podRef:
    name: default--72206752-55a2-49df-a131-1de32eb5424c
    namespace: default
  reservationOptions:
    reservationRef:
      name: 18504799-d13a-4588-8790-f30fd841af79
      namespace: default
      uid: acf4e254-108b-4ad1-9f5d-278d7e11479b
    template:
      metadata:
        creationTimestamp: null
        labels:
          app.kubernetes.io/created-by: koord-descheduler
        name: 18504799-d13a-4588-8790-f30fd841af79
        namespace: default
      spec:
        owners:
        - controller:
            apiVersion: apps/v1
            blockOwnerDeletion: true
            controller: true
            kind: StatefulSet
            name: demo-test
            namespace: default
            uid: 2f96233d-a6b9-4981-b594-7c90c987aed9
        template:
          ... # pod template
        ttl: 30m0s
  ttl: 30m0s
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-07-25T13:41:38Z"
    status: "True"
    type: ReservationCreated
  - lastProbeTime: null
    lastTransitionTime: "2022-07-25T13:41:38Z"
    status: "True"
    type: ReservationScheduled
  - lastProbeTime: null
    lastTransitionTime: "2022-07-25T13:41:47Z"
    reason: EvictComplete
    status: "True"
    type: Eviction
  - lastProbeTime: null
    lastTransitionTime: "2022-07-25T13:41:54Z"
    message: Bind Pod "default/default--c32a94c8-b16e-40ad-8a21-2c7048a61292" in Reservation
      "default/18504799-d13a-4588-8790-f30fd841af79"
    status: "True"
    type: PodBoundReservation
  message: Bind Pod "default/default--c32a94c8-b16e-40ad-8a21-2c7048a61292" in Reservation
    "default/18504799-d13a-4588-8790-f30fd841af79"
  nodeName: i-8vbh1xk320qazk24bkd8
  phase: Succeed
  podRef:
    kind: Pod
    name: default--c32a94c8-b16e-40ad-8a21-2c7048a61292
    namespace: default
  status: Complete

Ⅳ. Special notes for reviews

Ⅴ. Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests
  • All checks passed in make test

koordinator-bot requested review from hormes and yihuifeng July 25, 2022 09:01
eahydra force-pushed the implement_podmigrationjob_controller branch from d10bd99 to 9e221e9 July 25, 2022 09:03

eahydra (Member Author) commented Jul 25, 2022

/hold

eahydra requested review from jasonliu747 and saintube July 25, 2022 09:06
eahydra added this to the v0.6 milestone Jul 25, 2022
eahydra force-pushed the implement_podmigrationjob_controller branch 4 times, most recently from 8e78e9f to f1e46cb July 25, 2022 09:38

codecov bot commented Jul 25, 2022

Codecov Report

Merging #404 (15e5e8c) into main (9e8fc01) will increase coverage by 0.11%.
The diff coverage is 70.71%.

@@            Coverage Diff             @@
##             main     #404      +/-   ##
==========================================
+ Coverage   67.75%   67.87%   +0.11%     
==========================================
  Files         134      137       +3     
  Lines       14313    14912     +599     
==========================================
+ Hits         9698    10121     +423     
- Misses       3900     4043     +143     
- Partials      715      748      +33     
Flag        Coverage Δ
unittests   67.87% <70.71%> (+0.11%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files                                          Coverage Δ
...kg/descheduler/controllers/migration/controller.go   66.53% <66.53%> (ø)
...descheduler/controllers/migration/assumed_cache.go   93.18% <93.18%> (ø)
pkg/descheduler/controllers/migration/util/util.go      100.00% <100.00%> (ø)
pkg/descheduler/evictions/evictions.go                  90.90% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 9e8fc01...15e5e8c.
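The Coverage Diff figures can be checked by hand, assuming coverage is hits / (hits + misses + partials): 9698 + 3900 + 715 = 14313 lines on main and 10121 + 4043 + 748 = 14912 with this PR. The sketch uses integer basis points; truncating (rather than rounding) to two decimals is an assumption that happens to reproduce the reported 67.75%, and computing the delta from the exact ratios explains why the report shows +0.11% rather than 67.87 − 67.75 = 0.12. The helper names are hypothetical.

```go
package main

import "fmt"

// coverageBP returns line coverage in basis points (hundredths of a percent),
// truncated toward zero. Assumes partial lines count against coverage.
func coverageBP(hits, misses, partials int64) int64 {
	lines := hits + misses + partials
	return hits * 10000 / lines
}

// deltaBP returns the truncated coverage change between two snapshots,
// computed from the exact ratios h1/l1 and h2/l2 rather than from the
// already-truncated percentages.
func deltaBP(h1, l1, h2, l2 int64) int64 {
	return (h2*l1 - h1*l2) * 10000 / (l1 * l2)
}

// formatBP renders basis points as a percentage string such as "67.75%".
func formatBP(bp int64) string {
	return fmt.Sprintf("%d.%02d%%", bp/100, bp%100)
}

func main() {
	base := coverageBP(9698, 3900, 715)  // main: 14313 lines
	head := coverageBP(10121, 4043, 748) // this PR: 14912 lines
	fmt.Println(formatBP(base), formatBP(head))               // 67.75% 67.87%
	fmt.Println(formatBP(deltaBP(9698, 14313, 10121, 14912))) // 0.11%
}
```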

eahydra force-pushed the implement_podmigrationjob_controller branch from f1e46cb to b8338b7 July 25, 2022 17:06

eahydra (Member Author) commented Jul 25, 2022

/hold cancel

eahydra force-pushed the implement_podmigrationjob_controller branch 3 times, most recently from 50a9e92 to 367eb47 July 26, 2022 03:20
eahydra requested a review from allwmh July 26, 2022 04:00
eahydra changed the title from "koord-deschuler: implement PodMigrationJob controller" to "koord-descheduler: implement PodMigrationJob controller" Jul 27, 2022
eahydra force-pushed the implement_podmigrationjob_controller branch from 367eb47 to 25f3c62 July 27, 2022 02:47

eahydra (Member Author) commented Jul 29, 2022

/hold

eahydra force-pushed the implement_podmigrationjob_controller branch from 25f3c62 to 5af06db August 2, 2022 04:19

eahydra (Member Author) commented Aug 2, 2022

Tested with the Koordinator Reservation API:

  1. Create a test Pod managed by a StatefulSet or Deployment.
  2. Apply a PodMigrationJob CR:
apiVersion: scheduling.koordinator.sh/v1alpha1
kind: PodMigrationJob
metadata:
  name: test
spec:
  paused: false
  ttl: 30m
  mode: ReservationFirst
  podRef: 
    namespace: default
    name: curlimage-78d676778d-r585h
status:
  phase: Pending
  3. Watch the status change:
$ kubectl get podmigrationjob test
NAME   PHASE     STATUS     AGE   NODE                       RESERVATION                            PODNAMESPACE   POD                          NEWPOD                       TTL
test   Succeed   Complete   14m   cn-hangzhou.192.168.0.35   12260bab-72b9-48bd-9266-3d090d65ccf3   default        curlimage-78d676778d-r585h   curlimage-78d676778d-d27n7   30m0s

$ kubectl describe podmigrationjob test
...
Events:
  Type    Reason                Age   From               Message
  ----    ------                ----  ----               -------
  Normal  ReservationCreated    15m   koord-descheduler  Successfully create Reservation "12260bab-72b9-48bd-9266-3d090d65ccf3"
  Normal  ReservationScheduled  15m   koord-descheduler  Assigned Reservation "12260bab-72b9-48bd-9266-3d090d65ccf3" to node "cn-hangzhou.192.168.0.35"
  Normal  Evicting              15m   koord-descheduler  Try to evict Pod "default/curlimage-78d676778d-r585h"
  Normal  EvictComplete         15m   koord-descheduler  Pod "default/curlimage-78d676778d-r585h" has been evicted
  Normal  Complete              15m   koord-descheduler  Bind Pod "default/curlimage-78d676778d-d27n7" in Reservation "12260bab-72b9-48bd-9266-3d090d65ccf3"

$ kubectl get podmigrationjob test -o yaml
apiVersion: scheduling.koordinator.sh/v1alpha1
kind: PodMigrationJob
metadata:
  creationTimestamp: "2022-08-02T04:05:47Z"
  generation: 2
  name: test
  resourceVersion: "46054312"
  uid: 12260bab-72b9-48bd-9266-3d090d65ccf3
spec:
  mode: ReservationFirst
  podRef:
    name: curlimage-78d676778d-r585h
    namespace: default
  reservationOptions:
    reservationRef:
      name: 12260bab-72b9-48bd-9266-3d090d65ccf3
      uid: d182c8d8-a599-4579-ac32-e5184aa6bde2
    template:
      metadata:
        creationTimestamp: null
        labels:
          app.kubernetes.io/created-by: koord-descheduler
        name: 12260bab-72b9-48bd-9266-3d090d65ccf3
        namespace: default
      spec:
        owners:
        - controller:
            apiVersion: apps/v1
            blockOwnerDeletion: true
            controller: true
            kind: ReplicaSet
            name: curlimage-78d676778d
            namespace: default
            uid: 6cd097a1-0029-4743-b987-12e03ef54161
        template:
          # ... PodTemplateSpec
        ttl: 30m0s
  ttl: 30m0s
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2022-08-02T04:05:46Z"
    status: "True"
    type: ReservationCreated
  - lastProbeTime: null
    lastTransitionTime: "2022-08-02T04:05:46Z"
    status: "True"
    type: ReservationScheduled
  - lastProbeTime: null
    lastTransitionTime: "2022-08-02T04:06:19Z"
    reason: EvictComplete
    status: "True"
    type: Eviction
  - lastProbeTime: null
    lastTransitionTime: "2022-08-02T04:06:20Z"
    message: Bind Pod "default/curlimage-78d676778d-d27n7" in Reservation "12260bab-72b9-48bd-9266-3d090d65ccf3"
    status: "True"
    type: PodBoundReservation
  message: Bind Pod "default/curlimage-78d676778d-d27n7" in Reservation "12260bab-72b9-48bd-9266-3d090d65ccf3"
  nodeName: cn-hangzhou.192.168.0.35
  phase: Succeed
  podRef:
    name: curlimage-78d676778d-d27n7
    namespace: default
    uid: bf04f31f-254f-4c0f-af0b-1a262362857c
  status: Complete
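In both tests the reservation template's `owners[].controller` matches the migrated pod's controlling owner: a StatefulSet in the first run and a ReplicaSet in the second (a Deployment manages its pods through ReplicaSets). A minimal sketch of selecting that owner, using a hypothetical trimmed-down `ownerRef` type; in real client code, `metav1.GetControllerOf` from k8s.io/apimachinery performs the equivalent lookup. Whether the controller derives the reservation owner exactly this way is an assumption.

```go
package main

import "fmt"

// ownerRef is a hypothetical stand-in for a Kubernetes OwnerReference that
// keeps only the fields shown in the reservation template.
type ownerRef struct {
	APIVersion string
	Kind       string
	Name       string
	UID        string
	Controller *bool
}

// controllerOf returns the owner reference marked controller: true, or nil
// when the pod has no controlling owner (a bare pod would not be recreated
// after eviction).
func controllerOf(refs []ownerRef) *ownerRef {
	for i := range refs {
		if c := refs[i].Controller; c != nil && *c {
			return &refs[i]
		}
	}
	return nil
}

func main() {
	yes := true
	// Controlling owner of the Deployment-managed pod from the second test.
	podOwners := []ownerRef{{
		APIVersion: "apps/v1",
		Kind:       "ReplicaSet",
		Name:       "curlimage-78d676778d",
		UID:        "6cd097a1-0029-4743-b987-12e03ef54161",
		Controller: &yes,
	}}
	if ref := controllerOf(podOwners); ref != nil {
		fmt.Printf("%s/%s\n", ref.Kind, ref.Name) // ReplicaSet/curlimage-78d676778d
	}
}
```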

eahydra force-pushed the implement_podmigrationjob_controller branch from 5af06db to 9eb8a38 August 2, 2022 04:42

eahydra (Member Author) commented Aug 2, 2022

/hold cancel

eahydra force-pushed the implement_podmigrationjob_controller branch from 9eb8a38 to 9f0e8ce August 2, 2022 05:55
FYI: docs/proposals/scheduling/20220701-pod-migration-job.md

Signed-off-by: Joseph <joseph.t.lee@outlook.com>
eahydra force-pushed the implement_podmigrationjob_controller branch from 9f0e8ce to 15e5e8c August 2, 2022 06:31

hormes (Member) commented Aug 2, 2022

/lgtm
/approve

koordinator-bot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hormes

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

koordinator-bot merged commit a89cd98 into koordinator-sh:main Aug 2, 2022