koord-descheduler: fix descheduler object limiter with multiple profiles #2200
Codecov Report
Attention: Patch coverage is
Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2200      +/-   ##
==========================================
+ Coverage   65.95%   65.96%   +0.01%
==========================================
  Files         478      478
  Lines       56251    56256       +5
==========================================
+ Hits        37098    37107       +9
+ Misses      16469    16462       -7
- Partials     2684     2687       +3
This issue has been automatically marked as stale because it has not had recent activity.
This issue has been automatically closed because it has not had recent activity.
/reopen
@songtao98: Reopened this PR.
Signed-off-by: songtao98 <songtao2603060@gmail.com>
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: hormes
Merged commit cd618e9 into koordinator-sh:main
scheduler: fix runtime not updated when no pending pods in quota
Signed-off-by: qinfustu <30459241+qinfustu@users.noreply.github.com>
manager: reconcile colocation-profile if enabled (koordinator-sh#2472)
Signed-off-by: saintube <saintube@foxmail.com>
Co-authored-by: shenxin <rougang.hrg@alibaba-inc.com>
proposal: heterogeneous GPU device reporting (koordinator-sh#2423)
Signed-off-by: ZhuZhezz <zzhuzju@163.com>
scheduler: fine-grained device scheduling support Huawei Ascend NPU (full card) (koordinator-sh#2467)
Signed-off-by: Zach Zhu <zzqshu@126.com>
koord-descheduler: support node selector for each descheduler profile (koordinator-sh#2168)
Signed-off-by: songtao98 <songtao2603060@gmail.com>
koord-descheduler: fix descheduler object limiter with multiple profiles (koordinator-sh#2200)
Signed-off-by: songtao98 <songtao2603060@gmail.com>
scheduler: fix quota webhook panic (koordinator-sh#2473)
Signed-off-by: shaloulcy <lcy041536@gmail.com>
Ⅰ. Describe what this PR does
koord-descheduler uses a limiterMap to manage object limiters. When multiple profiles are configured, the MigrationController of every profile can reconcile any PodMigrationJob and evict its pod. As a result, a workload's limiter may be recorded in the limiterMap of one profile while the eviction is performed through the limiterMap of another, so the object limiter may not take effect.
This PR adds a UUID to MigrationController and an annotation to PodMigrationJob. Whenever a MigrationController creates a PodMigrationJob, it records its UUID in this annotation. Whenever a MigrationController from a profile reconciles a PodMigrationJob, it uses this annotation to decide whether to drop the PodMigrationJob.
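The ownership check described above can be illustrated with a minimal, self-contained sketch. It is not the code from this PR: the types are simplified stand-ins, the annotation key `example.koordinator.sh/migration-controller-id` is hypothetical, a random hex string stands in for the UUID, and skipping jobs that carry no annotation is an assumption of the sketch rather than documented behavior.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// ownerAnnotation is a hypothetical key; the real key added by the PR may differ.
const ownerAnnotation = "example.koordinator.sh/migration-controller-id"

// PodMigrationJob is a simplified stand-in for the real CRD type.
type PodMigrationJob struct {
	Name        string
	Annotations map[string]string
}

// MigrationController is a simplified stand-in: one instance per profile,
// each with its own ID and its own per-workload limiter counts.
type MigrationController struct {
	ID         string
	limiterMap map[string]int
}

// newMigrationController assigns a random hex ID (a stand-in for a UUID).
func newMigrationController() *MigrationController {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		panic(err)
	}
	return &MigrationController{
		ID:         hex.EncodeToString(buf),
		limiterMap: map[string]int{},
	}
}

// CreateJob stamps the creating controller's ID into the job's annotations.
func (c *MigrationController) CreateJob(name string) *PodMigrationJob {
	return &PodMigrationJob{
		Name:        name,
		Annotations: map[string]string{ownerAnnotation: c.ID},
	}
}

// ShouldReconcile reports whether this controller created the job.
// Jobs without the annotation are skipped (an assumption of this sketch).
func (c *MigrationController) ShouldReconcile(job *PodMigrationJob) bool {
	return job.Annotations[ownerAnnotation] == c.ID
}

// Reconcile drops jobs owned by other controllers; for its own jobs it
// records the eviction in its own limiterMap and returns true.
func (c *MigrationController) Reconcile(job *PodMigrationJob, workload string) bool {
	if !c.ShouldReconcile(job) {
		return false
	}
	c.limiterMap[workload]++
	return true
}

func main() {
	profileA := newMigrationController()
	profileB := newMigrationController()
	job := profileA.CreateJob("pmj-1")

	fmt.Println("profile A handles job:", profileA.Reconcile(job, "deploy/web"))
	fmt.Println("profile B handles job:", profileB.Reconcile(job, "deploy/web"))
}
```

Because only the creating controller proceeds, the eviction is counted in the same limiterMap that recorded the workload, which is the property the fix relies on.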
Ⅱ. Does this pull request fix one issue?
Ⅲ. Describe how to verify it
Ⅳ. Special notes for reviews
Ⅴ. Checklist
make test