
koord-descheduler: fix descheduler object limiter with multiple profiles#2200

Merged
koordinator-bot[bot] merged 1 commit into koordinator-sh:main from songtao98:support_descheduler_multiple_profiles
Jun 11, 2025


@songtao98
Contributor

Ⅰ. Describe what this PR does

koord-descheduler uses a limiterMap to manage object limiters. When multiple profiles are configured, every MigrationController in every profile can evict the pods of every PodMigrationJob. As a result, an eviction for a workload may be recorded in one profile's limiterMap while the pod is actually evicted by another profile's controller, so the object limiter may not take effect.

This PR adds a UUID to MigrationController and an annotation to PodMigrationJob. Each time a MigrationController creates a PodMigrationJob, it records its own UUID in this annotation. Each time a MigrationController from a profile reconciles a PodMigrationJob, it uses this annotation to decide whether to drop the job.
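The ownership check described above can be sketched in Go. This is a minimal sketch, not the code merged in this PR: the annotation key `example.koordinator.sh/migration-controller-uuid`, the pared-down `PodMigrationJob` and `MigrationController` types, and the `CreateJob`/`ShouldReconcile` helpers are all hypothetical, and the limiterMap itself is left out. It only shows how stamping a per-controller UUID at creation time lets each profile's controller skip jobs that another profile created.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// controllerUUIDAnnotation is a HYPOTHETICAL annotation key chosen for this
// sketch; the key actually added by the PR may be named differently.
const controllerUUIDAnnotation = "example.koordinator.sh/migration-controller-uuid"

// PodMigrationJob is a pared-down stand-in for the real CRD; only the
// annotations matter here.
type PodMigrationJob struct {
	Name        string
	Annotations map[string]string
}

// MigrationController is a pared-down stand-in for one profile's controller.
// The real controller also owns a limiterMap; this sketch omits it and only
// models which controller may act on which job.
type MigrationController struct {
	uuid string
}

// newUUID returns 16 random bytes, hex-encoded. A real implementation would
// more likely use a UUID library; this keeps the sketch stdlib-only.
func newUUID() string {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		panic(err)
	}
	return hex.EncodeToString(b)
}

// NewMigrationController gives each controller (one per profile) its own ID.
func NewMigrationController() *MigrationController {
	return &MigrationController{uuid: newUUID()}
}

// CreateJob stamps the creating controller's ID onto the new job.
func (c *MigrationController) CreateJob(name string) *PodMigrationJob {
	return &PodMigrationJob{
		Name:        name,
		Annotations: map[string]string{controllerUUIDAnnotation: c.uuid},
	}
}

// ShouldReconcile reports whether this controller created the job. Jobs from
// other profiles, and (in this sketch) jobs with no annotation, are skipped.
func (c *MigrationController) ShouldReconcile(job *PodMigrationJob) bool {
	return job.Annotations[controllerUUIDAnnotation] == c.uuid
}

func main() {
	profileA := NewMigrationController()
	profileB := NewMigrationController()

	job := profileA.CreateJob("pmj-1")
	fmt.Println(profileA.ShouldReconcile(job)) // true
	fmt.Println(profileB.ShouldReconcile(job)) // false
}
```

The PR description does not say how jobs without the annotation (for example, ones created by hand) are handled; skipping them here is an assumption of the sketch.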

Ⅱ. Does this pull request fix one issue?

Ⅲ. Describe how to verify it

Ⅳ. Special notes for reviews

V. Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests
  • All checks passed in make test

songtao98 force-pushed the support_descheduler_multiple_profiles branch from 8a0c4bb to 39743f8 on September 9, 2024 03:04
@codecov

codecov bot commented Sep 9, 2024

Codecov Report

Attention: Patch coverage is 42.85714% with 4 lines in your changes missing coverage. Please review.

Project coverage is 65.96%. Comparing base (12265ae) to head (3bc5ff9).
Report is 4 commits behind head on main.

Files with missing lines | Patch % | Lines
...kg/descheduler/controllers/migration/controller.go | 0.00% | 3 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2200      +/-   ##
==========================================
+ Coverage   65.95%   65.96%   +0.01%     
==========================================
  Files         478      478              
  Lines       56251    56256       +5     
==========================================
+ Hits        37098    37107       +9     
+ Misses      16469    16462       -7     
- Partials     2684     2687       +3     
Flag | Coverage Δ
unittests | 65.96% <42.85%> (+0.01%) ⬆️
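The reported percentages can be checked by hand, assuming Codecov's usual definition of coverage as hits divided by hits plus misses plus partials (partial lines count as not covered). Four uncovered patch lines (3 missing, 1 partial) at 42.85714% imply the patch has 7 coverable lines, 3 of them hit; the project totals come straight from the coverage diff above.

$$
\text{patch} = \frac{3}{3 + 3 + 1} = \frac{3}{7} \approx 42.857\%
$$

$$
\text{head} = \frac{37107}{37107 + 16462 + 2687} = \frac{37107}{56256} \approx 65.96\%, \qquad
\text{base} = \frac{37098}{37098 + 16469 + 2684} = \frac{37098}{56251} \approx 65.95\%
$$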

Flags with carried forward coverage won't be shown.


@stale

stale bot commented Dec 8, 2024

This issue has been automatically marked as stale because it has not had recent activity.
This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Close this issue or PR with /close

Thank you for your contributions.

stale bot added the lifecycle/stale label Dec 8, 2024
@stale

stale bot commented Jan 8, 2025

This issue has been automatically closed because it has not had recent activity.
This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, the issue is closed

You can:

  • Reopen this PR with /reopen

Thank you for your contributions.

stale bot closed this Jan 8, 2025
@songtao98
Contributor Author

/reopen

koordinator-bot bot reopened this Jun 9, 2025
@koordinator-bot

@songtao98: Reopened this PR.

Details

In response to this:

/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Signed-off-by: songtao98 <songtao2603060@gmail.com>
@hormes
Member

hormes commented Jun 11, 2025

/lgtm
/approve

@koordinator-bot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hormes

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

koordinator-bot bot merged commit cd618e9 into koordinator-sh:main Jun 11, 2025
33 of 34 checks passed
qinfustu added a commit to qinfustu/koordinator that referenced this pull request Jun 11, 2025
scheduler: fix runtime not updated when no pending pods in quota

Signed-off-by: qinfustu <30459241+qinfustu@users.noreply.github.com>

manager: reconcile colocation-profile if enabled (koordinator-sh#2472)

Signed-off-by: saintube <saintube@foxmail.com>
Co-authored-by: shenxin <rougang.hrg@alibaba-inc.com>

proposal: heterogeneous GPU device reporting (koordinator-sh#2423)

Signed-off-by: ZhuZhezz <zzhuzju@163.com>

scheduler: fine-grained device scheduling support Huawei Ascend NPU (full card) (koordinator-sh#2467)

Signed-off-by: Zach Zhu <zzqshu@126.com>

koord-descheduler: support node selector for each descheduler profile (koordinator-sh#2168)

Signed-off-by: songtao98 <songtao2603060@gmail.com>

koord-descheduler: fix descheduler object limiter with multiple profiles (koordinator-sh#2200)

Signed-off-by: songtao98 <songtao2603060@gmail.com>

scheduler: fix quota webhook panic (koordinator-sh#2473)

Signed-off-by: shaloulcy <lcy041536@gmail.com>
qinfustu added four more commits to qinfustu/koordinator that referenced this pull request (three on Jun 11, 2025 and one on Jun 12, 2025). Each repeats the commit message above; the later ones append extra Signed-off-by trailers for qinfustu <30459241+qinfustu@users.noreply.github.com>, fu_qin <fu_qin_stu@163.com>, and qinfustu <fu_qin_stu@163.com>.
2 participants