
koord-scheduler: deviceshare plugin skips handling nodes without device cr #1594

Merged
koordinator-bot[bot] merged 1 commit into koordinator-sh:main from lucming:fix-gpu-sched on Aug 30, 2023

Conversation

@lucming
Contributor

@lucming lucming commented Aug 29, 2023

Ⅰ. Describe what this PR does

background:
GPU scheduling was initially implemented through the GPU device plugin. After switching to koord-scheduler, pods could no longer be scheduled; the scheduler reported that there were no device resources.

reason:
The GPU node doesn't run koordlet, but it does report GPU info to node.status via the device plugin, so the default-scheduler can schedule onto it. However, koord-scheduler's deviceshare plugin rejects the pod because the node has no device resources (no Device CR).

resolve:
The deviceshare plugin skips handling nodes that have no Device CR.
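For illustration, a minimal sketch of the skip behavior described above. The types below (`Plugin`, `Device`, the `Filter` signature) are simplified stand-ins invented for this sketch, not the real scheduler framework interfaces or the Device CRD client:

```go
package main

import "fmt"

// Device is a simplified stand-in for the Device CRD object that
// koordlet reports for a node.
type Device struct{ Resources map[string]int }

// Plugin is a toy stand-in for the deviceshare plugin; devices maps
// node name -> Device CR, and a missing entry means no CR was reported.
type Plugin struct {
	devices map[string]*Device
}

// Filter sketches the fix: a node without a Device CR is passed through
// (skipped) instead of being rejected for lacking device resources.
func (p *Plugin) Filter(nodeName string, podRequests map[string]int) (bool, string) {
	device, found := p.devices[nodeName]
	if !found || device == nil {
		// Node never reported a Device CR (e.g. no koordlet installed):
		// skip device-aware filtering and leave the node to NodeResourcesFit.
		return true, "skipped: no Device CR on node"
	}
	for res, want := range podRequests {
		if device.Resources[res] < want {
			return false, fmt.Sprintf("insufficient %s", res)
		}
	}
	return true, "fits"
}

func main() {
	plugin := &Plugin{devices: map[string]*Device{
		"gpu-node": {Resources: map[string]int{"koordinator.sh/gpu-core": 100}},
	}}
	ok, reason := plugin.Filter("plain-node", map[string]int{"koordinator.sh/gpu-core": 50})
	fmt.Println(ok, reason)
}
```

In the real plugin the decision is made against the Device CRD lister inside the scheduler framework's extension points; the sketch only shows that a missing Device object now yields success rather than an unschedulable status.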

Ⅱ. Does this pull request fix one issue?

Ⅲ. Describe how to verify it

Ⅳ. Special notes for reviews

Ⅴ. Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests
  • All checks passed in make test

@koordinator-bot koordinator-bot bot requested review from buptcozy and eahydra August 29, 2023 15:37
@lucming lucming force-pushed the fix-gpu-sched branch 3 times, most recently from a10c5a4 to e905bcb Compare August 29, 2023 16:20
@codecov

codecov bot commented Aug 29, 2023

Codecov Report

Patch coverage: 66.66% and project coverage change: -0.09% ⚠️

Comparison is base (490dfd6) 65.28% compared to head (7804be6) 65.20%.
Report is 5 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1594      +/-   ##
==========================================
- Coverage   65.28%   65.20%   -0.09%     
==========================================
  Files         359      359              
  Lines       36718    36783      +65     
==========================================
+ Hits        23971    23983      +12     
- Misses      10995    11048      +53     
  Partials     1752     1752              
Flag Coverage Δ
unittests 65.20% <66.66%> (-0.09%) ⬇️

Flags with carried forward coverage won't be shown.

Files Changed Coverage Δ
pkg/scheduler/plugins/deviceshare/plugin.go 60.81% <66.66%> (ø)

... and 9 files with indirect coverage changes


@eahydra
Member

eahydra commented Aug 30, 2023

We've also hit this problem over the last few days, but this modification may not be enough.
For example, a node may run both the nvidia device-plugin and koordlet; in that case the scheduler needs to check whether the Device CRD object exists. But if only the nvidia device-plugin is present, there is no need to check.

So my idea is to add a node-level label, such as node.koordinator.sh/gpu-resource-model, with the following values:

  • Shared indicates that only the Koordinator GPU Shared protocol is supported
  • Vendor indicates that only vendor protocols are supported, such as nvidia.com/gpu
  • Mixed means that both of the above are supported
  • Native indicates that the node is in native mode and does not need to be processed by the device-share plugin of the scheduler.

WDYT? @lucming
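The proposed label could be dispatched on roughly as below. The label name and its values come from the comment above; everything else (the helper, its fallback behavior) is hypothetical and only illustrates the idea:

```go
package main

import "fmt"

// GPUResourceModelLabel is the node label proposed in this thread;
// it does not exist in Koordinator as of this PR.
const GPUResourceModelLabel = "node.koordinator.sh/gpu-resource-model"

// needsDeviceCR reports whether the deviceshare plugin should expect a
// Device CRD object for the node, given the proposed label value.
func needsDeviceCR(nodeLabels map[string]string) bool {
	switch nodeLabels[GPUResourceModelLabel] {
	case "Shared", "Mixed":
		// Koordinator GPU Shared protocol is in play: Device CR required.
		return true
	case "Vendor", "Native":
		// Only vendor resources (e.g. nvidia.com/gpu) or native mode:
		// the deviceshare plugin can ignore the node.
		return false
	default:
		// Unlabeled node: fall back to checking for the Device CR.
		return true
	}
}

func main() {
	fmt.Println(needsDeviceCR(map[string]string{GPUResourceModelLabel: "Native"})) // false
	fmt.Println(needsDeviceCR(map[string]string{}))                                // true
}
```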

Member

And the lifetime of the state is one scheduling cycle, so it cannot be fixed this way.

Contributor Author

Yes, the state lives for the entire scheduling cycle. I'm handling it this way precisely so that the plugin doesn't process nodes without devices at all.

So a node without a device is only affected by the NodeResourcesFit result, and a node with a device is handled by deviceshare.

Member

But state.skip means that the Pod does not request GPU or other devices.
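To make the distinction concrete, a toy sketch (not the real framework CycleState API) separating the pod-scoped state.skip from the per-node skip discussed in this PR:

```go
package main

import "fmt"

// cycleState is a toy stand-in for the framework's CycleState data.
type cycleState struct {
	// skip is pod-scoped: set once in PreFilter, it lives for one
	// scheduling cycle and means the Pod requests no device resources.
	skip bool
}

// preFilter sets the pod-scoped skip flag from the pod's device requests.
func preFilter(podDeviceRequests map[string]int) *cycleState {
	return &cycleState{skip: len(podDeviceRequests) == 0}
}

// filter is node-scoped: even when the pod does request devices
// (state.skip == false), a node without a Device CR is passed through
// rather than rejected.
func filter(state *cycleState, nodeHasDeviceCR, fits bool) bool {
	if state.skip {
		return true // pod needs no devices at all
	}
	if !nodeHasDeviceCR {
		return true // node-side skip: let NodeResourcesFit decide
	}
	return fits
}

func main() {
	state := preFilter(map[string]int{"koordinator.sh/gpu-core": 50})
	fmt.Println(filter(state, false, false)) // true: node has no Device CR
	fmt.Println(filter(state, true, false))  // false: CR exists but does not fit
}
```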

@eahydra
Member

eahydra commented Aug 30, 2023

(quoting the label-proposal comment above)

I think it's too complicated. In fact it is only useful in the Native scenario, and in that scenario, judging whether a node is Native is equivalent to simply not checking the Device CRD.

@lucming
Contributor Author

lucming commented Aug 30, 2023

(quoting the label proposal and eahydra's follow-up above)

agree with you (^^)

@lucming
Contributor Author

lucming commented Aug 30, 2023

We also need to clean up node.status.allocatable after the Device resource has been deleted. That is part of the logic in koord-manager; I'll fix it in another PR.

@lucming lucming force-pushed the fix-gpu-sched branch 2 times, most recently from 1bb05b9 to c69b34b Compare August 30, 2023 09:12
@koordinator-bot koordinator-bot bot added size/S and removed size/XS labels Aug 30, 2023
…ce cr

Signed-off-by: lucming <2876757716@qq.com>
Signed-off-by: liuming6 <liuming6@360.cn>
Member

@eahydra eahydra left a comment

/lgtm
/approve

@koordinator-bot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: eahydra

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@koordinator-bot koordinator-bot bot merged commit 3142de0 into koordinator-sh:main Aug 30, 2023
Member

@jasonliu747 jasonliu747 left a comment

/lgtm

