Pipeline task if task's request resource less than the releasing resource of node during performing allocate action #541

Merged
volcano-sh-bot merged 1 commit into volcano-sh:master from sivanzcw:bugfix on Dec 19, 2019

@sivanzcw
Contributor

@volcano-sh-bot volcano-sh-bot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Nov 20, 2019
@TravisBuddy

Hey @sivanzcw,
Your changes look good to me!

View build log

TravisBuddy Request Identifier: 3aa881f0-0b81-11ea-9cd6-8f216fa7db85

- if err := stmt.Pipeline(task, node.Name); err != nil {
-     glog.Errorf("Failed to pipeline Task %v on %v",
-         task.UID, node.Name)
+ if err := ssn.Pipeline(task, node.Name); err != nil {
Member

Any more info on why this was changed from stmt to ssn?

Contributor Author

Suppose the reclaim action is enabled after the allocate action, and the cluster is in the following state:

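| serial | node name | resource |
|--------|-----------|----------|
| 1      | node1     | 4c8g     |
| 2      | node2     | 4c8g     |

| serial | queue name | weight | quota     | status   |
|--------|------------|--------|-----------|----------|
| 1      | default    | 1      | 0.8c 1.5M | overused |
| 2      | queue1     | 100000 | 7c 8g     | active   |

| serial | job name | pods number | minA | queue   | status      |
|--------|----------|-------------|------|---------|-------------|
| 1      | joba     | 7           | 1    | default | all running |
| 2      | jobb     | 7           | 7    | queue1  | all pending |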

There are two jobs in the cluster, joba and jobb. joba is in the default queue and has 7 pods running. jobb is in queue1 and has 7 pods pending. The default queue is overused, so pods in queue1 will try to reclaim resources from the default queue.

  • In the reclaim action, podb-1 of jobb evicts poda-1 of joba, which was running on node1. The scheduling loop then ends.

  • In the next scheduling loop, the allocate action tries to pipeline podb-1 to node1, but the gang restriction of jobb is not met, so the pipeline operation is discarded. No pod is pipelined in the allocate action, even though there are releasing resources in the cluster.

  • In the reclaim action of the same scheduling loop, podb-1 tries to evict another pod of joba.

  • Eventually, podb-1 evicts 6 pods from joba.

  • So when there are releasing resources in the cluster, the higher-priority pod should probably be pipelined to the node whether or not the gang restriction of its job is met. Otherwise, the pod goes on to evict other pods in subsequent actions.
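
The scenario above can be reproduced with a toy model. The sketch below is only an illustration under stated assumptions and is not Volcano's scheduler code: the `task` and `cluster` types and the `allocate`, `reclaim`, and `simulate` functions are hypothetical, every pod is assumed to request one CPU unit, at most one eviction happens per scheduling loop, and evicted resources stay in the releasing state for the whole run. The `commitPerTask` flag stands in for the difference discussed in the review: `false` mimics a statement whose pipeline operations are discarded when the gang restriction is not met, `true` mimics keeping each pipeline operation as soon as it is made.

```go
package main

import "fmt"

// Hypothetical stand-ins for illustration only; not Volcano's real types.
type task struct {
	name      string
	pipelined bool
}

type cluster struct {
	releasing int // CPU units being released by evicted pods
	running   int // running pods of the victim job (joba)
	minVictim int // minAvailable of the victim job; reclaim stops here
}

// allocate tries to pipeline pending tasks onto releasing resources.
// commitPerTask=false: staged pipelines are dropped unless the gang
// restriction (minAvailable) is satisfied.
// commitPerTask=true: every staged pipeline is kept.
func allocate(c *cluster, tasks []*task, minAvailable int, commitPerTask bool) {
	var staged []*task
	free := c.releasing
	ready := 0
	for _, t := range tasks {
		if t.pipelined {
			ready++
			continue
		}
		if free == 0 {
			continue
		}
		free--
		ready++
		staged = append(staged, t)
	}
	if !commitPerTask && ready < minAvailable {
		return // gang restriction not met: discard everything staged
	}
	for _, t := range staged {
		t.pipelined = true
	}
	c.releasing = free
}

// reclaim lets the first task that is still pending evict one victim pod.
func reclaim(c *cluster, tasks []*task, evictedBy map[string]int) {
	for _, t := range tasks {
		if t.pipelined {
			continue
		}
		if c.running > c.minVictim {
			c.running--
			c.releasing++
			evictedBy[t.name]++
		}
		return
	}
}

// simulate runs allocate followed by reclaim for the given number of
// scheduling loops and reports how many pods each pending task evicted.
func simulate(commitPerTask bool, loops int) map[string]int {
	c := &cluster{running: 7, minVictim: 1}
	var tasks []*task
	for i := 1; i <= 7; i++ {
		tasks = append(tasks, &task{name: fmt.Sprintf("podb-%d", i)})
	}
	evictedBy := map[string]int{}
	for i := 0; i < loops; i++ {
		allocate(c, tasks, 7, commitPerTask)
		reclaim(c, tasks, evictedBy)
	}
	return evictedBy
}

func main() {
	fmt.Println("discard on gang failure, evictions by podb-1:", simulate(false, 10)["podb-1"])
	fmt.Println("keep each pipeline, evictions by podb-1:", simulate(true, 10)["podb-1"])
}
```

In this model, podb-1 performs all 6 evictions when its pipeline is discarded each loop, and only 1 when the pipeline is kept. The model does not claim to capture the real scheduler's ordering or resource accounting; it only shows why a pending pod that never gets pipelined keeps acting as a preemptor.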

@k82cn
Member

k82cn commented Dec 16, 2019

/approve

@volcano-sh-bot volcano-sh-bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Dec 16, 2019
@TravisBuddy

Hey @sivanzcw,
Something went wrong with the build.

TravisCI finished with status errored, which means the build failed because of something unrelated to the tests, such as a problem with a dependency or the build process itself.

View build log

TravisBuddy Request Identifier: 53495a40-1fd6-11ea-ba47-7f442aed9c1e

@TravisBuddy

Hey @sivanzcw,
Something went wrong with the build.

TravisCI finished with status errored, which means the build failed because of something unrelated to the tests, such as a problem with a dependency or the build process itself.

View build log

TravisBuddy Request Identifier: 3fc95e60-2077-11ea-830b-038034041c48

@volcano-sh-bot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: k82cn, sivanzcw

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@TravisBuddy

Hey @sivanzcw,
Your changes look good to me!

View build log

TravisBuddy Request Identifier: b9b37c50-207d-11ea-830b-038034041c48

…urce of node during performing allocate action
@TravisBuddy

Hey @sivanzcw,
Something went wrong with the build.

TravisCI finished with status errored, which means the build failed because of something unrelated to the tests, such as a problem with a dependency or the build process itself.

View build log

TravisBuddy Request Identifier: 39863f60-2085-11ea-830b-038034041c48

@TravisBuddy

Hey @sivanzcw,
Your changes look good to me!

View build log

TravisBuddy Request Identifier: 03bebdc0-2086-11ea-830b-038034041c48

@k82cn
Member

k82cn commented Dec 19, 2019

/lgtm

@volcano-sh-bot volcano-sh-bot added the lgtm Indicates that a PR is ready to be merged. label Dec 19, 2019
@volcano-sh-bot volcano-sh-bot merged commit ba6677b into volcano-sh:master Dec 19, 2019

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.

4 participants