Travis tests have failed. Hey @lminzhw, TravisBuddy Request Identifier: 83fd1f90-a9cb-11e9-b808-dd0e8a2f961c
@k82cn: GitHub didn't allow me to request PR reviews from the following users: jeefy. Note that only volcano-sh members and repo collaborators can review this PR, and authors cannot review their own PRs.
```go
// map[queueId]PriorityQueue(namespaceName)
namespaceMap := map[api.QueueID]*util.PriorityQueue{}
// map[queueId]map[namespaceName]PriorityQueue(*api.JobInfo)
jobInNamespaceMap := map[api.QueueID]map[string]*util.PriorityQueue{}
```
Add more comments to describe these two maps.
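For reference, a minimal sketch of what the two maps hold, with the requested comments written out. It assumes simplified stand-ins: `QueueID` and `job` replace `api.QueueID` and `*api.JobInfo`, and sorted slices replace `util.PriorityQueue` (namespaces sorted by name, jobs by descending priority, chosen only for illustration). `buildMaps` is a hypothetical helper, not code from this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// QueueID and job are simplified stand-ins for api.QueueID and *api.JobInfo.
type QueueID string

type job struct {
	Name      string
	Namespace string
	Queue     QueueID
	Priority  int
}

// buildMaps groups jobs the way the two maps in the snippet are described:
//
//	namespaceMap:      queue -> namespaces that have pending jobs in that queue
//	jobInNamespaceMap: queue -> namespace -> jobs of that namespace in that queue
//
// Sorted slices stand in for util.PriorityQueue in this sketch.
func buildMaps(jobs []job) (map[QueueID][]string, map[QueueID]map[string][]job) {
	namespaceMap := map[QueueID][]string{}
	jobInNamespaceMap := map[QueueID]map[string][]job{}

	for _, j := range jobs {
		if _, found := jobInNamespaceMap[j.Queue]; !found {
			jobInNamespaceMap[j.Queue] = map[string][]job{}
		}
		if _, found := jobInNamespaceMap[j.Queue][j.Namespace]; !found {
			// First job of this namespace in this queue: register the namespace.
			namespaceMap[j.Queue] = append(namespaceMap[j.Queue], j.Namespace)
		}
		jobInNamespaceMap[j.Queue][j.Namespace] = append(jobInNamespaceMap[j.Queue][j.Namespace], j)
	}

	// Order namespaces by name and jobs by descending priority (in place).
	for q, namespaces := range namespaceMap {
		sort.Strings(namespaces)
		for _, ns := range namespaces {
			js := jobInNamespaceMap[q][ns]
			sort.Slice(js, func(a, b int) bool { return js[a].Priority > js[b].Priority })
		}
	}
	return namespaceMap, jobInNamespaceMap
}

func main() {
	nsMap, jobMap := buildMaps([]job{
		{Name: "a", Namespace: "ns2", Queue: "q1", Priority: 1},
		{Name: "b", Namespace: "ns1", Queue: "q1", Priority: 5},
		{Name: "c", Namespace: "ns2", Queue: "q1", Priority: 9},
	})
	fmt.Println(nsMap["q1"])                 // [ns1 ns2]
	fmt.Println(jobMap["q1"]["ns2"][0].Name) // c
}
```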
```go
}

// AddResourceQuota add ResourceQuota to scheduler cache
func (sc *SchedulerCache) AddResourceQuota(obj interface{}) {
```
I cannot see where you filter out ResourceQuotas that do not set volcano.sh/xx.weight.
A namespace without this label defaults to a weight of 1.
In the itemFromQuota function, any ResourceQuota without the xx.weight key gets the default weight of 1.
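The per-quota fallback described above can be sketched as follows. This assumes a simplified quota type with integer hard limits instead of a Kubernetes ResourceQuota, and a hypothetical key name `volcano.sh/namespace.weight` standing in for the `xx.weight` key mentioned in the thread; `weightOf` is illustrative, not the PR's `itemFromQuota`.

```go
package main

import "fmt"

// weightKey is hypothetical: the thread only refers to the key as "xx.weight".
const weightKey = "volcano.sh/namespace.weight"

// defaultWeight is the fallback described in the review reply.
const defaultWeight int64 = 1

// resourceQuota is a simplified stand-in for a Kubernetes ResourceQuota;
// Hard holds integer limits keyed by resource name.
type resourceQuota struct {
	Namespace string
	Hard      map[string]int64
}

// weightOf returns the weight carried by one quota, or defaultWeight when
// the quota does not set the weight key at all.
func weightOf(q resourceQuota) int64 {
	if w, ok := q.Hard[weightKey]; ok {
		return w
	}
	return defaultWeight
}

func main() {
	plain := resourceQuota{Namespace: "ns1", Hard: map[string]int64{"pods": 10}}
	weighted := resourceQuota{Namespace: "ns2", Hard: map[string]int64{weightKey: 4}}
	fmt.Println(weightOf(plain), weightOf(weighted)) // 1 4
}
```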
```go
namespaceAllocation := map[string]*api.Resource{}

for _, preemptee := range preemptees {
	if namespaceOrderEnabled && preemptor.Namespace != preemptee.Namespace {
```
It would be better to refactor this part, since the change is a little too hardcoded.
Refactored this part: the namespace order policy is now a separate code block, outside the job order policy.
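One way to keep the namespace policy out of the job-order logic is to express it as a separate predicate over preemptees. This is a minimal sketch with a simplified `task` type; `namespaceFilter` and `filterPreemptees` are hypothetical names, not the refactored code in this PR.

```go
package main

import "fmt"

// task is a simplified stand-in for a preemptor or preemptee task.
type task struct {
	Name      string
	Namespace string
}

// namespaceFilter returns a predicate deciding whether the preemptor may evict
// a preemptee. With namespace ordering disabled every preemptee is allowed;
// with it enabled only preemptees from the preemptor's namespace are allowed.
func namespaceFilter(enabled bool, preemptor task) func(task) bool {
	if !enabled {
		return func(task) bool { return true }
	}
	return func(preemptee task) bool { return preemptee.Namespace == preemptor.Namespace }
}

// filterPreemptees keeps the tasks that pass every predicate, so each policy
// (namespace, job order, ...) can live in its own function.
func filterPreemptees(preemptees []task, predicates ...func(task) bool) []task {
	var victims []task
	for _, p := range preemptees {
		ok := true
		for _, pred := range predicates {
			if !pred(p) {
				ok = false
				break
			}
		}
		if ok {
			victims = append(victims, p)
		}
	}
	return victims
}

func main() {
	preemptor := task{Name: "p", Namespace: "ns1"}
	preemptees := []task{{Name: "a", Namespace: "ns1"}, {Name: "b", Namespace: "ns2"}, {Name: "c", Namespace: "ns1"}}
	fmt.Println(len(filterPreemptees(preemptees, namespaceFilter(true, preemptor))))  // 2
	fmt.Println(len(filterPreemptees(preemptees, namespaceFilter(false, preemptor)))) // 3
}
```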
hzxuzhonghu left a comment
I think we need more tests, especially for the namespace job allocation part.
Also, I noticed a problem: a namespace may not get its requested resources even when there is enough quota. Say we have queue1 and queue2, and ns1 and ns2, all with weight 1. Jobs in ns1 (requiring 50% of all resources) use both queue1 and queue2. Jobs in ns2 (requiring 50% of all resources) use queue2 only.
If the queue order is queue2 -> queue1, then ns2 can only get 25% of the resources. Is this correct?
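The 25% figure in this question can be reproduced under two assumptions that the thread does not state: queue1 and queue2 have equal weight, so queue2 is capped at half of the cluster, and ns1 and ns2 split queue2's share in proportion to their weights. Writing $s$ for a share of the cluster and $w$ for a namespace weight:

```math
s_{\text{ns2}} = s_{\text{queue2}} \cdot \frac{w_{\text{ns2}}}{w_{\text{ns1}} + w_{\text{ns2}}} = \frac{1}{2} \cdot \frac{1}{1 + 1} = \frac{1}{4} < \frac{1}{2} = \text{request}_{\text{ns2}}
```

Under these assumptions ns1 can cover the rest of its 50% request from queue1, while ns2, which only uses queue2, stays at a quarter of the cluster.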
```go
// TODO(lminzhw): if all NamespaceOrderFn treat these two namespace as the same,
// we should make the job order have its affect among namespaces.
// or just schedule namespace one by one
lv := l.(string)
```
Do we need to check whether this conversion fails?
This follows the code pattern in other CompareFns, like taskOrder or jobOrder in the priority plugin.
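A sketch of the comma-ok alternative the question points at, using a hypothetical `compareByName` comparator with the same `(l, r interface{})` parameters. Here a mismatched type yields 0 (no preference) instead of a panic; whether silently ignoring a bad type is better than failing loudly is a design choice the thread leaves open.

```go
package main

import "fmt"

// compareByName is a hypothetical comparator in the style of the plugin
// compare callbacks, which receive interface{} arguments. The comma-ok form
// reports a failed assertion through a bool instead of panicking.
func compareByName(l, r interface{}) int {
	lv, lok := l.(string)
	rv, rok := r.(string)
	if !lok || !rok {
		// Unexpected type: express no preference rather than crash.
		return 0
	}
	switch {
	case lv < rv:
		return -1
	case lv > rv:
		return 1
	default:
		return 0
	}
}

func main() {
	fmt.Println(compareByName("ns-a", "ns-b")) // -1
	fmt.Println(compareByName("ns-a", 42))     // 0: type mismatch, no panic
}
```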
```go
}

// NamespaceOrderFn invoke namespaceorder function of the plugins
func (ssn *Session) NamespaceOrderFn(l, r interface{}) bool {
```
So the order of namespaces is decided by the first plugin that has enabled namespace ordering?
Yes. If a plugin decides that namespace A has priority over B, that decision is the final result.
This logic follows the code pattern in other CompareFns, like JobOrderFn.
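The first-decision-wins behavior described above can be sketched like this. It assumes plugin comparators return an int (negative: left first, positive: right first, zero: no preference); `compareFn` and `namespaceOrder` are hypothetical, and the fallback to name order when every plugin is neutral is an assumption of this sketch, not something stated in the thread.

```go
package main

import "fmt"

// compareFn mirrors an int-returning comparator: <0 means l goes first,
// >0 means r goes first, 0 means this plugin has no preference.
type compareFn func(l, r interface{}) int

// namespaceOrder reports whether l should be scheduled before r. Plugins are
// consulted in order; the first non-zero answer is final and later plugins
// are never asked. If every plugin is neutral, names break the tie so the
// order stays deterministic (an assumption of this sketch).
func namespaceOrder(plugins []compareFn, l, r string) bool {
	for _, fn := range plugins {
		if c := fn(l, r); c != 0 {
			return c < 0
		}
	}
	return l < r
}

func main() {
	neutral := func(l, r interface{}) int { return 0 }
	preferB := func(l, r interface{}) int {
		switch {
		case l == "ns-b":
			return -1
		case r == "ns-b":
			return 1
		}
		return 0
	}
	fmt.Println(namespaceOrder([]compareFn{neutral, preferB}, "ns-a", "ns-b")) // false: preferB puts ns-b first
	fmt.Println(namespaceOrder([]compareFn{neutral}, "ns-a", "ns-b"))          // true: name fallback
}
```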
I guess ns2 would get resources from both q1 and q2.
We now have 2 more e2e test cases, and an XXL PR. :)
@lminzhw I would suggest not squashing all the commits; it makes the PR hard to review.
```go
err = waitJobPhaseReady(context, job)
Expect(err).NotTo(HaveOccurred())
})
```
Could you add a case that covers both queue and namespace fair share?
This PR is too big now; I'll submit a new test in the next PR.
```go
)

// NamespaceName is name of namespace
type NamespaceName string
```
Could a namespace be any type other than string? If not, it might be better to use string directly, which avoids many type conversions.
I just want to make the code more readable, so I chose to express it as a new type rather than only in a comment.
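One practical consequence of making `NamespaceName` a defined type, relevant both to the `l.(string)` assertion discussed earlier and to the `.(api.NamespaceName)` assertion in the allocate loop: a type assertion must name the exact dynamic type. A minimal sketch, where `classify` is a hypothetical helper:

```go
package main

import "fmt"

// NamespaceName mirrors the defined type from the PR.
type NamespaceName string

// classify reports which type assertions succeed on an interface value.
// Assertions match the exact dynamic type, so a NamespaceName value is not a
// string for assertion purposes, even though its underlying type is string.
func classify(item interface{}) (isString, isNamespace bool) {
	_, isString = item.(string)
	_, isNamespace = item.(NamespaceName)
	return isString, isNamespace
}

func main() {
	fmt.Println(classify(NamespaceName("ns1"))) // false true
	fmt.Println(classify("ns1"))                // true false

	// Converting between the defined type and string is explicit and cheap.
	ns := NamespaceName("ns1")
	fmt.Println(string(ns) == "ns1") // true
}
```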
```go
// pick namespace from namespaces PriorityQueue
namespace := namespaces.Pop().(api.NamespaceName)

queueInNamespace := jobsMap[namespace]
```
I'm a little confused. What is the relationship between queues and namespaces? According to the motivation in the design doc, a namespace works like a user, and there might be many users (namespaces) in a queue. Can there also be many queues in a namespace?
@hex108 As I understand it, different users (namespaces) can generally schedule jobs in different queues. I think this doc concentrates more on fair resource sharing between users within one queue. This solution mainly tries to resolve the starvation problem in resource sharing.
```go
var queue *api.QueueInfo
for queueId := range queueInNamespace {
	currentQueue := ssn.Queues[queueId]
	if ssn.Overused(currentQueue) {
```
If the queue is overused, can it not get resources even when the cluster has idle resources?
Yes, an overused queue is skipped in the master branch code too.
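The skip-if-overused step can be sketched as below, assuming scalar `Allocated`/`Deserved` amounts instead of multi-resource vectors and an `overused` check of `Allocated >= Deserved` standing in for `ssn.Overused`. The original loop ranges over a map, so its iteration order is unspecified; this sketch takes an ordered slice and returns the first eligible queue, which may differ from how the PR chooses among eligible queues. `queueInfo` and `pickQueue` are hypothetical.

```go
package main

import "fmt"

// queueInfo is a simplified stand-in for *api.QueueInfo with scalar amounts.
type queueInfo struct {
	Name      string
	Allocated float64
	Deserved  float64
}

// overused is an assumed stand-in for ssn.Overused: a queue counts as
// overused once its allocation reaches its deserved amount.
func overused(q queueInfo) bool {
	return q.Allocated >= q.Deserved
}

// pickQueue returns the first queue the namespace has jobs in that is not
// overused. Overused queues are skipped even if the cluster still has idle
// capacity, matching the behavior confirmed in the thread.
func pickQueue(queuesInNamespace []string, queues map[string]queueInfo) (queueInfo, bool) {
	for _, id := range queuesInNamespace {
		q, found := queues[id]
		if !found || overused(q) {
			continue
		}
		return q, true
	}
	return queueInfo{}, false
}

func main() {
	queues := map[string]queueInfo{
		"q1": {Name: "q1", Allocated: 6, Deserved: 5},
		"q2": {Name: "q2", Allocated: 2, Deserved: 5},
	}
	q, ok := pickQueue([]string{"q1", "q2"}, queues)
	fmt.Println(q.Name, ok) // q2 true
}
```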
```go
}

// NamespaceOrderEnabled returns the NamespaceOrder for this plugin is enabled in this session or not
func (drf *drfPlugin) NamespaceOrderEnabled(ssn *framework.Session) bool {
```
What is the difference between it and func (ssn *Session) NamespaceOrderEnabled()?
NamespaceOrderEnabled in the plugin just returns the value for that plugin. The Session function returns the result for the whole session and is used in the allocate action.
In the latest version of the code, I have deleted this function, and I will open an issue to adjust the behavior when NamespaceOrder is disabled, among other cases.
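To make the distinction concrete, a minimal sketch of a per-plugin flag versus a session-level answer. The aggregation rule here (the session enables namespace ordering if any plugin does) is an assumption for illustration; the reply only says the session function reports the result for the whole session. `plugin`, `staticPlugin`, and `sessionNamespaceOrderEnabled` are hypothetical.

```go
package main

import "fmt"

// plugin is a hypothetical minimal interface: each plugin reports whether it
// takes part in namespace ordering.
type plugin interface {
	NamespaceOrderEnabled() bool
}

// staticPlugin is a toy plugin with a fixed flag.
type staticPlugin struct{ enabled bool }

func (p staticPlugin) NamespaceOrderEnabled() bool { return p.enabled }

// sessionNamespaceOrderEnabled answers for the whole session: in this sketch,
// namespace ordering is on if at least one plugin enables it.
func sessionNamespaceOrderEnabled(plugins []plugin) bool {
	for _, p := range plugins {
		if p.NamespaceOrderEnabled() {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(sessionNamespaceOrderEnabled([]plugin{staticPlugin{false}, staticPlugin{true}})) // true
	fmt.Println(sessionNamespaceOrderEnabled([]plugin{staticPlugin{false}}))                     // false
}
```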
/approve
```go
// Name is the name of this namespace
Name NamespaceName
// Weight is the highest weight among many ResourceQuota.
Weight int64
```
If we use a signed integer, what's the best practice here? Do we advise users to always use weights > 0, or could weights be negative?
Yes, a weight < 1 is meaningless here because the default value is 1; any weight < 1 is ignored.
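Combining this answer with the `Weight` field comment (the highest weight among a namespace's ResourceQuotas), the namespace weight can be sketched as a maximum that starts from the default. `namespaceWeight` is a hypothetical helper that takes already-extracted weights rather than real ResourceQuota objects.

```go
package main

import "fmt"

// defaultWeight is the default namespace weight mentioned in the thread.
const defaultWeight int64 = 1

// namespaceWeight keeps the highest weight found on a namespace's quotas.
// Because the result starts at the default of 1, any weight below 1 (zero or
// negative) can never win and is effectively ignored.
func namespaceWeight(quotaWeights []int64) int64 {
	weight := defaultWeight
	for _, w := range quotaWeights {
		if w > weight {
			weight = w
		}
	}
	return weight
}

func main() {
	fmt.Println(namespaceWeight([]int64{-3, 0}))   // 1
	fmt.Println(namespaceWeight([]int64{2, 5, 1})) // 5
	fmt.Println(namespaceWeight(nil))              // 1
}
```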
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: Jeffwan, k82cn, lminzhw
Force-pushed from afe52b3 to afe2ce5.
Force-pushed from afe2ce5 to 2f918a5.
@ALL, sorry for the delayed reply. This PR currently leaves two unsolved problems:
Waiting for your responses~
/lgtm Helping to re-apply the lgtm label, and it LGTM too :)
Implementation of: https://github.com/volcano-sh/volcano/blob/master/docs/design/fairshare.md
Supports weight-based fair share among namespaces.
Part of: #244