koordlet: support eviction trigger by allocatable#2780
koordlet: support eviction trigger by allocatable#2780lijunxin559 wants to merge 2 commits intokoordinator-sh:mainfrom
Conversation
28e5db1 to
e7bb9ac
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #2780 +/- ##
==========================================
+ Coverage 67.91% 68.00% +0.08%
==========================================
Files 513 513
Lines 52258 52539 +281
==========================================
+ Hits 35491 35728 +237
- Misses 13719 13754 +35
- Partials 3048 3057 +9
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
saintube
left a comment
There was a problem hiding this comment.
@lijunxin559 Could you please update the user manual for the new features?
26851b5 to
02efad6
Compare
|
|
@lijunxin559 Do not forget to sign off your commit for the DCO check. |
|
@lijunxin559 Please also revise the error message for the issue reported by #2782. |
1ea9924 to
d948e31
Compare
a4bcedd to
34bc444
Compare
34bc444 to
e85a299
Compare
Signed-off-by: lijunxin <lijunxin.ljx@alibaba-inc.com>
e85a299 to
47ecbbb
Compare
|
/lgtm |
Signed-off-by: lijunxin <lijunxin.ljx@alibaba-inc.com>
47ecbbb to
b624568
Compare
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
/lgtm |
| continue | ||
| } | ||
| // mid/batch are milli format on node: to confirm | ||
| rqValue := float32(qosmanagerUtil.ConvertQuantityToInt64(rt, rq)) |
There was a problem hiding this comment.
The precision of float32 is only about 7 digits, which may result in loss of precision in memory scenarios
| overall[rt] = rq | ||
| continue | ||
| } | ||
| rqValue := float32(qosmanagerUtil.ConvertQuantityToInt64(rt, rq)) |
| if _, ok := releaseTypes[target]; ok { | ||
| getPodResourceFuncs = append(getPodResourceFuncs, func(info *PodEvictInfo) map[ReleaseTargetType]corev1.ResourceList { | ||
| return map[ReleaseTargetType]corev1.ResourceList{ | ||
| target: getPodResourceFunc(info), |
There was a problem hiding this comment.
Is this target a loop variable? Go1.22 has already solved this problem, but the Go version of this project is 1.21, so it should be noted
| return nil, fmt.Errorf("unknown feature: %v", feature) | ||
| } | ||
| if err := isConfigValid(thresholdConfig, feature); err != nil { | ||
| if err := generateConfigCheck(feature)(thresholdConfig); err != nil { |
There was a problem hiding this comment.
GenerateConfigCheck returns nil for unknown features, so here we should directly panic
| requestedOnNode := make(corev1.ResourceList) | ||
| priorityThreshold := *thresholdConfig.AllocatableEvictPriorityThreshold | ||
| allocatableEvictThreshold := *thresholdConfig.CPUAllocatableEvictThresholdPercent | ||
| allocatableEvictLowerThreshold := *thresholdConfig.CPUAllocatableEvictLowerPercent |
There was a problem hiding this comment.
In the isAllocateableThresholdConfig Valid code, lowerPercent:=int64 (0), there will be no error when nil is detected in the judgment, and 0 will be used to continue the verification. However, if you attempt to dereference in this area, it will directly panic,The code for memory is the same
|
The priority determination in the calculateFunc of CPU and Memory seems to be inconsistent. Is this within the expected range? |
Ⅰ. Describe what this PR does
For the mid/batch resources provided by slo-manager, as they are oversold on the basis of prod resources, when slo manager's node water level is too high or the slo config configuration changes, the mid/batch resources for the next round of calculation updates may be significantly reduced. At this time, relying solely on the physical water level of the koordlet single machine side to trigger eviction may not be timely enough, and in severe cases, it may even cause machine downtime.
Ⅱ. Does this pull request fix one issue?
We provide allocatable based eviction method on a single machine side that evicts related pods and resource based on priority to avoid the above situation in advance.
Ⅲ. Describe how to verify it
Ⅳ. Special notes for reviews
V. Checklist
make test