koordlet: support eviction trigger by allocatable #2780

Open
lijunxin559 wants to merge 2 commits into koordinator-sh:main from lijunxin559:support-eviction-trigger-by-allocatable

Conversation

@lijunxin559
Contributor

Ⅰ. Describe what this PR does

The mid/batch resources provided by slo-manager are oversold on top of prod resources. When the node water level seen by slo-manager is too high, or the SLO config changes, the mid/batch resources computed in the next update round may shrink significantly. In that case, relying solely on the physical water level on the koordlet (single-node) side to trigger eviction may not be timely enough, and in severe cases it may even cause machine downtime.

Ⅱ. Does this pull request fix one issue?

We provide an allocatable-based eviction method on the single-node side that evicts the related pods and resources by priority, so that the situation above is avoided in advance.
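
As a rough illustration of the idea (not the code in this PR), the sketch below triggers eviction when the summed requests exceed a percentage of allocatable, then evicts lowest-priority pods first until the requests fall to a lower percentage. The `pod` type, `pickEvictions`, and the single-resource model are assumptions made for the example; the priority threshold that the PR also configures is not modeled here.

```go
package main

import (
	"fmt"
	"sort"
)

// pod is a hypothetical, simplified view of a running pod.
type pod struct {
	Name     string
	Priority int32
	Request  int64 // requested amount of one resource, e.g. milli-CPU
}

// pickEvictions returns the pods to evict, lowest priority first, when the
// total request exceeds thresholdPercent of allocatable. It stops once the
// remaining request is at or below lowerPercent of allocatable.
func pickEvictions(pods []pod, allocatable, thresholdPercent, lowerPercent int64) []pod {
	var requested int64
	for _, p := range pods {
		requested += p.Request
	}
	if requested*100 <= allocatable*thresholdPercent {
		return nil // below the trigger threshold, nothing to do
	}

	// Sort a copy so the caller's slice order is left untouched.
	sorted := append([]pod(nil), pods...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Priority < sorted[j].Priority
	})

	var evicted []pod
	for _, p := range sorted {
		if requested*100 <= allocatable*lowerPercent {
			break
		}
		evicted = append(evicted, p)
		requested -= p.Request
	}
	return evicted
}

func main() {
	pods := []pod{{"a", 5000, 3000}, {"b", 5500, 3000}, {"c", 5999, 3000}}
	for _, p := range pickEvictions(pods, 8000, 100, 80) {
		fmt.Println("evict", p.Name)
	}
}
```

Using a separate lower percentage gives hysteresis: eviction does not stop right at the trigger line, so a small fluctuation does not immediately trigger another round.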

Ⅲ. Describe how to verify it

Ⅳ. Special notes for reviews

Ⅴ. Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests
  • All checks passed in make test

@lijunxin559 lijunxin559 force-pushed the support-eviction-trigger-by-allocatable branch 4 times, most recently from 28e5db1 to e7bb9ac Compare January 16, 2026 09:09
@codecov

codecov bot commented Jan 16, 2026

Codecov Report

❌ Patch coverage is 84.96583% with 66 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.00%. Comparing base (f559bd1) to head (b624568).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| .../koordlet/qosmanager/plugins/cpuevict/cpu_evict.go | 84.41% | 19 Missing and 5 partials ⚠️ |
| ...let/qosmanager/plugins/memoryevict/memory_evict.go | 89.79% | 10 Missing and 5 partials ⚠️ |
| pkg/koordlet/qosmanager/plugins/util/evict.go | 88.00% | 10 Missing and 5 partials ⚠️ |
| pkg/util/extended_resource.go | 0.00% | 12 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2780      +/-   ##
==========================================
+ Coverage   67.91%   68.00%   +0.08%     
==========================================
  Files         513      513              
  Lines       52258    52539     +281     
==========================================
+ Hits        35491    35728     +237     
- Misses      13719    13754      +35     
- Partials     3048     3057       +9     

| Flag | Coverage Δ |
|---|---|
| unittests | 68.00% <84.96%> (+0.08%) ⬆️ |

Member

@saintube saintube left a comment

@lijunxin559 Could you please update the user manual for the new features?

@lijunxin559 lijunxin559 force-pushed the support-eviction-trigger-by-allocatable branch 2 times, most recently from 26851b5 to 02efad6 Compare January 20, 2026 09:18
@lijunxin559
Contributor Author

> @lijunxin559 Could you please update the user manual for the new features?

Ok, no problem. After testing this part of the functionality, I will further refine the usage details.

@saintube
Member

@lijunxin559 Do not forget to sign off your commit for the DCO check.

@saintube
Member

@lijunxin559 Please also revise the error message for the issue reported by #2782.

@lijunxin559 lijunxin559 force-pushed the support-eviction-trigger-by-allocatable branch 3 times, most recently from 1ea9924 to d948e31 Compare January 22, 2026 07:19
@ZiMengSheng ZiMengSheng added this to the v1.8 milestone Jan 22, 2026
@lijunxin559 lijunxin559 force-pushed the support-eviction-trigger-by-allocatable branch 3 times, most recently from a4bcedd to 34bc444 Compare January 28, 2026 08:39
@lijunxin559 lijunxin559 force-pushed the support-eviction-trigger-by-allocatable branch from 34bc444 to e85a299 Compare February 6, 2026 05:57
Signed-off-by: lijunxin <lijunxin.ljx@alibaba-inc.com>
@lijunxin559 lijunxin559 force-pushed the support-eviction-trigger-by-allocatable branch from e85a299 to 47ecbbb Compare February 6, 2026 06:11
@saintube
Member

/lgtm
PTAL /cc @zwzhang0107

Signed-off-by: lijunxin <lijunxin.ljx@alibaba-inc.com>
@lijunxin559 lijunxin559 force-pushed the support-eviction-trigger-by-allocatable branch from 47ecbbb to b624568 Compare February 12, 2026 09:24
@koordinator-bot koordinator-bot bot removed the lgtm label Feb 12, 2026
@koordinator-bot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please ask for approval from saintube and additionally assign zwzhang0107 after the PR has been reviewed.
You can assign the PR to them by writing /assign @zwzhang0107 in a comment when ready.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@saintube
Member

/lgtm

@koordinator-bot koordinator-bot bot added the lgtm label Feb 12, 2026
		continue
	}
	// mid/batch are milli format on node: to confirm
	rqValue := float32(qosmanagerUtil.ConvertQuantityToInt64(rt, rq))
Contributor

float32 has only about 7 significant decimal digits of precision, which may lose precision for memory quantities.
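
To make the concern concrete, here is a minimal, standalone sketch (not code from this PR) that round-trips a byte count through float32 and float64. The helper names and the 16 GiB + 1 byte value are assumptions chosen for illustration.

```go
package main

import "fmt"

// roundTripFloat32 converts a byte count to float32 and back, as a
// float32-based calculation implicitly would.
func roundTripFloat32(bytes int64) int64 {
	return int64(float32(bytes))
}

// roundTripFloat64 does the same through float64.
func roundTripFloat64(bytes int64) int64 {
	return int64(float64(bytes))
}

func main() {
	// 16 GiB plus one byte: a hypothetical memory quantity in bytes.
	const mem int64 = 16<<30 + 1

	fmt.Println("original:", mem)
	fmt.Println("float32: ", roundTripFloat32(mem))
	fmt.Println("float64: ", roundTripFloat64(mem))
}
```

With a 24-bit significand, float32 can only represent multiples of 2048 near 2^34, so the extra byte is dropped; float64 keeps it. Using float64 (or integer arithmetic) for byte quantities would avoid this.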

		overall[rt] = rq
		continue
	}
	rqValue := float32(qosmanagerUtil.ConvertQuantityToInt64(rt, rq))
Contributor

Likewise: converting to float32 here may lose precision for memory quantities.

if _, ok := releaseTypes[target]; ok {
	getPodResourceFuncs = append(getPodResourceFuncs, func(info *PodEvictInfo) map[ReleaseTargetType]corev1.ResourceList {
		return map[ReleaseTargetType]corev1.ResourceList{
			target: getPodResourceFunc(info),
Contributor

Is this `target` a loop variable? Go 1.22 gives each iteration its own copy of the loop variable, but this project is on Go 1.21, where all iterations share one variable, so a closure that captures it needs care.
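
For reference, a minimal sketch of the usual pre-1.22 guard, written as a standalone program rather than against this PR's types. `buildGetters` and the string targets are hypothetical; the point is only the `target := target` copy.

```go
package main

import "fmt"

// buildGetters returns one closure per target. The `target := target` line
// gives each closure its own copy, which matters before Go 1.22, where all
// iterations of a for-range loop share a single loop variable.
func buildGetters(targets []string) []func() string {
	getters := make([]func() string, 0, len(targets))
	for _, target := range targets {
		target := target // per-iteration copy; harmless on Go 1.22+
		getters = append(getters, func() string { return target })
	}
	return getters
}

func main() {
	for _, get := range buildGetters([]string{"cpu", "memory"}) {
		fmt.Println(get())
	}
}
```

Without the copy, closures built under the pre-1.22 semantics would all observe the last value of `target` once the loop has finished.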

		return nil, fmt.Errorf("unknown feature: %v", feature)
	}
-	if err := isConfigValid(thresholdConfig, feature); err != nil {
+	if err := generateConfigCheck(feature)(thresholdConfig); err != nil {
Contributor

generateConfigCheck returns nil for unknown features, so calling the returned function here would panic directly.
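
A minimal sketch of one way to guard the call, assuming a factory that returns a nil function for unrecognized names. `checkFunc`, `lookupCheck`, and `validate` are hypothetical stand-ins, not the functions in this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// checkFunc validates a config value; the shape is hypothetical.
type checkFunc func(percent int64) error

// lookupCheck is a hypothetical stand-in for a factory that returns nil
// when the feature name is not recognized.
func lookupCheck(feature string) checkFunc {
	switch feature {
	case "cpu", "memory":
		return func(percent int64) error {
			if percent < 0 || percent > 100 {
				return errors.New("percent out of range [0, 100]")
			}
			return nil
		}
	default:
		return nil
	}
}

// validate guards against the nil function before calling it, so an unknown
// feature yields an error instead of a nil-function-call panic.
func validate(feature string, percent int64) error {
	check := lookupCheck(feature)
	if check == nil {
		return fmt.Errorf("unknown feature: %v", feature)
	}
	return check(percent)
}

func main() {
	fmt.Println(validate("cpu", 80))
	fmt.Println(validate("gpu", 80))
}
```

An alternative is to have the factory return a function that always returns an error for unknown features, so callers never receive nil.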

requestedOnNode := make(corev1.ResourceList)
priorityThreshold := *thresholdConfig.AllocatableEvictPriorityThreshold
allocatableEvictThreshold := *thresholdConfig.CPUAllocatableEvictThresholdPercent
allocatableEvictLowerThreshold := *thresholdConfig.CPUAllocatableEvictLowerPercent
Contributor

In isAllocateableThresholdConfigValid, lowerPercent := int64(0) acts as a default: when the field is nil, no error is returned and validation continues with 0. However, dereferencing the pointer here will then panic directly. The memory code has the same problem.
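
One way to keep validation and use consistent is to read optional pointer fields through a helper that applies the same default. The sketch below is standalone; the `thresholds` struct and `int64OrDefault` are hypothetical and only mirror the shape of the config discussed above.

```go
package main

import "fmt"

// thresholds is a hypothetical config with optional (pointer) fields.
type thresholds struct {
	ThresholdPercent *int64
	LowerPercent     *int64
}

// int64OrDefault returns *p, or def when p is nil.
func int64OrDefault(p *int64, def int64) int64 {
	if p == nil {
		return def
	}
	return *p
}

func main() {
	upper := int64(90)
	cfg := thresholds{ThresholdPercent: &upper} // LowerPercent left nil

	fmt.Println(int64OrDefault(cfg.ThresholdPercent, 0))
	fmt.Println(int64OrDefault(cfg.LowerPercent, 0))
}
```

Alternatively, validation could reject a nil field outright, in which case the later dereference is safe as written.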

@AutuSnow
Contributor

The priority determination in the calculateFunc for CPU and for Memory seems to be inconsistent. Is this expected?

4 participants