koordlet: add cpuset sharepool cpu info to metrics. #2800

Open
tan90github wants to merge 2 commits into koordinator-sh:main from tan90github:add-cpuset-info-metrics

@tan90github
Contributor

Ⅰ. Describe what this PR does

This PR adds the koordlet_cpuset_share_pool_info metric, which exports the CPU IDs currently assigned to the share pool. A companion metric, koordlet_cpuset_be_share_pool_info, exports the same information for the BE share pool.

For CPU pinning scenarios (e.g., LSE/LSR pods), we should monitor the share pool's usage. If too many CPUs are allocated exclusively, non-pinned pods may be crowded into a small share pool, causing severe CPU contention and node hotspots.

With the koordlet_cpuset_share_pool_info metric, we can monitor the CPU utilization of the share pool with the following PromQL query and catch hotspot issues in advance.

avg(
  irate(node_cpu_seconds_total{instance=~"kind-worker1", mode=~"user"}[5m])
  and on(node, cpu)
  koordlet_cpuset_share_pool_info
)

Ⅱ. Does this pull request fix one issue?

no

Ⅲ. Describe how to verify it

Open Prometheus and run the following PromQL queries:

  • koordlet_cpuset_share_pool_info{}
  • koordlet_cpuset_be_share_pool_info{}

Ⅳ. Special notes for reviews

Ⅴ. Checklist

  • I have written necessary docs and comments
  • I have added necessary unit tests and integration tests
  • All checks passed in make test

Signed-off-by: wangyang60 <wangyang60@xiaomi.com>
Signed-off-by: tan90github <wangy9834@163.com>
@codecov

codecov bot commented Feb 4, 2026

Codecov Report

❌ Patch coverage is 5.26316% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 67.90%. Comparing base (5d018da) to head (cb619c6).
⚠️ Report is 5 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pkg/koordlet/metrics/cpu_cpuset.go | 0.00% | 12 Missing ⚠️ |
| pkg/koordlet/runtimehooks/hooks/cpuset/rule.go | 14.28% | 5 Missing and 1 partial ⚠️ |

❌ Your patch check has failed because the patch coverage (5.26%) is below the target coverage (70.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2800      +/-   ##
==========================================
+ Coverage   67.89%   67.90%   +0.01%     
==========================================
  Files         513      513              
  Lines       52180    52302     +122     
==========================================
+ Hits        35425    35516      +91     
- Misses      13712    13740      +28     
- Partials     3043     3046       +3     
| Flag | Coverage Δ |
|---|---|
| unittests | 67.90% <5.26%> (+0.01%) ⬆️ |


@tan90github
Contributor Author

/assign @saintube

Signed-off-by: tan90github <wangy9834@163.com>
@koordinator-bot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
To complete the pull request process, please assign zwzhang0107 after the PR has been reviewed.
You can assign the PR to them by writing /assign @zwzhang0107 in a comment when ready.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment
