koordlet: add cpuset sharepool cpu info to metrics.#2800
tan90github wants to merge 2 commits into koordinator-sh:main
Signed-off-by: wangyang60 <wangyang60@xiaomi.com>
Signed-off-by: tan90github <wangy9834@163.com>
Codecov Report
❌ Your patch check has failed because the patch coverage (5.26%) is below the target coverage (70.00%). You can increase the patch coverage or adjust the target coverage.
@@ Coverage Diff @@
## main #2800 +/- ##
==========================================
+ Coverage 67.89% 67.90% +0.01%
==========================================
Files 513 513
Lines 52180 52302 +122
==========================================
+ Hits 35425 35516 +91
- Misses 13712 13740 +28
- Partials 3043 3046 +3
/assign @saintube
Signed-off-by: tan90github <wangy9834@163.com>
[APPROVALNOTIFIER] This PR is NOT APPROVED. It needs approval from an approver in each of the affected files.
Ⅰ. Describe what this PR does
This PR adds the `koordlet_cpuset_share_pool_info` metric, which exports the specific CPU IDs currently assigned to the Share Pool. The same applies to the BE Share Pool.
In CPU pinning scenarios (e.g., LSE/LSR pods), we should monitor the Share Pool's usage. If too many exclusive CPUs are allocated, non-pinned pods may be crowded into a small Share Pool, causing severe CPU contention and node hotspots.
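To illustrate why the Share Pool shrinks as exclusive CPUs are allocated, here is a minimal Go sketch. It is not the koordlet implementation: the function names `sharePoolCPUs` and `formatCPUSet` and the sample CPU IDs are hypothetical, and it simply assumes the Share Pool is the set of node CPUs that are not exclusively allocated.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sharePoolCPUs returns the CPU IDs left for the share pool: every node CPU
// that is not exclusively allocated, in ascending order.
// Hypothetical helper for illustration, not a koordlet function.
func sharePoolCPUs(nodeCPUs, exclusiveCPUs []int) []int {
	exclusive := make(map[int]bool, len(exclusiveCPUs))
	for _, id := range exclusiveCPUs {
		exclusive[id] = true
	}
	pool := []int{}
	for _, id := range nodeCPUs {
		if !exclusive[id] {
			pool = append(pool, id)
		}
	}
	sort.Ints(pool)
	return pool
}

// formatCPUSet renders sorted CPU IDs in Linux cpuset list notation,
// collapsing consecutive IDs into ranges such as "0-1".
func formatCPUSet(ids []int) string {
	var parts []string
	for i := 0; i < len(ids); {
		j := i
		for j+1 < len(ids) && ids[j+1] == ids[j]+1 {
			j++
		}
		if i == j {
			parts = append(parts, fmt.Sprintf("%d", ids[i]))
		} else {
			parts = append(parts, fmt.Sprintf("%d-%d", ids[i], ids[j]))
		}
		i = j + 1
	}
	return strings.Join(parts, ",")
}

func main() {
	node := []int{0, 1, 2, 3, 4, 5, 6, 7}
	exclusive := []int{2, 3, 6} // assumed CPUs pinned to LSE/LSR pods
	pool := sharePoolCPUs(node, exclusive)
	fmt.Println(formatCPUSet(pool), len(pool)) // prints "0-1,4-5,7 5"
}
```

With three of eight CPUs pinned, all non-pinned pods share the remaining five, which is the situation the metric is meant to make visible.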
With the `koordlet_cpuset_share_pool_info` metric, we can monitor the CPU utilization of the Share Pool with a PromQL query and catch hotspot issues in advance.
Ⅱ. Does this pull request fix one issue?
no
Ⅲ. Describe how to verify it
Go to Prometheus and run a PromQL query against the new metric.
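The PR's PromQL query is not reproduced here. As a rough sanity check for whatever the query returns, the arithmetic behind "Share Pool utilization" can be sketched in Go: average the per-CPU utilization over only the CPU IDs the metric reports. The function name `sharePoolUtilization` and the sample values are assumptions for illustration.

```go
package main

import "fmt"

// sharePoolUtilization averages per-CPU utilization (0.0 to 1.0) over the
// CPU IDs in the share pool. IDs missing from perCPU are skipped, and ok
// is false when no share pool CPU has a sample.
// Hypothetical helper for illustration, not part of koordlet.
func sharePoolUtilization(perCPU map[int]float64, pool []int) (avg float64, ok bool) {
	sum, n := 0.0, 0
	for _, id := range pool {
		if u, found := perCPU[id]; found {
			sum += u
			n++
		}
	}
	if n == 0 {
		return 0, false
	}
	return sum / float64(n), true
}

func main() {
	// Assumed samples: CPUs 0-1 form the share pool, CPUs 2-3 are exclusive.
	perCPU := map[int]float64{0: 0.5, 1: 1.0, 2: 0.25, 3: 0.25}
	pool := []int{0, 1}
	if avg, ok := sharePoolUtilization(perCPU, pool); ok {
		fmt.Printf("share pool utilization: %.2f\n", avg) // prints 0.75
	}
}
```

In this made-up sample the node-wide average is 0.50 while the Share Pool sits at 0.75, which is the kind of gap a node-level utilization graph would hide.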
Ⅳ. Special notes for reviews
Ⅴ. Checklist
make test