affinity: add scatter filter, add more evict check and avoid statistic miss #10080
Signed-off-by: lhy1024 <admin@liudos.us>
Codecov Report
@@           Coverage Diff           @@
##           master   #10080   +/-   ##
=======================================
  Coverage   78.49%   78.49%
=======================================
  Files         515      515
  Lines       69236    69253    +17
=======================================
+ Hits        54344    54358    +14
- Misses      10948    10957     +9
+ Partials     3944     3938     -6
// Check if region is in an affinity group that doesn't allow regular scheduling
if !r.affinityFilter.Select(region).IsOK() {
	scatterSkipAffinityCounter.Inc()
	return nil, errors.Errorf("region %d is in affinity group", region.GetID())
}
Will it trigger the retry?
Signed-off-by: lhy1024 <admin@liudos.us>
Force-pushed from 0003cd2 to 0560360
Signed-off-by: lhy1024 <admin@liudos.us>
Signed-off-by: HunDunDM <hundundm@gmail.com>
@bufferflies @rleungx PTAL
Signed-off-by: lhy1024 <admin@liudos.us>
Force-pushed from 8b76638 to 28d7ebc
Signed-off-by: lhy1024 <admin@liudos.us>
pkg/schedule/affinity/manager.go (outdated)
// GetRegionAffinityGroupState returns the affinity group state and isAffinity for a region.
func (m *Manager) GetRegionAffinityGroupState(region *core.RegionInfo) (group *GroupState, isAffinity bool) {
	// If skipSaveCache is not set to true, InvalidCache must be called at the appropriate time to prevent stale cache entries.
How about adding comments about when we need to update the cache?
Signed-off-by: lhy1024 <admin@liudos.us>
bufferflies left a comment
rest lgtm
}

ops, err := operator.CreateMergeRegionOperator("admin-merge-region", c, region, target, operator.OpAdmin|operator.OpMerge)
ops, err := operator.CreateMergeRegionOperator("admin-merge-region", c, region, target, operator.OpAdmin)
Why remove the OpAdmin type?
// A Region may no longer exist in the RegionTree due to a merge.
// In this case, clear the cache in affinity manager for that Region and skip processing it.
if c.cluster.GetRegion(region.GetID()) == nil {
Can it be put in line 87?
The check cannot be moved before GetAndCacheRegionAffinityGroupState.
Problem with moving it earlier:
- The check passes, then the region is deleted, and HandleOverlaps calls InvalidCache. The cache entry does not exist yet, so the call is a no-op.
- We then save the cache, which leaves a stale entry with no cleanup path.
The current approach guarantees:
- The cache is saved first, then the check runs.
- Either HandleOverlaps (when the region is deleted) or AffinityChecker (when the check fails) cleans up.
- At least one cleanup point always executes.
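The ordering argument above can be illustrated with a small, single-threaded Go sketch. It is only a model under stated assumptions: the `model` type, its maps, and the `between` hook (which simulates a merge landing between the two steps) are hypothetical and are not PD APIs. `deleteRegion` stands in for HandleOverlaps calling InvalidCache.

```go
package main

import "fmt"

// model is a hypothetical stand-in for the region tree and the affinity
// manager's per-region cache. None of these names are PD APIs.
type model struct {
	regions map[uint64]bool // regions currently in the region tree
	cache   map[uint64]bool // region IDs that have a cached affinity state
}

func newModel(ids ...uint64) *model {
	m := &model{regions: map[uint64]bool{}, cache: map[uint64]bool{}}
	for _, id := range ids {
		m.regions[id] = true
	}
	return m
}

// deleteRegion models a merge: the region leaves the tree and the overlap
// handler invalidates its cache entry (a no-op if nothing is cached yet).
func (m *model) deleteRegion(id uint64) {
	delete(m.regions, id)
	delete(m.cache, id)
}

// checkThenSave checks existence first and caches afterwards. The between
// hook runs in the gap between the two steps.
func (m *model) checkThenSave(id uint64, between func()) {
	if !m.regions[id] {
		return
	}
	between()
	m.cache[id] = true // nothing removes this entry if the region is gone
}

// saveThenCheck caches first and checks afterwards, cleaning up on failure.
func (m *model) saveThenCheck(id uint64, between func()) {
	m.cache[id] = true
	between()
	if !m.regions[id] {
		delete(m.cache, id) // the checker's own cleanup path
	}
}

// staleAfter reports whether a cache entry survives for a deleted region.
func staleAfter(saveFirst bool) bool {
	m := newModel(1)
	merge := func() { m.deleteRegion(1) }
	if saveFirst {
		m.saveThenCheck(1, merge)
	} else {
		m.checkThenSave(1, merge)
	}
	return m.cache[1] && !m.regions[1]
}

func main() {
	fmt.Println("check-then-save leaves stale entry:", staleAfter(false))
	fmt.Println("save-then-check leaves stale entry:", staleAfter(true))
}
```

In this model the check-first order leaks an entry because the invalidation runs before anything is cached, while the save-first order is covered by one of the two cleanup points wherever the merge lands.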
@bufferflies @rleungx PTAL
Force-pushed from 72bfcfe to 6e50d3d
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: bufferflies, rleungx
ref tikv#9764 Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
In response to a cherrypick label: new pull request created to branch
…c miss (tikv#10080) ref tikv#9764 Signed-off-by: lhy1024 <admin@liudos.us>
…c miss (tikv#10080) ref tikv#9764 Signed-off-by: lhy1024 <admin@liudos.us> Signed-off-by: HunDunDM <hundundm@gmail.com> # Conflicts: # pkg/schedule/operator/operator_controller.go

What problem does this PR solve?
Issue Number: Ref #9764
What is changed and how does it work?
Check List
Tests
Release note