-
Notifications
You must be signed in to change notification settings - Fork 753
schedule: add affinity checker #10040
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Skipping CI for Draft Pull Request. |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #10040 +/- ##
==========================================
+ Coverage 78.51% 78.58% +0.07%
==========================================
Files 513 514 +1
Lines 68893 69108 +215
==========================================
+ Hits 54091 54309 +218
- Misses 10887 10889 +2
+ Partials 3915 3910 -5
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
731212b to
a699d6e
Compare
|
/retest |
a699d6e to
47dff6e
Compare
|
/retest |
Signed-off-by: lhy1024 <admin@liudos.us>
dae7ba8 to
868b8ac
Compare
Signed-off-by: lhy1024 <admin@liudos.us>
|
/retest |
| affinityCheckerAbnormalReplicaCounter.Inc() | ||
| return nil | ||
| } | ||
| if filter.HasWitnessPeers(region) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we confirm that no clusters are using witness, I believe it can be removed.
BTW, should we create an issue to mark when the witness is to be completely removed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, we can
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove it
|
|
||
| options := make([]core.RegionCreateOption, 0, len(sourceOnly)+1) | ||
| for i, sourceStoreID := range sourceOnly { | ||
| options = append(options, core.WithReplacePeerStore(sourceStoreID, targetOnly[i])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the targetOnly always have the same length with sourceOnly? BTW, targetOnly and sourceOnly are confused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about storesToRemove and storesToAdd?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have check len(sourceVoters) and len(voterStoreIDs), they are removed the same stores.
| // so expire the group first, then provide the available Region information and fetch the Group state again. | ||
| if !isAffinity { | ||
| targetRegion := cloneRegionWithReplacePeerStores(region, group.LeaderStoreID, group.VoterStoreIDs...) | ||
| if targetRegion == nil || !filter.IsRegionReplicated(c.cluster, targetRegion) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When will the targetRegion not RegionReplicated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
placement rules changed
Signed-off-by: lhy1024 <admin@liudos.us>
| // WithLeaderStore sets the voter peer on leaderStoreID as the leader. | ||
| func WithLeaderStore(leaderStoreID uint64) RegionCreateOption { | ||
| return func(region *RegionInfo) { | ||
| for _, p := range region.GetPeers() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should iterate over voters, not all peers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, RegionInfo.Clone will create &RegionInfo firstly, then apply opts(contains WithLeaderStore), then set voters(call classifyVoterAndLearner(region)). So voters is nil in opts.
So we iterate peers and check whether is learner.
| affinityCheckerAbnormalReplicaCounter.Inc() | ||
| return nil | ||
| } | ||
| if filter.HasWitnessPeers(region) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, we can
| // Preserve all existing learner peers | ||
| // Since regions that violate placement rules won't reach this function, | ||
| // we can safely assume all existing learners should be preserved | ||
| for _, learner := range region.GetLearners() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to change learner
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't change learners. This code preserves all existing learner peers (e.g., TiFlash replicas) by including them in the target roles map.
If a learner's storeID conflicts with an affinity voter, we abort the operator creation (return nil) instead of modifying the learner
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TestAffinityCheckerPreserveLearners keeps all learners
TestAffinityCheckerLearnerVoterConflict tests the case that learner and leader are conflict
| ).SetPeers(peers).SetExpectedRoles(roles) | ||
|
|
||
| // Skip building if target leader store currently disallows leader in (e.g., evict-leader / reject-leader). | ||
| if targetLeader := c.cluster.GetStore(group.LeaderStoreID); targetLeader != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this check should place before the build
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Signed-off-by: lhy1024 <admin@liudos.us>
Signed-off-by: lhy1024 <admin@liudos.us>
|
/retest |
| ops := c.affinityChecker.Check(region) | ||
| if len(ops) > 0 { | ||
| opKind := ops[0].Kind() | ||
| if (opKind & operator.OpMerge) == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So there is no way to control the speed independently?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not neccessary to introduce a new limit for affinity scheduling
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if it affects the performance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally speaking, once it enters the stable state, no further scheduling occurs, and we can restrict it using region-schedule-limit and merge-schedule-limit.
Simultaneously, we can disable affinity scheduling to prevent it from affecting the entire cluster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer to decouple these parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
after discussion, we use a affinity-schedule-limit
Signed-off-by: lhy1024 <admin@liudos.us>
Signed-off-by: lhy1024 <admin@liudos.us>
|
/retest |
2 similar comments
|
/retest |
|
/retest |
Signed-off-by: lhy1024 <admin@liudos.us>
[LGTM Timeline notifier]Timeline:
|
|
/retest |
|
@niubell PTAL |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bufferflies, niubell, rleungx The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
In response to a cherrypick label: new pull request created to branch |
ref tikv#9764 Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
ref tikv#9764 Signed-off-by: lhy1024 <admin@liudos.us>
ref tikv#9764 Signed-off-by: lhy1024 <admin@liudos.us>
ref tikv#9764 Signed-off-by: lhy1024 <admin@liudos.us>
ref tikv#9764 Signed-off-by: lhy1024 <admin@liudos.us>
ref tikv#9764 Signed-off-by: lhy1024 <admin@liudos.us>
What problem does this PR solve?
Issue Number: Ref #9764
What is changed and how does it work?
Check List
Tests
Release note