
lhy1024 (Contributor) commented Dec 10, 2025

What problem does this PR solve?

Issue Number: Ref #9764

What is changed and how does it work?

Check List

Tests

  • Unit test

Release note

None.

ti-chi-bot (Contributor) commented Dec 10, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

ti-chi-bot added the do-not-merge/work-in-progress (indicates that a PR should not merge because it is a work in progress), release-note-none (denotes a PR that doesn't merit a release note), dco-signoff: yes (indicates the PR's author has signed the DCO), needs-cherry-pick-release-8.5 (should cherry-pick this PR to the release-8.5 branch), and size/XXL (denotes a PR that changes 1000+ lines, ignoring generated files) labels Dec 10, 2025
codecov bot commented Dec 10, 2025

Codecov Report

❌ Patch coverage is 82.44898% with 43 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.58%. Comparing base (09f1b5d) to head (9e19804).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
```diff
@@            Coverage Diff             @@
##           master   #10040      +/-   ##
==========================================
+ Coverage   78.51%   78.58%   +0.07%     
==========================================
  Files         513      514       +1     
  Lines       68893    69108     +215     
==========================================
+ Hits        54091    54309     +218     
- Misses      10887    10889       +2     
+ Partials     3915     3910       -5     
```
| Flag | Coverage Δ |
|---|---|
| unittests | 78.58% <82.44%> (+0.07%) ⬆️ |

Flags with carried forward coverage won't be shown.


lhy1024 force-pushed the affinity-pr4 branch 2 times, most recently from 731212b to a699d6e, December 11, 2025 10:04
lhy1024 marked this pull request as ready for review December 11, 2025 10:04
ti-chi-bot removed the do-not-merge/work-in-progress label Dec 11, 2025
lhy1024 (Contributor Author) commented Dec 11, 2025

/retest

lhy1024 (Contributor Author) commented Dec 11, 2025

/retest

Signed-off-by: lhy1024 <admin@liudos.us>
lhy1024 force-pushed the affinity-pr4 branch 2 times, most recently from dae7ba8 to 868b8ac, December 12, 2025 02:43
Signed-off-by: lhy1024 <admin@liudos.us>
lhy1024 (Contributor Author) commented Dec 12, 2025

/retest

```go
	affinityCheckerAbnormalReplicaCounter.Inc()
	return nil
}
if filter.HasWitnessPeers(region) {
```
Member: We can remove it?

Contributor Author: If we can confirm that no clusters are using witness, I believe it can be removed.
BTW, should we create an issue to track when witness support will be completely removed?

Contributor: Yes, we can.

Contributor Author: Removed it.


```go
options := make([]core.RegionCreateOption, 0, len(sourceOnly)+1)
for i, sourceStoreID := range sourceOnly {
	options = append(options, core.WithReplacePeerStore(sourceStoreID, targetOnly[i]))
```
Member: Does targetOnly always have the same length as sourceOnly? BTW, the names targetOnly and sourceOnly are confusing.

Contributor Author: How about storesToRemove and storesToAdd?

Contributor Author: We have already checked that len(sourceVoters) equals len(voterStoreIDs), and the same shared stores are removed from both, so the remaining lists have the same length.
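The length argument above can be sketched in Go. This is a minimal, hypothetical illustration that borrows the names storesToRemove and storesToAdd proposed in review; storesToRemoveAndAdd and replacePairs are not functions from the PR. It assumes each list holds distinct store IDs, in which case equal-length inputs always leave equally long difference lists, so indexing the second list by the first list's position is safe.

```go
package main

import "fmt"

// storesToRemoveAndAdd is a hypothetical helper: it returns the stores that
// appear only in current (to remove) and only in target (to add), keeping
// the input order.
func storesToRemoveAndAdd(current, target []uint64) (storesToRemove, storesToAdd []uint64) {
	inCurrent := make(map[uint64]bool, len(current))
	for _, id := range current {
		inCurrent[id] = true
	}
	inTarget := make(map[uint64]bool, len(target))
	for _, id := range target {
		inTarget[id] = true
	}
	for _, id := range current {
		if !inTarget[id] {
			storesToRemove = append(storesToRemove, id)
		}
	}
	for _, id := range target {
		if !inCurrent[id] {
			storesToAdd = append(storesToAdd, id)
		}
	}
	return storesToRemove, storesToAdd
}

// replacePairs pairs the i-th removed store with the i-th added store, like
// the loop in the snippet indexes targetOnly[i]. The up-front length check is
// what makes that safe: with equal-length lists of distinct stores, removing
// the shared stores from both leaves equally long remainders.
func replacePairs(current, target []uint64) ([][2]uint64, error) {
	if len(current) != len(target) {
		return nil, fmt.Errorf("voter count mismatch: %d != %d", len(current), len(target))
	}
	storesToRemove, storesToAdd := storesToRemoveAndAdd(current, target)
	if len(storesToRemove) != len(storesToAdd) {
		// Only reachable if an input contains duplicate store IDs.
		return nil, fmt.Errorf("unbalanced diff: %d to remove, %d to add", len(storesToRemove), len(storesToAdd))
	}
	pairs := make([][2]uint64, len(storesToRemove))
	for i := range storesToRemove {
		pairs[i] = [2]uint64{storesToRemove[i], storesToAdd[i]}
	}
	return pairs, nil
}

func main() {
	pairs, err := replacePairs([]uint64{1, 2, 3}, []uint64{2, 3, 4})
	fmt.Println(pairs, err) // [[1 4]] <nil>
}
```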

```go
// so expire the group first, then provide the available Region information and fetch the Group state again.
if !isAffinity {
	targetRegion := cloneRegionWithReplacePeerStores(region, group.LeaderStoreID, group.VoterStoreIDs...)
	if targetRegion == nil || !filter.IsRegionReplicated(c.cluster, targetRegion) {
```
Member: When would targetRegion not be replicated (that is, when would IsRegionReplicated return false)?

Member: When the placement rules have changed.
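A simplified sketch of why a target region built from an existing affinity group can fail the replication check after placement rules change. The Rule type and isReplicated function are hypothetical and only compare replica counts; the real check used in the PR is more involved and is not reproduced here.

```go
package main

import "fmt"

// Rule is a hypothetical stand-in for a placement rule that only records how
// many replicas it requires. Real placement rules carry much more (roles,
// label constraints, key ranges).
type Rule struct {
	Count int
}

// isReplicated is a count-only approximation: the target voter set must match
// the total replica count demanded by the current rules.
func isReplicated(voterStoreIDs []uint64, rules []Rule) bool {
	want := 0
	for _, r := range rules {
		want += r.Count
	}
	return len(voterStoreIDs) == want
}

func main() {
	group := []uint64{1, 2, 3} // voter stores recorded in the affinity group
	fmt.Println(isReplicated(group, []Rule{{Count: 3}})) // true
	// After the rules change to require 5 replicas, the same target layout no
	// longer satisfies them, so it should be rejected.
	fmt.Println(isReplicated(group, []Rule{{Count: 5}})) // false
}
```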

Signed-off-by: lhy1024 <admin@liudos.us>
Signed-off-by: lhy1024 <admin@liudos.us>
```go
// WithLeaderStore sets the voter peer on leaderStoreID as the leader.
func WithLeaderStore(leaderStoreID uint64) RegionCreateOption {
	return func(region *RegionInfo) {
		for _, p := range region.GetPeers() {
```
Contributor: It should iterate over voters, not all peers.

Contributor Author: No. RegionInfo.Clone first creates a new &RegionInfo, then applies the options (including WithLeaderStore), and only then sets the voters by calling classifyVoterAndLearner(region). So voters is still nil while the options run.

That is why we iterate over all peers and check whether each one is a learner.
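The ordering described above can be modeled with stripped-down types. This is a sketch under assumptions: Peer, RegionInfo, Clone, and classifyVoterAndLearner here are simplified stand-ins for the pd core types, not their real definitions. They keep only the create, apply options, then classify order, which is why an option like WithLeaderStore sees an empty voter list and has to filter out learners itself.

```go
package main

import "fmt"

// Simplified stand-ins that model only the construction order.
type Peer struct {
	StoreID   uint64
	IsLearner bool
}

type RegionInfo struct {
	peers  []*Peer
	voters []*Peer // populated only after all options have run
	leader *Peer
}

type RegionCreateOption func(*RegionInfo)

// WithLeaderStore cannot rely on region.voters (still nil at this point),
// so it scans every peer and skips learners.
func WithLeaderStore(leaderStoreID uint64) RegionCreateOption {
	return func(region *RegionInfo) {
		for _, p := range region.peers {
			if p.StoreID == leaderStoreID && !p.IsLearner {
				region.leader = p
				return
			}
		}
	}
}

// classifyVoterAndLearner runs after the options, as described in the review.
func classifyVoterAndLearner(region *RegionInfo) {
	region.voters = nil
	for _, p := range region.peers {
		if !p.IsLearner {
			region.voters = append(region.voters, p)
		}
	}
}

// Clone creates the new struct, applies the options, then classifies voters.
// Peers are shared rather than deep-copied to keep the sketch short.
func (r *RegionInfo) Clone(opts ...RegionCreateOption) *RegionInfo {
	region := &RegionInfo{peers: r.peers, leader: r.leader}
	for _, opt := range opts {
		opt(region)
	}
	classifyVoterAndLearner(region)
	return region
}

func main() {
	base := &RegionInfo{peers: []*Peer{{StoreID: 1}, {StoreID: 2}, {StoreID: 3, IsLearner: true}}}
	cloned := base.Clone(WithLeaderStore(2))
	fmt.Println(cloned.leader.StoreID, len(cloned.voters)) // 2 2
}
```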


```go
// Preserve all existing learner peers
// Since regions that violate placement rules won't reach this function,
// we can safely assume all existing learners should be preserved
for _, learner := range region.GetLearners() {
```
Contributor: Why do we need to change learners?

Contributor Author (lhy1024, Dec 12, 2025): We don't change learners. This code preserves all existing learner peers (e.g., TiFlash replicas) by including them in the target roles map.

If a learner's storeID conflicts with an affinity voter, we abort the operator creation (return nil) instead of modifying the learner.

Contributor Author: TestAffinityCheckerPreserveLearners verifies that all learners are kept.
TestAffinityCheckerLearnerVoterConflict covers the case where a learner conflicts with the leader.
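The learner-handling rule from this thread can be sketched as a role-map builder. buildExpectedRoles and the Role enum are hypothetical names for illustration, not code from the PR; the sketch only models the described behavior: keep every existing learner, and abort (rather than modify a learner) when a learner sits on a store the affinity group wants as a voter or leader.

```go
package main

import "fmt"

// Role is a hypothetical enum for the expected role of each store.
type Role int

const (
	Voter Role = iota
	Leader
	Learner
)

// buildExpectedRoles takes voters and the leader from the affinity group,
// keeps every existing learner, and returns ok == false on any
// learner/voter store collision instead of changing the learner.
func buildExpectedRoles(leaderStoreID uint64, voterStoreIDs, learnerStoreIDs []uint64) (roles map[uint64]Role, ok bool) {
	roles = make(map[uint64]Role, len(voterStoreIDs)+len(learnerStoreIDs))
	for _, id := range voterStoreIDs {
		roles[id] = Voter
	}
	roles[leaderStoreID] = Leader
	for _, id := range learnerStoreIDs {
		if _, taken := roles[id]; taken {
			return nil, false // abort: never repurpose an existing learner
		}
		roles[id] = Learner
	}
	return roles, true
}

func main() {
	roles, ok := buildExpectedRoles(1, []uint64{1, 2, 3}, []uint64{4})
	fmt.Println(len(roles), ok) // 4 true
	_, ok = buildExpectedRoles(1, []uint64{1, 2, 3}, []uint64{1})
	fmt.Println(ok) // false
}
```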

```go
).SetPeers(peers).SetExpectedRoles(roles)

// Skip building if target leader store currently disallows leader in (e.g., evict-leader / reject-leader).
if targetLeader := c.cluster.GetStore(group.LeaderStoreID); targetLeader != nil {
```
Contributor: This check should be placed before the build.

Contributor Author: Done.
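Moving the leader-store check ahead of the build can be sketched as follows. Store, AllowLeaderIn, and createOperator are assumed names, not PD APIs; the point is only the ordering, so a store in an evict-leader or reject-leader state is rejected before any builder work runs. Like the snippet, the sketch does not reject a store that cannot be found.

```go
package main

import "fmt"

// Store is a hypothetical stand-in; the two flags model the evict-leader and
// reject-leader cases named in the code comment.
type Store struct {
	ID           uint64
	EvictLeader  bool
	RejectLeader bool
}

// AllowLeaderIn is an assumed helper, not a real PD method.
func (s *Store) AllowLeaderIn() bool {
	return !s.EvictLeader && !s.RejectLeader
}

// createOperator checks the target leader store first and only then calls
// build, so no builder work is wasted on a store that cannot take the leader.
// An unknown store (nil) is not rejected here, mirroring the snippet.
func createOperator(stores map[uint64]*Store, leaderStoreID uint64, build func() string) (string, bool) {
	if s := stores[leaderStoreID]; s != nil && !s.AllowLeaderIn() {
		return "", false
	}
	return build(), true
}

func main() {
	builds := 0
	build := func() string { builds++; return "affinity-op" }
	stores := map[uint64]*Store{1: {ID: 1}, 2: {ID: 2, EvictLeader: true}}
	_, ok := createOperator(stores, 2, build)
	fmt.Println(ok, builds) // false 0
	op, ok := createOperator(stores, 1, build)
	fmt.Println(op, ok, builds) // affinity-op true 1
}
```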

Signed-off-by: lhy1024 <admin@liudos.us>
Signed-off-by: lhy1024 <admin@liudos.us>
Signed-off-by: lhy1024 <admin@liudos.us>
lhy1024 (Contributor Author) commented Dec 12, 2025

/retest

```go
ops := c.affinityChecker.Check(region)
if len(ops) > 0 {
	opKind := ops[0].Kind()
	if (opKind & operator.OpMerge) == 0 {
```
Member: So there is no way to control the speed independently?

Contributor Author: It is not necessary to introduce a new limit for affinity scheduling.

Member: What if it affects performance?

Contributor Author: Generally speaking, once affinity scheduling reaches a stable state, no further scheduling occurs, and we can restrict it using region-schedule-limit and merge-schedule-limit.

We can also disable affinity scheduling to prevent it from affecting the entire cluster.

Member: I prefer to decouple these parameters.

Contributor Author: After discussion, we use a dedicated affinity-schedule-limit.
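One way to picture the outcome of this thread is a per-kind limiter, so affinity operators are throttled independently of other scheduling. This is a hypothetical sketch: OpAffinity, limitFor, and scheduleLimiter are illustrative names, and routing merge operators to merge-schedule-limit is an assumption read from the (opKind & operator.OpMerge) == 0 branch in the snippet, not confirmed PD behavior.

```go
package main

import "fmt"

// OpKind mimics a bit-flag operator kind. OpAffinity is a hypothetical flag;
// OpMerge mirrors the check in the snippet.
type OpKind uint32

const (
	OpAffinity OpKind = 1 << iota
	OpMerge
)

// limitFor routes an operator to a limit. Assumption: non-merge affinity
// operators count against affinity-schedule-limit, while merge operators
// keep using merge-schedule-limit.
func limitFor(kind OpKind) string {
	if kind&OpMerge != 0 {
		return "merge-schedule-limit"
	}
	return "affinity-schedule-limit"
}

type scheduleLimiter struct {
	limits  map[string]int
	running map[string]int
}

// TryAdd admits an operator only if its own limit still has room.
func (l *scheduleLimiter) TryAdd(kind OpKind) bool {
	name := limitFor(kind)
	if l.running[name] >= l.limits[name] {
		return false
	}
	l.running[name]++
	return true
}

func main() {
	l := &scheduleLimiter{
		limits:  map[string]int{"affinity-schedule-limit": 1, "merge-schedule-limit": 2},
		running: map[string]int{},
	}
	fmt.Println(l.TryAdd(OpAffinity), l.TryAdd(OpAffinity), l.TryAdd(OpAffinity|OpMerge)) // true false true
}
```

In this sketch, an affinity limit of 0 admits no affinity operators at all without touching the merge budget.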

Signed-off-by: lhy1024 <admin@liudos.us>
ti-chi-bot removed the approved label Dec 15, 2025
Signed-off-by: lhy1024 <admin@liudos.us>
lhy1024 (Contributor Author) commented Dec 15, 2025

/retest

lhy1024 (Contributor Author) commented Dec 15, 2025

/retest

lhy1024 (Contributor Author) commented Dec 15, 2025

/retest

Signed-off-by: lhy1024 <admin@liudos.us>
ti-chi-bot added the lgtm label and removed the needs-1-more-lgtm label (indicates a PR needs 1 more LGTM) Dec 16, 2025
ti-chi-bot (Contributor) commented Dec 16, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-12-15 03:01:52.342699513 +0000 UTC m=+1442057.156477086: ☑️ agreed by bufferflies.
  • 2025-12-16 02:26:22.790132081 +0000 UTC m=+1526327.603909653: ☑️ agreed by rleungx.

lhy1024 (Contributor Author) commented Dec 16, 2025

/retest

lhy1024 (Contributor Author) commented Dec 16, 2025

@niubell PTAL

ti-chi-bot (Contributor) commented Dec 16, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bufferflies, niubell, rleungx


ti-chi-bot added the approved label Dec 16, 2025
ti-chi-bot merged commit fdc1cdf into tikv:master Dec 16, 2025
40 of 42 checks passed
ti-chi-bot (Member)

In response to a cherrypick label: new pull request created to branch release-8.5: #10074.
But this PR has conflicts, please resolve them!

ti-chi-bot pushed a commit to ti-chi-bot/pd that referenced this pull request Dec 16, 2025
ref tikv#9764

Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
lhy1024 added a commit to lhy1024/pd that referenced this pull request Dec 16, 2025
ref tikv#9764

Signed-off-by: lhy1024 <admin@liudos.us>
lhy1024 added a commit to lhy1024/pd that referenced this pull request Dec 16, 2025
ref tikv#9764

Signed-off-by: lhy1024 <admin@liudos.us>
lhy1024 added a commit to lhy1024/pd that referenced this pull request Dec 16, 2025
ref tikv#9764

Signed-off-by: lhy1024 <admin@liudos.us>
HunDunDM pushed a commit to HunDunDM/pd that referenced this pull request Dec 16, 2025
ref tikv#9764

Signed-off-by: lhy1024 <admin@liudos.us>
lhy1024 added a commit to lhy1024/pd that referenced this pull request Dec 16, 2025
ref tikv#9764

Signed-off-by: lhy1024 <admin@liudos.us>

Labels

approved, dco-signoff: yes, lgtm, needs-cherry-pick-release-8.5, release-note-none, size/XXL

Projects

None yet


6 participants