Skip to content

Conversation

@HunDunDM
Copy link
Member

@HunDunDM HunDunDM commented Jan 28, 2026

What problem does this PR solve?

Issue Number: Ref #10149

What is changed and how does it work?

check location labels in AffinityChecker

Check List

Tests

  • Unit test

Related changes

  • Need to cherry-pick to the release branch

Release note

None.

Summary by CodeRabbit

  • New Features

    • Affinity checker now respects placement rules, can evaluate region fit without cache, and reports region label isolation with a level and satisfaction check.
  • Bug Fixes

    • Improved detection of better replica locations and stricter enforcement of isolation; scheduling correctly respects placement-rules-disabled state.
  • Metrics

    • Added counter for placement-rules-disabled events.
  • Tests

    • New tests for placement-rule scenarios and isolation-level behavior.

✏️ Tip: You can customize this high-level summary in your review settings.

@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Jan 28, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot ti-chi-bot bot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. release-note-none Denotes a PR that doesn't merit a release note. dco-signoff: yes Indicates the PR's author has signed the dco. labels Jan 28, 2026
@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Jan 28, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign yisaer for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai
Copy link

coderabbitai bot commented Jan 28, 2026

📝 Walkthrough

Walkthrough

Adds placement-rule awareness to affinity checking and replica strategy: new RuleManager usage, placement-rule satisfaction checks, better-location logic, witness-aware helpers, exported RuleFit Stores, RegionFit no-cache API, region-label isolation API changes, metric addition, and corresponding tests.

Changes

Cohort / File(s) Summary
Affinity checker
pkg/schedule/checker/affinity_checker.go, pkg/schedule/checker/affinity_checker_test.go
Adds ruleManager field; implements isRegionPlacementRuleSatisfiedWithBestLocation and strategy; early-exits when placement rules disabled; updates tests for best-location, isolation, and disabled rules.
Metrics
pkg/schedule/checker/metrics.go
Adds affinityCheckerPlacementRulesDisabledCounter Prometheus counter (label: placement-rules-disabled).
Replica strategy utilities
pkg/schedule/checker/replica_strategy.go
Adds getBetterLocation, isWitnessEnabled, and getRuleFitStores helpers to drive better-location and witness-aware decisions.
Rule checker updates
pkg/schedule/checker/rule_checker.go
Removes method-local isWitnessEnabled/getRuleFitStores; uses package-level helpers; integrates strategy.getBetterLocation(...) to determine old/new store IDs and update peer construction.
Placement fit & manager
pkg/schedule/placement/fit.go, pkg/schedule/placement/rule_manager.go
Exports RuleFit.Stores (was private stores); adds RuleManager.FitRegionWithoutCache(storeSet, region) to compute RegionFit without cache.
Statistics: region label isolation
pkg/statistics/region_collection.go, pkg/statistics/region_collection_test.go
GetRegionLabelIsolation now returns (string, int) (label and level); adds IsRegionLabelIsolationSatisfied(...); updates tests to assert label and level.
pd-ctl usage
tools/pd-ctl/pdctl/command/label_command.go
Adjusts call site to ignore second return value from GetRegionLabelIsolation (isolationLabel, _ := ...).

Sequence Diagram

sequenceDiagram
    participant AC as AffinityChecker
    participant RM as RuleManager
    participant RF as RegionFit
    participant RS as ReplicaStrategy
    participant CL as Cluster

    AC->>RM: FitRegion / FitRegionWithoutCache(region)
    RM->>RF: build RegionFit (stores, rules)
    RM-->>AC: return RegionFit

    AC->>AC: isRegionPlacementRuleSatisfiedWithBestLocation(region, isReal)
    AC->>RS: strategy(region, rule, fastFailover)
    RS-->>AC: ReplicaStrategy

    AC->>RS: getBetterLocation(ruleFit, region, fastFailover)
    RS-->>AC: (oldStoreID, newStoreID, filteredByTempState)

    AC->>CL: consult witness/cluster info
    AC-->>AC: decide satisfied / block / schedule
Loading

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested Labels

size/XXL

Suggested Reviewers

  • okJiang
  • ystaticy
  • lhy1024
  • rleungx

Poem

🐰 I hopped through rules and stores today,
I nudged each fit in a thoughtful, tidy way.
Best-locations found, some peers rearranged,
Metrics counted, tests expanded — configs changed.
A little rabbit cheered: "Scheduler, hooray!"

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 63.64% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately reflects the main change: replacing IsRegionReplicated checks with isRegionPlacementRuleSatisfiedWithBestLocation in the AffinityChecker, which is the primary focus across multiple files.
Description check ✅ Passed The description includes the issue reference (Ref #10149) and a commit message describing the change. However, it lacks detail about what specifically changed and how the solution works, and the checklist is minimal with only 'Unit test' selected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@ti-chi-bot ti-chi-bot bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jan 28, 2026
@codecov
Copy link

codecov bot commented Jan 28, 2026

Codecov Report

❌ Patch coverage is 96.07843% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.59%. Comparing base (b668f43) to head (cc4a660).
⚠️ Report is 2 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #10192   +/-   ##
=======================================
  Coverage   78.59%   78.59%           
=======================================
  Files         520      520           
  Lines       70014    70080   +66     
=======================================
+ Hits        55028    55082   +54     
- Misses      11008    11017    +9     
- Partials     3978     3981    +3     
Flag Coverage Δ
unittests 78.59% <96.07%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@lhy1024 lhy1024 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the way is OK, but we need some tests for it.

Comment on lines +518 to +522
isWitness := rf.Rule.IsWitness && isWitnessEnabled(c.cluster)
// If the peer to be moved is a witness, since no snapshot is needed, we also reuse the fast failover logic.
strategy := c.strategy(region, rf.Rule, isWitness)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we need not to consider witness

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for compatibility reasons, as the original code was retained.

Signed-off-by: HunDunDM <hundundm@gmail.com>
Signed-off-by: HunDunDM <hundundm@gmail.com>
Signed-off-by: HunDunDM <hundundm@gmail.com>
Signed-off-by: HunDunDM <hundundm@gmail.com>
Signed-off-by: HunDunDM <hundundm@gmail.com>
Signed-off-by: HunDunDM <hundundm@gmail.com>
Signed-off-by: HunDunDM <hundundm@gmail.com>
Signed-off-by: HunDunDM <hundundm@gmail.com>
@HunDunDM HunDunDM force-pushed the 20260127-best-location branch from 8929d83 to 7243c75 Compare January 30, 2026 05:04
@ti-chi-bot ti-chi-bot bot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jan 30, 2026
@HunDunDM HunDunDM marked this pull request as ready for review January 30, 2026 05:07
@ti-chi-bot ti-chi-bot bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 30, 2026
Signed-off-by: HunDunDM <hundundm@gmail.com>
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@pkg/schedule/checker/affinity_checker.go`:
- Around line 499-533: In isRegionPlacementRuleSatisfiedWithBestLocation, guard
against a nil RegionFit returned by ruleManager by checking if fit == nil
immediately after assigning it (both for FitRegion and FitRegionWithoutCache)
and return false if nil; this prevents dereferencing fit later in the function
(e.g., when calling fit.IsSatisfied() or iterating fit.RuleFits).

In `@pkg/schedule/placement/fit.go`:
- Around line 141-143: The exported field Stores in the RuleFit struct lacks a
JSON tag and should follow the existing kebab-case tagging pattern; update the
Stores field declaration in RuleFit to include json:"stores,omitempty" (i.e.,
add the explicit tag to the Stores []*core.StoreInfo field) so it matches other
exported fields like Rule, Peers, and IsolationScore.

Comment on lines 141 to 143
// Stores is the stores that the peers are placed in.
Stores []*core.StoreInfo
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# First, let's examine the struct in fit.go around lines 141-143
cat -n pkg/schedule/placement/fit.go | sed -n '130,160p'

Repository: tikv/pd

Length of output: 1302


🏁 Script executed:

# Find the full struct definition
sed -n '130,170p' pkg/schedule/placement/fit.go

Repository: tikv/pd

Length of output: 1394


🏁 Script executed:

# Search for where this struct is used in JSON marshaling/unmarshaling
rg -n "json\." pkg/schedule/placement/fit.go

Repository: tikv/pd

Length of output: 33


🏁 Script executed:

# Check for JSON usage patterns with this struct across the codebase
rg -n "FitResult" --type go | head -20

Repository: tikv/pd

Length of output: 33


🏁 Script executed:

# Check other structs in fit.go for JSON tag patterns
rg -B2 -A2 'json:' pkg/schedule/placement/fit.go

Repository: tikv/pd

Length of output: 1032


🏁 Script executed:

# Search for where RuleFit is marshaled to JSON or used in API responses
rg -n "RuleFit" --type go -A 3 | grep -E "(marshal|json|response|http)" | head -20

Repository: tikv/pd

Length of output: 346


🏁 Script executed:

# Check if there are any API endpoints that return RuleFit
rg -n "Marshal|json\.Marshal" pkg/schedule/placement/fit.go

Repository: tikv/pd

Length of output: 33


🏁 Script executed:

# Search for API handlers or response functions that use RuleFit
rg -l "RuleFit" --type go | xargs grep -l "json\.Marshal\|api\|handler\|response" | head -10

Repository: tikv/pd

Length of output: 205


🏁 Script executed:

# Check the API test to see how RuleFit is used in JSON responses
grep -n "RuleFit\|Stores" tests/server/api/rule_test.go | head -30

Repository: tikv/pd

Length of output: 139


🏁 Script executed:

# Check the rule_manager to understand RuleFit usage
grep -n "RuleFit" pkg/schedule/placement/rule_manager.go -A 3 | head -40

Repository: tikv/pd

Length of output: 660


🏁 Script executed:

# Search for any place where Stores field is actually used
rg -n "\.Stores" pkg/schedule/placement/ --type go

Repository: tikv/pd

Length of output: 247


Add an explicit JSON tag for Stores to match struct export pattern.

The Stores field is exported but lacks a JSON tag, while all other exported fields in RuleFit have explicit tags following kebab-case convention (json:"rule", json:"peers", json:"isolation-score", etc.). Add json:"stores,omitempty" to maintain consistency.

🧩 Proposed change
	// Stores is the stores that the peers are placed in.
	Stores []*core.StoreInfo `json:"stores,omitempty"`
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Stores is the stores that the peers are placed in.
Stores []*core.StoreInfo
}
// Stores is the stores that the peers are placed in.
Stores []*core.StoreInfo `json:"stores,omitempty"`
}
🤖 Prompt for AI Agents
In `@pkg/schedule/placement/fit.go` around lines 141 - 143, The exported field
Stores in the RuleFit struct lacks a JSON tag and should follow the existing
kebab-case tagging pattern; update the Stores field declaration in RuleFit to
include json:"stores,omitempty" (i.e., add the explicit tag to the Stores
[]*core.StoreInfo field) so it matches other exported fields like Rule, Peers,
and IsolationScore.

Signed-off-by: HunDunDM <hundundm@gmail.com>
Signed-off-by: HunDunDM <hundundm@gmail.com>
@HunDunDM
Copy link
Member Author

/retest-required

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

dco-signoff: yes Indicates the PR's author has signed the dco. release-note-none Denotes a PR that doesn't merit a release note. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants