Conversation

@bufferflies
Contributor

@bufferflies bufferflies commented Feb 5, 2026

What problem does this PR solve?

Issue Number: Ref #9212

The client will be used by client-go
https://github.com/tikv/client-go/blob/362e1a226b73496efad146f1ef24f26c831f88e9/internal/locate/region_cache.go#L2270-L2329

What is changed and how does it work?

1. Don't wait before getting the PD leader URL.
2. Add new region and store options that require a request to be sent to the PD leader.

Check List

Tests

  • Unit test
  • Integration test

Code changes

Side effects

  • Possible performance regression
  • Increased code complexity
  • Breaking backward compatibility

Related changes

Release note

None.

Summary by CodeRabbit

  • New Features

    • Added options to restrict handling to PD leader only for region queries and for store requests.
  • Refactor

    • Adjusted member synchronization loop timing, changing when loop exit and event checks occur during each iteration.
  • Tests

    • Added unit tests validating the region/store option behaviors, including the new leader-only options.

Signed-off-by: tongjian <1045931706@qq.com>
@ti-chi-bot ti-chi-bot bot added do-not-merge/needs-linked-issue release-note-none Denotes a PR that doesn't merit a release note. dco-signoff: yes Indicates the PR's author has signed the dco. labels Feb 5, 2026
@ti-chi-bot
Contributor

ti-chi-bot bot commented Feb 5, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign andremouche for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Feb 5, 2026
@coderabbitai

coderabbitai bot commented Feb 5, 2026

📝 Walkthrough

Walkthrough

Moved the select handling for cancellation/ticker/membership to the end of each iteration in updatePDMemberLoop; added two PD-leader-only option constructors (WithPDLeaderHandleStoreRequestOnly, WithAllowPDLeaderOnly) and a test verifying option behavior; minor go.mod edits.
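The reordering described above can be sketched as follows. This is a minimal illustration, not the code from pkg/mcs/router/server/sync.go: `syncLoop`, its parameters, and `firstSyncIsImmediate` are hypothetical names, and the member-list work is reduced to a callback. The point it shows is that doing the work before the select means the first sync does not wait for a ticker interval.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// syncLoop is a hypothetical stand-in for updatePDMemberLoop. It runs sync
// first and only then blocks on the select, so the first sync is not delayed
// by a full ticker interval.
func syncLoop(ctx context.Context, interval time.Duration, check <-chan struct{}, sync func()) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		sync() // the work happens before any waiting

		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
		case <-check: // a nil channel never fires, which is fine here
		}
	}
}

// firstSyncIsImmediate starts the loop with a one-hour ticker and reports
// whether the first sync call still arrived within a minute.
func firstSyncIsImmediate() bool {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()

	start := time.Now()
	first := make(chan time.Duration, 1)
	go syncLoop(ctx, time.Hour, nil, func() {
		select {
		case first <- time.Since(start):
		default: // later calls are ignored
		}
	})
	delay := <-first
	return delay < time.Minute
}

func main() {
	fmt.Println("first sync immediate:", firstSyncIsImmediate())
}
```

With the select at the top of the loop instead, the same call would block until the first tick or membership signal before syncing.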

Changes

Cohort | File(s) | Summary
Server loop control flow | pkg/mcs/router/server/sync.go | Moved the select that handled server context cancellation, ticker ticks, and checkMembershipCh from the start of updatePDMemberLoop to after member-list processing, deferring event handling until the end of each loop iteration.
Client option API & tests | client/opt/option.go, client/opt/option_test.go | Added exported option constructors WithPDLeaderHandleStoreRequestOnly() (GetStoreOption) and WithAllowPDLeaderOnly() (GetRegionOption) to control router/follower handling; added TestOptions verifying defaults and option behavior.
Module file | go.mod | Minor module/dependency edits (+7/−7 lines).
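For readers unfamiliar with the option pattern referenced in the table, here is a rough sketch of how a leader-only region option could behave. The struct is a simplified stand-in built from the field names mentioned in this review (AllowFollowerHandle, AllowRouterServiceHandle); the helper `applyRegionOptions` is hypothetical, and the assumption that the leader-only option clears both flags is inferred from the walkthrough, not confirmed against client/opt/option.go.

```go
package main

import "fmt"

// GetRegionOp is a simplified stand-in for the client's per-request settings.
type GetRegionOp struct {
	AllowFollowerHandle      bool
	AllowRouterServiceHandle bool
}

// GetRegionOption mutates a GetRegionOp.
type GetRegionOption func(*GetRegionOp)

// WithAllowFollowerHandle lets a follower serve the request.
func WithAllowFollowerHandle() GetRegionOption {
	return func(op *GetRegionOp) { op.AllowFollowerHandle = true }
}

// WithAllowRouterServiceHandle lets the router service serve the request.
func WithAllowRouterServiceHandle() GetRegionOption {
	return func(op *GetRegionOp) { op.AllowRouterServiceHandle = true }
}

// WithAllowPDLeaderOnly is assumed here to clear both flags so that only the
// PD leader may handle the request.
func WithAllowPDLeaderOnly() GetRegionOption {
	return func(op *GetRegionOp) {
		op.AllowFollowerHandle = false
		op.AllowRouterServiceHandle = false
	}
}

// applyRegionOptions is a hypothetical helper that applies options in order.
func applyRegionOptions(opts ...GetRegionOption) GetRegionOp {
	var op GetRegionOp
	for _, opt := range opts {
		opt(&op)
	}
	return op
}

func main() {
	enabled := applyRegionOptions(WithAllowFollowerHandle(), WithAllowRouterServiceHandle())
	leaderOnly := applyRegionOptions(WithAllowFollowerHandle(), WithAllowPDLeaderOnly())
	fmt.Printf("enabled: %+v\nleader only: %+v\n", enabled, leaderOnly)
}
```

Because options are applied in order, a leader-only option placed last overrides earlier permissive options in this sketch.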

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐇 I hopped the loop and nudged the beat,
Members first, then signals meet.
A leader's ribbon sewn on tight,
Tests twitch whiskers in the night.
🥕

🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (1 warning, 1 inconclusive)
Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 71.43% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
Description check | ❓ Inconclusive | The description covers key sections but has gaps: it includes a reference (Ref #9212) and commit message explaining changes, but lacks a formal 'Issue Number: Close #xxx' line as required by template, lacks concrete test details despite checking test boxes, and omits specifics about configuration/API/data changes despite checking those boxes. | Add formal 'Issue Number: Close #xxxx' line, specify which tests were added with details, and clarify what configuration, API, or data changes exist if those checkboxes apply.
✅ Passed checks (1 passed)
Check name | Status | Explanation
Title check | ✅ Passed | The title 'feat[router]: make sync pd leader faster' directly relates to the main objective stated in the PR description: making PD leader synchronization faster. It clearly and specifically summarizes the primary change.

No actionable comments were generated in the recent review. 🎉

🧹 Recent nitpick comments
client/opt/option_test.go (1)

115-133: LGTM!

Good test coverage for the new option constructors. The test validates the full lifecycle: default → enable → PD-leader-only revert, for both GetRegionOp and GetStoreOp.

Nit: Line 118 asserts only AllowFollowerHandle default but not AllowRouterServiceHandle default before enabling, while the GetStoreOp block (line 128) does assert the default. Consider adding re.False(op.AllowRouterServiceHandle) after line 118 for symmetry, though not strictly necessary since Go zero-values guarantee this.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
pkg/mcs/router/server/sync.go (1)

125-129: ⚠️ Potential issue | 🟠 Major

Avoid busy-loop on member list errors after moving the select to loop end.

continue on a ListEtcdMembers error now skips the new wait block, which can spin the loop and spam logs/CPU when etcd is down. Ensure the select is still hit on the error path (or add a backoff).

🔧 Suggested fix (keep new ordering but always wait)
-    members, err := etcdutil.ListEtcdMembers(s.serverCtx, s.getClient())
-    if err != nil {
-        log.Warn("failed to list members", errs.ZapError(err))
-        continue
-    }
-    for _, ep := range members.Members {
+    members, err := etcdutil.ListEtcdMembers(s.serverCtx, s.getClient())
+    if err != nil {
+        log.Warn("failed to list members", errs.ZapError(err))
+    } else {
+        for _, ep := range members.Members {
             if len(ep.GetClientURLs()) == 0 { // This member is not started yet.
                 log.Info("member is not started yet", zap.String("member-id", strconv.FormatUint(ep.GetID(), 16)), errs.ZapError(err))
                 continue
             }
             status, err := s.getClient().Status(s.serverCtx, ep.ClientURLs[0])
             if err != nil {
                 log.Info("failed to get status of member", zap.String("member-id", strconv.FormatUint(ep.ID, 16)), zap.String("endpoint", ep.ClientURLs[0]), errs.ZapError(err))
                 continue
             }
             if status.Leader != ep.ID {
                 continue
             }
             leaderAddr := ep.ClientURLs[0]
             if s.pdLeaderAddr.CompareAndSwap(s.pdLeaderAddr.Load(), leaderAddr) {
                 if status.Leader != curLeader {
                     log.Info("switch PD leader", zap.String("leader-id", strconv.FormatUint(ep.ID, 16)), zap.String("endpoint", ep.ClientURLs[0]))
                     s.reconnectCh <- true
                 }
                 curLeader = ep.ID
                 break
             }
-        }
+        }
+    }

Also applies to: 153-159
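The concern above can be illustrated with a small, self-contained sketch. It is not the PR's code: `runLoop`, `attemptsWhileFailing`, and the error text are hypothetical, and the logging is replaced by a counter. It shows the shape of the suggested fix, where the error path falls through to the select instead of using `continue`, so a persistently failing call is retried once per tick.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// runLoop calls list on every iteration and always reaches the select, even
// when list fails, so a persistent error cannot spin the loop. It returns the
// number of attempts and failures seen before ctx is done.
func runLoop(ctx context.Context, interval time.Duration, list func() error) (attempts, failures int) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		attempts++
		if err := list(); err != nil {
			failures++ // the real loop would log a warning here
		}
		// No `continue` above: every path waits here before the next attempt.
		select {
		case <-ctx.Done():
			return attempts, failures
		case <-ticker.C:
		}
	}
}

// attemptsWhileFailing runs the loop for about 50ms with a 10ms ticker and a
// call that always fails, and reports how many attempts were made.
func attemptsWhileFailing() int {
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()
	attempts, _ := runLoop(ctx, 10*time.Millisecond, func() error {
		return errors.New("etcd unavailable")
	})
	return attempts
}

func main() {
	fmt.Println("attempts while failing:", attemptsWhileFailing())
}
```

If the error branch used `continue` before the select, the attempt count over the same window would be bounded only by CPU speed rather than by the ticker interval.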

Signed-off-by: tongjian <1045931706@qq.com>
@ti-chi-bot ti-chi-bot bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 6, 2026
Signed-off-by: tongjian <1045931706@qq.com>
@codecov

codecov bot commented Feb 6, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 78.62%. Comparing base (a1f2017) to head (276da6e).

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #10223      +/-   ##
==========================================
- Coverage   78.62%   78.62%   -0.01%     
==========================================
  Files         520      520              
  Lines       70089    70096       +7     
==========================================
+ Hits        55105    55110       +5     
- Misses      10998    11000       +2     
  Partials     3986     3986              
Flag Coverage Δ
unittests 78.62% <100.00%> (-0.01%) ⬇️

}

// WithPDLeaderOnlyStoreRequest means the store request must be handled by PD leader.
func WithPDLeaderOnlyStoreRequest() GetStoreOption {
Member

@coderabbitai is there a better name for both WithPDLeaderOnlyStoreRequest and WithPDLeaderOnly?

@rleungx Yes, I think the naming could be improved for better consistency. Here are my suggestions:

Option 1: Use the same name for both (recommended)

  • WithPDLeaderOnly() for both store and region options

Since GetStoreOption and GetRegionOption are different function types, Go's type system naturally disambiguates them. This approach:

  • Eliminates redundancy (the "StoreRequest" suffix is implied by the type)
  • Creates consistency across the API
  • Follows the principle of conciseness seen in options like WithBuckets() and WithExcludeTombstone()

Option 2: Keep them distinct but shorter

  • WithPDLeaderOnlyStore() for store options
  • WithPDLeaderOnlyRegion() for region options

This maintains explicit distinction but is still more concise than WithPDLeaderOnlyStoreRequest.

My recommendation: Go with Option 1 and use WithPDLeaderOnly() for both. The type signatures make the context clear, and it's more idiomatic Go style. You could apply the same simplification to WithAllowRouterServiceHandleStoreRequest() → WithAllowRouterServiceHandle() for consistency across the entire API.
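One caveat on Option 1 is worth checking before adopting it. Go has no function overloading, so two package-level functions in the same package cannot share a name even if their return types differ; the compiler reports the second one as redeclared. Assuming both constructors live in the same package, as the file list for client/opt/option.go suggests, they need distinct names. The sketch below uses simplified stand-in types and the names from the final rename; the bodies are assumptions for illustration.

```go
package main

import "fmt"

// Simplified stand-ins for the two option targets.
type GetRegionOp struct{ AllowFollowerHandle bool }
type GetStoreOp struct{ AllowRouterServiceHandle bool }

// Two distinct function types, one per request kind.
type GetRegionOption func(*GetRegionOp)
type GetStoreOption func(*GetStoreOp)

// WithAllowPDLeaderOnly is the region-side option (body is illustrative).
func WithAllowPDLeaderOnly() GetRegionOption {
	return func(op *GetRegionOp) { op.AllowFollowerHandle = false }
}

// WithPDLeaderHandleStoreRequestOnly is the store-side option. Naming it
// WithAllowPDLeaderOnly as well would not compile in this package, because
// Go resolves package-level functions by name only, not by return type.
func WithPDLeaderHandleStoreRequestOnly() GetStoreOption {
	return func(op *GetStoreOp) { op.AllowRouterServiceHandle = false }
}

func main() {
	region := GetRegionOp{AllowFollowerHandle: true}
	store := GetStoreOp{AllowRouterServiceHandle: true}
	WithAllowPDLeaderOnly()(&region)
	WithPDLeaderHandleStoreRequestOnly()(&store)
	fmt.Printf("region: %+v, store: %+v\n", region, store)
}
```

Sharing one name would only work if the region and store options were split into separate packages, which is a larger API change than a rename.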

Contributor Author

Renamed them to WithPDLeaderHandleStoreRequestOnly and WithAllowPDLeaderOnly.

Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Signed-off-by: tongjian <1045931706@qq.com>
@ti-chi-bot
Contributor

ti-chi-bot bot commented Feb 9, 2026

@bufferflies: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
pull-unit-test-next-gen-1 | d210b55 | link | true | /test pull-unit-test-next-gen-1
pull-unit-test-next-gen-2 | d210b55 | link | true | /test pull-unit-test-next-gen-2
pull-unit-test-next-gen-3 | d210b55 | link | true | /test pull-unit-test-next-gen-3

Labels

dco-signoff: yes Indicates the PR's author has signed the dco. release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

2 participants