-
Notifications
You must be signed in to change notification settings - Fork 753
feat[router]: make sync pd leader faster #10223
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
📝 WalkthroughWalkthroughMoved the select handling for cancellation/ticker/membership to the end of each iteration in Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
No actionable comments were generated in the recent review. 🎉 🧹 Recent nitpick comments
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pkg/mcs/router/server/sync.go (1)
125-129:⚠️ Potential issue | 🟠 MajorAvoid busy-loop on member list errors after moving the select to loop end.
continueon aListEtcdMemberserror now skips the new wait block, which can spin the loop and spam logs/CPU when etcd is down. Ensure the select is still hit on the error path (or add a backoff).🔧 Suggested fix (keep new ordering but always wait)
- members, err := etcdutil.ListEtcdMembers(s.serverCtx, s.getClient()) - if err != nil { - log.Warn("failed to list members", errs.ZapError(err)) - continue - } - for _, ep := range members.Members { + members, err := etcdutil.ListEtcdMembers(s.serverCtx, s.getClient()) + if err != nil { + log.Warn("failed to list members", errs.ZapError(err)) + } else { + for _, ep := range members.Members { if len(ep.GetClientURLs()) == 0 { // This member is not started yet. log.Info("member is not started yet", zap.String("member-id", strconv.FormatUint(ep.GetID(), 16)), errs.ZapError(err)) continue } status, err := s.getClient().Status(s.serverCtx, ep.ClientURLs[0]) if err != nil { log.Info("failed to get status of member", zap.String("member-id", strconv.FormatUint(ep.ID, 16)), zap.String("endpoint", ep.ClientURLs[0]), errs.ZapError(err)) continue } if status.Leader != ep.ID { continue } leaderAddr := ep.ClientURLs[0] if s.pdLeaderAddr.CompareAndSwap(s.pdLeaderAddr.Load(), leaderAddr) { if status.Leader != curLeader { log.Info("switch PD leader", zap.String("leader-id", strconv.FormatUint(ep.ID, 16)), zap.String("endpoint", ep.ClientURLs[0])) s.reconnectCh <- true } curLeader = ep.ID break } - } + } + }Also applies to: 153-159
Signed-off-by: tongjian <1045931706@qq.com>
Signed-off-by: tongjian <1045931706@qq.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #10223 +/- ##
==========================================
- Coverage 78.62% 78.62% -0.01%
==========================================
Files 520 520
Lines 70089 70096 +7
==========================================
+ Hits 55105 55110 +5
- Misses 10998 11000 +2
Partials 3986 3986
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
client/opt/option.go
Outdated
| } | ||
|
|
||
| // WithPDLeaderOnlyStoreRequest means the store request must be handled by PD leader. | ||
| func WithPDLeaderOnlyStoreRequest() GetStoreOption { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@coderabbitai is there a better name for both WithPDLeaderOnlyStoreRequest and WithPDLeaderOnly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rleungx Yes, I think the naming could be improved for better consistency. Here are my suggestions:
Option 1: Use the same name for both (recommended)
WithPDLeaderOnly()for both store and region options
Since GetStoreOption and GetRegionOption are different function types, Go's type system naturally disambiguates them. This approach:
- Eliminates redundancy (the "StoreRequest" suffix is implied by the type)
- Creates consistency across the API
- Follows the principle of conciseness seen in options like
WithBuckets()andWithExcludeTombstone()
Option 2: Keep them distinct but shorter
WithPDLeaderOnlyStore()for store optionsWithPDLeaderOnlyRegion()for region options
This maintains explicit distinction but is still more concise than WithPDLeaderOnlyStoreRequest.
My recommendation: Go with Option 1 and use WithPDLeaderOnly() for both. The type signatures make the context clear, and it's more idiomatic Go style. You could apply the same simplification to WithAllowRouterServiceHandleStoreRequest() → WithAllowRouterServiceHandle() for consistency across the entire API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have renamed as WithPDLeaderHandleStoreRequestOnly and WithAllowPDLeaderOnly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
|
@bufferflies: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
What problem does this PR solve?
Issue Number: Ref #9212
The client will be used by client-go
https://github.com/tikv/client-go/blob/362e1a226b73496efad146f1ef24f26c831f88e9/internal/locate/region_cache.go#L2270-L2329
What is changed and how does it work?
Check List
Tests
Code changes
Side effects
Related changes
pingcap/docs/pingcap/docs-cn:pingcap/tiup:Release note
Summary by CodeRabbit
New Features
Refactor
Tests