Skip to content

Conversation

@okJiang
Copy link
Member

@okJiang okJiang commented Feb 3, 2026

What problem does this PR solve?

Issue Number: Close #10200

What is changed and how does it work?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Code changes

Side effects

  • Possible performance regression
  • Increased code complexity
  • Breaking backward compatibility

Related changes

Release note

None.

Summary by CodeRabbit

  • Bug Fixes

    • Improved startup cleanup to reliably stop embedded services, close clients and network connections on startup failures, reducing resource leaks.
  • Tests

    • Hardened cluster tests to treat cluster ID mismatches like port conflicts, automatically recreate servers with new ports and retry to improve test reliability.

Signed-off-by: okjiang <819421878@qq.com>
@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. dco-signoff: yes Indicates the PR's author has signed the dco. size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 3, 2026
@coderabbitai
Copy link

coderabbitai bot commented Feb 3, 2026

📝 Walkthrough

Walkthrough

Introduces a startup cleanup routine in server/server.go that ensures embedded etcd and related clients/connections are closed on start failures, and updates tests/cluster.go to treat "Etcd cluster ID mismatch" as a retriable port-conflict, recreating servers and retrying startup.

Changes

Cohort / File(s) Summary
Startup resource cleanup
server/server.go
Changed startEtcd signature to return a named error (retErr) and added a deferred cleanup() closure that stops the embedded etcd, closes etcd/election clients, and closes HTTP idle connections on error paths.
Test retry for cluster ID mismatch
tests/cluster.go
Expanded runInitialServersWithRetry to treat "Etcd cluster ID mismatch" like "address already in use": stop/destroy servers, recreate ports, sleep briefly, and retry startup.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

size/L, lgtm, approved

Suggested reviewers

  • bufferflies
  • lhy1024
  • JmPotato

Poem

🐰 I hopped through logs at break of day,
I closed the streams when starts went astray,
If IDs clash or ports misbehave,
I'll remake the nest and bravely try to save. 🥕

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the fix for the flaky TestForwardTestSuite test identified in issue #10200.
Description check ✅ Passed The description includes the linked issue (#10200) and mostly follows the repository template, though implementation details are minimal.
Linked Issues check ✅ Passed The code changes address the root cause of the TestForwardTestSuite flakiness by improving etcd cleanup on errors and handling cluster ID mismatches as port conflicts requiring retry.
Out of Scope Changes check ✅ Passed All changes in server/server.go and tests/cluster.go are directly related to fixing the flaky test by improving error handling and cleanup.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

No actionable comments were generated in the recent review. 🎉


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@okJiang
Copy link
Member Author

okJiang commented Feb 3, 2026

/retest

@ti-chi-bot ti-chi-bot bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Feb 4, 2026
@codecov
Copy link

codecov bot commented Feb 4, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 78.64%. Comparing base (2139230) to head (80ce0e5).
⚠️ Report is 6 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #10203      +/-   ##
==========================================
+ Coverage   78.63%   78.64%   +0.01%     
==========================================
  Files         520      520              
  Lines       70089    70104      +15     
==========================================
+ Hits        55112    55134      +22     
+ Misses      10989    10988       -1     
+ Partials     3988     3982       -6     
Flag Coverage Δ
unittests 78.64% <100.00%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@okJiang
Copy link
Member Author

okJiang commented Feb 4, 2026

/retest

@okJiang
Copy link
Member Author

okJiang commented Feb 5, 2026

/retest

server/server.go Outdated
// NOTE: `embed.Etcd.Close()` can block for a long time in some failure paths
// (e.g. when starting a removed member that can never become ready). Avoid
// blocking the caller (tests may wait for the start error) by stopping the
// server synchronously and closing the embedded etcd in background.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean "asynchronously"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

server/server.go Outdated
}
err = s.initMember(newCtx, etcd)
if err != nil {
cleanup()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about using a defer that checks whether err is nil to decide whether to call cleanup()?

Signed-off-by: okjiang <819421878@qq.com>
@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Feb 10, 2026
@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Feb 10, 2026
@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Feb 10, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-02-10 04:34:12.627863708 +0000 UTC m=+245868.322003558: ☑️ agreed by JmPotato.
  • 2026-02-10 05:31:43.475358566 +0000 UTC m=+249319.169498416: ☑️ agreed by lhy1024.

@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Feb 10, 2026

@niubell: adding LGTM is restricted to approvers and reviewers in OWNERS files.

Details

In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@ti-chi-bot
Copy link
Contributor

ti-chi-bot bot commented Feb 10, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JmPotato, lhy1024, niubell

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@okJiang
Copy link
Member Author

okJiang commented Feb 10, 2026

/pull-error-log-review

@okJiang
Copy link
Member Author

okJiang commented Feb 10, 2026

/retest

1 similar comment
@okJiang
Copy link
Member Author

okJiang commented Feb 11, 2026

/retest

@ti-chi-bot ti-chi-bot bot merged commit ffaeac9 into tikv:master Feb 11, 2026
32 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved dco-signoff: yes Indicates the PR's author has signed the dco. lgtm release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

TestForwardTestSuite is flaky in next-gen

4 participants