Add failing tests proving premature LeaseAcquired bug (#3402) #3406
Merged
Aaronontheweb merged 5 commits into akkadotnet:dev on Mar 9, 2026
Tests reproduce the split-brain scenario: when a CAS conflict occurs during granting and the blob/configmap has no owner, LeaseActor sends LeaseAcquired BEFORE the retry write completes. If the retry fails (another node takes the lease), the caller believes it holds the lease but it doesn't.

ShouldNotSendPrematureLeaseAcquiredWhenConflictRetryIsStolen:
- FAILS: receives LeaseAcquired instead of expected LeaseTaken
- Proves the bug exists in both Azure and Kubernetes implementations

ShouldGrantLeaseOnlyAfterConflictRetrySucceeds:
- PASSES: verifies happy-path still works correctly
Integrated Gregorius' original fix commits into this branch and extended them to Kubernetes for parity.

Included with attribution via cherry-pick:
- 611112c (
Summary
Adds deterministic reproduction tests for the premature LeaseAcquired split-brain bug in both Azure and Kubernetes LeaseActor implementations.

These tests intentionally FAIL to prove the bug exists. The fix will come in a subsequent commit.
The Bug
When a CAS conflict occurs during the Granting state and the blob/configmap has no owner (version moved on), LeaseActor sends LeaseAcquired to the caller before the retry write completes (K8s line 365, Azure line 362). If the retry then fails (another node takes the lease), the caller already believes it holds the lease, but it doesn't: localGranted is never set to true and the heartbeat never starts.

Test Results
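- ShouldNotSendPrematureLeaseAcquiredWhenConflictRetryIsStolen: FAILS (receives LeaseAcquired instead of the expected LeaseTaken)
- ShouldGrantLeaseOnlyAfterConflictRetrySucceeds: PASSES (happy path still works)

Failure Output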
This directly proves the premature LeaseAcquired is sent before the retry write, confirming the split-brain scenario described in #3402.

All existing tests pass
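The ordering problem described under The Bug can be illustrated with a small, language-neutral model. This is a sketch only, not the Akka.NET LeaseActor code: `Store`, `cas_write`, `rival_steals`, `retry_after_conflict`, and `run` are hypothetical names, the blob/ConfigMap is reduced to an owner plus a version counter, and the node names are made up. It shows how notifying the caller before the retry write lets a stolen lease go unnoticed, while notifying only after the write yields LeaseTaken.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Store:
    """Toy versioned record standing in for the blob/ConfigMap (hypothetical)."""
    owner: Optional[str] = None
    version: int = 0

    def cas_write(self, expected_version: int, new_owner: str) -> bool:
        """Compare-and-swap: succeed only if the version has not moved on."""
        if expected_version != self.version:
            return False
        self.owner = new_owner
        self.version += 1
        return True


def rival_steals(store: Store) -> None:
    """Another node takes the lease between our read and our retry write."""
    store.cas_write(store.version, "node-b")


def retry_after_conflict(
    store: Store,
    me: str,
    seen_version: int,
    notify_before_write: bool,
    interleave: Optional[Callable[[Store], None]] = None,
) -> List[str]:
    """Model the retry after a CAS conflict when the record has no owner.

    Returns the messages the caller would receive.
    """
    messages: List[str] = []
    if notify_before_write:
        # Premature ordering: the caller is told it holds the lease
        # before the retry write has been attempted.
        messages.append("LeaseAcquired")
    if interleave is not None:
        interleave(store)  # a competing write lands first
    ok = store.cas_write(seen_version, me)
    if not notify_before_write:
        # Safe ordering: the reply reflects the outcome of the write.
        messages.append("LeaseAcquired" if ok else "LeaseTaken")
    return messages


def run(notify_before_write: bool, stolen: bool) -> Tuple[List[str], Optional[str]]:
    """Run one scenario and return (messages seen by caller, actual owner)."""
    store = Store(owner=None, version=1)
    hook = rival_steals if stolen else None
    messages = retry_after_conflict(store, "node-a", 1, notify_before_write, hook)
    return messages, store.owner


if __name__ == "__main__":
    print(run(notify_before_write=True, stolen=True))    # caller vs. real owner disagree
    print(run(notify_before_write=False, stolen=True))   # caller learns the lease was taken
    print(run(notify_before_write=False, stolen=False))  # happy path
```

In the first scenario the caller receives LeaseAcquired while the record is owned by the other node, which is the split-brain condition the failing test is meant to expose. The second and third scenarios correspond to the behavior the two tests expect.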
Fixes #3402
Closes #3403