
Fix CreateIfNotExistsAsync bug #3398

Merged
Aaronontheweb merged 1 commit into akkadotnet:dev from Arkatufus:#3397-Fix-CreateIfNotExistAsync-bug
Feb 25, 2026

@Arkatufus
Contributor

Fixes #3397

Problem

AzureApiImpl.ContainerClient() called CreateIfNotExistsAsync() with no exception handling. The Azure SDK has known bugs where this method can still throw RequestFailedException (HTTP 409, ContainerAlreadyExists) even though the container already exists. The unhandled exception propagates through the lease actor to the Split Brain Resolver (SBR), which changes its decision to ReverseDownIndirectlyConnected and downs the entire cluster.

Propagation path: ContainerClient() throws 409 → LeaseResourceExists() catch block misidentifies it as a blob-level error → re-thrown as LeaseException → ReadOrCreateLeaseResource() (no catch) → LeaseActor PipeTo → Status.Failure → SBR receives failure → ReverseDownIndirectlyConnected → all nodes downed.

Fix

Replaced CreateIfNotExistsAsync() with CreateAsync() + explicit 409 handling in ContainerClient():

  • ContainerAlreadyExists (benign) — silently handled, set _initialized = true
  • All other errors — wrapped in a new internal ContainerInitializationException
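The two cases above can be sketched roughly as follows. This is an illustrative sketch, not the code from the PR: `StorageError` is a stand-in for the Azure SDK's `RequestFailedException` (which exposes `Status` and `ErrorCode`), the `createContainer` delegate stands in for the SDK's `CreateAsync()` call so the example runs without the Azure package, and `ContainerInitializer` and `EnsureContainerAsync` are hypothetical names.

```csharp
using System;
using System.Threading.Tasks;

// Stand-in for Azure's RequestFailedException (assumption: only Status and ErrorCode matter here).
public sealed class StorageError : Exception
{
    public int Status { get; }
    public string ErrorCode { get; }

    public StorageError(int status, string errorCode) : base($"{status} {errorCode}")
    {
        Status = status;
        ErrorCode = errorCode;
    }
}

// Named after the exception type the PR introduces; its actual shape is an assumption.
public sealed class ContainerInitializationException : Exception
{
    public ContainerInitializationException(string message, Exception inner)
        : base(message, inner) { }
}

// Hypothetical helper mirroring the behavior described for ContainerClient().
public sealed class ContainerInitializer
{
    private readonly Func<Task> _createContainer; // stand-in for the SDK's CreateAsync()
    private bool _initialized;

    public ContainerInitializer(Func<Task> createContainer) => _createContainer = createContainer;

    public bool Initialized => _initialized;

    public async Task EnsureContainerAsync()
    {
        if (_initialized) return;

        try
        {
            await _createContainer();
            _initialized = true;
        }
        catch (StorageError ex) when (ex.Status == 409 && ex.ErrorCode == "ContainerAlreadyExists")
        {
            // Benign: the container is already there, so initialization is complete.
            _initialized = true;
        }
        catch (Exception ex)
        {
            // Any other failure is surfaced as a container-level error,
            // so blob-level catch blocks do not misinterpret it.
            throw new ContainerInitializationException("Container initialization failed", ex);
        }
    }
}
```

With this shape, a 409 for an existing container leaves the helper initialized, while any other failure reaches callers only as `ContainerInitializationException`.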

New ContainerInitializationException — an internal exception type that distinguishes container-level errors from blob-level RequestFailedException. This prevents the existing catch (RequestFailedException) blocks (designed for blob operations) from mishandling container creation errors.

Catch blocks added to LeaseResourceExists(), CreateLeaseResource(), and GetLeaseResource() — each catches ContainerInitializationException and returns its natural "try again" value (false, null, null), which feeds back into the existing retry loop in ReadOrCreateLeaseResource().
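A minimal sketch of how a "try again" return value can feed a retry loop is shown below. It is a simplification under stated assumptions, not the actual ReadOrCreateLeaseResource() implementation: `LeaseRetrySketch`, `ExistsOrRetryAsync`, `ReadWithRetryAsync`, the `probe` delegate, and the `maxAttempts` bound are all hypothetical, and the exception type is redeclared locally so the example is self-contained.

```csharp
using System;
using System.Threading.Tasks;

// Local stand-in for the internal exception type the PR introduces.
public sealed class ContainerInitializationException : Exception
{
    public ContainerInitializationException(string message) : base(message) { }
}

public static class LeaseRetrySketch
{
    // Hypothetical analogue of LeaseResourceExists(): a container-level failure
    // is translated into the "try again" value (false) instead of propagating.
    public static async Task<bool> ExistsOrRetryAsync(Func<Task<bool>> probe)
    {
        try
        {
            return await probe();
        }
        catch (ContainerInitializationException)
        {
            return false;
        }
    }

    // Hypothetical, simplified retry loop: keeps probing until the resource
    // is reported as present or the attempt budget is exhausted.
    public static async Task<bool> ReadWithRetryAsync(Func<Task<bool>> probe, int maxAttempts)
    {
        for (var attempt = 1; attempt <= maxAttempts; attempt++)
        {
            if (await ExistsOrRetryAsync(probe)) return true;
        }
        return false;
    }
}
```

The point of the design is that a transient container error costs one loop iteration rather than surfacing as a failure to the lease actor.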

Files Changed

| File | Change |
| --- | --- |
| src/coordination/azure/Akka.Coordination.Azure/Internal/AzureApiImpl.cs | Add ContainerInitializationException; rewrite ContainerClient(); add catch blocks to caller methods |
| src/coordination/azure/Akka.Coordination.Azure.Tests/AzureApiSpec.cs | Add regression tests for container-already-exists scenario |

Test Plan

Member

@Aaronontheweb left a comment


LGTM


Development

Successfully merging this pull request may close these issues.

SBR lease-majority fails to acquire lease due to HTTP 409 ContainerAlreadyExists, causing ReverseDownIndirectlyConnected to down the entire cluster
