Skip to content

Conversation

@mikemcdougall
Copy link
Collaborator

Summary

Implements Polly resilience policies for database connection reliability as specified in Issue #48.

Closes #48

Key Changes

🔄 Resilience Implementation

  • Connection retry policy with exponential backoff (3 retries, 100ms × 2^attempt)
  • PostgreSQL error filtering - Only retries safe connection errors:
    • 57P03 (cannot_connect_now)
    • 08000 (connection_exception)
    • 08003 (connection_does_not_exist)
    • 08006 (connection_failure)
  • Transaction safety - NO retries within transactions to prevent duplicate operations

📊 Observability

  • Structured logging for retry attempts (Event ID 4040)
  • Correlation ID support for request tracing
  • Seamless integration with existing logging infrastructure

🏗️ Architecture

  • NpgsqlDataSourceExtensions for clean integration
  • Updated implementations:
    • PostgresFeatureStore - All CRUD operations use retry
    • PostgresLayerCatalog - All metadata queries use retry
  • Vertical slice organization maintained

🧪 Testing

  • 18 comprehensive tests covering all retry scenarios
  • Error code validation for PostgreSQL states
  • Integration tests with real database connections
  • Edge case handling (timeouts, cancellation, callbacks)

Quality Assurance ✅

  • All 104 tests passing (102 integration + 2 architecture)
  • Zero build warnings/errors
  • AOT compilation verified - 23MB binary (no regression)
  • Runtime performance confirmed - Health checks <1ms
  • Code formatting applied - dotnet format completed

Acceptance Criteria Met ✅

  • ✅ Connection acquisition retries on transient errors
  • ✅ Transactions are NOT retried (prevents duplicates)
  • ✅ Retry attempts logged with correlation ID
  • ✅ Policy integrates seamlessly with NpgsqlDataSource
  • ✅ Follows documented MVP_PLAN.md patterns exactly

Technical Implementation

// Usage example - transparent to existing code
await using var connection = await _dataSource.OpenConnectionWithRetryAsync();

Files Changed: 10 files, +394 insertions
New Components: ResiliencePolicies, NpgsqlDataSourceExtensions, comprehensive test suite

🤖 Generated with Claude Code

@github-actions
Copy link

github-actions bot commented Dec 19, 2025

🤖 LLM Architecture Review

🚫 Assessment: BLOCKING_ISSUES

🏗️ Architecture Review Summary

✅ Good Patterns Found:

  • Usage of dependency injection for PostgreSQL services in src/Honua.Postgres/ServiceCollectionExtensions.cs ensures modularity and ease of configuration changes.
  • Implementation of resilience policies in src/Honua.Postgres/Features/Infrastructure/Resilience/ResiliencePolicies.cs and NpgsqlDataSourceExtensions.cs enhances the reliability of database connections.
  • Comprehensive logging throughout the application, such as in src/Honua.Server/Features/Infrastructure/Logging/Log.cs, aids in monitoring and debugging.

⚠️ Architecture Concerns:

  • Critical: Core depending on Infrastructure in src/Honua.Core/Features/Infrastructure/Abstractions/IDatabaseConnectionProvider.cs violates the dependency rule (Core should not depend on Infrastructure). This is a blocking issue.
  • Warning: The PostgresLayerCatalog and PostgresFeatureStore classes in src/Honua.Postgres/Features/Catalog/PostgresLayerCatalog.cs and src/Honua.Postgres/Features/FeatureStore/PostgresFeatureStore.cs expose database-specific implementations which should be internal but are potentially accessible due to the project structure.
  • Warning: Synchronous operations in async context observed in src/Honua.Postgres/Features/HealthCheck/PostgresDatabaseHealthChecker.cs with ConfigureAwait(false) used inconsistently.

💡 Recommendations:

  • To resolve the critical dependency violation, refactor IDatabaseConnectionProvider to be part of a shared kernel or an abstraction layer that does not depend on any specific infrastructure.
  • Ensure all database-specific implementations are marked as internal and are not exposed outside the Infrastructure layer. Use InternalsVisibleTo only when absolutely necessary.
  • Standardize the use of ConfigureAwait(false) across all asynchronous operations to maintain consistency and prevent deadlocks in a mixed sync/async environment.

📚 Educational Notes:

  • Dependency Direction: Keeping Core independent of Infrastructure is crucial in a Clean Architecture to ensure that business rules can evolve without being impacted by external concerns like database specifics.
  • Resilience and Logging: Implementing resilience patterns such as retries with exponential backoff and comprehensive logging helps in maintaining the reliability and observability of the system, which is vital for high-availability applications such as geospatial feature servers.
  • Async Best Practices: Consistent use of asynchronous programming best practices, including proper use of ConfigureAwait(false), is important to prevent performance bottlenecks and enhance the scalability of the application, especially under high load in a cloud-native environment.

Overall Assessment: BLOCKING_ISSUES

CRITICAL: The PR is linked to a GitHub issue with clear acceptance criteria, and changes address those criteria. However, the violation of Core depending on Infrastructure is a blocking issue that must be resolved before merging.


Automated architectural analysis powered by OpenAI GPT-4
This review focuses on architectural patterns and design decisions
Human review still recommended for complex changes

- Add exponential backoff retry policy for PostgreSQL connection errors
- Handle transient connection failures (57P03, 08000, 08003, 08006)
- Create ResiliencePolicies.cs with connection retry logic
- Add NpgsqlDataSourceExtensions for seamless integration
- Implement structured logging for retry attempts

Clean Architecture Fixes:
- Create IDatabaseConnectionProvider abstraction to hide Polly infrastructure
- Move PostgreSQL-specific resilience logic to Infrastructure layer
- Use dependency injection for database connections
- Remove Infrastructure imports from Core domain classes
- Add TestDatabaseConnectionProvider for clean test architecture

Testing:
- Add comprehensive test coverage for resilience scenarios
- All 104 tests passing
- AOT build verified (23MB binary)
- Schema isolation for parallel test execution
@mikemcdougall mikemcdougall force-pushed the implement-polly-resilience-policies-issue-48 branch from 0919296 to 570339f Compare December 19, 2025 01:35
Mike McDougall added 2 commits December 18, 2025 15:40
- Add ConfigureAwait(false) to critical OpenConnectionAsync calls
- Prevent potential deadlocks in library code
- Maintain clean architecture with internal database types
- All 104 tests passing after fixes
@mikemcdougall mikemcdougall merged commit c9dbf7c into trunk Dec 19, 2025
9 of 10 checks passed
@mikemcdougall mikemcdougall deleted the implement-polly-resilience-policies-issue-48 branch December 19, 2025 02:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Polly resilience policies for database connections

2 participants