
vench commented Jul 19, 2024

Description

This pull request introduces circuit breaking functionality to Chproxy, enhancing its resilience and reliability when interacting with ClickHouse clusters. Circuit breaking is a critical feature for preventing cascading failures in distributed systems: it halts operations when certain failure thresholds are reached.

Please check the type of change your PR introduces:

  • Bugfix
  • Feature
  • Code style update (formatting, renaming)
  • Refactoring (no functional changes, no api changes)
  • Build related changes
  • Documentation content changes
  • Other (please describe):

Checklist

  • Linter passes correctly
  • Add tests which fail without the change (if possible)
  • All tests passing
  • Extended the README / documentation, if necessary

Contributor

sigua-cs commented Jul 21, 2024

Hello @vench
Thank you for your pull request introducing the circuit breaking functionality to Chproxy. To better understand the implementation and ensure its effectiveness, could you please provide additional details on the following points?

  1. Failure Scenarios:
    Could you provide examples of failure scenarios that the circuit breaker is designed to handle? For instance, what types of failures should trigger the circuit breaker?

  2. Expected Behavior:
    What specific behavior should we expect when the circuit breaker is triggered? How should it affect ongoing operations and future requests?

  3. Test Cases:
    In proxy_test.go, there is a test case for "error with breaker on." Could you explain the expected responses and status codes? Additionally, could you provide a few more test cases to demonstrate the circuit breaker’s behavior under different scenarios?

Your detailed insights will greatly help us understand and validate the introduced feature.
Thank you for your contribution!

Author

vench commented Jul 23, 2024

Good day.

On our project, under heavy load, the ClickHouse database runs out of resources to process requests. To reduce the load, we would like to break the chain: if we already know about a large number of errors, we do not go to the database but immediately return an error for some time. This gives the database time to process existing requests.

Therefore, the errors we detect are lack of memory, exceeded limits, and insufficient threads (error code 439).
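
As a rough illustration of the behavior described above, here is a minimal sketch in Go. It is not the code from this pull request. The names `Breaker`, `NewBreaker`, `IsOverload`, and `defaultCooldown` are hypothetical, and so are the threshold and cooldown values. Only code 439 comes from this discussion; mapping "lack of memory" to 241 and "exceeded limits" to 202 is an assumption, as is parsing the code from a `Code: N` prefix in the error text. The sketch counts consecutive overload errors and has no half-open state.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"sync"
	"time"
)

// defaultCooldown is an illustrative value, not a Chproxy setting.
const defaultCooldown = 30 * time.Second

// overloadCodes lists ClickHouse error codes treated as overload signals.
// Only 439 is named in the discussion; 241 and 202 are assumed mappings.
var overloadCodes = map[int]bool{
	439: true, // CANNOT_SCHEDULE_TASK (insufficient threads)
	241: true, // MEMORY_LIMIT_EXCEEDED (assumed for "lack of memory")
	202: true, // TOO_MANY_SIMULTANEOUS_QUERIES (assumed for "exceeded limits")
}

var codeRe = regexp.MustCompile(`Code: (\d+)`)

// IsOverload reports whether an error body carries one of the overload codes.
func IsOverload(body string) bool {
	m := codeRe.FindStringSubmatch(body)
	if m == nil {
		return false
	}
	code, err := strconv.Atoi(m[1])
	return err == nil && overloadCodes[code]
}

// Breaker opens after threshold consecutive overload errors and
// rejects requests until the cooldown has elapsed.
type Breaker struct {
	mu        sync.Mutex
	threshold int
	cooldown  time.Duration
	failures  int
	openUntil time.Time
}

// NewBreaker returns a closed breaker with the given settings.
func NewBreaker(threshold int, cooldown time.Duration) *Breaker {
	return &Breaker{threshold: threshold, cooldown: cooldown}
}

// Allow reports whether a request may be sent to the database right now.
func (b *Breaker) Allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	return !time.Now().Before(b.openUntil)
}

// Record registers the outcome of a request that reached the database.
func (b *Breaker) Record(overloaded bool) {
	b.mu.Lock()
	defer b.mu.Unlock()
	if !overloaded {
		b.failures = 0 // any non-overload outcome resets the streak
		return
	}
	b.failures++
	if b.failures >= b.threshold {
		b.openUntil = time.Now().Add(b.cooldown)
		b.failures = 0
	}
}

func main() {
	b := NewBreaker(2, defaultCooldown)
	for i := 0; i < 3; i++ {
		if !b.Allow() {
			fmt.Println("request", i, "rejected without reaching ClickHouse")
			continue
		}
		// Pretend ClickHouse answered with an overload error.
		b.Record(IsOverload("Code: 439. DB::Exception: Cannot schedule a task"))
		fmt.Println("request", i, "sent to ClickHouse")
	}
}
```

With a threshold of 2, the first two requests reach the database and fail, and the third is rejected immediately. A real implementation would also need to decide which status code to return while the breaker is open and whether to probe the database before fully closing again.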

Soon, I will conduct more tests on this functionality. Thank you for your response.
