File-based settings health indicator #117081
server/src/main/java/org/elasticsearch/reservedstate/service/FileSettingsService.java (three resolved review threads, one on an outdated diff)
OK, I've taken a different approach.
Pinging @elastic/es-core-infra (Team:Core/Infra)
LGTM, I left one comment. I wanted to do some manual testing, but it's not blocking.
completion.onResponse(null);
healthIndicatorService.successOccurred();
In the failure case you call .failureOccurred() prior to the completion invocation, but here the order is switched. Should the .successOccurred() call come before completion.onResponse(null)?
I didn't want to report a success if onResponse throws. Conversely, I did want to report a failure even if onFailure throws.
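The ordering rationale above can be sketched in Java. This is a minimal, hypothetical illustration rather than the PR's code: Completion is a simplified stand-in for the listener held in completion, and HealthTracker stands in for healthIndicatorService. Only the method names successOccurred, failureOccurred, onResponse and onFailure come from the thread; the real signatures may differ.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical, simplified stand-in for the listener-style callback held in `completion`.
interface Completion {
    void onResponse(Void ignored);

    void onFailure(Exception e);
}

// Hypothetical stand-in for healthIndicatorService: it only counts reports.
class HealthTracker {
    final AtomicLong successes = new AtomicLong();
    final AtomicLong failures = new AtomicLong();

    void successOccurred() {
        successes.incrementAndGet();
    }

    void failureOccurred(Exception e) {
        failures.incrementAndGet();
    }
}

// Hypothetical class showing the two call orders discussed in the review.
class SettingsApplier {
    private final HealthTracker health;

    SettingsApplier(HealthTracker health) {
        this.health = health;
    }

    void onSuccess(Completion completion) {
        // Notify the caller first: if onResponse throws, the exception propagates
        // past the next line, so no success is recorded.
        completion.onResponse(null);
        health.successOccurred();
    }

    void onError(Completion completion, Exception e) {
        // Record the failure first, so it is reported even if onFailure throws.
        health.failureOccurred(e);
        completion.onFailure(e);
    }
}
```

With a listener whose callbacks both throw, onSuccess leaves the success count at zero while onError still increments the failure count, which is the asymmetry described in the reply.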
What happens for nodes that don't ever read file settings? They are only used by ECK/serverless, and even then only the master node actually reads them. So on other nodes, the indicator would always be in a yellow state?
@rjernst that should be covered by the
I ran it locally, and the
* Add FileSettingsService health indicator
* spotless
* YELLOW for any failure, plus most_recent_failure
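The last commit's rule ("YELLOW for any failure, plus most_recent_failure") can be sketched as a small state tracker. This is a hypothetical illustration, not the PR's implementation: the names FileSettingsHealthSketch, SketchStatus and the failure_streak key are invented here, and the assumption that a successful application clears the failure state is not confirmed by the commit message.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical health states; the real indicator uses Elasticsearch's own status type.
enum SketchStatus { GREEN, YELLOW }

// Hypothetical tracker: any failure since the last success turns the status YELLOW,
// and the most recent failure description is kept for the details section.
class FileSettingsHealthSketch {
    private long failureStreak = 0;
    private String mostRecentFailure = null;

    synchronized void successOccurred() {
        // Assumption: a successful application clears the failure state.
        failureStreak = 0;
        mostRecentFailure = null;
    }

    synchronized void failureOccurred(String description) {
        failureStreak++;
        mostRecentFailure = description;
    }

    synchronized SketchStatus status() {
        // A single failure is enough for YELLOW.
        return failureStreak == 0 ? SketchStatus.GREEN : SketchStatus.YELLOW;
    }

    synchronized Map<String, Object> details() {
        Map<String, Object> details = new LinkedHashMap<>();
        details.put("failure_streak", failureStreak);
        if (mostRecentFailure != null) {
            details.put("most_recent_failure", mostRecentFailure);
        }
        return details;
    }
}
```

In this sketch, a node that never applies file settings records no failures and therefore reports GREEN, and an alert can key off the status while using most_recent_failure to explain it.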
We have been trying to alert on file-based settings failures by inferring badness from logs. We've made progress there, but ultimately we're having trouble with the alert recovery conditions.
This PR adds a file-based settings Health Indicator, and we can alert directly on that instead of the logs.