Future of the Config Service #1244
Comments
This seems closely related to #619. Personally, I would be in favor of not requiring the config service, since I don't believe it provides much (any?) benefit over having the relevant API calls directly on the ruler/alertmanager and allowing those components to be fully in charge of their data.
Re point 2, that is supposed to happen here. Interested to know what error you are referring to. |
@bboreham The way the ruler is currently set up, it does not apply the same validation when it uses its own API rather than the config service: Line 87 in 0054129
This leads to the requirement of ensuring a parseable rule set later: Line 233 in 0054129
I don't think we should ever be in a situation where we have an uploaded ruleset that is unable to be scheduled. I think we should assume all rules that are uploaded can be scheduled and if an error occurs we can record it and create systems that make fixing the broken rule easy. However, I don't think we should default back to the previous rule configs. Instead I think we should avoid keeping multiple versions/histories of user configs in general. If a user wants to manage and have version control of their configs they should do it using an external service or git. That way we won't have to create a new row in postgres for every user change. |
That should be a separate issue; it is a straightforward bug. The history is that JML decided it had been a mistake to create a separate configs service, and started to migrate the functionality into ruler and alertmanager - #620, #649. To avoid downtime both routes have to work for a time. However the new code was never seen to work, JML moved to a different job, and it stalled. |
We are currently using the ruler without the config service. It is working well for the time being. Also this will likely become easier once cortex is deployed as a single binary. Then it should be easier to subsume the config service entirely. |
This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions. |
@jtlisi can you give an update on this issue? Was anything done for the alertmanager? |
As of now you can manage rules / alertmanager configs directly using their built-in APIs w/ Object Storage. However, it has reduced consistency guarantees. |
@gotjosh Another thing to add to your plate :) I know you have too much already, but there is no rush. We should see how the APIs on the config service and the native AM/Ruler APIs differ, and see if we can use just the native APIs and deprecate the config service. If not, we should provide guidance on which use cases each of the APIs satisfies and when to pick which.
Keep them coming 😄 , I'm looking forward to spending a solid amount of time working on the ruler in the foreseeable future. |
The Config Service is a central component for the Alertmanager and Ruler. It receives JSON data with embedded YAML blobs that represent Prometheus config files. Preliminary work to decouple the Ruler from the config service has already started. Instead of polling the config service, the ruler will host its own API and interact with the Postgres database directly. If the alertmanager moves in this direction, that leaves little reason to keep the config service. Below I added some goals to keep in mind when considering the Config Service. I would be interested to know what people think and what the general sentiment is around deprecating the config db in favor of embedding the libs within the alertmanager and ruler.
Goals for Configuration
In my opinion, the ability to upload and export Prometheus config files is essential to the Cortex user experience. Users should have a portable configuration that they can move from Prometheus to Cortex and vice versa. However, this does not mean we need to store each user's information in a JSON blob. Structuring user configs into a more refined format could be necessary for user-friendly CRUD features related to alerts/rules in the future.
Validation features that ensure users have properly formatted configuration files would improve the user experience with the ruler. It would be beneficial for users to learn that they have an inoperable group of rules from the response to an API call, rather than from an internal error when that rule group is polled by the ruler.
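A minimal sketch of what validating at upload time might look like, assuming simplified types. `Rule`, `RuleGroup`, `ValidateRuleGroup`, and `StoreRuleGroup` are hypothetical names for illustration, the in-memory map stands in for the database, and the checks are structural only. A real implementation would presumably reuse Prometheus's own rule parsing so that expressions are checked as well.

```go
package main

import (
	"fmt"
	"strings"
)

// Rule is a simplified, hypothetical stand-in for a Prometheus rule.
type Rule struct {
	Record string // set for recording rules
	Alert  string // set for alerting rules
	Expr   string // PromQL expression; not parsed in this sketch
}

// RuleGroup is a simplified, hypothetical stand-in for a Prometheus rule group.
type RuleGroup struct {
	Name  string
	Rules []Rule
}

// ValidateRuleGroup runs structural checks and returns every problem found,
// so the caller can report all of them in a single API response.
func ValidateRuleGroup(g RuleGroup) []string {
	var problems []string
	if strings.TrimSpace(g.Name) == "" {
		problems = append(problems, "group name must not be empty")
	}
	for i, r := range g.Rules {
		switch {
		case r.Record == "" && r.Alert == "":
			problems = append(problems, fmt.Sprintf("rule %d: one of record or alert must be set", i))
		case r.Record != "" && r.Alert != "":
			problems = append(problems, fmt.Sprintf("rule %d: only one of record and alert may be set", i))
		}
		if strings.TrimSpace(r.Expr) == "" {
			problems = append(problems, fmt.Sprintf("rule %d: expr must not be empty", i))
		}
	}
	return problems
}

// StoreRuleGroup validates the group before persisting it. An invalid group is
// rejected immediately and never reaches the store, so it can never be polled later.
func StoreRuleGroup(store map[string]RuleGroup, g RuleGroup) error {
	if problems := ValidateRuleGroup(g); len(problems) > 0 {
		return fmt.Errorf("rule group rejected: %s", strings.Join(problems, "; "))
	}
	store[g.Name] = g
	return nil
}

func main() {
	store := map[string]RuleGroup{}

	// Missing expr: the upload fails with a descriptive error.
	bad := RuleGroup{Name: "example", Rules: []Rule{{Record: "job:up:sum"}}}
	if err := StoreRuleGroup(store, bad); err != nil {
		fmt.Println(err)
	}

	// Well-formed group: accepted and stored.
	good := RuleGroup{Name: "example", Rules: []Rule{{Alert: "InstanceDown", Expr: "up == 0"}}}
	if err := StoreRuleGroup(store, good); err == nil {
		fmt.Println("stored groups:", len(store))
	}
}
```

In an HTTP handler, the error returned by `StoreRuleGroup` would map naturally to a 400 response, which gives the user the feedback at upload time rather than at evaluation time.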