Future of the Config Service #1244

Open
jtlisi opened this issue Feb 24, 2019 · 10 comments
Labels
component/alertmanager component/rules Bits & bobs todo with rules and alerts: the ruler, config service etc. keepalive Skipped by stale bot

Comments

@jtlisi
Contributor

jtlisi commented Feb 24, 2019

The Config Service is a central component for the Alertmanager and Ruler. It receives JSON data with embedded YAML blobs that represent Prometheus config files. Preliminary work to decouple the Ruler from the Config Service has already started: instead of polling the Config Service, the Ruler will host its own API and interact with the Postgres database directly. If the Alertmanager moves in this direction, that leaves little reason to keep the Config Service. Below I have added some goals to keep in mind when considering the Config Service. I would be interested to know what people think, and what the general sentiment is around deprecating the config db in favor of embedding the libs within the Alertmanager and Ruler.

Goals for Configuration

  1. Maintain the ability to upload Prometheus config files

In my opinion, the ability to upload and export Prometheus config files is essential to the Cortex user experience. Users should have a portable configuration that they can move from Prometheus to Cortex and vice versa. However, this does not mean we need to store each user's information in a JSON blob. Structuring user configs into a more refined format could be necessary for user-friendly CRUD features related to alerts/rules in the future.

  2. Validate User Configs in API

Validation features that ensure users have properly formatted configuration files would improve the user experience with the Ruler. It would be beneficial for users to learn that they have an inoperable group of rules from the response to an API call, rather than from an internal error when that rule group is polled by the Ruler.

@csmarchbanks
Contributor

This seems very related to: #619

Personally, I would be in favor of not requiring the config service, since I don't believe it provides much (any?) benefit over having the relevant API calls directly on the ruler/alertmanager and allowing those components to be fully in charge of their data.

@bboreham
Contributor

Re point 2, that is supposed to happen here. Interested to know what error you are referring to.

@jtlisi
Contributor Author

jtlisi commented Apr 15, 2019

@bboreham The way the ruler is currently set up, it does not have the same validation when it uses its own API rather than the config service:

func (a *API) casConfig(w http.ResponseWriter, r *http.Request) {

This leads to the requirement of ensuring a parseable rule set later:

rulesByGroup, err := config.Config.Parse()

I don't think we should ever be in a situation where we have an uploaded ruleset that is unable to be scheduled. I think we should assume all rules that are uploaded can be scheduled and if an error occurs we can record it and create systems that make fixing the broken rule easy. However, I don't think we should default back to the previous rule configs.

Instead I think we should avoid keeping multiple versions/histories of user configs in general. If a user wants to manage and have version control of their configs they should do it using an external service or git. That way we won't have to create a new row in postgres for every user change.

@bboreham
Contributor

> it does not have the same validation when it uses its own API rather than the config service

That should be a separate issue; it is a straightforward bug.

The history is that JML decided it had been a mistake to create a separate configs service, and started to migrate the functionality into ruler and alertmanager - #620, #649. To avoid downtime both routes have to work for a time.

However the new code was never seen to work, JML moved to a different job, and it stalled.

@jtlisi
Contributor Author

jtlisi commented Apr 16, 2019

We are currently using the ruler without the config service. It is working well for the time being. Also, this will likely become easier once Cortex is deployed as a single binary; at that point it should be simpler to subsume the config service entirely.

@stale

stale bot commented Feb 3, 2020

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

@bboreham
Contributor

@jtlisi can you give an update on this issue? Was anything done for the alertmanager?

@bboreham bboreham added component/alertmanager component/rules Bits & bobs todo with rules and alerts: the ruler, config service etc. size/XL labels Jul 30, 2020
@pracucci pracucci removed the size/XL label Jul 30, 2020
@jtlisi
Contributor Author

jtlisi commented Jul 30, 2020

As of now you can manage rules / alertmanager configs directly using their built-in APIs w/ Object Storage. However, it has reduced consistency guarantees.

@gouthamve
Contributor

@gotjosh Another thing to add to your plate :) I know you have too much already, but there is no rush. We should see how the APIs on the config service and the native AM/Ruler APIs differ, see if we can use just the native APIs and deprecate the config service.

If not, we should provide guidance on which use cases each of the APIs satisfies and when to pick which.

@gotjosh
Contributor

gotjosh commented Jul 30, 2020

Keep them coming 😄 , I'm looking forward to spending a solid amount of time working on the ruler in the foreseeable future.

6 participants