Refactor Ruler/Alertmanager API & Decouple configdb #1513

Closed
wants to merge 60 commits into from

Conversation

@jtlisi jtlisi commented Jul 15, 2019

This PR accomplishes the following:

  • Add a separate API for the Ruler and Alertmanager, based on this design document
  • Add a GCS backend for both the Alertmanager and Ruler APIs
  • Update the polling mechanism to use either a KV-store-backed change detection system (for the decoupled APIs) or a generation-based configdb client
  • Update the scheduler to use a per-user context, so that work can be cancelled after updates or deletions
  • Pass a Prometheus registerer tagged with the user's ID to the Prometheus rule group, to get per-user rule evaluation metrics

Any feedback or suggestions are greatly appreciated.

	Namespace: "cortex",
	Name:      "worker_idle_seconds_total",
	Help:      "How long workers have spent waiting for work.",
})
evalLatency = prometheus.NewHistogram(prometheus.HistogramOpts{

This metric has moved to the scheduler, with different buckets and an additional metric for missed iterations.

jtlisi added 29 commits July 23, 2019 16:36
Signed-off-by: Jacob Lisi <[email protected]>
jtlisi added 18 commits July 23, 2019 16:37
Signed-off-by: Jacob Lisi <[email protected]>
@jtlisi jtlisi force-pushed the 20190617_refactor_rulesdb branch from 5aa6de9 to 0fff90b Compare July 23, 2019 21:08
@jtlisi jtlisi closed this Jul 25, 2019
@jtlisi jtlisi reopened this Jul 25, 2019
@jtlisi jtlisi marked this pull request as ready for review July 25, 2019 21:02
@bboreham

> Update the polling mechanism to use a KV store backed change detection system

Can you explain this a bit more? "kv" doesn't appear in the design document.

jtlisi commented Jul 29, 2019

@bboreham

Sorry about that. The KV system is a recent addition, made once the shortcomings of managing polls within GCS became clear. With the recent updates to the KV package, implementing a basic change detection system became fairly simple.

As implemented, the KV store change detection system wraps the Ruler and Alertmanager APIs and tracks when configs are created, edited, or deleted. When the scheduler polls for updated rule groups, it consults the KV store to find the rules that have changed since the previous poll.
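
The wrapping described above might look roughly like the following sketch. It is an illustration under stated assumptions, not the PR's implementation: an in-memory `changeTracker` stands in for the KV store, a generation counter stands in for whatever the real system records, and all type and method names are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// changeTracker stands in for the KV store: it records, per user, the
// generation at which that user's config last changed.
type changeTracker struct {
	mu         sync.Mutex
	generation uint64
	changed    map[string]uint64
}

func newChangeTracker() *changeTracker {
	return &changeTracker{changed: make(map[string]uint64)}
}

// markChanged is called by the wrapper after a create, edit, or delete.
func (t *changeTracker) markChanged(userID string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	t.generation++
	t.changed[userID] = t.generation
}

// changedSince returns the users modified after generation since, plus the
// current generation for the caller to pass to its next poll.
func (t *changeTracker) changedSince(since uint64) ([]string, uint64) {
	t.mu.Lock()
	defer t.mu.Unlock()
	var users []string
	for userID, gen := range t.changed {
		if gen > since {
			users = append(users, userID)
		}
	}
	sort.Strings(users)
	return users, t.generation
}

// trackedStore wraps a config store so that every write is recorded.
// The map itself is left unsynchronized to keep the sketch short.
type trackedStore struct {
	configs map[string]string
	tracker *changeTracker
}

func (s *trackedStore) Set(userID, config string) {
	s.configs[userID] = config
	s.tracker.markChanged(userID)
}

func (s *trackedStore) Delete(userID string) {
	delete(s.configs, userID)
	s.tracker.markChanged(userID)
}

func main() {
	store := &trackedStore{configs: map[string]string{}, tracker: newChangeTracker()}
	store.Set("user-1", "groups: []")
	store.Set("user-2", "groups: []")

	users, last := store.tracker.changedSince(0) // first poll
	fmt.Println("first poll:", users)

	store.Delete("user-1")
	users, _ = store.tracker.changedSince(last) // second poll
	fmt.Println("second poll:", users)
}
```

Because the scheduler only asks which users changed since its last poll, it can skip reloading configs for everyone else.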

Initially, the idea was to have the underlying storage implementation manage change detection and polling. Moving that responsibility to the KV store reduces the complexity of a rule storage backend, which should make it straightforward to add further config storage backends (S3, Azure Blob Storage, PostgreSQL, BoltDB, etc.).
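
To make the reduced backend surface concrete, here is a hedged sketch of what a backend contract could shrink to once change detection lives elsewhere. The `RuleStore` interface and `memoryStore` type are hypothetical and are not the interfaces defined in this PR. The point is only that a backend needs plain read, write, and delete operations, with no polling or versioning logic.

```go
package main

import (
	"errors"
	"fmt"
)

var errNotFound = errors.New("rule config not found")

// RuleStore is a hypothetical minimal backend contract. Note the absence
// of any poll or change detection method.
type RuleStore interface {
	Get(userID string) (string, error)
	Set(userID, config string) error
	Delete(userID string) error
}

// memoryStore is an in-memory backend used only for illustration.
type memoryStore struct {
	configs map[string]string
}

func newMemoryStore() *memoryStore {
	return &memoryStore{configs: make(map[string]string)}
}

func (m *memoryStore) Get(userID string) (string, error) {
	config, ok := m.configs[userID]
	if !ok {
		return "", errNotFound
	}
	return config, nil
}

func (m *memoryStore) Set(userID, config string) error {
	m.configs[userID] = config
	return nil
}

func (m *memoryStore) Delete(userID string) error {
	if _, ok := m.configs[userID]; !ok {
		return errNotFound
	}
	delete(m.configs, userID)
	return nil
}

func main() {
	var store RuleStore = newMemoryStore()
	if err := store.Set("user-1", "groups: []"); err != nil {
		fmt.Println("set failed:", err)
		return
	}
	config, err := store.Get("user-1")
	fmt.Println(config, err)
}
```

An object store or SQL backend would implement the same three methods, and the change tracking wrapper would sit in front of whichever backend is configured.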

I'll update the design document to account for the KV store as soon as possible.

jtlisi commented Jul 31, 2019

Closing this PR in favor of smaller, more readable PRs.

2 participants