Skip to content

New ruler endpoints #620

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 13 commits into from
Jan 11, 2018
Merged

New ruler endpoints #620

merged 13 commits into from
Jan 11, 2018

Conversation

jml
Copy link
Contributor

@jml jml commented Dec 6, 2017

Part of #619

  • creates new endpoints for getting & setting ruler configs; puts these endpoints on the ruler itself
    • these endpoints only exposed if db flags provided
  • creates new database methods that use the existing data structure but mask the fact that ruler & alertmanager configs are mushed together
  • updates ruler to use database for getting rules if provided right flag; falls back to config server otherwise

I was going to do alertmanager in the same PR, but this got big quickly.

I'm not wedded to putting the endpoints on the ruler itself.

Pros:

  • we can delete the configs service
  • the ruler scheduler doesn't need to poll the database

Cons:

  • if the ruler scheduler slows down, setting & getting configs on the UI will be slower (although given they both talk to the same DB, their performance will be correlated anyhow)

I say "ruler scheduler", because I'm intending to split the scheduler from workers as soon as #619 is done.

One thing I'll insist on is that these endpoints with new behaviours be at different URLs to the ones currently being served.

@jml jml force-pushed the 619-new-ruler-endpoints branch from 0a72c4e to e6431a3 Compare December 6, 2017 21:49
@jml
Copy link
Contributor Author

jml commented Dec 6, 2017

Just remembered I want to move over some API tests and run them against in memory and Postgres databases. The query over jsonb data is quite funky and I could easily get it wrong.

@jml jml requested a review from bboreham December 7, 2017 18:43
Copy link
Contributor

@bboreham bboreham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Broadly looks ok but I had a couple of questions.

// RulesAPI is what the ruler needs from a config store to process rules.
type RulesAPI interface {
// GetConfigs returns all Cortex configurations from a configs API server
// // TODO: hat have been updated after the given configs.ID was last updated.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this a TODO?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Emacs typo. t<TAB> triggers a TODO autocomplete.

// GetConfigs implements RulesAPI.
func (c configsClient) GetConfigs(since configs.ID) (map[string]configs.VersionedRulesConfig, error) {
suffix := ""
if since != 0 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why doesn't this make use of AlertManagerConfigsAPI.GetConfigs() ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's for rules, not alertmanager configs, but "why doesn't this use RulerConfigsAPI.GetConfigs()?" is a valid question. The answer is that it replaces RulerConfigsAPI, and has a slightly different API (configs.VersionedRulesConfig is new in this PR).

We're replacing it because we don't want to have a client library per se. The ruler will be directly responsible for querying the database, so there's no need for code in another package that knows how to get ruler configs.

pkg/ruler/api.go Outdated
http.Error(w, "No configuration", http.StatusNotFound)
return
} else if err != nil {
// XXX: Untested
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the expectation with this comment?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's cargo-culted from the config API. The vague expectation is that we'd test the error path one of these days. I've removed it & its like.

@jml
Copy link
Contributor Author

jml commented Dec 21, 2017

Still needs some more tests before I'm happy to merge it.

@jml jml changed the title New ruler endpoints [WIP] New ruler endpoints Dec 21, 2017
@jml
Copy link
Contributor Author

jml commented Dec 21, 2017

Also changed new API to CAS, which addresses part of #330

@jml jml force-pushed the 619-new-ruler-endpoints branch from 2f7e2c8 to 9c22c68 Compare January 5, 2018 17:29
@jml
Copy link
Contributor Author

jml commented Jan 5, 2018

Wrote the tests. I'm personally happy for this to be merged.

@bboreham PTAL?

@jml jml changed the title [WIP] New ruler endpoints New ruler endpoints Jan 5, 2018
Copy link
Contributor

@bboreham bboreham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I haven't finished reviewing, but I got to a point where I can't see why it's correct, so please look at the comments so far and I'll come back to it when you respond.

Will you clean up the commit history? It's harder to work though the steps when they sometimes backtrace.

@@ -43,22 +44,22 @@ func cleanup(t *testing.T) {
}

// request makes a request to the configs API.
func request(t *testing.T, method, urlStr string, body io.Reader) *httptest.ResponseRecorder {
func request(t *testing.T, handler http.Handler, method, urlStr string, body io.Reader) *httptest.ResponseRecorder {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please include this change in the commit description, or in a separate commit if that makes sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've split this into a different commit.

@@ -24,12 +24,12 @@ func (cfg *Config) RegisterFlags(f *flag.FlagSet) {

// RulesDB has ruler-specific DB interfaces.
type RulesDB interface {
// GetRulesConfig gets the user's alertmanager config
// GetRulesConfig gets the user's ruler config
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please include this change in the commit description, or move to a separate commit if that makes more sense.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved it to be part of 029f17c, which introduces this interface.

// The supplied oldConfig must match the current config. If no config
// exists, then oldConfig must be nil. Otherwise, it must exactly
// equal the existing config.
if !((err == sql.ErrNoRows && oldConfig == nil) || reflect.DeepEqual(current.Config.RulesFiles, oldConfig)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Less repetition if you moved the details of the check into a RulesConfig.Equal() method, and that could be coded as a straightforward loop rather than using reflect.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, in my early morning fog, it wasn't that straightforward. I have become too used to just writing deriving Eq.

Options("DISTINCT ON (owner_id)").
From("configs").
Where(filter).
Where("config ->> 'rules_files' <> '{}'").
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I've added a fairly detailed comment, repasted here in case that assists your follow-up review.

		// `->>` gets a JSON object field as text. When a config row exists
		// and alertmanager config is provided but ruler config has not yet
		// been, the 'rules_files' key will have an empty JSON object as its
		// value. This is (probably) the most efficient way to test for a
		// non-empty `rules_files` key.
		//
		// This whole situation is way too complicated. See
		// https://github.com/weaveworks/cortex/issues/619 for the whole
		// story, and our plans to improve it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to assume we only ever go from blank to non-blank in the history; if it goes from non-blank to blank then you will select the previous version.

From("configs").
Where(filter).
Where("config ->> 'rules_files' <> '{}'").
OrderBy("owner_id, id DESC").
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we have a Limit(1) here like in GetConfig() ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because GetConfig() gets the configuration for a single user, whereas findRulesConfigs helps GetAllRulesConfigs() and GetRulesConfigs() get the active configurations for all users.

I've added a doc comment that highlights the purpose, but doesn't explicitly answer the "why no Limit(1)?" question:

// findRulesConfigs helps GetAllRulesConfigs and GetRulesConfigs retrieve the
// set of all active rules configurations across all our users.

I'm open to renaming some or all of these functions if you can suggest names that more clearly communicate intent.

type RulesDB interface {
// GetRulesConfig gets the user's ruler config
GetRulesConfig(userID string) (configs.VersionedRulesConfig, error)
// SetRulesConfig sets the user's ruler config
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mention the CAS-ness of the call

jml added 10 commits January 10, 2018 10:45
Will fall back to using configs service if no DB information supplied.
Now an internal implementation detail of ruler while we migrate away from
configs server for ruler configs.
Comes with tests, and updates to our Makefile to ensure that the tests run
against both postgres and in-memory database.
This allows us to re-use the helpers to communicate to other endpoints served
by completely different API objects.
Fixes errors that occur when alertmanager config is set but rules files are not.
@jml jml force-pushed the 619-new-ruler-endpoints branch from 9c22c68 to 4305b19 Compare January 10, 2018 10:55
@jml
Copy link
Contributor Author

jml commented Jan 10, 2018

@bboreham I've addressed all your review points. I think I could sensibly split this into two PRs, where the first PR goes from master..3dc22b9 (Public RulesAPI no longer used. Delete it.) and changes the ruler to go directly to the database for getting all configs, and the second PR has the rest and adds the new endpoints & tests. I only want to do this if it accelerates getting both merged (as I think this is blocking #629).

PTAL.

@jml jml force-pushed the 619-new-ruler-endpoints branch from 4305b19 to fd9fc8d Compare January 10, 2018 11:48
// findRulesConfigs helps GetAllRulesConfigs and GetRulesConfigs retrieve the
// set of all active rules configurations across all our users.
func (d DB) findRulesConfigs(filter squirrel.Sqlizer) (map[string]configs.VersionedRulesConfig, error) {
rows, err := d.Select("id", "owner_id", "config ->> 'rules_files'").
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, now I don't understand why this doesn't bring back the entire history, so you will end up with some random version.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, "SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows"
and you have order by id DESC below.

So effectively the same as LIMIT 1 elsewhere

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Want me to add a comment along those lines? If so, can you share the link to the doc you just quoted?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@bboreham bboreham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still not sure about selecting for non-blank rules, but guess that will be rare in the wild.

@jml
Copy link
Contributor Author

jml commented Jan 11, 2018 via email

@jml jml merged commit 108f77f into master Jan 11, 2018
jml added a commit that referenced this pull request Jan 11, 2018
@jml jml mentioned this pull request Jan 11, 2018
jml added a commit that referenced this pull request Jan 11, 2018
jml added a commit that referenced this pull request Jan 11, 2018
jml added a commit that referenced this pull request Jan 12, 2018
* Revert "Revert "New ruler endpoints (#620)" (#648)"

This reverts commit 860d2fb.

* Use DB only if Configs API *not* provided

Works around the fact that `database.uri` has a default value, so we cannot
detect whether the user has explicitly asked for a database or not.
@tomwilkie tomwilkie deleted the 619-new-ruler-endpoints branch August 29, 2018 15:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants