-
Notifications
You must be signed in to change notification settings - Fork 816
New ruler endpoints #620
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
New ruler endpoints #620
Conversation
0a72c4e
to
e6431a3
Compare
Just remembered I want to move over some API tests and run them against in memory and Postgres databases. The query over jsonb data is quite funky and I could easily get it wrong. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Broadly looks ok but I had a couple of questions.
pkg/ruler/configs.go
Outdated
// RulesAPI is what the ruler needs from a config store to process rules. | ||
type RulesAPI interface { | ||
// GetConfigs returns all Cortex configurations from a configs API server | ||
// // TODO: hat have been updated after the given configs.ID was last updated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this a TODO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Emacs typo. t<TAB>
triggers a TODO
autocomplete.
// GetConfigs implements RulesAPI. | ||
func (c configsClient) GetConfigs(since configs.ID) (map[string]configs.VersionedRulesConfig, error) { | ||
suffix := "" | ||
if since != 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why doesn't this make use of AlertManagerConfigsAPI.GetConfigs()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's for rules, not alertmanager configs, but "why doesn't this use RulerConfigsAPI.GetConfigs()
?" is a valid question. The answer is that it replaces RulerConfigsAPI
, and has a slightly different API (configs.VersionedRulesConfig
is new in this PR).
We're replacing it because we don't want to have a client library per se. The ruler will be directly responsible for querying the database, so there's no need for code in another package that knows how to get ruler configs.
pkg/ruler/api.go
Outdated
http.Error(w, "No configuration", http.StatusNotFound) | ||
return | ||
} else if err != nil { | ||
// XXX: Untested |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the expectation with this comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's cargo-culted from the config API. The vague expectation is that we'd test the error path one of these days. I've removed it & its like.
Still needs some more tests before I'm happy to merge it. |
Also changed new API to CAS, which addresses part of #330 |
2f7e2c8
to
9c22c68
Compare
Wrote the tests. I'm personally happy for this to be merged. @bboreham PTAL? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't finished reviewing, but I got to a point where I can't see why it's correct, so please look at the comments so far and I'll come back to it when you respond.
Will you clean up the commit history? It's harder to work though the steps when they sometimes backtrace.
pkg/ruler/api_test.go
Outdated
@@ -43,22 +44,22 @@ func cleanup(t *testing.T) { | |||
} | |||
|
|||
// request makes a request to the configs API. | |||
func request(t *testing.T, method, urlStr string, body io.Reader) *httptest.ResponseRecorder { | |||
func request(t *testing.T, handler http.Handler, method, urlStr string, body io.Reader) *httptest.ResponseRecorder { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please include this change in the commit description, or in a separate commit if that makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've split this into a different commit.
pkg/configs/db/db.go
Outdated
@@ -24,12 +24,12 @@ func (cfg *Config) RegisterFlags(f *flag.FlagSet) { | |||
|
|||
// RulesDB has ruler-specific DB interfaces. | |||
type RulesDB interface { | |||
// GetRulesConfig gets the user's alertmanager config | |||
// GetRulesConfig gets the user's ruler config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please include this change in the commit description, or move to a separate commit if that makes more sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've moved it to be part of 029f17c, which introduces this interface.
pkg/configs/db/postgres/postgres.go
Outdated
// The supplied oldConfig must match the current config. If no config | ||
// exists, then oldConfig must be nil. Otherwise, it must exactly | ||
// equal the existing config. | ||
if !((err == sql.ErrNoRows && oldConfig == nil) || reflect.DeepEqual(current.Config.RulesFiles, oldConfig)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Less repetition if you moved the details of the check into a RulesConfig.Equal()
method, and that could be coded as a straightforward loop rather than using reflect
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, in my early morning fog, it wasn't that straightforward. I have become too used to just writing deriving Eq
.
pkg/configs/db/postgres/postgres.go
Outdated
Options("DISTINCT ON (owner_id)"). | ||
From("configs"). | ||
Where(filter). | ||
Where("config ->> 'rules_files' <> '{}'"). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. I've added a fairly detailed comment, repasted here in case that assists your follow-up review.
// `->>` gets a JSON object field as text. When a config row exists
// and alertmanager config is provided but ruler config has not yet
// been, the 'rules_files' key will have an empty JSON object as its
// value. This is (probably) the most efficient way to test for a
// non-empty `rules_files` key.
//
// This whole situation is way too complicated. See
// https://github.com/weaveworks/cortex/issues/619 for the whole
// story, and our plans to improve it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to assume we only ever go from blank to non-blank in the history; if it goes from non-blank to blank then you will select the previous version.
pkg/configs/db/postgres/postgres.go
Outdated
From("configs"). | ||
Where(filter). | ||
Where("config ->> 'rules_files' <> '{}'"). | ||
OrderBy("owner_id, id DESC"). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we have a Limit(1)
here like in GetConfig()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because GetConfig()
gets the configuration for a single user, whereas findRulesConfigs
helps GetAllRulesConfigs()
and GetRulesConfigs()
get the active configurations for all users.
I've added a doc comment that highlights the purpose, but doesn't explicitly answer the "why no Limit(1)
?" question:
// findRulesConfigs helps GetAllRulesConfigs and GetRulesConfigs retrieve the
// set of all active rules configurations across all our users.
I'm open to renaming some or all of these functions if you can suggest names that more clearly communicate intent.
pkg/configs/db/db.go
Outdated
type RulesDB interface { | ||
// GetRulesConfig gets the user's ruler config | ||
GetRulesConfig(userID string) (configs.VersionedRulesConfig, error) | ||
// SetRulesConfig sets the user's ruler config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mention the CAS-ness of the call
Will fall back to using configs service if no DB information supplied.
Now an internal implementation detail of ruler while we migrate away from configs server for ruler configs.
Comes with tests, and updates to our Makefile to ensure that the tests run against both postgres and in-memory database.
This allows us to re-use the helpers to communicate to other endpoints served by completely different API objects.
Fixes errors that occur when alertmanager config is set but rules files are not.
Fixes 6d4f8be
9c22c68
to
4305b19
Compare
@bboreham I've addressed all your review points. I think I could sensibly split this into two PRs, where the first PR goes from master..3dc22b9 ( PTAL. |
4305b19
to
fd9fc8d
Compare
// findRulesConfigs helps GetAllRulesConfigs and GetRulesConfigs retrieve the | ||
// set of all active rules configurations across all our users. | ||
func (d DB) findRulesConfigs(filter squirrel.Sqlizer) (map[string]configs.VersionedRulesConfig, error) { | ||
rows, err := d.Select("id", "owner_id", "config ->> 'rules_files'"). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, now I don't understand why this doesn't bring back the entire history, so you will end up with some random version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, "SELECT DISTINCT ON ( expression [, ...] )
keeps only the first row of each set of rows"
and you have order by id DESC
below.
So effectively the same as LIMIT 1
elsewhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Want me to add a comment along those lines? If so, can you share the link to the doc you just quoted?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still not sure about selecting for non-blank rules, but guess that will be rare in the wild.
Thanks! My intent is that we won't be running that query for long, as we'll
be migrating the recording rules to a table entirely separate from the
alertmanager configs.
…On Thu, 11 Jan 2018 at 15:11 Bryan Boreham ***@***.***> wrote:
***@***.**** approved this pull request.
I'm still not sure about selecting for non-blank rules, but guess that
will be rare in the wild.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#620 (review)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAHq6u9YXAKGiMN9JNw8TtmAVhYCXoS6ks5tJiSJgaJpZM4Q4pBm>
.
|
Part of #619
I was going to do alertmanager in the same PR, but this got big quickly.
I'm not wedded to putting the endpoints on the ruler itself.
Pros:
Cons:
I say "ruler scheduler", because I'm intending to split the scheduler from workers as soon as #619 is done.
One thing I'll insist on is that these endpoints with new behaviours be at different URLs to the ones currently being served.