Update prometheus to latest #763

Merged
merged 9 commits into master from 762-update-prometheus on Apr 6, 2018

Conversation

@aaron7 (Contributor) commented Mar 21, 2018

  • Updated github.com/prometheus/prometheus to latest
  • Updated github.com/prometheus/tsdb to latest
  • make update-vendor
  • Fixed API changes
  • Created querier.Config for MaxConcurrent and Timeout

Fixes #762 via prometheus/prometheus#3995; fsnotify was moved, which broke dep here.

@bboreham (Contributor)

Needs some tweaks to compile.

@bboreham (Contributor)

OK, maybe more than tweaks:
Needs Go 1.10.
Various changes to APIs.

@bboreham (Contributor)

I made those changes. I did not look more closely at whether the underlying Prometheus changes would let us remove some shim layers.

@aaron7 (Contributor, Author) commented Mar 26, 2018

I will take a look at the test failures now.

@aaron7 force-pushed the 762-update-prometheus branch from f798758 to 0571f24 on March 26, 2018 at 13:47
@aaron7 requested a review from bboreham on March 26, 2018 at 13:55
@aaron7 force-pushed the 762-update-prometheus branch from 0571f24 to 871ab87 on March 26, 2018 at 13:59
@@ -91,13 +92,16 @@ func main() {
sampleQueryable := querier.NewQueryable(dist, chunkStore, false)
metadataQueryable := querier.NewQueryable(dist, chunkStore, true)

engine := promql.NewEngine(sampleQueryable, nil)
maxConcurrent := 20
timeout := 2 * time.Minute
bboreham (Contributor):

Both of these should be command-line parameters.

aaron7 (Contributor, Author):

Let's put these in the querier config.

api := v1.NewAPI(
engine,
metadataQueryable,
querier.DummyTargetRetriever{},
querier.DummyAlertmanagerRetriever{},
func() config.Config { return config.Config{} },
map[string]string{}, // TODO: include configuration flags
bboreham (Contributor):

?

aaron7 (Contributor, Author):

This is a recent addition that allows flags to be shown via an API endpoint: prometheus/prometheus#3864.

We will need to include querier flags here.

bboreham (Contributor):

OK; I think we can happily pass blank.
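
For reference: if that map is later populated instead of passed blank, the standard library can enumerate whichever flags were set. A minimal sketch, assuming the standard flag package is in use; the runtimeFlags helper is illustrative and not part of this PR:

import "flag"

// runtimeFlags collects the command-line flags that were explicitly set
// into the map[string]string shape the API constructor expects.
func runtimeFlags() map[string]string {
    flags := map[string]string{}
    flag.Visit(func(f *flag.Flag) {
        flags[f.Name] = f.Value.String()
    })
    return flags
}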

@aaron7 (Contributor, Author) commented Mar 26, 2018

@bboreham I think the change you made here: https://github.com/weaveworks/cortex/pull/763/files#diff-bf05432384ae7a97e67ae71efaf68ffcR201 requires slightly different args:

  • cfg.NumWorkers defaults to 1; the default promql engine uses 20
  • cfg.GroupTimeout defaults to 10s; the default promql engine uses 2 minutes

@aaron7 (Contributor, Author) commented Mar 26, 2018

Created querier.Config for the query engine configuration and used it in lite, querier, and ruler, with default values of 20 max concurrent queries and a 2-minute query timeout.
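
A minimal sketch of what that querier.Config could look like, following the usual Cortex RegisterFlags convention; the exact field and flag names are assumptions, not necessarily what this PR landed:

package querier

import (
    "flag"
    "time"
)

// Config holds the query-engine settings described above.
type Config struct {
    MaxConcurrent int
    Timeout       time.Duration
}

// RegisterFlags registers the query-engine flags with the given FlagSet,
// using the defaults mentioned above (20 concurrent queries, 2m timeout).
func (cfg *Config) RegisterFlags(f *flag.FlagSet) {
    f.IntVar(&cfg.MaxConcurrent, "querier.max-concurrent", 20, "The maximum number of concurrent queries.")
    f.DurationVar(&cfg.Timeout, "querier.timeout", 2*time.Minute, "The timeout for a query.")
}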

@aaron7 requested a review from tomwilkie on March 26, 2018 at 14:29
@aaron7 force-pushed the 762-update-prometheus branch 6 times, most recently from aaea966 to 883c457, on March 26, 2018 at 15:57
@bboreham (Contributor)

Why would we particularly want to follow the promql defaults?
The ruler already has its own configuration; we should use it.

@aaron7 (Contributor, Author) commented Mar 26, 2018

> Why would we particularly want to follow the promql defaults?
> The ruler already has its own configuration; we should use it.

The ruler config has:

  • ruler.num-workers, default 1 (we override to 10): number of rule evaluator worker routines in this process
  • ruler.group-timeout, default 10s (we override to 30s): timeout for rule group evaluation, including sending the result to the ingester

The new parameters we need to pass:

  • the number of concurrent promql.Engine queries (the Prometheus default is 20)
  • the timeout for individual queries (the Prometheus default is 2 minutes)

ruler.num-workers seems different from the number of concurrent engine queries. Should we share this?

Is the ruler.group-timeout default of 10s suitable for query timeouts? We override it to 30s; is that suitable? It seems like the answer is yes (there are some open GitHub issues about wanting to reduce the query timeout).

If we do share these values, should the config names be more generic and not ruler specific?

@bboreham (Contributor)

The number of workers in the ruler is also the maximum number of queries the ruler will issue at the same time. We don’t need another bound.

The group timeout is an upper bound on the time allowed for a single query; this seems close enough to just use it in the ruler.

The querier is different. I can’t see an alternative to adding parameters for it. Maybe unify with the ruler, but removing the ruler-specific ones has to be done carefully.
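
Concretely, the reuse described above could look like the sketch below. It assumes the four-argument promql.NewEngine(logger, registerer, maxConcurrent, timeout) signature this update moves to; newRulerEngine is a hypothetical helper, and the ruler.Config field names are assumed to mirror the flags discussed above:

import (
    "github.com/go-kit/kit/log"
    "github.com/prometheus/prometheus/promql"

    "github.com/weaveworks/cortex/pkg/ruler"
)

// newRulerEngine reuses the ruler's existing bounds for the engine:
// NumWorkers already caps how many queries the ruler issues at once, and
// GroupTimeout already bounds a single evaluation, so they double as the
// engine's concurrency limit and per-query timeout.
func newRulerEngine(logger log.Logger, cfg ruler.Config) *promql.Engine {
    return promql.NewEngine(logger, nil, cfg.NumWorkers, cfg.GroupTimeout)
}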

@bboreham (Contributor) commented Apr 5, 2018

Is it OK if I pick this up and fix it?

@aaron7 force-pushed the 762-update-prometheus branch from 81d1588 to 1319a27 on April 6, 2018 at 13:24
@aaron7 (Contributor, Author) commented Apr 6, 2018

Sorry, other stuff took priority after on-call.

I pushed the change to reuse the ruler config. After looking at how the ruler evaluates groups (sequentially), I agree we should reuse the ruler params.

> Is it OK if I pick this up and fix it?

If there are any other changes to make, sure, please feel free to make them and I can then review.

@bboreham (Contributor) commented Apr 6, 2018

Thanks! I added one commit, taking out a couple of TODOs.

@bboreham (Contributor) commented Apr 6, 2018

Couple of odd-looking files under vendor/github.com/json-iterator/go/ in that last commit.

@aaron7 (Contributor, Author) commented Apr 6, 2018

> Couple of odd-looking files under vendor/github.com/json-iterator/go/ in that last commit.

Yep, these were removed in the last commit after dep ensure; they shouldn't have been added during the rebase.

@aaron7 merged commit 6252335 into master on Apr 6, 2018
@aaron7 deleted the 762-update-prometheus branch on April 6, 2018 at 15:11
@bboreham added a commit that referenced this pull request on Apr 6, 2018
This reverts commit 6252335.

The new Prometheus web API is incompatible with the scheme Cortex uses to differentiate sample from metadata queries.
@bboreham (Contributor) commented Apr 6, 2018

Reverted in f2a3355; see #787.

@bboreham mentioned this pull request on Apr 18, 2018