Skip to content

Add start/end time range support to label names and values APIs #2848

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
pracucci opened this issue Jul 7, 2020 · 13 comments
Closed

Add start/end time range support to label names and values APIs #2848

pracucci opened this issue Jul 7, 2020 · 13 comments
Labels

Comments

@pracucci
Copy link
Contributor

pracucci commented Jul 7, 2020

Prometheus has introduced (prometheus/prometheus#7288) the optional support to start and end parameters for the following API endpoints:

  • /api/v1/labels
  • /api/v1/label/<label_name>/values

We should do the same in Cortex.

@stale
Copy link

stale bot commented Sep 5, 2020

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Sep 5, 2020
@pracucci pracucci added the storage/blocks Blocks storage engine label Sep 14, 2020
@stale stale bot removed the stale label Sep 14, 2020
@pracucci
Copy link
Contributor Author

Still valid

@yeya24
Copy link
Contributor

yeya24 commented Sep 17, 2020

Hi, I am looking at this issue.

For block storage, if we want to support time ranges, do you think it still makes sense to keep the behavior in https://github.com/cortexproject/cortex/blob/master/pkg/ingester/ingester_v2.go#L637? Only query the head range.

@pracucci
Copy link
Contributor Author

Thanks @yeya24 for your interested! We definitely want some help here.

Some preliminary thoughts:

  1. This feature is tricky because large time ranges could OOM ingesters (and we don't want it). For this reason, we may want to introduce a max time range limit on the "end - start" period, which - as all other limits in Cortex - should be overridable by user (easy to build in Cortex).
  2. We need to query both ingesters (as we do it right now) and blocks in the long-term storage (via store-gateway).

To keep the work easier, we may address (1) and (2) separately. Doing (1) first, and then (2). When the time range is not specified, I believe it's better to keep the current approach of querying the TSDB head only.

What do you think? /cc @pstibrany

@pstibrany
Copy link
Contributor

pstibrany commented Sep 17, 2020

  1. This feature is tricky because large time ranges could OOM ingesters (and we don't want it). For this reason, we may want to introduce a max time range limit on the "end - start" period, which - as all other limits in Cortex - should be overridable by user (easy to build in Cortex).

With extra effort we could modify LabelValues operation on ingester to use streaming. This would reduce memory pressure on ingesters, and put it onto queriers instead. But they already must merge and deduplicate different incoming results together. This could be combined with splitting big range into multiple smaller ranges – which can generate duplicate labels, but we can leave that for querier to handle. [I don't think we would need range splitting when doing streaming.]

  1. We need to query both ingesters (as we do it right now) and blocks in the long-term storage (via store-gateway).

👍

When the time range is not specified, I believe it's better to keep the current approach of querying the TSDB head only.

👍

@pstibrany
Copy link
Contributor

One more comment... limiting user's time range may not be very useful. Even small ranges can contain large number of labels, and on the other hand extending the range may not increase returned labels too much.

We could instead have a limit for returned number of labels.

@pstibrany
Copy link
Contributor

We could instead have a limit for returned number of labels.

Or two limits (but that's perhaps too low-level): one for ingesters, one for entire query.

@pracucci
Copy link
Contributor Author

With extra effort we could modify LabelValues operation on ingester to use streaming.

But in TSDB you can't query label names/values in a streaming fashion. Am I wrong?

We could instead have a limit for returned number of labels.

But you would have probably already paid most of the cost of fetching label names/values in a distributed system until you realise the limit has been reached.

@pstibrany
Copy link
Contributor

With extra effort we could modify LabelValues operation on ingester to use streaming.

But in TSDB you can't query label names/values in a streaming fashion. Am I wrong?

You're right. Somehow I was under impression that we could do that, but Querier API doesn't support that.

@gouthamve
Copy link
Contributor

When you say Querier API, which API are you referring to? Because we should change it :)

@pstibrany
Copy link
Contributor

When you say Querier API, which API are you referring to? Because we should change it :)

storage.Querier (which extends storage.LabelQuerier)

@pracucci
Copy link
Contributor Author

This issue is also related to #2681.

@gouthamve
Copy link
Contributor

I would close this based on #3347

I think the particular feature request here is addressed in #3347 and everything else is covered in #2681

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants