
The query result may miss some data points when ingester is restarting. #4054


Closed
wangzhao765 opened this issue Apr 7, 2021 · 3 comments

@wangzhao765

Describe the bug
The query may return fewer data points than expected while an ingester is restarting.

To Reproduce
Steps to reproduce the behavior:

  1. Start Cortex v1.8, ingester pod=3 and replica=2.
  2. Run a range query in a loop with start=now-60s and end=now-10s; the data interval and the step are both 10s (see the sketch after these steps).
  3. Restart an ingester pod.
  4. While the ingester is restarting, the query may return a result with fewer than 6 data points (the response code is still 200).
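For anyone trying to reproduce this, here is a minimal sketch of the query loop from step 2, assuming the Prometheus-compatible query_range API is exposed by the query frontend at localhost:9009 under the /prometheus prefix; the tenant ID and metric selector below are placeholders, not taken from the report.

```python
# Repro sketch (not part of the original report): run the same range query in a
# loop and flag any response that has fewer than the expected points per series.
# Endpoint, tenant ID and selector are assumptions; adjust them to your
# deployment (some setups expose the API under /api/prom instead of /prometheus).
import json
import time
import urllib.parse
import urllib.request

CORTEX_URL = "http://localhost:9009/prometheus/api/v1/query_range"  # assumed
TENANT_ID = "fake"      # assumed tenant; Cortex reads it from X-Scope-OrgID
QUERY = "up"            # placeholder selector; the report uses a custom series
STEP = 10               # seconds, matching the 10s data interval in the report
EXPECTED = 6            # (end - start) / step + 1 for a 50s window at 10s step

while True:
    now = int(time.time())
    params = urllib.parse.urlencode({
        "query": QUERY,
        "start": now - 60,
        "end": now - 10,
        "step": STEP,
    })
    req = urllib.request.Request(
        f"{CORTEX_URL}?{params}",
        headers={"X-Scope-OrgID": TENANT_ID},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)

    for series in body["data"]["result"]:
        points = len(series["values"])
        if points < EXPECTED:
            print(f"{now}: {series['metric']} returned {points}/{EXPECTED} points")

    time.sleep(1)
```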

Expected behavior
The query should return a result with 6 data points per series.

Storage Engine

  • Blocks

Additional Context
Here are some query results. (The queries are identical except for the start and end times.)

start=1617761918, end=1617761968
{"status":"success","data":{"resultType":"matrix","result":[{"metric":{"label": "series1"},"values":[[1617761910,"0.1"],[1617761920,"0"],[1617761930,"0"],[1617761940,"0.1"],[1617761950,"0"],[1617761960,"0"]]},{"metric":{"label": "series2"},"values":[[1617761910,"0.1"],[1617761920,"0"],[1617761930,"0"],[1617761940,"0.1"],[1617761950,"0"],[1617761960,"0"]]}]}}

start=1617761924, end=1617761974
{"status":"success","data":{"resultType":"matrix","result":[{"metric":{"label": "series2"},"values":[[1617761920,"0"],[1617761930,"0"],[1617761940,"0.1"],[1617761950,"0"],[1617761960,"0"],[1617761970,"0.1"]]}]}}

start=1617761931, end=1617761981
{"status":"success","data":{"resultType":"matrix","result":[{"metric":{"label": "series1"},"values":[[1617761980,"0"]]},{"metric":{"label": "series2"},"values":[[1617761930,"0"],[1617761940,"0.1"],[1617761950,"0"],[1617761960,"0"],[1617761970,"0.1"],[1617761980,"0"]]}]}}

start=1617761937, end=1617761987
{"status":"success","data":{"resultType":"matrix","result":[{"metric":{"label": "series1"},"values":[[1617761930,"0"],[1617761940,"0.1"],[1617761950,"0"],[1617761960,"0"],[1617761970,"0.1"],[1617761980,"0"]]},{"metric":{"label": "series2"},"values":[[1617761930,"0"],[1617761940,"0.1"],[1617761950,"0"],[1617761960,"0"],[1617761970,"0.1"],[1617761980,"0"]]}]}}

Some data points are missing in results 2 and 3, and the missing points are not the newest ones, so the cause should not be Prometheus remote-write delay.

@bboreham
Contributor

bboreham commented Apr 7, 2021

Sounds like #731

@pracucci
Contributor

Start Cortex v1.8, ingester pod=3 and replica=2.

If by "replica=2" you mean replication factor = 2, then I think it's a not-so-well tested configuration. I don't think we ever battle tested Cortex with RF=2. We typically run it with RF=3 and, my experience, you shouldn't have gaps in query results if RF=3 and -distributor.shard-by-all-labels enabled.

@stale

stale bot commented Jul 20, 2021

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Jul 20, 2021
stale bot closed this as completed Aug 5, 2021