
The query result may miss some data points when ingester is restarting. #4054


Closed
wangzhao765 opened this issue Apr 7, 2021 · 3 comments

@wangzhao765

Describe the bug
The query may return fewer data points than expected while an ingester is restarting.

To Reproduce
Steps to reproduce the behavior:

  1. Start Cortex v1.8, ingester pod=3 and replica=2.
  2. Run a range query in a loop with start=now-60s and end=now-10s; the data interval and the step are both 10s (see the sketch after these steps).
  3. Restart an ingester pod.
  4. While the ingester is restarting, the query may return a result with fewer than 6 data points (the response code is still 200).
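For anyone trying to reproduce this, here is a minimal sketch of the query loop from step 2, assuming the Prometheus-compatible query_range API is exposed by the query frontend at localhost:9009 under the /prometheus prefix; the tenant ID and metric selector below are placeholders, not taken from the report.

```python
# Repro sketch (not part of the original report): run the same range query in a
# loop and flag any response that has fewer than the expected points per series.
# Endpoint, tenant ID and selector are assumptions; adjust them to your
# deployment (some setups expose the API under /api/prom instead of /prometheus).
import json
import time
import urllib.parse
import urllib.request

CORTEX_URL = "http://localhost:9009/prometheus/api/v1/query_range"  # assumed
TENANT_ID = "fake"      # assumed tenant; Cortex reads it from X-Scope-OrgID
QUERY = "up"            # placeholder selector; the report uses a custom series
STEP = 10               # seconds, matching the 10s data interval in the report
EXPECTED = 6            # (end - start) / step + 1 for a 50s window at 10s step

while True:
    now = int(time.time())
    params = urllib.parse.urlencode({
        "query": QUERY,
        "start": now - 60,
        "end": now - 10,
        "step": STEP,
    })
    req = urllib.request.Request(
        f"{CORTEX_URL}?{params}",
        headers={"X-Scope-OrgID": TENANT_ID},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)

    for series in body["data"]["result"]:
        points = len(series["values"])
        if points < EXPECTED:
            print(f"{now}: {series['metric']} returned {points}/{EXPECTED} points")

    time.sleep(1)
```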

Expected behavior
The query should return a result with 6 data points per series.

Storage Engine

  • Blocks

Additional Context
Here are some query results. (The queries are identical except for the start and end times.)

start=1617761918, end=1617761968
{"status":"success","data":{"resultType":"matrix","result":[{"metric":{"label": "series1"},"values":[[1617761910,"0.1"],[1617761920,"0"],[1617761930,"0"],[1617761940,"0.1"],[1617761950,"0"],[1617761960,"0"]]},{"metric":{"label": "series2"},"values":[[1617761910,"0.1"],[1617761920,"0"],[1617761930,"0"],[1617761940,"0.1"],[1617761950,"0"],[1617761960,"0"]]}]}}

start=1617761924, end=1617761974
{"status":"success","data":{"resultType":"matrix","result":[{"metric":{"label": "series2"},"values":[[1617761920,"0"],[1617761930,"0"],[1617761940,"0.1"],[1617761950,"0"],[1617761960,"0"],[1617761970,"0.1"]]}]}}

start=1617761931, end=1617761981
{"status":"success","data":{"resultType":"matrix","result":[{"metric":{"label": "series1"},"values":[[1617761980,"0"]]},{"metric":{"label": "series2"},"values":[[1617761930,"0"],[1617761940,"0.1"],[1617761950,"0"],[1617761960,"0"],[1617761970,"0.1"],[1617761980,"0"]]}]}}

start=1617761937, end=1617761987
{"status":"success","data":{"resultType":"matrix","result":[{"metric":{"label": "series1"},"values":[[1617761930,"0"],[1617761940,"0.1"],[1617761950,"0"],[1617761960,"0"],[1617761970,"0.1"],[1617761980,"0"]]},{"metric":{"label": "series2"},"values":[[1617761930,"0"],[1617761940,"0.1"],[1617761950,"0"],[1617761960,"0"],[1617761970,"0.1"],[1617761980,"0"]]}]}}

Some data points are missing in results 2 and 3, and the missing points are not the newest ones, so the cause should not be Prometheus remote-write delay.

@bboreham
Contributor

bboreham commented Apr 7, 2021

Sounds like #731

@pracucci
Contributor

Start Cortex v1.8, ingester pod=3 and replica=2.

If by "replica=2" you mean replication factor = 2, then I think it's a not-so-well tested configuration. I don't think we ever battle tested Cortex with RF=2. We typically run it with RF=3 and, my experience, you shouldn't have gaps in query results if RF=3 and -distributor.shard-by-all-labels enabled.

@stale

stale bot commented Jul 20, 2021

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.

stale bot added the stale label Jul 20, 2021
stale bot closed this as completed Aug 5, 2021