Long-term index caching misses chunks on lookups #1698

Closed
gouthamve opened this issue Sep 25, 2019 · 0 comments · Fixed by #1699
In GetChunksForSeries, we do a RangeValueStart on bucket.from. I had assumed that if bucket.hashKey is the same, bucket.from is also the same. It turns out that is not true.

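bucket.from is the number of milliseconds relative to the actual bucket start time, so two lookups that land in the same bucket (same hashKey) can still carry different bucket.from values.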
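
To make the relative offset concrete, here is a minimal sketch. The `bucket` struct, the `dailyBuckets` helper, and the `user:d<day>` hash key format are simplified assumptions for illustration, not the actual schema code. It shows two queries that fall in the same daily bucket and share a hashKey but differ in `from`.

```go
package main

import "fmt"

// bucket is a simplified, hypothetical stand-in for the schema's bucket type.
type bucket struct {
	from, through uint32 // milliseconds relative to the bucket start
	hashKey       string
}

const dayMillis = int64(24 * 60 * 60 * 1000)

// dailyBuckets splits [from, through] (Unix milliseconds) into per-day buckets.
// The hash key depends only on the user and the day; from/through are offsets
// into that day.
func dailyBuckets(from, through int64, userID string) []bucket {
	var out []bucket
	for day := from / dayMillis; day <= through/dayMillis; day++ {
		start := day * dayMillis
		relFrom := from - start
		if relFrom < 0 {
			relFrom = 0
		}
		relThrough := through - start
		if relThrough > dayMillis {
			relThrough = dayMillis
		}
		out = append(out, bucket{
			from:    uint32(relFrom),
			through: uint32(relThrough),
			hashKey: fmt.Sprintf("%s:d%d", userID, day),
		})
	}
	return out
}

func main() {
	day := 18164 * dayMillis // an arbitrary day boundary
	a := dailyBuckets(day+1000, day+5000, "user")
	b := dailyBuckets(day+3000, day+5000, "user")
	// Same hash key, different relative from.
	fmt.Println(a[0].hashKey == b[0].hashKey, a[0].from, b[0].from)
}
```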

I did not realise this while building the long-term caching approach. I thought that if the hashKey and RangeKey match, everything else matches:

// When deduping, the bucket values only influence TableName and HashValue
// and just checking those is enough.

But when we split the ranges, we split the actual from and through into "cacheable" and "active" ranges here:

cFrom, cThrough, from, through := splitTimesByCacheability(from, through, model.TimeFromUnix(mtime.Now().Add(-s.cacheOlderThan).Unix()))
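
For readers without the code at hand, the split can be pictured roughly as below. This is a hypothetical simplification using plain `int64` millisecond timestamps; `splitByCacheability` and `timeRange` are names made up for this sketch, and the real function's boundary handling may differ.

```go
package main

import "fmt"

// timeRange is an interval in Unix milliseconds; empty marks "no range".
type timeRange struct {
	from, through int64
	empty         bool
}

// splitByCacheability is an assumed, simplified model of the split: everything
// before cacheBefore is "cacheable", everything from cacheBefore on is "active".
func splitByCacheability(from, through, cacheBefore int64) (cacheable, active timeRange) {
	switch {
	case through < cacheBefore:
		// The whole query is old enough to be cached long-term.
		return timeRange{from: from, through: through}, timeRange{empty: true}
	case from >= cacheBefore:
		// The whole query is recent.
		return timeRange{empty: true}, timeRange{from: from, through: through}
	default:
		// The query straddles the boundary, so it is cut in two.
		return timeRange{from: from, through: cacheBefore},
			timeRange{from: cacheBefore, through: through}
	}
}

func main() {
	c, a := splitByCacheability(1000, 9000, 6000)
	fmt.Printf("cacheable=[%d,%d] active=[%d,%d]\n", c.from, c.through, a.from, a.through)
}
```

Note that in the straddling case the active half starts at the boundary rather than at the original from, which is what changes the derived bucket.from.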

and end up picking only the "active" range query on merge. This means bucket.from is higher than it should be, and we end up filtering chunks out. This causes gaps in query results because we're dropping entire chunks on the floor.
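
The failure mode can be reproduced with a toy model. Everything here (`indexQuery`, `chunkRef`, `dedupeByHash`, `lookup`) is a hypothetical stand-in, not the real index client. It only illustrates how deduplicating on the hash value alone lets the active query's later range start win and hide an older chunk.

```go
package main

import "fmt"

// indexQuery is a simplified, hypothetical stand-in for an index lookup.
type indexQuery struct {
	hashValue string
	startRel  uint32 // range start derived from bucket.from (ms into the bucket)
}

// chunkRef is a hypothetical index entry pointing at a chunk.
type chunkRef struct {
	id         string
	throughRel uint32 // chunk end, ms into the bucket
}

// dedupeByHash keeps the first query seen for each hashValue, ignoring startRel.
func dedupeByHash(queries []indexQuery) []indexQuery {
	seen := map[string]bool{}
	var out []indexQuery
	for _, q := range queries {
		if !seen[q.hashValue] {
			seen[q.hashValue] = true
			out = append(out, q)
		}
	}
	return out
}

// lookup returns the IDs of chunks that end at or after the query's range start.
func lookup(q indexQuery, chunks []chunkRef) []string {
	var ids []string
	for _, c := range chunks {
		if c.throughRel >= q.startRel {
			ids = append(ids, c.id)
		}
	}
	return ids
}

func main() {
	chunks := []chunkRef{{"early", 2000}, {"late", 8000}}
	active := indexQuery{hashValue: "user:d18164", startRel: 6000}
	cacheable := indexQuery{hashValue: "user:d18164", startRel: 1000}

	// The active query comes first, so it wins the dedup.
	merged := dedupeByHash([]indexQuery{active, cacheable})
	fmt.Println(lookup(merged[0], chunks)) // the "early" chunk is filtered out
	fmt.Println(lookup(cacheable, chunks)) // what the full range should return
}
```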

gouthamve added a commit to gouthamve/cortex that referenced this issue Oct 16, 2019
An attempt to fix cortexproject#1698

We no longer mix cacheable and active lookups when the time range of
the query overlaps the "active" time range. In that case we consider
all index entries as active. This is because the `IndexQuery` fields
depend on the `from` value, and changing it might mess things up.

This is only really effective when paired with the query-frontend, as
most queries issued fall in the active range, but the query-frontend,
with its splitting, makes sure the queriers also see some queries that
fall entirely in the non-active range.

Signed-off-by: Goutham Veeramachaneni <[email protected]>
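
The rule described in this commit message can be sketched as follows. This is an assumed simplification, not the code from the fix: `planLookup` and `span` are hypothetical names, timestamps are plain `int64` milliseconds, and the exact boundary comparison is a guess.

```go
package main

import "fmt"

// span is an interval in Unix milliseconds.
type span struct{ from, through int64 }

// planLookup returns the range to serve through the long-term cache and the
// range to treat as active. A nil pointer means no lookup of that kind.
// If the query touches the active range at all, the whole query is treated
// as active, so `from` is never rewritten.
func planLookup(from, through, cacheBefore int64) (cacheable, active *span) {
	if through >= cacheBefore {
		return nil, &span{from, through}
	}
	return &span{from, through}, nil
}

func main() {
	// Overlaps the active range: everything is active, from stays at 1000.
	c, a := planLookup(1000, 9000, 6000)
	fmt.Println(c == nil, a.from, a.through)

	// Entirely before the boundary: everything is cacheable.
	c, a = planLookup(1000, 5000, 6000)
	fmt.Println(c.from, c.through, a == nil)
}
```

Under this rule a query is either fully cacheable or fully active, which is why splitting long queries upstream matters for getting any cacheable lookups at all.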
gouthamve added a commit to gouthamve/cortex that referenced this issue Oct 18, 2019
gouthamve added a commit that referenced this issue Oct 18, 2019
Signed-off-by: Goutham Veeramachaneni <[email protected]>
cyriltovena pushed a commit to cyriltovena/loki that referenced this issue Jun 11, 2021
An attempt to fix cortexproject/cortex#1698
