Improve Ingester Handover #1277

bboreham · 2019-03-13T11:20:34Z

Filing this as a stand-alone issue to back up our GSOC submission https://github.com/cncf/soc#improve-ingester-handover

Description: The ingester is a stateful component in the Cortex ecosystem that builds Prometheus chunks from incoming samples. To distribute load, a distributed hash table is used to route requests to different ingesters. The current implementation only allows users to scale up their ingester pools by one ingester per 12-hour period, which is too slow when load changes dramatically. This project will improve how ingesters hand over their data when they are created or deleted, so that ingester pools can be scaled easily.
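To make the routing concrete, here is a minimal Python sketch of a token-based hash ring. It is an illustration only, not Cortex's implementation: the `Ring` class, the `token_for` helper, the SHA-256 hash and the 64 tokens per ingester are all assumptions chosen for the example.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Map a string to a 32-bit token (the hash choice is illustrative)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")


class Ring:
    """Hypothetical hash ring: each ingester owns several tokens."""

    def __init__(self, tokens_per_ingester: int = 64):
        self.tokens_per_ingester = tokens_per_ingester
        self._tokens = []  # sorted list of all tokens on the ring
        self._owner = {}   # token -> ingester name

    def add(self, ingester: str) -> None:
        """Register an ingester by placing its tokens on the ring."""
        for i in range(self.tokens_per_ingester):
            token = token_for(f"{ingester}/{i}")
            if token in self._owner:
                continue  # skip the rare collision with an existing token
            self._owner[token] = ingester
            bisect.insort(self._tokens, token)

    def lookup(self, series: str) -> str:
        """Return the ingester owning the first token at or after the series' token."""
        if not self._tokens:
            raise LookupError("ring is empty")
        i = bisect.bisect_left(self._tokens, token_for(series))
        # Wrap around to the first token when we run off the end of the ring.
        return self._owner[self._tokens[i % len(self._tokens)]]
```

In this sketch, adding an ingester only takes over slices of the token space, so any series whose owner changes ends up on the new ingester. Those series are the data a hand-over would need to move.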

The work should include extensive testing, as this is a critical piece of code: ideally repeatable, scripted integration tests (related: #1271).

We can break the subject down into sub-goals, to allow for the task taking more or less time than expected:

During a rolling update: hand over from one ingester to another. Currently data is "spilled" to other ingesters, which is inefficient (see #467, "Mysterious flush of underutilised chunks 1hr after ingester rollout").
Adding an ingester: currently series simply end in some ingesters and start from blank in the new ingester. It would be better to hand them over.
Removing an ingester: currently we stop accepting data and flush all partial chunks to disk, which can take an hour. It would be better to redistribute them to the remaining ingesters.
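As a rough illustration of the adding and removing cases, the Python sketch below compares series ownership before and after a ring change and groups the series that would need to be handed over. It assumes a simple token ring where a series belongs to the first token at or after its own token; `owner`, `plan_handover` and the example tokens are hypothetical and not taken from Cortex.

```python
import bisect
from collections import defaultdict


def owner(ring: dict, token: int) -> str:
    """Return the ingester owning `token`: first ring token >= token, wrapping around."""
    tokens = sorted(ring)
    i = bisect.bisect_left(tokens, token)
    return ring[tokens[i % len(tokens)]]


def plan_handover(before: dict, after: dict, series_tokens: dict) -> dict:
    """Group series by (source, destination) for every series whose owner changes."""
    plan = defaultdict(list)
    for name, token in series_tokens.items():
        src, dst = owner(before, token), owner(after, token)
        if src != dst:
            plan[(src, dst)].append(name)
    return dict(plan)


# Example ring (token -> ingester) and series (name -> token); values are made up.
before = {100: "a", 200: "b", 300: "c"}
series = {"s1": 50, "s2": 150, "s3": 250, "s4": 350}

# Adding ingester "d" with token 175 takes over part of b's range.
joined = {**before, 175: "d"}
print(plan_handover(before, joined, series))  # {('b', 'd'): ['s2']}

# Removing ingester "c" moves its series to the next token on the ring.
left = {t: name for t, name in before.items() if name != "c"}
print(plan_handover(before, left, series))    # {('c', 'a'): ['s3']}
```

The rolling-update case is not modelled here. If the replacement ingester takes over the same tokens as the one it replaces, the plan reduces to a one-to-one transfer between the two.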

Other related issues: #775, #1220

rfratto · 2019-09-25T17:14:50Z

I've written up a design document that should cover the three sub-goals (preventing spillover, joining ingesters, and leaving ingesters). PTAL, I'm hoping to implement this so Loki can utilize the new handover as well: https://docs.google.com/document/d/1y2TdfEQ9ZKh6CpBVB4o6BYjCr-plNRL9jGD6fJ9bMW0/edit#

bboreham · 2020-07-30T14:54:35Z

When using WAL we don't do hand-overs, so this has not received any attention.
Also, partial hand-over is very difficult to do correctly when using blocks storage.

bboreham added component/ingester help wanted labels Mar 17, 2019

rfratto mentioned this issue Oct 28, 2019

Incrementally transfer chunks per token to improve handover #1764

Closed

bboreham closed this as completed Jul 30, 2020
