Filing this as a stand-alone issue to back up our GSoC submission: https://github.com/cncf/soc#improve-ingester-handover
Description: The ingester is a stateful component in the Cortex ecosystem that builds Prometheus chunks from incoming samples. To distribute load, a distributed hash table routes requests to different ingesters. The current implementation only allows users to scale up their ingester pool by one ingester per 12-hour period, which is too slow when load changes dramatically. This project will improve how ingesters hand over their data when they are created or deleted, so that the pool can be scaled easily.
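To make the routing idea concrete, here is a minimal sketch of a token-based hash ring in Python. It is not the Cortex implementation: the `Ring` class, the `token_for` helper, the SHA-256-based 32-bit tokens, and the default of 64 tokens per ingester are all assumptions chosen for illustration.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Map a string to a 32-bit token (hypothetical hashing scheme)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")


class Ring:
    """Toy hash ring: each ingester owns several tokens on a 32-bit circle."""

    def __init__(self, tokens_per_ingester: int = 64):
        self.tokens_per_ingester = tokens_per_ingester
        self._tokens = []   # sorted list of all tokens on the ring
        self._owners = {}   # token -> ingester name

    def add(self, ingester: str) -> None:
        """Register an ingester by placing its tokens on the ring."""
        for i in range(self.tokens_per_ingester):
            t = token_for(f"{ingester}-{i}")
            if t not in self._owners:  # on a collision, the first owner keeps the token
                bisect.insort(self._tokens, t)
                self._owners[t] = ingester

    def lookup(self, series: str) -> str:
        """Return the ingester owning the first token after the series' token."""
        if not self._tokens:
            raise LookupError("ring is empty")
        idx = bisect.bisect_right(self._tokens, token_for(series)) % len(self._tokens)
        return self._owners[self._tokens[idx]]
```

With this sketch, `ring.lookup('http_requests_total{job="api"}')` always returns the same ingester until the set of tokens changes, which is why adding or removing an ingester raises the question of what happens to the series it gains or loses.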
The work should include extensive testing, as this is a critical piece of code; ideally repeatable, scripted integration tests (related: #1271).
We can break the subject down into sub-goals, so that the scope can be adjusted if the task takes more or less time than expected:
- during a rolling update, hand over from one ingester to another (currently data is "spilled" to other ingesters, which is inefficient; see #467, "Mysterious flush of underutilised chunks 1hr after ingester rollout")
- adding an ingester (currently series simply end in some ingesters and start from blank in the new ingester; it would be better to hand them over)
- removing an ingester (currently we stop accepting data and flush all partial chunks to disk, which can take an hour; it would be better to redistribute them to the remaining ingesters)
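The scale-up and scale-down cases above can be sketched as a question of which series change owner when the ring membership changes. The Python below is an illustrative model only, not Cortex code: `token_for`, `build_ring`, `owner`, and `handover_plan` are hypothetical helpers, and the hashing scheme and token count are assumptions.

```python
import bisect
import hashlib


def token_for(key: str) -> int:
    """Map a string to a 32-bit token (hypothetical hashing scheme)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")


def build_ring(ingesters, tokens_per_ingester=64):
    """Return a sorted list of (token, ingester) pairs."""
    ring = {}
    for name in ingesters:
        for i in range(tokens_per_ingester):
            # On a token collision, the first ingester keeps the token.
            ring.setdefault(token_for(f"{name}-{i}"), name)
    return sorted(ring.items())


def owner(ring, series: str) -> str:
    """Return the ingester owning the first token after the series' token."""
    tokens = [t for t, _ in ring]
    idx = bisect.bisect_right(tokens, token_for(series)) % len(ring)
    return ring[idx][1]


def handover_plan(old_ingesters, new_ingesters, series):
    """Group the series that change owner by (source, destination) ingester."""
    old_ring = build_ring(old_ingesters)
    new_ring = build_ring(new_ingesters)
    plan = {}
    for s in series:
        src, dst = owner(old_ring, s), owner(new_ring, s)
        if src != dst:
            plan.setdefault((src, dst), []).append(s)
    return plan


series = [f"series-{i}" for i in range(1000)]
scale_up = handover_plan(["a", "b", "c"], ["a", "b", "c", "d"], series)
scale_down = handover_plan(["a", "b", "c", "d"], ["a", "b", "c"], series)
```

In this model, every entry in `scale_up` has the new ingester `d` as its destination, and every entry in `scale_down` has `d` as its source. Those are the series whose partial chunks would be handed over directly rather than ended early or flushed.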