Improve Ingester Handover #1277

@bboreham

Filing this as a stand-alone issue to back up our GSoC submission: https://github.com/cncf/soc#improve-ingester-handover

Description: The ingester is a stateful component in the Cortex ecosystem that builds Prometheus chunks from incoming samples. To distribute load, a distributed hash table routes requests to different ingesters. The current implementation only allows users to scale up their ingester pool by one ingester per 12-hour period, which is too slow when load changes dramatically. This project is to improve how ingesters hand over their data when they are created or deleted, so that the pool can be scaled easily.
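To make the routing idea concrete, here is a minimal sketch of a token-based hash ring in Go. It is not Cortex's ring code: the `Ring` type, the FNV hash, the token derivation from the ingester name, and the 64 tokens per ingester are all assumptions chosen for illustration, and replication is ignored.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// tokenOwner pairs a position on the ring with the ingester that owns it.
type tokenOwner struct {
	token    uint32
	ingester string
}

// Ring is a toy hash ring: a sorted list of tokens, each owned by one ingester.
// (Hypothetical type for illustration, not the Cortex implementation.)
type Ring struct {
	tokens []tokenOwner
}

// hashKey maps a string to a position on the 32-bit ring.
func hashKey(s string) uint32 {
	h := fnv.New32a()
	h.Write([]byte(s))
	return h.Sum32()
}

// NewRing gives every ingester tokensPer tokens derived from its name.
func NewRing(ingesters []string, tokensPer int) *Ring {
	r := &Ring{}
	for _, name := range ingesters {
		for i := 0; i < tokensPer; i++ {
			tok := hashKey(fmt.Sprintf("%s-%d", name, i))
			r.tokens = append(r.tokens, tokenOwner{tok, name})
		}
	}
	// Sort by token; break ties by name so lookups are deterministic.
	sort.Slice(r.tokens, func(i, j int) bool {
		if r.tokens[i].token != r.tokens[j].token {
			return r.tokens[i].token < r.tokens[j].token
		}
		return r.tokens[i].ingester < r.tokens[j].ingester
	})
	return r
}

// Owner returns the ingester holding the first token at or after the
// key's hash, wrapping around to the start of the ring if needed.
// It returns "" for an empty ring.
func (r *Ring) Owner(seriesKey string) string {
	if len(r.tokens) == 0 {
		return ""
	}
	h := hashKey(seriesKey)
	i := sort.Search(len(r.tokens), func(i int) bool { return r.tokens[i].token >= h })
	if i == len(r.tokens) {
		i = 0
	}
	return r.tokens[i].ingester
}

func main() {
	ring := NewRing([]string{"ingester-0", "ingester-1", "ingester-2"}, 64)
	fmt.Println(ring.Owner(`{__name__="up",job="api"}`))
}
```

Because each series always hashes to the same position, all of its samples reach the same ingester, which is what makes the ingester stateful and hand-over necessary when the set of ingesters changes.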

The work should include extensive testing, as this is a critical piece of code: ideally repeatable, scripted integration tests (related: #1271).

We can break the subject down into sub-goals, so the scope can grow or shrink depending on how long the work takes:

  • during a rolling update, hand over from one ingester to another (currently data is "spilled" to other ingesters, which is inefficient; see #467, "Mysterious flush of underutilised chunks 1hr after ingester rollout")
  • adding an ingester (currently series simply end in some ingesters and start from blank in the new ingester; it would be better to hand them over)
  • removing an ingester (currently we stop accepting data and flush all partial chunks to disk, which can take an hour; it would be better to redistribute them to the remaining ingesters)

Other related issues: #775, #1220
