
Improve Ingester Handover #1277



Closed
bboreham opened this issue Mar 13, 2019 · 2 comments

Comments

@bboreham
Contributor

bboreham commented Mar 13, 2019

Filing this as a stand-alone issue to back up our GSOC submission https://github.com/cncf/soc#improve-ingester-handover

Description: The ingester is a stateful component in the Cortex ecosystem that builds Prometheus chunks from incoming samples. To distribute load, a Distributed Hash Table is used to route requests to different ingesters. The current implementation only allows users to scale up their ingester pools by one ingester per 12-hour period, which is not great when load changes dramatically. This project aims to improve how ingesters hand over their data when they are created or deleted, so that the pool can be scaled easily.
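
To make the routing concrete, here is a minimal Python sketch of the hash-table routing described above, modelled as a consistent-hash ring. It assumes that each ingester owns a set of 32-bit tokens and that a series is routed to the owner of the first token at or after the series' hash, wrapping around at the end of the ring. The names `Ring` and `hash_key` and the truncated SHA-256 hash are illustrative choices only. This is not Cortex's actual ring package, which is written in Go.

```python
import bisect
import hashlib


def hash_key(key: str) -> int:
    """Map a series key (e.g. tenant + label set) to a 32-bit ring position.

    Hypothetical choice of hash: the first 4 bytes of SHA-256, big-endian.
    """
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:4], "big")


class Ring:
    """A hypothetical, simplified consistent-hash ring (not Cortex's ring package)."""

    def __init__(self, owners: dict[str, list[int]]):
        # Flatten {ingester: [tokens]} into (token, ingester) pairs sorted by token.
        self.entries = sorted((t, ing) for ing, toks in owners.items() for t in toks)
        self.tokens = [t for t, _ in self.entries]

    def owner(self, key: str) -> str:
        """Return the ingester owning the first token >= hash(key), wrapping around."""
        i = bisect.bisect_left(self.tokens, hash_key(key))
        if i == len(self.tokens):
            i = 0  # past the highest token: wrap to the lowest one
        return self.entries[i][1]


ring = Ring({"ingester-1": [1_000_000_000, 3_000_000_000], "ingester-2": [2_000_000_000]})
print(ring.owner('tenant-a{__name__="up",job="node"}'))
```

In this model, adding an ingester inserts new tokens. That changes the owner of only those keys whose hashes fall into the newly claimed ranges, and those are exactly the series a handover would need to move.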

The work should include extensive testing, as this is a critical piece of code: ideally repeatable, scripted integration tests (related: #1271).

We can break the subject down into sub-goals, allowing for the task to take more or less time than expected:

  • during a rolling update, hand over data from one ingester to another (currently data is "spilled" to other ingesters, which is inefficient; see Mysterious flush of underutilised chunks 1hr after ingester rollout #467)
  • adding an ingester (currently series simply end in some ingesters and start from blank in the new ingester; it would be better to hand them over)
  • removing an ingester (currently we stop accepting data and flush all partial chunks to disk, which can take an hour; it would be better to redistribute them to the remaining ingesters)
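
The "adding" and "removing" sub-goals both come down to the same question: which series change owner when the ring changes? Below is a minimal sketch of that calculation. It assumes series are identified by their ring hash and that ownership is a token lookup. `ring_owner` and `plan_handover` are hypothetical helpers for illustration, not part of Cortex.

```python
import bisect
from typing import Callable, Dict, Iterable, List, Tuple

# An ownership function maps a series' ring hash to the ingester that owns it.
Owner = Callable[[int], str]


def ring_owner(tokens: Dict[int, str]) -> Owner:
    """Hypothetical helper: the owner of a hash is the ingester holding the
    first token >= that hash, wrapping around to the lowest token."""
    ordered = sorted(tokens)

    def owner(h: int) -> str:
        i = bisect.bisect_left(ordered, h)
        return tokens[ordered[i % len(ordered)]]

    return owner


def plan_handover(old: Owner, new: Owner,
                  series_hashes: Iterable[int]) -> Dict[Tuple[str, str], List[int]]:
    """Group series whose owner differs between the old and new ring
    by (from_ingester, to_ingester); unchanged series are skipped."""
    moves: Dict[Tuple[str, str], List[int]] = {}
    for h in series_hashes:
        src, dst = old(h), new(h)
        if src != dst:
            moves.setdefault((src, dst), []).append(h)
    return moves


old = ring_owner({100: "ing-1", 200: "ing-2"})
joined = ring_owner({100: "ing-1", 150: "ing-3", 200: "ing-2"})  # ing-3 joins
print(plan_handover(old, joined, [50, 120, 170, 250]))
# {('ing-2', 'ing-3'): [120]}
```

In this model, when `ing-3` joins, only the series in its newly claimed range (100, 150] move, and they all come from the single previous owner. When an ingester leaves, the same function shows where each of its series should go instead of being flushed. A rolling update with unchanged tokens is the degenerate case in which every series of the old pod moves to its replacement.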

Other related issues: #775, #1220

@rfratto
Contributor

rfratto commented Sep 25, 2019

I've written up a design document that should cover the three sub-goals (preventing spillover, joining ingesters, and leaving ingesters). PTAL, I'm hoping to implement this so Loki can utilize the new handover as well: https://docs.google.com/document/d/1y2TdfEQ9ZKh6CpBVB4o6BYjCr-plNRL9jGD6fJ9bMW0/edit#

@bboreham
Contributor Author

When using the WAL we don't do hand-overs, so this has not received any attention.
Also, partial hand-over is very difficult to do correctly when using blocks storage.
