Skip to content

Do not consider PENDING state as healthy #1866

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Conversation

pracucci
Copy link
Contributor

@pracucci pracucci commented Dec 2, 2019

What this PR does:
While working on the PR #1818 I've realized that an ingester is considered healthy for read operations while PENDING but not while JOINING. From my perspective, an ingester shouldn't be considered healthy while PENDING, so I'm suggesting to:

  1. Do not consider PENDING state as healthy
  2. Explicitly enumerate healthy ingester states for the read path

In the read path, there are some API endpoints like LabelNames and LabelValuesForLabelName for which we query all healthy ingesters (see distributor.forAllIngesters()). A PENDING ingester shouldn't hold any data yet, so shouldn't be required to be hit.

A comment I've received in a previous discussion on this topic is that this change may introduce quorum issues during ingesters rollout. However, I can't see a real difference compared to when the ingester switches from PENDING to JOINING, considered that we already consider the JOINING state as unhealthy.

Which issue(s) this PR fixes:
No issue

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

… not consider PENDING state as healthy

Signed-off-by: Marco Pracucci <[email protected]>
@pstibrany
Copy link
Contributor

Personally, I would find it easier to read IsHealthy method, if "true" cases were more explicit. As it is now, each time I read it, I find it confusing. :-(

gouthamve
gouthamve previously approved these changes Dec 6, 2019
Copy link
Contributor

@gouthamve gouthamve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I I think this is better yeah.

@csmarchbanks
Copy link
Contributor

I am concerned that this could break queries after a single ingester failure. If an ingester pod vanishes (node issue or something), it will still be unhealthy in the ring, and now the new pending pod will also be considered unhealthy causing all reads to fail. Would you be able to test that?

@pstibrany pstibrany mentioned this pull request Dec 6, 2019
3 tasks
@bboreham
Copy link
Contributor

bboreham commented Dec 9, 2019

I think Chris is correct, and I think the solution is not to count pending ingesters as an error.
That implies a change to Ring.GetAll().
Maybe we can lose reallyAll at the same time.

@stale
Copy link

stale bot commented Mar 17, 2020

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Mar 17, 2020
@gouthamve gouthamve added the keepalive Skipped by stale bot label Mar 17, 2020
@stale stale bot removed the stale label Mar 17, 2020
@pracucci
Copy link
Contributor Author

pracucci commented Jul 1, 2020

Closing for now

@pracucci pracucci closed this Jul 1, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
keepalive Skipped by stale bot
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants