Frequently getting errors when querying cortex #61

Closed
jml opened this issue Oct 20, 2016 · 14 comments

@jml
Contributor

jml commented Oct 20, 2016

On Weave Cloud dev & prod, getting:

Error executing query: too few successful reads, last error was: <nil>

when executing queries.

I haven't had a chance to investigate, but my guess is that deployments no longer visibly break the cluster (per #19) but still break it silently, partly because we're not yet alerting on 500s for cortex at weaveworks (cf. weaveworks/service#904).

This is probably something that will be fixed by clean shut-downs (which, at weaveworks, are blocked on weaveworks/service-conf#111), but I wanted to track something symptom-oriented so I could have something to search for and also make notes on other things I found along the way.

Debugging this would be made easier by addressing #57, #58, #59, and #60.

@jml jml changed the title Frequently getting errors when querying prism Frequently getting errors when querying cortex Oct 25, 2016
@jml
Contributor Author

jml commented Oct 25, 2016

c891ab5 might help a bit too.

@jml
Contributor Author

jml commented Nov 2, 2016

weaveworks/service-conf#111 is fixed, but we're still not getting clean shutdowns. We're instrumenting more & getting better logging, and experimenting with extending the grace period.

@jml
Contributor Author

jml commented Nov 2, 2016

Error is now:

Error executing query: could only find 1 ingesters for query. Need at least 2
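For readers landing here from a search: the message above is what a read-quorum check would report when fewer ingesters are available than the query needs. The following is a minimal sketch of such a check, not Cortex's actual code. The function name `checkQuorum`, the majority rule, and the replication factor of 3 are all assumptions made for illustration.

```go
package main

import "fmt"

// checkQuorum is a hypothetical helper (not taken from Cortex).
// It assumes a query needs a simple majority of the replicas:
// with replicationFactor = 3, at least 2 ingesters must be found.
func checkQuorum(found, replicationFactor int) error {
	minSuccess := replicationFactor/2 + 1
	if found < minSuccess {
		return fmt.Errorf("could only find %d ingesters for query. Need at least %d",
			found, minSuccess)
	}
	return nil
}

func main() {
	// During a rollout that restarts several ingesters at once,
	// only one replica may be reachable, so the check fails.
	fmt.Println(checkQuorum(1, 3))

	// With two of three replicas reachable, the query can proceed.
	fmt.Println(checkQuorum(2, 3))
}
```

Under these assumptions, any deployment that takes down more than one replica of a series at a time would make queries fail until the ingesters come back.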

@tomwilkie
Contributor

Monitoring in place to catch this in the future.

Work to fix it is being tracked in #87

@jml
Contributor Author

jml commented Nov 4, 2016

A whole bunch of people had pinged me about this problem & I've been directing them here to fix it. Think we should have kept this open until the problem it describes is actually fixed.

@tomwilkie
Contributor

Shall we direct them to #87? Or close #87? No point in having two issues for the same thing.

@jml
Contributor Author

jml commented Nov 4, 2016

Well, I think it is worth having separate things for symptom & cause, but I don't care super deeply about it. Let's direct to #87 (which I've added to the milestone). I'll paste the error message there too.

@tomwilkie tomwilkie reopened this Nov 4, 2016
@tomwilkie
Contributor

Reopened as fixing #87 hasn't fixed the whole thing.

We're looking into ways of forcing k8s to do a one-at-a-time upgrade of the ingesters via readiness hooks.
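As a rough illustration of the readiness-hook idea: a Kubernetes `readinessProbe` can poll an HTTP endpoint, and a rolling update with `maxUnavailable` set low will not move on to the next pod until the new one reports ready. The sketch below shows one way an ingester-like process might expose such an endpoint. The `/ready` path, the `ready` flag, and the helper names are hypothetical and are not taken from the Cortex codebase.

```go
package main

import (
	"fmt"
	"net/http"
	"sync/atomic"
)

// ready is a hypothetical flag: false while the process is starting up
// or draining, true once it can safely serve reads and writes.
var ready atomic.Bool

// readinessStatus maps the flag to the HTTP status a probe would see.
func readinessStatus(isReady bool) int {
	if isReady {
		return http.StatusOK
	}
	return http.StatusServiceUnavailable
}

// readyHandler is what a Kubernetes readinessProbe would poll.
func readyHandler(w http.ResponseWriter, _ *http.Request) {
	w.WriteHeader(readinessStatus(ready.Load()))
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/ready", readyHandler) // path is an assumption

	// Before startup completes, the probe fails and the rollout waits.
	fmt.Println("starting:", readinessStatus(ready.Load()))

	// Once the process has joined the ring (assumed step), flip the flag.
	ready.Store(true)
	fmt.Println("running:", readinessStatus(ready.Load()))

	// A real service would now call http.ListenAndServe with mux.
}
```

The design choice here is to keep the status decision in a plain function so it can be checked without starting a server; the handler only writes whatever that function returns.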

@tomwilkie
Contributor

With bug fixes yesterday, we no longer get 500s during the rollout.

tomwilkie added a commit that referenced this issue Jan 12, 2017
ed8e757 Merge pull request #61 from weaveworks/test-defaults
57856e6 Merge pull request #56 from weaveworks/remove-wcloud
dd5f3e6 Add -p flag to test, run test in parallel
62f6f94 Make no-go-get the default, and don't assume -tags netgo
8946588 Merge pull request #60 from weaveworks/2647-gc-weave-net-tests
4085df9 Scheduler now also garbage-collects VMs from weave-net-tests.
4b7d5c6 Merge pull request #59 from weaveworks/57-fix-lint-properly
b7f0e69 Merge pull request #58 from weaveworks/fix-lint
794702c Pin version of shfmt
ab1b11d Fix lint
d1a5e46 Remove wcloud cli tool
81d80f3 Merge pull request #55 from weaveworks/lint-tf
05ad5f2 Review feedback
4c0d046 Use hclfmt to lint terraform.

git-subtree-dir: tools
git-subtree-split: ed8e757406d337c234468febfecdb8a51d6a5bab
tomwilkie added a commit that referenced this issue Jan 16, 2017
2b3a1bb Merge pull request #62 from weaveworks/revert-61-test-defaults
8c3883a Revert "Make no-go-get the default, and don't assume -tags netgo"
e75c226 Fix bug in GC of firewall rules.
e49754e Merge pull request #51 from weaveworks/gc-firewall-rules
191f487 Add flag to enale/disable firewall rules' GC.
567905c Add GC of firewall rules for weave-net-tests to scheduler.
03119e1 Fix typo in GC of firewall rules.
bbe3844 Fix regular expression for firewall rules.
c5c23ce Pre-change refactoring: splitted gc_project function into smaller methods for better readability.
ed5529f GC firewall rules
ed8e757 Merge pull request #61 from weaveworks/test-defaults
57856e6 Merge pull request #56 from weaveworks/remove-wcloud
dd5f3e6 Add -p flag to test, run test in parallel
62f6f94 Make no-go-get the default, and don't assume -tags netgo
8946588 Merge pull request #60 from weaveworks/2647-gc-weave-net-tests
4085df9 Scheduler now also garbage-collects VMs from weave-net-tests.
4b7d5c6 Merge pull request #59 from weaveworks/57-fix-lint-properly
b7f0e69 Merge pull request #58 from weaveworks/fix-lint
794702c Pin version of shfmt
ab1b11d Fix lint
d1a5e46 Remove wcloud cli tool
81d80f3 Merge pull request #55 from weaveworks/lint-tf
05ad5f2 Review feedback
4c0d046 Use hclfmt to lint terraform.

git-subtree-dir: tools
git-subtree-split: 2b3a1bbfd122056e51212ee2a53970c9390d2aa7
2 participants