Frequently getting errors when querying cortex #61
Comments
c891ab5 might help a bit too.
weaveworks/service-conf#111 is fixed, but we're still not getting clean shutdowns. We're adding more instrumentation and better logging, and experimenting with extending the grace period.
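For context on the grace period: Kubernetes sends SIGTERM to a container and follows up with SIGKILL once the pod's `terminationGracePeriodSeconds` has elapsed, so a clean shutdown has to finish flushing inside that window. Below is a minimal Go sketch of that constraint. The names `flushWithGrace` and `errGraceExceeded` and the `flush` callback are hypothetical and only for illustration; this is not Cortex's actual shutdown code.

```go
package main

import (
	"errors"
	"time"
)

// errGraceExceeded reports that the flush did not finish inside the grace period.
var errGraceExceeded = errors.New("flush did not finish within grace period")

// flushWithGrace is a hypothetical helper (not Cortex code). It runs flush in a
// goroutine and waits at most grace for it to return. The stop channel is
// closed when the grace period expires so that flush can abandon its work.
func flushWithGrace(grace time.Duration, flush func(stop <-chan struct{}) error) error {
	stop := make(chan struct{})
	done := make(chan error, 1) // buffered so the goroutine never blocks on send

	go func() { done <- flush(stop) }()

	timer := time.NewTimer(grace)
	defer timer.Stop()

	select {
	case err := <-done:
		// Flush finished in time; pass its result through.
		return err
	case <-timer.C:
		// Out of time: tell flush to stop and report the unclean shutdown.
		close(stop)
		return errGraceExceeded
	}
}
```

In a real process this would presumably be called from a SIGTERM handler (registered with `signal.Notify`), with `grace` set somewhat below the pod's termination grace period so the error can still be logged before the kill.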
Error is now:
Monitoring is in place to catch this in the future. Work to fix it is being tracked in #87.
A whole bunch of people had pinged me about this problem & I've been directing them here to fix it. I think we should have kept this open until the problem it describes is actually fixed.
Well, I think it is worth having separate things for symptom & cause, but I don't care super deeply about it. Let's direct to #87 (which I've added to the milestone). I'll paste the error message there too.
Reopened, as fixing #87 hasn't fixed the whole thing. We're looking into ways of forcing k8s to do a one-at-a-time upgrade of the ingesters via readiness hooks.
With the bug fixes from yesterday, we no longer get 500s during the rollout.
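One common way to approach this is for each process to expose a readiness endpoint that Kubernetes polls through a `readinessProbe`, reporting ready only once it can serve traffic, combined with a rolling-update strategy that limits how many pods may be unavailable at once. A minimal Go sketch of the endpoint side follows. The `readyGate` type and the `/ready` path are illustrative assumptions, not Cortex's real API.

```go
package main

import (
	"net/http"
	"sync/atomic"
)

// readyGate is a hypothetical readiness flag (not part of Cortex).
// The zero value reports "not ready".
type readyGate struct {
	ready int32 // 1 when ready; always accessed atomically
}

// SetReady flips the gate; call SetReady(false) at the start of shutdown so
// that traffic stops being routed to the pod before it exits.
func (g *readyGate) SetReady(ok bool) {
	var v int32
	if ok {
		v = 1
	}
	atomic.StoreInt32(&g.ready, v)
}

// Status returns the HTTP status code a readiness probe should see.
func (g *readyGate) Status() int {
	if atomic.LoadInt32(&g.ready) == 1 {
		return http.StatusOK
	}
	return http.StatusServiceUnavailable
}

// ServeHTTP lets the gate be mounted directly as a handler,
// for example on the assumed path "/ready".
func (g *readyGate) ServeHTTP(w http.ResponseWriter, _ *http.Request) {
	w.WriteHeader(g.Status())
}
```

The gate would be registered with something like `http.Handle("/ready", gate)` and set to ready only after startup work has completed, so the rollout does not move on to the next pod until the new one can actually serve.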
On Weave Cloud dev & prod, getting:
when executing queries.
I haven't had a chance to investigate, but my guess is that deployments no longer visibly break the cluster (per #19) but they do silently break it, partly because we're not yet alerting on 500s for cortex at weaveworks (cf. weaveworks/service#904).
This will probably be fixed by clean shutdowns (which, at weaveworks, are blocked on weaveworks/service-conf#111), but I wanted to track something symptom-oriented so I'd have something to search for and a place to make notes on other things I find along the way.
Debugging this would be made easier by addressing #57, #58, #59, and #60.