out-of-order labelset in compactor #5419
Comments
It's important to know whether the block having the issues was generated by an ingester or is a new block created by the compactor.
The meta.json for the block in s3 says
Is that already after the vertical compaction, or is level 1 the starting point? I'll see if I can find something in the ingester logs.
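For context, a quick way to answer that is to look at the compaction section of the block's meta.json: level 1 means the block was uploaded by an ingester, anything higher means the compactor has already rewritten it. A minimal sketch under that assumption (my own struct mirroring only the fields needed; field names follow the TSDB meta.json format, and the path is a placeholder):

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"os"
)

// blockMeta mirrors only the meta.json fields needed here.
type blockMeta struct {
	ULID       string `json:"ulid"`
	Compaction struct {
		Level   int      `json:"level"`
		Sources []string `json:"sources"`
	} `json:"compaction"`
}

func main() {
	// Placeholder path; download the block's meta.json from S3 first.
	f, err := os.Open("meta.json")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	var m blockMeta
	if err := json.NewDecoder(f).Decode(&m); err != nil {
		log.Fatal(err)
	}

	// Level 1: uploaded by an ingester. Level > 1: already (re)written by the compactor.
	fmt.Printf("block %s: compaction level %d, %d source blocks\n",
		m.ULID, m.Compaction.Level, len(m.Compaction.Sources))
}
```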
According to cortex/vendor/github.com/prometheus/prometheus/tsdb/compact.go, line 495 at commit 3d94719. Unless I'm mistaken.
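For readers hitting this error for the first time, here is a small standalone illustration (not the actual Prometheus/Thanos check) of the invariant the error refers to: label names within one series' label set must be lexicographically sorted, and a corrupted name like the one in the report breaks that ordering.

```go
package main

import (
	"fmt"
	"sort"
)

// label mirrors the shape of a Prometheus label pair for illustration only.
type label struct{ Name, Value string }

// sortedWithinSet reports whether label names inside a single series are in
// lexicographic order, which is the invariant behind "out-of-order labelset".
func sortedWithinSet(ls []label) bool {
	return sort.SliceIsSorted(ls, func(i, j int) bool { return ls[i].Name < ls[j].Name })
}

func main() {
	// Hypothetical series where one label name got corrupted (a stand-in for
	// the mangled beta_kubernetes_instance_type from the report).
	corrupted := []label{
		{"__name__", "kube_node_info"},
		{"beta_kubernetes_io_arch", "amd64"},
		{"\x01garbled\xff", "m5.xlarge"},
		{"instance", "10.0.0.1:9100"},
	}
	fmt.Println("labels sorted within set:", sortedWithinSet(corrupted)) // false
}
```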
Hi @nschad, do you have compression enabled on your gRPC client? If so, which one?
OK, I'm not sure, but this may be related to #5193 not playing well with grpc/grpc-go#6355 in cases of timeout. I will create a PR to not reuse the request in case of error, just in case, and release 1.15.2.
gRPC compression is disabled.
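For anyone unsure what that question refers to: in grpc-go, client-side compression is enabled through a call option. A rough, generic sketch (not Cortex's actual client code; the target address is a placeholder):

```go
package main

import (
	"log"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	"google.golang.org/grpc/encoding/gzip" // importing this registers the gzip compressor
)

func main() {
	// Setting the compressor as a default call option on the connection
	// compresses every outgoing request payload with gzip.
	conn, err := grpc.Dial("ingester:9095",
		grpc.WithTransportCredentials(insecure.NewCredentials()),
		grpc.WithDefaultCallOptions(grpc.UseCompressor(gzip.Name)),
	)
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()
}
```

As far as I know, Cortex drives this through its gRPC client configuration rather than hand-written dial code, so checking the client config should be enough.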
@alanprot We have been running 1.15.3 since Friday and haven't experienced the issue yet.
@nschad Thanks. Feel free to close the issue when you are confident that this was the problem. |
out-of-order Labelset in compactor
We have noticed that, since a couple of days ago, blocks sometimes appear with "out-of-order" errors in the compactor log; an example is below. We have had different and seemingly random errors, affecting different label sets and different metrics. Sometimes labels were duplicated, literally appearing twice in the same set with different values. In this example you can see that what is supposed to be beta_kubernetes_instance_type is completely corrupted. Because of this corruption the required sorting of the labels in the set is wrong, thus the error. This is also the only occurrence in the block: running tsdb analyze shows that other metrics do not have this buggy label.

Additionally, the Grafana Metric Browser dashboard sometimes shows labels like these, which is interesting to me since the labels there are fetched through the Prometheus /api/v1/labels API, right? To me this means the problem already exists in the ingesters, and we can rule out the S3 layer as the source of the corruption (see the sketch after the Additional Context section below).

To Reproduce
Steps to reproduce the behavior:
Expected behavior
Non-corrupted data.
Environment:
Additional Context
Also, we don't run any Prometheus version below 2.35.
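Regarding the /api/v1/labels check mentioned in the description, here is a rough sketch that pulls label names from that endpoint and flags ones containing non-printable characters, which is one symptom of the corruption described above (the base URL and path prefix are placeholders and depend on how the Cortex query path is exposed):

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
	"unicode"
)

// labelsResponse matches the shape of the Prometheus /api/v1/labels reply.
type labelsResponse struct {
	Status string   `json:"status"`
	Data   []string `json:"data"`
}

func main() {
	// Placeholder URL; point it at the query frontend or querier.
	resp, err := http.Get("http://cortex-query-frontend/prometheus/api/v1/labels")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	var lr labelsResponse
	if err := json.NewDecoder(resp.Body).Decode(&lr); err != nil {
		log.Fatal(err)
	}

	// Print label names containing non-printable characters.
	for _, name := range lr.Data {
		for _, r := range name {
			if !unicode.IsPrint(r) {
				fmt.Printf("suspicious label name: %q\n", name)
				break
			}
		}
	}
}
```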