Tune chunk size #11
Comments
From @jml: Why should it be 1hr?
From @tomwilkie: We should investigate what the right number is, but the parameters are:
Ticket should really say "max 1hr" to bound the loss, if that gives good utilization.
This is possibly related to the dynamo errors we are seeing in #85.
Oh wow yeah, the default chunk max age of 10 minutes seems way too low. I'm wondering why we're still achieving such decent chunk utilization.
I suspect it can't flush chunks quickly enough, and therefore they are
At least the failures should not have a big effect because during normal operation, only ~4% of chunk puts fail.
Actually makes sense, since we're on doubledelta (not varbit). So it's about 3.3 bytes per sample; at a 15s scrape interval that's about 20 mins per chunk. With 10 mins, you'd expect 50% utilisation.
Hmm, how do you get to 20 mins per chunk at 15s scrape interval and 3.3 bytes per sample? 1024 / 3.3 = 310 samples per chunk, but 20 minutes of samples would only be 4 * 20 = 80 samples? So a chunk should be full after ~ 310 / 4 = 77 minutes. Or am I missing something stupid?
Nope, I was being stupid. I did 300/15, not 300*15.
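For reference, a minimal sketch of the corrected arithmetic. It uses only the numbers quoted in this thread (1024-byte chunks, ~3.3 bytes per doubledelta sample, 15s scrape interval). The function names are hypothetical and are not taken from the Cortex codebase, and the utilisation estimate assumes a chunk is flushed exactly at max age with one sample per scrape.

```python
# Figures quoted in the thread; adjust to match your own setup.
CHUNK_SIZE_BYTES = 1024    # chunk size used in the calculation above
BYTES_PER_SAMPLE = 3.3     # doubledelta estimate quoted above
SCRAPE_INTERVAL_S = 15     # scrape interval quoted above


def samples_per_chunk(chunk_bytes=CHUNK_SIZE_BYTES,
                      bytes_per_sample=BYTES_PER_SAMPLE):
    """Whole number of samples that fit into one chunk."""
    return int(chunk_bytes // bytes_per_sample)


def minutes_to_fill(chunk_bytes=CHUNK_SIZE_BYTES,
                    bytes_per_sample=BYTES_PER_SAMPLE,
                    scrape_interval_s=SCRAPE_INTERVAL_S):
    """Minutes until a chunk is full at one sample per scrape."""
    samples = samples_per_chunk(chunk_bytes, bytes_per_sample)
    return samples * scrape_interval_s / 60


def utilisation_at_max_age(max_age_min,
                           chunk_bytes=CHUNK_SIZE_BYTES,
                           bytes_per_sample=BYTES_PER_SAMPLE,
                           scrape_interval_s=SCRAPE_INTERVAL_S):
    """Fraction of the chunk filled if it is flushed at max_age_min."""
    samples = max_age_min * 60 / scrape_interval_s
    return min(1.0, samples * bytes_per_sample / chunk_bytes)


if __name__ == "__main__":
    print(samples_per_chunk())          # samples per 1024-byte chunk
    print(minutes_to_fill())            # minutes until a chunk is full
    print(utilisation_at_max_age(10))   # expected utilisation at 10 min max age
    print(utilisation_at_max_age(60))   # expected utilisation at 1 hr max age
```

Under these assumptions a 10 minute max age fills only about 13% of a chunk, while 1hr fills roughly three quarters of it.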
Okay, bit more progress: 99th percentile chunk "age" is 27 mins on flush. This could explain the higher utilisation. Just added a dashboard for it, will link to it when it's live. http://frontend.dev.weave.works/admin/grafana/dashboard/file/cortex-chunks.json
So, the question is: why are some chunks 27 mins old? Thoughts:
Except:
Average number of entries per chunk is 8.6 here. And it's no coincidence that 8.6 * 3min is about 27mins, which is the 99%ile chunk age...
With the latest change, we may be writing chunks more than once. Needs fixing.
Set to 1hr and behaving as expected in #118.
…e/e02797ac7f3b68f08c7b778c95c60ba82303f81b Pre release/e02797ac7f3b68f08c7b778c95c60ba82303f81b
From @tomwilkie
Currently 10mins, should be 1hr.
Copied from original issue: tomwilkie/frankenstein#10