
Ingesters are routinely taking longer than 20mins to flush. #158


Closed
4 of 6 tasks
tomwilkie opened this issue Nov 24, 2016 · 16 comments

Comments

@tomwilkie
Contributor

tomwilkie commented Nov 24, 2016

As discussed on slack this morning, there are two things going on here:

  1. we are doing too many writes to dynamodb
  2. we're not getting anywhere near our dynamodb provisioned throughput.

To help with (1), we plan to reduce the number of writes to dynamodb by:

To help with (2), we plan on moving to weekly dynamodb tables, to minimise the number of shards per table and reduce the impact of poorly-balanced writes to the dynamo shards. We will need to:

  • ensure the IAM policies allow cortex to write to any cortex table
  • decide whose responsibility it is to create the tables and manage the provisioned capacity of the tables
  • update the cortex chunk store to write the "big bucket" to the correct table and read from the correct table.

Other considerations:

  • It's required that the weekly tables align with the big buckets (such that buckets don't span tables).
  • As we plan on going with 24hr buckets, we'll ensure they are in UTC and run from midnight to midnight (/cc @juliusv).
  • We'll then ensure the weekly tables start and stop in the middle of the week (Wednesday at midnight UTC), such that if something goes wrong, the chance someone is around to fix it is higher.
  • For backwards compatibility, we'll have a flag that specifies when the switch over to weekly tables happens.

Open questions:

  • whose responsibility is it to create the tables and manage the provisioned capacity?
  • how to calculate the week wrt leap seconds etc? (see the sketch below)
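
On the leap-second point: Unix timestamps don't include leap seconds, so plain integer division on the timestamp always lands on UTC midnight boundaries. A minimal sketch of the bucket/week calculation, assuming Unix-second timestamps (the function and constant names are illustrative, not actual cortex code):

    package sketch

    const (
        secondsInDay  = int64(24 * 60 * 60)
        secondsInWeek = 7 * secondsInDay
    )

    // dayBucket returns the 24hr bucket a sample falls into; Unix time has
    // no leap seconds, so buckets always run midnight-to-midnight UTC.
    func dayBucket(ts int64) int64 {
        return ts / secondsInDay
    }

    // weekIndex returns the weekly table index. The Unix epoch (1970-01-01)
    // fell on a Thursday, so adding one day before dividing moves each
    // boundary a day earlier, giving weeks that run Wednesday-midnight to
    // Wednesday-midnight UTC.
    func weekIndex(ts int64) int64 {
        return (ts + secondsInDay) / secondsInWeek
    }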
@juliusv added this to the Dogfooding milestone Jan 3, 2017
@tomwilkie
Contributor Author

The code to manage the tables will need to:

  • Create the new table ahead of time, if it doesn't already exist. Some random jitter should be good enough to ensure multiple replicas can all do this concurrently.
  • Add the configured provisioned write throughput to said table (in addition to the existing table).
  • At an appropriate time in the future, once we can guarantee no more writes will go to the old table, reduce the provisioned throughput to zero.

So, when can we guarantee no more writes will go to the old table? If the max chunk age were fixed (it isn't right now), then we could - see #127. A rough sketch of the whole loop is below.
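
As a sketch only - TableClient, tableName and the naming scheme below are placeholders, not an existing cortex or AWS API:

    package sketch

    import (
        "fmt"
        "time"
    )

    const (
        tablePeriod = 7 * 24 * time.Hour
        periodSecs  = int64(tablePeriod / time.Second)
    )

    // tableName maps a time to its weekly table, e.g. "cortex_2453"
    // (the prefix and numbering scheme are made up).
    func tableName(t time.Time) string {
        return fmt.Sprintf("cortex_%d", t.Unix()/periodSecs)
    }

    // TableClient is a hypothetical wrapper around DynamoDB's
    // CreateTable / UpdateTable calls.
    type TableClient interface {
        EnsureTable(name string, writeCapacity int64) error
        SetWriteCapacity(name string, writeCapacity int64) error
    }

    // syncTables runs periodically (with some random jitter, so multiple
    // replicas can all do it concurrently) and reconciles the weekly tables
    // against the configured write throughput.
    func syncTables(c TableClient, now time.Time, writeCapacity int64, maxChunkAge time.Duration) error {
        // Create next week's table ahead of time, and keep both it and the
        // current table at the full provisioned write throughput.
        for _, name := range []string{tableName(now), tableName(now.Add(tablePeriod))} {
            if err := c.EnsureTable(name, writeCapacity); err != nil {
                return err
            }
        }

        // Once every chunk that could still land in last week's table has
        // been flushed (i.e. maxChunkAge after the boundary - see #127),
        // drop its throughput to the floor (DynamoDB won't accept zero).
        weekStart := time.Unix((now.Unix()/periodSecs)*periodSecs, 0)
        if now.Sub(weekStart) > maxChunkAge {
            return c.SetWriteCapacity(tableName(now.Add(-tablePeriod)), 1)
        }
        return nil
    }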

@jml
Contributor

jml commented Jan 3, 2017

I made some cosmetic edits to the original post to make it a bit easier to read.

we are doing too many writes to dynamodb

  • how many are we doing? (got a link to a grafana graph?)
  • how many is too many?
  • how many is few enough?

we plan on moving to weekly dynamodb tables, to minimise the number of shards per table, and reduce the impact of poorly-balanced writes to the dynamo shards

I'm pretty sure I understood the logic for this last year, but I've forgotten. Can you please explain a bit more?

whose responsibility it is to create the tables and manage the provisioned capacity?

I'd prefer a new, distinct job for the following reasons:

  • isolates privileges
  • makes the system architecture more comprehensible
  • it's quite AWS specific, and I have a vague wish that cortex's design allow for backends other than dynamodb

@tomwilkie
Contributor Author

tomwilkie commented Jan 3, 2017

how many are we doing? (got a link to a grafana graph?)

More interesting is the number of writes per chunk: http://frontend.dev.weave.works/admin/grafana/dashboard/file/cortex-chunks.json?panelId=8&fullscreen&from=1483430633952&to=1483459433952

Peaked at over 30 whilst flushing some ingester that had been running for a while; it needs to be less than 10 for the pricing to work out.

I'm pretty sure I understood the logic for this last year, but I've forgotten. Can you please explain a bit more?

See https://aws.amazon.com/premiumsupport/knowledge-center/throttled-ddb/

My DynamoDB table has been throttled, but my consumed capacity units are still below my provisioned capacity units

The dev table was 900GB, and each shard is 10GB, so we had 90 shards. At 2000 write capacity, that's only ~22 writes/sec per shard.

The entropy we use (user id, metric name) to distribute amongst the ingesters is the same entropy we use in the hash (distribution) key in dynamodb - (user id, metric name and time) - so when shutting down one ingester, we shouldn't expect a uniform distribution of writes across the shards, and hence we can't saturate our provisioned capacity when flushing.

Migrating to an empty table (with a single shard) allowed us to get much closer to the provisioned throughput, which IMO confirms this theory.
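
To put numbers on that, a back-of-the-envelope sketch (the 10GB-per-shard figure is the one quoted above):

    package sketch

    // perShardWrites shows why a large table can't be saturated by a skewed
    // workload: provisioned capacity is divided evenly across shards, so a
    // flush that concentrates on a few hash keys only ever sees a few
    // shards' worth of it.
    func perShardWrites(tableSizeGB, provisionedWrites float64) float64 {
        const shardSizeGB = 10
        return provisionedWrites / (tableSizeGB / shardSizeGB)
    }

    // perShardWrites(900, 2000) ≈ 22 writes/sec per shard, versus the full
    // 2000 writes/sec for a fresh, single-shard table.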

I'd prefer a new, distinct job for the following reasons:

I am also cool with this.

@tomwilkie
Contributor Author

Re: aligning buckets and tables, discussed on slack and decided this is not necessary (and is hard). Instead, we'll go with the rule that a bucket lives in the table containing the bucket's start time - allowing bucket time ranges to 'span' table time ranges, but each bucket is only ever written to one table.

Something along the lines of:

bucket = timestamp / bucket_size
table = bucket * bucket_size / table_size
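
In Go, with 24hr buckets and weekly tables, that rule comes out roughly as below (sizes and names are illustrative):

    package sketch

    const (
        bucketSize = int64(24 * 60 * 60) // 24hr buckets, in seconds
        tableSize  = 7 * bucketSize      // weekly tables
    )

    // bucketAndTable maps a Unix timestamp to its bucket, and maps that
    // bucket to the table containing the bucket's start time. A bucket whose
    // range straddles a table boundary is still written to exactly one table.
    func bucketAndTable(ts int64) (bucket, table int64) {
        bucket = ts / bucketSize
        table = bucket * bucketSize / tableSize
        return bucket, table
    }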

@jml
Contributor

jml commented Jan 3, 2017

The entropy we use (user id, metric name) to distribute amongst the ingesters is the same entropy we use in the hash (distribution) key in dynamodb - (user id, metric name and time)

Can we change that?

@tomwilkie
Contributor Author

Can we change that?

We can, if you can think of a better scheme.

@juliusv
Contributor

juliusv commented Jan 3, 2017

@jml More entropy = better distribution, but harder to find things again. Would need a cleverer indexing scheme for queries, and we haven't been able to come up with one yet.

@jml
Contributor

jml commented Jan 3, 2017

Yeah, I'm thinking about it now, and basically anything that lets an ingester find it again by computation doesn't actually change the entropy.

@jml
Contributor

jml commented Jan 3, 2017

Although I guess you could do something crazy and append a random number in [0, 4) on write and then read from all 4 every time.
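
Concretely, that would look something like this sketch (the key format is made up for illustration; nothing like it exists in the chunk store today):

    package sketch

    import (
        "fmt"
        "math/rand"
    )

    const shardFactor = 4

    // writeKey spreads writes for one (user, metric, bucket) across
    // shardFactor hash keys by appending a random suffix.
    func writeKey(userID, metricName string, bucket int64) string {
        return fmt.Sprintf("%s:%d:%s:%d", userID, bucket, metricName, rand.Intn(shardFactor))
    }

    // readKeys returns every possible suffix, so a query has to issue
    // shardFactor lookups and merge the results.
    func readKeys(userID, metricName string, bucket int64) []string {
        keys := make([]string, 0, shardFactor)
        for i := 0; i < shardFactor; i++ {
            keys = append(keys, fmt.Sprintf("%s:%d:%s:%d", userID, bucket, metricName, i))
        }
        return keys
    }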

@tomwilkie
Contributor Author

Yup, we could do that. I'd prefer to investigate alternative indexing schemes first though.

@tomwilkie
Contributor Author

(And by that I mean #13)

@juliusv
Contributor

juliusv commented Jan 3, 2017

Yeah, except #13 would only help with the ingester load balancing, not DynamoDB (more entropy in DynamoDB would then require a multi-stage index, probably not helping the cause).

@tomwilkie
Contributor Author

Going to use a different ticket to track the weekly buckets work, as this ticket tracks a bunch of stuff: #189

@tomwilkie
Contributor Author

#201 is probably a major cause.

@tomwilkie
Contributor Author

With #201 fixed, and the daily buckets and weekly tables, ingesters are taking 10 mins to flush in dev!

@tomwilkie
Contributor Author

Is in prod, should go live tomorrow.
