
Ingesters are routinely taking longer than 20mins to flush. #158


Closed
4 of 6 tasks
tomwilkie opened this issue Nov 24, 2016 · 16 comments

Comments

@tomwilkie
Contributor

tomwilkie commented Nov 24, 2016

As discussed on slack this morning, there are two things going on here:

  1. we are doing too many writes to dynamodb
  2. we're not getting anywhere near our dynamodb provisioned throughput.

To help with (1), we plan to reduce the number of writes to dynamodb by:

To help with (2), we plan on moving to weekly dynamodb tables, to minimise the number of shards per table and reduce the impact of poorly-balanced writes to the dynamo shards. We will need to:

  • ensure the IAM policies allow cortex to write to any cortex table
  • decide whose responsibility it is to create the tables and manage the provisioned capacity of the tables
  • update the cortex chunk store to write the "big bucket" to the correct table and read from the correct table.

Other considerations:

  • It's required that the weekly tables align with the big buckets (such that buckets don't span tables).
  • As we plan on going with 24hr buckets, we'll ensure they are in UTC and run from midnight to midnight (/cc @juliusv).
  • We'll then ensure the weekly tables start and stop in the middle of the week (Wednesday at midnight UTC), such that if something goes wrong, the chance someone is around to fix it is higher.
  • For backwards compatibility, we'll have a flag that specifies when the switch over to weekly tables happens.

Open questions:

  • whose responsibility is it to create the tables and manage the provisioned capacity?
  • how to calculate the week wrt leap seconds etc? (see the sketch below)
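
On the leap-second point: Unix timestamps don't include leap seconds, so plain integer division on the timestamp always lands on UTC midnight boundaries. A minimal sketch of the bucket/week calculation, assuming Unix-second timestamps (the function and constant names are illustrative, not actual cortex code):

    package sketch

    const (
        secondsInDay  = int64(24 * 60 * 60)
        secondsInWeek = 7 * secondsInDay
    )

    // dayBucket returns the 24hr bucket a sample falls into; Unix time has
    // no leap seconds, so buckets always run midnight-to-midnight UTC.
    func dayBucket(ts int64) int64 {
        return ts / secondsInDay
    }

    // weekIndex returns the weekly table index. The Unix epoch (1970-01-01)
    // fell on a Thursday, so adding one day before dividing moves each
    // boundary a day earlier, giving weeks that run Wednesday-midnight to
    // Wednesday-midnight UTC.
    func weekIndex(ts int64) int64 {
        return (ts + secondsInDay) / secondsInWeek
    }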
@juliusv added this to the Dogfooding milestone Jan 3, 2017
@tomwilkie
Contributor Author

The code to manage the tables will need to:

  • Create the new table ahead of time, if it doesn't already exist. Some random jitter should be good enough to ensure multiple replicas can all do this concurrently.
  • Add the configured provisioned write throughput to said table (in addition to the existing table).
  • At an appropriate time in the future, once we can guarantee no more writes will go to the old table, reduce the provisioned throughput to zero.

So, when can we guarantee no more writes will go to the old table? If the max chunk age were fixed (it isn't right now), then we could - see #127. A rough sketch of the whole loop is below.
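
As a sketch only - TableClient, tableName and the naming scheme below are placeholders, not an existing cortex or AWS API:

    package sketch

    import (
        "fmt"
        "time"
    )

    const (
        tablePeriod = 7 * 24 * time.Hour
        periodSecs  = int64(tablePeriod / time.Second)
    )

    // tableName maps a time to its weekly table, e.g. "cortex_2453"
    // (the prefix and numbering scheme are made up).
    func tableName(t time.Time) string {
        return fmt.Sprintf("cortex_%d", t.Unix()/periodSecs)
    }

    // TableClient is a hypothetical wrapper around DynamoDB's
    // CreateTable / UpdateTable calls.
    type TableClient interface {
        EnsureTable(name string, writeCapacity int64) error
        SetWriteCapacity(name string, writeCapacity int64) error
    }

    // syncTables runs periodically (with some random jitter, so multiple
    // replicas can all do it concurrently) and reconciles the weekly tables
    // against the configured write throughput.
    func syncTables(c TableClient, now time.Time, writeCapacity int64, maxChunkAge time.Duration) error {
        // Create next week's table ahead of time, and keep both it and the
        // current table at the full provisioned write throughput.
        for _, name := range []string{tableName(now), tableName(now.Add(tablePeriod))} {
            if err := c.EnsureTable(name, writeCapacity); err != nil {
                return err
            }
        }

        // Once every chunk that could still land in last week's table has
        // been flushed (i.e. maxChunkAge after the boundary - see #127),
        // drop its throughput to the floor (DynamoDB won't accept zero).
        weekStart := time.Unix((now.Unix()/periodSecs)*periodSecs, 0)
        if now.Sub(weekStart) > maxChunkAge {
            return c.SetWriteCapacity(tableName(now.Add(-tablePeriod)), 1)
        }
        return nil
    }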

@jml
Contributor

jml commented Jan 3, 2017

I made some cosmetic edits to the original post to make it a bit easier to read.

we are doing too many writes to dynamodb

  • how many are we doing? (got a link to a grafana graph?)
  • how many is too many?
  • how many is few enough?

we plan on moving to weekly dynamodb tables, to minimise the number of shards per table, and reduce the impact of poorly-balanced writes to the dynamo shards

I'm pretty sure I understood the logic for this last year, but I've forgotten. Can you please explain a bit more?

whose responsibility it is to create the tables and manage the provisioned capacity?

I'd prefer a new, distinct job for the following reasons:

  • isolates privileges
  • makes the system architecture more comprehensible
  • it's quite AWS specific, and I have a vague wish that cortex's design allow for backends other than dynamodb

@tomwilkie
Contributor Author

tomwilkie commented Jan 3, 2017

how many are we doing? (got a link to a grafana graph?)

More interesting is the number of writes per chunk: http://frontend.dev.weave.works/admin/grafana/dashboard/file/cortex-chunks.json?panelId=8&fullscreen&from=1483430633952&to=1483459433952

Peaked at over 30 whilst flushing some ingester that had been running for a while; it needs to be less than 10 for the pricing to work out.

I'm pretty sure I understood the logic for this last year, but I've forgotten. Can you please explain a bit more?

See https://aws.amazon.com/premiumsupport/knowledge-center/throttled-ddb/

My DynamoDB table has been throttled, but my consumed capacity units are still below my provisioned capacity units

The dev table was 900GB, and each shard is 10GB, so we had 90 shards. At 2000 write capacity, that's only ~22 writes/sec per shard.

The entropy we use (user id, metric name) to distribute amongst the ingesters is the same entropy we use in the hash (distribution) key in dynamodb - (user id, metric name and time) - so when shutting down one ingester, we shouldn't expect a uniform distribution of writes across the shards, and hence we can't saturate our provisioned capacity when flushing.

Migrating to an empty table (with a single shard) allowed us to get much closer to the provisioned throughput, which IMO confirms this theory.
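
To put numbers on that, a back-of-the-envelope sketch (the 10GB-per-shard figure is the one quoted above):

    package sketch

    // perShardWrites shows why a large table can't be saturated by a skewed
    // workload: provisioned capacity is divided evenly across shards, so a
    // flush that concentrates on a few hash keys only ever sees a few
    // shards' worth of it.
    func perShardWrites(tableSizeGB, provisionedWrites float64) float64 {
        const shardSizeGB = 10
        return provisionedWrites / (tableSizeGB / shardSizeGB)
    }

    // perShardWrites(900, 2000) ≈ 22 writes/sec per shard, versus the full
    // 2000 writes/sec for a fresh, single-shard table.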

I'd prefer a new, distinct job for the following reasons:

I am also cool with this.

@tomwilkie
Contributor Author

Re: aligning buckets and tables, discussed on slack and decided this is not necessary (and is hard). Instead, we'll go with the rule that a bucket lives in the table containing the bucket's start time - allowing bucket time ranges to 'span' table time ranges, but each bucket is only ever written to one table.

Something along the lines of:

bucket = timestamp / bucket_size
table = bucket * bucket_size / table_size
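
In Go, with 24hr buckets and weekly tables, that rule comes out roughly as below (sizes and names are illustrative):

    package sketch

    const (
        bucketSize = int64(24 * 60 * 60) // 24hr buckets, in seconds
        tableSize  = 7 * bucketSize      // weekly tables
    )

    // bucketAndTable maps a Unix timestamp to its bucket, and maps that
    // bucket to the table containing the bucket's start time. A bucket whose
    // range straddles a table boundary is still written to exactly one table.
    func bucketAndTable(ts int64) (bucket, table int64) {
        bucket = ts / bucketSize
        table = bucket * bucketSize / tableSize
        return bucket, table
    }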

@jml
Contributor

jml commented Jan 3, 2017

The entropy we use (user id, metric name) to distribute amongst the ingesters is the same entropy we use in the hash (distribution) key in dynamodb - (user id, metric name and time)

Can we change that?

@tomwilkie
Contributor Author

Can we change that?

We can, if you can think of a better scheme.

@juliusv
Contributor

juliusv commented Jan 3, 2017

@jml More entropy = better distribution, but harder to find things again. Would need a cleverer indexing scheme for queries, and we haven't been able to come up with one yet.

@jml
Contributor

jml commented Jan 3, 2017

Yeah, I'm thinking about it now, and basically anything that lets an ingester find it again by computation doesn't actually change the entropy.

@jml
Contributor

jml commented Jan 3, 2017

Although I guess you could do something crazy and append a random number in [0, 4) on write and then read from all 4 every time.
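
Concretely, that would look something like this sketch (the key format is made up for illustration; nothing like it exists in the chunk store today):

    package sketch

    import (
        "fmt"
        "math/rand"
    )

    const shardFactor = 4

    // writeKey spreads writes for one (user, metric, bucket) across
    // shardFactor hash keys by appending a random suffix.
    func writeKey(userID, metricName string, bucket int64) string {
        return fmt.Sprintf("%s:%d:%s:%d", userID, bucket, metricName, rand.Intn(shardFactor))
    }

    // readKeys returns every possible suffix, so a query has to issue
    // shardFactor lookups and merge the results.
    func readKeys(userID, metricName string, bucket int64) []string {
        keys := make([]string, 0, shardFactor)
        for i := 0; i < shardFactor; i++ {
            keys = append(keys, fmt.Sprintf("%s:%d:%s:%d", userID, bucket, metricName, i))
        }
        return keys
    }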

@tomwilkie
Contributor Author

Yup, we could do that. I'd prefer to investigate alternative indexing schemes first though.

@tomwilkie
Contributor Author

(And by that I mean #13)

@juliusv
Contributor

juliusv commented Jan 3, 2017

Yeah, except #13 would only help with the ingester load balancing, not DynamoDB (more entropy in DynamoDB would then require a multi-stage index, probably not helping the cause).

@tomwilkie
Contributor Author

Going to use a different ticket to track the weekly buckets work, as this ticket tracks a bunch of stuff: #189

@tomwilkie
Contributor Author

#201 is probably a major cause.

@tomwilkie
Contributor Author

With #201 fixed, and the daily buckets and weekly tables, ingesters are taking 10 mins to flush in dev!

@tomwilkie
Contributor Author

Is in prod, should go live tomorrow.
