DynamoDB auto-scaling should use max(queue) not sum #1812

bboreham · 2019-11-12T08:58:52Z

We had a situation where one ingester was somehow being throttled more than the others. The queue overall didn't get big enough to trigger a scale-up, but that one ingester OOMed.

Any lines like this should use max() not sum():

cortex/pkg/chunk/aws/metrics_autoscaling.go

Line 30 in 9fe46d2

    defaultQueueLenQuery = `sum(avg_over_time(cortex_ingester_flush_queue_length{job="cortex/ingester"}[2m]))`

We need to be a bit careful releasing that change, as it gives a very different meaning to --metrics.target-queue-length. Now that I think about it, maybe we could have both - a target value which guides gentle scaling, and a max value for one ingester which triggers more urgent measures.
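A rough sketch of the "have both" idea, to make the difference between the two aggregations concrete. Nothing here is existing Cortex code: `maxQueueLenQuery`, `decide`, the `action` type, and both thresholds are hypothetical names and illustrative numbers, and the query is simply the current default with `sum()` swapped for `max()`.

```go
package main

import "fmt"

// Hypothetical query: the existing default with sum() replaced by max(),
// so it reports the longest flush queue on any single ingester.
const maxQueueLenQuery = `max(avg_over_time(cortex_ingester_flush_queue_length{job="cortex/ingester"}[2m]))`

// action is a hypothetical type describing what the autoscaler should do.
type action int

const (
	noChange action = iota
	scaleGently
	scaleUrgently
)

// decide takes per-ingester queue lengths and two assumed thresholds:
// targetTotal plays the role of today's summed target, and maxPerIngester
// is the proposed per-ingester limit that triggers urgent measures.
func decide(queueLens []float64, targetTotal, maxPerIngester float64) action {
	var total, worst float64
	for _, q := range queueLens {
		total += q
		if q > worst {
			worst = q
		}
	}
	switch {
	case worst > maxPerIngester:
		// One ingester is in trouble even if the total looks healthy.
		return scaleUrgently
	case total > targetTotal:
		return scaleGently
	default:
		return noChange
	}
}

func main() {
	// Nine quiet ingesters and one that is heavily throttled.
	lens := []float64{10, 10, 10, 10, 10, 10, 10, 10, 10, 5000}
	fmt.Println(maxQueueLenQuery)
	fmt.Println(decide(lens, 100000, 1000) == scaleUrgently)
}
```

With these made-up numbers the summed queue (5090) stays far below the total target, so a sum-only check would do nothing, while the per-ingester check fires on the one overloaded ingester.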

See also #921


stale · 2020-02-03T10:56:20Z

This issue has been automatically marked as stale because it has not had any activity in the past 30 days. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.

bboreham · 2020-02-05T11:48:54Z

Let's keep this alive at least a little longer.

stale · 2020-04-05T11:59:06Z

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.

bboreham added the size/small label Nov 12, 2019

gouthamve added the component/aws label Dec 3, 2019

stale bot added the stale label Feb 3, 2020

stale bot removed the stale label Feb 5, 2020

stale bot added the stale label Apr 5, 2020

stale bot closed this as completed Apr 20, 2020
