Compactors can't keep up with the load #3753

Closed
agardiman opened this issue Jan 28, 2021 · 5 comments
Comments

@agardiman

agardiman commented Jan 28, 2021

Describe the bug
The number of blocks per tenant increases over time instead of going down.
At any given time some compactors are idle (and they basically stay idle until an eventual restart), even though many tenants still need compaction but are not being compacted at that moment.

To Reproduce
Steps to reproduce the behavior:

  1. 9 tenants, about 42M active time series per tenant
  2. 12 compactors
  3. In the compactor v1.6 dashboard, both the number of blocks for each tenant and the average number of blocks are increasing over time

Expected behavior

  • the number of blocks for every tenant and the overall average to decrease over time
  • if there are X tenants and X compactors, all X compactors to be busy compacting

Environment:

  • Infrastructure: Kubernetes
  • Deployment tool: jsonnet
  • AWS, S3
  • 12 compactors with 2 CPUs and 5GB of RAM (50GB limit)
  • 9 tenants with a total of 381M active time series and 6M reqps. The metrics are evenly split between tenants.

Storage Engine

  • [x] Blocks
  • Chunks

Additional Context
There are 3 issues:

  1. a compactor cannot keep up with the load of a single tenant if the tenant is big enough. I initially tried 7 tenants with 55M active series each, but even when a tenant was being compacted by a compactor, its blocks kept increasing. So I split the 381M active time series between 9 tenants, reducing the number of active time series per tenant to 42M, but the number of blocks per tenant is still increasing over time.
  2. if there are a few tenants and a few compactors, there is a high chance that one compactor is responsible for no tenant while another is responsible for more than one, probably because the hash distribution does not balance well when the number of tenants is relatively low.
  3. it's not clear from the logs or dashboards how to find the bottleneck, or whether anything is wrong at all.
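To make point 2 concrete, here is a toy sketch (my own illustration, not Cortex's actual ring code) of hashing a handful of tenant IDs onto compactors. The `owner` function and tenant names are made up for the example; the point is just that with few tenants, some compactors end up owning zero tenants and others more than one:

```python
# Toy illustration of hash-based tenant-to-compactor assignment.
# NOTE: this is NOT Cortex's real ring implementation; it only shows
# how uneven a hash assignment can be when tenants are few.
import hashlib
from collections import Counter

def owner(tenant_id: str, num_compactors: int) -> int:
    """Assign a tenant to one compactor via a stable hash (illustrative only)."""
    h = int.from_bytes(hashlib.sha256(tenant_id.encode()).digest()[:8], "big")
    return h % num_compactors

tenants = [f"tenant-{i}" for i in range(9)]   # 9 tenants, as in this issue
compactors = 12                               # 12 compactor replicas

assignment = Counter(owner(t, compactors) for t in tenants)
busy = len(assignment)
idle = compactors - busy
print(f"busy compactors: {busy}, idle: {idle}")
print(f"max tenants on one compactor: {max(assignment.values())}")
```

With 9 tenants and 12 compactors, at most 9 compactors can ever be busy (so at least 3 are always idle), and hash collisions typically make the imbalance worse.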

[screenshot attachment]

@pstibrany
Contributor

pstibrany commented Jan 28, 2021

Just a side note: each tenant can only "belong" to a single compactor at the moment, so running 12 compactors with only 9 tenants will always leave some compactors unused.
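A quick back-of-the-envelope estimate (my own, not from the Cortex docs) of how many compactors you'd expect to be busy under this one-tenant-one-compactor model, assuming each tenant independently hashes to a uniformly random compactor:

```python
# Expected number of busy compactors if each of 9 tenants hashes
# uniformly at random to one of 12 compactors (my own estimate):
#   E[busy] = 12 * (1 - (11/12)**9)
tenants, compactors = 9, 12
expected_busy = compactors * (1 - (1 - 1 / compactors) ** tenants)
print(round(expected_busy, 2))  # roughly 6.5 busy, i.e. ~5.5 idle on average
```

So under this assumption, on average only about 6 or 7 of the 12 compactors would be doing any work, which matches the idle compactors described above.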

@agardiman
Author

Yeah, the thing is that I first tried with 7 compactors and 7 "big" tenants, and it didn't work. So I increased the compactor replicas to 12 to raise the chance that each compactor would own at most one tenant, so it could "spend" all its time on just that one.

@stale

stale bot commented Apr 30, 2021

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Apr 30, 2021
@pracucci
Contributor

Things are slowly improving, but this issue is still valid.

@stale

stale bot commented Jul 29, 2021

This issue has been automatically marked as stale because it has not had any activity in the past 60 days. It will be closed in 15 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale label Jul 29, 2021
@stale stale bot closed this as completed Aug 13, 2021