Enhancement for Load Balancing During High-Concurrency Scenarios in EPP #1700

@delavet

Description

What would you like to be added:

I would like to propose improvements to EPP's load balancing mechanism to better handle high-concurrency scenarios. Currently, when a large number of concurrent requests arrive simultaneously, EPP's routing decisions based on backend model server metrics snapshots can result in all requests being directed to the same model server, causing load imbalance.

Why is this needed:

During our customers' usage of GAIE's EPP, we identified a load imbalance issue in high-concurrency scenarios. When bursts of requests arrive simultaneously, since EPP makes routing decisions based on point-in-time snapshots of backend model server metrics, all concurrent requests tend to be scheduled to the same model server, resulting in load imbalance.

The following figure is a screenshot of the vLLM Grafana dashboard that illustrates the issue. One of the pods was momentarily assigned significantly more requests than the others.

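[Image: vLLM Grafana dashboard screenshot showing one pod briefly receiving far more requests than the others]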

This behavior can lead to suboptimal resource utilization and potential performance degradation during traffic spikes.
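To make the failure mode concrete, here is a minimal Python sketch (not EPP code) of a burst routed against a single metrics snapshot. The pod names, the queue-depth metric, and the "pick the least-loaded pod" rule are all assumptions chosen for illustration; EPP's real scorers are more involved.

```python
def pick_least_loaded(view):
    """Return the pod with the smallest queue depth in the given view."""
    return min(view, key=view.get)


def route_burst(snapshot, num_requests, refresh_between_requests):
    """Route a burst of requests and count how many each pod receives.

    With refresh_between_requests=False, every decision in the burst sees
    the same point-in-time snapshot, which is the situation described above.
    With True, the view is bumped after each pick, as if metrics were
    perfectly fresh (an idealized comparison, not a realistic setup).
    """
    view = dict(snapshot)
    assigned = {pod: 0 for pod in snapshot}
    for _ in range(num_requests):
        pod = pick_least_loaded(view)
        assigned[pod] += 1
        if refresh_between_requests:
            view[pod] += 1
    return assigned


# Hypothetical queue depths scraped just before the burst arrives.
snapshot = {"pod-a": 3, "pod-b": 2, "pod-c": 4}

stale = route_burst(snapshot, 9, refresh_between_requests=False)
fresh = route_burst(snapshot, 9, refresh_between_requests=True)
print("stale snapshot:", stale)
print("fresh view:    ", fresh)
```

With the stale snapshot, every request in the burst lands on the pod that looked best at scrape time. With an idealized fresh view, the same burst spreads across all three pods.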

We believe there's an opportunity to enhance EPP's routing algorithm to better distribute load during burst scenarios while maintaining its existing strengths. We therefore implemented a BurstScorer plugin that addresses the issue by tracking how frequently requests are routed to each pod and adjusting scores accordingly during bursts. This provides a simple solution to the problem.
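As a rough illustration of the idea, the Python sketch below tracks recent picks per pod in a sliding time window and lowers a pod's score in proportion to how often it was just chosen. This is not the actual BurstScorer plugin or the EPP plugin interface: the class name, the window length, the linear penalty, and the score values are all hypothetical.

```python
import time
from collections import defaultdict, deque


class BurstScorerSketch:
    """Penalize pods that were picked recently (hypothetical design)."""

    def __init__(self, window_seconds=1.0, penalty_per_request=0.125,
                 clock=time.monotonic):
        self.window_seconds = window_seconds
        self.penalty_per_request = penalty_per_request
        self.clock = clock
        self._recent = defaultdict(deque)  # pod -> timestamps of recent picks

    def _recent_count(self, pod, now):
        # Drop picks that have aged out of the window, then count the rest.
        picks = self._recent[pod]
        while picks and now - picks[0] > self.window_seconds:
            picks.popleft()
        return len(picks)

    def adjust(self, base_scores):
        """Return base scores minus a penalty per recent pick, floored at 0."""
        now = self.clock()
        return {
            pod: max(0.0, score
                     - self.penalty_per_request * self._recent_count(pod, now))
            for pod, score in base_scores.items()
        }

    def record_pick(self, pod):
        self._recent[pod].append(self.clock())


def route(scorer, base_scores):
    """Pick the pod with the highest adjusted score and record the pick."""
    adjusted = scorer.adjust(base_scores)
    pod = max(adjusted, key=adjusted.get)
    scorer.record_pick(pod)
    return pod


# Demo with a fake clock so the burst is effectively simultaneous.
fake_now = [0.0]
scorer = BurstScorerSketch(clock=lambda: fake_now[0])

# Hypothetical snapshot-derived scores (higher is better); they stay
# unchanged for the whole burst, mimicking a stale metrics snapshot.
base_scores = {"pod-a": 0.75, "pod-b": 0.875, "pod-c": 0.625}

picks = {pod: 0 for pod in base_scores}
for _ in range(9):
    picks[route(scorer, base_scores)] += 1
print(picks)
```

Because the penalty decays as picks age out of the window, the scorer only influences routing while a burst is in progress; once traffic calms down, the adjusted scores return to the snapshot-based scores.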

This approach would differ from more complex solutions like latency prediction (as proposed in #1323) and could provide a more fundamental fix for burst-induced load imbalances. The idea shares some similarities with concepts discussed in #678 (Solution 3).

We're interested in community feedback on how to resolve this issue systematically and would appreciate guidance on potential approaches to address it. We're willing to contribute to implementing a solution if the community agrees this is a valuable enhancement.

Metadata

Labels: needs-triage
