Litellm adaptive routing #26049
Merged: yuneng-berri merged 21 commits into litellm_internal_staging from litellm_adaptive_routing on Apr 22, 2026.
Commits (21):
- dd4a1d2 feat: add adaptive routing to litellm (krrish-berri-2)
- 924fa6a feat: commit new adaptive routing (krrish-berri-2)
- 70caf5a docs: update docs (krrish-berri-2)
- 43d23e9 chore: revert UI build artifacts (krrish-berri-2)
- dedc219 fix: minor improvements (krrish-berri-2)
- 3cf0460 chore: revert uv.lock to match main (krrish-berri-2)
- f0efc5f test: cover _finalize_adaptive_router_if_configured (krrish-berri-2)
- 24a2e3e fix: address CI violations for adaptive router (krrish-berri-2)
- fba736c fix(adaptive_router): 3 P1 review defects (krrish-berri-2)
- 0cfcec6 fix(adaptive_router/hooks): populate tool_results so failure signal f… (krrish-berri-2)
- b6fc75b Merge branch 'litellm_internal_staging' into litellm_adaptive_routing (krrish-berri-2)
- d053355 style: apply black formatting (krrish-berri-2)
- e99955a test(adaptive_router/hooks): align stale tests with current hook API (krrish-berri-2)
- bcc093d fix(adaptive_router): enforce satisfaction gate, stop false-flagging … (krrish-berri-2)
- bd3ee98 fix(adaptive_router): bound owner cache, drop PK from upsert update, … (krrish-berri-2)
- c7342bd Merge branch 'litellm_internal_staging' into litellm_adaptive_routing (krrish-berri-2)
- ecd9a83 fix(adaptive_router): P2 review items — @updatedAt + snapshot samples (krrish-berri-2)
- f1da202 fix(adaptive_router): P1 flusher hot-reload + P2 hook accumulation + CI (krrish-berri-2)
- 37fc6f6 fix(adaptive_router/signals): rename 'args' to 'call_args' in _signature (krrish-berri-2)
- 1965c67 style: black format signals.py (krrish-berri-2)
- e50f945 refactor(adaptive_router): move update_queue out of litellm.proxy (krrish-berri-2)
# [BETA] Adaptive Router

:::info
Beta feature. Share feedback on [Discord](https://discord.gg/wuPM9dRgDw) or [Slack](https://join.slack.com/t/litellmossslack/shared_invite/zt-3o7nkuyfr-p_kbNJj8taRfXGgQI1~YyA).
:::

**Requirements:** LiteLLM Proxy with a Postgres database. Quality estimates are stored in Postgres and loaded on startup; without a database the router still works, but it forgets everything it learned on restart.

You have a cheap model and an expensive one. You want to use the cheap one when it's good enough and the expensive one when it actually matters, without hardcoding rules you'll spend months tuning.

The adaptive router does this automatically. It tracks which model performs best for each type of request (code, writing, analysis, etc.) and routes accordingly, balancing quality against cost based on weights you control.
## Quick start

```yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
    model_info:
      input_cost_per_token: 0.0000025
      adaptive_router_preferences:
        quality_tier: 3            # 1=budget, 2=mid, 3=frontier
        strengths: ["code_generation", "analytical_reasoning"]

  - model_name: gpt-4o-mini
    litellm_params:
      model: openai/gpt-4o-mini
    model_info:
      input_cost_per_token: 0.00000015
      adaptive_router_preferences:
        quality_tier: 2
        strengths: ["factual_lookup"]

  - model_name: my-router
    litellm_params:
      model: adaptive_router/smart-router
      adaptive_router_config:
        available_models: ["gpt-4o", "gpt-4o-mini"]
        weights:
          quality: 0.7   # raise if you get quality complaints; lower if the bill is too high
          cost: 0.3      # must sum to 1.0 with quality
```
Route to it by setting `model` to your adaptive router's name:

```bash
curl -X POST {{baseURL}}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LITELLM_API_KEY" \
  -d '{
    "model": "my-router",
    "messages": [
      {"role": "user", "content": "build me a python script that parses CSV"},
      {"role": "assistant", "content": "Here is a script using csv.DictReader..."},
      {"role": "user", "content": "now add error handling for missing files"},
      {"role": "assistant", "content": "Wrap the open() call in a try/except FileNotFoundError..."},
      {"role": "user", "content": "perfect, that worked. thanks!"}
    ]
  }'
```
The response includes a header telling you which model was actually picked:

```
x-litellm-adaptive-router-model: gpt-4o
```

The "thanks!" turn in the example above fires a satisfaction signal; that's what moves the bandit.
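The limitations section below notes that signals are regex-based and English-biased. A minimal sketch of what a satisfaction detector could look like (the names and patterns here are illustrative assumptions, not the actual code in `signals.py`):

```python
import re

# Hypothetical satisfaction detector. The real signal extraction lives in
# litellm/router_strategy/adaptive_router/signals.py; this only illustrates
# the regex-based, English-biased approach the docs describe.
SATISFACTION_PATTERNS = [
    re.compile(r"\b(thanks|thank you|perfect|that worked|great|awesome)\b", re.IGNORECASE),
]

def is_satisfaction_signal(user_message: str) -> bool:
    """Return True if the user's turn looks like positive feedback."""
    return any(p.search(user_message) for p in SATISFACTION_PATTERNS)
```

Under this sketch, `is_satisfaction_signal("perfect, that worked. thanks!")` matches, while a plain follow-up request does not.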
## Tuning cost vs. quality

The `weights` are your main lever:

| Goal | quality | cost |
|---|---|---|
| Minimize cost, quality is secondary | 0.3 | 0.7 |
| Balanced | 0.5 | 0.5 |
| Quality-first (default) | 0.7 | 0.3 |
| Quality non-negotiable | 0.9 | 0.1 |

The router learns over time. For the first ~10 requests per model, it relies on the tiers you declared. After that, real performance data takes over.
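To see why the weights matter, here is a hypothetical scoring sketch of how a quality/cost trade-off could be combined into one number (the normalization scheme and function are assumptions for illustration, not LiteLLM's actual scoring code):

```python
# Hypothetical scoring sketch: reward quality, penalize normalized cost.
def score(quality_mean: float, cost_per_token: float, max_cost: float,
          w_quality: float = 0.7, w_cost: float = 0.3) -> float:
    """Higher is better. Cost is normalized against the priciest candidate."""
    normalized_cost = cost_per_token / max_cost  # in [0, 1] within the candidate set
    return w_quality * quality_mean + w_cost * (1.0 - normalized_cost)

# With quality-first weights (0.7/0.3), a large quality gap can outweigh
# a large cost gap:
cheap = score(quality_mean=0.5,  cost_per_token=0.00000015, max_cost=0.0000025)
smart = score(quality_mean=0.95, cost_per_token=0.0000025,  max_cost=0.0000025)
```

With these numbers the expensive model still wins; flip the weights to 0.3/0.7 and the cheap model takes over.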
## Force a minimum quality tier per request

If a specific request needs a frontier model regardless of cost, pass this header:

```
x-litellm-min-quality-tier: 3
```

You can also pass `min_quality_tier` via request metadata instead of a header.
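Conceptually, the tier constraint just narrows the candidate set before scoring. A sketch, assuming each candidate carries the `quality_tier` declared in its `adaptive_router_preferences` (the fallback-to-all behavior is an assumption, not documented):

```python
# Hypothetical min-tier filter applied before the quality/cost scoring step.
def filter_by_min_tier(candidates: list[dict], min_quality_tier: int) -> list[dict]:
    """Drop candidates below the requested tier; fall back to all if none qualify."""
    eligible = [c for c in candidates if c["quality_tier"] >= min_quality_tier]
    return eligible or candidates

models = [
    {"name": "gpt-4o", "quality_tier": 3},
    {"name": "gpt-4o-mini", "quality_tier": 2},
]
```

With `min_quality_tier=3`, only `gpt-4o` survives the filter here.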
## What's being learned

The router classifies each request into one of 7 types and tracks how each model performs on each type independently. A model that's great at factual lookup but poor at code will win factual requests and lose code requests, even if it's cheaper overall.

| Type | Example |
|---|---|
| `code_generation` | "write me a Python sort function" |
| `code_understanding` | "explain what this function does" |
| `technical_design` | "how should I design this API?" |
| `analytical_reasoning` | "calculate the probability that..." |
| `writing` | "draft an email to my team about..." |
| `factual_lookup` | "what is the capital of France?" |
| `general` | anything else |

[**See classifier code**](https://github.com/BerriAI/litellm/blob/litellm_adaptive_routing/litellm/router_strategy/adaptive_router/classifier.py)

Learning signals are inspired by [Signals: Trajectory Sampling and Triage for Agentic Interactions](https://arxiv.org/pdf/2604.00356).
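The shape of such a classifier can be sketched as a first-match rule list (the keywords below are illustrative assumptions; see the linked `classifier.py` for the real rules):

```python
import re

# Hypothetical first-match request-type classifier, in the spirit of the
# regex-based approach the limitations section describes.
_RULES = [
    ("code_generation",      re.compile(r"\b(write|build|implement|script)\b", re.I)),
    ("code_understanding",   re.compile(r"\bexplain\b", re.I)),
    ("technical_design",     re.compile(r"\b(design|architecture|api)\b", re.I)),
    ("analytical_reasoning", re.compile(r"\b(calculate|probability|prove)\b", re.I)),
    ("writing",              re.compile(r"\b(draft|email|essay|blog)\b", re.I)),
    ("factual_lookup",       re.compile(r"\b(what is|who is|capital of)\b", re.I)),
]

def classify(prompt: str) -> str:
    """Return the first matching request type, else 'general'."""
    for request_type, pattern in _RULES:
        if pattern.search(prompt):
            return request_type
    return "general"
```

Rule order matters with first-match semantics: "write me a Python sort function" hits `code_generation` before `factual_lookup` can fire.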
## Inspect the current state

```
GET /adaptive_router/{router_name}/state
```

Returns current quality estimates per model per request type. Useful for understanding why a model is or isn't being picked.

```json
{
  "routers": [
    {
      "router_name": "smart-cheap-router",
      "available_models": ["fast", "smart"],
      "weights": { "quality": 0.7, "cost": 0.3 },
      "cells": [
        {
          "request_type": "analytical_reasoning",
          "model": "fast",
          "quality_mean": 0.5,
          "samples": 10.0
        },
        {
          "request_type": "analytical_reasoning",
          "model": "smart",
          "quality_mean": 0.95,
          "samples": 10.0
        }
      ]
    }
  ]
}
```

`quality_mean` is the key number: the router's current estimate of how well that model handles that request type. `samples` counts how many real observations have moved the prior (it starts at 10, the cold-start mass).
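The migration in this PR stores `alpha` and `beta` per (router, request_type, model), which is the classic Beta-Bernoulli bandit parameterization. Assuming that interpretation (not confirmed by the docs), `quality_mean` would be the Beta posterior mean:

```python
# Hypothetical Beta-posterior reading of the alpha/beta columns.
def quality_mean(alpha: float, beta: float) -> float:
    """Posterior mean of a Beta(alpha, beta) quality estimate."""
    return alpha / (alpha + beta)

def update(alpha: float, beta: float, positive_signal: bool) -> tuple[float, float]:
    """One observation moves the prior: success bumps alpha, failure bumps beta."""
    return (alpha + 1.0, beta) if positive_signal else (alpha, beta + 1.0)

# A cold-start prior with mass 10 and mean 0.5 would be Beta(5, 5); each
# satisfaction/failure signal then nudges the mean up or down.
a, b = 5.0, 5.0
```

Under this reading, a model starts at `quality_mean = 0.5` and a single satisfaction signal moves it slightly above 0.5.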
## Known limitations

- Latency isn't scored; a slow model can still win on quality + cost
- Signals are regex-based and English-biased; there is no LLM judge
- Hard cap of 200 observations per cell; no decay yet
- Once a model is picked for a session, other models' turns in that session don't contribute to learning
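The 200-observation cap with no decay means a cell's estimate eventually freezes. A sketch of what that implies, assuming a simple incremental-mean update (the update rule here is a guess for illustration; the actual code may update alpha/beta instead):

```python
# Hypothetical sketch of the per-cell observation cap described above.
MAX_SAMPLES_PER_CELL = 200

def observe(quality_mean: float, samples: int, reward: float) -> tuple[float, int]:
    """Incremental-mean update, frozen once the cap is reached (no decay)."""
    if samples >= MAX_SAMPLES_PER_CELL:
        return quality_mean, samples  # saturated cell: new signals are ignored
    samples += 1
    quality_mean += (reward - quality_mean) / samples
    return quality_mean, samples
```

The practical consequence: once a cell saturates, a model that regresses in quality keeps its old (now stale) estimate until decay lands.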
New file (39 additions, 0 deletions): ...s/litellm_proxy_extras/migrations/20260418000000_add_adaptive_router_tables/migration.sql
```sql
-- One row per (router, request_type, model). Hot path on every routing decision.
CREATE TABLE "LiteLLM_AdaptiveRouterState" (
  router_name     TEXT NOT NULL,
  request_type    TEXT NOT NULL,
  model_name      TEXT NOT NULL,
  alpha           DOUBLE PRECISION NOT NULL,
  beta            DOUBLE PRECISION NOT NULL,
  total_samples   INTEGER NOT NULL DEFAULT 0,
  last_updated_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (router_name, request_type, model_name)
);

-- One row per (session, router, model). Updated per turn via the queue.
CREATE TABLE "LiteLLM_AdaptiveRouterSession" (
  session_id             TEXT NOT NULL,
  router_name            TEXT NOT NULL,
  model_name             TEXT NOT NULL,
  classified_type        TEXT NOT NULL,
  misalignment_count     INTEGER DEFAULT 0,
  stagnation_count       INTEGER DEFAULT 0,
  disengagement_count    INTEGER DEFAULT 0,
  satisfaction_count     INTEGER DEFAULT 0,
  failure_count          INTEGER DEFAULT 0,
  loop_count             INTEGER DEFAULT 0,
  exhaustion_count       INTEGER DEFAULT 0,
  last_user_content      TEXT,
  last_assistant_content TEXT,
  tool_call_history      JSONB DEFAULT '[]',
  pending_tool_calls     JSONB DEFAULT '{}',
  turn_count             INTEGER DEFAULT 0,
  last_processed_turn    INTEGER DEFAULT -1,
  clean_credit_awarded   BOOLEAN DEFAULT FALSE,
  terminal_status        INTEGER,
  last_activity_at       TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (session_id, router_name, model_name)
);

CREATE INDEX "idx_adaptive_router_session_activity"
  ON "LiteLLM_AdaptiveRouterSession" (last_activity_at);
```
34 files renamed without changes.
The example proxy config is rewritten; the previous commented-out routing examples and 100x/200x custom-pricing test models are replaced with:

```yaml
# Example proxy config for the adaptive router (v0).
#
# Wires one logical router ("smart-cheap-router") that adaptively picks between
# two real deployments ("fast" and "smart") based on per-session feedback signals.
#
# How to use from a client:
#   POST /v1/chat/completions { "model": "smart-cheap-router", ... }
#   Add { "metadata": { "litellm_session_id": "<your-session-id>" } } to enable
#   sticky-session routing within a conversation.
#
# Required env vars: OPENAI_API_KEY, DATABASE_URL.

model_list:
  # ---- The adaptive router "control" deployment -------------------------
  # `model_name` is what clients call. `available_models` lists the underlying
  # deployments the router is allowed to pick from (must match other model_name
  # entries in this list).
  - model_name: smart-cheap-router
    litellm_params:
      model: auto_router/adaptive_router
      adaptive_router_config:
        available_models: ["fast", "smart"]
        weights:
          quality: 0.7
          cost: 0.3

  # ---- Underlying deployments the router picks from ---------------------
  - model_name: fast
    litellm_params:
      model: anthropic/claude-sonnet-4-6
      api_key: os.environ/ANTHROPIC_API_KEY
      input_cost_per_token: 0.00000015
    model_info:
      adaptive_router_preferences:
        quality_tier: 2
        strengths: []

  - model_name: smart
    litellm_params:
      model: anthropic/claude-opus-4-7
      api_key: os.environ/ANTHROPIC_API_KEY
      input_cost_per_token: 0.0000050
    model_info:
      adaptive_router_preferences:
        quality_tier: 3
        strengths: ["code_generation", "technical_design", "analytical_reasoning"]

litellm_settings:
  drop_params: True

general_settings:
  master_key: sk-1234  # REPLACE in production
```
Review comment: The docs (and PR description) advertise `GET /adaptive_router/{router_name}/state`, but the actual implementation in `proxy_server.py` registers it as `GET /adaptive_router/state` with no path parameter. Any caller following this documentation will receive a 404, and there is no per-router filtering available through the current endpoint shape.

Either update the docs to `GET /adaptive_router/state` or add the `{router_name}` path parameter to the implementation and filter `llm_router.adaptive_routers` by it.