Merged · 21 commits
dd4a1d2
feat: add adaptive routing to litellm
krrish-berri-2 Apr 18, 2026
924fa6a
feat: commit new adaptive routing
krrish-berri-2 Apr 19, 2026
70caf5a
docs: update docs
krrish-berri-2 Apr 19, 2026
43d23e9
chore: revert UI build artifacts
krrish-berri-2 Apr 20, 2026
dedc219
fix: minor improvements
krrish-berri-2 Apr 20, 2026
3cf0460
chore: revert uv.lock to match main
krrish-berri-2 Apr 20, 2026
f0efc5f
test: cover _finalize_adaptive_router_if_configured
krrish-berri-2 Apr 20, 2026
24a2e3e
fix: address CI violations for adaptive router
krrish-berri-2 Apr 20, 2026
fba736c
fix(adaptive_router): 3 P1 review defects
krrish-berri-2 Apr 20, 2026
0cfcec6
fix(adaptive_router/hooks): populate tool_results so failure signal f…
krrish-berri-2 Apr 20, 2026
b6fc75b
Merge branch 'litellm_internal_staging' into litellm_adaptive_routing
krrish-berri-2 Apr 20, 2026
d053355
style: apply black formatting
krrish-berri-2 Apr 20, 2026
e99955a
test(adaptive_router/hooks): align stale tests with current hook API
krrish-berri-2 Apr 20, 2026
bcc093d
fix(adaptive_router): enforce satisfaction gate, stop false-flagging …
krrish-berri-2 Apr 21, 2026
bd3ee98
fix(adaptive_router): bound owner cache, drop PK from upsert update, …
krrish-berri-2 Apr 21, 2026
c7342bd
Merge branch 'litellm_internal_staging' into litellm_adaptive_routing
krrish-berri-2 Apr 21, 2026
ecd9a83
fix(adaptive_router): P2 review items — @updatedAt + snapshot samples
krrish-berri-2 Apr 21, 2026
f1da202
fix(adaptive_router): P1 flusher hot-reload + P2 hook accumulation + CI
krrish-berri-2 Apr 22, 2026
37fc6f6
fix(adaptive_router/signals): rename 'args' to 'call_args' in _signature
krrish-berri-2 Apr 22, 2026
1965c67
style: black format signals.py
krrish-berri-2 Apr 22, 2026
e50f945
refactor(adaptive_router): move update_queue out of litellm.proxy
krrish-berri-2 Apr 22, 2026
155 changes: 155 additions & 0 deletions docs/my-website/docs/adaptive_router.md
@@ -0,0 +1,155 @@
# [BETA] Adaptive Router

:::info

Beta feature. Share feedback on [Discord](https://discord.gg/wuPM9dRgDw) or [Slack](https://join.slack.com/t/litellmossslack/shared_invite/zt-3o7nkuyfr-p_kbNJj8taRfXGgQI1~YyA).

:::

**Requirements:** LiteLLM Proxy with a Postgres database. Quality estimates are stored in Postgres and loaded on startup — without a database the router works but forgets everything learned on restart.

You have a cheap model and an expensive one. You want to use the cheap one when it's good enough, and the expensive one when it actually matters — without hardcoding rules you'll spend months tuning.

The adaptive router does this automatically. It tracks which model performs best for each type of request (code, writing, analysis, etc.) and routes accordingly, balancing quality against cost based on weights you control.

## Quick start

```yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
    model_info:
      input_cost_per_token: 0.0000025
      adaptive_router_preferences:
        quality_tier: 3          # 1=budget, 2=mid, 3=frontier
        strengths: ["code_generation", "analytical_reasoning"]

  - model_name: gpt-4o-mini
    litellm_params:
      model: openai/gpt-4o-mini
    model_info:
      input_cost_per_token: 0.00000015
      adaptive_router_preferences:
        quality_tier: 2
        strengths: ["factual_lookup"]

  - model_name: my-router
    litellm_params:
      model: adaptive_router/smart-router
      adaptive_router_config:
        available_models: ["gpt-4o", "gpt-4o-mini"]
        weights:
          quality: 0.7   # raise if you get quality complaints; lower if the bill is too high
          cost: 0.3      # must sum to 1.0 with quality
```

Route to it by setting `model` to your adaptive router's name:

```bash
curl -X POST {{baseURL}}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LITELLM_API_KEY" \
  -d '{
    "model": "my-router",
    "messages": [
      {"role": "user", "content": "build me a python script that parses CSV"},
      {"role": "assistant", "content": "Here is a script using csv.DictReader..."},
      {"role": "user", "content": "now add error handling for missing files"},
      {"role": "assistant", "content": "Wrap the open() call in a try/except FileNotFoundError..."},
      {"role": "user", "content": "perfect, that worked. thanks!"}
    ]
  }'
```

The response includes a header telling you which model was actually picked:

```
x-litellm-adaptive-router-model: gpt-4o
```

The "thanks!" turn in the example above fires a satisfaction signal — that's what moves the bandit.
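Satisfaction detection is regex-based, so a minimal sketch looks roughly like the following. The phrase list and function name here are illustrative assumptions, not LiteLLM's actual patterns (those live in the adaptive router's signals module):

```python
import re

# Hypothetical satisfaction detector; the phrases below are examples only,
# not LiteLLM's real pattern list.
_SATISFACTION_RE = re.compile(
    r"\b(thanks|thank you|perfect|that worked|exactly what i (wanted|needed))\b",
    re.IGNORECASE,
)

def looks_satisfied(last_user_turn: str) -> bool:
    """True when the final user message reads like positive feedback."""
    return _SATISFACTION_RE.search(last_user_turn) is not None
```

On the example conversation above, the final turn "perfect, that worked. thanks!" trips the pattern, while an ordinary follow-up request does not.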

## Tuning cost vs. quality

The `weights` are your main lever:

| Goal | quality | cost |
|---|---|---|
| Minimize cost, quality is secondary | 0.3 | 0.7 |
| Balanced | 0.5 | 0.5 |
| Quality-first (default) | 0.7 | 0.3 |
| Quality non-negotiable | 0.9 | 0.1 |

The router learns over time. For the first ~10 requests per model, it relies on the tiers you declared. After that, real performance data takes over.
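As a rough mental model of that blend (not LiteLLM's exact formula), the declared tier acts as a prior that observed quality gradually outweighs, and the weights then trade quality against normalized cost. The function name, the tier/3 mapping, and the blending rule below are assumptions for illustration:

```python
def route_score(quality_mean: float, samples: float, quality_tier: int,
                cost_per_token: float, max_cost_per_token: float,
                w_quality: float = 0.7, w_cost: float = 0.3) -> float:
    """Illustrative scoring sketch, not LiteLLM's actual implementation.

    With no observations the declared tier dominates; as `samples` grows
    past the ~10-request cold-start mass, observed quality takes over.
    """
    cold_start_mass = 10.0
    prior = quality_tier / 3.0                      # map tiers 1-3 onto [0, 1]
    blend = samples / (samples + cold_start_mass)   # 0 cold -> 1 with data
    quality = blend * quality_mean + (1.0 - blend) * prior
    cost_penalty = cost_per_token / max_cost_per_token  # normalize to [0, 1]
    return w_quality * quality - w_cost * cost_penalty
```

The highest-scoring model wins; raising `w_cost` makes the cheap model's low `cost_penalty` count for more.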

## Force a minimum quality tier per request

If a specific request needs a frontier model regardless of cost, pass this header:

```
x-litellm-min-quality-tier: 3
```

You can also pass `min_quality_tier` via request metadata instead of a header.

## What's being learned

The router classifies each request into one of 7 types and tracks how each model performs on each independently. A model that's great at factual lookup but poor at code will win factual requests and lose code requests — even if it's cheaper overall.

| Type | Example |
|---|---|
| `code_generation` | "write me a Python sort function" |
| `code_understanding` | "explain what this function does" |
| `technical_design` | "how should I design this API?" |
| `analytical_reasoning` | "calculate the probability that..." |
| `writing` | "draft an email to my team about..." |
| `factual_lookup` | "what is the capital of France?" |
| `general` | anything else |

[**See classifier code**](https://github.com/BerriAI/litellm/blob/litellm_adaptive_routing/litellm/router_strategy/adaptive_router/classifier.py)
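For intuition, a keyword classifier over these seven types can be sketched as below. The patterns are invented examples, assuming the real classifier (linked above) is regex-based; see the linked source for the actual rules:

```python
import re

# Illustrative first-match keyword classifier; patterns are examples only.
_PATTERNS = {
    "code_generation": re.compile(r"\b(write|script|function|implement)\b", re.IGNORECASE),
    "code_understanding": re.compile(r"\b(explain|what does this)\b", re.IGNORECASE),
    "technical_design": re.compile(r"\b(design|architecture|api)\b", re.IGNORECASE),
    "analytical_reasoning": re.compile(r"\b(calculate|probability|prove)\b", re.IGNORECASE),
    "writing": re.compile(r"\b(draft|email|essay)\b", re.IGNORECASE),
    "factual_lookup": re.compile(r"\b(what is|who is|capital of)\b", re.IGNORECASE),
}

def classify_request(prompt: str) -> str:
    """Return the first matching request type, falling back to 'general'."""
    for request_type, pattern in _PATTERNS.items():
        if pattern.search(prompt):
            return request_type
    return "general"
```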

Learning signals are inspired by [Signals: Trajectory Sampling and Triage for Agentic Interactions](https://arxiv.org/pdf/2604.00356).

## Inspect the current state

```
GET /adaptive_router/{router_name}/state
```

> **Contributor review comment (P1): State endpoint path mismatch — documented route returns 404**
>
> The docs (and PR description) advertise `GET /adaptive_router/{router_name}/state`, but the actual implementation in `proxy_server.py` registers it as `GET /adaptive_router/state` with no path parameter. Any caller following this documentation will receive a 404, and there is no per-router filtering available through the current endpoint shape.
>
> Either update the docs to `GET /adaptive_router/state` or add the `{router_name}` path parameter to the implementation and filter `llm_router.adaptive_routers` by it.

Returns current quality estimates per model per request type. Useful for understanding why a model is or isn't being picked.

```json
{
  "routers": [
    {
      "router_name": "smart-cheap-router",
      "available_models": ["fast", "smart"],
      "weights": { "quality": 0.7, "cost": 0.3 },
      "cells": [
        {
          "request_type": "analytical_reasoning",
          "model": "fast",
          "quality_mean": 0.5,
          "samples": 10.0
        },
        {
          "request_type": "analytical_reasoning",
          "model": "smart",
          "quality_mean": 0.95,
          "samples": 10.0
        }
      ]
    }
  ]
}
```

`quality_mean` is the key number — it's the router's current estimate of how well that model handles that request type. `samples` counts how many real observations have moved the prior (starts at 10, the cold-start mass).
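The schema in this PR stores each cell as a Beta posterior (`alpha`/`beta` columns), so both numbers fall out of standard Beta-Bernoulli bookkeeping. A minimal sketch, assuming a symmetric Beta(5, 5) cold-start prior (which produces the 0.5 mean and 10.0 samples seen above) and a reward in [0, 1]; the actual reward mapping is internal to the router:

```python
from dataclasses import dataclass

@dataclass
class BetaCell:
    """One (router, request_type, model) cell, mirroring the
    LiteLLM_AdaptiveRouterState alpha/beta columns."""
    alpha: float = 5.0  # assumed Beta(5, 5) cold-start prior: mean 0.5, mass 10
    beta: float = 5.0

    @property
    def quality_mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    @property
    def samples(self) -> float:
        return self.alpha + self.beta

    def observe(self, reward: float) -> None:
        """Fold in one observation; reward 1.0 = satisfaction, 0.0 = failure."""
        if self.samples >= 200:  # hard cap noted under Known limitations
            return
        self.alpha += reward
        self.beta += 1.0 - reward
```

A satisfaction signal bumps `alpha`, a failure signal bumps `beta`, and `quality_mean` drifts toward the model's observed hit rate.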

## Known limitations

- Latency isn't scored — a slow model can still win on quality + cost
- Signals are regex-based and English-biased — no LLM judge
- Hard cap of 200 observations per cell; no decay yet
- Once a model is picked for a session, other models' turns in that session don't contribute to learning
1 change: 1 addition & 0 deletions docs/my-website/package-lock.json


1 change: 1 addition & 0 deletions docs/my-website/package.json
@@ -30,6 +30,7 @@
   },
   "devDependencies": {
     "@docusaurus/module-type-aliases": "3.8.1",
+    "ajv": "^8.18.0",
     "dotenv": "16.6.1"
   },
   "browserslist": {
1 change: 1 addition & 0 deletions docs/my-website/sidebars.js
@@ -1052,6 +1052,7 @@ const sidebars = {
       },
       items: [
         "routing",
+        "adaptive_router",
         "scheduler",
         "proxy/auto_routing",
         "proxy/load_balancing",
@@ -0,0 +1,39 @@
-- One row per (router, request_type, model). Hot path on every routing decision.
CREATE TABLE "LiteLLM_AdaptiveRouterState" (
    router_name      TEXT NOT NULL,
    request_type     TEXT NOT NULL,
    model_name       TEXT NOT NULL,
    alpha            DOUBLE PRECISION NOT NULL,
    beta             DOUBLE PRECISION NOT NULL,
    total_samples    INTEGER NOT NULL DEFAULT 0,
    last_updated_at  TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (router_name, request_type, model_name)
);

-- One row per (session, router, model). Updated per turn via the queue.
CREATE TABLE "LiteLLM_AdaptiveRouterSession" (
    session_id              TEXT NOT NULL,
    router_name             TEXT NOT NULL,
    model_name              TEXT NOT NULL,
    classified_type         TEXT NOT NULL,
    misalignment_count      INTEGER DEFAULT 0,
    stagnation_count        INTEGER DEFAULT 0,
    disengagement_count     INTEGER DEFAULT 0,
    satisfaction_count      INTEGER DEFAULT 0,
    failure_count           INTEGER DEFAULT 0,
    loop_count              INTEGER DEFAULT 0,
    exhaustion_count        INTEGER DEFAULT 0,
    last_user_content       TEXT,
    last_assistant_content  TEXT,
    tool_call_history       JSONB DEFAULT '[]',
    pending_tool_calls      JSONB DEFAULT '{}',
    turn_count              INTEGER DEFAULT 0,
    last_processed_turn     INTEGER DEFAULT -1,
    clean_credit_awarded    BOOLEAN DEFAULT FALSE,
    terminal_status         INTEGER,
    last_activity_at        TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (session_id, router_name, model_name)
);

CREATE INDEX "idx_adaptive_router_session_activity"
    ON "LiteLLM_AdaptiveRouterSession" (last_activity_at);
43 changes: 43 additions & 0 deletions litellm-proxy-extras/litellm_proxy_extras/schema.prisma
@@ -1219,3 +1219,46 @@ model LiteLLM_ClaudeCodePluginTable {

@@map("LiteLLM_ClaudeCodePluginTable")
}

// Per-(router, request_type, model) Beta posterior for the adaptive router.
model LiteLLM_AdaptiveRouterState {
  router_name     String
  request_type    String
  model_name      String
  alpha           Float
  beta            Float
  total_samples   Int      @default(0)
  last_updated_at DateTime @default(now())

  @@id([router_name, request_type, model_name])
}

// Per-(session, router, model) signal counters for the adaptive router.
model LiteLLM_AdaptiveRouterSession {
  session_id      String
  router_name     String
  model_name      String
  classified_type String

  misalignment_count  Int @default(0)
  stagnation_count    Int @default(0)
  disengagement_count Int @default(0)
  satisfaction_count  Int @default(0)
  failure_count       Int @default(0)
  loop_count          Int @default(0)
  exhaustion_count    Int @default(0)

  last_user_content      String?
  last_assistant_content String?
  tool_call_history      Json @default("[]")
  pending_tool_calls     Json @default("{}")

  turn_count           Int      @default(0)
  last_processed_turn  Int      @default(-1)
  clean_credit_awarded Boolean  @default(false)
  terminal_status      Int?
  last_activity_at     DateTime @default(now())

  @@id([session_id, router_name, model_name])
  @@index([last_activity_at])
}
1 change: 1 addition & 0 deletions litellm/constants.py
@@ -164,6 +164,7 @@
 LITELLM_UI_ALLOW_HEADERS = [
     "x-litellm-semantic-filter",
     "x-litellm-semantic-filter-tools",
+    "x-litellm-adaptive-router-model",
 ]
 
 # Gemini model-specific minimal thinking budget constants
97 changes: 74 additions & 23 deletions litellm/proxy/_new_secret_config.yaml
@@ -1,32 +1,83 @@
-model_list:
+# model_list:
+#   - model_name: claude-sonnet-4-6
+#     litellm_params: {model: anthropic/claude-sonnet-4-6}
+#     model_info:
+#       litellm_routing_preferences:
+#         quality_tier: 1
+#         keywords: [tin]
+#   - model_name: gpt-4o-mini
+#     litellm_params: {model: openai/gpt-4o-mini}
+#     model_info:
+#       litellm_routing_preferences:
+#         quality_tier: 1
+#         keywords: []
+#   - model_name: gpt-4o
+#     litellm_params: {model: openai/gpt-4o}
+#     model_info:
+#       litellm_routing_preferences:
+#         quality_tier: 2
+#         keywords: [vision, function_calling]
+#   - model_name: opus
+#     litellm_params: {model: anthropic/claude-opus-4-7}
+#     model_info:
+#       litellm_routing_preferences:
+#         quality_tier: 3
+#         keywords: ["architecture", "design"]
+#   - model_name: my-quality-router
+#     litellm_params:
+#       model: auto_router/adaptive_router
+#       adaptive_router_default_model: gpt-4o-mini
+#       adaptive_router_config:
+#         available_models: [gpt-4o-mini, gpt-4o, opus, claude-sonnet-4-6]
+# Example proxy config for the adaptive router (v0).
+#
+# Wires one logical router ("smart-cheap-router") that adaptively picks between
+# two real deployments ("fast" and "smart") based on per-session feedback signals.
+#
+# How to use from a client:
+#   POST /v1/chat/completions { "model": "smart-cheap-router", ... }
+#   Add { "metadata": { "litellm_session_id": "<your-session-id>" } } to enable
+#   sticky-session routing within a conversation.
+#
+# Required env vars: OPENAI_API_KEY, DATABASE_URL.
+
-# OpenAI model for /v1/chat/completions test — 200x custom pricing
-- model_name: "gpt-4.1-mini"
+model_list:
+# ---- The adaptive router "control" deployment -------------------------
+# `model_name` is what clients call. `available_models` lists the underlying
+# deployments the router is allowed to pick from (must match other model_name
+# entries in this list).
+- model_name: smart-cheap-router
   litellm_params:
-    model: openai/gpt-4.1-mini
-    api_key: os.environ/OPENAI_API_KEY
-  model_info:
-    id: gpt-4.1-mini-custom-pricing
-    input_cost_per_token: 0.00004 # 100x standard ($0.40/1M = $0.0000004)
-    output_cost_per_token: 0.00016 # 100x standard ($1.60/1M = $0.0000016)
+    model: auto_router/adaptive_router
+    adaptive_router_config:
+      available_models: ["fast", "smart"]
+      weights:
+        quality: 0.7
+        cost: 0.3
 
-# OpenAI model for /v1/responses test — 100x custom pricing
-- model_name: "gpt-5"
+# ---- Underlying deployments the router picks from ---------------------
+- model_name: fast
   litellm_params:
-    model: openai/gpt-5
-    api_key: os.environ/OPENAI_API_KEY
+    model: anthropic/claude-sonnet-4-6
+    api_key: os.environ/ANTHROPIC_API_KEY
+    input_cost_per_token: 0.00000015
   model_info:
-    id: gpt-5-custom-pricing
-    mode: "chat"
-    input_cost_per_token: 125 # 100x standard ($1.25/1M = $0.00000125)
-    output_cost_per_token: 10 # 100x standard ($10.00/1M = $0.00001)
+    adaptive_router_preferences:
+      quality_tier: 2
+      strengths: []
 
-# Anthropic model for /v1/messages test — 100x custom pricing
-- model_name: "claude-sonnet-4-20250514"
+- model_name: smart
   litellm_params:
-    model: anthropic/claude-sonnet-4-20250514
+    model: anthropic/claude-opus-4-7
     api_key: os.environ/ANTHROPIC_API_KEY
+    input_cost_per_token: 0.0000050
   model_info:
-    id: claude-sonnet-4-custom-pricing
-    input_cost_per_token: 0.0003 # 100x standard ($0.000003)
-    output_cost_per_token: 0.0015 # 100x standard ($0.000015)
+    adaptive_router_preferences:
+      quality_tier: 3
+      strengths: ["code_generation", "technical_design", "analytical_reasoning"]
 
+litellm_settings:
+  drop_params: True
+
+general_settings:
+  master_key: sk-1234 # REPLACE in production