Merged · 21 commits
dd4a1d2
feat: add adaptive routing to litellm
krrish-berri-2 Apr 18, 2026
924fa6a
feat: commit new adaptive routing
krrish-berri-2 Apr 19, 2026
70caf5a
docs: update docs
krrish-berri-2 Apr 19, 2026
43d23e9
chore: revert UI build artifacts
krrish-berri-2 Apr 20, 2026
dedc219
fix: minor improvements
krrish-berri-2 Apr 20, 2026
3cf0460
chore: revert uv.lock to match main
krrish-berri-2 Apr 20, 2026
f0efc5f
test: cover _finalize_adaptive_router_if_configured
krrish-berri-2 Apr 20, 2026
24a2e3e
fix: address CI violations for adaptive router
krrish-berri-2 Apr 20, 2026
fba736c
fix(adaptive_router): 3 P1 review defects
krrish-berri-2 Apr 20, 2026
0cfcec6
fix(adaptive_router/hooks): populate tool_results so failure signal f…
krrish-berri-2 Apr 20, 2026
b6fc75b
Merge branch 'litellm_internal_staging' into litellm_adaptive_routing
krrish-berri-2 Apr 20, 2026
d053355
style: apply black formatting
krrish-berri-2 Apr 20, 2026
e99955a
test(adaptive_router/hooks): align stale tests with current hook API
krrish-berri-2 Apr 20, 2026
bcc093d
fix(adaptive_router): enforce satisfaction gate, stop false-flagging …
krrish-berri-2 Apr 21, 2026
bd3ee98
fix(adaptive_router): bound owner cache, drop PK from upsert update, …
krrish-berri-2 Apr 21, 2026
c7342bd
Merge branch 'litellm_internal_staging' into litellm_adaptive_routing
krrish-berri-2 Apr 21, 2026
ecd9a83
fix(adaptive_router): P2 review items — @updatedAt + snapshot samples
krrish-berri-2 Apr 21, 2026
f1da202
fix(adaptive_router): P1 flusher hot-reload + P2 hook accumulation + CI
krrish-berri-2 Apr 22, 2026
37fc6f6
fix(adaptive_router/signals): rename 'args' to 'call_args' in _signature
krrish-berri-2 Apr 22, 2026
1965c67
style: black format signals.py
krrish-berri-2 Apr 22, 2026
e50f945
refactor(adaptive_router): move update_queue out of litellm.proxy
krrish-berri-2 Apr 22, 2026
155 changes: 155 additions & 0 deletions docs/my-website/docs/adaptive_router.md
@@ -0,0 +1,155 @@
# [BETA] Adaptive Router

:::info

Beta feature. Share feedback on [Discord](https://discord.gg/wuPM9dRgDw) or [Slack](https://join.slack.com/t/litellmossslack/shared_invite/zt-3o7nkuyfr-p_kbNJj8taRfXGgQI1~YyA).

:::

**Requirements:** LiteLLM Proxy with a Postgres database. Quality estimates are stored in Postgres and loaded on startup — without a database the router works but forgets everything learned on restart.

You have a cheap model and an expensive one. You want to use the cheap one when it's good enough, and the expensive one when it actually matters — without hardcoding rules you'll spend months tuning.

The adaptive router does this automatically. It tracks which model performs best for each type of request (code, writing, analysis, etc.) and routes accordingly, balancing quality against cost based on weights you control.

## Quick start

```yaml
model_list:
  - model_name: gpt-4o
    litellm_params:
      model: openai/gpt-4o
    model_info:
      input_cost_per_token: 0.0000025
      adaptive_router_preferences:
        quality_tier: 3          # 1=budget, 2=mid, 3=frontier
        strengths: ["code_generation", "analytical_reasoning"]

  - model_name: gpt-4o-mini
    litellm_params:
      model: openai/gpt-4o-mini
    model_info:
      input_cost_per_token: 0.00000015
      adaptive_router_preferences:
        quality_tier: 2
        strengths: ["factual_lookup"]

  - model_name: my-router
    litellm_params:
      model: adaptive_router/smart-router
      adaptive_router_config:
        available_models: ["gpt-4o", "gpt-4o-mini"]
        weights:
          quality: 0.7   # raise if you get quality complaints; lower if the bill is too high
          cost: 0.3      # must sum to 1.0 with quality
```

Route to it by setting `model` to your adaptive router's name:

```bash
curl -X POST {{baseURL}}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $LITELLM_API_KEY" \
  -d '{
    "model": "my-router",
    "messages": [
      {"role": "user", "content": "build me a python script that parses CSV"},
      {"role": "assistant", "content": "Here is a script using csv.DictReader..."},
      {"role": "user", "content": "now add error handling for missing files"},
      {"role": "assistant", "content": "Wrap the open() call in a try/except FileNotFoundError..."},
      {"role": "user", "content": "perfect, that worked. thanks!"}
    ]
  }'
```

The response includes a header telling you which model was actually picked:

```
x-litellm-adaptive-router-model: gpt-4o
```

The "thanks!" turn in the example above fires a satisfaction signal — that's what moves the bandit.
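Satisfaction detection is regex-based, so a minimal sketch looks roughly like the following. The phrase list and function name here are illustrative assumptions, not LiteLLM's actual patterns (those live in the adaptive router's signals module):

```python
import re

# Hypothetical satisfaction detector; the phrases below are examples only,
# not LiteLLM's real pattern list.
_SATISFACTION_RE = re.compile(
    r"\b(thanks|thank you|perfect|that worked|exactly what i (wanted|needed))\b",
    re.IGNORECASE,
)

def looks_satisfied(last_user_turn: str) -> bool:
    """True when the final user message reads like positive feedback."""
    return _SATISFACTION_RE.search(last_user_turn) is not None
```

On the example conversation above, the final turn "perfect, that worked. thanks!" trips the pattern, while an ordinary follow-up request does not.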

## Tuning cost vs. quality

The `weights` are your main lever:

| Goal | quality | cost |
|---|---|---|
| Minimize cost, quality is secondary | 0.3 | 0.7 |
| Balanced | 0.5 | 0.5 |
| Quality-first (default) | 0.7 | 0.3 |
| Quality non-negotiable | 0.9 | 0.1 |

The router learns over time. For the first ~10 requests per model, it relies on the tiers you declared. After that, real performance data takes over.
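As a rough mental model of that blend (not LiteLLM's exact formula), the declared tier acts as a prior that observed quality gradually outweighs, and the weights then trade quality against normalized cost. The function name, the tier/3 mapping, and the blending rule below are assumptions for illustration:

```python
def route_score(quality_mean: float, samples: float, quality_tier: int,
                cost_per_token: float, max_cost_per_token: float,
                w_quality: float = 0.7, w_cost: float = 0.3) -> float:
    """Illustrative scoring sketch, not LiteLLM's actual implementation.

    With no observations the declared tier dominates; as `samples` grows
    past the ~10-request cold-start mass, observed quality takes over.
    """
    cold_start_mass = 10.0
    prior = quality_tier / 3.0                      # map tiers 1-3 onto [0, 1]
    blend = samples / (samples + cold_start_mass)   # 0 cold -> 1 with data
    quality = blend * quality_mean + (1.0 - blend) * prior
    cost_penalty = cost_per_token / max_cost_per_token  # normalize to [0, 1]
    return w_quality * quality - w_cost * cost_penalty
```

The highest-scoring model wins; raising `w_cost` makes the cheap model's low `cost_penalty` count for more.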

## Force a minimum quality tier per request

If a specific request needs a frontier model regardless of cost, pass this header:

```
x-litellm-min-quality-tier: 3
```

You can also pass `min_quality_tier` via request metadata instead of a header.

## What's being learned

The router classifies each request into one of 7 types and tracks how each model performs on each independently. A model that's great at factual lookup but poor at code will win factual requests and lose code requests — even if it's cheaper overall.

| Type | Example |
|---|---|
| `code_generation` | "write me a Python sort function" |
| `code_understanding` | "explain what this function does" |
| `technical_design` | "how should I design this API?" |
| `analytical_reasoning` | "calculate the probability that..." |
| `writing` | "draft an email to my team about..." |
| `factual_lookup` | "what is the capital of France?" |
| `general` | anything else |

[**See classifier code**](https://github.com/BerriAI/litellm/blob/litellm_adaptive_routing/litellm/router_strategy/adaptive_router/classifier.py)
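For intuition, a keyword classifier over these seven types can be sketched as below. The patterns are invented examples, assuming the real classifier (linked above) is regex-based; see the linked source for the actual rules:

```python
import re

# Illustrative first-match keyword classifier; patterns are examples only.
_PATTERNS = {
    "code_generation": re.compile(r"\b(write|script|function|implement)\b", re.IGNORECASE),
    "code_understanding": re.compile(r"\b(explain|what does this)\b", re.IGNORECASE),
    "technical_design": re.compile(r"\b(design|architecture|api)\b", re.IGNORECASE),
    "analytical_reasoning": re.compile(r"\b(calculate|probability|prove)\b", re.IGNORECASE),
    "writing": re.compile(r"\b(draft|email|essay)\b", re.IGNORECASE),
    "factual_lookup": re.compile(r"\b(what is|who is|capital of)\b", re.IGNORECASE),
}

def classify_request(prompt: str) -> str:
    """Return the first matching request type, falling back to 'general'."""
    for request_type, pattern in _PATTERNS.items():
        if pattern.search(prompt):
            return request_type
    return "general"
```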

Learning signals are inspired by [Signals: Trajectory Sampling and Triage for Agentic Interactions](https://arxiv.org/pdf/2604.00356).

## Inspect the current state

```
GET /adaptive_router/{router_name}/state
```

> **Contributor review comment (P1): State endpoint path mismatch — documented route returns 404**
>
> The docs (and PR description) advertise `GET /adaptive_router/{router_name}/state`, but the actual implementation in `proxy_server.py` registers it as `GET /adaptive_router/state` with no path parameter. Any caller following this documentation will receive a 404, and there is no per-router filtering available through the current endpoint shape.
>
> Either update the docs to `GET /adaptive_router/state` or add the `{router_name}` path parameter to the implementation and filter `llm_router.adaptive_routers` by it.

Returns current quality estimates per model per request type. Useful for understanding why a model is or isn't being picked.

```json
{
  "routers": [
    {
      "router_name": "smart-cheap-router",
      "available_models": ["fast", "smart"],
      "weights": { "quality": 0.7, "cost": 0.3 },
      "cells": [
        {
          "request_type": "analytical_reasoning",
          "model": "fast",
          "quality_mean": 0.5,
          "samples": 10.0
        },
        {
          "request_type": "analytical_reasoning",
          "model": "smart",
          "quality_mean": 0.95,
          "samples": 10.0
        }
      ]
    }
  ]
}
```

`quality_mean` is the key number — it's the router's current estimate of how well that model handles that request type. `samples` counts how many real observations have moved the prior (starts at 10, the cold-start mass).
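The schema in this PR stores each cell as a Beta posterior (`alpha`/`beta` columns), so both numbers fall out of standard Beta-Bernoulli bookkeeping. A minimal sketch, assuming a symmetric Beta(5, 5) cold-start prior (which produces the 0.5 mean and 10.0 samples seen above) and a reward in [0, 1]; the actual reward mapping is internal to the router:

```python
from dataclasses import dataclass

@dataclass
class BetaCell:
    """One (router, request_type, model) cell, mirroring the
    LiteLLM_AdaptiveRouterState alpha/beta columns."""
    alpha: float = 5.0  # assumed Beta(5, 5) cold-start prior: mean 0.5, mass 10
    beta: float = 5.0

    @property
    def quality_mean(self) -> float:
        return self.alpha / (self.alpha + self.beta)

    @property
    def samples(self) -> float:
        return self.alpha + self.beta

    def observe(self, reward: float) -> None:
        """Fold in one observation; reward 1.0 = satisfaction, 0.0 = failure."""
        if self.samples >= 200:  # hard cap noted under Known limitations
            return
        self.alpha += reward
        self.beta += 1.0 - reward
```

A satisfaction signal bumps `alpha`, a failure signal bumps `beta`, and `quality_mean` drifts toward the model's observed hit rate.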

## Known limitations

- Latency isn't scored — a slow model can still win on quality + cost
- Signals are regex-based and English-biased — no LLM judge
- Hard cap of 200 observations per cell; no decay yet
- Once a model is picked for a session, other models' turns in that session don't contribute to learning
1 change: 1 addition & 0 deletions docs/my-website/package-lock.json


1 change: 1 addition & 0 deletions docs/my-website/package.json
@@ -30,6 +30,7 @@
   },
   "devDependencies": {
     "@docusaurus/module-type-aliases": "3.8.1",
+    "ajv": "^8.18.0",
     "dotenv": "16.6.1"
   },
   "browserslist": {
1 change: 1 addition & 0 deletions docs/my-website/sidebars.js
@@ -1052,6 +1052,7 @@ const sidebars = {
       },
       items: [
         "routing",
+        "adaptive_router",
         "scheduler",
         "proxy/auto_routing",
         "proxy/load_balancing",
@@ -0,0 +1,39 @@
-- One row per (router, request_type, model). Hot path on every routing decision.
CREATE TABLE "LiteLLM_AdaptiveRouterState" (
    router_name      TEXT NOT NULL,
    request_type     TEXT NOT NULL,
    model_name       TEXT NOT NULL,
    alpha            DOUBLE PRECISION NOT NULL,
    beta             DOUBLE PRECISION NOT NULL,
    total_samples    INTEGER NOT NULL DEFAULT 0,
    last_updated_at  TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (router_name, request_type, model_name)
);

-- One row per (session, router, model). Updated per turn via the queue.
CREATE TABLE "LiteLLM_AdaptiveRouterSession" (
    session_id              TEXT NOT NULL,
    router_name             TEXT NOT NULL,
    model_name              TEXT NOT NULL,
    classified_type         TEXT NOT NULL,
    misalignment_count      INTEGER DEFAULT 0,
    stagnation_count        INTEGER DEFAULT 0,
    disengagement_count     INTEGER DEFAULT 0,
    satisfaction_count      INTEGER DEFAULT 0,
    failure_count           INTEGER DEFAULT 0,
    loop_count              INTEGER DEFAULT 0,
    exhaustion_count        INTEGER DEFAULT 0,
    last_user_content       TEXT,
    last_assistant_content  TEXT,
    tool_call_history       JSONB DEFAULT '[]',
    pending_tool_calls      JSONB DEFAULT '{}',
    turn_count              INTEGER DEFAULT 0,
    last_processed_turn     INTEGER DEFAULT -1,
    clean_credit_awarded    BOOLEAN DEFAULT FALSE,
    terminal_status         INTEGER,
    last_activity_at        TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (session_id, router_name, model_name)
);

CREATE INDEX "idx_adaptive_router_session_activity"
    ON "LiteLLM_AdaptiveRouterSession" (last_activity_at);
43 changes: 43 additions & 0 deletions litellm-proxy-extras/litellm_proxy_extras/schema.prisma
@@ -1219,3 +1219,46 @@ model LiteLLM_ClaudeCodePluginTable {

@@map("LiteLLM_ClaudeCodePluginTable")
}

// Per-(router, request_type, model) Beta posterior for the adaptive router.
model LiteLLM_AdaptiveRouterState {
  router_name     String
  request_type    String
  model_name      String
  alpha           Float
  beta            Float
  total_samples   Int      @default(0)
  last_updated_at DateTime @default(now())

  @@id([router_name, request_type, model_name])
}

// Per-(session, router, model) signal counters for the adaptive router.
model LiteLLM_AdaptiveRouterSession {
  session_id      String
  router_name     String
  model_name      String
  classified_type String

  misalignment_count  Int @default(0)
  stagnation_count    Int @default(0)
  disengagement_count Int @default(0)
  satisfaction_count  Int @default(0)
  failure_count       Int @default(0)
  loop_count          Int @default(0)
  exhaustion_count    Int @default(0)

  last_user_content      String?
  last_assistant_content String?
  tool_call_history      Json @default("[]")
  pending_tool_calls     Json @default("{}")

  turn_count           Int      @default(0)
  last_processed_turn  Int      @default(-1)
  clean_credit_awarded Boolean  @default(false)
  terminal_status      Int?
  last_activity_at     DateTime @default(now())

  @@id([session_id, router_name, model_name])
  @@index([last_activity_at])
}
1 change: 1 addition & 0 deletions litellm/constants.py
@@ -164,6 +164,7 @@
 LITELLM_UI_ALLOW_HEADERS = [
     "x-litellm-semantic-filter",
     "x-litellm-semantic-filter-tools",
+    "x-litellm-adaptive-router-model",
 ]
 
 # Gemini model-specific minimal thinking budget constants
97 changes: 74 additions & 23 deletions litellm/proxy/_new_secret_config.yaml
@@ -1,32 +1,83 @@
-model_list:
+# model_list:
+#   - model_name: claude-sonnet-4-6
+#     litellm_params: {model: anthropic/claude-sonnet-4-6}
+#     model_info:
+#       litellm_routing_preferences:
+#         quality_tier: 1
+#         keywords: [tin]
+#   - model_name: gpt-4o-mini
+#     litellm_params: {model: openai/gpt-4o-mini}
+#     model_info:
+#       litellm_routing_preferences:
+#         quality_tier: 1
+#         keywords: []
+#   - model_name: gpt-4o
+#     litellm_params: {model: openai/gpt-4o}
+#     model_info:
+#       litellm_routing_preferences:
+#         quality_tier: 2
+#         keywords: [vision, function_calling]
+#   - model_name: opus
+#     litellm_params: {model: anthropic/claude-opus-4-7}
+#     model_info:
+#       litellm_routing_preferences:
+#         quality_tier: 3
+#         keywords: ["architecture", "design"]
+#   - model_name: my-quality-router
+#     litellm_params:
+#       model: auto_router/adaptive_router
+#       adaptive_router_default_model: gpt-4o-mini
+#       adaptive_router_config:
+#         available_models: [gpt-4o-mini, gpt-4o, opus, claude-sonnet-4-6]
+# Example proxy config for the adaptive router (v0).
+#
+# Wires one logical router ("smart-cheap-router") that adaptively picks between
+# two real deployments ("fast" and "smart") based on per-session feedback signals.
+#
+# How to use from a client:
+#   POST /v1/chat/completions { "model": "smart-cheap-router", ... }
+#   Add { "metadata": { "litellm_session_id": "<your-session-id>" } } to enable
+#   sticky-session routing within a conversation.
+#
+# Required env vars: OPENAI_API_KEY, DATABASE_URL.
+
-# OpenAI model for /v1/chat/completions test — 200x custom pricing
-- model_name: "gpt-4.1-mini"
+model_list:
+# ---- The adaptive router "control" deployment -------------------------
+# `model_name` is what clients call. `available_models` lists the underlying
+# deployments the router is allowed to pick from (must match other model_name
+# entries in this list).
+- model_name: smart-cheap-router
   litellm_params:
-    model: openai/gpt-4.1-mini
-    api_key: os.environ/OPENAI_API_KEY
-  model_info:
-    id: gpt-4.1-mini-custom-pricing
-    input_cost_per_token: 0.00004 # 100x standard ($0.40/1M = $0.0000004)
-    output_cost_per_token: 0.00016 # 100x standard ($1.60/1M = $0.0000016)
+    model: auto_router/adaptive_router
+    adaptive_router_config:
+      available_models: ["fast", "smart"]
+      weights:
+        quality: 0.7
+        cost: 0.3
 
-# OpenAI model for /v1/responses test — 100x custom pricing
-- model_name: "gpt-5"
+# ---- Underlying deployments the router picks from ---------------------
+- model_name: fast
   litellm_params:
-    model: openai/gpt-5
-    api_key: os.environ/OPENAI_API_KEY
+    model: anthropic/claude-sonnet-4-6
+    api_key: os.environ/ANTHROPIC_API_KEY
+    input_cost_per_token: 0.00000015
   model_info:
-    id: gpt-5-custom-pricing
-    mode: "chat"
-    input_cost_per_token: 125 # 100x standard ($1.25/1M = $0.00000125)
-    output_cost_per_token: 10 # 100x standard ($10.00/1M = $0.00001)
+    adaptive_router_preferences:
+      quality_tier: 2
+      strengths: []
 
-# Anthropic model for /v1/messages test — 100x custom pricing
-- model_name: "claude-sonnet-4-20250514"
+- model_name: smart
   litellm_params:
-    model: anthropic/claude-sonnet-4-20250514
+    model: anthropic/claude-opus-4-7
     api_key: os.environ/ANTHROPIC_API_KEY
+    input_cost_per_token: 0.0000050
   model_info:
-    id: claude-sonnet-4-custom-pricing
-    input_cost_per_token: 0.0003 # 100x standard ($0.000003)
-    output_cost_per_token: 0.0015 # 100x standard ($0.000015)
+    adaptive_router_preferences:
+      quality_tier: 3
+      strengths: ["code_generation", "technical_design", "analytical_reasoning"]
 
+litellm_settings:
+  drop_params: True
+
+general_settings:
+  master_key: sk-1234 # REPLACE in production