66 changes: 63 additions & 3 deletions litellm/router_strategy/complexity_router/complexity_router.py
@@ -8,6 +8,7 @@

Inspired by ClawRouter: https://github.com/BlockRunAI/ClawRouter
"""

import re
from typing import TYPE_CHECKING, Any, Dict, List, Optional, Tuple, Union

@@ -331,6 +332,57 @@ def get_model_for_tier(self, tier: ComplexityTier) -> str:
f"No model configured for tier {tier_key} and no default_model set"
)

def _get_provider_prefix(self, model: str) -> str:
"""Extract provider prefix from model name."""
if "/" in model:
return model.split("/")[0]
for prefix in ("vertex_ai", "anthropic", "bedrock", "openai", "azure", "aws"):
if model.startswith(prefix):
return prefix
Comment on lines +337 to +341 (Copilot AI, Apr 18, 2026):

_get_provider_prefix() only detects Anthropic when the model string is prefixed (e.g., anthropic/...) or starts with anthropic. In this codebase, the default ComplexityRouter tier model is claude-sonnet-4-20250514 (no provider prefix), so this will be treated as its own provider string and _should_strip_thinking_blocks() may not strip when routing away from Claude/Anthropic models. Consider using a provider-inference utility (e.g., Router/LiteLLM provider resolution) or at least a Claude heuristic (model.lower().startswith('claude') / contains claude) so Anthropic models are consistently recognized.

Suggested change
if "/" in model:
return model.split("/")[0]
for prefix in ("vertex_ai", "anthropic", "bedrock", "openai", "azure", "aws"):
if model.startswith(prefix):
return prefix
normalized_model = model.strip().lower()
if "/" in normalized_model:
return normalized_model.split("/")[0]
for prefix in ("vertex_ai", "anthropic", "bedrock", "openai", "azure", "aws"):
if normalized_model.startswith(prefix):
return prefix
if normalized_model.startswith("claude") or "claude" in normalized_model:
return "anthropic"

return model
Comment on lines +335 to +342 (Contributor):

P2 Provider-specific logic outside llms/

_get_provider_prefix and _should_strip_thinking_blocks hard-code provider names (vertex_ai, anthropic, bedrock, …) directly in the complexity-router strategy file. The project rule is to keep provider-specific code in the llms/ directory so that provider-specific concerns are centralised and new providers don't require changes in multiple places. Consider moving the stripping logic into a shared utility under llms/ and calling it from here.

Rule Used: What: Avoid writing provider-specific code outside... (source)

Comment on lines +335 to +342 (Contributor):

P2 Hardcoded provider capability list

providers_with_incompatible_thinking is a static tuple that will need a code change every time a new provider with thinking blocks is added. Per the project convention, provider capabilities should live in model_prices_and_context_window.json (and be queried via get_model_info) rather than being hardcoded, so that support for new models/providers takes effect without a litellm upgrade.

Rule Used: What: Do not hardcode model-specific flags in the ... (source)

Comment on lines +339 to +342:

P1 Resolve provider before deciding whether to strip thinking

_get_provider_prefix() falls back to returning the raw model string when there is no explicit provider prefix, but async_pre_routing_hook() is usually called with a complexity-router alias as model and many tier configs use unprefixed model IDs (e.g. claude-sonnet-*). In that common path, _should_strip_thinking_blocks() gets values like smart-router and claude-sonnet-4, neither matches ("vertex_ai", "anthropic"), and incompatible thinking blocks are not stripped, so the 400 this change is targeting can still occur.

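A minimal standalone sketch of the kind of resolution heuristic the comments above suggest (the prefix list, the `claude`/`gpt` heuristics, and the function name are illustrative assumptions, not litellm's actual provider-resolution API):

```python
# Hypothetical provider-resolution heuristic for unprefixed model IDs.
KNOWN_PREFIXES = ("vertex_ai", "anthropic", "bedrock", "openai", "azure", "aws")


def resolve_provider(model: str) -> str:
    """Best-effort provider guess; falls back to the raw model string."""
    m = model.strip().lower()
    if "/" in m:
        # Explicit prefix, e.g. "anthropic/claude-3-haiku"
        return m.split("/")[0]
    for prefix in KNOWN_PREFIXES:
        if m.startswith(prefix):
            return prefix
    # Heuristics for common unprefixed model IDs (assumptions, not exhaustive).
    if "claude" in m:
        return "anthropic"
    if m.startswith(("gpt", "o1", "o3")):
        return "openai"
    return m
```

With this, `claude-sonnet-4-20250514` resolves to `anthropic` even without a prefix, so the stripping decision is made on providers rather than raw alias strings. A production version would defer to litellm's own provider resolution instead of this heuristic.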


def _should_strip_thinking_blocks(
self, original_model: str, new_model: str
) -> bool:
"""Determine if thinking blocks should be stripped when switching models."""
original_provider = self._get_provider_prefix(original_model)
new_provider = self._get_provider_prefix(new_model)
if original_provider == new_provider:
return False
providers_with_incompatible_thinking = ("vertex_ai", "anthropic")
return (
original_provider in providers_with_incompatible_thinking
or new_provider in providers_with_incompatible_thinking
)
Comment on lines +344 to +356 (Contributor):

P1 Overly broad stripping β€” loses context for bedrock/Claude multi-turn

_should_strip_thinking_blocks returns True whenever either provider is vertex_ai or anthropic, even when the destination can handle thinking blocks just fine. For example, routing from anthropic β†’ bedrock (which serves Claude models that also support thinking) would trigger stripping. In a multi-turn conversation the thinking blocks carry reasoning context back to the model; silently dropping them degrades response quality or causes the receiving model to lose chain-of-thought continuity.

The condition should be narrowed to the actual incompatible pair instead of "at least one side is in the list":

def _should_strip_thinking_blocks(self, original_model: str, new_model: str) -> bool:
    original_provider = self._get_provider_prefix(original_model)
    new_provider = self._get_provider_prefix(new_model)
    if original_provider == new_provider:
        return False
    incompatible_pairs = {
        frozenset({"vertex_ai", "anthropic"}),
    }
    return frozenset({original_provider, new_provider}) in incompatible_pairs
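A quick standalone check of the pairwise form (provider strings are passed in directly here for illustration; the real method derives them from model names):

```python
def should_strip(original_provider: str, new_provider: str) -> bool:
    """Strip only for the specific incompatible provider pair."""
    if original_provider == new_provider:
        return False
    # frozenset makes the pair order-insensitive: vertex_ai -> anthropic
    # and anthropic -> vertex_ai both match the same entry.
    incompatible_pairs = {frozenset({"vertex_ai", "anthropic"})}
    return frozenset({original_provider, new_provider}) in incompatible_pairs
```

Under this narrower condition, anthropic to bedrock no longer triggers stripping, while both directions of the vertex_ai/anthropic switch still do.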

Comment on lines +344 to +356 (Copilot AI, Apr 18, 2026):

This implementation strips thinking blocks whenever the providers differ and either side is in (vertex_ai, anthropic). The PR description says stripping should occur specifically when switching between vertex_ai and anthropic; if that narrower behavior is intended, this should be an explicit pairwise check (vertex_ai→anthropic or anthropic→vertex_ai) rather than the current broader condition.


def _strip_thinking_blocks_from_messages(
self, messages: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
"""Strip thinking/redacted_thinking blocks from messages."""
import copy

cleaned: List[Dict[str, Any]] = []
for msg in messages:
if not isinstance(msg, dict):
cleaned.append(msg)
continue
msg_copy = copy.deepcopy(msg)
content = msg_copy.get("content")
if isinstance(content, list):
filtered = [
block
for block in content
if not (
isinstance(block, dict)
and block.get("type") in ("thinking", "redacted_thinking")
)
]
if not filtered:
continue
msg_copy["content"] = filtered
Comment on lines +378 to +382 (Contributor):

P2 Silent message drop when all content is thinking blocks

If an assistant turn contains only thinking/redacted_thinking blocks and no text, filtered will be empty and the entire message is silently dropped via continue. This can break the alternating user/assistant turn requirement that Anthropic and many other providers enforce, causing a downstream 400 error. Consider replacing the dropped message with a minimal placeholder or logging a warning so the caller is aware of the data loss.
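One possible shape for the placeholder approach (a standalone sketch; the placeholder text and helper name are assumptions, not existing litellm behavior):

```python
import copy
from typing import Any, Dict, List

THINKING_TYPES = ("thinking", "redacted_thinking")


def strip_with_placeholder(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Strip thinking blocks but never drop a turn entirely."""
    cleaned: List[Dict[str, Any]] = []
    for msg in messages:
        if not isinstance(msg, dict) or not isinstance(msg.get("content"), list):
            cleaned.append(msg)
            continue
        msg_copy = copy.deepcopy(msg)
        filtered = [
            b for b in msg_copy["content"]
            if not (isinstance(b, dict) and b.get("type") in THINKING_TYPES)
        ]
        # If everything was a thinking block, keep the turn alive with a
        # minimal text block so user/assistant alternation is preserved.
        msg_copy["content"] = filtered or [{"type": "text", "text": "(reasoning omitted)"}]
        cleaned.append(msg_copy)
    return cleaned
```

This trades a small amount of synthetic content for preserving the alternating-turn structure that some providers validate.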

cleaned.append(msg_copy)
Comment on lines +361 to +383 (Copilot AI, Apr 18, 2026):

When stripping is enabled, this function deep-copies every message dict even if it contains no list-based content blocks, which can be costly on large histories. Consider only copying when a message actually needs modification (e.g., scan for content lists containing thinking blocks first, then shallow-copy the outer message plus the filtered content) and moving the `import copy` to module scope to avoid repeated imports.
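A standalone sketch of that copy-only-when-needed variant (hypothetical helper name; it deliberately mirrors the original drop-on-empty behavior, which has its own issues noted above):

```python
from typing import Any, Dict, List

THINKING_TYPES = ("thinking", "redacted_thinking")


def _is_thinking_block(block: Any) -> bool:
    return isinstance(block, dict) and block.get("type") in THINKING_TYPES


def strip_thinking_lazily(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    cleaned: List[Dict[str, Any]] = []
    for msg in messages:
        content = msg.get("content") if isinstance(msg, dict) else None
        # Only pay for a copy when the message actually has thinking blocks.
        if isinstance(content, list) and any(_is_thinking_block(b) for b in content):
            filtered = [b for b in content if not _is_thinking_block(b)]
            if not filtered:
                continue  # same silent drop as the original implementation
            cleaned.append({**msg, "content": filtered})  # shallow copy of the turn
        else:
            cleaned.append(msg)  # untouched messages are passed through as-is
    return cleaned
```

Messages without thinking blocks are appended by reference, so large histories incur no copying cost on the common path.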

return cleaned
Comment on lines +362 to +384 (Copilot AI, Apr 18, 2026):

_strip_thinking_blocks_from_messages() duplicates the existing helper strip_thinking_blocks_from_anthropic_messages() in litellm/llms/anthropic/common_utils.py (which already handles deep-copying and omitting empty content arrays). Reusing the existing utility (or moving a provider-agnostic version to a shared module) would reduce duplication and the risk of the behaviors diverging over time.

Suggested change
import copy
cleaned: List[Dict[str, Any]] = []
for msg in messages:
if not isinstance(msg, dict):
cleaned.append(msg)
continue
msg_copy = copy.deepcopy(msg)
content = msg_copy.get("content")
if isinstance(content, list):
filtered = [
block
for block in content
if not (
isinstance(block, dict)
and block.get("type") in ("thinking", "redacted_thinking")
)
]
if not filtered:
continue
msg_copy["content"] = filtered
cleaned.append(msg_copy)
return cleaned
from litellm.llms.anthropic.common_utils import (
strip_thinking_blocks_from_anthropic_messages,
)
return strip_thinking_blocks_from_anthropic_messages(messages)

Comment on lines +358 to +384 (Contributor):

P1 Duplicates an existing function in llms/anthropic/common_utils.py

_strip_thinking_blocks_from_messages is line-for-line identical to strip_thinking_blocks_from_anthropic_messages already defined in litellm/llms/anthropic/common_utils.py (lines 780–808), including the same deepcopy approach, the same filter predicate, and the same silent-drop-on-empty logic. Having two copies means any future fix to the original (e.g. for the silent-drop issue noted in prior review) won't propagate here.

Replace the duplicated implementation with an import of the existing helper:

from litellm.llms.anthropic.common_utils import strip_thinking_blocks_from_anthropic_messages

and call it at the call site:

cleaned_messages = strip_thinking_blocks_from_anthropic_messages(messages)

Rule Used: What: Avoid writing provider-specific code outside... (source)


async def async_pre_routing_hook(
self,
model: str,
@@ -400,11 +452,19 @@ async def async_pre_routing_hook(
routed_model = self.get_model_for_tier(tier)

verbose_router_logger.info(
f"ComplexityRouter: tier={tier.value}, score={score:.3f}, "
f"signals={signals}, routed_model={routed_model}"
f"ComplexityRouter: tier={tier.value}, score={score:.3f}, signals={signals}, routed_model={routed_model}"
)

# Strip thinking blocks when switching between providers with incompatible thinking formats
cleaned_messages = messages
if self._should_strip_thinking_blocks(model, routed_model):

P2 Use the actual source provider before stripping thinking

async_pre_routing_hook() calls _should_strip_thinking_blocks(model, routed_model) using model from the request, but in normal complexity-router flow that value is the router alias (for example smart-router), not the provider of the existing assistant history. When tiers are configured with prefixed targets like anthropic/claude-*, this evaluates as a provider switch on every turn and strips thinking blocks even when the conversation stays on Anthropic, which drops valid context and breaks the β€œonly on provider switch” behavior this change is meant to enforce.


cleaned_messages = self._strip_thinking_blocks_from_messages(messages)
if cleaned_messages != messages:
verbose_router_logger.debug(
f"ComplexityRouter: stripped thinking blocks when switching from {model} to {routed_model}"
)

return PreRoutingHookResponse(
model=routed_model,
messages=messages,
messages=cleaned_messages,
)
81 changes: 64 additions & 17 deletions tests/test_litellm/router_strategy/test_complexity_router.py
@@ -3,16 +3,15 @@

Tests the rule-based complexity scoring and tier assignment logic.
"""

import os
import sys
from typing import Dict, List
from unittest.mock import MagicMock

import pytest

sys.path.insert(
0, os.path.abspath("../../..")
) # Adds the parent directory to the system path
sys.path.insert(0, os.path.abspath("../../..")) # Adds the parent directory to the system path

from litellm import Router
from litellm.router_strategy.complexity_router.complexity_router import (
@@ -321,12 +320,15 @@ async def test_pre_routing_hook_simple_message(self, complexity_router):
async def test_pre_routing_hook_complex_message(self, complexity_router):
"""Test pre-routing hook with a message containing technical content."""
messages = [
{"role": "user", "content": (
"Design a distributed microservice architecture with Kubernetes "
"orchestration, implementing proper authentication, encryption, "
"and database optimization for high throughput. Think step by step "
"about the performance implications and scalability requirements."
)}
{
"role": "user",
"content": (
"Design a distributed microservice architecture with Kubernetes "
"orchestration, implementing proper authentication, encryption, "
"and database optimization for high throughput. Think step by step "
"about the performance implications and scalability requirements."
),
}
]
result = await complexity_router.async_pre_routing_hook(
model="test-model",
@@ -376,16 +378,63 @@ async def test_pre_routing_hook_with_system_prompt(self, complexity_router):
@pytest.mark.asyncio
async def test_pre_routing_hook_reasoning_message(self, complexity_router):
"""Test pre-routing hook with reasoning markers."""
messages = [{"role": "user", "content": "Let's think step by step and reason through this problem carefully."}]
result = await complexity_router.async_pre_routing_hook(
model="test-model",
request_kwargs={},
messages=messages,
)
assert result is not None
assert result.model == "o1-preview" # REASONING tier model

@pytest.mark.asyncio
async def test_pre_routing_hook_strips_thinking_blocks_on_provider_switch(self, complexity_router):
"""Test thinking blocks are stripped when switching from vertex_ai to anthropic."""
messages = [
{"role": "user", "content": "Let's think step by step and reason through this problem carefully."}
{"role": "user", "content": "Hello!"},
{
"role": "assistant",
"content": [
{"type": "text", "text": "Sure!"},
{"type": "thinking", "thinking": "User said hello", "signature": "abc123"},
],
},
]
result = await complexity_router.async_pre_routing_hook(
model="vertex_ai/test-model",
request_kwargs={},
Comment on lines +391 to +405 (Copilot AI, Apr 18, 2026):

The docstring says this test covers switching from vertex_ai to Anthropic, but with the basic_config fixture the routed model for a simple "Hello!" prompt is gpt-4o-mini (OpenAI), not an Anthropic model. Either adjust the tier mapping / prompt so routed_model is actually Anthropic, or update the test name/docstring to reflect what’s being exercised.

messages=messages,
)
assert result is not None
# Should strip thinking blocks from assistant message
content = result.messages[1]["content"]
assert isinstance(content, list)
assert all(block["type"] == "text" for block in content)
Comment on lines +391 to +412 (Contributor):

P2 Test doesn't exercise the described bug scenario

The test routes "Hello!" (SIMPLE tier) to "gpt-4o-mini", not to an Anthropic model, so it only verifies stripping when going vertex_ai β†’ openai β€” not the original crash scenario of vertex_ai (GLM) β†’ anthropic. Adding a fixture with SIMPLE: "anthropic/claude-3-haiku-20240307" would confirm the exact provider pair from the bug report is handled.


@pytest.mark.asyncio
async def test_pre_routing_hook_preserves_thinking_blocks_on_same_provider(self, complexity_router):
"""Test thinking blocks are preserved when staying within same provider."""
messages = [
{"role": "user", "content": "Hello!"},
{
"role": "assistant",
"content": [
{"type": "text", "text": "Sure!"},
{"type": "thinking", "thinking": "User said hello", "signature": "abc123"},
],
},
]
# Using model without provider prefix for both - should preserve thinking blocks
result = await complexity_router.async_pre_routing_hook(
model="test-model",
request_kwargs={},
messages=messages,
)
assert result is not None
assert result.model == "o1-preview" # REASONING tier model
# Should preserve thinking blocks since no provider switch
content = result.messages[1]["content"]
Comment on lines +415 to +435 (Copilot AI, Apr 18, 2026):

This test claims it preserves thinking blocks "when staying within same provider" / "since no provider switch", but it actually routes from model="test-model" to the SIMPLE tier model (gpt-4o-mini) which is a different model string. If the intent is to validate the provider-equality branch (original_provider == new_provider), you’ll need a case where both model and routed_model resolve to the same provider; otherwise, update the test name/comments to match the behavior under test (no stripping because neither side is considered incompatible).

assert isinstance(content, list)
assert len(content) == 2 # Both text and thinking preserved
Comment on lines +435 to +437 (Contributor):

P1 IndexError on result.messages[1] β€” only one message in input

The messages list passed to async_pre_routing_hook contains a single user-turn at index 0. The hook returns that same list unchanged (no thinking blocks to strip, same provider), so result.messages[1] raises IndexError: list index out of range. The test claims to verify that thinking blocks are preserved, but it never actually includes any thinking blocks in the input, making both the assertion and the intent incorrect.



class TestConfigOverrides:
@@ -412,9 +461,7 @@ def test_custom_tier_boundaries(self, mock_router_instance):
complexity_router_config=config,
)
# With very low thresholds, even neutral prompts should be COMPLEX or higher
tier, score, signals = router.classify(
"Explain how HTTP works with REST APIs and distributed systems"
)
tier, score, signals = router.classify("Explain how HTTP works with REST APIs and distributed systems")
# With boundaries this low, should be at least MEDIUM (anything above -0.5)
assert tier != ComplexityTier.SIMPLE, f"Expected non-SIMPLE tier, got {tier} with score {score}"

@@ -575,22 +622,22 @@ def test_default_config_not_mutated(self, mock_router_instance):

# Get original default
original_default = ComplexityRouterConfig().default_model

# Create router with empty config and custom default_model
router1 = ComplexityRouter(
model_name="test-router-1",
litellm_router_instance=mock_router_instance,
complexity_router_config=None,
default_model="custom-fallback",
)

# Create another router without config
router2 = ComplexityRouter(
model_name="test-router-2",
litellm_router_instance=mock_router_instance,
complexity_router_config=None,
)

# Router2 should have fresh defaults, not router1's custom default_model
# Create a fresh config to check
fresh_config = ComplexityRouterConfig()