fix(router): strip thinking blocks when switching between providers with incompatible formats #26012

base: litellm_internal_staging

Changes from all commits
@@ -8,6 +8,7 @@

```python
Inspired by ClawRouter: https://github.com/BlockRunAI/ClawRouter
"""

import re
from typing import TYPE_CHECKING, Any, Dict, List, Optional, Tuple, Union
```

@@ -331,6 +332,57 @@ def get_model_for_tier(self, tier: ComplexityTier) -> str:

```python
                f"No model configured for tier {tier_key} and no default_model set"
            )

    def _get_provider_prefix(self, model: str) -> str:
        """Extract provider prefix from model name."""
        if "/" in model:
            return model.split("/")[0]
        for prefix in ("vertex_ai", "anthropic", "bedrock", "openai", "azure", "aws"):
            if model.startswith(prefix):
                return prefix
        return model
```
Comment on lines +335 to +342 (Contributor):
Rule Used: "Avoid writing provider-specific code outside..." (source)

Comment on lines +335 to +342 (Contributor):
Rule Used: "Do not hardcode model-specific flags in the ..." (source)
```python
    def _should_strip_thinking_blocks(
        self, original_model: str, new_model: str
    ) -> bool:
        """Determine if thinking blocks should be stripped when switching models."""
        original_provider = self._get_provider_prefix(original_model)
        new_provider = self._get_provider_prefix(new_model)
        if original_provider == new_provider:
            return False
        providers_with_incompatible_thinking = ("vertex_ai", "anthropic")
        return (
            original_provider in providers_with_incompatible_thinking
            or new_provider in providers_with_incompatible_thinking
        )
```
Comment on lines +344 to +356 (Contributor):

The condition should be narrowed to the actual incompatible pair instead of "at least one side is in the list":

```python
def _should_strip_thinking_blocks(self, original_model: str, new_model: str) -> bool:
    original_provider = self._get_provider_prefix(original_model)
    new_provider = self._get_provider_prefix(new_model)
    if original_provider == new_provider:
        return False
    incompatible_pairs = {
        frozenset({"vertex_ai", "anthropic"}),
    }
    return frozenset({original_provider, new_provider}) in incompatible_pairs
```
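To see the behavioral difference, here is a standalone sketch of the pair-based check. `get_prefix` and `should_strip` are stand-ins for the PR's `_get_provider_prefix`/`_should_strip_thinking_blocks`, reduced to module-level functions for illustration:

```python
# Stand-in for ComplexityRouter._get_provider_prefix, simplified.
def get_prefix(model: str) -> str:
    return model.split("/")[0] if "/" in model else model

# Only this unordered pair is known to have incompatible thinking formats.
INCOMPATIBLE_PAIRS = {frozenset({"vertex_ai", "anthropic"})}

def should_strip(original_model: str, new_model: str) -> bool:
    a, b = get_prefix(original_model), get_prefix(new_model)
    if a == b:
        return False
    return frozenset({a, b}) in INCOMPATIBLE_PAIRS

# vertex_ai <-> anthropic is the listed incompatible pair: strip.
assert should_strip("vertex_ai/gemini-2.0-flash", "anthropic/claude-3-opus") is True
# anthropic -> openai is NOT a listed pair: the original one-sided check
# would strip here, the pair-based check does not.
assert should_strip("anthropic/claude-3-opus", "openai/gpt-4o") is False
# Same provider on both sides: never strip.
assert should_strip("anthropic/claude-3-opus", "anthropic/claude-3-haiku") is False
```

This narrows stripping to transitions where a format mismatch is actually known, instead of any transition touching a listed provider.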
```python
    def _strip_thinking_blocks_from_messages(
        self, messages: List[Dict[str, Any]]
    ) -> List[Dict[str, Any]]:
        """Strip thinking/redacted_thinking blocks from messages."""
        import copy

        cleaned: List[Dict[str, Any]] = []
        for msg in messages:
            if not isinstance(msg, dict):
                cleaned.append(msg)
                continue
            msg_copy = copy.deepcopy(msg)
            content = msg_copy.get("content")
            if isinstance(content, list):
                filtered = [
                    block
                    for block in content
                    if not (
                        isinstance(block, dict)
                        and block.get("type") in ("thinking", "redacted_thinking")
                    )
                ]
                if not filtered:
                    continue
                msg_copy["content"] = filtered
```
Comment on lines +378 to +382 (Contributor):

If an assistant turn contains only thinking/redacted_thinking blocks and no text, the whole message is silently dropped from the conversation (the `continue` skips appending it).
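One possible fix for that silent drop, sketched under the assumption that keeping the assistant turn with an empty text block is preferable to removing it (the function name and placeholder block here are illustrative, not the PR's code):

```python
import copy
from typing import Any, Dict, List

THINKING_TYPES = ("thinking", "redacted_thinking")

def strip_thinking_keep_turn(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Like the PR's helper, but keeps assistant turns that become empty."""
    cleaned: List[Dict[str, Any]] = []
    for msg in messages:
        if not isinstance(msg, dict) or not isinstance(msg.get("content"), list):
            cleaned.append(msg)
            continue
        msg_copy = copy.deepcopy(msg)
        filtered = [
            b for b in msg_copy["content"]
            if not (isinstance(b, dict) and b.get("type") in THINKING_TYPES)
        ]
        # Instead of dropping the message entirely, keep the turn with an
        # empty text block so user/assistant role alternation is preserved.
        msg_copy["content"] = filtered or [{"type": "text", "text": ""}]
        cleaned.append(msg_copy)
    return cleaned

msgs = [{"role": "assistant", "content": [{"type": "thinking", "thinking": "x"}]}]
out = strip_thinking_keep_turn(msgs)
assert len(out) == 1
assert out[0]["content"] == [{"type": "text", "text": ""}]
```

Whether an empty text block is acceptable to every downstream provider is itself an assumption worth checking; the point is only that the turn count stays stable.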
```python
            cleaned.append(msg_copy)

        return cleaned
```
Suggested change (lines +362 to +384):

```diff
-        import copy
-
-        cleaned: List[Dict[str, Any]] = []
-        for msg in messages:
-            if not isinstance(msg, dict):
-                cleaned.append(msg)
-                continue
-            msg_copy = copy.deepcopy(msg)
-            content = msg_copy.get("content")
-            if isinstance(content, list):
-                filtered = [
-                    block
-                    for block in content
-                    if not (
-                        isinstance(block, dict)
-                        and block.get("type") in ("thinking", "redacted_thinking")
-                    )
-                ]
-                if not filtered:
-                    continue
-                msg_copy["content"] = filtered
-            cleaned.append(msg_copy)
-        return cleaned
+        from litellm.llms.anthropic.common_utils import (
+            strip_thinking_blocks_from_anthropic_messages,
+        )
+        return strip_thinking_blocks_from_anthropic_messages(messages)
```

Comment (Contributor): Duplicates an existing function in `llms/anthropic/common_utils.py`

`_strip_thinking_blocks_from_messages` is line-for-line identical to `strip_thinking_blocks_from_anthropic_messages` already defined in `litellm/llms/anthropic/common_utils.py` (lines 780-808), including the same deepcopy approach, the same filter predicate, and the same silent-drop-on-empty logic. Having two copies means any future fix to the original (e.g. for the silent-drop issue noted in prior review) won't propagate here.

Replace the duplicated implementation with an import of the existing helper:

```python
from litellm.llms.anthropic.common_utils import strip_thinking_blocks_from_anthropic_messages
```

and call it at the call site:

```python
cleaned_messages = strip_thinking_blocks_from_anthropic_messages(messages)
```

Rule Used: "Avoid writing provider-specific code outside..." (source)

Comment (Contributor): Use the actual source provider before stripping thinking

`async_pre_routing_hook()` calls `_should_strip_thinking_blocks(model, routed_model)` using `model` from the request, but in normal complexity-router flow that value is the router alias (for example `smart-router`), not the provider of the existing assistant history. When tiers are configured with prefixed targets like `anthropic/claude-*`, this evaluates as a provider switch on every turn and strips thinking blocks even when the conversation stays on Anthropic, which drops valid context and breaks the "only on provider switch" behavior this change is meant to enforce.
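The alias problem described above can be demonstrated with plain strings. `get_prefix` stands in for `_get_provider_prefix`, and `smart-router` / the tier target below are illustrative names, not values from this PR:

```python
# Stand-in for ComplexityRouter._get_provider_prefix.
def get_prefix(model: str) -> str:
    return model.split("/")[0] if "/" in model else model

# The request arrives addressed to the router alias, not a provider...
request_model = "smart-router"                      # illustrative alias
# ...while the tier target carries a real provider prefix.
routed_model = "anthropic/claude-sonnet-4-20250514" # illustrative target

# The alias is treated as its own "provider", so every turn looks like a
# provider switch, even if the previous assistant turn was also Anthropic.
assert get_prefix(request_model) == "smart-router"
assert get_prefix(routed_model) == "anthropic"
assert get_prefix(request_model) != get_prefix(routed_model)
```

A fix would need to derive the source provider from the conversation's actual history (or the previously routed deployment) rather than from the incoming alias.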
@@ -3,16 +3,15 @@

```diff
 Tests the rule-based complexity scoring and tier assignment logic.
 """

 import os
 import sys
 from typing import Dict, List
 from unittest.mock import MagicMock

 import pytest

-sys.path.insert(
-    0, os.path.abspath("../../..")
-)  # Adds the parent directory to the system path
+sys.path.insert(0, os.path.abspath("../../.."))  # Adds the parent directory to the system path

 from litellm import Router
 from litellm.router_strategy.complexity_router.complexity_router import (
```
```diff
@@ -321,12 +320,15 @@ async def test_pre_routing_hook_simple_message(self, complexity_router):
     async def test_pre_routing_hook_complex_message(self, complexity_router):
         """Test pre-routing hook with a message containing technical content."""
         messages = [
-            {"role": "user", "content": (
-                "Design a distributed microservice architecture with Kubernetes "
-                "orchestration, implementing proper authentication, encryption, "
-                "and database optimization for high throughput. Think step by step "
-                "about the performance implications and scalability requirements."
-            )}
+            {
+                "role": "user",
+                "content": (
+                    "Design a distributed microservice architecture with Kubernetes "
+                    "orchestration, implementing proper authentication, encryption, "
+                    "and database optimization for high throughput. Think step by step "
+                    "about the performance implications and scalability requirements."
+                ),
+            }
         ]
         result = await complexity_router.async_pre_routing_hook(
             model="test-model",
```
```diff
@@ -376,16 +378,63 @@ async def test_pre_routing_hook_with_system_prompt(self, complexity_router):
     @pytest.mark.asyncio
     async def test_pre_routing_hook_reasoning_message(self, complexity_router):
         """Test pre-routing hook with reasoning markers."""
-        messages = [
-            {"role": "user", "content": "Let's think step by step and reason through this problem carefully."}
-        ]
+        messages = [{"role": "user", "content": "Let's think step by step and reason through this problem carefully."}]
         result = await complexity_router.async_pre_routing_hook(
             model="test-model",
             request_kwargs={},
             messages=messages,
         )
         assert result is not None
         assert result.model == "o1-preview"  # REASONING tier model

+    @pytest.mark.asyncio
+    async def test_pre_routing_hook_strips_thinking_blocks_on_provider_switch(self, complexity_router):
+        """Test thinking blocks are stripped when switching from vertex_ai to anthropic."""
+        messages = [
+            {"role": "user", "content": "Hello!"},
+            {
+                "role": "assistant",
+                "content": [
+                    {"type": "text", "text": "Sure!"},
+                    {"type": "thinking", "thinking": "User said hello", "signature": "abc123"},
+                ],
+            },
+        ]
+        result = await complexity_router.async_pre_routing_hook(
+            model="vertex_ai/test-model",
+            request_kwargs={},
+            messages=messages,
+        )
+        assert result is not None
+        # Should strip thinking blocks from assistant message
+        content = result.messages[1]["content"]
+        assert isinstance(content, list)
+        assert all(block["type"] == "text" for block in content)
```

Comment on lines +391 to +412 (Contributor): The test routes ...
```diff
+    @pytest.mark.asyncio
+    async def test_pre_routing_hook_preserves_thinking_blocks_on_same_provider(self, complexity_router):
+        """Test thinking blocks are preserved when staying within same provider."""
+        messages = [
+            {"role": "user", "content": "Hello!"},
+            {
+                "role": "assistant",
+                "content": [
+                    {"type": "text", "text": "Sure!"},
+                    {"type": "thinking", "thinking": "User said hello", "signature": "abc123"},
+                ],
+            },
+        ]
+        # Using model without provider prefix for both - should preserve thinking blocks
+        result = await complexity_router.async_pre_routing_hook(
+            model="test-model",
+            request_kwargs={},
+            messages=messages,
+        )
+        assert result is not None
+        assert result.model == "o1-preview"  # REASONING tier model
+        # Should preserve thinking blocks since no provider switch
+        content = result.messages[1]["content"]
+        assert isinstance(content, list)
+        assert len(content) == 2  # Both text and thinking preserved


 class TestConfigOverrides:
```
```diff
@@ -412,9 +461,7 @@ def test_custom_tier_boundaries(self, mock_router_instance):
             complexity_router_config=config,
         )
         # With very low thresholds, even neutral prompts should be COMPLEX or higher
-        tier, score, signals = router.classify(
-            "Explain how HTTP works with REST APIs and distributed systems"
-        )
+        tier, score, signals = router.classify("Explain how HTTP works with REST APIs and distributed systems")
         # With boundaries this low, should be at least MEDIUM (anything above -0.5)
         assert tier != ComplexityTier.SIMPLE, f"Expected non-SIMPLE tier, got {tier} with score {score}"
```
```diff
@@ -575,22 +622,22 @@ def test_default_config_not_mutated(self, mock_router_instance):
         # Get original default
         original_default = ComplexityRouterConfig().default_model

         # Create router with empty config and custom default_model
         router1 = ComplexityRouter(
             model_name="test-router-1",
             litellm_router_instance=mock_router_instance,
             complexity_router_config=None,
             default_model="custom-fallback",
         )

         # Create another router without config
         router2 = ComplexityRouter(
             model_name="test-router-2",
             litellm_router_instance=mock_router_instance,
             complexity_router_config=None,
         )

         # Router2 should have fresh defaults, not router1's custom default_model
         # Create a fresh config to check
         fresh_config = ComplexityRouterConfig()
```
Comment (Contributor): `_get_provider_prefix()` only detects Anthropic when the model string is prefixed (e.g., `anthropic/...`) or starts with `anthropic`. In this codebase, the default ComplexityRouter tier model is `claude-sonnet-4-20250514` (no provider prefix), so it will be treated as its own provider string and `_should_strip_thinking_blocks()` may not strip when routing away from Claude/Anthropic models. Consider using a provider-inference utility (e.g., Router/LiteLLM provider resolution) or at least a Claude heuristic (`model.lower().startswith('claude')` / contains `claude`) so Anthropic models are consistently recognized.
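A heuristic along the lines this comment suggests could look like the sketch below. This is illustrative only, not LiteLLM's actual provider-resolution logic, and the bare-name fallthrough behavior is an assumption:

```python
KNOWN_PREFIXES = ("vertex_ai", "anthropic", "bedrock", "openai", "azure", "aws")

def infer_provider(model: str) -> str:
    """Best-effort provider inference that also recognizes bare Claude names."""
    if "/" in model:
        return model.split("/")[0]
    lowered = model.lower()
    # Bare Claude model names (e.g. "claude-sonnet-4-20250514") map to Anthropic.
    if "claude" in lowered:
        return "anthropic"
    for prefix in KNOWN_PREFIXES:
        if lowered.startswith(prefix):
            return prefix
    # Unknown bare names fall through unchanged, as in the PR's helper.
    return model

assert infer_provider("claude-sonnet-4-20250514") == "anthropic"
assert infer_provider("vertex_ai/gemini-2.5-pro") == "vertex_ai"
assert infer_provider("gpt-4o") == "gpt-4o"
```

Routing the check through LiteLLM's real provider resolution (rather than a substring heuristic) would still be the more robust option, since it covers models beyond Claude.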