fix(router): strip thinking blocks when switching between providers with incompatible formats #26012
Conversation
Greptile Summary
This PR adds three methods to ComplexityRouter that strip thinking blocks from message history when routing switches between providers with incompatible thinking formats.
Confidence Score: 3/5 — Not safe to merge: a new P1 duplicate-code issue compounds several previously raised but still-open P1 defects. The new _strip_thinking_blocks_from_messages is a near-identical copy of an existing function in llms/anthropic/common_utils.py, creating two diverging implementations of the same bug-prone logic. Combined with the unresolved overly broad stripping, silent message drop, and hardcoded provider-capability tuple from prior rounds, the PR has multiple P1 issues that need addressing before merge. Files of concern: litellm/router_strategy/complexity_router/complexity_router.py — the three new methods need revision; tests/test_litellm/router_strategy/test_complexity_router.py — the new tests don't cover the exact bug scenario.
| Filename | Overview |
|---|---|
| litellm/router_strategy/complexity_router/complexity_router.py | Adds three methods for thinking-block stripping; _strip_thinking_blocks_from_messages is a near-identical duplicate of strip_thinking_blocks_from_anthropic_messages already in llms/anthropic/common_utils.py; _should_strip_thinking_blocks uses a hardcoded provider tuple (violates custom rule 2605a1b1) and overly broad logic (both previously flagged); silent assistant-message drop on empty content also previously flagged. |
| tests/test_litellm/router_strategy/test_complexity_router.py | Adds two new mock-only tests for thinking-block stripping; tests are structurally valid but the strip scenario routes to gpt-4o-mini rather than an anthropic model (the actual bug target), and the same-provider test passes trivially since neither model has a recognized provider prefix — both concerns were previously raised. |
Sequence Diagram
```mermaid
sequenceDiagram
    participant Client
    participant ComplexityRouter
    participant StripUtil as llms/anthropic/common_utils
    participant TargetLLM as Target LLM
    Client->>ComplexityRouter: async_pre_routing_hook(model, messages)
    ComplexityRouter->>ComplexityRouter: classify(user_message) → tier
    ComplexityRouter->>ComplexityRouter: get_model_for_tier(tier) → routed_model
    ComplexityRouter->>ComplexityRouter: _should_strip_thinking_blocks(model, routed_model)
    alt provider switch detected
        ComplexityRouter->>ComplexityRouter: _strip_thinking_blocks_from_messages(messages) ⚠ duplicates StripUtil
        Note over ComplexityRouter: Should call StripUtil instead
        ComplexityRouter-->>Client: PreRoutingHookResponse(model=routed_model, messages=cleaned)
    else same provider
        ComplexityRouter-->>Client: PreRoutingHookResponse(model=routed_model, messages=messages)
    end
    Client->>TargetLLM: completion(model=routed_model, messages=cleaned)
```
Reviews (3): Last reviewed commit: "fix(router): strip thinking blocks when ..."
```python
def _should_strip_thinking_blocks(
    self, original_model: str, new_model: str
) -> bool:
    """Determine if thinking blocks should be stripped when switching models."""
    original_provider = self._get_provider_prefix(original_model)
    new_provider = self._get_provider_prefix(new_model)
    if original_provider == new_provider:
        return False
    providers_with_incompatible_thinking = ("vertex_ai", "anthropic")
    return (
        original_provider in providers_with_incompatible_thinking
        or new_provider in providers_with_incompatible_thinking
    )
```
Overly broad stripping — loses context for bedrock/Claude multi-turn
_should_strip_thinking_blocks returns True whenever either provider is vertex_ai or anthropic, even when the destination can handle thinking blocks just fine. For example, routing from anthropic → bedrock (which serves Claude models that also support thinking) would trigger stripping. In a multi-turn conversation the thinking blocks carry reasoning context back to the model; silently dropping them degrades response quality or causes the receiving model to lose chain-of-thought continuity.
The condition should be narrowed to the actual incompatible pair instead of "at least one side is in the list":
```python
def _should_strip_thinking_blocks(self, original_model: str, new_model: str) -> bool:
    original_provider = self._get_provider_prefix(original_model)
    new_provider = self._get_provider_prefix(new_model)
    if original_provider == new_provider:
        return False
    incompatible_pairs = {
        frozenset({"vertex_ai", "anthropic"}),
    }
    return frozenset({original_provider, new_provider}) in incompatible_pairs
```

```python
def _get_provider_prefix(self, model: str) -> str:
    """Extract provider prefix from model name."""
    if "/" in model:
        return model.split("/")[0]
    for prefix in ("vertex_ai", "anthropic", "bedrock", "openai", "azure", "aws"):
        if model.startswith(prefix):
            return prefix
    return model
```
Provider-specific logic outside llms/
_get_provider_prefix and _should_strip_thinking_blocks hard-code provider names (vertex_ai, anthropic, bedrock, …) directly in the complexity-router strategy file. The project rule is to keep provider-specific code in the llms/ directory so that provider-specific concerns are centralised and new providers don't require changes in multiple places. Consider moving the stripping logic into a shared utility under llms/ and calling it from here.
Rule Used: What: Avoid writing provider-specific code outside...
```python
def _get_provider_prefix(self, model: str) -> str:
    """Extract provider prefix from model name."""
    if "/" in model:
        return model.split("/")[0]
    for prefix in ("vertex_ai", "anthropic", "bedrock", "openai", "azure", "aws"):
        if model.startswith(prefix):
            return prefix
    return model
```
Hardcoded provider capability list
providers_with_incompatible_thinking is a static tuple that will need a code change every time a new provider with thinking blocks is added. Per the project convention, provider capabilities should live in model_prices_and_context_window.json (and be queried via get_model_info) rather than being hardcoded, so that support for new models/providers takes effect without a litellm upgrade.
Rule Used: What: Do not hardcode model-specific flags in the ...
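To illustrate that convention, the decision could consult a per-model capability flag instead of a provider tuple. The sketch below is hypothetical: the `MODEL_INFO` table and the `thinking_format` field stand in for whatever get_model_info / model_prices_and_context_window.json actually expose, and the model IDs are illustrative.

```python
# Hypothetical capability lookup standing in for litellm.get_model_info();
# the real field names live in model_prices_and_context_window.json.
MODEL_INFO = {
    "vertex_ai/glm-4": {"thinking_format": "glm"},
    "anthropic/claude-sonnet-4": {"thinking_format": "anthropic"},
    "openai/gpt-4o-mini": {"thinking_format": None},
}


def should_strip(original_model: str, new_model: str) -> bool:
    old_fmt = MODEL_INFO.get(original_model, {}).get("thinking_format")
    new_fmt = MODEL_INFO.get(new_model, {}).get("thinking_format")
    # Strip only when both sides emit thinking blocks but in different formats,
    # so adding a new provider is a data change, not a code change.
    return old_fmt is not None and new_fmt is not None and old_fmt != new_fmt


assert should_strip("vertex_ai/glm-4", "anthropic/claude-sonnet-4") is True
assert should_strip("anthropic/claude-sonnet-4", "anthropic/claude-sonnet-4") is False
```

With that shape, teaching the router about a new thinking-capable provider only requires a new entry in the capability data.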
```python
            )
        ]
        if not filtered:
            continue
        msg_copy["content"] = filtered
```
Silent message drop when all content is thinking blocks
If an assistant turn contains only thinking/redacted_thinking blocks and no text, filtered will be empty and the entire message is silently dropped via continue. This can break the alternating user/assistant turn requirement that Anthropic and many other providers enforce, causing a downstream 400 error. Consider replacing the dropped message with a minimal placeholder or logging a warning so the caller is aware of the data loss.
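One hedged sketch of the placeholder approach (the function name and the `[reasoning removed]` text are illustrative, not part of the PR):

```python
import copy
from typing import Any, Dict, List

THINKING_TYPES = ("thinking", "redacted_thinking")


def strip_thinking_keep_turns(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Strip thinking blocks but never drop a message entirely."""
    cleaned: List[Dict[str, Any]] = []
    for msg in messages:
        if not isinstance(msg, dict) or not isinstance(msg.get("content"), list):
            cleaned.append(msg)
            continue
        msg_copy = copy.deepcopy(msg)
        filtered = [
            block
            for block in msg_copy["content"]
            if not (isinstance(block, dict) and block.get("type") in THINKING_TYPES)
        ]
        if not filtered:
            # Substitute a minimal text block instead of dropping the turn,
            # preserving the alternating user/assistant structure downstream.
            filtered = [{"type": "text", "text": "[reasoning removed]"}]
        msg_copy["content"] = filtered
        cleaned.append(msg_copy)
    return cleaned
```

A warning log at the substitution point would additionally make the data loss visible to the caller.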
```python
async def test_pre_routing_hook_strips_thinking_blocks_on_provider_switch(self, complexity_router):
    """Test thinking blocks are stripped when switching from vertex_ai to anthropic."""
    messages = [
        {"role": "user", "content": "Hello!"},
        {
            "role": "assistant",
            "content": [
                {"type": "text", "text": "Sure!"},
                {"type": "thinking", "thinking": "User said hello", "signature": "abc123"},
            ],
        },
    ]
    result = await complexity_router.async_pre_routing_hook(
        model="vertex_ai/test-model",
        request_kwargs={},
        messages=messages,
    )
    assert result is not None
    # Should strip thinking blocks from assistant message
    content = result.messages[1]["content"]
    assert isinstance(content, list)
    assert all(block["type"] == "text" for block in content)
```
Test doesn't exercise the described bug scenario
The test routes "Hello!" (SIMPLE tier) to "gpt-4o-mini", not to an Anthropic model, so it only verifies stripping when going vertex_ai → openai — not the original crash scenario of vertex_ai (GLM) → anthropic. Adding a fixture with SIMPLE: "anthropic/claude-3-haiku-20240307" would confirm the exact provider pair from the bug report is handled.
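A minimal sketch of such a fixture, assuming the tier config is a plain mapping (the exact fixture shape in the test file may differ):

```python
# Illustrative tier config: SIMPLE now maps to an Anthropic model, so a
# SIMPLE-tier prompt exercises the vertex_ai -> anthropic pair from the bug.
tier_models = {
    "SIMPLE": "anthropic/claude-3-haiku-20240307",
    "REASONING": "o1-preview",
}


def provider_prefix(model: str) -> str:
    # Same prefix rule as _get_provider_prefix in the PR: text before "/".
    return model.split("/")[0] if "/" in model else model


# With this fixture, routing "Hello!" away from vertex_ai/test-model lands on
# an anthropic target, matching the provider pair in the bug report.
assert provider_prefix(tier_models["SIMPLE"]) == "anthropic"
```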
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 32a5d902a5
```python
    for prefix in ("vertex_ai", "anthropic", "bedrock", "openai", "azure", "aws"):
        if model.startswith(prefix):
            return prefix
    return model
```
Resolve provider before deciding whether to strip thinking
_get_provider_prefix() falls back to returning the raw model string when there is no explicit provider prefix, but async_pre_routing_hook() is usually called with a complexity-router alias as model and many tier configs use unprefixed model IDs (e.g. claude-sonnet-*). In that common path, _should_strip_thinking_blocks() gets values like smart-router and claude-sonnet-4, neither matches ("vertex_ai", "anthropic"), and incompatible thinking blocks are not stripped, so the 400 this change is targeting can still occur.
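The failure mode can be demonstrated by reproducing the PR's prefix logic standalone:

```python
def provider_prefix(model: str) -> str:
    """Reproduction of _get_provider_prefix from the PR, for illustration."""
    if "/" in model:
        return model.split("/")[0]
    for prefix in ("vertex_ai", "anthropic", "bedrock", "openai", "azure", "aws"):
        if model.startswith(prefix):
            return prefix
    return model


# A router alias and an unprefixed Claude model both fall through to the raw
# string, so neither side matches ("vertex_ai", "anthropic") and no stripping
# happens on this common path.
assert provider_prefix("smart-router") == "smart-router"
assert provider_prefix("claude-sonnet-4") == "claude-sonnet-4"
```

(The alias name `smart-router` is the example used in the comment above; any unprefixed model ID behaves the same way.)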
Pull request overview
This PR addresses complexity_router failures when routing between providers that represent “thinking” content blocks differently by stripping incompatible thinking/redacted_thinking blocks from message histories before forwarding.
Changes:
- Added provider-identification and "should strip thinking blocks" logic to ComplexityRouter.async_pre_routing_hook().
- Added a message-sanitization helper to remove thinking/redacted_thinking content blocks during provider switches.
- Added/updated tests validating routing behavior and thinking-block stripping.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| litellm/router_strategy/complexity_router/complexity_router.py | Adds provider-switch detection and strips thinking blocks from messages before returning the routed model. |
| tests/test_litellm/router_strategy/test_complexity_router.py | Adds tests for thinking-block stripping/preservation and minor formatting cleanups. |
```python
    """Strip thinking/redacted_thinking blocks from messages."""
    import copy

    cleaned: List[Dict[str, Any]] = []
    for msg in messages:
        if not isinstance(msg, dict):
            cleaned.append(msg)
            continue
        msg_copy = copy.deepcopy(msg)
        content = msg_copy.get("content")
        if isinstance(content, list):
            filtered = [
                block
                for block in content
                if not (
                    isinstance(block, dict)
                    and block.get("type") in ("thinking", "redacted_thinking")
                )
            ]
            if not filtered:
                continue
            msg_copy["content"] = filtered
        cleaned.append(msg_copy)
```
When stripping is enabled, this function deepcopys every message dict even if it contains no list-based content blocks, which can be costly on large histories. Consider only copying when a message actually needs modification (e.g., scan for content lists containing thinking blocks first, then shallow-copy the outer message + filtered content) and moving the import copy to module scope to avoid repeated imports.
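A sketch of that copy-on-write approach, assuming the same message shape as the PR (the function name is illustrative):

```python
from typing import Any, Dict, List

THINKING_TYPES = ("thinking", "redacted_thinking")


def strip_thinking_lazy(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Copy a message only when it actually contains thinking blocks."""
    cleaned: List[Dict[str, Any]] = []
    for msg in messages:
        content = msg.get("content") if isinstance(msg, dict) else None
        needs_strip = isinstance(content, list) and any(
            isinstance(b, dict) and b.get("type") in THINKING_TYPES for b in content
        )
        if not needs_strip:
            cleaned.append(msg)  # untouched messages are shared, not copied
            continue
        filtered = [
            b
            for b in content
            if not (isinstance(b, dict) and b.get("type") in THINKING_TYPES)
        ]
        if not filtered:
            continue  # mirrors the existing (debatable) silent-drop behavior
        # Shallow-copy the outer dict; only the content list is rebuilt.
        cleaned.append({**msg, "content": filtered})
    return cleaned
```

Messages with plain-string content never get copied at all, which is the common case in long histories.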
```python
async def test_pre_routing_hook_strips_thinking_blocks_on_provider_switch(self, complexity_router):
    """Test thinking blocks are stripped when switching from vertex_ai to anthropic."""
    messages = [
        {"role": "user", "content": "Hello!"},
        {
            "role": "assistant",
            "content": [
                {"type": "text", "text": "Sure!"},
                {"type": "thinking", "thinking": "User said hello", "signature": "abc123"},
            ],
        },
    ]
    result = await complexity_router.async_pre_routing_hook(
        model="vertex_ai/test-model",
        request_kwargs={},
```
The docstring says this test covers switching from vertex_ai to Anthropic, but with the basic_config fixture the routed model for a simple "Hello!" prompt is gpt-4o-mini (OpenAI), not an Anthropic model. Either adjust the tier mapping / prompt so routed_model is actually Anthropic, or update the test name/docstring to reflect what’s being exercised.
```python
async def test_pre_routing_hook_preserves_thinking_blocks_on_same_provider(self, complexity_router):
    """Test thinking blocks are preserved when staying within same provider."""
    messages = [
        {"role": "user", "content": "Let's think step by step and reason through this problem carefully."},
        {"role": "user", "content": "Hello!"},
        {
            "role": "assistant",
            "content": [
                {"type": "text", "text": "Sure!"},
                {"type": "thinking", "thinking": "User said hello", "signature": "abc123"},
            ],
        },
    ]
    # Using model without provider prefix for both - should preserve thinking blocks
    result = await complexity_router.async_pre_routing_hook(
        model="test-model",
        request_kwargs={},
        messages=messages,
    )
    assert result is not None
    assert result.model == "o1-preview"  # REASONING tier model
    # Should preserve thinking blocks since no provider switch
    content = result.messages[1]["content"]
```
This test claims it preserves thinking blocks "when staying within same provider" / "since no provider switch", but it actually routes from model="test-model" to the SIMPLE tier model (gpt-4o-mini) which is a different model string. If the intent is to validate the provider-equality branch (original_provider == new_provider), you’ll need a case where both model and routed_model resolve to the same provider; otherwise, update the test name/comments to match the behavior under test (no stripping because neither side is considered incompatible).
```python
    if "/" in model:
        return model.split("/")[0]
    for prefix in ("vertex_ai", "anthropic", "bedrock", "openai", "azure", "aws"):
        if model.startswith(prefix):
            return prefix
```
_get_provider_prefix() only detects Anthropic when the model string is prefixed (e.g., anthropic/...) or starts with anthropic. In this codebase, the default ComplexityRouter tier model is claude-sonnet-4-20250514 (no provider prefix), so this will be treated as its own provider string and _should_strip_thinking_blocks() may not strip when routing away from Claude/Anthropic models. Consider using a provider-inference utility (e.g., Router/LiteLLM provider resolution) or at least a Claude heuristic (model.lower().startswith('claude') / contains claude) so Anthropic models are consistently recognized.
Suggested change:

```diff
-    if "/" in model:
-        return model.split("/")[0]
-    for prefix in ("vertex_ai", "anthropic", "bedrock", "openai", "azure", "aws"):
-        if model.startswith(prefix):
-            return prefix
+    normalized_model = model.strip().lower()
+    if "/" in normalized_model:
+        return normalized_model.split("/")[0]
+    for prefix in ("vertex_ai", "anthropic", "bedrock", "openai", "azure", "aws"):
+        if normalized_model.startswith(prefix):
+            return prefix
+    if normalized_model.startswith("claude") or "claude" in normalized_model:
+        return "anthropic"
```
```python
def _should_strip_thinking_blocks(
    self, original_model: str, new_model: str
) -> bool:
    """Determine if thinking blocks should be stripped when switching models."""
    original_provider = self._get_provider_prefix(original_model)
    new_provider = self._get_provider_prefix(new_model)
    if original_provider == new_provider:
        return False
    providers_with_incompatible_thinking = ("vertex_ai", "anthropic")
    return (
        original_provider in providers_with_incompatible_thinking
        or new_provider in providers_with_incompatible_thinking
    )
```
This implementation strips thinking blocks whenever the providers differ and either side is in (vertex_ai, anthropic). The PR description says stripping should occur specifically when switching between vertex_ai and anthropic; if that narrower behavior is intended, this should be an explicit pairwise check (vertex_ai→anthropic or anthropic→vertex_ai) rather than the current broader condition.
```python
    import copy

    cleaned: List[Dict[str, Any]] = []
    for msg in messages:
        if not isinstance(msg, dict):
            cleaned.append(msg)
            continue
        msg_copy = copy.deepcopy(msg)
        content = msg_copy.get("content")
        if isinstance(content, list):
            filtered = [
                block
                for block in content
                if not (
                    isinstance(block, dict)
                    and block.get("type") in ("thinking", "redacted_thinking")
                )
            ]
            if not filtered:
                continue
            msg_copy["content"] = filtered
        cleaned.append(msg_copy)
    return cleaned
```
_strip_thinking_blocks_from_messages() duplicates the existing helper strip_thinking_blocks_from_anthropic_messages() in litellm/llms/anthropic/common_utils.py (which already handles deep-copying and omitting empty content arrays). Reusing the existing utility (or moving a provider-agnostic version to a shared module) would reduce duplication and the risk of the behaviors diverging over time.
Suggested change:

```diff
-    import copy
-
-    cleaned: List[Dict[str, Any]] = []
-    for msg in messages:
-        if not isinstance(msg, dict):
-            cleaned.append(msg)
-            continue
-        msg_copy = copy.deepcopy(msg)
-        content = msg_copy.get("content")
-        if isinstance(content, list):
-            filtered = [
-                block
-                for block in content
-                if not (
-                    isinstance(block, dict)
-                    and block.get("type") in ("thinking", "redacted_thinking")
-                )
-            ]
-            if not filtered:
-                continue
-            msg_copy["content"] = filtered
-        cleaned.append(msg_copy)
-    return cleaned
+    from litellm.llms.anthropic.common_utils import (
+        strip_thinking_blocks_from_anthropic_messages,
+    )
+    return strip_thinking_blocks_from_anthropic_messages(messages)
```
```python
    content = result.messages[1]["content"]
    assert isinstance(content, list)
    assert len(content) == 2  # Both text and thinking preserved
```
IndexError on result.messages[1] — only one message in input
The messages list passed to async_pre_routing_hook contains a single user-turn at index 0. The hook returns that same list unchanged (no thinking blocks to strip, same provider), so result.messages[1] raises IndexError: list index out of range. The test claims to verify that thinking blocks are preserved, but it never actually includes any thinking blocks in the input, making both the assertion and the intent incorrect.
Codecov Report
✅ All modified and coverable lines are covered by tests.
Force-pushed 1e21936 to cd986ea
…ith incompatible formats

When complexity_router switches from vertex_ai (GLM) to anthropic models, the thinking blocks in message history have incompatible signatures, causing API errors. This fix:
- Adds _get_provider_prefix() to extract provider from model name
- Adds _should_strip_thinking_blocks() to detect provider switches
- Adds _strip_thinking_blocks_from_messages() to clean incompatible content
- Automatically strips thinking/redacted_thinking blocks when switching providers
Force-pushed 9507010 to cd76d7f
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cd76d7f5ea
```python
    # Strip thinking blocks when switching between providers with incompatible thinking formats
    cleaned_messages = messages
    if self._should_strip_thinking_blocks(model, routed_model):
```
Use the actual source provider before stripping thinking
async_pre_routing_hook() calls _should_strip_thinking_blocks(model, routed_model) using model from the request, but in normal complexity-router flow that value is the router alias (for example smart-router), not the provider of the existing assistant history. When tiers are configured with prefixed targets like anthropic/claude-*, this evaluates as a provider switch on every turn and strips thinking blocks even when the conversation stays on Anthropic, which drops valid context and breaks the “only on provider switch” behavior this change is meant to enforce.
```python
def _strip_thinking_blocks_from_messages(
    self, messages: List[Dict[str, Any]]
) -> List[Dict[str, Any]]:
    """Strip thinking/redacted_thinking blocks from messages."""
    import copy

    cleaned: List[Dict[str, Any]] = []
    for msg in messages:
        if not isinstance(msg, dict):
            cleaned.append(msg)
            continue
        msg_copy = copy.deepcopy(msg)
        content = msg_copy.get("content")
        if isinstance(content, list):
            filtered = [
                block
                for block in content
                if not (
                    isinstance(block, dict)
                    and block.get("type") in ("thinking", "redacted_thinking")
                )
            ]
            if not filtered:
                continue
            msg_copy["content"] = filtered
        cleaned.append(msg_copy)
    return cleaned
```
Duplicates an existing function in llms/anthropic/common_utils.py
_strip_thinking_blocks_from_messages is line-for-line identical to strip_thinking_blocks_from_anthropic_messages already defined in litellm/llms/anthropic/common_utils.py (lines 780–808), including the same deepcopy approach, the same filter predicate, and the same silent-drop-on-empty logic. Having two copies means any future fix to the original (e.g. for the silent-drop issue noted in prior review) won't propagate here.
Replace the duplicated implementation with an import of the existing helper:

```python
from litellm.llms.anthropic.common_utils import strip_thinking_blocks_from_anthropic_messages
```

and call it at the call site:

```python
cleaned_messages = strip_thinking_blocks_from_anthropic_messages(messages)
```

Rule Used: What: Avoid writing provider-specific code outside...
Summary
Fixes the complexity_router error when switching from vertex_ai (GLM) to anthropic models:
```
API Error: 400 {"error":{"message":"messages.1.content.0.thinking.signature.str: Input should be a valid string"}}
```
Root Cause
When complexity_router routes between GLM and Anthropic, the thinking blocks from GLM responses are passed to the Anthropic API. However, the thinking block formats are provider-specific and incompatible: Anthropic validates the signature field on each thinking block, and GLM-produced blocks do not carry a signature Anthropic accepts.
Fix
Added automatic thinking block stripping when switching between providers with incompatible formats:
- _get_provider_prefix() - Extracts provider prefix from model name
- _should_strip_thinking_blocks() - Returns True when switching between vertex_ai and anthropic
- _strip_thinking_blocks_from_messages() - Removes thinking/redacted_thinking blocks from message content

Changes
- litellm/router_strategy/complexity_router/complexity_router.py - Added fix
- tests/test_litellm/router_strategy/test_complexity_router.py - Added 2 tests

Testing
All 58 tests pass ✓
🆕 New Feature
🐛 Bug Fix
🧹 Refactoring
📖 Documentation
🚄 Infrastructure
✅ Test