fix(formatter): preserve reasoning_content for thinking models (#155) #158

lailoo wants to merge 3 commits into agentscope-ai:main
Conversation
…sages (agentscope-ai#155) When thinking models like Kimi K2.5 return assistant messages with reasoning_content + tool_calls, the formatter must preserve reasoning_content in subsequent API calls. Previously, thinking blocks were silently dropped, causing API error: "thinking is enabled but reasoning_content is missing in assistant tool call message".
```python
msgs = _sanitize_tool_messages(msgs)
return await super()._format(msgs)
```

```python
# Collect thinking content per assistant msg (in order).
```
Should we apply this kind of fix to AgentScope library's formatter instead of CoPaw? @rayrayraykk @DavdGao
@lailoo can you make this fix to AgentScope library instead? It will help to keep this code base focusing on the application not the model interaction layer.
Good point! I'll move this fix to the AgentScope library instead. Will create a PR there and link it back here.
@ekzhu
Pull request overview
Fixes multi-turn tool-calling failures for “thinking” models (e.g., Kimi K2.5) by preserving assistant thinking content as reasoning_content in the formatted chat history, preventing upstream APIs from rejecting assistant tool-call messages that omit it.
Changes:
- Extend `FileBlockSupportFormatter._format()` to extract thinking blocks from assistant messages and inject them into the formatted output as `reasoning_content`.
- Add regression tests covering preservation, absence when no thinking exists, and correct mapping across multiple assistant messages.
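The extract-then-inject flow described above can be sketched as follows. This is a hypothetical illustration, not the actual AgentScope code: the block shapes (`type == "thinking"` with a `thinking` text field) are assumptions based on the PR description.

```python
# Hypothetical sketch of the collect-then-inject pattern; the real logic
# lives in FileBlockSupportFormatter._format(). Block and field names
# are assumptions for illustration only.

def collect_thinking(msgs: list[dict]) -> list[str]:
    """One entry per assistant message: its concatenated thinking text."""
    thinking = []
    for msg in msgs:
        if msg.get("role") != "assistant":
            continue
        blocks = msg.get("content") or []
        parts = [
            b.get("thinking", "")
            for b in blocks
            if isinstance(b, dict) and b.get("type") == "thinking"
        ]
        thinking.append("".join(parts))
    return thinking


def inject_reasoning(formatted: list[dict], thinking: list[str]) -> None:
    """Attach non-empty thinking text to formatted assistant messages, in order."""
    asst_idx = 0
    for fmt_msg in formatted:
        if fmt_msg.get("role") == "assistant":
            if asst_idx < len(thinking) and thinking[asst_idx]:
                fmt_msg["reasoning_content"] = thinking[asst_idx]
            asst_idx += 1
```

The two passes are kept separate because the base formatter rewrites message shapes; positional matching by assistant-message order is what ties them back together.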
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/copaw/agents/model_factory.py | Preserves thinking content by injecting reasoning_content into formatted assistant messages after base formatting. |
| tests/test_thinking_block_fix.py | Adds regression tests validating reasoning_content preservation behavior across key scenarios. |
```python
content=[
    ToolResultBlock(type="tool_result", id="c1", name="tool_a", output="done"),
],
```
Several lines in this test file exceed the repo’s max line length (79) and will fail flake8/black (e.g., the one-line ToolResultBlock call here). Please run black and/or wrap these long argument lists across multiple lines so E501 doesn’t block CI/pre-commit.
```python
formatted = await super()._format(msgs)

# Inject reasoning_content into formatted assistant messages.
if assistant_thinking:
```
The injection loop is guarded by if assistant_thinking:, but assistant_thinking is populated for every assistant message even when none contain thinking blocks (empty strings). This causes an unnecessary second pass over formatted on most calls; consider guarding with if any(assistant_thinking): (or tracking a has_thinking flag) so the injection work is skipped when there’s nothing to inject.
```diff
-        if assistant_thinking:
+        if any(assistant_thinking):
```
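The distinction matters because a non-empty list of empty strings is still truthy; a minimal illustration:

```python
# One empty string per assistant message: the list is truthy even when
# no message contained thinking, so the old guard always runs the
# injection loop. any() is False in that case and skips the pass.
assistant_thinking = ["", ""]

runs_with_old_guard = bool(assistant_thinking)  # True
runs_with_new_guard = any(assistant_thinking)   # False
```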
- Run black formatter on model_factory.py and test_thinking_block_fix.py
- Fix E501 line length violations (max 79 characters)
- Split long lines in test assertions and function calls
Review Summary by Qodo: Preserve reasoning_content for thinking models in formatter

Description:
- Preserve reasoning_content from thinking blocks in formatted assistant messages
- Prevents API rejection for thinking models like Kimi K2.5 during multi-turn tool calling
- Collects thinking content before base formatting, injects into formatted output
- Add comprehensive tests for thinking block preservation across multiple scenarios

Diagram:

```mermaid
flowchart LR
    A["Assistant Message<br/>with ThinkingBlock"] -->|"_format()"| B["Collect thinking<br/>content"]
    B -->|"base format"| C["Formatted message"]
    C -->|"inject"| D["Message with<br/>reasoning_content"]
    D -->|"API call"| E["Kimi K2.5<br/>accepts request"]
```

File Changes:
1. src/copaw/agents/model_factory.py
Code Review by Qodo

1. Token undercount vs reasoning
```python
# Inject reasoning_content into formatted assistant messages.
if assistant_thinking:
    asst_idx = 0
    for fmt_msg in formatted:
        if fmt_msg.get("role") == "assistant":
            if (
                asst_idx < len(assistant_thinking)
                and assistant_thinking[asst_idx]
            ):
                fmt_msg["reasoning_content"] = assistant_thinking[asst_idx]
            asst_idx += 1
```
1. Token undercount vs reasoning 🐞 Bug ⛯ Reliability
The formatter now injects reasoning_content into outgoing assistant messages, but the token counting logic used for /history and auto memory compaction only counts content. This will under-estimate prompt size for thinking models and can delay compaction until after requests start failing due to context limits.
Agent Prompt
## Issue description
The formatter now injects `reasoning_content` into formatted assistant messages for thinking models. However, the token counting logic used for auto memory compaction and `/history` only extracts tokens from `content`, ignoring `reasoning_content`, causing systematic undercounting and increasing the risk of exceeding model context limits before compaction triggers.
## Issue Context
- `FileBlockSupportFormatter._format()` adds `reasoning_content` to formatted assistant messages.
- `safe_count_message_tokens()` relies on `_extract_text_from_messages()` which currently concatenates only `content` fields.
- Memory compaction and `/history` both use this token estimate.
## Fix Focus Areas
- src/copaw/agents/utils/token_counting.py (lines 58-86)
- src/copaw/agents/hooks/memory_compaction.py (lines 161-169)
- src/copaw/agents/command_handler.py (lines 245-251)
- src/copaw/agents/model_factory.py (lines 75-112)
## Suggested change
- In `_extract_text_from_messages()`, append `msg.get("reasoning_content")` (when present) into `parts` similarly to `content`.
- (Optional but recommended) Consider also incorporating text from tool call arguments if present in the formatted schema, to further reduce undercounting for tool-heavy conversations.
- Add/extend a unit test to ensure token counting increases when `reasoning_content` is present in formatted messages.
- Preserve reasoning_content injection logic from PR
- Add _strip_top_level_message_name call from main branch
- Both changes are complementary and work together
📝 Walkthrough: The PR fixes a bug where thinking blocks from assistant messages weren't being preserved as reasoning_content in formatted output.

Changes

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks: ✅ 5 passed
Actionable comments posted: 1
🧹 Nitpick comments (1)

tests/test_thinking_block_fix.py (1)

Lines 200-204: Optional: remove the manual `__main__` test runner in this pytest module. For test modules, keeping execution through pytest only is usually cleaner and avoids duplicate/manual execution paths.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_thinking_block_fix.py` around lines 200 - 204, Remove the manual test runner block that calls test_reasoning_content_preserved(), test_no_reasoning_when_no_thinking(), test_multiple_assistant_messages() under if __name__ == "__main__": so the module runs only under pytest; delete that conditional and its print("\nAll tests passed!") line and rely on pytest discovery to execute the tests instead.
```python
# Inject reasoning_content into formatted assistant messages.
if assistant_thinking:
    asst_idx = 0
    for fmt_msg in formatted:
        if fmt_msg.get("role") == "assistant":
            if (
                asst_idx < len(assistant_thinking)
                and assistant_thinking[asst_idx]
            ):
                fmt_msg["reasoning_content"] = assistant_thinking[asst_idx]
            asst_idx += 1
```
🧩 Analysis chain
🌐 Web query:
OpenAI API reasoning_content field assistant messages tool_calls specification
💡 Result:
Is there a reasoning_content field on assistant messages?
In the current OpenAI API docs, no: the chat/assistant message schema does not define a reasoning_content field. For reasoning models, “reasoning” is represented separately (in the Responses API) as reasoning items, not as an assistant-message field. [2]
Chat Completions: assistant message + tool_calls shape
For Chat Completions, an assistant message is essentially:
- `role: "assistant"`
- `content`: required unless you provide `tool_calls` (or the deprecated `function_call`) [1]
- `tool_calls`: array of tool calls generated by the model [1]

tool_calls items (function tool call) include:

- `id` (the tool call id)
- `type` (currently `function`)
- `function: { name, arguments }`, where `arguments` is a JSON string [1]

The docs also describe a "custom tool call" shape with:

- `id`, `type: "custom"`, and `custom: { name, input }` [1]

Responses API: where "reasoning content" lives now

In the Responses API, reasoning is carried as a distinct reasoning object/item (not an assistant message field). It includes:

- `id`
- `summary[]` (summary text parts)
- optional `content[]` containing `reasoning_text`
- optional `encrypted_content` (when requested via `include`) [2][3]
The docs also note you should include these reasoning items back in input for later turns if you are manually managing context. [2]
Sources: Chat Completions message/tool_calls schema [1]; Responses API reasoning object [2]; streaming reasoning_text / encrypted_content details [3].
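For concreteness, the Chat Completions shapes summarized above look like this when sent back as conversation history. The tool name and arguments are illustrative, not from this PR:

```python
import json

# Assistant message carrying a function tool call; content is allowed
# to be null when tool_calls is present.
assistant_msg = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {
                "name": "get_weather",             # illustrative tool name
                "arguments": '{"city": "Paris"}',  # JSON-encoded string
            },
        }
    ],
}

# Every returned tool_call id must be answered by a role="tool" message
# with the matching tool_call_id.
tool_reply = {"role": "tool", "tool_call_id": "call_1", "content": "sunny"}

args = json.loads(assistant_msg["tool_calls"][0]["function"]["arguments"])
```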
🏁 Scripts executed (repository: agentscope-ai/CoPaw):

```shell
# Search for reasoning_content usage in the codebase
rg "reasoning_content" -A 3 -B 3

# Search for assistant_thinking usage to understand the context
rg "assistant_thinking" -A 2 -B 2

# Look at the broader context of the model_factory.py file
wc -l src/copaw/agents/model_factory.py

# Read the relevant section of the file for full context
sed -n '90,120p' src/copaw/agents/model_factory.py

# Search for Kimi or DeepSeek references and backend-specific handling
rg -i "kimi|deepseek" -B 3 -A 3

# Search for issue #155 or related discussion in code/tests
rg "#155" -B 3 -A 3

# Check the test file to understand expected behavior better
cat -n tests/test_thinking_block_fix.py

# Search for tool_calls usage in model_factory to understand the flow
rg "tool_calls" -B 2 -A 2 src/copaw/agents/model_factory.py
```
🌐 Web query:
Kimi K2.5 DeepSeek reasoning_content field OpenAI-compatible API tool_calls
💡 Result:
Kimi K2.5 (Moonshot) — reasoning_content + OpenAI-compatible tool_calls

- `reasoning_content` output: In thinking mode, Kimi K2.5 returns a separate `reasoning_content` alongside `content` (the answer). The K2.5 repo's example reads it as `response.choices[0].message.reasoning_content`. You can disable thinking (instant mode) via `extra_body` (e.g., `{"thinking": {"type": "disabled"}}` for the official API, or `{"chat_template_kwargs": {"thinking": False}}` for vLLM/SGLang). [2]
- OpenAI SDK compatibility note: Some Moonshot "thinking" docs note the OpenAI SDK types may not expose `reasoning_content` directly, so you may need to access it defensively (e.g., `hasattr`/`getattr`) in streaming. [5]
- Tool calling: Moonshot recommends `tool_calls` (not legacy `function_call`), supports parallel tool calls, and stresses that every returned `tool_call.id` must be matched by a later `role="tool"` message with the correct `tool_call_id`, and that you must append the assistant message that contained `tool_calls` into history or you can get `tool_call_id not found`. [1]

DeepSeek — reasoning_content behavior + tool calls

- `reasoning_content` output: DeepSeek's reasoning model returns both `reasoning_content` (CoT) and `content` (final answer) as separate fields. [3]
- Whether to send `reasoning_content` back:
  - DeepSeek's reasoning model guide says: do not feed `reasoning_content` back into later `messages` (it can error). [3]
  - DeepSeek's newer thinking mode + tool calls guide says: during a single question's multi-step "thinking + tool calls" loop, you must send `reasoning_content` back so the model can continue; when you start the next user question, remove prior `reasoning_content` (it will be ignored / should not be kept). [4]

OpenAI-compatible tool_calls shape (reference)

- In OpenAI Chat Completions, tool calling is controlled via `tool_choice` (`auto`/`none`/`required`, or forcing a specific tool). This is the interface most "OpenAI-compatible" providers try to mirror. [6]

Sources: Moonshot tool-calls guide [1], Kimi K2.5 repo example [2], DeepSeek reasoning model docs [3], DeepSeek thinking-mode tool-calls guide [4], Moonshot thinking blog (reasoning_content access) [5], OpenAI chat completions reference (tool_choice) [6].
Gate reasoning_content injection to tool-call assistant messages only.
The tests verify reasoning_content exclusively on assistant messages with tool_calls (lines 65, 182 in the test file). Backend documentation for DeepSeek and Kimi K2.5 also indicates reasoning_content is most relevant in tool-calling contexts. Injecting into all assistant messages with thinking blocks risks rejection by stricter backends.
Proposed diff
if assistant_thinking:
asst_idx = 0
for fmt_msg in formatted:
if fmt_msg.get("role") == "assistant":
if (
+ fmt_msg.get("tool_calls")
+ and
asst_idx < len(assistant_thinking)
and assistant_thinking[asst_idx]
):
fmt_msg["reasoning_content"] = assistant_thinking[asst_idx]
asst_idx += 1📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
# Inject reasoning_content into formatted assistant messages.
if assistant_thinking:
    asst_idx = 0
    for fmt_msg in formatted:
        if fmt_msg.get("role") == "assistant":
            if (
                fmt_msg.get("tool_calls")
                and asst_idx < len(assistant_thinking)
                and assistant_thinking[asst_idx]
            ):
                fmt_msg["reasoning_content"] = assistant_thinking[asst_idx]
            asst_idx += 1
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/copaw/agents/model_factory.py` around lines 100-110: the current loop
injects reasoning_content into every assistant formatted message when
assistant_thinking exists; change it to only inject into assistant messages that
are tool-calling by additionally checking for a tool-call indicator (e.g.,
fmt_msg.get("tool_calls") or fmt_msg.get("tool_call")) before setting
fmt_msg["reasoning_content"] so only tool-call assistant messages receive
reasoning_content; keep the existing assistant_thinking indexing
(assistant_thinking, formatted, fmt_msg, asst_idx) and the asst_idx increment
behavior unchanged.
Closing this PR in favor of a fix directly to the agentscope library.
Summary

- `FileBlockSupportFormatter._format()` in `model_factory.py` delegates to `OpenAIChatFormatter`, which silently drops thinking blocks, losing `reasoning_content` from assistant messages.
- Inject `reasoning_content` from thinking blocks into the corresponding formatted assistant messages.

Fixes #155
Problem

When using Kimi K2.5 (which has thinking mode enabled by default):

1. The model returns assistant messages with `reasoning_content` + `tool_calls`.
2. They are stored as `ThinkingBlock` + `ToolUseBlock` in memory.
3. The formatter drops the `ThinkingBlock` (logged as "Unsupported block type thinking").
4. On the next turn, the API receives an assistant tool-call message missing `reasoning_content` and rejects it.

Before fix (Web UI on main branch):
Error log:
Full traceback:
Before fix (real API call reproduction):
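The failing second-turn history can be sketched as follows. Message contents are illustrative; the check at the end mirrors what thinking-model APIs enforce for assistant tool-call messages:

```python
# Hypothetical second-turn history: the assistant tool-call message lost
# its reasoning_content during formatting, which thinking-model APIs
# (e.g., Kimi K2.5 in thinking mode) reject.
history = [
    {"role": "user", "content": "What's the weather?"},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": "c1",
                "type": "function",
                "function": {"name": "get_weather", "arguments": "{}"},
            }
        ],
        # "reasoning_content" was dropped here -> API error
    },
    {"role": "tool", "tool_call_id": "c1", "content": "sunny"},
]

# Find assistant tool-call messages that are missing reasoning_content.
offending = [
    m
    for m in history
    if m.get("role") == "assistant"
    and m.get("tool_calls")
    and "reasoning_content" not in m
]
```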
Changes

- src/copaw/agents/model_factory.py — In `FileBlockSupportFormatter._format()`, collect thinking content from assistant messages before base formatting, then inject as `reasoning_content` into the formatted output. The pattern follows agentscope's existing `DeepSeekChatFormatter`.

After fix (Web UI on fix branch):
After fix (real API call verification):
Test plan

- `test_reasoning_content_preserved` — thinking blocks → reasoning_content in formatted output
- `test_no_reasoning_when_no_thinking` — no false positives
- `test_multiple_assistant_messages` — each assistant message gets the correct reasoning_content

Effect on User Experience
Before: Using Kimi K2.5 (or any thinking model) with tools causes immediate failure on the second turn. Users cannot use thinking models with CoPaw at all.
After: Thinking models work correctly.
`reasoning_content` is preserved across turns, enabling multi-step tool calling with models like Kimi K2.5, DeepSeek-R1, etc.

Summary by CodeRabbit
Release Notes
Bug Fixes
Tests