fix(formatter): preserve reasoning_content for thinking models (#155) #158
@@ -73,14 +73,43 @@ class FileBlockSupportFormatter(base_formatter_class): | |||||||||||||||||||||||||||||||||||||||||||||||||
| """Formatter with file block support for tool results.""" | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||
| async def _format(self, msgs): | ||||||||||||||||||||||||||||||||||||||||||||||||||
| """Override to sanitize tool messages before formatting. | ||||||||||||||||||||||||||||||||||||||||||||||||||
| """Override to sanitize tool messages and preserve thinking blocks. | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||
| This prevents OpenAI API errors from improperly paired | ||||||||||||||||||||||||||||||||||||||||||||||||||
| tool messages. | ||||||||||||||||||||||||||||||||||||||||||||||||||
| - Sanitizes improperly paired tool messages. | ||||||||||||||||||||||||||||||||||||||||||||||||||
| - After base formatting, injects ``reasoning_content`` from | ||||||||||||||||||||||||||||||||||||||||||||||||||
| thinking blocks into assistant messages so that APIs like | ||||||||||||||||||||||||||||||||||||||||||||||||||
| Kimi K2.5 (which require it) don't reject the request. | ||||||||||||||||||||||||||||||||||||||||||||||||||
| """ | ||||||||||||||||||||||||||||||||||||||||||||||||||
| msgs = _sanitize_tool_messages(msgs) | ||||||||||||||||||||||||||||||||||||||||||||||||||
| messages = await super()._format(msgs) | ||||||||||||||||||||||||||||||||||||||||||||||||||
| return _strip_top_level_message_name(messages) | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||
| # Collect thinking content per assistant msg (in order). | ||||||||||||||||||||||||||||||||||||||||||||||||||
| assistant_thinking: list[str] = [] | ||||||||||||||||||||||||||||||||||||||||||||||||||
| for msg in msgs: | ||||||||||||||||||||||||||||||||||||||||||||||||||
| if msg.role == "assistant": | ||||||||||||||||||||||||||||||||||||||||||||||||||
| parts = [ | ||||||||||||||||||||||||||||||||||||||||||||||||||
| b.get("thinking", "") | ||||||||||||||||||||||||||||||||||||||||||||||||||
| for b in msg.get_content_blocks() | ||||||||||||||||||||||||||||||||||||||||||||||||||
| if b.get("type") == "thinking" | ||||||||||||||||||||||||||||||||||||||||||||||||||
| ] | ||||||||||||||||||||||||||||||||||||||||||||||||||
| assistant_thinking.append( | ||||||||||||||||||||||||||||||||||||||||||||||||||
| "\n".join(p for p in parts if p), | ||||||||||||||||||||||||||||||||||||||||||||||||||
| ) | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||
| formatted = await super()._format(msgs) | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||||||||||||||||||||||||
| # Inject reasoning_content into formatted assistant messages. | ||||||||||||||||||||||||||||||||||||||||||||||||||
| if assistant_thinking: | ||||||||||||||||||||||||||||||||||||||||||||||||||
|
Reviewer suggestion on the injection guard:

```diff
-        if assistant_thinking:
+        if any(assistant_thinking):
```
1. Token undercount vs reasoning 🐞 Bug ⛯ Reliability
The formatter now injects reasoning_content into outgoing assistant messages, but the token counting logic used for /history and auto memory compaction only counts content. This will under-estimate prompt size for thinking models and can delay compaction until after requests start failing due to context limits.
Agent Prompt
## Issue description
The formatter now injects `reasoning_content` into formatted assistant messages for thinking models. However, the token counting logic used for auto memory compaction and `/history` only extracts tokens from `content`, ignoring `reasoning_content`, causing systematic undercounting and increasing the risk of exceeding model context limits before compaction triggers.
## Issue Context
- `FileBlockSupportFormatter._format()` adds `reasoning_content` to formatted assistant messages.
- `safe_count_message_tokens()` relies on `_extract_text_from_messages()` which currently concatenates only `content` fields.
- Memory compaction and `/history` both use this token estimate.
## Fix Focus Areas
- src/copaw/agents/utils/token_counting.py[58-86]
- src/copaw/agents/hooks/memory_compaction.py[161-169]
- src/copaw/agents/command_handler.py[245-251]
- src/copaw/agents/model_factory.py[75-112]
## Suggested change
- In `_extract_text_from_messages()`, append `msg.get("reasoning_content")` (when present) into `parts` similarly to `content`.
- (Optional but recommended) Consider also incorporating text from tool call arguments if present in the formatted schema, to further reduce undercounting for tool-heavy conversations.
- Add/extend a unit test to ensure token counting increases when `reasoning_content` is present in formatted messages.
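A minimal sketch of the first suggested change. The real helper lives in `src/copaw/agents/utils/token_counting.py`; its exact shape is an assumption here, and only the `reasoning_content` branch is the proposed addition:

```python
# Hedged sketch of the suggested fix: count reasoning_content alongside
# content when estimating prompt size. The function shape is assumed;
# the real _extract_text_from_messages() may differ.
def extract_text_from_messages(messages):
    parts = []
    for msg in messages:
        content = msg.get("content")
        if isinstance(content, str) and content:
            parts.append(content)
        reasoning = msg.get("reasoning_content")  # new: include thinking text
        if isinstance(reasoning, str) and reasoning:
            parts.append(reasoning)
    return "\n".join(parts)


msgs = [{"role": "assistant", "content": "ok", "reasoning_content": "long chain of thought"}]
```

With this change, any token counter built on the extracted text grows when `reasoning_content` is present, so compaction triggers earlier for thinking models.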
🧩 Analysis chain
🌐 Web query:
OpenAI API reasoning_content field assistant messages tool_calls specification
💡 Result:
Is there a reasoning_content field on assistant messages?
In the current OpenAI API docs, no: the chat/assistant message schema does not define a reasoning_content field. For reasoning models, “reasoning” is represented separately (in the Responses API) as reasoning items, not as an assistant-message field. [2]
Chat Completions: assistant message + tool_calls shape
For Chat Completions, an assistant message is essentially:
- `role: "assistant"`
- `content`: required unless you provide `tool_calls` (or the deprecated `function_call`) [1]
- `tool_calls`: array of tool calls generated by the model [1]

`tool_calls` items (function tool call) include:
- `id` (tool call id)
- `type` (currently `function`)
- `function: { name, arguments }`, where `arguments` is a JSON string [1]

The docs also describe a "custom tool call" shape with:
- `id`, `type: "custom"`, and `custom: { name, input }` [1]

Responses API: where "reasoning content" lives now
In the Responses API, reasoning is carried as a distinct reasoning object/item (not an assistant message field). It includes:
- `id`
- `summary[]` (summary text parts)
- optional `content[]` containing `reasoning_text`
- optional `encrypted_content` (when requested via `include`) [2][3]

The docs also note you should include these reasoning items back in input for later turns if you are manually managing context. [2]
Sources: Chat Completions message/tool_calls schema [1]; Responses API reasoning object [2]; streaming reasoning_text / encrypted_content details [3].
🏁 Script executed (repository: agentscope-ai/CoPaw):

```shell
# Search for reasoning_content usage in the codebase
rg "reasoning_content" -A 3 -B 3
```

Length of output: 8281

🏁 Script executed:

```shell
# Search for assistant_thinking usage to understand the context
rg "assistant_thinking" -A 2 -B 2
```

Length of output: 1791

🏁 Script executed:

```shell
# Look at the broader context of the model_factory.py file
wc -l src/copaw/agents/model_factory.py
```

Length of output: 100

🏁 Script executed:

```shell
# Read the relevant section of the file for full context
sed -n '90,120p' src/copaw/agents/model_factory.py
```

Length of output: 1320

🏁 Script executed:

```shell
# Search for Kimi or DeepSeek references and backend-specific handling
rg -i "kimi|deepseek" -B 3 -A 3
```

Length of output: 2114

🏁 Script executed:

```shell
# Search for issue #155 or related discussion in code/tests
rg "#155" -B 3 -A 3
```

Length of output: 354

🏁 Script executed:

```shell
# Check the test file to understand expected behavior better
cat -n tests/test_thinking_block_fix.py
```

Length of output: 7447

🏁 Script executed:

```shell
# Search for tool_calls usage in model_factory to understand the flow
rg "tool_calls" -B 2 -A 2 src/copaw/agents/model_factory.py
```

Length of output: 45
🌐 Web query:
Kimi K2.5 DeepSeek reasoning_content field OpenAI-compatible API tool_calls
💡 Result:
Kimi K2.5 (Moonshot) — reasoning_content + OpenAI-compatible tool_calls
- `reasoning_content` output: In thinking mode, Kimi K2.5 returns a separate `reasoning_content` alongside `content` (the answer). The K2.5 repo's example reads it as `response.choices[0].message.reasoning_content`. You can disable thinking (instant mode) via `extra_body` (e.g., `{"thinking": {"type": "disabled"}}` for the official API, or `{"chat_template_kwargs": {"thinking": False}}` for vLLM/SGLang). [2]
- OpenAI SDK compatibility note: Some Moonshot "thinking" docs note the OpenAI SDK types may not expose `reasoning_content` directly, so you may need to access it defensively (e.g., `hasattr`/`getattr`) in streaming. [5]
- Tool calling: Moonshot recommends `tool_calls` (not legacy `function_call`), supports parallel tool calls, and stresses that every returned `tool_call.id` must be matched by a later `role="tool"` message with the correct `tool_call_id` — and you must append the assistant message that contained `tool_calls` into history, or you can get `tool_call_id not found`. [1]

DeepSeek — reasoning_content behavior + tool calls
- `reasoning_content` output: DeepSeek's reasoning model returns both `reasoning_content` (CoT) and `content` (final answer) as separate fields. [3]
- Whether to send `reasoning_content` back:
  - DeepSeek's reasoning model guide says: do not feed `reasoning_content` back into later `messages` (it can error). [3]
  - DeepSeek's newer thinking mode + tool calls guide says: during a single question's multi-step "thinking + tool calls" loop, you must send `reasoning_content` back so the model can continue; when you start the next user question, remove prior `reasoning_content` (it will be ignored / should not be kept). [4]

OpenAI-compatible tool_calls shape (reference)
- In OpenAI Chat Completions, tool calling is controlled via `tool_choice` (`auto`/`none`/`required`, or forcing a specific tool). This is the interface most "OpenAI-compatible" providers try to mirror. [6]

Sources: Moonshot tool-calls guide [1], Kimi K2.5 repo example [2], DeepSeek reasoning model docs [3], DeepSeek thinking-mode tool-calls guide [4], Moonshot thinking blog (reasoning_content access) [5], OpenAI chat completions reference (tool_choice) [6].
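The defensive-access note above (OpenAI SDK types may not declare `reasoning_content`) translates to a simple `getattr` pattern. The stand-in objects below are illustrative only, not the actual SDK message types:

```python
# getattr-based defensive read of reasoning_content, per the Moonshot note.
# SimpleNamespace stands in for SDK message objects that may or may not
# declare the field.
from types import SimpleNamespace


def read_reasoning(message):
    # Fall back to None when the type does not expose the field.
    return getattr(message, "reasoning_content", None)


thinking_msg = SimpleNamespace(content="answer", reasoning_content="chain of thought")
plain_msg = SimpleNamespace(content="answer")
```

The same pattern works per-chunk in streaming, where delta objects may omit the field entirely.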
Gate reasoning_content injection to tool-call assistant messages only.
The tests verify reasoning_content exclusively on assistant messages with tool_calls (lines 65, 182 in the test file). Backend documentation for DeepSeek and Kimi K2.5 also indicates reasoning_content is most relevant in tool-calling contexts. Injecting into all assistant messages with thinking blocks risks rejection by stricter backends.
Proposed diff

```diff
         if assistant_thinking:
             asst_idx = 0
             for fmt_msg in formatted:
                 if fmt_msg.get("role") == "assistant":
                     if (
+                        fmt_msg.get("tool_calls")
+                        and
                         asst_idx < len(assistant_thinking)
                         and assistant_thinking[asst_idx]
                     ):
                         fmt_msg["reasoning_content"] = assistant_thinking[asst_idx]
                     asst_idx += 1
```

📝 Committable suggestion
```diff
         # Inject reasoning_content into formatted assistant messages.
         if assistant_thinking:
             asst_idx = 0
             for fmt_msg in formatted:
                 if fmt_msg.get("role") == "assistant":
                     if (
-                        asst_idx < len(assistant_thinking)
+                        fmt_msg.get("tool_calls")
+                        and asst_idx < len(assistant_thinking)
                         and assistant_thinking[asst_idx]
                     ):
                         fmt_msg["reasoning_content"] = assistant_thinking[asst_idx]
                     asst_idx += 1
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `src/copaw/agents/model_factory.py` around lines 100-110, the current loop injects `reasoning_content` into every formatted assistant message when `assistant_thinking` exists. Change it to only inject into tool-calling assistant messages by additionally checking for a tool-call indicator (e.g., `fmt_msg.get("tool_calls")` or `fmt_msg.get("tool_call")`) before setting `fmt_msg["reasoning_content"]`, so only tool-call assistant messages receive `reasoning_content`. Keep the existing `assistant_thinking` indexing (`assistant_thinking`, `formatted`, `fmt_msg`, `asst_idx`) and the `asst_idx` increment behavior unchanged.
New file `tests/test_thinking_block_fix.py` (@@ -0,0 +1,204 @@):

```python
# -*- coding: utf-8 -*-
"""Verify fix for #155: thinking blocks preserved as reasoning_content."""
import asyncio
import json

from agentscope.message import (
    Msg,
    ThinkingBlock,
    TextBlock,
    ToolUseBlock,
    ToolResultBlock,
)
from agentscope.model import OpenAIChatModel

from copaw.agents.model_factory import _create_formatter_instance


def test_reasoning_content_preserved():
    """Thinking blocks must appear as reasoning_content in formatted output."""
    formatter = _create_formatter_instance(OpenAIChatModel)

    msgs = [
        Msg(
            name="system",
            role="system",
            content="You are a helpful assistant.",
        ),
        Msg(name="user", role="user", content="What time is it now?"),
        Msg(
            name="assistant",
            role="assistant",
            content=[
                ThinkingBlock(
                    type="thinking",
                    thinking="I should call get_current_time to answer this.",
                ),
                TextBlock(type="text", text="Let me check."),
                ToolUseBlock(
                    type="tool_use",
                    id="call_001",
                    name="get_current_time",
                    input={},
                ),
            ],
        ),
        Msg(
            name="system",
            role="system",
            content=[
                ToolResultBlock(
                    type="tool_result",
                    id="call_001",
                    name="get_current_time",
                    output="2026-03-01 14:30:00 CST",
                ),
            ],
        ),
    ]

    formatted = asyncio.run(formatter.format(msgs))

    assistant_msgs = [
        m
        for m in formatted
        if m.get("role") == "assistant" and m.get("tool_calls")
    ]
    assert len(assistant_msgs) == 1, (
        f"Expected 1 assistant msg with tool_calls, "
        f"got {len(assistant_msgs)}. "
        f"All formatted: {json.dumps(formatted, ensure_ascii=False)}"
    )

    msg = assistant_msgs[0]
    reasoning = msg.get("reasoning_content")
    assert reasoning, (
        f"reasoning_content missing! "
        f"Formatted msg: {json.dumps(msg, ensure_ascii=False, indent=2)}"
    )
    assert "get_current_time" in reasoning
    print(f"✅ PASS — reasoning_content preserved: {reasoning}")


def test_no_reasoning_when_no_thinking():
    """Messages without thinking blocks should not get reasoning_content."""
    formatter = _create_formatter_instance(OpenAIChatModel)

    msgs = [
        Msg(name="system", role="system", content="Hi"),
        Msg(name="user", role="user", content="Hello"),
        Msg(
            name="assistant",
            role="assistant",
            content=[TextBlock(type="text", text="Hi there!")],
        ),
    ]

    formatted = asyncio.run(formatter.format(msgs))
    assistant_msgs = [m for m in formatted if m.get("role") == "assistant"]
    assert len(assistant_msgs) == 1

    msg = assistant_msgs[0]
    assert (
        "reasoning_content" not in msg
    ), (
        "reasoning_content should not be present "
        "when there are no thinking blocks"
    )
    print("✅ PASS — no reasoning_content when no thinking blocks")


def test_multiple_assistant_messages():
    """Each assistant message gets its own reasoning_content."""
    formatter = _create_formatter_instance(OpenAIChatModel)

    msgs = [
        Msg(name="system", role="system", content="Hi"),
        Msg(name="user", role="user", content="Do two things"),
        Msg(
            name="assistant",
            role="assistant",
            content=[
                ThinkingBlock(
                    type="thinking",
                    thinking="First task thinking",
                ),
                TextBlock(type="text", text="Doing first thing."),
                ToolUseBlock(
                    type="tool_use",
                    id="c1",
                    name="tool_a",
                    input={},
                ),
            ],
        ),
        Msg(
            name="system",
            role="system",
            content=[
                ToolResultBlock(
                    type="tool_result",
                    id="c1",
                    name="tool_a",
                    output="done",
                ),
            ],
        ),
        Msg(
            name="assistant",
            role="assistant",
            content=[
                ThinkingBlock(
                    type="thinking",
                    thinking="Second task thinking",
                ),
                TextBlock(type="text", text="Doing second thing."),
                ToolUseBlock(
                    type="tool_use",
                    id="c2",
                    name="tool_b",
                    input={},
                ),
            ],
        ),
        Msg(
            name="system",
            role="system",
            content=[
                ToolResultBlock(
                    type="tool_result",
                    id="c2",
                    name="tool_b",
                    output="done",
                ),
            ],
        ),
    ]

    formatted = asyncio.run(formatter.format(msgs))
    assistant_msgs = [
        m
        for m in formatted
        if m.get("role") == "assistant" and m.get("tool_calls")
    ]
    assert (
        len(assistant_msgs) == 2
    ), f"Expected 2 assistant msgs, got {len(assistant_msgs)}"

    assert (
        assistant_msgs[0].get("reasoning_content") == "First task thinking"
    )
    assert (
        assistant_msgs[1].get("reasoning_content") == "Second task thinking"
    )
    print(
        "✅ PASS — multiple assistant messages "
        "each get correct reasoning_content"
    )


if __name__ == "__main__":
    test_reasoning_content_preserved()
    test_no_reasoning_when_no_thinking()
    test_multiple_assistant_messages()
    print("\nAll tests passed!")
```
Should we apply this kind of fix to AgentScope library's formatter instead of CoPaw? @rayrayraykk @DavdGao
Agree
@lailoo can you make this fix to AgentScope library instead? It will help to keep this code base focusing on the application not the model interaction layer.
Good point! I'll move this fix to the AgentScope library instead. Will create a PR there and link it back here.
@ekzhu
Thank you! Closing this PR.