
fix(formatter): preserve reasoning_content for thinking models (#155)#158

Closed
lailoo wants to merge 3 commits into agentscope-ai:main from lailoo:fix/thinking-block-reasoning-content-155

Conversation

@lailoo

@lailoo lailoo commented Feb 28, 2026

Summary

  • Bug: Kimi K2.5 (and other thinking models) fail with "reasoning_content is missing in assistant tool call message" during multi-turn tool calling
  • Root cause: FileBlockSupportFormatter._format() in model_factory.py delegates to OpenAIChatFormatter which silently drops thinking blocks, losing reasoning_content from assistant messages
  • Fix: After base formatting, inject reasoning_content from thinking blocks into the corresponding formatted assistant messages

Fixes #155

Problem

When using Kimi K2.5 (which has thinking mode enabled by default):

  1. Model returns assistant message with reasoning_content + tool_calls
  2. CoPaw saves the response as ThinkingBlock + ToolUseBlock in memory
  3. On the next turn, the formatter drops the ThinkingBlock (logged as "Unsupported block type thinking")
  4. The API receives an assistant message without reasoning_content and rejects it
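For illustration, a hedged sketch of the assistant-history entry the API expects on the next turn. The field names follow the OpenAI-compatible chat schema these thinking models use; the values are invented:

```python
# Illustrative only: the shape of the assistant message that must be sent
# back to the API on the next turn of a tool-calling conversation.
# Omitting "reasoning_content" from this dict is what triggers the
# 400 "reasoning_content is missing" rejection for thinking models.
assistant_turn = {
    "role": "assistant",
    "content": "",
    "reasoning_content": "User asked for the time; call the clock tool.",
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_time", "arguments": "{}"},
        }
    ],
}
```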

Before fix (Web UI on main branch):

[screenshot]

Error log:

openai.BadRequestError: Error code: 400 - {'error': {'message': 'thinking is enabled but
reasoning_content is missing in assistant tool call message at index 6',
'type': 'invalid_request_error'}}

Full traceback:

react_agent.py:298 → reply() → super().reply()
_react_agent.py:437 → reply() → _reasoning()
_react_agent.py:563 → _reasoning() → self.model()
_openai_model.py:289 → __call__() → client.chat.completions.create()
→ openai.BadRequestError: 400 - thinking is enabled but reasoning_content
  is missing in assistant tool call message at index 6

Before fix (real API call reproduction):

Step 3: Send next turn WITHOUT reasoning_content → expect error
  ❌ FAIL — BUG CONFIRMED!
  Error: Error code: 400 - {'error': {'message': 'thinking is enabled but
  reasoning_content is missing in assistant tool call message at index 2',
  'type': 'invalid_request_error'}}

Changes

  • src/copaw/agents/model_factory.py — In FileBlockSupportFormatter._format(), collect thinking content from assistant messages before base formatting, then inject as reasoning_content into the formatted output. Pattern follows agentscope's existing DeepSeekChatFormatter.
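A minimal sketch of that collect-then-inject pattern. The dict-based message shapes and the helper name are assumptions for illustration; the actual CoPaw code operates on agentscope message and ThinkingBlock objects:

```python
def inject_reasoning(msgs, formatted):
    """Sketch: carry thinking text from input msgs into formatted output.

    Assumes `msgs` are dicts whose `content` may hold blocks with a
    `type` of "thinking", and `formatted` is the OpenAI-style dict list
    produced by the base formatter. Both shapes are illustrative.
    """
    # 1) Collect thinking text per assistant message, in input order.
    assistant_thinking = []
    for msg in msgs:
        if msg.get("role") != "assistant":
            continue
        thinking = "".join(
            block.get("thinking", "")
            for block in msg.get("content", [])
            if isinstance(block, dict) and block.get("type") == "thinking"
        )
        assistant_thinking.append(thinking)

    # 2) Inject into formatted assistant messages at matching indices.
    if any(assistant_thinking):
        asst_idx = 0
        for fmt_msg in formatted:
            if fmt_msg.get("role") == "assistant":
                if (
                    asst_idx < len(assistant_thinking)
                    and assistant_thinking[asst_idx]
                ):
                    fmt_msg["reasoning_content"] = assistant_thinking[asst_idx]
                asst_idx += 1
    return formatted
```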

After fix (Web UI on fix branch):

[screenshot]

After fix (real API call verification):

Step 3: Format through CoPaw's formatter
  reasoning_content in formatted msg: YES

Step 4: Send formatted messages to Kimi K2.5 API
  ✅ PASS — Response: The current time is 2:30 PM (14:30) on March 1, 2026 (CST, UTC+0800).

Test plan

  • Reproduced on main with real Kimi K2.5 API call (400 error confirmed)
  • Reproduced on main via Web UI (same 400 error)
  • Verified fix with real Kimi K2.5 API call (multi-turn tool calling succeeds)
  • New test: test_reasoning_content_preserved — thinking blocks → reasoning_content in formatted output
  • New test: test_no_reasoning_when_no_thinking — no false positives
  • New test: test_multiple_assistant_messages — each assistant msg gets correct reasoning_content
  • All 3 regression tests pass

Effect on User Experience

Before: Using Kimi K2.5 (or any thinking model) with tools causes immediate failure on the second turn. Users cannot use thinking models with CoPaw at all.
After: Thinking models work correctly. reasoning_content is preserved across turns, enabling multi-step tool calling with models like Kimi K2.5, DeepSeek-R1, etc.

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Improved handling of thinking blocks to ensure they are properly preserved and injected as reasoning content in assistant messages during message formatting.
  • Tests

    • Added test coverage to verify thinking blocks are correctly preserved as reasoning content in formatted output, including edge cases with multiple assistant messages and scenarios without thinking blocks.

…sages (agentscope-ai#155)

When thinking models like Kimi K2.5 return assistant messages with
reasoning_content + tool_calls, the formatter must preserve
reasoning_content in subsequent API calls. Previously, thinking blocks
were silently dropped, causing API error: "thinking is enabled but
reasoning_content is missing in assistant tool call message".

msgs = _sanitize_tool_messages(msgs)
return await super()._format(msgs)

# Collect thinking content per assistant msg (in order).
Collaborator


Should we apply this kind of fix to AgentScope library's formatter instead of CoPaw? @rayrayraykk @DavdGao

Member


Agree

Collaborator


@lailoo can you make this fix in the AgentScope library instead? It will help keep this codebase focused on the application rather than the model interaction layer.

Author

@lailoo lailoo Mar 3, 2026


Good point! I'll move this fix to the AgentScope library instead. Will create a PR there and link it back here.
@ekzhu

Collaborator


Thank you! Closing this PR.

Contributor

Copilot AI left a comment


Pull request overview

Fixes multi-turn tool-calling failures for “thinking” models (e.g., Kimi K2.5) by preserving assistant thinking content as reasoning_content in the formatted chat history, preventing upstream APIs from rejecting assistant tool-call messages that omit it.

Changes:

  • Extend FileBlockSupportFormatter._format() to extract thinking blocks from assistant messages and inject them into the formatted output as reasoning_content.
  • Add regression tests covering preservation, absence when no thinking exists, and correct mapping across multiple assistant messages.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

  • src/copaw/agents/model_factory.py — Preserves thinking content by injecting reasoning_content into formatted assistant messages after base formatting.
  • tests/test_thinking_block_fix.py — Adds regression tests validating reasoning_content preservation behavior across key scenarios.


Comment on lines +121 to +123
content=[
    ToolResultBlock(type="tool_result", id="c1", name="tool_a", output="done"),
],

Copilot AI Mar 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Several lines in this test file exceed the repo’s max line length (79) and will fail flake8/black (e.g., the one-line ToolResultBlock call here). Please run black and/or wrap these long argument lists across multiple lines so E501 doesn’t block CI/pre-commit.
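For reference, a wrapped version of such a call that keeps every line under 79 columns. The ToolResultBlock here is a minimal stand-in defined for illustration, not the real agentscope class:

```python
from dataclasses import dataclass


@dataclass
class ToolResultBlock:
    """Minimal stand-in for the real block type, for illustration only."""

    type: str
    id: str
    name: str
    output: str


# Wrapping the argument list keeps each line under the 79-column
# flake8 limit, avoiding E501:
block = ToolResultBlock(
    type="tool_result",
    id="c1",
    name="tool_a",
    output="done",
)
```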

formatted = await super()._format(msgs)

# Inject reasoning_content into formatted assistant messages.
if assistant_thinking:

Copilot AI Mar 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The injection loop is guarded by if assistant_thinking:, but assistant_thinking is populated for every assistant message even when none contain thinking blocks (empty strings). This causes an unnecessary second pass over formatted on most calls; consider guarding with if any(assistant_thinking): (or tracking a has_thinking flag) so the injection work is skipped when there’s nothing to inject.

Suggested change
if assistant_thinking:
if any(assistant_thinking):
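The distinction matters because a non-empty list of empty strings is still truthy in Python:

```python
# Three assistant messages, none of which produced thinking content:
assistant_thinking = ["", "", ""]

# `if assistant_thinking:` enters the injection loop anyway...
print(bool(assistant_thinking))  # True
# ...while `if any(assistant_thinking):` correctly skips it.
print(any(assistant_thinking))   # False
```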

- Run black formatter on model_factory.py and test_thinking_block_fix.py
- Fix E501 line length violations (max 79 characters)
- Split long lines in test assertions and function calls
@qodo-code-review

Review Summary by Qodo

Preserve reasoning_content for thinking models in formatter

🐞 Bug fix


Walkthroughs

Description
• Preserve reasoning_content from thinking blocks in formatted assistant messages
• Prevents API rejection for thinking models like Kimi K2.5 during multi-turn tool calling
• Collects thinking content before base formatting, injects into formatted output
• Add comprehensive tests for thinking block preservation across multiple scenarios
Diagram
flowchart LR
  A["Assistant Message<br/>with ThinkingBlock"] -->|"_format()"| B["Collect thinking<br/>content"]
  B -->|"base format"| C["Formatted message"]
  C -->|"inject"| D["Message with<br/>reasoning_content"]
  D -->|"API call"| E["Kimi K2.5<br/>accepts request"]


File Changes

1. src/copaw/agents/model_factory.py 🐞 Bug fix +35/-6

Inject reasoning_content from thinking blocks

• Enhanced FileBlockSupportFormatter._format() to preserve thinking blocks as reasoning_content
• Collects thinking content from assistant messages before base formatting
• Injects collected thinking content into formatted assistant messages with matching indices
• Fixed line length violations for code style compliance

src/copaw/agents/model_factory.py


2. tests/test_thinking_block_fix.py 🧪 Tests +204/-0

Add tests for thinking block preservation

• New test file with three comprehensive test cases for thinking block preservation
• test_reasoning_content_preserved(): Verifies thinking blocks appear as reasoning_content in
 formatted output
• test_no_reasoning_when_no_thinking(): Ensures no false positives when thinking blocks absent
• test_multiple_assistant_messages(): Validates correct reasoning_content injection for multiple
 assistant messages

tests/test_thinking_block_fix.py




@qodo-code-review

qodo-code-review Bot commented Mar 3, 2026

Code Review by Qodo

🐞 Bugs (1) 📘 Rule violations (0) 📎 Requirement gaps (0)



Action required

1. Token undercount vs reasoning 🐞 Bug ⛯ Reliability
Description
The formatter now injects reasoning_content into outgoing assistant messages, but the token
counting logic used for /history and auto memory compaction only counts content. This will
under-estimate prompt size for thinking models and can delay compaction until after requests start
failing due to context limits.
Code

src/copaw/agents/model_factory.py[R100-110]

+            # Inject reasoning_content into formatted assistant messages.
+            if assistant_thinking:
+                asst_idx = 0
+                for fmt_msg in formatted:
+                    if fmt_msg.get("role") == "assistant":
+                        if (
+                            asst_idx < len(assistant_thinking)
+                            and assistant_thinking[asst_idx]
+                        ):
+                            fmt_msg["reasoning_content"] = assistant_thinking[asst_idx]
+                        asst_idx += 1
Evidence
FileBlockSupportFormatter._format() adds reasoning_content to formatted assistant messages, but
_extract_text_from_messages() (used by safe_count_message_tokens) ignores reasoning_content,
so memory compaction and /history token estimates systematically miss these tokens.

src/copaw/agents/model_factory.py[75-112]
src/copaw/agents/utils/token_counting.py[58-86]
src/copaw/agents/hooks/memory_compaction.py[161-169]
src/copaw/agents/command_handler.py[245-251]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
The formatter now injects `reasoning_content` into formatted assistant messages for thinking models. However, the token counting logic used for auto memory compaction and `/history` only extracts tokens from `content`, ignoring `reasoning_content`, causing systematic undercounting and increasing the risk of exceeding model context limits before compaction triggers.
## Issue Context
- `FileBlockSupportFormatter._format()` adds `reasoning_content` to formatted assistant messages.
- `safe_count_message_tokens()` relies on `_extract_text_from_messages()` which currently concatenates only `content` fields.
- Memory compaction and `/history` both use this token estimate.
## Fix Focus Areas
- src/copaw/agents/utils/token_counting.py[58-86]
- src/copaw/agents/hooks/memory_compaction.py[161-169]
- src/copaw/agents/command_handler.py[245-251]
- src/copaw/agents/model_factory.py[75-112]
## Suggested change
- In `_extract_text_from_messages()`, append `msg.get("reasoning_content")` (when present) into `parts` similarly to `content`.
- (Optional but recommended) Consider also incorporating text from tool call arguments if present in the formatted schema, to further reduce undercounting for tool-heavy conversations.
- Add/extend a unit test to ensure token counting increases when `reasoning_content` is present in formatted messages.








- Preserve reasoning_content injection logic from PR
- Add _strip_top_level_message_name call from main branch
- Both changes are complementary and work together
@coderabbitai

coderabbitai Bot commented Mar 3, 2026

📝 Walkthrough

Walkthrough

The PR fixes a bug where thinking blocks from assistant messages weren't being preserved as reasoning_content in formatted output, which caused errors with models that have thinking mode enabled (such as Kimi 2.5). The formatter now collects thinking blocks and injects them into corresponding assistant messages after base formatting.

Changes

  • src/copaw/agents/model_factory.py (Thinking Block Preservation) — Enhanced FileBlockSupportFormatter._format to collect thinking blocks from assistant messages, perform base formatting, and inject reasoning_content back into assistant messages. Also includes a minor string concatenation change in a warning message.
  • tests/test_thinking_block_fix.py (Test Suite) — New test module verifying thinking blocks are preserved as reasoning_content. Covers three scenarios: reasoning_content present with tool calls, absent without thinking blocks, and individual reasoning_content for multiple assistant messages.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 A thinking block lost in the void,
Now preserved with joy! No more annoyed,
Reasoning content flows, the formatter's right,
Assistant messages shine with borrowed light. ✨

🚥 Pre-merge checks | ✅ 5 passed
  • Description Check — Passed (check skipped; CodeRabbit's high-level summary is enabled)
  • Title Check — Passed: the title clearly and concisely identifies the core fix, preserving reasoning_content for thinking models, which directly addresses the main bug described in the changeset.
  • Linked Issues Check — Passed: the implementation addresses issue #155 by fixing FileBlockSupportFormatter to preserve reasoning_content from thinking blocks, enabling multi-turn tool calling with Kimi K2.5 and similar thinking models without 400 API errors.
  • Out of Scope Changes Check — Passed: all changes are scoped to the formatter fix and corresponding test coverage; no unrelated modifications detected.
  • Docstring Coverage — Passed: docstring coverage is 100.00%, above the required 80.00% threshold.




@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/test_thinking_block_fix.py (1)

200-204: Optional: remove manual __main__ test runner in pytest module.

For test modules, keeping execution through pytest only is usually cleaner and avoids duplicate/manual execution paths.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_thinking_block_fix.py` around lines 200 - 204, Remove the manual
test runner block that calls test_reasoning_content_preserved(),
test_no_reasoning_when_no_thinking(), test_multiple_assistant_messages() under
if __name__ == "__main__": so the module runs only under pytest; delete that
conditional and its print("\nAll tests passed!") line and rely on pytest
discovery to execute the tests instead.

ℹ️ Review info

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 42cd006 and b36d98c.

📒 Files selected for processing (2)
  • src/copaw/agents/model_factory.py
  • tests/test_thinking_block_fix.py

Comment on lines +100 to +110
            # Inject reasoning_content into formatted assistant messages.
            if assistant_thinking:
                asst_idx = 0
                for fmt_msg in formatted:
                    if fmt_msg.get("role") == "assistant":
                        if (
                            asst_idx < len(assistant_thinking)
                            and assistant_thinking[asst_idx]
                        ):
                            fmt_msg["reasoning_content"] = assistant_thinking[asst_idx]
                        asst_idx += 1


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

OpenAI API reasoning_content field assistant messages tool_calls specification

💡 Result:

Is there a reasoning_content field on assistant messages?

In the current OpenAI API docs, no: the chat/assistant message schema does not define a reasoning_content field. For reasoning models, “reasoning” is represented separately (in the Responses API) as reasoning items, not as an assistant-message field. [2]

Chat Completions: assistant message + tool_calls shape

For Chat Completions, an assistant message is essentially:

  • role: "assistant"
  • content: required unless you provide tool_calls (or the deprecated function_call) [1]
  • tool_calls: array of tool calls generated by the model [1]

tool_calls items (function tool call) include:

  • id (tool call id)
  • type (currently function)
  • function: { name, arguments } where arguments is a JSON string [1]

The docs also describe a “custom tool call” shape with:

  • id, type: "custom", and custom: { name, input } [1]

Responses API: where “reasoning content” lives now

In the Responses API, reasoning is carried as a distinct reasoning object/item (not an assistant message field). It includes:

  • id
  • summary[] (summary text parts)
  • optional content[] containing reasoning_text
  • optional encrypted_content (when requested via include) [2][3]

The docs also note you should include these reasoning items back in input for later turns if you are manually managing context. [2]

Sources: Chat Completions message/tool_calls schema [1]; Responses API reasoning object [2]; streaming reasoning_text / encrypted_content details [3].


🏁 Script executed:

# Search for reasoning_content usage in the codebase
rg "reasoning_content" -A 3 -B 3

Repository: agentscope-ai/CoPaw

Length of output: 8281


🏁 Script executed:

# Search for assistant_thinking usage to understand the context
rg "assistant_thinking" -A 2 -B 2

Repository: agentscope-ai/CoPaw

Length of output: 1791


🏁 Script executed:

# Look at the broader context of the model_factory.py file
wc -l src/copaw/agents/model_factory.py

Repository: agentscope-ai/CoPaw

Length of output: 100


🏁 Script executed:

# Read the relevant section of the file for full context
sed -n '90,120p' src/copaw/agents/model_factory.py

Repository: agentscope-ai/CoPaw

Length of output: 1320


🏁 Script executed:

# Search for Kimi or DeepSeek references and backend-specific handling
rg -i "kimi|deepseek" -B 3 -A 3

Repository: agentscope-ai/CoPaw

Length of output: 2114


🏁 Script executed:

# Search for issue `#155` or related discussion in code/tests
rg "#155" -B 3 -A 3

Repository: agentscope-ai/CoPaw

Length of output: 354


🏁 Script executed:

# Check the test file to understand expected behavior better
cat -n tests/test_thinking_block_fix.py

Repository: agentscope-ai/CoPaw

Length of output: 7447


🏁 Script executed:

# Search for tool_calls usage in model_factory to understand the flow
rg "tool_calls" -B 2 -A 2 src/copaw/agents/model_factory.py

Repository: agentscope-ai/CoPaw

Length of output: 45


🌐 Web query:

Kimi K2.5 DeepSeek reasoning_content field OpenAI-compatible API tool_calls

💡 Result:

Kimi K2.5 (Moonshot) — reasoning_content + OpenAI-compatible tool_calls

  • reasoning_content output: In thinking mode, Kimi K2.5 returns a separate reasoning_content alongside content (answer). The K2.5 repo’s example reads it as response.choices[0].message.reasoning_content. You can disable thinking (instant mode) via extra_body (e.g., {"thinking": {"type":"disabled"}} for the official API, or {"chat_template_kwargs": {"thinking": False}} for vLLM/SGLang). [2]
  • OpenAI SDK compatibility note: Some Moonshot “thinking” docs note the OpenAI SDK types may not expose reasoning_content directly, so you may need to access it defensively (e.g., hasattr/getattr) in streaming. [5]
  • Tool calling: Moonshot recommends tool_calls (not legacy function_call), supports parallel tool calls, and stresses that every returned tool_call.id must be matched by a later role="tool" message with the correct tool_call_id—and you must append the assistant message that contained tool_calls into history or you can get tool_call_id not found. [1]

DeepSeek — reasoning_content behavior + tool calls

  • reasoning_content output: DeepSeek’s reasoning model returns both reasoning_content (CoT) and content (final answer) as separate fields. [3]
  • Whether to send reasoning_content back:
    • DeepSeek’s reasoning model guide says: do not feed reasoning_content back into later messages (it can error). [3]
    • DeepSeek’s newer thinking mode + tool calls guide says: during a single question’s multi-step “thinking + tool calls” loop, you must send reasoning_content back so the model can continue; when you start the next user question, remove prior reasoning_content (it will be ignored / should not be kept). [4]

OpenAI-compatible tool_calls shape (reference)

  • In OpenAI Chat Completions, tool calling is controlled via tool_choice (auto/none/required or forcing a specific tool). This is the interface most “OpenAI-compatible” providers try to mirror. [6]

Sources: Moonshot tool-calls guide [1], Kimi K2.5 repo example [2], DeepSeek reasoning model docs [3], DeepSeek thinking-mode tool-calls guide [4], Moonshot thinking blog (reasoning_content access) [5], OpenAI chat completions reference (tool_choice) [6].


Gate reasoning_content injection to tool-call assistant messages only.

The tests verify reasoning_content exclusively on assistant messages with tool_calls (lines 65, 182 in the test file). Backend documentation for DeepSeek and Kimi K2.5 also indicates reasoning_content is most relevant in tool-calling contexts. Injecting into all assistant messages with thinking blocks risks rejection by stricter backends.

Proposed diff
             if assistant_thinking:
                 asst_idx = 0
                 for fmt_msg in formatted:
                     if fmt_msg.get("role") == "assistant":
                         if (
+                            fmt_msg.get("tool_calls")
+                            and
                             asst_idx < len(assistant_thinking)
                             and assistant_thinking[asst_idx]
                         ):
                             fmt_msg["reasoning_content"] = assistant_thinking[asst_idx]
                         asst_idx += 1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/copaw/agents/model_factory.py` around lines 100 - 110, The current loop
injects reasoning_content into every assistant formatted message when
assistant_thinking exists; change it to only inject into assistant messages that
are tool-calling by additionally checking for a tool-call indicator (e.g.,
fmt_msg.get("tool_calls") or fmt_msg.get("tool_call")) before setting
fmt_msg["reasoning_content"] so only tool-call assistant messages receive
reasoning_content; keep the existing assistant_thinking indexing
(assistant_thinking, formatted, fmt_msg, asst_idx) and the asst_idx increment
behavior unchanged.

@ekzhu
Copy link
Copy Markdown
Collaborator

ekzhu commented Mar 4, 2026

Closing this PR in favor of a fix directly to agentscope library.

@ekzhu ekzhu closed this Mar 4, 2026


Development

Successfully merging this pull request may close these issues.

[Bug]: Constant errors when using the Kimi 2.5 model

4 participants