fix(agent): preserve streamed reasoning content for tool replay by tompro · Pull Request #5606 · zeroclaw-labs/zeroclaw

tompro · 2026-04-10T15:19:14Z

Summary

Base branch target (master for all contributions): master
Problem: streamed tool-calling turns rebuilt a synthetic ChatResponse with reasoning_content: None, so Kimi-compatible providers rejected the replayed assistant tool-call message.
Why it matters: Kimi returns 400 when thinking is enabled but reasoning_content is missing on assistant tool-call history, which blocks the entire tool-use workflow in issue [Bug]: Use kimi-code provider in streaming chat call tools, provider API reports an error #5600.
What changed: preserve streamed reasoning deltas in Agent::turn_streamed and add a regression test that verifies assistant tool-call replay includes reasoning_content.
What did not change (scope boundary): no provider API contracts, tool-dispatch serialization logic, or non-streaming tool-call behavior were changed.
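A minimal sketch of the fix under simplified types. The `StreamEvent` enum, the `ToolCall` and `ChatResponse` structs, and the `assemble_streamed_response` function are hypothetical stand-ins, not the actual zeroclaw-runtime definitions. The sketch models only what this summary states: streamed reasoning deltas are accumulated alongside streamed text, and the synthetic response carries them in `reasoning_content` instead of a hardcoded `None`.

```rust
// Hypothetical stand-ins for the runtime's streaming types; the real
// zeroclaw-runtime definitions may differ.
#[derive(Debug, Clone, PartialEq)]
pub enum StreamEvent {
    TextDelta(String),
    ReasoningDelta(String),
    ToolCall(ToolCall),
}

#[derive(Debug, Clone, PartialEq)]
pub struct ToolCall {
    pub id: String,
    pub name: String,
    pub arguments: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ChatResponse {
    pub text: String,
    pub tool_calls: Vec<ToolCall>,
    pub reasoning_content: Option<String>,
}

/// Folds streamed events into a synthetic response for tool dispatch.
pub fn assemble_streamed_response(events: Vec<StreamEvent>) -> ChatResponse {
    let mut streamed_text = String::new();
    // New accumulator, symmetric with `streamed_text`.
    let mut streamed_reasoning = String::new();
    let mut tool_calls = Vec::new();

    for event in events {
        match event {
            StreamEvent::TextDelta(delta) => streamed_text.push_str(&delta),
            StreamEvent::ReasoningDelta(delta) => streamed_reasoning.push_str(&delta),
            StreamEvent::ToolCall(call) => tool_calls.push(call),
        }
    }

    ChatResponse {
        text: streamed_text,
        tool_calls,
        // Before the fix this field was hardcoded to `None`, which dropped
        // the reasoning that Kimi expects on the replayed tool-call message.
        reasoning_content: (!streamed_reasoning.is_empty()).then_some(streamed_reasoning),
    }
}
```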

Label Snapshot (required)

Risk label (risk: low|medium|high): risk: high
Size label (size: XS|S|M|L|XL, auto-managed/read-only): size: S (expected)
Scope labels (core|agent|channel|config|cron|daemon|doctor|gateway|health|heartbeat|integration|memory|observability|onboard|provider|runtime|security|service|skillforge|skills|tool|tunnel|docs|dependencies|ci|tests|scripts|dev, comma-separated): agent
Module labels (<module>: <component>, for example channel: telegram, provider: kimi, tool: shell): N.A.
Contributor tier label (trusted contributor|experienced contributor|principal contributor|distinguished contributor, auto-managed/read-only; author merged PRs >=5/10/20/50): auto-managed
If any auto-label is incorrect, note requested correction: None

Change Metadata

Change type (bug|feature|refactor|docs|security|chore): bug
Primary scope (runtime|provider|channel|memory|security|ci|docs|multi): runtime

Linked Issue

Closes [Bug]: Use kimi-code provider in streaming chat call tools, provider API reports an error #5600
Related [Feature]: Evaluate multi-chunk reasoning replay fidelity for streamed tool turns #5840
Depends on # (if stacked)
Supersedes # (if replacing older PR)

Supersede Attribution (required when `Supersedes #` is used)

Superseded PRs + authors (#<pr> by @<author>, one per line): N.A.
Integrated scope by source PR (what was materially carried forward): N.A.
Co-authored-by trailers added for materially incorporated contributors? (Yes/No): No
If No, explain why (for example: inspiration-only, no direct code/design carry-over): No superseded PR.
Trailer format check (separate lines, no escaped \n): (Pass/Fail): Pass

Validation Evidence (required)

Commands and result summary:

cargo fmt --all -- --check
cargo clippy --all-targets -- -D warnings
cargo test

Evidence provided (test/log/trace/screenshot/perf): All commands passed locally. Also verified targeted regression test turn_streamed_preserves_reasoning_content_for_tool_call_replay and the user confirmed the local Kimi workflow succeeds with this patch.
If any command is intentionally skipped, explain why: None.

Security Impact (required)

New permissions/capabilities? (Yes/No): No
New external network calls? (Yes/No): No
Secrets/tokens handling changed? (Yes/No): No
File system access scope changed? (Yes/No): No
If any Yes, describe risk and mitigation: N.A.

Privacy and Data Hygiene (required)

Data-hygiene status (pass|needs-follow-up): pass
Redaction/anonymization notes: No user data, secrets, or identity-like fixtures added.
Neutral wording confirmation (use ZeroClaw/project-native labels if identity-like wording is needed): Confirmed.

Compatibility / Migration

Backward compatible? (Yes/No): Yes
Config/env changes? (Yes/No): No
Migration needed? (Yes/No): No
If yes, exact upgrade steps: N.A.

i18n Follow-Through (required when docs or user-facing wording changes)

i18n follow-through triggered? (Yes/No): No
If Yes, locale navigation parity updated in README*, docs/README*, and docs/SUMMARY.md for supported locales (en, zh-CN, ja, ru, fr, vi)? (Yes/No): N.A.
If Yes, localized runtime-contract docs updated where equivalents exist (minimum for fr/vi: commands-reference, config-reference, troubleshooting)? (Yes/No/N.A.): N.A.
If Yes, Vietnamese canonical docs under docs/i18n/vi/** synced and compatibility shims under docs/*.vi.md validated? (Yes/No/N.A.): N.A.
If any No/N.A., link follow-up issue/PR and explain scope decision: Rust-only bugfix; no user-facing docs or strings changed.

Human Verification (required)

What was personally validated beyond CI:

Verified scenarios: streamed reasoning deltas are preserved into assistant tool-call replay history via the new regression test; full local fmt/clippy/test suite passed; user validated the local Kimi tool-call workflow after applying the patch.
Edge cases checked: empty streamed reasoning still omits reasoning_content, existing non-streaming and dispatcher serialization behavior remain unchanged.
What was not verified: A live Kimi API call was not executed directly from this environment.
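The first edge case depends on the replay serializer leaving `reasoning_content` out entirely when it is `None`. The sketch below illustrates that behavior under stated assumptions: `AssistantToolCallMessage` and `to_replay_json` are hypothetical, the JSON layout is simplified (tool calls are reduced to their ids) rather than the real provider wire format, and escaping covers only quotes, backslashes, and newlines.

```rust
/// Hypothetical assistant tool-call message as replayed to the provider.
pub struct AssistantToolCallMessage {
    pub content: String,
    pub tool_call_ids: Vec<String>,
    pub reasoning_content: Option<String>,
}

/// Minimal JSON string escaping for this sketch (quotes, backslashes, newlines only).
fn json_str(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            other => out.push(other),
        }
    }
    out.push('"');
    out
}

/// Serializes the message, emitting `reasoning_content` only when it is `Some`.
pub fn to_replay_json(msg: &AssistantToolCallMessage) -> String {
    let ids: Vec<String> = msg.tool_call_ids.iter().map(|id| json_str(id)).collect();
    let mut fields = vec![
        format!("\"role\":{}", json_str("assistant")),
        format!("\"content\":{}", json_str(&msg.content)),
        format!("\"tool_calls\":[{}]", ids.join(",")),
    ];
    // Absent, not `null` and not `""`, when no reasoning was streamed.
    if let Some(reasoning) = &msg.reasoning_content {
        fields.push(format!("\"reasoning_content\":{}", json_str(reasoning)));
    }
    format!("{{{}}}", fields.join(","))
}
```

A production serializer would typically express the same rule with serde's `#[serde(skip_serializing_if = "Option::is_none")]` field attribute rather than hand-built strings.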

Side Effects / Blast Radius (required)

Affected subsystems/workflows: crates/zeroclaw-runtime/src/agent streamed tool-call loop, especially providers that require reasoning parity on replayed assistant messages.
Potential unintended effects: streaming providers now retain reasoning text in the synthetic response, which could expose provider replay assumptions if a backend emits malformed reasoning fragments.
Guardrails/monitoring for early detection: regression test coverage plus CI Required Gate; failures should surface as replay/tool-call test regressions or provider 400s in runtime logs.

Agent Collaboration Notes (recommended)

Agent tools used (if any): Read, Bash, LSP diagnostics, Oracle-style read-only review.
Workflow/plan summary (if any): Traced the streamed tool-call path, confirmed dispatcher serialization already handled reasoning_content, patched the streamed response assembly, rebased the fix into the workspace layout, and added a focused regression test.
Verification focus: reasoning_content preservation for replayed assistant tool-call history plus full repo Rust validation.
Confirmation: naming + architecture boundaries followed (AGENTS.md + CONTRIBUTING.md): Yes

Rollback Plan (required)

Fast rollback command/path: git revert 4eb27a12 or revert this PR before release.
Feature flags or config toggles (if any): None.
Observable failure symptoms: streamed tool-call turns regress, replay payload assertions fail, or provider-specific tool-call workflows start returning 400 errors again.

Risks and Mitigations

Risk: Providers may emit streamed reasoning in chunk patterns that concatenate into unexpected text.
- Mitigation: The change only preserves already-emitted reasoning data, keeps the field absent when empty, is covered by a regression test that exercises the replay path implicated in #5600, and follow-up issue #5840 tracks whether multi-chunk reasoning replay needs normalization.

singlerider · 2026-04-11T22:07:10Z

@tompro See #5559's summary for a guide on how to resolve the recent merge conflicts.

theonlyhennygod · 2026-04-12T17:55:14Z

CI Failure Analysis — #5606

Comprehension summary: This PR fixes a bug where streamed tool-calling turns rebuilt a synthetic ChatResponse with reasoning_content: None, causing Kimi-compatible providers to reject the replayed assistant tool-call message with a 400 error. The fix preserves streamed reasoning deltas in Agent::turn_streamed and adds a regression test. Blast radius is limited to src/agent/agent.rs — specifically the streamed tool-call response assembly path.

Why CI is failing

The Security Audit check is failing due to pre-existing wasmtime crate advisories on master (RUSTSEC-2026-0088, 0089, 0092, 0095, 0096). These are inherited from the Tauri desktop app dependency and are not introduced by this PR. The cascade is:

Security Audit fails (wasmtime advisories)
Security Required Gate fails (depends on Security Audit)
CI Required Gate (Quality Gate workflow) fails (depends on Security Required Gate)

The CI Required Gate in the CI workflow passes. All other checks (Lint, Test, Build on all platforms, Check 32-bit, Strict Delta Lint, Docs Quality, Verify Benchmarks Compile) pass.

What to fix

Nothing on the contributor's side. This PR's code compiles, passes all tests, passes lint, and builds on all targets. The Security Audit failure is a repo-wide pre-existing issue that affects all open PRs equally.

The repository maintainers need to either:

Update or pin wasmtime to a patched version
Add the advisories to an audit.toml ignore list if they are acknowledged/accepted

This PR is not blocked by anything it introduced.

theonlyhennygod · 2026-04-12T19:42:16Z

Agent Review — Ready to Merge

Comprehension summary: This PR fixes a bug where streamed tool-calling turns rebuilt a synthetic ChatResponse with reasoning_content: None, causing Kimi-compatible providers to reject the replayed assistant tool-call message with HTTP 400 (Kimi requires reasoning_content on assistant messages when thinking mode is enabled). The fix accumulates streamed reasoning deltas into a streamed_reasoning String during turn_streamed and populates reasoning_content in the synthetic response when non-empty. A focused regression test verifies the assistant tool-call replay includes reasoning_content. Blast radius: src/agent/agent.rs streamed tool-call loop only; no impact on non-streaming paths or providers that don't require reasoning replay.

Thank you, @tompro. The fix is minimal, correct, and well-targeted.

What was reviewed and verified:

Code correctness: The fix adds streamed_reasoning accumulation alongside the existing streamed_text pattern. The conditional (!streamed_reasoning.is_empty()).then_some(streamed_reasoning) correctly keeps reasoning_content as None when no reasoning was streamed, preserving existing behavior for non-reasoning providers.
Regression analysis: Non-streaming path is unchanged. Providers that don't emit reasoning events continue to get reasoning_content: None. The test explicitly verifies the replay payload structure.
Test coverage: New turn_streamed_preserves_reasoning_content_for_tool_call_replay test with a StreamingReasoningProvider mock that emits reasoning + tool call events. Test verifies assistant payload contains reasoning_content and tool_calls.
Privacy/data hygiene: Pass — no PII, test data uses system-scoped content.
Architecture alignment: Follows existing streaming event handling patterns.
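The code-correctness point above contrasts two ways of filling `reasoning_content`. A small sketch (both function names are hypothetical) shows the difference between the PR's `then_some` idiom and an unconditional `Some`:

```rust
/// The PR's idiom: `None` when nothing was streamed, `Some(text)` otherwise.
pub fn reasoning_field(streamed_reasoning: String) -> Option<String> {
    // `is_empty` borrows only while the condition is evaluated;
    // `then_some` then takes ownership of the String.
    (!streamed_reasoning.is_empty()).then_some(streamed_reasoning)
}

/// The unconditional alternative: yields `Some("")` when nothing was streamed.
pub fn reasoning_field_unconditional(streamed_reasoning: String) -> Option<String> {
    Some(streamed_reasoning)
}
```

`bool::then_some` evaluates its argument eagerly, which is harmless here because the argument is only a move of an already-built `String`; `bool::then` with a closure would matter only if constructing the value were expensive.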

Security/performance assessment:

Security: No security impact.
Performance: Negligible — one additional String allocation for reasoning accumulation during streamed tool-call turns.

CI Status: The CI Required Gate failure on Quality Gate comes from the known wasmtime RUSTSEC advisories (RUSTSEC-2026-0088, 0089, 0092, 0095, 0096) that fail the Security Audit, not from this PR. The CI workflow's CI Required Gate passes. All lint, test, and build checks pass.

Template completeness: Fully completed with all required sections properly filled. Exemplary PR discipline.

This PR is ready for maintainer merge.

Field	Content
PR	#5606 — fix(agent): preserve streamed reasoning content for tool replay
Author	@tompro
Summary	Accumulates streamed reasoning deltas for Kimi-compatible tool-call replay
Action	Ready to merge
Reason	Correct fix, focused scope, all tests pass, no outstanding findings
Security/performance	No security impact; negligible performance cost
Changes requested	None
Architectural notes	Follows existing streaming accumulation pattern; clean symmetry with `streamed_text`
Tests	Full suite passes; new regression test covers the exact failure mode
Notes	CI Required Gate failure on Quality Gate is known infra issue (wasmtime RUSTSEC), not this PR

theonlyhennygod · 2026-04-12T19:47:50Z

This PR has merge conflicts with master. Could you rebase to resolve them? Once conflicts are cleared, this is agent-approved and ready to merge.

singlerider

Agent Review — Routing: Needs Maintainer Review

@WareWolf-MoonWall @JordanTheJet — this PR modifies crates/zeroclaw-runtime/src/agent/agent.rs, a high-risk path requiring maintainer sign-off.

DRY note: @theonlyhennygod already fully reviewed and approved this PR ("Ready to Merge" verdict). No new code findings. This comment exists solely to ensure maintainer eyes land on the runtime path changes before merge.

What the change does: Fixes a bug where streamed tool-calling turns rebuilt a synthetic ChatResponse without preserving the streamed reasoning content, causing reasoning to be dropped during tool replay. The fix carries reasoning_content through the synthetic response assembly. 118 lines added, 1 deleted.

CI: all checks passing.

WareWolf-MoonWall

PR Review — #5606 `fix(agent): preserve streamed reasoning content for tool replay`

I've read the full diff, the linked issue (#5600), all prior review threads, and the relevant foundations.

What this change does

During turn_streamed, the agent accumulates streaming events into a synthetic ChatResponse before handing it to the tool dispatcher. Before this fix, that synthetic response hardcoded reasoning_content: None, so when Kimi-compatible providers received the replayed assistant tool-call message in the next turn, they returned HTTP 400: "thinking is enabled but reasoning_content is missing in assistant tool call message." The fix adds a streamed_reasoning accumulator alongside the existing streamed_text one, collects reasoning deltas as they arrive, and populates reasoning_content in the synthetic response using (!streamed_reasoning.is_empty()).then_some(streamed_reasoning) — preserving None for providers that don't emit reasoning events.

The fix is minimal, precisely targeted, and does not touch non-streaming paths, tool dispatch serialization, or any provider API contract.

The merge conflict flag raised by @theonlyhennygod has been resolved — the author has kept the branch current with three master merges, most recently April 16. The branch is clean and mergeable.

✅ Commendation

Two things done well:

(!streamed_reasoning.is_empty()).then_some(streamed_reasoning) is the right idiom here. It keeps reasoning_content absent — not an empty string — when no reasoning was streamed. An empty string and None are semantically different to a provider that validates this field: None means "not a thinking-enabled turn," Some("") could cause a different class of validation failure. The choice to use then_some rather than Some(streamed_reasoning) unconditionally shows the author thought about the provider contract, not just the Rust type.
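The `None` versus `Some("")` distinction can be made concrete with a hypothetical provider-side check. Only the `MissingReasoning` case is grounded in this thread: it mirrors the HTTP 400 quoted from #5600. Treating an empty string as a separate failure is an assumption for illustration, since how Kimi actually handles `Some("")` is not documented here. `ReplayCheck` and `check_assistant_tool_call` are hypothetical names.

```rust
#[derive(Debug, PartialEq)]
pub enum ReplayCheck {
    Ok,
    /// Mirrors the HTTP 400 reported in #5600.
    MissingReasoning,
    /// Assumed failure class for an empty string; not confirmed for Kimi.
    EmptyReasoning,
}

/// Hypothetical validation of a replayed assistant message.
pub fn check_assistant_tool_call(
    thinking_enabled: bool,
    has_tool_calls: bool,
    reasoning_content: Option<&str>,
) -> ReplayCheck {
    // Only thinking-enabled assistant tool-call messages are constrained.
    if !thinking_enabled || !has_tool_calls {
        return ReplayCheck::Ok;
    }
    match reasoning_content {
        None => ReplayCheck::MissingReasoning,
        Some("") => ReplayCheck::EmptyReasoning,
        Some(_) => ReplayCheck::Ok,
    }
}
```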

The regression test constructs a StreamingReasoningProvider mock that correctly exercises the full two-turn path: stream_chat call 0 emits reasoning + tool call → tool executes → stream_chat call 1 returns empty → falls back to chat → seen_requests captures the message history. Asserting on the replayed assistant payload (the chat call) rather than on the intermediate streamed state means the test verifies what the provider actually receives, which is exactly what was broken in #5600.
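The two-turn path described above can be sketched with a hypothetical mock. `MockReasoningProvider`, `Message`, `Event`, and `run_turn` are illustrative names, not the actual `StreamingReasoningProvider` test code. The sketch only follows the sequence the review lists: stream call 0 emits reasoning plus a tool call, the tool runs, stream call 1 returns nothing, the driver falls back to `chat`, and `seen_requests` records the history the provider receives.

```rust
use std::cell::{Cell, RefCell};

/// Hypothetical streamed events: a reasoning delta or a tool-call id.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Reasoning(String),
    ToolCall(String),
}

/// Hypothetical, simplified chat message.
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: &'static str,
    pub reasoning_content: Option<String>,
    pub tool_calls: Vec<String>,
}

pub struct MockReasoningProvider {
    stream_calls: Cell<usize>,
    pub seen_requests: RefCell<Vec<Vec<Message>>>,
}

impl MockReasoningProvider {
    pub fn new() -> Self {
        Self { stream_calls: Cell::new(0), seen_requests: RefCell::new(Vec::new()) }
    }

    /// Call 0 streams reasoning plus a tool call; later calls stream nothing.
    pub fn stream_chat(&self, _history: &[Message]) -> Vec<Event> {
        let n = self.stream_calls.get();
        self.stream_calls.set(n + 1);
        if n == 0 {
            vec![Event::Reasoning("need the shell tool".into()), Event::ToolCall("call_1".into())]
        } else {
            Vec::new()
        }
    }

    /// Non-streaming fallback; records the exact history it receives.
    pub fn chat(&self, history: &[Message]) -> String {
        self.seen_requests.borrow_mut().push(history.to_vec());
        "done".to_string()
    }
}

fn msg(role: &'static str) -> Message {
    Message { role, reasoning_content: None, tool_calls: Vec::new() }
}

/// Hypothetical driver for one user turn with a single tool round-trip.
pub fn run_turn(provider: &MockReasoningProvider) -> Option<String> {
    let mut history = vec![msg("user")];

    // Turn 1: fold streamed events into an assistant tool-call message.
    let mut reasoning = String::new();
    let mut calls = Vec::new();
    for event in provider.stream_chat(&history) {
        match event {
            Event::Reasoning(delta) => reasoning.push_str(&delta),
            Event::ToolCall(id) => calls.push(id),
        }
    }
    let tool_count = calls.len();
    history.push(Message {
        role: "assistant",
        reasoning_content: (!reasoning.is_empty()).then_some(reasoning),
        tool_calls: calls,
    });

    // Stand-in for tool execution: one tool result per call.
    for _ in 0..tool_count {
        history.push(msg("tool"));
    }

    // Turn 2: an empty stream triggers the non-streaming fallback.
    if provider.stream_chat(&history).is_empty() {
        return Some(provider.chat(&history));
    }
    None
}
```

As in the review's description, the assertions target the history captured by the fallback `chat` call, i.e. what the provider would actually receive, not the intermediate streamed state.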

🔴 Blocking — risk label mismatch

The PR body declares risk: medium. The applied label is risk: high. As with other PRs touching crates/zeroclaw-runtime/src/**, the high label is correct per AGENTS.md — that path is explicitly listed as high-risk. The label snapshot in the PR body needs to reflect risk: high so the audit trail is consistent. This is a one-line update.

🟡 Conditional — multi-chunk reasoning accumulation edge case needs a follow-up

The streamed_reasoning accumulator concatenates all reasoning deltas with push_str. For the single-chunk case in the regression test this is correct. The question worth tracking is what happens when a provider streams reasoning across many small chunks with leading/trailing whitespace or structural delimiters — the concatenation is lossless at the byte level, but some providers structure reasoning output with newlines or markers between "thinking steps" that may be meaningful for replay fidelity.
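A small sketch of the concern, assuming deltas are appended with `push_str` exactly as described (`accumulate` is a hypothetical helper). Different chunkings of the same text yield identical output, so the result is byte-lossless, but two separate thinking steps emitted without a delimiter run together. Whether that needs normalization is the open question tracked in #5840.

```rust
/// Appends reasoning deltas in arrival order, with no separators added.
pub fn accumulate(deltas: &[&str]) -> String {
    let mut streamed_reasoning = String::new();
    for delta in deltas {
        streamed_reasoning.push_str(delta);
    }
    streamed_reasoning
}
```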

This isn't a reason to block the PR — the behavior is strictly better than None in every case, and the regression test covers the critical path from #5600. But I'd like to see a follow-up issue filed to track whether reasoning replay fidelity across multi-chunk reasoning streams needs any normalization, particularly as more providers adopt thinking modes. The current PR description notes this in the risks section; making it a tracked issue with an owner closes the loop on the conditional acceptance.

To @tompro

The fix is correct, the test is well-structured, and the PR discipline is strong — the template is one of the most completely filled I've seen. Two items before merge: update the risk label in the PR body from medium to high, and file a follow-up issue for the multi-chunk reasoning accumulation question (brief, just needs to capture the question and assign an owner). The blocking item is a one-line change; the conditional can be done concurrently.

Thank you for keeping the branch current through the merge conflicts. Kimi tool-use being entirely blocked (S1 severity on #5600) makes this worth getting across the line cleanly.

tompro · 2026-04-17T14:51:39Z

@WareWolf-MoonWall added the follow-up issue here, #5840, and changed the risk label accordingly.

WareWolf-MoonWall

Re-review following rebase. All 20 CI checks green, including the new Quality Gate. Previous blockers (merge conflicts, wasmtime advisory CI failures) are fully resolved.

✅ The fix

Three production lines, perfectly symmetric with the existing streamed_text accumulation pattern. streamed_reasoning follows the same lifecycle: initialized empty, accumulated from deltas, conditionally present in the synthetic response. The (!streamed_reasoning.is_empty()).then_some(streamed_reasoning) idiom is correct: reasoning_content stays None for providers that don't emit reasoning, preserving all existing behaviour exactly.

✅ The test

The StreamingReasoningProvider mock is well-constructed — it emits a reasoning delta followed by a tool call event, which is exactly the sequence that triggered the Kimi 400. The assertion on the serialized assistant payload directly validates what the provider receives on replay. This is the right kind of regression test: it covers the specific failure path, it will catch any future regression immediately, and it documents the contract in executable form.

✅ Follow-up discipline

Filing #5840 for multi-chunk reasoning replay normalization before merging was the right call: it keeps this PR focused on the confirmed regression and defers the broader question to a tracked, owned issue.

Thank you @tompro. Clean fix, strong test, good follow-up hygiene.

tompro requested review from JordanTheJet and theonlyhennygod as code owners April 10, 2026 15:19

github-project-automation bot added this to ZeroClaw Project Board Apr 10, 2026

github-project-automation bot moved this to Backlog in ZeroClaw Project Board Apr 10, 2026

github-actions bot added the agent label (Auto scope: src/agent/** changed.) Apr 10, 2026

theonlyhennygod self-assigned this Apr 12, 2026

theonlyhennygod added the agent-approved label (PR approved by automated review agent) Apr 12, 2026

fix(agent): preserve streamed reasoning content for tool replay

4eb27a1

tompro force-pushed the fix/kimi-streamed-reasoning-replay branch from 2a40818 to 4eb27a1 April 13, 2026 06:56

github-actions bot removed the agent label (Auto scope: src/agent/** changed.) Apr 13, 2026

tompro added 3 commits April 14, 2026 08:00

Merge branch 'master' into fix/kimi-streamed-reasoning-replay

1fcf853

Merge branch 'master' into fix/kimi-streamed-reasoning-replay

e6a6df7

Merge branch 'master' into fix/kimi-streamed-reasoning-replay

bb9be03

singlerider added the risk: high (Auto risk: security/runtime/gateway/tools/workflows.), size: S (Auto size: 81-250 non-doc changed lines.), and needs-maintainer-review labels Apr 17, 2026

singlerider requested a review from WareWolf-MoonWall April 17, 2026 02:56

singlerider reviewed Apr 17, 2026


WareWolf-MoonWall requested changes Apr 17, 2026


github-project-automation bot moved this from Backlog to Needs Changes in ZeroClaw Project Board Apr 17, 2026

Merge branch 'master' into fix/kimi-streamed-reasoning-replay

7c0f0f6

Merge branch 'master' into fix/kimi-streamed-reasoning-replay

3d2d8f8

tompro requested a review from WareWolf-MoonWall April 18, 2026 15:46

WareWolf-MoonWall approved these changes Apr 18, 2026


github-project-automation bot moved this from Needs Changes to Ready to Merge in ZeroClaw Project Board Apr 18, 2026

WareWolf-MoonWall added this to the v0.7.2 milestone Apr 18, 2026

This was referenced Apr 18, 2026

release: v0.7.4 milestone tracking #5877 (Open)

chore: bump version to 0.7.3 and update release changelog #5893 (Merged)

singlerider closed this in #5893 Apr 19, 2026

tompro wants to merge 6 commits into zeroclaw-labs:master from tompro:fix/kimi-streamed-reasoning-replay


4 participants
