Skip to content

fix(agent): align history trim to user boundary for provider compatibility#5257

Closed
WanZheng wants to merge 7 commits intozeroclaw-labs:masterfrom
WanZheng:fix/align-history-trim-user-boundary
Closed

fix(agent): align history trim to user boundary for provider compatibility#5257
WanZheng wants to merge 7 commits intozeroclaw-labs:masterfrom
WanZheng:fix/align-history-trim-user-boundary

Conversation

@WanZheng
Copy link
Copy Markdown

@WanZheng WanZheng commented Apr 3, 2026

Summary

  • Base branch target (master for all contributions): master
  • Problem: After any history trim operation, the first non-system message could be assistant or tool, causing providers like Zhipu GLM to reject the request with a 400 error (they require system* -> user ordering).
  • Why it matters: Long tool-using conversations trigger trimming frequently; invalid ordering silently breaks provider calls with no automatic recovery.
  • What changed: Added boundary alignment to every trim path — after removing messages, continue dropping leading non-system messages until the first one has role user. Extracted shared align_to_user_boundary helper to avoid duplication.
  • What did not change (scope boundary): Tool-pair integrity (addressed by fix(agent): treat tool_use/tool_result as atomic groups in history pruning #4825), compaction logic, channel history management, provider retry/fallback behavior.

Label Snapshot (required)

  • Risk label (risk: low|medium|high): risk: medium
  • Size label (size: XS|S|M|L|XL, auto-managed/read-only): size: S
  • Scope labels: agent
  • Module labels: agent: history
  • Contributor tier label: N/A
  • If any auto-label is incorrect, note requested correction: N/A

Change Metadata

  • Change type (bug|feature|refactor|docs|security|chore): bug
  • Primary scope (runtime|provider|channel|memory|security|ci|docs|multi): agent

Linked Issue

Supersede Attribution (required when Supersedes # is used)

N/A

Validation Evidence (required)

Commands and result summary:

cargo fmt --all -- --check   # pass
cargo clippy --all-targets -- -D warnings   # pass, zero warnings
cargo test   # 159 passed, 0 failed, 10 ignored
  • Evidence provided: all three commands pass locally on macOS (darwin 25.3.0)
  • If any command is intentionally skipped, explain why: N/A

Security Impact (required)

  • New permissions/capabilities? No
  • New external network calls? No
  • Secrets/tokens handling changed? No
  • File system access scope changed? No

Privacy and Data Hygiene (required)

  • Data-hygiene status: pass
  • Redaction/anonymization notes: No personal/sensitive data introduced
  • Neutral wording confirmation: All test fixtures use neutral labels (sys, u1, a1, t1, etc.)

Compatibility / Migration

  • Backward compatible? Yes
  • Config/env changes? No
  • Migration needed? No

i18n Follow-Through (required when docs or user-facing wording changes)

  • i18n follow-through triggered? No (no user-facing wording changes)

Human Verification (required)

What was personally validated beyond CI:

  • Verified scenarios: constructed message sequences with [system, user, assistant, tool, user, assistant], confirmed trim leaves user as first non-system message across all four trim paths
  • Edge cases checked: empty history, system-only, multiple system messages, all-assistant (no user), keep_recent protection preventing over-removal
  • What was not verified: live end-to-end with Zhipu GLM provider (no API key available)

Side Effects / Blast Radius (required)

  • Affected subsystems/workflows: history::trim_history, history::emergency_history_trim, history_pruner::prune_history, Agent::trim_history
  • Potential unintended effects: Slightly fewer messages retained after trim when orphaned assistant/tool messages are dropped to reach the next user boundary. This is strictly better than sending an invalid sequence to the provider.
  • Guardrails/monitoring for early detection: Provider error logs for role-ordering rejections should decrease; if message loss becomes excessive, keep_recent config protects the most recent messages.

Agent Collaboration Notes (recommended)

  • Agent tools used: Cursor (code review, cherry-pick, polish)
  • Workflow/plan summary: Reviewed commit, verified fix correctness across all four trim paths, corrected inaccurate commit message (removed false ensure_valid_sequence removal claim), deduplicated Phase 3 logic in history_pruner by reusing shared align_to_user_boundary, validated with fmt/clippy/test
  • Verification focus: boundary alignment correctness, edge case coverage, no regression in existing tests
  • Confirmation: naming + architecture boundaries followed (AGENTS.md + CONTRIBUTING.md)

Rollback Plan (required)

  • Fast rollback command/path: git revert <merge-commit> — restores previous trim behavior
  • Feature flags or config toggles (if any): None
  • Observable failure symptoms: Providers that enforce system* -> user ordering (Zhipu GLM, MiniMax) would return 400 errors after history trimming in long conversations

Risks and Mitigations

  • Risk: If no user message exists after the trim point, trim_history may drain all non-system messages.
    • Mitigation: This only occurs in malformed conversations (no user messages). In practice, every valid conversation has user messages. The keep_recent parameter protects the most recent messages from removal.

@JordanTheJet
Copy link
Copy Markdown
Collaborator

PR Review: fix(agent): align history trim to user boundary for provider compatibility

Comprehension Summary

What: After any history trim operation (across four code paths: trim_history, emergency_history_trim, prune_history, and Agent::trim_history), the first non-system message could end up being assistant or tool. This PR ensures the first non-system message always has role user by extending the trim past any leading non-user messages until a user boundary is reached.

Why: Providers like Zhipu GLM and MiniMax require system* -> user message ordering. When history trimming left an assistant or tool message at the front, these providers returned 400 errors with no automatic recovery — effectively bricking long tool-using sessions.

Blast radius: Four trim paths in src/agent/: history.rs (trim_history, emergency_history_trim, align_to_user_boundary), history_pruner.rs (prune_history), and agent.rs (Agent::trim_history). No changes to provider logic, channel handling, or compaction. Risk classification: medium (behavior change in agent history management, but strictly corrective).


Security & Performance

  • Security: No new permissions, network calls, or secrets handling. No impact.
  • Performance: align_to_user_boundary uses history.remove(idx) in a loop, which is O(n*k) where k is the number of orphaned messages. For typical history sizes and trim scenarios (1-3 orphaned messages), this is negligible. The inline drain-extension in trim_history is O(1) extra work. No concern.

Overlap with PR #5264

PR #5264 (fix(agent): preserve tool_use/tool_result pairs during history trimming by @okhsunrog) addresses a complementary but distinct invariant: it prevents orphaned tool messages that lack a preceding assistant (which causes Anthropic API 400 errors). Both PRs modify trim_history and emergency_history_trim in history.rs and will have textual merge conflicts.

The two fixes are not redundant — they enforce different constraints:

Recommendation: Whoever merges second must reconcile the merge conflict. Ideally both invariants should be enforced in a single post-trim pass. Consider coordinating merge order or combining.


Issues

  1. [nit] Inconsistent deduplication claim. The PR body states "Extracted shared align_to_user_boundary helper to avoid duplication," but trim_history in history.rs uses an inline while loop to extend the drain rather than calling the shared helper. This is defensible for efficiency (drain-extension vs. remove-in-loop), but the PR description is slightly misleading. Consider either calling the shared helper for consistency, or noting the intentional divergence in the PR body.

  2. [low] Edge case: no user message exists. In trim_history, if no user message exists after the drain point, drain_end advances to history.len(), draining all non-system messages. In Agent::trim_history, the same pattern yields an empty other_messages vec. The PR body correctly identifies this as a malformed-conversation edge case and notes keep_recent as a mitigation — but trim_history (the ChatMessage variant) does not have keep_recent protection. This means a conversation with only system+assistant messages could be fully drained. Acceptable given the scenario is degenerate, but worth a doc comment.

  3. [low] Agent::trim_history does not use align_to_user_boundary. The Agent struct uses ConversationMessage (an enum wrapping ChatMessage), so it cannot directly call the ChatMessage-typed helper. The inline implementation is correct but represents a second place this logic must be maintained. A future refactor could unify the types or add a trait-based helper.

  4. [info] Test for Agent::trim_history is missing. The PR adds thorough unit tests for trim_history, emergency_history_trim, align_to_user_boundary, and prune_history, but the Agent::trim_history method in agent.rs is only tested indirectly (if at all). The inline logic there is simple enough that this is acceptable, but a dedicated test would strengthen coverage.

  5. [info] Existing test modified in history_pruner.rs. The prune_collapses_tool_pairs test was updated to insert a leading msg("user", "do something") to satisfy the new alignment invariant. This is correct — the original test had an implicit assumption that system -> assistant was valid, which was always wrong for strict providers. The updated assertions (messages.len() == 5, index shifts) are consistent.


Verdict

APPROVE -- The fix is correct, well-scoped, and addresses a real provider-compatibility bug. Test coverage is strong (12 new test cases across the helper, trim_history, emergency_history_trim, and prune_history). The issues identified are all low-severity or informational. CI is green.

Merge coordination note: this PR will conflict with #5264. The maintainer merging second should ensure both invariants (user-first ordering AND tool-pair integrity) are preserved in the reconciled code.


Reviewed by ZeroClaw PR Review Agent

@JordanTheJet JordanTheJet added the agent-approved PR approved by automated review agent label Apr 5, 2026
WanZheng added 4 commits April 7, 2026 18:57
…ility

After any trim operation (trim_history, emergency_history_trim,
prune_history, Agent::trim_history), the first non-system message could
be `assistant` or `tool`. Providers like Zhipu GLM reject such sequences,
requiring `system* -> user` ordering.

Add boundary alignment to every trim path: after removing messages,
continue dropping leading non-system messages until the first one has
role `user`. Extract shared `align_to_user_boundary` helper in
history.rs and reuse it in history_pruner Phase 3.

Changes:
- history::trim_history: extend drain end to next user message
- history::emergency_history_trim: call align_to_user_boundary after drop
- history::align_to_user_boundary: new pub(crate) helper
- history_pruner::prune_history: add Phase 3 via align_to_user_boundary
- Agent::trim_history: extend drain_end to next user ConversationMessage

Made-with: Cursor
…e APIs

After proactive trim and length-based truncation, drop leading non-user
turns so a prepended system message still yields system→user ordering
(e.g. Zhipu GLM). Add unit tests for the alignment behavior.

In OpenAI-compatible provider: centralize tool_stream via
tool_stream_for_tools, omit tool_stream when tools are absent, and emit
debug logs with message role sequences on HTTP 400 for easier diagnosis.

Made-with: Cursor
…and test

- Refactor `trim_history` to call shared `align_to_user_boundary` instead
  of inline while-loop, making the deduplication claim accurate
- Add doc comment on no-user edge case in `trim_history` (no `keep_recent`
  guard; acceptable for degenerate conversations)
- Add doc comment on `Agent::trim_history` explaining why it cannot reuse
  the shared helper (ConversationMessage vs ChatMessage type mismatch)
- Add dedicated test for `Agent::trim_history` boundary alignment
- Fix pre-existing rustfmt issues in agent.rs (assert! format, import order)

Made-with: Cursor
@WanZheng WanZheng force-pushed the fix/align-history-trim-user-boundary branch from 5bacecc to 76756a6 Compare April 7, 2026 11:12
@github-actions github-actions Bot added channel Auto scope: src/channels/** changed. provider Auto scope: src/providers/** changed. provider:compatible Auto module: provider/compatible changed. labels Apr 7, 2026
@WanZheng
Copy link
Copy Markdown
Author

WanZheng commented Apr 7, 2026

Post-review update summary

Reviewer feedback addressed (all items from @JordanTheJet's review)

# Issue Severity Resolution
1 trim_history used inline loop instead of shared helper nit Fixed in 5eb34d9trim_history now calls align_to_user_boundary
2 Edge case: no user message after drain low Doc comment added to trim_history explaining the degenerate-conversation behavior
3 Agent::trim_history can't reuse the ChatMessage-typed helper low Doc comment added explaining the ConversationMessage type boundary
4 Missing test for Agent::trim_history info Test trim_history_aligns_to_user_boundary_for_conversation_messages added
5 Modified prune_collapses_tool_pairs test info Acknowledged — test correctly updated for new invariant

Additional fix in this push

  • cargo fmt compliance: Applied rustfmt to agent.rs and channels/mod.rs (macro formatting and import ordering) to pass cargo fmt --all -- --check.

Validation evidence (local, post-update)

cargo fmt --all -- --check   # pass
cargo clippy --all-targets -- -D warnings   # pass, zero warnings
cargo test   # 5926 passed, 2 failed (pre-existing sandbox-only: git_operations), 3 ignored

Label request for maintainers

The PR body specifies risk: medium and size: S, but the fork account cannot add labels. Please apply:

  • risk: medium
  • size: S

Note on PR #5264 overlap

PR #5264 (tool_use/tool_result pair preservation) is still open and addresses a complementary invariant. Both PRs modify trim_history / emergency_history_trim in history.rs. Whoever merges second will need to resolve textual conflicts. The two invariants are independent and both needed.

@okhsunrog
Copy link
Copy Markdown

I closed #5264

WanZheng added 3 commits April 8, 2026 18:08
Combine orphan ToolResults guard with user-boundary alignment on
ConversationMessage, and keep both unit tests.

Made-with: Cursor
@singlerider singlerider requested a review from Audacity88 April 29, 2026 08:18
@singlerider singlerider added bug Something isn't working risk: medium Auto risk: src/** or dependency/config changes. size: M Auto size: 251-500 non-doc changed lines. needs-author-action Author action required before merge labels Apr 29, 2026
@Audacity88
Copy link
Copy Markdown
Collaborator

I think this needs a rebase before meaningful review. GitHub reports DIRTY and local merge checking conflicts across provider compatibility, runtime history, the old src/agent/history_pruner.rs path, and channel code. Current master now has substantial history-pruner/tool_use/tool_result preservation work in crates/zeroclaw-runtime/src/agent/history.rs and history_pruner.rs, so the original fix may overlap with later changes.

Could you rebase against current master and confirm the remaining provider-compatibility failure this PR still fixes?

@JordanTheJet
Copy link
Copy Markdown
Collaborator

Hi @WanZheng — thanks for the careful work on this. The fix shape is sound, but this PR has been overtaken by parallel work and isn't reviewable as-is:

  1. File paths moved. All four targeted files (src/agent/agent.rs, src/agent/history.rs, src/agent/history_pruner.rs, src/providers/compatible.rs) live under crates/zeroclaw-runtime/ and crates/zeroclaw-providers/ after the workspace split. The branch is DIRTY against master and can't cleanly rebase.

  2. Adjacent fixes have already landed. Master now has tool_use/tool_result pair preservation in Agent::trim_history (agent.rs:738-780) and persistent-session orphan repair in load_interactive_session_history (history.rs:198-204) — both via [Bug]: History pruner severs tool_use/tool_result pairs, causing Anthropic 400 error #4810 and [Bug]: Compaction still orphans tool_result blocks, causing Anthropic API 400 errors (v0.6.9) (Signal Channel) #5813. Your patch overlaps these meaningfully.

  3. The user-boundary alignment piece is being handled in fix(providers): drop leading non-user turns before provider call #6303. That open PR ("fix(providers): drop leading non-user turns before provider call") closes [Bug]: Gemini 400 — assistant tool_call emitted as first non-system turn (history serializer invariant violation) #6302 and targets the current crate layout — it does the same system* → user alignment you proposed, but at the provider-call boundary in crates/zeroclaw-providers/src/history_sanitizer.rs, which catches every code path (CLI, gateway, agent-loop) rather than only the trim sites.

Closing as superseded. If #6303 lands and there's still a missed trim path that produces a leading-assistant or leading-tool message, please open a fresh issue with the exact reproduction — happy to revisit. Thanks again for the patch and for the rigorous validation evidence in the original description.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

agent Auto scope: src/agent/** changed. agent-approved PR approved by automated review agent bug Something isn't working channel Auto scope: src/channels/** changed. needs-author-action Author action required before merge provider:compatible Auto module: provider/compatible changed. provider Auto scope: src/providers/** changed. risk: medium Auto risk: src/** or dependency/config changes. size: M Auto size: 251-500 non-doc changed lines.

Projects

Status: Backlog

Development

Successfully merging this pull request may close these issues.

5 participants