fix(reborn): preserve provider tool roundtrip metadata by henrypark133 · Pull Request #3722 · nearai/ironclaw

henrypark133 · 2026-05-17T01:15:38Z

Summary

preserve provider tool-call metadata through the Reborn loop so provider-backed models can receive valid tool-result roundtrips
keep raw provider replay metadata out of product-visible tool-result transcript content; store it on the typed internal provider-call side channel instead
group consecutive tool-result refs from the same provider turn so multi-tool responses replay as one assistant tool-call message followed by matching tool results

Stacked on #3720 / reborn/3622-tool-result-evidence.

Verification

cargo test -p ironclaw_reborn --features root-llm-provider --test llm_gateway
cargo test -p ironclaw_loop_support --test thread_loop_support_contract
cargo test -p ironclaw_threads --test session_thread_contract
cargo test -p ironclaw_threads --features libsql --test db_session_thread_contract
cargo test -p ironclaw_agent_loop
cargo test -p ironclaw_turns
cargo test -p ironclaw_reborn --test loop_driver_host
git diff --check origin/reborn/3622-tool-result-evidence..HEAD

gemini-code-assist

Code Review

This pull request introduces comprehensive support for provider-side tool calls within the agent loop, enabling the model to emit and track capability invocations with associated metadata. Key changes include the addition of ProviderToolCall and ProviderToolDefinition structures, updates to the LoopCapabilityPort and HostManagedModelGateway traits to handle tool-aware requests, and enhancements to the thread history logic for replaying tool-call turns. Feedback highlights critical issues regarding non-deterministic snapshot selection from HashMap collections in capability_port.rs, a missing import for sha256_digest_token, and fragile manual field copying when constructing tool completion requests in model_gateway.rs.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 127b4249b5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Copilot

Pull request overview

This PR adds provider-native tool-call metadata plumbing through the Reborn loop so tool-result transcript refs can be replayed to LLM providers as valid tool-call/result roundtrips.

Changes:

Adds provider tool definitions/calls/reference DTOs across turns, loop support, agent loop, and thread storage.
Updates model gateway conversion to call providers with tools and reconstruct provider tool roundtrips from stored result refs.
Extends tests and fixtures for tool-result metadata and new capability descriptor schemas.

Reviewed changes

Copilot reviewed 21 out of 21 changed files in this pull request and generated 8 comments.

Show a summary per file

File	Description
crates/ironclaw_turns/tests/agent_loop_host_contract.rs	Updates capability descriptor fixtures with parameter schemas.
crates/ironclaw_turns/src/run_profile/mod.rs	Re-exports provider tool metadata types.
crates/ironclaw_turns/src/run_profile/host.rs	Adds provider tool DTOs and capability/transcript metadata fields.
crates/ironclaw_threads/tests/session_thread_contract.rs	Updates tool-result append calls for new provider metadata field.
crates/ironclaw_threads/tests/db_session_thread_contract.rs	Updates durable tool-result append calls for new field.
crates/ironclaw_threads/src/tool_result_reference.rs	Adds provider tool-call reference envelope validation.
crates/ironclaw_threads/src/lib.rs	Re-exports provider tool-call reference envelope.
crates/ironclaw_threads/src/in_memory.rs	Persists provider metadata in in-memory thread records/context.
crates/ironclaw_threads/src/db.rs	Persists provider metadata in durable thread records/context.
crates/ironclaw_threads/src/contract.rs	Extends thread/message/context contracts with provider metadata.
crates/ironclaw_reborn/tests/loop_driver_host.rs	Updates test candidates for new provider metadata fields.
crates/ironclaw_reborn/tests/llm_gateway.rs	Adds provider tool-call and roundtrip reconstruction tests.
crates/ironclaw_reborn/src/model_gateway.rs	Adds tool-aware provider requests and tool-result replay reconstruction.
crates/ironclaw_reborn/src/loop_exit_applier/tests/mod.rs	Updates test records/requests for new metadata field.
crates/ironclaw_reborn/src/loop_driver_host.rs	Passes capability ports into model gateway flow.
crates/ironclaw_loop_support/tests/thread_loop_support_contract.rs	Adds metadata preservation checks for transcript/model ports.
crates/ironclaw_loop_support/src/lib.rs	Carries provider metadata through transcript and model request adapters.
crates/ironclaw_loop_support/src/capability_surface_filter.rs	Filters provider tool definitions/calls by capability allow-set.
crates/ironclaw_loop_support/src/capability_port.rs	Builds provider tool definitions and maps provider calls to capability candidates.
crates/ironclaw_agent_loop/src/test_support/mod.rs	Records provider metadata in mock host result refs.
crates/ironclaw_agent_loop/src/executor.rs	Propagates provider call metadata when appending completed tool results.

Comments suppressed due to low confidence (5)

crates/ironclaw_loop_support/src/capability_port.rs:541

Provider tool calls cannot be used with the actual host-runtime capability port because this new path immediately calls LoopCapabilityInputResolver::register_provider_tool_call_input, but the trait only provides a default implementation that returns InvalidInvocation and there is no overriding implementation in the repository. As a result, any provider response containing tool calls will fail instead of producing capability candidates unless every host wires a custom resolver implementation.

        let input_ref = self
            .input_resolver
            .register_provider_tool_call_input(&self.run_context, &tool_call)
            .await?;

crates/ironclaw_loop_support/src/capability_port.rs:534

This lookup accepts any provider name recorded in the visible snapshot, including capabilities that tool_definitions() deliberately filtered out because their schemas were not usable for provider tool calling. A provider can still return one of those unadvertised tool names and this path will register it for invocation, so the registration path should enforce the same advertised-tool set as tool_definitions().

            let Some(capability_id) = snapshot.provider_names.get(&tool_call.name).cloned() else {
                return Err(AgentLoopHostError::new(
                    AgentLoopHostErrorKind::InvalidInvocation,
                    "provider tool call is outside the visible capability surface",
                ));

crates/ironclaw_loop_support/src/capability_port.rs:515

Provider-call metadata is not validated before it is turned into a capability input. The only new validation happens later when the tool result is appended to the thread, so an oversized or invalid provider call can execute the capability side effect and then fail transcript persistence, leaving the run unable to replay the roundtrip safely. Validate the provider id/name/arguments limits before registering the input ref.

    async fn register_provider_tool_call(
        &self,
        tool_call: ProviderToolCall,
    ) -> Result<ironclaw_turns::run_profile::CapabilityCallCandidate, AgentLoopHostError> {

crates/ironclaw_reborn/src/model_gateway.rs:855

Assistant-level provider reasoning is reconstructed from the first per-call reasoning value, but provider_tool_call_from_llm stores call-specific reasoning and response-level reasoning in the same field. When the first tool call has its own reasoning and the response also has shared reasoning, replay sends the call-specific text as the assistant message reasoning instead of the provider's response-level reasoning, which can break providers that require the exact reasoning artifact to be echoed.

    let reasoning = provider_results
        .iter()
        .find_map(|(provider_call, _)| provider_call.reasoning.clone());
    let assistant = ChatMessage::assistant_with_tool_calls(

crates/ironclaw_loop_support/src/capability_port.rs:769

Provider tool names are generated directly from the full capability id without enforcing the 256-byte limit that ProviderToolCallReferenceEnvelope::validate applies later. A long but valid capability id can therefore be advertised to the provider and executed, then fail when the result metadata is appended; the generated name should be bounded or rejected before advertising the tool.

    let base = capability_id.as_str().replace('.', "__");
    if !existing.contains_key(&base) {
        return base;
    }
    let digest = sha256_digest_token(capability_id.as_str().as_bytes());
    format!("{}__{}", base, &digest[..8])

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

henrypark133 · 2026-05-17T02:04:37Z

Addressed the provider-tool-call review round in 3f4e21cfc88265af73dae48de99dc7b2feb6afee.

Summary:

replaced the expanded CapabilityCallCandidate provider fields with nested ProviderToolCallReplay metadata
made provider surface selection deterministic and provider registration reject invalid calls before staging input
preserved zero-argument capability schemas for provider tools
scoped synthesized provider turn ids by run/turn/model-call and fixed digest truncation
centralized CompletionRequest -> ToolCompletionRequest conversion
kept provider replay metadata internal to context loading while stripping/redacting it from product history surfaces
added executor, loop-support, in-memory thread, and durable libSQL contract coverage for the provider metadata path

Resolved the review threads that are covered by those changes.

zmanian

Reviewed the diff carefully. The metadata side channel is structurally sound but a few correctness gaps in the replay path need attention before this lands.

Metadata isolation — looks correct

ThreadMessageRecord.tool_result_provider_call is stripped in history_messages() in both crates/ironclaw_threads/src/db.rs and crates/ironclaw_threads/src/in_memory.rs, while context_messages_with_summary_replacements retains it. So list_thread_history (product surface) sees only content, and load_context_window (model replay) sees the provider envelope. Redaction clears it. Test coverage on the in-memory and durable paths is real. Good.

One nit: nothing prevents a future caller from serializing a ThreadMessageRecord directly (it derives Serialize and the field has skip_serializing_if = Option::is_none but is otherwise public). A doc comment landed on the field, but it would be more defensible to make tool_result_provider_call pub(crate) or wrap accessors so the only emission path is the deliberately-scrubbing history fn. As-is, the isolation is enforced by convention, not by the type.

Multi-tool grouping — partial-failure replay is wrong

In crates/ironclaw_agent_loop/src/executor.rs, append_capability_result_ref only runs on CapabilityOutcome::Completed. Failures don't append a tool-result ref carrying provider_call. On the next turn, convert_messages in crates/ironclaw_reborn/src/model_gateway.rs groups consecutive ToolResult messages by provider_turn_id and reconstructs an assistant message containing only the completed tool calls.

That is structurally invalid for OpenAI / Anthropic: if the model originally emitted N tool calls in one turn, the provider expects exactly N matching tool-result messages. Replaying with K<N tool_calls and K tool_results drops the failed calls entirely from the assistant message — the provider sees a turn that never happened, and may produce different (or rejected) downstream behavior.

The two grouping tests (gateway_reconstructs_multi_tool_provider_turn_from_grouped_result_references, gateway_with_two_tool_calls_returns_two_candidates) only cover the all-success case. There is no test for partial-failure replay, no test for failure-only replay, and no test for what happens when a denied / suspended tool is in the batch.

Either:

emit a tool-result envelope with provider_call populated even on failure (with the safe summary describing the failure), so the assistant turn replays with its full original shape, or
record the assistant tool-call turn shape independently so reconstruction does not depend on which results survived.

Worth catching before this lands.

Cross-provider replay — no identity check

ProviderToolCallReferenceEnvelope carries provider_call_id, provider_tool_name, and signature verbatim into ChatMessage::assistant_with_tool_calls and ChatMessage::tool_result on replay (tool_result_replay_message → provider_tool_call_from_reference). Nothing in the envelope, the ToolResultReferenceEnvelope, or ProviderToolCallReplay records the originating provider or model.

Real failure modes on resume against a different provider:

Anthropic call IDs are toolu_*, OpenAI are call_*. Replaying Anthropic IDs into an OpenAI request will be rejected by the API.
signature is an Anthropic reasoning-signature concept. Forwarding it into an OpenAI request is at best ignored, at worst rejected.
Tool-name normalization differs (demo__echo is fine for both, but the encoding rule lives in provider_tool_name() and is not provider-specific — a future provider with stricter constraints would silently mismatch).

Either bind the replay metadata to the producing provider/model and refuse cross-provider replay, or strip/translate provider-specific fields on emit. As-is, a resume that crosses providers is a latent runtime failure with no early signal.

Schema gate vs invocation surface drift

provider_schema_is_usable (crates/ironclaw_loop_support/src/capability_port.rs) silently drops capabilities whose parameters_schema isn't { "type": "object", ... } from tool_definitions(), but the same capabilities remain invokable through invoke_capability. The model sees a smaller surface than the host enforces. For capabilities authored without a schema this is a soft failure — they become permanently invisible to provider-tool-call paths until someone notices. Worth at least a debug! when a capability is filtered out, and ideally a contract that capabilities advertised as visible always carry a usable schema.

Stacking on #3720

Base is reborn/3622-tool-result-evidence (#3720's head). Does not stand alone — the ProviderToolCallReferenceEnvelope validation primitives, tool_result_provider_call field plumbing, and the tool-result-reference envelope itself are introduced on the parent branch. If #3720 changes shape during review (envelope schema, validation rules) this PR's tests will need a rebase pass. Worth confirming the merge order is locked.

Smaller items

provider_turn_id is a sha256-truncated digest of (run, turn, model_call_seq, tool_call ids+names). The 16-hex-char truncation gives ~64 bits of entropy. Collisions across long-running threads are vanishingly unlikely but the grouping in convert_messages relies entirely on string equality — a collision would silently merge two distinct provider turns into one assistant message. Either widen to 32 hex chars or pin grouping with a secondary check (e.g., contiguous sequence range in transcript).
AtomicU64::fetch_add(1, Ordering::Relaxed) for provider_turn_sequence is fine for grouping but means turn ids are not deterministic across process restarts. If a gateway is recreated mid-thread, sequence resets to 1 and the same (run, turn) produces a new provider_turn_id. Confirm this matches the intended replay semantics.
LlmProviderModelGateway::stream_model and RoutedLlmProviderModelGateway::stream_model no longer pass capabilities — only stream_model_with_capabilities does. The non-tool path is intentional, but means any code path that goes through plain stream_model will never get tool definitions even when capabilities are wired. Ensure no production caller path falls back to plain stream_model by accident.

Approve the metadata side-channel direction. Block on partial-failure replay correctness and cross-provider identity handling.

Zaki

henrypark133 · 2026-05-17T05:18:20Z

Addressed Zaki’s follow-up review in c6c69252c45664374c6a3861e484350f616b1c7b.

What changed:

provider replay metadata is now bound to provider/model identity and replay fails closed if the selected route does not match
provider-originated terminal non-success outcomes now append a safe synthetic tool-result reference, so replay preserves the full assistant tool-call shape for mixed success/failure batches
provider turn ids now use 32 hex chars of digest entropy
provider tool schema drops now emit debug diagnostics
direct ThreadMessageRecord serialization skips the provider replay side channel, while durable DB persistence still writes/loads it explicitly

Validation run locally:

cargo test -p ironclaw_agent_loop
cargo test -p ironclaw_reborn --features root-llm-provider --test llm_gateway
cargo test -p ironclaw_threads --test session_thread_contract
cargo test -p ironclaw_threads --features libsql --test db_session_thread_contract
cargo test -p ironclaw_loop_support
cargo test -p ironclaw_turns
cargo test -p ironclaw_reborn --test loop_driver_host
cargo check -p ironclaw_reborn --features root-llm-provider

henrypark133 · 2026-05-17T05:39:33Z

Follow-up after rebasing onto reborn/3622-tool-result-evidence and running another review pass.

A review subagent checked local head be5888af0965d514080df0b7bf6027533c3af319 against base a60b062b3 for PR scope, Reborn contract boundaries, and bug risk. It found no verified bugs or contract violations. The review specifically checked provider replay identity, provider metadata storage/public serialization boundaries, synthetic result refs for provider-backed non-success paths, and the cross-crate capability/result-reference contracts.

Additional fixes included in be5888af0:

static LlmProviderModelGateway now carries an explicit provider id instead of deriving provider identity from LlmProvider::model_name()
the all-features clippy large-enum warning in agent-loop test support is fixed
the post-rebase clippy warning in loop-exit-applier tests is fixed

Local verification after the rebase:

cargo fmt --check
cargo test -p ironclaw_reborn --features root-llm-provider --test llm_gateway
cargo test -p ironclaw_agent_loop
cargo clippy --all --benches --tests --examples --all-features -- -D warnings

The rebased branch has been force-pushed with lease. GitHub checks have started on the new head and are still pending.

serrrfirat

Correctness review focused on regressions from the broad provider-tool-call changes. I found two medium-or-higher issues and did a second pass over the touched replay, filtering, and persistence paths without finding additional medium findings.

Findings:

P1 - Provider replay metadata is lost on history fallback. ThreadBackedLoopModelPort::resolve_model_messages falls back to list_thread_history when a requested message ref is outside the current context window. Both DB and in-memory list_thread_history deliberately clear tool_result_provider_call before returning history records, so history_messages_by_ref rebuilds a model message without the provider side channel. convert_messages then treats that tool result as a plain system summary instead of reconstructing the assistant tool-call + tool-result round trip. Any older tool-result ref resolved through history can therefore lose the provider call id/signature/reasoning and break the next provider request.
P2 - Provider-native tool definitions ignore the executor's per-iteration capability filter. The executor computes a CapabilityFilter and applies it only to the local visible surface used for prompt/invocation checks. Later, the gateway fetches provider tool definitions from capabilities.tool_definitions(), which is built from the current snapshot without that local filter. The executor still denies filtered calls after the model asks for them, but the provider is being advertised tools that the current loop iteration intended to hide, leading to avoidable invalid tool calls and incorrect model context.

Verification run locally on the PR head:

cargo test -p ironclaw_threads --test session_thread_contract provider -- --nocapture
cargo test -p ironclaw_loop_support capability_surface_filter::tests::provider_tool_call_denied_before_inner_registration_when_not_allowed
cargo test -p ironclaw_reborn --features root-llm-provider --test llm_gateway

The first attempted cargo test -p ironclaw_reborn --test llm_gateway failed only because that test target requires root-llm-provider; rerunning with the feature passed.

zmanian

Re-review of 3 commits since my prior review (6657d33d, 194ee470, be5888af) on top of the rebased base.

Prior findings — status

1. Metadata isolation enforced by convention not type — Partial. ThreadMessageRecord now has a doc note and DB persistence is the only explicit emitter; in-memory list_thread_history still scrubs at the same spot (in_memory.rs:612). The field is still public on a Serialize struct. Acceptable for now but I'd still prefer pub(crate) with explicit accessors.

2. Multi-tool partial-failure replay broken — Resolved. executor.rs now appends a tool-result ref on all terminal non-success outcomes (SkipResult, FailRun/FailTurn, PolicyDenied Retry, gate SkipAndContinue/Abort, unsupported process wait, final-fallback). append_capability_error_ref/append_capability_safe_summary_ref carry the provider_call envelope so reconstructed assistant turns match the original tool-call shape. New test gateway_reconstructs_multi_tool_provider_turn_with_failure_safe_summary covers mixed success/failure. Good fix.

3. Cross-provider replay no identity check — Resolved. ProviderToolCallReferenceEnvelope now carries provider_id + provider_model_id, validated on emit (capability_port.rs::validate_provider_identity) and again on replay (model_gateway.rs::validate_provider_replay_identity), which fails closed with PolicyDenied on mismatch. LlmProviderModelGateway::new now takes an explicit provider_id instead of deriving from model_name(). Good.

4. Schema gate vs invocation surface drift — Partial. Filter now emits tracing::debug! when a capability is dropped from tool_definitions(). The invocation surface still accepts capabilities that were silently filtered (no positive contract that "visible ⇒ provider-usable"). Acceptable for follow-up; not a blocker.

5. Stacking on #3720 — Resolved by rebase. Base is now a60b062b3 post-rebase; both PRs need to land in order.

6. 64-bit provider_turn_id entropy + non-deterministic sequence — Partially resolved. Digest widened from 16 to 32 hex chars (sha256_hex_prefix(..., 32)) — collision risk now adequate. AtomicU64 sequence reset on gateway recreation is unchanged; that means the same (run, turn) can produce different provider_turn_ids across restarts. Grouping is per-turn so a restart mid-turn is the only real failure window; flag for future hardening.

New findings (still blocking)

A. History-fallback path loses tool_result_provider_call (also raised by serrrfirat). ThreadBackedLoopModelPort::resolve_model_messages in crates/ironclaw_loop_support/src/lib.rs:862 falls back to list_thread_history when a result_ref is outside the active context window. Both in_memory.rs:611-613 and the DB equivalent in db.rs deliberately clear tool_result_provider_call before returning history records. history_messages_by_ref (line 1189) then propagates that None into the rebuilt ContextMessage. convert_messages falls back to the system-summary branch and the assistant tool-call turn cannot be reconstructed. Concretely: any tool-result older than the current window will lose its provider envelope on replay, regressing exactly the case this PR was meant to fix. Either retain the envelope in the fallback path (separate accessor that does not scrub), or refuse fallback for tool-result refs and require they live in the context window.

B. Per-iteration CapabilityFilter not applied to tool_definitions (also raised by serrrfirat). CanonicalAgentLoopExecutor::run_batch computes surface_filter = planner.capability().filter(&state) and applies it to the local VisibleCapabilitySurface via apply_capability_filter (executor.rs:232/250). The model gateway then independently calls capabilities.tool_definitions() on the unfiltered port. CapabilitySurfaceProfileFilter only enforces the static run-profile allow set, not the dynamic per-iteration filter. Net effect: provider gets advertised tools that the executor will then deny. The right fix is to either pass the per-iteration filter through to the capability port the gateway sees, or have tool_definitions() consult the same filter source the executor uses.

Validation

I did not re-run tests locally; relying on the run reported in the PR comments. The new loop_driver_host and gateway tests look targeted; the partial-failure test in particular exercises the right replay shape.

Verdict

Findings 2, 3, 5 cleanly resolved. Findings 1, 4, 6 acceptable as-is or partial. Block on A (history fallback) and B (per-iteration filter not propagated). Both are correctness regressions on the exact replay path this PR is meant to harden.

Zaki

henrypark133 · 2026-05-18T07:14:55Z

Addressed the provider-tool visibility concern in the latest push. The executor now passes the resolved model-visible capability IDs for the current iteration, and loop-support wraps the capability port with that resolved view before provider-native tool definitions/calls reach the model gateway. This keeps loop-specific filtering policy inside agent-loop while making provider tools respect the final visible surface.\n\nValidation run:\n- cargo test -p ironclaw_loop_support --test thread_loop_support_contract\n- cargo test -p ironclaw_agent_loop\n- cargo test -p ironclaw_turns --test agent_loop_host_contract\n- cargo test -p ironclaw_reborn --test loop_driver_host\n- cargo test -p ironclaw_reborn --test loop_milestone_event_projection\n- cargo test -p ironclaw_reborn --features root-llm-provider --test llm_gateway\n- cargo clippy -p ironclaw_turns -p ironclaw_agent_loop -p ironclaw_loop_support -p ironclaw_reborn --tests -- -D warnings

serrrfirat · 2026-05-18T09:27:58Z

Correctness/security follow-up on the latest head (2eea296), avoiding the findings already raised in prior review threads.

Findings:

P2 - Synthetic provider error refs are appended to transcript but not recorded in loop state.

In crates/ironclaw_agent_loop/src/executor.rs, provider-backed non-success paths call append_capability_error_ref / append_capability_safe_summary_ref, which writes a synthetic result:provider-error-* tool-result ref to the transcript. Unlike completed capability outcomes, those synthetic refs are never pushed into state.result_refs.

That leaves TurnSummary.batch_result_refs, final LoopExit.result_refs, and checkpointed loop state missing tool results that were required to satisfy provider replay. The transcript may replay correctly, but loop-level accounting still treats some provider tool outcomes as resultless. The append helper should return the synthetic ref and provider-backed terminal non-success paths should record it in state.result_refs.

P2 - Multi-tool provider registration is not atomic.

In crates/ironclaw_reborn/src/model_gateway.rs, tool_response_to_host registers provider tool calls one at a time. If the first call stages input successfully and a later call fails validation/filtering, the whole model response is rejected while the earlier staged input remains orphaned. Retrying can stage duplicates.

The provider tool-call batch should be prevalidated before staging any inputs, or registration should be moved behind a batch API that can apply rollback/idempotency across the whole provider turn.

Extra paranoid pass: I did not find additional non-duplicate medium-or-higher correctness/security issues in the latest diff.

henrypark133 · 2026-05-18T10:18:34Z

Code Review (multi-agent)

Intent: Preserve provider tool-call metadata through the Reborn loop so provider-backed models receive valid tool-result roundtrips without exposing raw metadata in user-visible transcripts.

Stats: 42 findings (from 42 raw, 42 after dedup) across 27 files

Security (7 issues)

High ContextMessage serializes raw provider metadata without skip_serializing (crates/ironclaw_threads/src/contract.rs:220-228, confidence 90)
High validate_provider_replay_identity only checks provider_id/model_id, not turn/call scope (crates/ironclaw_reborn/src/model_gateway.rs:963-976, confidence 85)
Medium ProviderToolCallReferenceEnvelope arguments field lacks content sanitization (crates/ironclaw_threads/src/tool_result_reference.rs:68-83, confidence 75)
Medium HostManagedModelMessage serializes tool_result_provider_call when present (crates/ironclaw_loop_support/src/lib.rs:1061-1068, confidence 70)
Medium signature field stored and transmitted but never cryptographically verified (crates/ironclaw_threads/src/tool_result_reference.rs:68-107, confidence 65)
Low validate_optional_provider_text lacks sensitive content checks (crates/ironclaw_threads/src/tool_result_reference.rs:181-193, confidence 60)
Low ThreadMessageRecord Clone derives leak provider_call in memory across boundaries (crates/ironclaw_threads/src/contract.rs:78-98, confidence 55)

Bugs (5 issues)

Medium validate_provider_tool_call skips turn_id validation when None (crates/ironclaw_loop_support/src/capability_port.rs:1053, confidence 80)
Medium append_capability_safe_summary_ref accepts unvalidated String for safe_summary (crates/ironclaw_agent_loop/src/executor.rs:920, confidence 75)
Medium invoke_filtered_batch truncation drops denials between last returned outcome and suspension point (crates/ironclaw_loop_support/src/capability_surface_filter.rs:101, confidence 70)
Medium TOCTOU race in CapabilitySurfaceVisibleFilter::register_provider_tool_call (crates/ironclaw_loop_support/src/capability_surface_filter.rs:1298, confidence 60)
Low provider_tool_name collision fallback produces suboptimal names for short capability IDs (crates/ironclaw_loop_support/src/capability_port.rs:1031, confidence 65)

Performance (11 issues)

Medium context_messages_with_summary_replacements is O(n^2 * s) in db.rs (crates/ironclaw_threads/src/db.rs:1026-1075, confidence 75)
Medium context_messages_with_summary_replacements is O(n^2 * s) in in_memory.rs (crates/ironclaw_threads/src/in_memory.rs:548-597, confidence 75)
Low context_messages_by_id does O(n*m) linear scans in db.rs (crates/ironclaw_threads/src/db.rs:1077-1097, confidence 75)
Low context_messages_by_id does O(n*m) linear scans in in_memory.rs (crates/ironclaw_threads/src/in_memory.rs:599-619, confidence 75)
Low to_message_json does redundant triple serialization (crates/ironclaw_threads/src/db.rs:1864-1881, confidence 50)
Low history_messages clones full records then nullifies provider_call in db.rs (crates/ironclaw_threads/src/db.rs:1118-1128, confidence 50)
Low history_messages clones full records then nullifies provider_call in in_memory.rs (crates/ironclaw_threads/src/in_memory.rs:640-650, confidence 50)
Low current_snapshot acquires two mutexes sequentially (crates/ironclaw_loop_support/src/capability_port.rs:309-340, confidence 50)
Nit register_provider_tool_call calls tool_definitions() then inner.register (crates/ironclaw_loop_support/src/capability_surface_filter.rs:56-85, confidence 50)
Nit create_summary_artifact scans messages twice for range validation in db.rs (crates/ironclaw_threads/src/db.rs:892-905, confidence 50)
Nit create_summary_artifact scans messages twice for range validation in in_memory.rs (crates/ironclaw_threads/src/in_memory.rs:414-427, confidence 50)

Tests (15 issues)

High ProviderToolCallReferenceEnvelope.validate() has zero test coverage (crates/ironclaw_threads/src/tool_result_reference.rs:86-106, confidence 100)
High convert_messages() has zero test coverage for provider tool roundtrip (crates/ironclaw_reborn/src/model_gateway.rs:911-961, confidence 100)
High Consecutive tool-result ref grouping not tested with mixed turn_ids (crates/ironclaw_reborn/src/model_gateway.rs:940-954, confidence 100)
High tool_result_replay_message() error path for malformed JSON not tested (crates/ironclaw_reborn/src/model_gateway.rs:983-997, confidence 100)
High tool_response_to_host() finish_reason error paths not tested (crates/ironclaw_reborn/src/model_gateway.rs:767-821, confidence 75)
Medium validate_provider_replay_identity() mismatch error path not tested (crates/ironclaw_reborn/src/model_gateway.rs:963-976, confidence 100)
Medium CapabilitySurfaceVisibleFilter has zero test coverage (crates/ironclaw_loop_support/src/capability_surface_filter.rs:48-118, confidence 100)
Medium validate_provider_tool_call() has no direct unit tests (crates/ironclaw_loop_support/src/capability_port.rs:1052-1079, confidence 100)
Medium history_messages() not tested to confirm provider_call is stripped in in_memory.rs (crates/ironclaw_threads/src/in_memory.rs:640-650, confidence 100)
Medium history_messages() not tested to confirm provider_call is stripped in db.rs (crates/ironclaw_threads/src/db.rs:1118-1128, confidence 100)
Medium context_messages_by_id() not tested with empty ids or all-missing ids in in_memory.rs (crates/ironclaw_threads/src/in_memory.rs:599-619, confidence 75)
Medium context_messages_by_id() in db.rs not tested with empty/missing/mixed cases (crates/ironclaw_threads/src/db.rs:1077-1097, confidence 75)
Medium to_message_json() not tested with and without provider_call in db.rs (crates/ironclaw_threads/src/db.rs:1864-1881, confidence 75)
Medium synthetic_provider_error_result_ref() error paths not tested (crates/ironclaw_agent_loop/src/executor.rs:303-323, confidence 75)
Medium complete_model_request() capabilities error path not tested (crates/ironclaw_reborn/src/model_gateway.rs:714-757, confidence 50)

Conventions (4 issues)

Medium Duplicated provider validation functions across crates (crates/ironclaw_loop_support/src/capability_port.rs:865-937, confidence 90) — Anchor: AGENTS.md
Medium Duplicated ProviderToolCallReference struct requires manual conversion (crates/ironclaw_loop_support/src/lib.rs:1288-1303, confidence 85) — Anchor: AGENTS.md
Low New public types lack documentation (crates/ironclaw_turns/src/run_profile/host.rs:783-1167, confidence 95)
Low Missing safety comment on infallible unwrap_or (crates/ironclaw_loop_support/src/capability_port.rs:823-824, confidence 80) — Anchor: AGENTS.md

henrypark133 · 2026-05-18T17:26:20Z

Review findings summary

All 19 review comments from automated reviewers (Gemini, Codex, Copilot) and human reviewers (zmanian ×2, serrrfirat) have been triaged. No new changes are needed — every finding has been addressed by prior commits on this branch:

Already resolved by commits on this branch

Finding	Status	Resolved by
HashMap non-determinism in `snapshots.values().next()`	✅ Fixed	`2eea296a7` — sorted/deterministic lookups
Missing `sha256_digest_token` import	✅ False positive	Already imported at line 9
Manual field copying `CompletionRequest` → `ToolCompletionRequest`	✅ Fixed	`From` impl added in `crates/ironclaw_llm/src/provider.rs`
`provider_schema_is_usable` rejects no-arg schemas	✅ Fixed	Now uses `.is_none_or()` on properties
`provider_turn_id` collision risk (9 hex chars)	✅ Fixed	Widened to 32 hex chars (128-bit entropy) + includes `run_id`/`turn_id`/`model_call` sequence
`ThreadMessageRecord` exposes provider metadata to product APIs	✅ Fixed	`#[serde(default, skip_serializing)]` prevents serialization
Redaction doesn't clear `tool_result_provider_call`	✅ Fixed	Both `in_memory.rs` and `db.rs` clear it on redact; `history_messages()` scrubs it
Allow-set check after inner delegation (security)	✅ Fixed	Check moved before `self.inner.register_provider_tool_call()`
Missing test coverage for provider metadata roundtrip	✅ Fixed	Tests with `provider_replay: Some(...)` added
Multi-tool partial-failure replay broken	✅ Fixed	`append_capability_error_ref`/`append_capability_safe_summary_ref` carry provider_call
Cross-provider replay no identity check	✅ Fixed	`provider_id` + `provider_model_id` validated on emit and replay
History-fallback loses `tool_result_provider_call`	✅ Fixed	`load_context_window` preserves it; `load_context_messages` added
Per-iteration `CapabilityFilter` not applied to `tool_definitions`	✅ Fixed	`apply_capability_filter` called; `capability_view` passed to model request
`debug!` logging when capability filtered	✅ Fixed	`tracing::debug!` present at capability_port.rs:548

Duplicates (same issue, already resolved)

Schema filtering comment on capability_port.rs:500 — duplicate of 3253841725
Redaction comment on in_memory.rs:302 — duplicate of db.rs:779

Suppressed (low-confidence, no action needed)

Copilot suppressed findings — all deferred by design (fail-closed defaults, validation exists)

Verdict: All review findings are addressed. The PR is mergeable with a clean state.

@zmanian @serrrfirat — please re-review if you have any remaining concerns.

henrypark133

All review findings have been addressed by commits on this branch. Re-reviewing to confirm.

zmanian

Re-review against head faa93b84 (the "address provider replay review findings" commit) on top of the prior reviewer rounds.

Prior findings — status check

#	Finding	Status
Zaki #1	Metadata isolation enforced by convention not type	Partial — `tool_result_provider_call` is now `skip_serializing` on both `ThreadMessageRecord` (contract.rs:94) and `HostManagedModelMessage` (loop_support/lib.rs:1066). Acceptable.
Zaki #2	Multi-tool partial-failure replay broken	Resolved via `append_capability_safe_summary_ref` on all terminal non-success paths.
Zaki #3	Cross-provider replay no identity check	Resolved — `validate_provider_replay_identity` runs on each replayed envelope.
Zaki #4	Schema gate vs invocation surface drift	Partial — `tracing::debug!` emitted on filter, no positive contract. Acceptable.
Zaki #6	Turn id entropy + non-determinism	Resolved (32 hex chars). Restart-mid-turn sequence reset still latent.
Zaki A	History fallback loses `tool_result_provider_call`	Resolved — `resolve_model_messages` now calls `load_context_messages` (loop_support/lib.rs:881-890) which preserves the side channel via `context_messages_by_id` in both `in_memory.rs:621` and `db.rs:1096`. `list_thread_history` is only used for summary artifacts.
Zaki B	Per-iteration `CapabilityFilter` not applied to `tool_definitions`	Resolved — new `CapabilitySurfaceVisibleFilter` (capability_surface_filter.rs:21-118) wraps the inner port; `tool_definitions`, `register_provider_tool_call`, `visible_capabilities`, and both `invoke_capability*` paths all enforce the model-visible set.
serrrfirat #1 (latest)	Synthetic provider error refs not recorded in loop state	Still open — see below.
serrrfirat #2 (latest)	Multi-tool provider registration not atomic	Partially addressed — see below.

Still blocking

A. Synthetic provider error refs are appended to the transcript but never pushed into `state.result_refs`

faa93b84 does not touch crates/ironclaw_agent_loop/src/executor.rs, so serrrfirat's earlier finding is unchanged. Concretely:

append_capability_safe_summary_ref (executor.rs:1710-1726) writes the synthetic result:provider-error-… ref to the host transcript via host.append_capability_result_ref(...), but the only writer of state.result_refs is push_completed_result (executor.rs:1776-1785), which runs solely on the CapabilityOutcome::Completed branch. Every call site at executor.rs:791, 803, 821, 937, 989, 1003, 1031 calls append_capability_*_ref for a provider-backed non-success outcome (SkipResult, Abort, policy-denied Retry, gate SkipAndContinue / Abort, final fallback, unsupported process wait) and only updates state.recovery_state / state.gate_state — never state.result_refs.

Net effect: the transcript contains the synthetic ref (so model replay reconstructs the assistant tool-call shape via provider_tool_roundtrip_messages), but loop accounting (TurnSummary.batch_result_refs derived at executor.rs:460 from state.result_refs[result_refs_start..], final LoopExit.result_refs at line 1387, and checkpointed loop state) is silently missing those refs. Mixed success/failure provider batches will produce a LoopExit.result_refs whose length differs from the number of provider tool-call results, and downstream consumers (telemetry, batch accounting, retry/resumption logic) see fewer refs than were actually emitted.

The fix is either to have append_capability_safe_summary_ref return the LoopResultRef and push it into state.result_refs at each provider-backed non-success call site, or to wrap the append in a helper that does both atomically.

Test gap that masked this: the test at executor.rs:3387 only asserts host.appended_result_refs() (transcript view), not completed.result_refs (loop-state view) on a denied provider call.

B. Multi-tool provider registration is still not atomic for non-name failure modes

faa93b84 adds a useful upfront check at model_gateway.rs:784-799 that all tool calls in the batch are in advertised_tool_names before any registration, and the new test gateway_rejects_unknown_provider_tool_call_before_registration confirms capabilities.registered.lock().unwrap().is_empty() after a name-mismatch rejection. That closes the unknown-name path cleanly.

But after that gate passes, tool_response_to_host still calls register_provider_tool_call in a sequential loop (model_gateway.rs:800-814), and register_provider_tool_call (capability_port.rs:566-623) has several failure modes that fire after prior calls in the batch have already staged input via input_resolver.register_provider_tool_call_input(...):

validate_provider_tool_call (capability_port.rs:571) — turn-id/call-id/name length, 16 KiB arguments cap, JSON nesting depth, NUL/control chars, sensitive-marker rejection, sk- token rejection. Any of these only catch the offending call.
current_snapshot() returning None between iterations (StaleSurface).
provider_names/capabilities map miss (capability_port.rs:584-594).
provider_schema_is_usable (capability_port.rs:596).
input_resolver.register_provider_tool_call_input itself returning an error on a later call.

Scenario: a model emits two correctly-named tool calls, the second of which carries a 20 KiB arguments blob. Upfront name check passes. Call 1 validates and stages an input ref via the input resolver. Call 2 fails validate_provider_arguments. The whole response is rejected, but call 1's input ref is orphaned in the resolver. A retry can stage duplicates.

Either prevalidate the whole batch (run validate_provider_tool_call plus the visibility/schema checks on each tool_call in a first pass before any side effects), or move registration behind a batch API that can roll back / dedupe across the full provider turn. The current test only exercises the upfront name-check path; a partial-failure test against arguments-size / sensitive-content / unusable-schema would have caught this and should land with the fix.

Smaller observations (non-blocking)

validate_provider_replay_identity (model_gateway.rs:984-1003) now runs provider_call.validate() first — good. It only matches provider_id / provider_model_id, not turn/call scope, which is the right call (turn/call IDs are provider-emitted and have no host-side expected value), but worth pinning in a doc-comment so a future reader doesn't try to "tighten" it into something that breaks replay.
CapabilitySurfaceVisibleFilter::register_provider_tool_call (capability_surface_filter.rs:56-85) does a TOCTOU-style check: read tool_definitions() to map name→capability, then call inner.register_provider_tool_call. The post-call permits check on candidate.capability_id (line 78) catches the visibility-change race, but the inner registration's side effects (input-resolver staging) still happen — same orphan shape as B above, just one layer up. Easier to address once B has a batch-validation primitive.
The previously-flagged "synthetic ref written but state.result_refs not updated" test gap could be closed with a single new test asserting LoopExit::Completed { result_refs, .. } after a mixed-success provider batch.

Verification

I read the diff and the files at PR head faa93b84; I did not re-run the test suite locally. CI is still pending on this head (status pending, no checks reported yet).

Verdict

Findings A (history fallback) and B (per-iteration filter) from the previous round are cleanly resolved. The most recent serrrfirat findings are not fully resolved by faa93b84 despite the "all findings addressed" summary comment:

Block on A (loop-state result_refs missing synthetic provider error refs — correctness regression for loop accounting on every provider-backed non-success outcome).
Block on B (atomicity gap remains for argument-size / sensitive-content / unusable-schema failures after the first registered call, even with the new upfront name check).

Both are tractable; B in particular is largely a prevalidate-then-stage refactor of the existing per-call validation in register_provider_tool_call.

Generated by Claude Code

henrypark133 · 2026-05-18T19:48:37Z

Addressed the open review feedback from #3722 (review) in 421a939a5.

What changed:

synthetic provider error refs now update loop-state result_refs at the same time they append transcript refs, so TurnSummary, checkpoints, and final LoopExit accounting include provider-backed non-success outcomes
provider tool-call batches now prevalidate every provider call through the capability port/filter stack before any registration or input staging, closing the partial-registration orphan case
duplicate append_tool_result_reference retries now return existing finalized tool-result refs before validating incoming provider metadata in both in-memory and durable stores
provider replay conversion now keeps same-turn provider tool results grouped even when a plain tool-result reference is interleaved

Regression coverage added for:

denied provider call final/checkpointed result_refs
invalid second provider tool call with no prior registration
in-memory and libSQL idempotent retry with invalid duplicate provider metadata
interleaved plain/provider tool-result replay grouping

Local validation:

cargo fmt --check
git diff --check
cargo test -p ironclaw_agent_loop
cargo test -p ironclaw_loop_support
cargo test -p ironclaw_reborn --test loop_driver_host
cargo test -p ironclaw_threads
cargo test -p ironclaw_threads --features libsql --test db_session_thread_contract libsql_persists_thread_history_and_context_across_reopen
cargo test -p ironclaw_reborn --features root-llm-provider --test llm_gateway
cargo clippy -p ironclaw_turns -p ironclaw_agent_loop -p ironclaw_loop_support -p ironclaw_reborn --tests -- -D warnings
cargo clippy -p ironclaw_threads --features libsql --tests -- -D warnings
cargo clippy -p ironclaw_reborn --features root-llm-provider --test llm_gateway -- -D warnings

GitHub checks have started on the new head and are still in progress.

serrrfirat

Multi-agent review of nearai/ironclaw#3722 at head 421a939a5bfd28bb21b425145fc5e2424e8b8f4a.

I found 4 medium-severity issues that should be addressed before merge:

crates/ironclaw_agent_loop/src/executor.rs:273 - Filtered-out capabilities can still be materialized into the model prompt.

The executor applies the planner capability filter to surface and passes the filtered IDs only as capability_view for provider tool definitions, but the prompt request still carries only the unchanged surface.version. Prompt construction can resolve that version through the host surface lookup and materialize the pre-strategy host/profile surface, so the model may receive names/descriptions for capabilities removed from this iteration. Bind prompt construction to the filtered surface as well, either by carrying filtered capability IDs/surface in the prompt request or by making the prompt authority resolve this request against the filtered descriptor set. Add a regression that asserts both the materialized prompt and provider tool definitions exclude a denied capability.
crates/ironclaw_loop_support/src/capability_port.rs:694 - Capability surface snapshot locks are acquired in opposite orders.

current_snapshot() locks current_surface_version and then snapshots, while visible_capabilities() locks snapshots and then current_surface_version. Concurrent model/tool surface access can deadlock the shared port. Put the current version and snapshots behind a single mutex, or enforce one lock acquisition order everywhere.
crates/ironclaw_loop_support/src/capability_port.rs:863 - Provider tool names are not constrained to the LLM provider contract.

provider_tool_name() only replaces dots and otherwise exposes the capability ID as the provider tool name when it is <= 256 bytes. The inbound validator even allows ., :, _, and -, but common provider tool/function name contracts are stricter than this. A valid IronClaw capability ID can therefore produce a tool definition that a provider rejects before the model sees the visible tool surface. Normalize provider tool names at this boundary to the strict shared provider-safe contract and use a digest suffix for truncation/collisions; validate inbound provider tool names against the same advertised-name contract.
crates/ironclaw_threads/src/db.rs:769 - Idempotent tool-result replay can permanently drop provider replay metadata.

append_tool_result_reference() returns an existing finalized record based only on turn_run_id and tool_result_ref before validating or comparing the new provider_call. If an older/fallback write created that record with tool_result_provider_call: None, a retry with provider metadata now available will return the stale record and never backfill the side channel. Backfill None -> Some(provider_call) on idempotent replay, and fail loudly if both sides are Some but differ. Mirror the same behavior in the in-memory store.

henrypark133 · 2026-05-18T20:29:20Z

Addressed the four latest serrrfirat review findings in 3f78ca9be:

Prompt construction now receives the same filtered capability_view as provider tool definitions, so dynamic capability filters hide tools from both the materialized prompt and provider tool surface.
HostRuntimeLoopCapabilityPort::current_snapshot now uses the same lock order as visible_capabilities to avoid the snapshot/current-version deadlock.
Provider tool names are normalized and validated against a strict provider-safe contract ([A-Za-z0-9_-], 64 bytes max) with digest suffixes for truncation/collisions; inbound provider tool calls and stored provider refs use the same contract.
Idempotent tool-result append now backfills None -> Some(provider_call) on retry and rejects conflicting provider metadata in both durable and in-memory stores.

Validation run:

cargo fmt --check
git diff --check
cargo test -p ironclaw_turns -p ironclaw_agent_loop -p ironclaw_loop_support -p ironclaw_threads
cargo test -p ironclaw_threads --features libsql --test db_session_thread_contract libsql_persists_thread_history_and_context_across_reopen
cargo clippy -p ironclaw_turns -p ironclaw_agent_loop -p ironclaw_loop_support -p ironclaw_threads --tests -- -D warnings

serrrfirat

Re-review against current head 3f78ca9b, focused on the follow-up commits after the prior faa93b84 review and the latest automated review at 421a939a.

The shape is right: the follow-up now threads the per-iteration capability_view into both prompt construction and provider tool definitions, pre-validates provider tool batches before registration, constrains provider tool names to the stricter provider-safe contract, fixes the capability-port lock ordering, and backfills/rejects idempotent provider replay metadata consistently across in-memory and DB thread stores. The targeted test additions cover the intended invariants.

One thing still blocks merge:

The current head does not compile under the PR checks

Both failing CI jobs are stopped by the same Rust compile error: LoopPromptBundleRequest gained required field capability_view, but crates/ironclaw_reborn/tests/loop_driver_host.rs:2750 still initializes it without that field in the inline-message rejection test.

This is caller-visible in the quality gate rather than a runtime design issue, but it blocks all downstream verification: Clippy (all-features) and Reborn CLI smoke tests both fail with E0063: missing field capability_view in initializer of LoopPromptBundleRequest at loop_driver_host.rs:2758 in the current CI run.

Please add capability_view: None to that initializer and rerun the affected checks. A good minimal verification is the failing loop_driver_host test target plus the existing provider-tool replay tests.

Verification I checked:

GitHub CI logs for run 26058596870: Clippy (all-features) and Reborn CLI smoke tests both fail on the missing field above.
Local focused check passed: cargo test -p ironclaw_reborn --features root-llm-provider --test llm_gateway gateway_rejects_invalid_provider_tool_batch_before_any_registration -- --exact.

Net: prior provider replay correctness issues look addressed on current head, but this cannot merge until the test compile failure is fixed.

henrypark133 · 2026-05-18T21:05:15Z

Fixed the compile blocker from review 4313712547 in 8c9511603.

Change: added the missing capability_view: None to the inline-message LoopPromptBundleRequest initializer in crates/ironclaw_reborn/tests/loop_driver_host.rs.

Validation run from an isolated worktree:

cargo fmt --check
git diff --check
cargo test -p ironclaw_reborn --test loop_driver_host text_only_host_prompt_rejects_inline_messages -- --exact
cargo test -p ironclaw_reborn --features root-llm-provider --test llm_gateway gateway_rejects_invalid_provider_tool_batch_before_any_registration -- --exact

serrrfirat

Multi-agent review of nearai/ironclaw#3722 at head 3f78ca9bee4a640b25ece2babcc2e3d0ba2fe144 with --force.

Reviewers completed: security, bugs, tests, conventions. The performance/concurrency reviewer failed to inspect the checkout due to a subprocess launch error, so I excluded it rather than treating it as clean.

Findings kept after validation:

High - provider-backed tool calls cannot stage their input on the live HostRuntime capability path.
High - current head fails CI because one LoopPromptBundleRequest initializer still omits capability_view.
Medium - adjacent provider tool results from different turns need a regression test.
Medium - FinishReason::ToolUse with an empty tool-call list needs a gateway regression test.
Medium - run-profile provider tool advertisement filtering needs a regression test.
Medium - provider replay identifiers remain raw Strings across internal module boundaries.

I dropped one conventions finding as a false positive: durable thread persistence does retain tool_result_provider_call through to_message_json, and the DB contract tests assert reload behavior.

serrrfirat · 2026-05-18T21:06:19Z

+        let prepared = self.prepare_provider_tool_call(&tool_call)?;
+        let input_ref = self
+            .input_resolver
+            .register_provider_tool_call_input(&self.run_context, &tool_call)


HostRuntimeLoopCapabilityPort::register_provider_tool_call now calls register_provider_tool_call_input, but LoopCapabilityInputResolver only provides a default implementation that returns InvalidInvocation, and the concrete resolver in this checkout does not override it. The gateway can advertise provider tools and accept a provider tool call, but registration fails before producing a CapabilityCallCandidate, so the live provider-tool-call path cannot execute. Please implement provider tool-call argument staging for the resolver wired into HostRuntimeLoopCapabilityPortFactory, or make this trait method required and provide the concrete store-backed implementation.

serrrfirat · 2026-05-18T21:06:19Z

+                        continue;
+                    };
+                    validate_provider_replay_identity(&next_provider_call, replay_identity)?;
+                    if next_provider_call.provider_turn_id != provider_turn_id {


The replay converter breaks the current provider-tool group when the next tool result has a different provider_turn_id, but the added gateway tests cover same-turn grouping and plain-result interleaving only. Please add a regression such as gateway_separates_consecutive_provider_tool_results_from_different_turns with adjacent ToolResult messages for turn_1 then turn_2, so a future change cannot merge turns or skip the second replay.

serrrfirat · 2026-05-18T21:06:19Z

+            HostManagedModelErrorKind::PolicyDenied,
+            "model response was blocked by provider policy",
+        )),
+        FinishReason::ToolUse => Err(HostManagedModelError::safe(


This new error branch for FinishReason::ToolUse with no tool calls is not covered by the current gateway tests. Please add a regression such as gateway_with_tool_surface_rejects_tool_use_finish_without_tool_calls where complete_with_tools returns ToolUse and an empty tool_calls list.

serrrfirat · 2026-05-18T21:06:19Z

+
 #[async_trait]
 impl LoopCapabilityPort for CapabilitySurfaceProfileFilter {
+    fn tool_definitions(&self) -> Result<Vec<ProviderToolDefinition>, AgentLoopHostError> {


CapabilitySurfaceProfileFilter::tool_definitions now filters provider tool advertisements by the run-profile allow set, but the adjacent tests only verify denied registration. Please add a regression such as profile_filter_tool_definitions_excludes_denied_capabilities that asserts the provider tool definitions expose only allowlisted capabilities.

serrrfirat · 2026-05-18T21:06:19Z

+#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
+pub struct ProviderToolCallReplay {
+    /// Provider identity selected by the host route.
+    pub provider_id: String,


Provider replay metadata is now carried across turns, loop support, Reborn, and threads as raw String identifiers (provider_id, provider_model_id, turn/call IDs, and provider tool names). That conflicts with .claude/rules/types.md, which requires boundary strings to be converted into specialized domain types before flowing between internal modules. Please introduce validated newtypes for these provider replay identifiers in the owning contract layer and use them through the replay structs.

github-actions Bot added size: XL 500+ changed lines risk: low Changes to docs, tests, or low-risk modules contributor: core 20+ merged PRs labels May 17, 2026

gemini-code-assist Bot reviewed May 17, 2026

View reviewed changes

Comment thread crates/ironclaw_loop_support/src/capability_port.rs Outdated

Comment thread crates/ironclaw_loop_support/src/capability_port.rs Outdated

Comment thread crates/ironclaw_loop_support/src/capability_port.rs Outdated

Comment thread crates/ironclaw_reborn/src/model_gateway.rs Outdated

henrypark133 requested a review from Copilot May 17, 2026 01:18

Copilot started reviewing on behalf of henrypark133 May 17, 2026 01:19 View session

chatgpt-codex-connector Bot reviewed May 17, 2026

View reviewed changes

Comment thread crates/ironclaw_loop_support/src/capability_port.rs Outdated

Comment thread crates/ironclaw_reborn/src/model_gateway.rs

Copilot AI reviewed May 17, 2026

View reviewed changes

henrypark133 force-pushed the reborn/3620-provider-tool-calls branch from 127b424 to 3f4e21c Compare May 17, 2026 02:02

zmanian reviewed May 17, 2026

View reviewed changes

henrypark133 force-pushed the reborn/3620-provider-tool-calls branch from c6c6925 to be5888a Compare May 17, 2026 05:36

serrrfirat requested changes May 17, 2026

View reviewed changes

zmanian reviewed May 17, 2026

View reviewed changes

github-actions Bot mentioned this pull request May 18, 2026

🦞 OpenClaw 生态日报 2026-05-18 gsscsd/big_model_radar#362

Open

henrypark133 added 5 commits May 17, 2026 23:41

fix(reborn): preserve provider tool roundtrip metadata

1004638

fix(reborn): harden provider tool replay

5d491d4

fix(reborn): tighten provider replay identity

55cb5b6

fix(reborn): preserve replay metadata in context fallback

caaf6bf

fix(reborn): constrain provider tools to visible view

2eea296

henrypark133 force-pushed the reborn/3620-provider-tool-calls branch from 464aa31 to 2eea296 Compare May 18, 2026 07:14

henrypark133 requested review from serrrfirat and zmanian May 18, 2026 07:34

This was referenced May 18, 2026

Reborn: revise and implement binary-E2E test framework plan #3702

Open

Reborn: convert provider tool calls into ParentLoopOutput::CapabilityCalls #3620

Open

serrrfirat mentioned this pull request May 18, 2026

Reborn: verify internal tool-result completion evidence and result refs #3622

Open

Base automatically changed from reborn/3622-tool-result-evidence to reborn-integration May 18, 2026 14:29

serrrfirat added a commit that referenced this pull request May 18, 2026

fix(reborn): address provider replay review findings (#3722)

764f5b9

fix(reborn): address provider replay review findings (#3722)

faa93b8

serrrfirat force-pushed the reborn/3620-provider-tool-calls branch from 764f5b9 to faa93b8 Compare May 18, 2026 15:29

henrypark133 commented May 18, 2026

View reviewed changes

zmanian requested changes May 18, 2026

View reviewed changes

fix(reborn): address provider replay review feedback

421a939

serrrfirat requested changes May 18, 2026

View reviewed changes

fix(reborn): address provider tool review follow-up

3f78ca9

henrypark133 requested a review from serrrfirat May 18, 2026 20:30

serrrfirat requested changes May 18, 2026

View reviewed changes

fix(reborn): add missing prompt capability view

8c95116

serrrfirat requested changes May 18, 2026

View reviewed changes

Conversation

henrypark133 commented May 17, 2026

Summary

Verification

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

henrypark133 commented May 17, 2026

Uh oh!

zmanian left a comment

Choose a reason for hiding this comment

Metadata isolation — looks correct

Multi-tool grouping — partial-failure replay is wrong

Cross-provider replay — no identity check

Schema gate vs invocation surface drift

Stacking on #3720

Smaller items

Uh oh!

henrypark133 commented May 17, 2026

Uh oh!

henrypark133 commented May 17, 2026

Uh oh!

serrrfirat left a comment

Choose a reason for hiding this comment

Uh oh!

zmanian left a comment

Choose a reason for hiding this comment

Prior findings — status

New findings (still blocking)

Validation

Verdict

Uh oh!

henrypark133 commented May 18, 2026

Uh oh!

serrrfirat commented May 18, 2026

Uh oh!

henrypark133 commented May 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Code Review (multi-agent)

Uh oh!

henrypark133 commented May 18, 2026

Review findings summary

Already resolved by commits on this branch

Duplicates (same issue, already resolved)

Suppressed (low-confidence, no action needed)

Uh oh!

henrypark133 left a comment

Choose a reason for hiding this comment

Uh oh!

zmanian left a comment

Choose a reason for hiding this comment

Prior findings — status check

Still blocking

A. Synthetic provider error refs are appended to the transcript but never pushed into state.result_refs

B. Multi-tool provider registration is still not atomic for non-name failure modes

henrypark133 commented May 18, 2026 •

edited

Loading

A. Synthetic provider error refs are appended to the transcript but never pushed into `state.result_refs`