fix: harden openai-compatible provider, approval replay, and embeddings defaults#237

Merged
ilblackdragon merged 10 commits into main from takeover/112-approval-replay-embeddings
Feb 19, 2026
Conversation

ilblackdragon (Member) commented Feb 19, 2026

Summary

Continuation of #112 by @panosAthDBX.

Hardens the OpenAI-compatible chat provider, adds multi-tool approval replay with deferred tool calls, wires Ollama embeddings, and fixes several robustness issues across LLM providers.

Changes included (from original PR)

  • Deferred tool call replay: When a multi-tool LLM response requires approval for one tool, remaining tools are queued and replayed after approval (dispatcher.rs, thread_ops.rs, session.rs)
  • Tool message sanitization (sanitize_tool_messages): Rewrites orphaned tool_result messages as user messages to prevent HTTP 400 from Anthropic/others. Applied consistently across all providers
  • Ollama embeddings: New OllamaEmbeddings provider with configurable model/dimension
  • Flexible embedding dimensions: PostgreSQL migration V9 removes fixed 1536-dim constraint
  • Approval aliases: /approve, /always, /deny, a, n, etc. in the submission parser
  • NearAI chat provider: Configurable tool-message flattening toggle, smarter URL construction
  • Config additions: OPENAI_BASE_URL, ANTHROPIC_BASE_URL for proxy support; OLLAMA_BASE_URL for embeddings
  • Sandbox default: Changed default image from ghcr.io/nearai/sandbox:latest to ironclaw-worker:latest
  • Web gateway: thread_id on ApprovalNeeded SSE events for thread-scoped filtering
  • REPL: Forward /quit as a message so the agent loop exits when other channels are active
  • Gateway: Wire with_llm_provider for OpenAI-compatible API proxy
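The tool-message sanitization above can be sketched as follows. This is an illustrative model, not the crate's code: the real sanitize_tool_messages in src/llm/provider.rs operates on the project's ChatMessage type, and the Msg enum and the "[tool result]" prefix here are assumptions.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, PartialEq)]
enum Msg {
    User(String),
    Assistant { text: String, tool_call_ids: Vec<String> },
    ToolResult { call_id: String, content: String },
}

/// Rewrite any ToolResult whose call_id was not issued by a preceding
/// assistant tool call into a plain User message, so strict providers
/// (e.g. Anthropic) do not reject the history with HTTP 400.
fn sanitize_tool_messages(messages: Vec<Msg>) -> Vec<Msg> {
    let mut seen_ids: HashSet<String> = HashSet::new();
    messages
        .into_iter()
        .map(|m| match m {
            Msg::Assistant { ref tool_call_ids, .. } => {
                seen_ids.extend(tool_call_ids.iter().cloned());
                m
            }
            Msg::ToolResult { ref call_id, ref content } if !seen_ids.contains(call_id) => {
                // Orphaned: no matching tool call appeared earlier in the history.
                Msg::User(format!("[tool result] {}", content))
            }
            other => other,
        })
        .collect()
}
```

Paired tool results pass through untouched; only orphans are rewritten.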

Composable RetryProvider & LLM module hardening

  • RetryProvider decorator (retry.rs): Composable wrapper for any LlmProvider with exponential backoff + jitter. Respects RateLimited { retry_after } hints. Wired into main.rs composition chain so each provider retries independently before failover.
  • Removed openai_compatible_chat.rs: Replaced by rig adapter + RetryProvider. The rig-based adapter now handles all OpenAI-compatible endpoints (including custom base URLs) with tool name normalization moved to rig_adapter.rs.
  • Removed internal retry loops: Both nearai.rs and nearai_chat.rs no longer have internal retry loops — retries are handled by the external RetryProvider wrapper, eliminating double-retry (was up to 16 attempts, now correctly 4).
  • Error classification reconciled: is_retryable() (retry/failover) and is_transient() (circuit breaker) now have clear, documented semantics. ModelNotAvailable is no longer retryable; Json no longer trips the circuit breaker.
  • Fixed Duration subtraction panic in circuit_breaker.rs — uses checked_sub to avoid panic on TOCTOU race.
  • Failover uses shared is_retryable() from retry.rs instead of duplicating classification logic.
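The backoff policy and the checked_sub fix above can be sketched like this. Function names and constants are illustrative, not the actual retry.rs/circuit_breaker.rs code:

```rust
use std::time::Duration;

/// Exponential backoff with jitter, in the spirit of RetryProvider:
/// a server-supplied `retry_after` hint (from RateLimited errors)
/// overrides the computed schedule.
fn retry_delay(attempt: u32, retry_after: Option<Duration>) -> Duration {
    if let Some(hint) = retry_after {
        return hint; // respect the provider's rate-limit hint
    }
    let base_ms = 500u64.saturating_mul(1u64 << attempt.min(6)); // 500ms, 1s, 2s, ...
    // Deterministic pseudo-jitter for the sketch; real code would randomize.
    let jitter_ms = (u64::from(attempt) * 137) % (base_ms / 2 + 1);
    Duration::from_millis(base_ms + jitter_ms)
}

/// Mirrors the circuit_breaker.rs fix: subtracting Durations can panic
/// on a TOCTOU race, so use checked_sub and treat underflow as "ready".
fn remaining_cooldown(elapsed: Duration, cooldown: Duration) -> Duration {
    cooldown.checked_sub(elapsed).unwrap_or(Duration::ZERO)
}
```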

Changes from original

  • Merged with latest main (resolved 4 conflicts: CHANGELOG.md, config.rs→config/, agent_loop.rs→dispatcher+thread_ops, llm/mod.rs)
  • Ported deferred tool call logic from monolithic agent_loop.rs to new split module architecture (dispatcher.rs + thread_ops.rs)
  • Ported config changes from deleted src/config.rs to new modular config/ directory (llm.rs, embeddings.rs, sandbox.rs)
  • Fixed ChatCompletionResponse.id to Option<String> (reviewer feedback: some providers omit this field)
  • Fixed unwrap_or_else(|_| Client::new()) to properly propagate errors (reviewer feedback: silently dropped timeout config)
  • Added EMBEDDING_DIMENSION env var with smart per-model defaults instead of hardcoding 768/1536
  • Full /review-crate @src/llm/ audit — fixed all Medium findings

Original PR

#112 - fix: harden openai-compatible provider, approval replay, and embeddings defaults

Review comments addressed

  • ChatCompletionResponse.id changed from String to Option<String> with #[serde(default)]
  • Ollama embedding dimension no longer hardcoded; uses EMBEDDING_DIMENSION env var or infers from model name
  • unwrap_or_else(|_| Client::new()) replaced with proper error propagation
  • libSQL F32_BLOB(1536) schema noted as known limitation
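The dimension resolution described above can be modeled as a per-model default table with an explicit override. The model table below is an assumption (only text-embedding-3-large's 3072 and the 768/1536 defaults come from the PR text); the real logic lives in src/config/embeddings.rs.

```rust
/// Per-model default embedding dimensions (illustrative table).
fn default_dimension_for_model(model: &str) -> usize {
    match model {
        "text-embedding-3-large" => 3072,
        "nomic-embed-text" => 768, // a common Ollama embedding model (assumed)
        _ => 1536,                 // e.g. text-embedding-3-small
    }
}

/// `env_override` stands in for std::env::var("EMBEDDING_DIMENSION").ok():
/// an explicit dimension wins; otherwise infer from the model name.
fn resolve_dimension(model: &str, env_override: Option<&str>) -> usize {
    env_override
        .and_then(|v| v.parse().ok())
        .unwrap_or_else(|| default_dimension_for_model(model))
}
```

In practice the override would be read once at config-resolution time via `std::env::var("EMBEDDING_DIMENSION").ok()`.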

Test plan

  • All 1265 tests pass (1211 lib + 54 integration/doc)
  • cargo clippy --all --all-features clean (zero warnings)
  • cargo fmt clean
  • New tests: RetryProvider (6 tests), approval aliases, tool name normalization, URL construction, usage parsing, tool message sanitization, error classification

Co-Authored-By: panosAthDBX <panosAthDBX@users.noreply.github.com>

Generated with Claude Code

panosAthDBX and others added 5 commits February 16, 2026 17:22
…ddings

# Conflicts:
#	CHANGELOG.md
#	src/agent/agent_loop.rs
#	src/config.rs
#	src/llm/mod.rs
- Make ChatCompletionResponse.id Option<String> to handle providers
  that omit or null the field
- Propagate HTTP client builder errors instead of silently dropping
  timeout configuration (openai_compatible_chat, nearai_chat)
- Add EMBEDDING_DIMENSION env var with smart per-model defaults instead
  of hardcoding 768/1536 everywhere
- Remove duplicated dimension inference logic from main.rs

Co-Authored-By: panosAthDBX <panosAthDBX@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
gemini-code-assist (Contributor) commented

Summary of Changes

Hello @ilblackdragon, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the robustness and flexibility of the LLM integration by hardening the OpenAI-compatible chat provider, introducing a sophisticated multi-tool approval replay mechanism, and expanding embedding capabilities to include Ollama and flexible dimensions. These changes improve compatibility with various LLM services, streamline complex agentic workflows, and provide greater control over embedding configurations, ultimately leading to a more reliable and adaptable system.

Highlights

  • Enhanced OpenAI-Compatible Provider: Introduced a new OpenAiCompatibleChatProvider with robust retry logic, tool-call name normalization, and improved usage parsing, allowing for flexible base URL configurations for OpenAI and Anthropic.
  • Multi-Tool Approval Replay: Implemented deferred tool call logic, enabling the system to queue and replay subsequent tool calls after a preceding tool call requires and receives approval, ensuring all tool_use IDs get matching tool_result messages.
  • Flexible Embeddings and Ollama Support: Added support for Ollama embeddings with configurable models and dimensions, and introduced a database migration to allow flexible embedding vector dimensions in PostgreSQL, removing the fixed 1536-dim constraint.
  • Improved Tool Message Sanitization: Implemented a sanitize_tool_messages function across all LLM providers to rewrite orphaned tool_result messages as user messages, preventing HTTP 400 errors from providers like Anthropic.
  • Expanded Approval Aliases: Added more intuitive aliases for approval responses (e.g., /approve, /always, /deny) in the submission parser for a smoother user experience.
Changelog
  • CHANGELOG.md
    • Added OpenAiCompatibleChatProvider and wired OpenAI-compatible chat completion routing for custom base URL usage.
    • Added Ollama embeddings provider support (EMBEDDING_PROVIDER=ollama, OLLAMA_BASE_URL) in workspace embeddings.
    • Added migration V9__flexible_embedding_dimension.sql for flexible embedding vector dimensions.
    • Changed default sandbox image to ironclaw-worker:latest in config/settings/sandbox defaults.
    • Improved tool-message sanitization and provider compatibility handling across NEAR AI, rig adapter, and shared LLM provider code.
    • Fixed approval-input aliases (a, /approve, /always, /deny, etc.) in submission parsing.
    • Fixed multi-tool approval resume flow by preserving and replaying deferred tool calls so all prior tool_use IDs receive matching tool_result messages.
    • Fixed REPL quit/exit handling to route shutdown through the agent loop for graceful termination.
  • migrations/V9__flexible_embedding_dimension.sql
    • Added a new migration to allow embedding vectors of any dimension by altering the memory_chunks.embedding column type to vector and recreating dependent views.
  • src/agent/dispatcher.rs
    • Modified the tool call execution loop to store and process deferred tool calls when approval is needed for a multi-tool response.
  • src/agent/session.rs
    • Extended the PendingApproval struct to include deferred_tool_calls for replaying queued tool actions.
  • src/agent/submission.rs
    • Expanded the set of recognized aliases for approval responses, including slash commands like /approve and /deny.
  • src/agent/thread_ops.rs
    • Implemented logic to handle and replay deferred tool calls after an approval response, including re-checking approval for each deferred call.
  • src/channels/repl.rs
    • Updated the /quit and /exit commands to forward a shutdown message to the agent loop, ensuring graceful termination.
  • src/channels/web/mod.rs
    • Added thread_id to ApprovalNeeded Server-Sent Events (SSE) to enable thread-scoped filtering in the web gateway.
  • src/channels/web/static/app.js
    • Added a check to filter approval_needed events based on the current thread ID, preventing irrelevant approval prompts.
  • src/channels/web/types.rs
    • Added an optional thread_id field to the SseEvent::ApprovalNeeded enum for better event context.
  • src/config/embeddings.rs
    • Added ollama_base_url and dimension fields to EmbeddingsConfig and implemented a default_dimension_for_model function to infer embedding dimensions.
    • Updated the configuration resolution logic to support OLLAMA_BASE_URL and EMBEDDING_DIMENSION environment variables.
  • src/config/llm.rs
    • Added an optional base_url field to OpenAiDirectConfig and AnthropicDirectConfig to support custom proxy endpoints.
    • Updated the LLM configuration resolution to read OPENAI_BASE_URL and ANTHROPIC_BASE_URL.
  • src/config/sandbox.rs
    • Updated the default sandbox image to ironclaw-worker:latest.
  • src/llm/mod.rs
    • Added and exported the openai_compatible_chat module and OpenAiCompatibleChatProvider.
    • Modified create_openai_provider to use the new OpenAiCompatibleChatProvider when a custom OPENAI_BASE_URL is specified.
    • Updated create_anthropic_provider to utilize the configured base_url for Anthropic clients.
    • Refactored create_openai_compatible_provider to instantiate the new OpenAiCompatibleChatProvider.
  • src/llm/nearai.rs
    • Applied sanitize_tool_messages to incoming messages before processing them in the NearAiProvider.
  • src/llm/nearai_chat.rs
    • Added flatten_tool_messages configuration and a new_with_flatten constructor to control tool message handling.
    • Improved the api_url construction logic for better compatibility with various NEAR AI endpoints.
    • Applied sanitize_tool_messages to incoming messages and conditionally flattened tool messages based on configuration.
    • Added unit tests for api_url construction scenarios.
  • src/llm/openai_compatible_chat.rs
    • Added a new module implementing OpenAiCompatibleChatProvider for robust interaction with OpenAI-compatible chat completion endpoints, including retry logic, usage parsing, and tool-call name normalization.
  • src/llm/provider.rs
    • Added a sanitize_tool_messages utility function to convert orphaned tool_result messages into user messages, preventing API errors.
    • Added unit tests for the sanitize_tool_messages function.
  • src/llm/rig_adapter.rs
    • Applied sanitize_tool_messages to messages before converting them for the RigAdapter.
  • src/main.rs
    • Imported OllamaEmbeddings and updated the main application logic to initialize Ollama as an embedding provider.
    • Modified embedding provider initialization to use the newly introduced flexible embedding dimensions.
    • Wired the GatewayChannel with the LLM provider for enhanced functionality.
  • src/sandbox/config.rs
    • Updated the default sandbox image to ironclaw-worker:latest.
  • src/settings.rs
    • Updated the default sandbox image setting to ironclaw-worker:latest.
  • src/workspace/embeddings.rs
    • Added the OllamaEmbeddings struct and its implementation as an EmbeddingProvider, supporting Ollama's local embedding API.
    • Included methods for creating OllamaEmbeddings instances with specific models and dimensions.
  • src/workspace/mod.rs
    • Exported OllamaEmbeddings from the workspace module.
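The thread-scoped approval filtering summarized above can be modeled like this. The struct shape and field names are assumptions drawn from the summary; the actual client-side filter lives in src/channels/web/static/app.js.

```rust
/// Hypothetical shape of the ApprovalNeeded SSE payload after this PR:
/// thread_id is optional so pre-existing events still parse.
struct ApprovalNeeded {
    request_id: String,
    tool_name: String,
    thread_id: Option<String>,
}

/// Show the prompt only when it targets the thread currently in view;
/// events without a thread_id keep the old show-everywhere behavior.
fn should_display(ev: &ApprovalNeeded, current_thread: Option<&str>) -> bool {
    match (ev.thread_id.as_deref(), current_thread) {
        (Some(t), Some(cur)) => t == cur,
        _ => true,
    }
}
```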
Activity
  • The pull request author, ilblackdragon, continued work from a previous PR (fix: harden openai-compatible provider, approval replay, and embeddings defaults #112) by panosAthDBX.
  • The changes were merged with the latest main branch, resolving 4 conflicts in CHANGELOG.md, config.rs, agent_loop.rs, and llm/mod.rs.
  • Deferred tool call logic was ported from a monolithic agent_loop.rs to the new dispatcher.rs and thread_ops.rs modules.
  • Configuration changes were moved from src/config.rs to the new modular config/ directory (llm.rs, embeddings.rs, sandbox.rs).
  • Reviewer feedback was addressed, including changing ChatCompletionResponse.id to Option<String> and fixing error propagation in unwrap_or_else calls.
  • The EMBEDDING_DIMENSION environment variable was introduced with smart per-model defaults, addressing feedback about hardcoded Ollama dimensions.
  • Test compilation issues were fixed by creating a test_nearai_config() helper for new NearAiConfig fields.

gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a substantial set of features and robustness improvements across the LLM provider landscape, including the addition of the OpenAiCompatibleChatProvider, support for Ollama embeddings, flexible embedding dimensions, and deferred tool call replay for multi-tool approvals. However, a high-severity security vulnerability was identified in the web UI's HTML sanitization logic. The custom regular expression-based sanitization is bypassable, which could lead to Cross-Site Scripting (XSS) and the theft of sensitive authentication tokens. It is strongly recommended to use a dedicated sanitization library like DOMPurify to mitigate this risk. Additionally, there are suggestions to improve maintainability by refactoring duplicated code and to enhance error handling for more robust provider interactions.

Comment thread src/agent/thread_ops.rs
Comment on lines +801 to +957
let mut deferred_queue = std::collections::VecDeque::from(deferred_tool_calls);
while let Some(tc) = deferred_queue.pop_front() {
    // Re-check approval for each deferred tool call
    if let Some(tool) = self.tools().get(&tc.name).await
        && tool.requires_approval()
    {
        let is_auto_approved = {
            let sess = session.lock().await;
            let mut approved = sess.is_tool_auto_approved(&tc.name);
            if approved && tool.requires_approval_for(&tc.arguments) {
                approved = false;
            }
            approved
        };

        if !is_auto_approved {
            let new_pending = PendingApproval {
                request_id: Uuid::new_v4(),
                tool_name: tc.name.clone(),
                parameters: tc.arguments.clone(),
                description: tool.description().to_string(),
                tool_call_id: tc.id.clone(),
                context_messages: context_messages.clone(),
                deferred_tool_calls: deferred_queue.iter().cloned().collect(),
            };

            let request_id = new_pending.request_id;
            let tool_name = new_pending.tool_name.clone();
            let description = new_pending.description.clone();
            let parameters = new_pending.parameters.clone();

            {
                let mut sess = session.lock().await;
                if let Some(thread) = sess.threads.get_mut(&thread_id) {
                    thread.await_approval(new_pending);
                }
            }

            let _ = self
                .channels
                .send_status(
                    &message.channel,
                    StatusUpdate::Status("Awaiting approval".into()),
                    &message.metadata,
                )
                .await;

            return Ok(SubmissionResult::NeedApproval {
                request_id,
                tool_name,
                description,
                parameters,
            });
        }
    }

    let _ = self
        .channels
        .send_status(
            &message.channel,
            StatusUpdate::ToolStarted {
                name: tc.name.clone(),
            },
            &message.metadata,
        )
        .await;

    let deferred_result = self
        .execute_chat_tool(&tc.name, &tc.arguments, &job_ctx)
        .await;

    let _ = self
        .channels
        .send_status(
            &message.channel,
            StatusUpdate::ToolCompleted {
                name: tc.name.clone(),
                success: deferred_result.is_ok(),
            },
            &message.metadata,
        )
        .await;

    if let Ok(ref output) = deferred_result
        && !output.is_empty()
    {
        let _ = self
            .channels
            .send_status(
                &message.channel,
                StatusUpdate::ToolResult {
                    name: tc.name.clone(),
                    preview: output.clone(),
                },
                &message.metadata,
            )
            .await;
    }

    // Record in thread
    {
        let mut sess = session.lock().await;
        if let Some(thread) = sess.threads.get_mut(&thread_id)
            && let Some(turn) = thread.last_turn_mut()
        {
            match &deferred_result {
                Ok(output) => turn.record_tool_result(serde_json::json!(output)),
                Err(e) => turn.record_tool_error(e.to_string()),
            }
        }
    }

    // Auth detection for deferred tools
    if let Some((ext_name, instructions)) =
        detect_auth_awaiting(&tc.name, &deferred_result)
    {
        let auth_data = parse_auth_result(&deferred_result);
        {
            let mut sess = session.lock().await;
            if let Some(thread) = sess.threads.get_mut(&thread_id) {
                thread.enter_auth_mode(ext_name.clone());
                thread.complete_turn(&instructions);
            }
        }
        let _ = self
            .channels
            .send_status(
                &message.channel,
                StatusUpdate::AuthRequired {
                    extension_name: ext_name,
                    instructions: Some(instructions.clone()),
                    auth_url: auth_data.auth_url,
                    setup_url: auth_data.setup_url,
                },
                &message.metadata,
            )
            .await;
        return Ok(SubmissionResult::response(instructions));
    }

    let deferred_content = match deferred_result {
        Ok(output) => {
            let sanitized = self.safety().sanitize_tool_output(&tc.name, &output);
            self.safety().wrap_for_llm(
                &tc.name,
                &sanitized.content,
                sanitized.was_modified,
            )
        }
        Err(e) => format!("Error: {}", e),
    };

    context_messages
        .push(ChatMessage::tool_result(&tc.id, &tc.name, deferred_content));
}

gemini-code-assist (Contributor), severity: medium

There is significant code duplication between this block for handling deferred tool calls and the main tool execution loop in dispatcher.rs. Both sections handle iterating through tool calls, checking for approvals, executing tools, and processing results.

This duplication can make maintenance difficult, as changes in one place might need to be manually synchronized in the other. Consider refactoring this shared logic into a helper function that can be called from both dispatcher.rs and here. This would centralize the tool execution flow, improving maintainability and reducing the risk of inconsistencies.

ilblackdragon (Member, Author) replied:

The deferred tool call duplication is a valid observation. However, refactoring this into a shared helper is a larger architectural change that touches the core agent loop — it warrants its own PR. Filed as a follow-up.

Comment thread src/llm/openai_compatible_chat.rs Outdated
};

let status = response.status();
let response_text = response.text().await.unwrap_or_default();
gemini-code-assist (Contributor), severity: medium

The use of unwrap_or_default() here can hide potential errors when reading the response body. If response.text().await fails, this will proceed with an empty string, leading to a potentially misleading "JSON parse error" later on. It would be more robust to handle the Result from .text() and return a more specific error, which would aid in debugging issues with the response stream from the provider.

            let response_text = match response.text().await {
                Ok(text) => text,
                Err(e) => {
                    return Err(LlmError::RequestFailed {
                        provider: "openai_compatible_chat".to_string(),
                        reason: format!("Failed to read response body: {}", e),
                    });
                }
            };
References
  1. Ensuring robust error handling when reading API response bodies helps prevent client-side parsing failures, aligning with the principle that API endpoints should provide valid and consistent responses.

ilblackdragon (Member, Author) replied:

This file (openai_compatible_chat.rs) was deleted in e285759 — replaced by the rig adapter + RetryProvider. The same unwrap_or_default() pattern existed in nearai.rs and nearai_chat.rs; fixed in all 4 call sites in 0c42321 with proper error propagation.

Copilot AI left a comment

Pull request overview

This PR hardens LLM provider robustness and embeddings flexibility while adding support for OpenAI-compatible endpoints and Ollama embeddings. It's a continuation of #112 that addresses reviewer feedback and merges with the latest main branch architecture changes.

Changes:

  • Adds OpenAI-compatible chat provider with retry logic, tool-call normalization, and robust usage parsing
  • Implements multi-tool approval replay with deferred tool calls to prevent orphaned tool_result protocol errors
  • Adds Ollama embeddings provider with configurable dimensions and removes fixed 1536-dim PostgreSQL constraint
  • Improves provider configuration with base URL overrides for proxies (OpenAI, Anthropic)
  • Adds tool message sanitization across all LLM providers to prevent HTTP 400 errors
  • Adds approval command aliases (/approve, /always, /deny, a, n) for better UX
  • Changes default sandbox image from ghcr.io/nearai/sandbox:latest to ironclaw-worker:latest
  • Adds thread_id to approval SSE events for better web gateway filtering
  • Fixes REPL quit forwarding to gracefully exit the agent loop
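The approval aliases listed above can be sketched as a small parser. The enum, function name, and exact alias set are assumptions based on the PR description; the real parsing lives in src/agent/submission.rs.

```rust
/// Illustrative approval-response parser covering the aliases named in
/// this PR (/approve, /always, /deny, a, n); case- and whitespace-insensitive.
#[derive(Debug, PartialEq)]
enum ApprovalResponse {
    Approve,
    AlwaysApprove,
    Deny,
}

fn parse_approval(input: &str) -> Option<ApprovalResponse> {
    match input.trim().to_ascii_lowercase().as_str() {
        "y" | "yes" | "a" | "approve" | "/approve" => Some(ApprovalResponse::Approve),
        "always" | "/always" => Some(ApprovalResponse::AlwaysApprove),
        "n" | "no" | "deny" | "/deny" => Some(ApprovalResponse::Deny),
        _ => None, // not an approval response; treat as a normal message
    }
}
```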

Reviewed changes

Copilot reviewed 24 out of 24 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/llm/openai_compatible_chat.rs New OpenAI-compatible chat provider with retry, tool normalization, and robust usage parsing
src/llm/provider.rs Added sanitize_tool_messages function with comprehensive tests to prevent orphaned tool_result errors
src/llm/nearai_chat.rs Added configurable tool-message flattening and improved URL construction
src/llm/nearai.rs Integrated sanitize_tool_messages for robustness
src/llm/rig_adapter.rs Integrated sanitize_tool_messages before sending to Rig adapters
src/llm/mod.rs Wired OpenAI-compatible provider and added base URL support for OpenAI/Anthropic
src/workspace/embeddings.rs Added OllamaEmbeddings provider with configurable model and dimension
src/workspace/mod.rs Exported OllamaEmbeddings
src/config/embeddings.rs Added dimension inference from model names with EMBEDDING_DIMENSION override
src/config/llm.rs Added optional base_url to OpenAI and Anthropic configs for proxy support
src/config/sandbox.rs Changed default sandbox image to ironclaw-worker:latest
src/sandbox/config.rs Changed default sandbox image to ironclaw-worker:latest
src/settings.rs Changed default sandbox image to ironclaw-worker:latest
src/agent/dispatcher.rs Collects deferred tool calls when approval is needed for multi-tool responses
src/agent/thread_ops.rs Replays deferred tool calls after approval with proper approval re-checking
src/agent/session.rs Added deferred_tool_calls field to PendingApproval
src/agent/submission.rs Added approval aliases for better UX
src/channels/web/types.rs Added thread_id to ApprovalNeeded SSE event
src/channels/web/mod.rs Wired thread_id in approval events and added with_llm_provider
src/channels/web/static/app.js Added thread filtering for approval events
src/channels/repl.rs Forwards /quit command through agent loop for graceful shutdown
src/main.rs Updated embeddings setup with dimension config and Ollama support
migrations/V9__flexible_embedding_dimension.sql Removes fixed dimension constraint and HNSW index for flexible embeddings
CHANGELOG.md Documents all changes


let result: OllamaEmbedResponse = response.json().await.map_err(|e| {
    EmbeddingError::InvalidResponse(format!("Failed to parse Ollama response: {}", e))
})?;

Copilot AI commented Feb 19, 2026

The Ollama embeddings implementation doesn't validate that the returned embedding dimensions match the configured dimension. If the model returns embeddings with a different dimension than expected, this could cause silent failures when storing to the database or performing vector operations. Consider adding validation after receiving the response to ensure each embedding has the expected dimension.

Suggested change
// Validate that all returned embeddings match the configured dimension.
let expected_dim = self.dimension;
if let Some((index, actual_dim)) = result
    .embeddings
    .iter()
    .enumerate()
    .find_map(|(i, emb)| {
        let len = emb.len();
        if len != expected_dim {
            Some((i, len))
        } else {
            None
        }
    })
{
    return Err(EmbeddingError::InvalidResponse(format!(
        "Ollama returned embedding with incorrect dimension at index {}: expected {}, got {}",
        index, expected_dim, actual_dim
    )));
}

ilblackdragon (Member, Author) replied:

Good catch. Added dimension validation in OllamaEmbeddings::embed_batch() — returns EmbeddingError::InvalidResponse if any returned embedding doesn't match the configured dimension. Fixed in 0c42321.

Comment thread src/main.rs
Comment on lines +120 to +133
.with_model(&config.embeddings.model, config.embeddings.dimension),
)),
"ollama" => Some(Arc::new(
ironclaw::workspace::OllamaEmbeddings::new(
&config.embeddings.ollama_base_url,
)
.with_model(&config.embeddings.model, config.embeddings.dimension),
)),
_ => {
if let Some(api_key) = config.embeddings.openai_api_key() {
let dim = match config.embeddings.model.as_str() {
"text-embedding-3-large" => 3072,
_ => 1536,
};
Some(Arc::new(ironclaw::workspace::OpenAiEmbeddings::with_model(
api_key,
&config.embeddings.model,
dim,
config.embeddings.dimension,
Copilot AI commented Feb 19, 2026

The libSQL migrations still use F32_BLOB(1536) with a fixed dimension, meaning libSQL deployments cannot use embedding models with different dimensions (e.g., Ollama's 768-dim models or OpenAI's 3072-dim models). While the PR description notes this as a "known limitation," users who configure non-1536-dim models with libSQL will experience silent failures or dimension mismatches when storing embeddings. Consider adding runtime validation that checks if the configured embedding dimension matches the database schema dimension, and emitting a clear error or warning when there's a mismatch for libSQL deployments.

ilblackdragon (Member, Author) replied:

Added a runtime tracing::warn! at startup when libSQL backend is configured with a non-1536 embedding dimension, explaining the F32_BLOB(1536) schema limitation and suggesting PostgreSQL or EMBEDDING_DIMENSION=1536. Fixed in 0c42321.

ilblackdragon and others added 2 commits February 19, 2026 13:16
- Replace 9x .expect() on RwLock with graceful poison recovery
  (nearai.rs: 7, nearai_chat.rs: 2) — eliminates production panics
- Propagate HTTP client builder errors in nearai.rs instead of
  silently dropping timeout config (NearAiProvider::new now returns Result)
- Make nearai_chat ChatCompletionResponse.id Option<String>
  (mirrors openai_compatible_chat.rs fix for providers that omit id)
- Make nearai_chat usage fields optional with defensive parse_usage()
  helper (was required u32 fields that crash on null/missing)
- Truncate error responses to 512 chars in nearai_chat.rs error
  messages to prevent log bloat and potential data leakage
- Delegate 4 missing LlmProvider methods in FailoverProvider
  (model_metadata, seed_response_chain, get_response_chain_id,
  calculate_cost) to last-used provider instead of trait defaults

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…en decorators

- Add composable RetryProvider decorator wrapping any LlmProvider with
  exponential backoff + jitter, respecting RateLimited retry_after hints
- Remove openai_compatible_chat.rs — replaced by rig adapter + RetryProvider
- Remove internal retry loop from nearai.rs (was causing double-retry
  with external RetryProvider, up to 16 attempts instead of 4)
- Remove internal retry loop from nearai_chat.rs (same issue)
- Wire RetryProvider into main.rs composition chain: each provider gets
  its own retry wrapper before failover
- Move normalize_tool_name to rig_adapter.rs for all rig-based providers
- Reconcile is_retryable() vs is_transient() error classification:
  ModelNotAvailable no longer retryable, Json no longer transient
- Fix unchecked Duration subtraction panic in circuit_breaker.rs
- Make failover.rs use shared is_retryable() from retry.rs
- Remove stale #[allow(dead_code)] on NearAiResponse::id (field is used)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
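The backoff schedule described in the commit above can be sketched as follows. This is a minimal illustration, not the actual RetryProvider internals: `retry_delay` and its parameters are hypothetical names, and a real implementation would add random jitter on top of the capped exponential curve shown here.

```rust
use std::time::Duration;

/// Capped exponential backoff that honors a server-supplied
/// `retry_after` hint (e.g. from a RateLimited error) when present.
fn retry_delay(attempt: u32, base_ms: u64, max_ms: u64, retry_after: Option<Duration>) -> Duration {
    if let Some(hint) = retry_after {
        // The hint wins, but is still capped to the configured ceiling.
        return hint.min(Duration::from_millis(max_ms));
    }
    // 2^attempt growth, saturating instead of overflowing for large attempts.
    let factor = 1u64.checked_shl(attempt).unwrap_or(u64::MAX);
    Duration::from_millis(base_ms.saturating_mul(factor).min(max_ms))
}

fn main() {
    assert_eq!(retry_delay(0, 1_000, 30_000, None), Duration::from_millis(1_000));
    assert_eq!(retry_delay(3, 1_000, 30_000, None), Duration::from_millis(8_000));
    // Past the cap, the delay stops growing.
    assert_eq!(retry_delay(10, 1_000, 30_000, None), Duration::from_millis(30_000));
    // A retry_after hint overrides the exponential schedule.
    assert_eq!(retry_delay(2, 1_000, 30_000, Some(Duration::from_secs(5))), Duration::from_secs(5));
}
```

Capping the schedule this way is also why removing the providers' internal retry loops matters: two nested 4-attempt loops multiply into up to 16 attempts, as the commit notes.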
Copilot AI review requested due to automatic review settings February 19, 2026 22:30
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 26 out of 26 changed files in this pull request and generated 1 comment.



EmbeddingError::InvalidResponse(format!("Failed to parse Ollama response: {}", e))
})?;

Ok(result.embeddings)

Copilot AI Feb 19, 2026


The OllamaEmbeddings provider doesn't validate that the returned embedding dimension matches the configured dimension. If the Ollama model returns embeddings of a different dimension than what was configured (e.g., due to model misconfiguration), this mismatch will only be caught later when storing to the database, leading to a confusing error. Consider adding validation after line 456 to check that each embedding in result.embeddings has length equal to self.dimension, and return an EmbeddingError::InvalidResponse if the dimension doesn't match.

Suggested change
Ok(result.embeddings)
let embeddings = result.embeddings;
for (i, emb) in embeddings.iter().enumerate() {
    if emb.len() != self.dimension {
        return Err(EmbeddingError::InvalidResponse(format!(
            "Ollama returned embedding of dimension {}, expected {} at index {}",
            emb.len(),
            self.dimension,
            i
        )));
    }
}
Ok(embeddings)

Member Author


Duplicate of the Copilot comment above — fixed in 0c42321 with dimension validation in embed_batch().

ilblackdragon and others added 2 commits February 19, 2026 14:41
fix: address PR review feedback — error handling, dimension validation, libSQL warning

- Replace response.text().await.unwrap_or_default() with proper error
  propagation in nearai.rs and nearai_chat.rs (4 call sites). Failures
  now return LlmError::RequestFailed with context instead of silently
  proceeding with an empty string.
- Add embedding dimension validation in OllamaEmbeddings::embed_batch():
  returns EmbeddingError if Ollama returns embeddings with a dimension
  that doesn't match the configured value.
- Add runtime warning when libSQL backend is used with non-1536 embedding
  dimension, since the libSQL schema uses F32_BLOB(1536) and cannot store
  different-dimension vectors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Merge branch 'main' into takeover/112-approval-replay-embeddings

# Conflicts:
#	src/llm/failover.rs
#	src/llm/nearai_chat.rs
#	src/llm/rig_adapter.rs
#	src/main.rs
Copilot AI review requested due to automatic review settings February 19, 2026 22:55
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 26 out of 26 changed files in this pull request and generated 1 comment.



Comment thread CHANGELOG.md Outdated
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings February 19, 2026 23:04
@ilblackdragon ilblackdragon merged commit 097a26a into main Feb 19, 2026
3 checks passed
@ilblackdragon ilblackdragon deleted the takeover/112-approval-replay-embeddings branch February 19, 2026 23:05
@github-actions github-actions Bot mentioned this pull request Feb 19, 2026
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 26 out of 26 changed files in this pull request and generated 5 comments.



Comment thread src/agent/dispatcher.rs
Comment on lines +258 to 262
let mut idx = 0usize;
while idx < tool_calls.len() {
    let mut tc = tool_calls[idx].clone();

    // Check if tool requires approval

Copilot AI Feb 19, 2026


Switching to a manual while idx < tool_calls.len() loop requires ensuring idx is advanced on every control-flow path. There are continue branches later in this loop body (e.g. hook rejection / policy error) that will now skip idx += 1 and can cause an infinite loop on the same tool call.

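The hazard flagged in this comment can be shown with a minimal standalone loop (hypothetical `process_all` helper, not the dispatcher code): in a manual index loop, every `continue` path must advance the index itself, or the loop re-processes the same element forever.

```rust
/// Uppercase non-empty items, skipping empty ones. The skip branch
/// must bump `idx` before `continue`, otherwise the loop never
/// advances past the first empty slot.
fn process_all(items: &[&str]) -> Vec<String> {
    let mut out = Vec::new();
    let mut idx = 0usize;
    while idx < items.len() {
        let item = items[idx];
        if item.is_empty() {
            idx += 1; // without this, `continue` spins forever on the same slot
            continue;
        }
        out.push(item.to_uppercase());
        idx += 1;
    }
    out
}

fn main() {
    assert_eq!(process_all(&["a", "", "b"]), vec!["A".to_string(), "B".to_string()]);
}
```

A `for`-loop with `iter().enumerate()` avoids the problem entirely when the index does not need manual control, which is why the review calls out the switch to a `while` loop specifically.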
Comment thread src/llm/retry.rs
Comment on lines +350 to +351
// Wait for at least 1 retry attempt (backoff is ~1s, so 1.5s should be enough)
tokio::time::sleep(Duration::from_millis(1500)).await;

Copilot AI Feb 19, 2026


This test introduces a 1.5s wall-clock sleep to wait for retry backoff, which will noticeably slow the suite and can be flaky under load. Consider making backoff delays configurable for tests, using an error with retry_after=Duration::ZERO, or using Tokio time control to advance time without real sleeps.

Suggested change
// Wait for at least 1 retry attempt (backoff is ~1s, so 1.5s should be enough)
tokio::time::sleep(Duration::from_millis(1500)).await;
// Wait briefly before flipping; this should allow at least one failure before retry
tokio::time::sleep(Duration::from_millis(10)).await;

Comment thread src/config/embeddings.rs
Comment on lines +68 to +76
let dimension = optional_env("EMBEDDING_DIMENSION")?
    .map(|s| s.parse::<usize>())
    .transpose()
    .map_err(|e| ConfigError::InvalidValue {
        key: "EMBEDDING_DIMENSION".to_string(),
        message: format!("must be a positive integer: {e}"),
    })?
    .unwrap_or_else(|| default_dimension_for_model(&model));


Copilot AI Feb 19, 2026


EMBEDDING_DIMENSION is parsed as usize, but the code (and error message) implies it must be a positive integer. As written, 0 is accepted and will propagate into embedding providers / DB writes where a zero-dimension vector is invalid. Add an explicit > 0 validation and return ConfigError::InvalidValue when the parsed dimension is 0.

Suggested change
let dimension = optional_env("EMBEDDING_DIMENSION")?
    .map(|s| s.parse::<usize>())
    .transpose()
    .map_err(|e| ConfigError::InvalidValue {
        key: "EMBEDDING_DIMENSION".to_string(),
        message: format!("must be a positive integer: {e}"),
    })?
    .unwrap_or_else(|| default_dimension_for_model(&model));
let dimension_opt = optional_env("EMBEDDING_DIMENSION")?
    .map(|s| s.parse::<usize>())
    .transpose()
    .map_err(|e| ConfigError::InvalidValue {
        key: "EMBEDDING_DIMENSION".to_string(),
        message: format!("must be a positive integer: {e}"),
    })?;
if let Some(0) = dimension_opt {
    return Err(ConfigError::InvalidValue {
        key: "EMBEDDING_DIMENSION".to_string(),
        message: "must be a positive integer greater than zero".to_string(),
    });
}
let dimension = dimension_opt.unwrap_or_else(|| default_dimension_for_model(&model));

Comment thread src/config/embeddings.rs
Comment on lines +64 to +66
let ollama_base_url = optional_env("OLLAMA_BASE_URL")?
    .or_else(|| settings.ollama_base_url.clone())
    .unwrap_or_else(|| "http://localhost:11434".to_string());

Copilot AI Feb 19, 2026


New env vars (OLLAMA_BASE_URL, EMBEDDING_DIMENSION) were added for embeddings resolution. The env-mutating tests in this module should also clear these vars in the clear_embedding_env() helper to avoid flakiness when they are set in the surrounding test environment.

input: texts,
};

let url = format!("{}/api/embed", self.base_url);

Copilot AI Feb 19, 2026


OllamaEmbeddings builds the endpoint with format!("{}/api/embed", self.base_url). If base_url is configured with a trailing slash, this produces a //api/embed path which can 404 (depending on server path normalization / redirect behavior). Consider normalizing base_url (e.g., trimming trailing /) before formatting the endpoint URL.

Suggested change
let url = format!("{}/api/embed", self.base_url);
let base_url = self.base_url.trim_end_matches('/');
let url = format!("{}/api/embed", base_url);

@github-actions github-actions Bot mentioned this pull request Feb 20, 2026
ilblackdragon added a commit that referenced this pull request Feb 20, 2026
Port relevant changes from PR #112 that were not carried over to #237:

- Add persist_turn calls in process_approval for the response, error,
  and auth-required paths. Previously, turns completed after tool
  approval were never persisted to DB — if the process crashed after
  approval the entire turn (user message + assistant response) was lost.

- Add agent-level unit tests: StaticLlmProvider mock, make_test_agent
  helper, tests for auto-approval logic, destructive shell command
  detection, and PendingApproval backward-compatible deserialization
  (without deferred_tool_calls field).

- Remove unused _thread_state binding in process_approval.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
ilblackdragon added a commit that referenced this pull request Feb 20, 2026
* fix: persist turns after approval and add agent-level tests

Port relevant changes from PR #112 that were not carried over to #237:

- Add persist_turn calls in process_approval for the response, error,
  and auth-required paths. Previously, turns completed after tool
  approval were never persisted to DB — if the process crashed after
  approval the entire turn (user message + assistant response) was lost.

- Add agent-level unit tests: StaticLlmProvider mock, make_test_agent
  helper, tests for auto-approval logic, destructive shell command
  detection, and PendingApproval backward-compatible deserialization
  (without deferred_tool_calls field).

- Remove unused _thread_state binding in process_approval.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address 14 audit findings in src/agent/

Audit of the agent module found 2 High, 7 Medium, 3 Low, and 2 Nit
severity issues. This commit fixes all of them:

High:
- Remove 4 `.expect()` calls in session.rs (entry API, match, direct
  indexing, if-let) to eliminate panic paths in production
- Add typed RoutineError enum replacing Result<_, String> across
  routine.rs, routine_engine.rs, and callers in history/store.rs and
  db/libsql/mod.rs

Medium:
- Sanitize routine names in path construction to prevent directory
  traversal (routine_engine.rs)
- Log warnings for 5 silently-swallowed errors in scheduler.rs,
  compaction.rs, and worker.rs
- Extract shared handle_auth_intercept helper to deduplicate auth
  interception in thread_ops.rs
- Add session count warning threshold in session_manager.rs
- Make FullJob stub degradation visible via warn-level log and
  prepended warning in output

Low:
- Restrict dead code visibility with #[cfg(test)] on 19 unused items
  in submission.rs, task.rs, and undo.rs
- Narrow pub to pub(crate) on self_repair.rs builder methods
- Remove TaskStatus from mod.rs re-exports (test-only type)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review comments

- Reorder persist_turn before persist_response_chain so the
  conversation row exists before the metadata UPDATE runs
- Add persist_response_chain call to handle_auth_intercept so
  auth-required paths preserve the response chain
- Harden sanitize_routine_name to use allowlist (alphanumeric,
  dash, underscore) instead of denylist replacements
- Fix stale active_thread ID in get_or_create_thread: fall back
  to create_thread() when the stored ID is missing from the map
- Persist turn on approval rejection so user messages survive
  crashes after a tool is rejected

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
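The allowlist sanitization described in the commit message above can be sketched as a standalone function. This is a hypothetical simplification of `sanitize_routine_name`, under the assumption that only alphanumerics, dashes, and underscores are kept; the actual implementation may differ.

```rust
/// Keep only characters from an explicit allowlist so a routine name
/// can never contribute path separators or `..` traversal segments
/// when used to build a filesystem path.
fn sanitize_routine_name(name: &str) -> String {
    name.chars()
        .filter(|c| c.is_ascii_alphanumeric() || *c == '-' || *c == '_')
        .collect()
}

fn main() {
    // Traversal characters are dropped rather than replaced.
    assert_eq!(sanitize_routine_name("../etc/passwd"), "etcpasswd");
    // Benign names pass through unchanged.
    assert_eq!(sanitize_routine_name("daily-report_v2"), "daily-report_v2");
}
```

An allowlist is preferred over denylist replacements (as the review feedback notes) because it fails closed: any character not explicitly permitted is removed, instead of relying on enumerating every dangerous sequence.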
jaswinder6991 pushed a commit to jaswinder6991/ironclaw that referenced this pull request Feb 26, 2026
…gs defaults (nearai#237)

* fix: harden openai-compatible tool flow and local defaults

* fix: close approval replay gaps and harden openai-compatible flow

* fix: address review feedback and code improvements (takeover nearai#112)

- Make ChatCompletionResponse.id Optional<String> to handle providers
  that omit or null the field
- Propagate HTTP client builder errors instead of silently dropping
  timeout configuration (openai_compatible_chat, nearai_chat)
- Add EMBEDDING_DIMENSION env var with smart per-model defaults instead
  of hardcoding 768/1536 everywhere
- Remove duplicated dimension inference logic from main.rs

Co-Authored-By: panosAthDBX <panosAthDBX@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: harden src/llm/ module from crate audit findings

- Replace 9x .expect() on RwLock with graceful poison recovery
  (nearai.rs: 7, nearai_chat.rs: 2) — eliminates production panics
- Propagate HTTP client builder errors in nearai.rs instead of
  silently dropping timeout config (NearAiProvider::new now returns Result)
- Make nearai_chat ChatCompletionResponse.id Optional<String>
  (mirrors openai_compatible_chat.rs fix for providers that omit id)
- Make nearai_chat usage fields optional with defensive parse_usage()
  helper (was required u32 fields that crash on null/missing)
- Truncate error responses to 512 chars in nearai_chat.rs error
  messages to prevent log bloat and potential data leakage
- Delegate 4 missing LlmProvider methods in FailoverProvider
  (model_metadata, seed_response_chain, get_response_chain_id,
  calculate_cost) to last-used provider instead of trait defaults

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(llm): add RetryProvider, remove openai_compatible_chat, harden decorators

- Add composable RetryProvider decorator wrapping any LlmProvider with
  exponential backoff + jitter, respecting RateLimited retry_after hints
- Remove openai_compatible_chat.rs — replaced by rig adapter + RetryProvider
- Remove internal retry loop from nearai.rs (was causing double-retry
  with external RetryProvider, up to 16 attempts instead of 4)
- Remove internal retry loop from nearai_chat.rs (same issue)
- Wire RetryProvider into main.rs composition chain: each provider gets
  its own retry wrapper before failover
- Move normalize_tool_name to rig_adapter.rs for all rig-based providers
- Reconcile is_retryable() vs is_transient() error classification:
  ModelNotAvailable no longer retryable, Json no longer transient
- Fix unchecked Duration subtraction panic in circuit_breaker.rs
- Make failover.rs use shared is_retryable() from retry.rs
- Remove stale #[allow(dead_code)] on NearAiResponse::id (field is used)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address PR review feedback — error handling, dimension validation, libSQL warning

- Replace response.text().await.unwrap_or_default() with proper error
  propagation in nearai.rs and nearai_chat.rs (4 call sites). Failures
  now return LlmError::RequestFailed with context instead of silently
  proceeding with an empty string.
- Add embedding dimension validation in OllamaEmbeddings::embed_batch():
  returns EmbeddingError if Ollama returns embeddings with a dimension
  that doesn't match the configured value.
- Add runtime warning when libSQL backend is used with non-1536 embedding
  dimension, since the libSQL schema uses F32_BLOB(1536) and cannot store
  different-dimension vectors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Apply suggestions from code review

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: panosAthDbx <packux@gmail.com>
Co-authored-by: panosAthDBX <127238517+panosAthDBX@users.noreply.github.com>
Co-authored-by: panosAthDBX <panosAthDBX@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>