fix: harden openai-compatible provider, approval replay, and embeddings defaults (#237)
Conversation
…ddings

# Conflicts:
#	CHANGELOG.md
#	src/agent/agent_loop.rs
#	src/config.rs
#	src/llm/mod.rs
- Make ChatCompletionResponse.id Option<String> to handle providers that omit or null the field
- Propagate HTTP client builder errors instead of silently dropping timeout configuration (openai_compatible_chat, nearai_chat)
- Add EMBEDDING_DIMENSION env var with smart per-model defaults instead of hardcoding 768/1536 everywhere
- Remove duplicated dimension inference logic from main.rs

Co-Authored-By: panosAthDBX <panosAthDBX@users.noreply.github.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary of Changes

Hello @ilblackdragon, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request enhances the robustness and flexibility of the LLM integration by hardening the OpenAI-compatible chat provider, introducing a multi-tool approval replay mechanism, and expanding embedding support to include Ollama and flexible dimensions. These changes improve compatibility with various LLM services, streamline complex agentic workflows, and give greater control over embedding configuration.

Highlights
Changelog
Activity
Code Review
This pull request introduces a substantial set of features and robustness improvements across the LLM provider landscape, including the addition of the OpenAiCompatibleChatProvider, support for Ollama embeddings, flexible embedding dimensions, and deferred tool call replay for multi-tool approvals. However, a high-severity security vulnerability was identified in the web UI's HTML sanitization logic. The custom regular expression-based sanitization is bypassable, which could lead to Cross-Site Scripting (XSS) and the theft of sensitive authentication tokens. It is strongly recommended to use a dedicated sanitization library like DOMPurify to mitigate this risk. Additionally, there are suggestions to improve maintainability by refactoring duplicated code and to enhance error handling for more robust provider interactions.
```rust
let mut deferred_queue = std::collections::VecDeque::from(deferred_tool_calls);
while let Some(tc) = deferred_queue.pop_front() {
    // Re-check approval for each deferred tool call
    if let Some(tool) = self.tools().get(&tc.name).await
        && tool.requires_approval()
    {
        let is_auto_approved = {
            let sess = session.lock().await;
            let mut approved = sess.is_tool_auto_approved(&tc.name);
            if approved && tool.requires_approval_for(&tc.arguments) {
                approved = false;
            }
            approved
        };

        if !is_auto_approved {
            let new_pending = PendingApproval {
                request_id: Uuid::new_v4(),
                tool_name: tc.name.clone(),
                parameters: tc.arguments.clone(),
                description: tool.description().to_string(),
                tool_call_id: tc.id.clone(),
                context_messages: context_messages.clone(),
                deferred_tool_calls: deferred_queue.iter().cloned().collect(),
            };

            let request_id = new_pending.request_id;
            let tool_name = new_pending.tool_name.clone();
            let description = new_pending.description.clone();
            let parameters = new_pending.parameters.clone();

            {
                let mut sess = session.lock().await;
                if let Some(thread) = sess.threads.get_mut(&thread_id) {
                    thread.await_approval(new_pending);
                }
            }

            let _ = self
                .channels
                .send_status(
                    &message.channel,
                    StatusUpdate::Status("Awaiting approval".into()),
                    &message.metadata,
                )
                .await;

            return Ok(SubmissionResult::NeedApproval {
                request_id,
                tool_name,
                description,
                parameters,
            });
        }
    }

    let _ = self
        .channels
        .send_status(
            &message.channel,
            StatusUpdate::ToolStarted {
                name: tc.name.clone(),
            },
            &message.metadata,
        )
        .await;

    let deferred_result = self
        .execute_chat_tool(&tc.name, &tc.arguments, &job_ctx)
        .await;

    let _ = self
        .channels
        .send_status(
            &message.channel,
            StatusUpdate::ToolCompleted {
                name: tc.name.clone(),
                success: deferred_result.is_ok(),
            },
            &message.metadata,
        )
        .await;

    if let Ok(ref output) = deferred_result
        && !output.is_empty()
    {
        let _ = self
            .channels
            .send_status(
                &message.channel,
                StatusUpdate::ToolResult {
                    name: tc.name.clone(),
                    preview: output.clone(),
                },
                &message.metadata,
            )
            .await;
    }

    // Record in thread
    {
        let mut sess = session.lock().await;
        if let Some(thread) = sess.threads.get_mut(&thread_id)
            && let Some(turn) = thread.last_turn_mut()
        {
            match &deferred_result {
                Ok(output) => turn.record_tool_result(serde_json::json!(output)),
                Err(e) => turn.record_tool_error(e.to_string()),
            }
        }
    }

    // Auth detection for deferred tools
    if let Some((ext_name, instructions)) =
        detect_auth_awaiting(&tc.name, &deferred_result)
    {
        let auth_data = parse_auth_result(&deferred_result);
        {
            let mut sess = session.lock().await;
            if let Some(thread) = sess.threads.get_mut(&thread_id) {
                thread.enter_auth_mode(ext_name.clone());
                thread.complete_turn(&instructions);
            }
        }
        let _ = self
            .channels
            .send_status(
                &message.channel,
                StatusUpdate::AuthRequired {
                    extension_name: ext_name,
                    instructions: Some(instructions.clone()),
                    auth_url: auth_data.auth_url,
                    setup_url: auth_data.setup_url,
                },
                &message.metadata,
            )
            .await;
        return Ok(SubmissionResult::response(instructions));
    }

    let deferred_content = match deferred_result {
        Ok(output) => {
            let sanitized =
                self.safety().sanitize_tool_output(&tc.name, &output);
            self.safety().wrap_for_llm(
                &tc.name,
                &sanitized.content,
                sanitized.was_modified,
            )
        }
        Err(e) => format!("Error: {}", e),
    };

    context_messages
        .push(ChatMessage::tool_result(&tc.id, &tc.name, deferred_content));
}
```
There is significant code duplication between this block for handling deferred tool calls and the main tool execution loop in dispatcher.rs. Both sections handle iterating through tool calls, checking for approvals, executing tools, and processing results.
This duplication can make maintenance difficult, as changes in one place might need to be manually synchronized in the other. Consider refactoring this shared logic into a helper function that can be called from both dispatcher.rs and here. This would centralize the tool execution flow, improving maintainability and reducing the risk of inconsistencies.
The deferred tool call duplication is a valid observation. However, refactoring this into a shared helper is a larger architectural change that touches the core agent loop — it warrants its own PR. Filed as a follow-up.
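As a rough illustration of what such a helper might factor out, here is a hedged sketch: a pure function that splits a tool-call queue at the first call still needing approval, which both the dispatcher and the replay path could share. The `ToolCall` struct and the `split_at_first_unapproved` name are hypothetical, not the crate's actual API.

```rust
// Hypothetical sketch only; types and names are illustrative.
#[derive(Clone, Debug, PartialEq)]
struct ToolCall {
    name: String,
    arguments: String,
}

/// Split tool calls into (ready-to-run, deferred) at the first call that
/// still needs approval, preserving order on both sides so replay resumes
/// exactly where execution paused.
fn split_at_first_unapproved(
    calls: Vec<ToolCall>,
    is_auto_approved: impl Fn(&ToolCall) -> bool,
) -> (Vec<ToolCall>, Vec<ToolCall>) {
    let mut ready = Vec::new();
    let mut iter = calls.into_iter();
    while let Some(tc) = iter.next() {
        if is_auto_approved(&tc) {
            ready.push(tc);
        } else {
            // The unapproved call heads the deferred queue, followed by
            // everything after it, matching the PendingApproval layout.
            let mut deferred = vec![tc];
            deferred.extend(iter);
            return (ready, deferred);
        }
    }
    (ready, Vec::new())
}
```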
```rust
};

let status = response.status();
let response_text = response.text().await.unwrap_or_default();
```
The use of unwrap_or_default() here can hide potential errors when reading the response body. If response.text().await fails, this will proceed with an empty string, leading to a potentially misleading "JSON parse error" later on. It would be more robust to handle the Result from .text() and return a more specific error, which would aid in debugging issues with the response stream from the provider.
```rust
let response_text = match response.text().await {
    Ok(text) => text,
    Err(e) => {
        return Err(LlmError::RequestFailed {
            provider: "openai_compatible_chat".to_string(),
            reason: format!("Failed to read response body: {}", e),
        });
    }
};
```

References
- Ensuring robust error handling when reading API response bodies helps prevent client-side parsing failures, aligning with the principle that API endpoints should provide valid and consistent responses.
Pull request overview
This PR hardens LLM provider robustness and embeddings flexibility while adding support for OpenAI-compatible endpoints and Ollama embeddings. It's a continuation of #112 that addresses reviewer feedback and merges with the latest main branch architecture changes.
Changes:
- Adds OpenAI-compatible chat provider with retry logic, tool-call normalization, and robust usage parsing
- Implements multi-tool approval replay with deferred tool calls to prevent orphaned tool_result protocol errors
- Adds Ollama embeddings provider with configurable dimensions and removes fixed 1536-dim PostgreSQL constraint
- Improves provider configuration with base URL overrides for proxies (OpenAI, Anthropic)
- Adds tool message sanitization across all LLM providers to prevent HTTP 400 errors
- Adds approval command aliases (`/approve`, `/always`, `/deny`, `a`, `n`) for better UX
- Changes default sandbox image from `ghcr.io/nearai/sandbox:latest` to `ironclaw-worker:latest`
- Adds thread_id to approval SSE events for better web gateway filtering
- Fixes REPL quit forwarding to gracefully exit the agent loop
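The orphaned-tool_result sanitization mentioned in the changes above can be illustrated with a simplified sketch. The `Msg` type here is hypothetical; the real `sanitize_tool_messages` in src/llm/provider.rs operates on the crate's own message types.

```rust
// Simplified sketch of the orphaned tool_result idea; `Msg` is illustrative.
use std::collections::HashSet;

#[derive(Clone, Debug, PartialEq)]
enum Msg {
    Assistant { tool_call_ids: Vec<String> },
    ToolResult { tool_call_id: String },
    User(String),
}

/// Drop tool results whose id was never issued by a preceding assistant
/// message; providers typically reject such orphans with HTTP 400.
fn sanitize_tool_messages(messages: Vec<Msg>) -> Vec<Msg> {
    let mut issued = HashSet::new();
    messages
        .into_iter()
        .filter(|m| match m {
            Msg::Assistant { tool_call_ids } => {
                issued.extend(tool_call_ids.iter().cloned());
                true
            }
            Msg::ToolResult { tool_call_id } => issued.contains(tool_call_id),
            _ => true,
        })
        .collect()
}
```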
Reviewed changes
Copilot reviewed 24 out of 24 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/llm/openai_compatible_chat.rs | New OpenAI-compatible chat provider with retry, tool normalization, and robust usage parsing |
| src/llm/provider.rs | Added sanitize_tool_messages function with comprehensive tests to prevent orphaned tool_result errors |
| src/llm/nearai_chat.rs | Added configurable tool-message flattening and improved URL construction |
| src/llm/nearai.rs | Integrated sanitize_tool_messages for robustness |
| src/llm/rig_adapter.rs | Integrated sanitize_tool_messages before sending to Rig adapters |
| src/llm/mod.rs | Wired OpenAI-compatible provider and added base URL support for OpenAI/Anthropic |
| src/workspace/embeddings.rs | Added OllamaEmbeddings provider with configurable model and dimension |
| src/workspace/mod.rs | Exported OllamaEmbeddings |
| src/config/embeddings.rs | Added dimension inference from model names with EMBEDDING_DIMENSION override |
| src/config/llm.rs | Added optional base_url to OpenAI and Anthropic configs for proxy support |
| src/config/sandbox.rs | Changed default sandbox image to ironclaw-worker:latest |
| src/sandbox/config.rs | Changed default sandbox image to ironclaw-worker:latest |
| src/settings.rs | Changed default sandbox image to ironclaw-worker:latest |
| src/agent/dispatcher.rs | Collects deferred tool calls when approval is needed for multi-tool responses |
| src/agent/thread_ops.rs | Replays deferred tool calls after approval with proper approval re-checking |
| src/agent/session.rs | Added deferred_tool_calls field to PendingApproval |
| src/agent/submission.rs | Added approval aliases for better UX |
| src/channels/web/types.rs | Added thread_id to ApprovalNeeded SSE event |
| src/channels/web/mod.rs | Wired thread_id in approval events and added with_llm_provider |
| src/channels/web/static/app.js | Added thread filtering for approval events |
| src/channels/repl.rs | Forwards /quit command through agent loop for graceful shutdown |
| src/main.rs | Updated embeddings setup with dimension config and Ollama support |
| migrations/V9__flexible_embedding_dimension.sql | Removes fixed dimension constraint and HNSW index for flexible embeddings |
| CHANGELOG.md | Documents all changes |
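The dimension inference described for src/config/embeddings.rs can be sketched roughly as follows. Only the 3072 (text-embedding-3-large) and 1536 defaults appear in this PR; the 768 nomic-embed case and the function shapes are assumptions for illustration.

```rust
// Hedged sketch of EMBEDDING_DIMENSION resolution; names and the model
// table beyond text-embedding-3-large are illustrative assumptions.
fn default_dimension_for_model(model: &str) -> usize {
    match model {
        "text-embedding-3-large" => 3072,
        m if m.starts_with("nomic-embed") => 768, // common Ollama default (assumed)
        _ => 1536,                                // OpenAI-style default
    }
}

/// Resolve the embedding dimension: an explicit override wins, else infer
/// from the model name. Rejects zero, which no vector store can accept.
fn resolve_dimension(override_var: Option<&str>, model: &str) -> Result<usize, String> {
    match override_var {
        Some(s) => {
            let dim: usize = s
                .parse()
                .map_err(|e| format!("EMBEDDING_DIMENSION must be a positive integer: {e}"))?;
            if dim == 0 {
                return Err("EMBEDDING_DIMENSION must be greater than zero".to_string());
            }
            Ok(dim)
        }
        None => Ok(default_dimension_for_model(model)),
    }
}
```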
```rust
let result: OllamaEmbedResponse = response.json().await.map_err(|e| {
    EmbeddingError::InvalidResponse(format!("Failed to parse Ollama response: {}", e))
})?;
```
The Ollama embeddings implementation doesn't validate that the returned embedding dimensions match the configured dimension. If the model returns embeddings with a different dimension than expected, this could cause silent failures when storing to the database or performing vector operations. Consider adding validation after receiving the response to ensure each embedding has the expected dimension.
```rust
// Validate that all returned embeddings match the configured dimension.
let expected_dim = self.dimension;
if let Some((index, actual_dim)) = result
    .embeddings
    .iter()
    .enumerate()
    .find_map(|(i, emb)| {
        let len = emb.len();
        if len != expected_dim {
            Some((i, len))
        } else {
            None
        }
    })
{
    return Err(EmbeddingError::InvalidResponse(format!(
        "Ollama returned embedding with incorrect dimension at index {}: expected {}, got {}",
        index, expected_dim, actual_dim
    )));
}
```
Good catch. Added dimension validation in OllamaEmbeddings::embed_batch() — returns EmbeddingError::InvalidResponse if any returned embedding doesn't match the configured dimension. Fixed in 0c42321.
```rust
        .with_model(&config.embeddings.model, config.embeddings.dimension),
)),
"ollama" => Some(Arc::new(
    ironclaw::workspace::OllamaEmbeddings::new(
        &config.embeddings.ollama_base_url,
    )
    .with_model(&config.embeddings.model, config.embeddings.dimension),
)),
_ => {
    if let Some(api_key) = config.embeddings.openai_api_key() {
        let dim = match config.embeddings.model.as_str() {
            "text-embedding-3-large" => 3072,
            _ => 1536,
        };
        Some(Arc::new(ironclaw::workspace::OpenAiEmbeddings::with_model(
            api_key,
            &config.embeddings.model,
            dim,
            config.embeddings.dimension,
```
The libSQL migrations still use F32_BLOB(1536) with a fixed dimension, meaning libSQL deployments cannot use embedding models with different dimensions (e.g., Ollama's 768-dim models or OpenAI's 3072-dim models). While the PR description notes this as a "known limitation," users who configure non-1536-dim models with libSQL will experience silent failures or dimension mismatches when storing embeddings. Consider adding runtime validation that checks if the configured embedding dimension matches the database schema dimension, and emitting a clear error or warning when there's a mismatch for libSQL deployments.
Added a runtime tracing::warn! at startup when libSQL backend is configured with a non-1536 embedding dimension, explaining the F32_BLOB(1536) schema limitation and suggesting PostgreSQL or EMBEDDING_DIMENSION=1536. Fixed in 0c42321.
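A minimal sketch of that startup check, assuming a string backend name and the fixed F32_BLOB(1536) libSQL schema; the function name is illustrative, not the crate's actual code.

```rust
// Illustrative sketch of the libSQL dimension-mismatch warning.
const LIBSQL_SCHEMA_DIM: usize = 1536;

/// Returns a warning message when a libSQL deployment is configured with an
/// embedding dimension the schema cannot store, else None.
fn libsql_dimension_warning(backend: &str, dimension: usize) -> Option<String> {
    if backend == "libsql" && dimension != LIBSQL_SCHEMA_DIM {
        Some(format!(
            "libSQL schema uses F32_BLOB({LIBSQL_SCHEMA_DIM}) but embedding dimension is {dimension}; \
             use PostgreSQL or set EMBEDDING_DIMENSION={LIBSQL_SCHEMA_DIM}"
        ))
    } else {
        None
    }
}
```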
- Replace 9x .expect() on RwLock with graceful poison recovery (nearai.rs: 7, nearai_chat.rs: 2) — eliminates production panics
- Propagate HTTP client builder errors in nearai.rs instead of silently dropping timeout config (NearAiProvider::new now returns Result)
- Make nearai_chat ChatCompletionResponse.id Option<String> (mirrors openai_compatible_chat.rs fix for providers that omit id)
- Make nearai_chat usage fields optional with defensive parse_usage() helper (was required u32 fields that crash on null/missing)
- Truncate error responses to 512 chars in nearai_chat.rs error messages to prevent log bloat and potential data leakage
- Delegate 4 missing LlmProvider methods in FailoverProvider (model_metadata, seed_response_chain, get_response_chain_id, calculate_cost) to last-used provider instead of trait defaults

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…en decorators

- Add composable RetryProvider decorator wrapping any LlmProvider with exponential backoff + jitter, respecting RateLimited retry_after hints
- Remove openai_compatible_chat.rs — replaced by rig adapter + RetryProvider
- Remove internal retry loop from nearai.rs (was causing double-retry with external RetryProvider, up to 16 attempts instead of 4)
- Remove internal retry loop from nearai_chat.rs (same issue)
- Wire RetryProvider into main.rs composition chain: each provider gets its own retry wrapper before failover
- Move normalize_tool_name to rig_adapter.rs for all rig-based providers
- Reconcile is_retryable() vs is_transient() error classification: ModelNotAvailable no longer retryable, Json no longer transient
- Fix unchecked Duration subtraction panic in circuit_breaker.rs
- Make failover.rs use shared is_retryable() from retry.rs
- Remove stale #[allow(dead_code)] on NearAiResponse::id (field is used)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
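The backoff schedule a RetryProvider-style decorator might use can be sketched as follows. Constants are illustrative, not the crate's actual values, and jitter is omitted here to keep the sketch deterministic; the retry_after hint taking precedence mirrors the behavior described above.

```rust
use std::time::Duration;

// Hedged sketch: exponential doubling from a base delay, capped, with an
// explicit provider retry_after hint winning over the computed backoff.
fn backoff_delay(attempt: u32, retry_after: Option<Duration>) -> Duration {
    const BASE: Duration = Duration::from_secs(1);
    const CAP: Duration = Duration::from_secs(30);
    if let Some(hint) = retry_after {
        return hint.min(CAP); // RateLimited hint takes precedence
    }
    // 1s, 2s, 4s, 8s, ... capped at 30s; saturating to avoid overflow.
    BASE.saturating_mul(1u32 << attempt.min(16)).min(CAP)
}
```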
Pull request overview
Copilot reviewed 26 out of 26 changed files in this pull request and generated 1 comment.
```rust
    EmbeddingError::InvalidResponse(format!("Failed to parse Ollama response: {}", e))
})?;

Ok(result.embeddings)
```
The OllamaEmbeddings provider doesn't validate that the returned embedding dimension matches the configured dimension. If the Ollama model returns embeddings of a different dimension than what was configured (e.g., due to model misconfiguration), this mismatch will only be caught later when storing to the database, leading to a confusing error. Consider adding validation after line 456 to check that each embedding in result.embeddings has length equal to self.dimension, and return an EmbeddingError::InvalidResponse if the dimension doesn't match.
```rust
let embeddings = result.embeddings;
for (i, emb) in embeddings.iter().enumerate() {
    if emb.len() != self.dimension {
        return Err(EmbeddingError::InvalidResponse(format!(
            "Ollama returned embedding of dimension {}, expected {} at index {}",
            emb.len(),
            self.dimension,
            i
        )));
    }
}
Ok(embeddings)
```
Duplicate of the Copilot comment above — fixed in 0c42321 with dimension validation in embed_batch().
…n, libSQL warning

- Replace response.text().await.unwrap_or_default() with proper error propagation in nearai.rs and nearai_chat.rs (4 call sites). Failures now return LlmError::RequestFailed with context instead of silently proceeding with an empty string.
- Add embedding dimension validation in OllamaEmbeddings::embed_batch(): returns EmbeddingError if Ollama returns embeddings with a dimension that doesn't match the configured value.
- Add runtime warning when libSQL backend is used with non-1536 embedding dimension, since the libSQL schema uses F32_BLOB(1536) and cannot store different-dimension vectors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…-replay-embeddings

# Conflicts:
#	src/llm/failover.rs
#	src/llm/nearai_chat.rs
#	src/llm/rig_adapter.rs
#	src/main.rs
Pull request overview
Copilot reviewed 26 out of 26 changed files in this pull request and generated 1 comment.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Pull request overview
Copilot reviewed 26 out of 26 changed files in this pull request and generated 5 comments.
```rust
let mut idx = 0usize;
while idx < tool_calls.len() {
    let mut tc = tool_calls[idx].clone();

    // Check if tool requires approval
```
Switching to a manual while idx < tool_calls.len() loop requires ensuring idx is advanced on every control-flow path. There are continue branches later in this loop body (e.g. hook rejection / policy error) that will now skip idx += 1 and can cause an infinite loop on the same tool call.
```rust
// Wait for at least 1 retry attempt (backoff is ~1s, so 1.5s should be enough)
tokio::time::sleep(Duration::from_millis(1500)).await;
```
This test introduces a 1.5s wall-clock sleep to wait for retry backoff, which will noticeably slow the suite and can be flaky under load. Consider making backoff delays configurable for tests, using an error with retry_after=Duration::ZERO, or using Tokio time control to advance time without real sleeps.
Suggested change:

```rust
// Wait briefly before flipping; this should allow at least one failure before retry
tokio::time::sleep(Duration::from_millis(10)).await;
```
```rust
let dimension = optional_env("EMBEDDING_DIMENSION")?
    .map(|s| s.parse::<usize>())
    .transpose()
    .map_err(|e| ConfigError::InvalidValue {
        key: "EMBEDDING_DIMENSION".to_string(),
        message: format!("must be a positive integer: {e}"),
    })?
    .unwrap_or_else(|| default_dimension_for_model(&model));
```
EMBEDDING_DIMENSION is parsed as usize, but the code (and error message) implies it must be a positive integer. As written, 0 is accepted and will propagate into embedding providers / DB writes where a zero-dimension vector is invalid. Add an explicit > 0 validation and return ConfigError::InvalidValue when the parsed dimension is 0.
Suggested change:

```rust
let dimension_opt = optional_env("EMBEDDING_DIMENSION")?
    .map(|s| s.parse::<usize>())
    .transpose()
    .map_err(|e| ConfigError::InvalidValue {
        key: "EMBEDDING_DIMENSION".to_string(),
        message: format!("must be a positive integer: {e}"),
    })?;
if let Some(0) = dimension_opt {
    return Err(ConfigError::InvalidValue {
        key: "EMBEDDING_DIMENSION".to_string(),
        message: "must be a positive integer greater than zero".to_string(),
    });
}
let dimension = dimension_opt.unwrap_or_else(|| default_dimension_for_model(&model));
```
```rust
let ollama_base_url = optional_env("OLLAMA_BASE_URL")?
    .or_else(|| settings.ollama_base_url.clone())
    .unwrap_or_else(|| "http://localhost:11434".to_string());
```
New env vars (OLLAMA_BASE_URL, EMBEDDING_DIMENSION) were added for embeddings resolution. The env-mutating tests in this module should also clear these vars in the clear_embedding_env() helper to avoid flakiness when they are set in the surrounding test environment.
```rust
    input: texts,
};

let url = format!("{}/api/embed", self.base_url);
```
OllamaEmbeddings builds the endpoint with format!("{}/api/embed", self.base_url). If base_url is configured with a trailing slash, this produces a //api/embed path which can 404 (depending on server path normalization / redirect behavior). Consider normalizing base_url (e.g., trimming trailing /) before formatting the endpoint URL.
Suggested change:

```rust
let base_url = self.base_url.trim_end_matches('/');
let url = format!("{}/api/embed", base_url);
```
Port relevant changes from PR #112 that were not carried over to #237:

- Add persist_turn calls in process_approval for the response, error, and auth-required paths. Previously, turns completed after tool approval were never persisted to DB — if the process crashed after approval the entire turn (user message + assistant response) was lost.
- Add agent-level unit tests: StaticLlmProvider mock, make_test_agent helper, tests for auto-approval logic, destructive shell command detection, and PendingApproval backward-compatible deserialization (without deferred_tool_calls field).
- Remove unused _thread_state binding in process_approval.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: persist turns after approval and add agent-level tests

* fix: address 14 audit findings in src/agent/

  Audit of the agent module found 2 High, 7 Medium, 3 Low, and 2 Nit severity issues. This commit fixes all of them:

  High:
  - Remove 4 `.expect()` calls in session.rs (entry API, match, direct indexing, if-let) to eliminate panic paths in production
  - Add typed RoutineError enum replacing Result<_, String> across routine.rs, routine_engine.rs, and callers in history/store.rs and db/libsql/mod.rs

  Medium:
  - Sanitize routine names in path construction to prevent directory traversal (routine_engine.rs)
  - Log warnings for 5 silently-swallowed errors in scheduler.rs, compaction.rs, and worker.rs
  - Extract shared handle_auth_intercept helper to deduplicate auth interception in thread_ops.rs
  - Add session count warning threshold in session_manager.rs
  - Make FullJob stub degradation visible via warn-level log and prepended warning in output

  Low:
  - Restrict dead code visibility with #[cfg(test)] on 19 unused items in submission.rs, task.rs, and undo.rs
  - Narrow pub to pub(crate) on self_repair.rs builder methods
  - Remove TaskStatus from mod.rs re-exports (test-only type)

* fix: address PR review comments

  - Reorder persist_turn before persist_response_chain so the conversation row exists before the metadata UPDATE runs
  - Add persist_response_chain call to handle_auth_intercept so auth-required paths preserve the response chain
  - Harden sanitize_routine_name to use allowlist (alphanumeric, dash, underscore) instead of denylist replacements
  - Fix stale active_thread ID in get_or_create_thread: fall back to create_thread() when the stored ID is missing from the map
  - Persist turn on approval rejection so user messages survive crashes after a tool is rejected

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…gs defaults (nearai#237)

* fix: harden openai-compatible tool flow and local defaults
* fix: close approval replay gaps and harden openai-compatible flow
* fix: address review feedback and code improvements (takeover nearai#112)
* fix: harden src/llm/ module from crate audit findings
* refactor(llm): add RetryProvider, remove openai_compatible_chat, harden decorators
* fix: address PR review feedback — error handling, dimension validation, libSQL warning
* Apply suggestions from code review

Co-authored-by: panosAthDbx <packux@gmail.com>
Co-authored-by: panosAthDBX <127238517+panosAthDBX@users.noreply.github.com>
Co-authored-by: panosAthDBX <panosAthDBX@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
* fix: persist turns after approval and add agent-level tests Port relevant changes from PR nearai#112 that were not carried over to nearai#237: - Add persist_turn calls in process_approval for the response, error, and auth-required paths. Previously, turns completed after tool approval were never persisted to DB — if the process crashed after approval the entire turn (user message + assistant response) was lost. - Add agent-level unit tests: StaticLlmProvider mock, make_test_agent helper, tests for auto-approval logic, destructive shell command detection, and PendingApproval backward-compatible deserialization (without deferred_tool_calls field). - Remove unused _thread_state binding in process_approval. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address 14 audit findings in src/agent/ Audit of the agent module found 2 High, 7 Medium, 3 Low, and 2 Nit severity issues. This commit fixes all of them: High: - Remove 4 `.expect()` calls in session.rs (entry API, match, direct indexing, if-let) to eliminate panic paths in production - Add typed RoutineError enum replacing Result<_, String> across routine.rs, routine_engine.rs, and callers in history/store.rs and db/libsql/mod.rs Medium: - Sanitize routine names in path construction to prevent directory traversal (routine_engine.rs) - Log warnings for 5 silently-swallowed errors in scheduler.rs, compaction.rs, and worker.rs - Extract shared handle_auth_intercept helper to deduplicate auth interception in thread_ops.rs - Add session count warning threshold in session_manager.rs - Make FullJob stub degradation visible via warn-level log and prepended warning in output Low: - Restrict dead code visibility with #[cfg(test)] on 19 unused items in submission.rs, task.rs, and undo.rs - Narrow pub to pub(crate) on self_repair.rs builder methods - Remove TaskStatus from mod.rs re-exports (test-only type) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: address PR review comments - Reorder 
persist_turn before persist_response_chain so the conversation row exists before the metadata UPDATE runs - Add persist_response_chain call to handle_auth_intercept so auth-required paths preserve the response chain - Harden sanitize_routine_name to use allowlist (alphanumeric, dash, underscore) instead of denylist replacements - Fix stale active_thread ID in get_or_create_thread: fall back to create_thread() when the stored ID is missing from the map - Persist turn on approval rejection so user messages survive crashes after a tool is rejected Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
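The sanitize_routine_name hardening described in the commits above (allowlist of alphanumerics, dashes, and underscores instead of denylist replacements) can be sketched roughly as follows. The function signature and the path-joining helper are illustrative, not the crate's actual API:

```rust
use std::path::PathBuf;

// Allowlist sanitizer: keep only characters that can never introduce a
// path separator or a `..` segment, so a routine name cannot traverse
// out of the routines directory.
fn sanitize_routine_name(name: &str) -> String {
    name.chars()
        .filter(|c| c.is_ascii_alphanumeric() || *c == '-' || *c == '_')
        .collect()
}

// Hypothetical helper showing where the sanitizer would be applied.
fn routine_path(base: &str, name: &str) -> PathBuf {
    PathBuf::from(base).join(sanitize_routine_name(name))
}

fn main() {
    // Traversal attempts collapse to a harmless file name.
    assert_eq!(sanitize_routine_name("../../etc/passwd"), "etcpasswd");
    // Ordinary names pass through unchanged.
    assert_eq!(sanitize_routine_name("daily-report_v2"), "daily-report_v2");
    println!("{}", routine_path("/routines", "../../x").display());
}
```

An allowlist is preferable to replacing known-bad characters because any character the denylist forgets is rejected by default.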
Summary
Continuation of #112 by @panosAthDBX.
Hardens the OpenAI-compatible chat provider, adds multi-tool approval replay with deferred tool calls, wires Ollama embeddings, and fixes several robustness issues across LLM providers.
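A rough sketch of the deferred-tool-call mechanic: when one call in a batch requires approval, the calls after it are stashed on the pending approval and replayed once the user responds. The `PendingApproval` and `deferred_tool_calls` names follow the PR's description, but the exact shapes and the `split_for_approval` helper are assumptions for illustration:

```rust
#[derive(Debug, Clone, PartialEq)]
struct ToolCall {
    name: String,
}

// Hypothetical shape: the real struct also carries arguments, ids, etc.
#[derive(Debug)]
struct PendingApproval {
    awaiting: ToolCall,
    // Older persisted records lack this field, so deserialization should
    // default it to empty (the backward-compatibility case the PR tests).
    deferred_tool_calls: Vec<ToolCall>,
}

// Split a batch at the first call needing approval: everything before it
// can run now; the flagged call plus the remainder wait for the user.
fn split_for_approval(
    calls: Vec<ToolCall>,
    needs_approval: impl Fn(&ToolCall) -> bool,
) -> (Vec<ToolCall>, Option<PendingApproval>) {
    let mut runnable = Vec::new();
    let mut iter = calls.into_iter();
    while let Some(call) = iter.next() {
        if needs_approval(&call) {
            return (
                runnable,
                Some(PendingApproval {
                    awaiting: call,
                    deferred_tool_calls: iter.collect(),
                }),
            );
        }
        runnable.push(call);
    }
    (runnable, None)
}

fn main() {
    let batch = vec![
        ToolCall { name: "read_file".into() },
        ToolCall { name: "shell".into() },      // needs approval
        ToolCall { name: "write_file".into() }, // deferred, replayed later
    ];
    let (now, pending) = split_for_approval(batch, |c| c.name == "shell");
    assert_eq!(now.len(), 1);
    let pending = pending.unwrap();
    assert_eq!(pending.awaiting.name, "shell");
    assert_eq!(pending.deferred_tool_calls.len(), 1);
}
```

After approval, replay resumes from `deferred_tool_calls` rather than re-asking the model, which is what makes persisting the turn around the approval boundary matter.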
Changes included (from original PR)
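Among the changes listed below, the tool-message sanitization can be sketched as a single pass over the history: a `tool_result` whose call id has no preceding tool call is rewritten as a plain user message, so strict providers do not reject the request with HTTP 400. The message enum here is illustrative, not the crate's actual types:

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, PartialEq)]
enum Message {
    User(String),
    ToolCall { call_id: String },
    ToolResult { call_id: String, output: String },
}

// Rewrite orphaned tool results (no matching earlier tool call) as
// user messages; everything else passes through unchanged.
fn sanitize_tool_messages(history: Vec<Message>) -> Vec<Message> {
    let mut seen_calls = HashSet::new();
    history
        .into_iter()
        .map(|msg| match msg {
            Message::ToolCall { call_id } => {
                seen_calls.insert(call_id.clone());
                Message::ToolCall { call_id }
            }
            Message::ToolResult { call_id, output } if !seen_calls.contains(&call_id) => {
                // Orphaned result: downgrade to a user message.
                Message::User(format!("[tool result] {output}"))
            }
            other => other,
        })
        .collect()
}

fn main() {
    // Orphaned result is rewritten.
    let clean = sanitize_tool_messages(vec![Message::ToolResult {
        call_id: "t1".into(),
        output: "ok".into(),
    }]);
    assert_eq!(clean, vec![Message::User("[tool result] ok".into())]);

    // A properly paired result is kept as-is.
    let kept = sanitize_tool_messages(vec![
        Message::ToolCall { call_id: "t1".into() },
        Message::ToolResult { call_id: "t1".into(), output: "ok".into() },
    ]);
    assert_eq!(kept.len(), 2);
}
```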
- `sanitize_tool_messages`: rewrites orphaned `tool_result` messages as user messages to prevent HTTP 400 from Anthropic/others. Applied consistently across all providers
- `OllamaEmbeddings` provider with configurable model/dimension
- `/approve`, `/always`, `/deny`, `a`, `n`, etc. in the submission parser
- `OPENAI_BASE_URL`, `ANTHROPIC_BASE_URL` for proxy support; `OLLAMA_BASE_URL` for embeddings
- `ghcr.io/nearai/sandbox:latest` changed to `ironclaw-worker:latest`
- `thread_id` on `ApprovalNeeded` SSE events for thread-scoped filtering
- `/quit` handled as a message so the agent loop exits when other channels are active
- `with_llm_provider` for OpenAI-compatible API proxy

Composable RetryProvider & LLM module hardening
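A minimal sketch of the retry-delay policy this section describes: exponential backoff with a cap plus jitter, where an explicit `retry_after` hint from a `RateLimited` error takes precedence over the schedule. The constants and the deterministic jitter parameter are illustrative, not `RetryProvider`'s actual internals (the real wrapper would draw the jitter fraction at random):

```rust
use std::time::Duration;

// Compute the delay before retry number `attempt` (0-based).
// `retry_after` is the provider's hint, if any; `jitter_frac` stands in
// for a random value in [0, 0.25) in the real implementation.
fn retry_delay(attempt: u32, retry_after: Option<Duration>, jitter_frac: f64) -> Duration {
    if let Some(hint) = retry_after {
        return hint; // honor the server's explicit hint
    }
    let base_ms = 500u64.saturating_mul(1u64 << attempt.min(6)); // 500ms, 1s, 2s, ...
    let capped = base_ms.min(30_000); // cap at 30s
    let jitter = (capped as f64 * jitter_frac) as u64;
    Duration::from_millis(capped + jitter)
}

fn main() {
    // A RateLimited hint wins over the backoff schedule.
    assert_eq!(
        retry_delay(3, Some(Duration::from_secs(7)), 0.0),
        Duration::from_secs(7)
    );
    // Without a hint, the delay doubles per attempt (jitter disabled here).
    assert_eq!(retry_delay(0, None, 0.0), Duration::from_millis(500));
    assert_eq!(retry_delay(2, None, 0.0), Duration::from_millis(2000));
}
```

Keeping this policy in one composable wrapper is what removes the double-retry problem: inner providers no longer loop on their own, so the total attempt count stays bounded by the wrapper alone.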
- `RetryProvider` (`retry.rs`): composable wrapper for any `LlmProvider` with exponential backoff + jitter. Respects `RateLimited { retry_after }` hints. Wired into the main.rs composition chain so each provider retries independently before failover.
- `openai_compatible_chat.rs`: replaced by rig adapter + RetryProvider. The rig-based adapter now handles all OpenAI-compatible endpoints (including custom base URLs), with tool name normalization moved to `rig_adapter.rs`.
- `nearai.rs` and `nearai_chat.rs` no longer have internal retry loops — retries are handled by the external RetryProvider wrapper, eliminating double-retry (was up to 16 attempts, now correctly 4).
- `is_retryable()` (retry/failover) and `is_transient()` (circuit breaker) now have clear, documented semantics. `ModelNotAvailable` is no longer retryable; `Json` no longer trips the circuit breaker.
- `circuit_breaker.rs` uses `checked_sub` to avoid a panic on a TOCTOU race.
- `failover.rs` now uses the shared `is_retryable()` from `retry.rs` instead of duplicating classification logic.

Changes from original
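The per-model embedding-dimension defaults might look like the sketch below. The `EMBEDDING_DIMENSION` override and the 768/1536 defaults come from this PR; the specific model-name substrings and the 3072 entry are assumptions for illustration, not the crate's exact table:

```rust
use std::env;

// Resolve the embedding dimension: an EMBEDDING_DIMENSION env var wins
// if set and parseable; otherwise infer a default from the model name.
fn embedding_dimension(model: &str) -> usize {
    if let Ok(v) = env::var("EMBEDDING_DIMENSION") {
        if let Ok(dim) = v.parse::<usize>() {
            return dim;
        }
    }
    match model {
        m if m.contains("nomic-embed-text") => 768,        // common Ollama default
        m if m.contains("text-embedding-3-large") => 3072, // OpenAI large model
        _ => 1536, // e.g. text-embedding-3-small / ada-002
    }
}

fn main() {
    // With EMBEDDING_DIMENSION unset, per-model defaults apply.
    assert_eq!(embedding_dimension("nomic-embed-text"), 768);
    assert_eq!(embedding_dimension("text-embedding-3-large"), 3072);
    assert_eq!(embedding_dimension("text-embedding-3-small"), 1536);
}
```

Centralizing this lookup is what allowed the duplicated inference logic to be dropped from main.rs, and it pairs with the runtime warning for the libSQL `F32_BLOB(1536)` schema when a non-1536 dimension is configured.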
- `ChatCompletionResponse.id` changed to `Option<String>` (reviewer feedback: some providers omit this field)
- `unwrap_or_else(|_| Client::new())` removed to properly propagate errors (reviewer feedback: silently dropped timeout config)
- `EMBEDDING_DIMENSION` env var with smart per-model defaults instead of hardcoding 768/1536
- `/review-crate @src/llm/` audit — fixed all Medium findings

Original PR
#112 - fix: harden openai-compatible provider, approval replay, and embeddings defaults
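The circuit-breaker hardening mentioned above (an unchecked `Duration` subtraction that could panic on a TOCTOU race) amounts to saturating the "time until half-open" computation to zero. A sketch with illustrative names, using `checked_duration_since` in the same spirit as the PR's `checked_sub` fix:

```rust
use std::time::{Duration, Instant};

// How long until the breaker may transition to half-open. If the
// deadline already passed between the caller's check and this
// computation (the TOCTOU race), return zero instead of panicking.
fn remaining_cooldown(opened_at: Instant, cooldown: Duration, now: Instant) -> Duration {
    let deadline = opened_at + cooldown;
    deadline.checked_duration_since(now).unwrap_or(Duration::ZERO)
}

fn main() {
    let t0 = Instant::now();
    // Deadline still in the future: some cooldown remains.
    assert!(remaining_cooldown(t0, Duration::from_secs(60), t0) > Duration::ZERO);
    // Deadline already passed: saturates to zero instead of panicking.
    assert_eq!(
        remaining_cooldown(t0, Duration::from_secs(1), t0 + Duration::from_secs(5)),
        Duration::ZERO
    );
}
```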
Review comments addressed
- `ChatCompletionResponse.id` changed from `String` to `Option<String>` with `#[serde(default)]`
- Embedding dimension is read from the `EMBEDDING_DIMENSION` env var or inferred from the model name
- `unwrap_or_else(|_| Client::new())` replaced with proper error propagation
- libSQL `F32_BLOB(1536)` schema noted as a known limitation

Test plan
Co-Authored-By: panosAthDBX <panosAthDBX@users.noreply.github.com>
Generated with Claude Code