fix: model provider config hardening #2572
Conversation
…early validation

- Extract duplicated isConfigured logic into a shared isProviderConfigured() helper in gateway app.js to prevent render/activate divergence
- Replace derive(Debug) with manual impls on CustomLlmProviderSettings and LlmBuiltinOverride to redact api_key from log output after hydration
- Add base_url SSRF validation at save time in validate_custom_providers() instead of deferring to LlmConfig::resolve()
- Downgrade tracing::info! to debug! in resolve_custom_provider and the BEDROCK_REGION default (config resolution is not user-facing status)
- Change test_connection 400/422 from ok:true to ok:false to avoid a misleading green badge when the model name or endpoint is wrong

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Code Review
This pull request introduces several enhancements to the LLM provider configuration and validation logic. Key changes include a new client-side check for provider configuration status, UI updates to display unconfigured badges, and stricter guards in the setActiveProvider function to ensure API keys, base URLs, and models are present before activation. On the backend, SSRF validation is now performed on custom provider base URLs, connection test error handling is improved to prevent misleading status badges, and sensitive API keys are now redacted in debug logs. Feedback suggests using const instead of var in JavaScript for better scoping and refactoring duplicated dialog logic into a helper function.
- isProviderConfigured() treated all custom providers as requiring an API key because CustomLlmProviderSettings has no api_key_required field (undefined !== false -> true). Derive needsKey from the adapter type instead: ollama runs locally and needs no key; other adapters do.
- Extract status-code interpretation from interpret_chat_response() into a pure interpret_chat_status() helper so it can be tested without mocking reqwest::Response.
- Add regression tests covering 400/422 -> ok:false (the bug where a green badge was shown for a wrong model/endpoint), plus 200/401 for coverage.
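The extracted status interpreter described above can be sketched as a pure function over the HTTP status code. This is an illustrative reconstruction, not the PR's actual code: the `TestOutcome` shape and messages are assumptions; only the 400/422 -> ok:false and 200/401 behavior comes from the review notes.

```rust
// Sketch of a pure status interpreter for the test-connection endpoint,
// assuming a simplified outcome type (the real handler builds a JSON body).
#[derive(Debug, PartialEq)]
struct TestOutcome {
    ok: bool,
    message: &'static str,
}

fn interpret_chat_status(status: u16) -> TestOutcome {
    match status {
        200..=299 => TestOutcome { ok: true, message: "connection succeeded" },
        // 400/422 mean the endpoint answered but rejected the request
        // (wrong model name, malformed payload) — previously reported ok:true.
        400 | 422 => TestOutcome { ok: false, message: "endpoint rejected request (check model name)" },
        401 | 403 => TestOutcome { ok: false, message: "authentication failed (check API key)" },
        _ => TestOutcome { ok: false, message: "unexpected status" },
    }
}

fn main() {
    assert!(interpret_chat_status(200).ok);
    assert!(!interpret_chat_status(400).ok);
    assert!(!interpret_chat_status(422).ok);
    assert!(!interpret_chat_status(401).ok);
    println!("all status interpretations behave as expected");
}
```

Because the helper takes a bare `u16` rather than a `reqwest::Response`, the regression cases can be asserted directly without any HTTP mocking.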
Pull request overview
Hardens LLM model provider configuration handling to prevent activation of incomplete providers (avoiding restart/crash loops) and improves security around secrets, SSRF validation, and UI feedback for configuration state.
Changes:
- Redacts API keys from Rust `Debug` output for settings structs after secrets hydration, with regression tests.
- Adds SSRF validation for custom provider `base_url` at settings-save time and improves `/api/llm/test_connection` status interpretation (400/422 now `ok:false`).
- Updates gateway UI to detect configured providers consistently, show a "Not Configured" badge, hide "Use" until configured, add activation guards, and include `env_model` in model restoration.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/settings.rs | Manual Debug impls to redact api_key; adds tests to ensure redaction. |
| src/config/llm.rs | Lowers log level in config resolution paths to reduce noise. |
| src/channels/web/handlers/settings.rs | Adds SSRF validation for custom provider base_url during save; adds tests. |
| src/channels/web/handlers/llm.rs | Refactors status handling for test connection; fixes 400/422 to ok:false; adds tests. |
| src/channels/wasm/setup.rs | Suppresses clippy too_many_arguments warning for startup registration helper. |
| crates/ironclaw_gateway/static/style.css | Adds styling for “unconfigured provider” badge. |
| crates/ironclaw_gateway/static/i18n/en.js | Adds strings for “Not Configured” and activation guard toasts. |
| crates/ironclaw_gateway/static/i18n/zh-CN.js | Adds corresponding zh-CN strings for new provider UI states. |
| crates/ironclaw_gateway/static/i18n/ko.js | Adds corresponding ko strings for new provider UI states. |
| crates/ironclaw_gateway/static/app.js | Adds shared isProviderConfigured() + activation guards; updates provider rendering and model restore chain. |
As discussed today, @italic-jinxin will help fix the crash loop when a model is misconfigured in this PR. This is a work in progress.
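The api_key redaction noted in the review above can be sketched as a manual `Debug` impl. The struct below is a simplified stand-in for `CustomLlmProviderSettings` (only illustrative fields), but the pattern — drop the derive and substitute a fixed marker for the secret — is the one the PR describes.

```rust
use std::fmt;

// Simplified stand-in for CustomLlmProviderSettings; only the fields
// needed to show the redaction pattern are included.
struct CustomLlmProviderSettings {
    base_url: String,
    api_key: Option<String>,
}

// Manual Debug impl instead of #[derive(Debug)]: the hydrated api_key
// never reaches log output, only a fixed "[REDACTED]" marker.
impl fmt::Debug for CustomLlmProviderSettings {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("CustomLlmProviderSettings")
            .field("base_url", &self.base_url)
            .field("api_key", &self.api_key.as_ref().map(|_| "[REDACTED]"))
            .finish()
    }
}

fn main() {
    let s = CustomLlmProviderSettings {
        base_url: "https://api.example.com".into(),
        api_key: Some("sk-secret".into()),
    };
    let out = format!("{s:?}");
    assert!(!out.contains("sk-secret"));
    assert!(out.contains("REDACTED"));
    println!("{out}");
}
```

A regression test like the `assert!(!out.contains("sk-secret"))` check above is what keeps a future re-derive of `Debug` from silently reintroducing the leak.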
Pull request overview
Hardens LLM provider configuration so incomplete/unsafe provider setups don’t lead to misleading UI state or backend startup failures (issue #2514), while also tightening SSRF defenses and secret handling.
Changes:
- Add NearAI fallback resolution for unusable LLM configs and sync post-fallback backend/model state back to the DB.
- Harden provider UX/activation logic (configured badges, “Use” gating, model restoration chain incl. env model) and fix test-connection status interpretation.
- Redact API keys from `Debug` output and validate custom provider `base_url` for SSRF at save time.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/settings.rs | Manual Debug impls to redact api_key; adds tests to ensure no secret leaks via logs. |
| src/config/mod.rs | Switch final LLM resolve to fallback-aware path; persist post-fallback backend to DB and clear stale selected_model; adds alias normalization helpers + tests. |
| src/config/llm.rs | Introduces resolve_with_fallback() (NearAI demotion), adds "unusable config" detection, and downgrades some config logs to debug!; adds extensive fallback tests. |
| src/channels/web/handlers/llm.rs | Extracts status-code interpretation for testability, fixes 400/422 to report ok:false, and adds base_url_required to provider metadata. |
| src/channels/web/features/settings/mod.rs | Validates custom provider base_url against SSRF on save and adds tests for safe/unsafe URLs. |
| crates/ironclaw_gateway/static/styles/surfaces/config.css | Adds styling for an "unconfigured provider" badge. |
| crates/ironclaw_gateway/static/js/surfaces/config.js | Centralizes "is configured" logic, gates activation, adds model/env-model restoration, and improves user feedback + config dialog opening. |
| crates/ironclaw_gateway/static/i18n/en.js | Adds strings for new provider-state/toast messages and restart tooltips. |
| crates/ironclaw_gateway/static/i18n/ko.js | Same i18n additions (Korean). |
| crates/ironclaw_gateway/static/i18n/zh-CN.js | Same i18n additions (Simplified Chinese). |
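The save-time `base_url` SSRF screening reviewed above can be sketched as follows. This is a minimal illustration, not the PR's validator: the function name, error strings, and host parsing are assumptions; the real code would also need DNS resolution and fuller IPv6 handling.

```rust
use std::net::IpAddr;

// Minimal save-time SSRF check for a custom provider base_url.
// Only shows the shape of the scheme/host screening; a production
// validator must also resolve hostnames before trusting them.
fn validate_base_url(base_url: &str) -> Result<(), String> {
    let rest = base_url
        .strip_prefix("https://")
        .or_else(|| base_url.strip_prefix("http://"))
        .ok_or_else(|| "base_url must use http(s)".to_string())?;

    // Host is everything up to the first '/', ':' or '?' after the scheme.
    let host = rest
        .split(|c: char| c == '/' || c == ':' || c == '?')
        .next()
        .unwrap_or("");
    if host.is_empty() || host == "localhost" {
        return Err("base_url host not allowed".into());
    }
    // Literal IPs in loopback / private / link-local ranges are rejected.
    if let Ok(ip) = host.parse::<IpAddr>() {
        let forbidden = match ip {
            IpAddr::V4(v4) => {
                v4.is_loopback() || v4.is_private() || v4.is_link_local() || v4.is_unspecified()
            }
            IpAddr::V6(v6) => v6.is_loopback() || v6.is_unspecified(),
        };
        if forbidden {
            return Err("base_url points at a private address".into());
        }
    }
    Ok(())
}

fn main() {
    assert!(validate_base_url("https://api.openai.com/v1").is_ok());
    assert!(validate_base_url("http://127.0.0.1:8080").is_err());
    assert!(validate_base_url("http://localhost/v1").is_err());
    assert!(validate_base_url("ftp://example.com").is_err());
    println!("ssrf checks behave as expected");
}
```

Running the check at save time, rather than deferring to `LlmConfig::resolve()`, means a hostile URL is rejected before it is ever persisted or dialed.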
…oid spurious DB rewrites

normalize_backend() hardcoded only a handful of aliases, but providers.json declares many more (claude → anthropic, bigmodel → zai, github-copilot → github_copilot, open_router → openrouter, ...). A DB value like `claude` resolves to `anthropic` via resolve_registry_provider, which my fallback_fired() then mistook for a cross-backend demotion — overwriting the user's `llm_backend` row on every startup.
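The fix described in that commit can be sketched with a small alias map. The alias pairs below come straight from the commit message; everything else (function names, the map being hardcoded rather than derived from providers.json) is illustrative only.

```rust
use std::collections::HashMap;

// Sketch: resolve provider aliases before comparing backends, so that
// `claude` vs `anthropic` is alias normalization, not a fallback demotion.
// The real code derives this table from providers.json instead of
// hardcoding it.
fn alias_map() -> HashMap<&'static str, &'static str> {
    HashMap::from([
        ("claude", "anthropic"),
        ("bigmodel", "zai"),
        ("github-copilot", "github_copilot"),
        ("open_router", "openrouter"),
    ])
}

fn normalize_backend<'a>(name: &'a str, aliases: &HashMap<&str, &'a str>) -> &'a str {
    aliases.get(name).copied().unwrap_or(name)
}

// A fallback only "fired" if the normalized backends actually differ;
// comparing raw names caused the spurious DB rewrite on every startup.
fn fallback_fired(db_backend: &str, resolved: &str, aliases: &HashMap<&str, &str>) -> bool {
    normalize_backend(db_backend, aliases) != normalize_backend(resolved, aliases)
}

fn main() {
    let aliases = alias_map();
    // `claude` resolving to `anthropic` is alias resolution, not a demotion:
    assert!(!fallback_fired("claude", "anthropic", &aliases));
    // A genuine cross-backend change still registers:
    assert!(fallback_fired("anthropic", "nearai", &aliases));
    println!("alias-aware comparison avoids spurious DB rewrites");
}
```

With both sides normalized, the `llm_backend` row is only rewritten when the effective backend genuinely changed.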
think-in-universe
left a comment
LGTM. The PR has been well tested end-to-end.
We're ready to merge.
`test_auth_required_sse_without_duplicate_response` was failing in the merge queue on every PR (most recently bouncing #3197 and #3203 from the queue) because the `auth_sse_server` fixture never set `LLM_API_KEY` in the spawned ironclaw env. After #2572 added a missing-API-key check to the openai_compatible config validator (Apr 22), ironclaw rejected the env-supplied openai_compatible config, fell back to the NearAI default, hit "missing session token", and failed the turn before the github skill could even fire its 401. The chat thus reached `state: Failed` with no tool calls and no `onboarding_state/auth_required` event — which is exactly what the test asserted on, hence the consistent failure. Adding `LLM_API_KEY=mock-api-key` matches the value already used in every other e2e fixture (auth_matrix, conftest's ironclaw_server, v2_engine, etc.) and unblocks the assertion. Local run: PASSED in 8s.
…ange (#3235)

* ci(e2e): replace deleted preflight test with tool_activate surface

The v2-engine E2E group still listed test_v2_kernel_auth_preflight.py, but that file was removed in #2868 (engine-v2: callable-only available actions) and replaced with test_v2_tool_activate_surface.py for the new tool_activate / Activatable Integrations contract. The Web E2E Full job is skipped on PR-level CI but runs in the merge queue, so the bad path filter dequeued #3197 and #3203 with "file or directory not found: test_v2_kernel_auth_preflight.py".

* test(e2e): unblock Live Canary auth lanes after engine-v2 contract change

The Live Canary "Auth Smoke", "Auth Full", and "Auth Live Seeded" jobs have failed every scheduled run since 2026-05-01 (when the canary cut over to main). Three tests in test_v2_auth_oauth_matrix.py drive the failures, all rooted in the engine-v2 callable-only contract from #2868 that didn't exist when these tests were written.

## What was broken

`test_mcp_same_server_multi_user_via_browser`

After OAuth completes, sending "check mock mcp search" through each user's browser opens an `approval` pending_gate on the first MCP tool call (engine v2 default). The browser fixture has no auto-approve UI, so the chat sat in `pending_gate` for the full 5-min Playwright timeout — `expected_text_contains="Mock MCP search result"` could never match because the assistant bubble never received any text.

`test_wasm_tool_oauth_refresh_on_demand`

Same shape: the gmail call gates on `approval` before reaching the http credential-injection layer that performs the OAuth refresh. Without approving, refresh_count never went above 0, so the test failed with "Timed out waiting for OAuth refresh request".

`test_wasm_tool_first_chat_auth_attempt_emits_auth_url`

Tested OLD engine-v2 behavior — that an LLM-emitted call to a not-yet-authed extension would surface a `gate_required` Authentication event with an auth URL.
After #2868, the engine returns "action 'gmail' is not callable in this execution context" instead, and `tool_activate` became the model-facing enablement path. The mock LLM is canned to emit tool calls directly, so this scenario can't be reproduced from a scripted LLM until the canned response is updated.

## Fixes

- `_wait_for_tool_call`: accept a `token` kwarg so multi-user tests can poll/approve through a per-user identity. Backwards-compatible.
- `test_mcp_same_server_multi_user_via_browser`: drive approval through the per-user API while waiting for the tool to land. Drop the broken `expected_text_contains` predicate and the tied "Mock MCP search result" text assertions; the bearer-token isolation assertion (what this test actually exists to prove) is retained and unaffected.
- `test_wasm_tool_oauth_refresh_on_demand`: insert a `_wait_for_tool_call` approval step between `_send_chat` and `_wait_for_refresh_request` so the http credential layer actually runs.
- `test_wasm_tool_first_chat_auth_attempt_emits_auth_url`: marked xfail with an inline reason pointing at #2868 and the replacement coverage (`test_v2_tool_activate_surface.py`, `test_settings_first_gmail_auth_then_chat_runs`).
- Drop the now-unused `send_chat_and_wait_for_terminal_message` import.

## conftest fix

`ironclaw_server` now sets `SECRETS_MASTER_KEY` in the spawned env. On macOS without it, `auto_generate_and_persist` blocks on a Keychain authorization prompt that no one's home to click, so `wait_for_ready` times out at 60s and the fixture kills the process with SIGKILL — making any session-scoped browser test impossible to run locally. On Linux, the keychain backend errors fast and the auto-generate fallback writes to `.env`, so CI was unaffected. Setting the key explicitly matches the pattern already used in `auth_matrix_server`, `test_v2_engine_auth_cancel`, `test_v2_tool_activate_surface`, etc.

## Verification

Local repro confirmed each failure mode (HTTP-only repro for the non-browser tests, server-side log inspection for the multi-user test). Reproduced the exact pending_gate=approval pattern, fixed it, verified the assertion semantics still hold:

```
$ pytest tests/e2e/scenarios/test_v2_auth_oauth_matrix.py::test_wasm_tool_oauth_refresh_on_demand
PASSED in 6.74s
$ pytest tests/e2e/scenarios/test_v2_auth_oauth_matrix.py::test_wasm_tool_first_chat_auth_attempt_emits_auth_url
XFAIL in 93s
$ pytest tests/e2e/scenarios/test_v2_auth_oauth_matrix.py -v --timeout=120
12 passed, 2 skipped, 3 xfailed (browser tests errored locally; they'll run cleanly in CI)
```

The browser-driven `test_mcp_same_server_multi_user_via_browser` couldn't be exercised locally (chromium can't launch under this shell sandbox), but the API + auto-approve flow it now relies on is exercised by an HTTP-equivalent repro and matches the pattern used in `test_settings_first_gmail_auth_then_chat_runs`.

* test(e2e): set LLM_API_KEY in auth_sse_server fixture

`test_auth_required_sse_without_duplicate_response` was failing in the merge queue on every PR (most recently bouncing #3197 and #3203 from the queue) because the `auth_sse_server` fixture never set `LLM_API_KEY` in the spawned ironclaw env. After #2572 added a missing-API-key check to the openai_compatible config validator (Apr 22), ironclaw rejected the env-supplied openai_compatible config, fell back to the NearAI default, hit "missing session token", and failed the turn before the github skill could even fire its 401. The chat thus reached `state: Failed` with no tool calls and no `onboarding_state/auth_required` event — which is exactly what the test asserted on, hence the consistent failure. Adding `LLM_API_KEY=mock-api-key` matches the value already used in every other e2e fixture (auth_matrix, conftest's ironclaw_server, v2_engine, etc.) and unblocks the assertion. Local run: PASSED in 8s.
* fix(gateway): suppress duplicate assistant bubble after streamed response [skip-regression-check]

The SSE `response` handler unconditionally called addMessage('assistant', data.content) even when stream_chunks had already populated and finalized a bubble for the same response. This stayed invisible in the common case but surfaced as a hard test failure under the path test_switching_back_preserves_in_progress_turn:

1. Send "What is 2+2?" on thread A — stream chunks start filling an assistant bubble with "data-streaming".
2. Switch to thread B mid-stream — container clears (history reload).
3. Switch back to thread A — history rehydration shows the in-progress turn with no response yet, so 0 assistant bubbles in DOM.
4. Stream chunks continue to fire for A — appendToLastAssistant creates a new bubble and accumulates the response into it.
5. response event fires — flushes any remaining buffer, removes the data-streaming flag (good) — then addMessage('assistant', content) creates a SECOND identical bubble.

Result: locator(".message.assistant").filter(has_text="4") matches two elements, Playwright strict mode rejects the wait_for, and the test fails. Outside the test, two identical bubbles render to the user.

Fix: only call addMessage in the response handler when there was no in-flight streaming bubble. If one existed, the streamed content is already correct (chunks accumulate `data.content` verbatim) and the data-streaming flag has just been cleared. Non-streaming responses (no chunks fired) still take the addMessage branch.

Regression coverage: tests/e2e/scenarios/test_message_persistence.py::test_switching_back_preserves_in_progress_turn already reproduces this exact scenario and was failing in the merge queue. With this fix it passes; skip-regression-check is used because the existing E2E test is the regression test, and the gateway doesn't have a JS unit test harness for SSE handler state.
* fix(gateway): dedupe history-rendered SSE responses
* test(e2e): set mock LLM API key in standalone fixtures
* fix(e2e): make v2 approval tests deterministic
* test(e2e): stabilize duplicate skill install assertion
* test(e2e): assert duplicate install stays ungated
* fix(skills): skip approval for disk-installed duplicates
* fix(v2): honor no-op skill installs without approval
* test(e2e): wait for pending send marker to clear

---------

Co-authored-by: Firat Sertgoz <f@nuff.tech>
Summary
Validate provider config completeness before activation and harden backend security.
- Extract the `isConfigured` check into a shared `isProviderConfigured()` helper to prevent render/activate logic divergence
- Guard `setActiveProvider()`: missing API key, missing base URL (custom providers), and missing model — each shows a toast and auto-opens the config dialog
- Add `env_model` to the model restoration chain (override → env → default)
- Replace `#[derive(Debug)]` with a manual `Debug` impl on `CustomLlmProviderSettings` and `LlmBuiltinOverride` to redact `api_key` from log output after secrets hydration
- Validate custom provider `base_url` against SSRF at save time (`validate_operator_base_url`) instead of deferring to `LlmConfig::resolve()`
- Change the `test_connection` 400/422 response from `ok: true` to `ok: false` to avoid a misleading green badge when model name or endpoint is wrong
- Downgrade `tracing::info!` to `debug!` in config resolution paths to avoid polluting REPL/TUI

Change Type
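The shared configured-check lives in gateway JS, but its decision logic can be sketched as a pure predicate (rendered in Rust here for illustration). The "only ollama needs no key" rule comes from the review discussion; the struct fields and function shape are assumptions.

```rust
// Sketch of the configured-provider predicate. Mirrors the review note:
// the adapter type decides whether a key is needed (ollama runs locally),
// rather than a nonexistent api_key_required field on the settings struct.
struct ProviderState {
    adapter: &'static str, // e.g. "ollama", "openai_compatible" (illustrative)
    is_custom: bool,       // custom providers must also carry a base_url
    has_api_key: bool,
    has_base_url: bool,
    has_model: bool,
}

fn is_provider_configured(p: &ProviderState) -> bool {
    // ollama needs no key; every other adapter does.
    let needs_key = p.adapter != "ollama";
    if needs_key && !p.has_api_key {
        return false;
    }
    if p.is_custom && !p.has_base_url {
        return false;
    }
    p.has_model
}

fn main() {
    let ollama = ProviderState {
        adapter: "ollama", is_custom: true,
        has_api_key: false, has_base_url: true, has_model: true,
    };
    assert!(is_provider_configured(&ollama)); // local adapter: no key required

    let custom = ProviderState {
        adapter: "openai_compatible", is_custom: true,
        has_api_key: true, has_base_url: false, has_model: true,
    };
    assert!(!is_provider_configured(&custom)); // custom provider missing base_url
    println!("configured checks pass");
}
```

Because render and activation call the same predicate, the "Not Configured" badge and the `setActiveProvider()` guard can never disagree about a provider's state.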
Linked Issue
Fixes #2514
Validation
- cargo fmt --all -- --check
- cargo clippy --all --benches --tests --examples --all-features -- -D warnings
- cargo build
- cargo test --features integration if database-backed or integration behavior changed
- review-pro / rpr-shepherd --fix was run before requesting review

Security Impact
Database Impact
Blast Radius
Rollback Plan
Review Follow-Through
Review track: