fix: model provider config hardening #2572
Conversation
…early validation

- Extract duplicated isConfigured logic into a shared isProviderConfigured() helper in gateway app.js to prevent render/activate divergence
- Replace derive(Debug) with manual impls on CustomLlmProviderSettings and LlmBuiltinOverride to redact api_key from log output after hydration
- Add base_url SSRF validation at save time in validate_custom_providers() instead of deferring to LlmConfig::resolve()
- Downgrade tracing::info! to debug! in resolve_custom_provider and the BEDROCK_REGION default (config resolution is not user-facing status)
- Change test_connection 400/422 from ok:true to ok:false to avoid a misleading green badge when the model name or endpoint is wrong

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Code Review
This pull request introduces several enhancements to the LLM provider configuration and validation logic. Key changes include a new client-side check for provider configuration status, UI updates to display unconfigured badges, and stricter guards in the setActiveProvider function to ensure API keys, base URLs, and models are present before activation. On the backend, SSRF validation is now performed on custom provider base URLs, connection test error handling is improved to prevent misleading status badges, and sensitive API keys are now redacted in debug logs. Feedback suggests using const instead of var in JavaScript for better scoping and refactoring duplicated dialog logic into a helper function.
- isProviderConfigured() treated all custom providers as requiring an API key because CustomLlmProviderSettings has no api_key_required field (undefined !== false -> true). Derive needsKey from the adapter type instead: ollama runs locally and needs no key; other adapters do.
- Extract status-code interpretation from interpret_chat_response() into a pure interpret_chat_status() helper so it can be tested without mocking reqwest::Response.
- Add regression tests covering 400/422 -> ok:false (the bug where a green badge was shown for a wrong model/endpoint), plus 200/401 for coverage.
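The extracted status interpreter described above can be sketched as a pure function over the HTTP status code. This is an illustrative reconstruction, not the PR's actual code: the `TestOutcome` shape and messages are assumptions; only the 400/422 -> ok:false and 200/401 behavior comes from the review notes.

```rust
// Sketch of a pure status interpreter for the test-connection endpoint,
// assuming a simplified outcome type (the real handler builds a JSON body).
#[derive(Debug, PartialEq)]
struct TestOutcome {
    ok: bool,
    message: &'static str,
}

fn interpret_chat_status(status: u16) -> TestOutcome {
    match status {
        200..=299 => TestOutcome { ok: true, message: "connection succeeded" },
        // 400/422 mean the endpoint answered but rejected the request
        // (wrong model name, malformed payload) — previously reported ok:true.
        400 | 422 => TestOutcome { ok: false, message: "endpoint rejected request (check model name)" },
        401 | 403 => TestOutcome { ok: false, message: "authentication failed (check API key)" },
        _ => TestOutcome { ok: false, message: "unexpected status" },
    }
}

fn main() {
    assert!(interpret_chat_status(200).ok);
    assert!(!interpret_chat_status(400).ok);
    assert!(!interpret_chat_status(422).ok);
    assert!(!interpret_chat_status(401).ok);
    println!("all status interpretations behave as expected");
}
```

Because the helper takes a bare `u16` rather than a `reqwest::Response`, the regression cases can be asserted directly without any HTTP mocking.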
Pull request overview
Hardens LLM model provider configuration handling to prevent activation of incomplete providers (avoiding restart/crash loops) and improves security around secrets, SSRF validation, and UI feedback for configuration state.
Changes:
- Redacts API keys from Rust `Debug` output for settings structs after secrets hydration, with regression tests.
- Adds SSRF validation for custom provider `base_url` at settings-save time and improves `/api/llm/test_connection` status interpretation (400/422 now `ok:false`).
- Updates gateway UI to detect configured providers consistently, show a "Not Configured" badge, hide "Use" until configured, add activation guards, and include `env_model` in model restoration.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/settings.rs | Manual Debug impls to redact api_key; adds tests to ensure redaction. |
| src/config/llm.rs | Lowers log level in config resolution paths to reduce noise. |
| src/channels/web/handlers/settings.rs | Adds SSRF validation for custom provider base_url during save; adds tests. |
| src/channels/web/handlers/llm.rs | Refactors status handling for test connection; fixes 400/422 to ok:false; adds tests. |
| src/channels/wasm/setup.rs | Suppresses clippy too_many_arguments warning for startup registration helper. |
| crates/ironclaw_gateway/static/style.css | Adds styling for “unconfigured provider” badge. |
| crates/ironclaw_gateway/static/i18n/en.js | Adds strings for “Not Configured” and activation guard toasts. |
| crates/ironclaw_gateway/static/i18n/zh-CN.js | Adds corresponding zh-CN strings for new provider UI states. |
| crates/ironclaw_gateway/static/i18n/ko.js | Adds corresponding ko strings for new provider UI states. |
| crates/ironclaw_gateway/static/app.js | Adds shared isProviderConfigured() + activation guards; updates provider rendering and model restore chain. |
As discussed today, @italic-jinxin will help fix the crash loop when a model is misconfigured in this PR. This is a work in progress.
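The api_key redaction noted in the review above can be sketched as a manual `Debug` impl. The struct below is a simplified stand-in for `CustomLlmProviderSettings` (only illustrative fields), but the pattern — drop the derive and substitute a fixed marker for the secret — is the one the PR describes.

```rust
use std::fmt;

// Simplified stand-in for CustomLlmProviderSettings; only the fields
// needed to show the redaction pattern are included.
struct CustomLlmProviderSettings {
    base_url: String,
    api_key: Option<String>,
}

// Manual Debug impl instead of #[derive(Debug)]: the hydrated api_key
// never reaches log output, only a fixed "[REDACTED]" marker.
impl fmt::Debug for CustomLlmProviderSettings {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("CustomLlmProviderSettings")
            .field("base_url", &self.base_url)
            .field("api_key", &self.api_key.as_ref().map(|_| "[REDACTED]"))
            .finish()
    }
}

fn main() {
    let s = CustomLlmProviderSettings {
        base_url: "https://api.example.com".into(),
        api_key: Some("sk-secret".into()),
    };
    let out = format!("{s:?}");
    assert!(!out.contains("sk-secret"));
    assert!(out.contains("REDACTED"));
    println!("{out}");
}
```

A regression test like the `assert!(!out.contains("sk-secret"))` check above is what keeps a future re-derive of `Debug` from silently reintroducing the leak.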
Pull request overview
Hardens LLM provider configuration so incomplete/unsafe provider setups don’t lead to misleading UI state or backend startup failures (issue #2514), while also tightening SSRF defenses and secret handling.
Changes:
- Add NearAI fallback resolution for unusable LLM configs and sync post-fallback backend/model state back to the DB.
- Harden provider UX/activation logic (configured badges, “Use” gating, model restoration chain incl. env model) and fix test-connection status interpretation.
- Redact API keys from `Debug` output and validate custom provider `base_url` for SSRF at save time.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/settings.rs | Manual Debug impls to redact api_key; adds tests to ensure no secret leaks via logs. |
| src/config/mod.rs | Switch final LLM resolve to fallback-aware path; persist post-fallback backend to DB and clear stale selected_model; adds alias normalization helpers + tests. |
| src/config/llm.rs | Introduces resolve_with_fallback() (NearAI demotion), adds "unusable config" detection, and downgrades some config logs to debug!; adds extensive fallback tests. |
| src/channels/web/handlers/llm.rs | Extracts status-code interpretation for testability, fixes 400/422 to report ok:false, and adds base_url_required to provider metadata. |
| src/channels/web/features/settings/mod.rs | Validates custom provider base_url against SSRF on save and adds tests for safe/unsafe URLs. |
| crates/ironclaw_gateway/static/styles/surfaces/config.css | Adds styling for an "unconfigured provider" badge. |
| crates/ironclaw_gateway/static/js/surfaces/config.js | Centralizes "is configured" logic, gates activation, adds model/env-model restoration, and improves user feedback + config dialog opening. |
| crates/ironclaw_gateway/static/i18n/en.js | Adds strings for new provider-state/toast messages and restart tooltips. |
| crates/ironclaw_gateway/static/i18n/ko.js | Same i18n additions (Korean). |
| crates/ironclaw_gateway/static/i18n/zh-CN.js | Same i18n additions (Simplified Chinese). |
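The save-time `base_url` SSRF screening reviewed above can be sketched as follows. This is a minimal illustration, not the PR's validator: the function name, error strings, and host parsing are assumptions; the real code would also need DNS resolution and fuller IPv6 handling.

```rust
use std::net::IpAddr;

// Minimal save-time SSRF check for a custom provider base_url.
// Only shows the shape of the scheme/host screening; a production
// validator must also resolve hostnames before trusting them.
fn validate_base_url(base_url: &str) -> Result<(), String> {
    let rest = base_url
        .strip_prefix("https://")
        .or_else(|| base_url.strip_prefix("http://"))
        .ok_or_else(|| "base_url must use http(s)".to_string())?;

    // Host is everything up to the first '/', ':' or '?' after the scheme.
    let host = rest
        .split(|c: char| c == '/' || c == ':' || c == '?')
        .next()
        .unwrap_or("");
    if host.is_empty() || host == "localhost" {
        return Err("base_url host not allowed".into());
    }
    // Literal IPs in loopback / private / link-local ranges are rejected.
    if let Ok(ip) = host.parse::<IpAddr>() {
        let forbidden = match ip {
            IpAddr::V4(v4) => {
                v4.is_loopback() || v4.is_private() || v4.is_link_local() || v4.is_unspecified()
            }
            IpAddr::V6(v6) => v6.is_loopback() || v6.is_unspecified(),
        };
        if forbidden {
            return Err("base_url points at a private address".into());
        }
    }
    Ok(())
}

fn main() {
    assert!(validate_base_url("https://api.openai.com/v1").is_ok());
    assert!(validate_base_url("http://127.0.0.1:8080").is_err());
    assert!(validate_base_url("http://localhost/v1").is_err());
    assert!(validate_base_url("ftp://example.com").is_err());
    println!("ssrf checks behave as expected");
}
```

Running the check at save time, rather than deferring to `LlmConfig::resolve()`, means a hostile URL is rejected before it is ever persisted or dialed.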
…oid spurious DB rewrites

normalize_backend() hardcoded only a handful of aliases, but providers.json declares many more (claude → anthropic, bigmodel → zai, github-copilot → github_copilot, open_router → openrouter, ...). A DB value like `claude` resolves to `anthropic` via resolve_registry_provider, which my fallback_fired() then mistook for a cross-backend demotion — overwriting the user's `llm_backend` row on every startup.
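The fix described in that commit can be sketched with a small alias map. The alias pairs below come straight from the commit message; everything else (function names, the map being hardcoded rather than derived from providers.json) is illustrative only.

```rust
use std::collections::HashMap;

// Sketch: resolve provider aliases before comparing backends, so that
// `claude` vs `anthropic` is alias normalization, not a fallback demotion.
// The real code derives this table from providers.json instead of
// hardcoding it.
fn alias_map() -> HashMap<&'static str, &'static str> {
    HashMap::from([
        ("claude", "anthropic"),
        ("bigmodel", "zai"),
        ("github-copilot", "github_copilot"),
        ("open_router", "openrouter"),
    ])
}

fn normalize_backend<'a>(name: &'a str, aliases: &HashMap<&str, &'a str>) -> &'a str {
    aliases.get(name).copied().unwrap_or(name)
}

// A fallback only "fired" if the normalized backends actually differ;
// comparing raw names caused the spurious DB rewrite on every startup.
fn fallback_fired(db_backend: &str, resolved: &str, aliases: &HashMap<&str, &str>) -> bool {
    normalize_backend(db_backend, aliases) != normalize_backend(resolved, aliases)
}

fn main() {
    let aliases = alias_map();
    // `claude` resolving to `anthropic` is alias resolution, not a demotion:
    assert!(!fallback_fired("claude", "anthropic", &aliases));
    // A genuine cross-backend change still registers:
    assert!(fallback_fired("anthropic", "nearai", &aliases));
    println!("alias-aware comparison avoids spurious DB rewrites");
}
```

With both sides normalized, the `llm_backend` row is only rewritten when the effective backend genuinely changed.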
think-in-universe
left a comment
LGTM. The PR has been well tested end-to-end.
We're ready to merge.
`test_auth_required_sse_without_duplicate_response` was failing in the merge queue on every PR (most recently bouncing #3197 and #3203 from the queue) because the `auth_sse_server` fixture never set `LLM_API_KEY` in the spawned ironclaw env. After #2572 added a missing-API-key check to the openai_compatible config validator (Apr 22), ironclaw rejected the env-supplied openai_compatible config, fell back to the NearAI default, hit "missing session token", and failed the turn before the github skill could even fire its 401. The chat thus reached `state: Failed` with no tool calls and no `onboarding_state/auth_required` event — which is exactly what the test asserted on, hence the consistent failure. Adding `LLM_API_KEY=mock-api-key` matches the value already used in every other e2e fixture (auth_matrix, conftest's ironclaw_server, v2_engine, etc.) and unblocks the assertion. Local run: PASSED in 8s.
…ange (#3235)

* ci(e2e): replace deleted preflight test with tool_activate surface

The v2-engine E2E group still listed test_v2_kernel_auth_preflight.py, but that file was removed in #2868 (engine-v2: callable-only available actions) and replaced with test_v2_tool_activate_surface.py for the new tool_activate / Activatable Integrations contract. The Web E2E Full job is skipped on PR-level CI but runs in the merge queue, so the bad path filter dequeued #3197 and #3203 with "file or directory not found: test_v2_kernel_auth_preflight.py".

* test(e2e): unblock Live Canary auth lanes after engine-v2 contract change

The Live Canary "Auth Smoke", "Auth Full", and "Auth Live Seeded" jobs have failed every scheduled run since 2026-05-01 (when the canary cut over to main). Three tests in test_v2_auth_oauth_matrix.py drive the failures, all rooted in the engine-v2 callable-only contract from #2868 that didn't exist when these tests were written.

## What was broken

`test_mcp_same_server_multi_user_via_browser`

After OAuth completes, sending "check mock mcp search" through each user's browser opens an `approval` pending_gate on the first MCP tool call (engine v2 default). The browser fixture has no auto-approve UI, so the chat sat in `pending_gate` for the full 5-min Playwright timeout — `expected_text_contains="Mock MCP search result"` could never match because the assistant bubble never received any text.

`test_wasm_tool_oauth_refresh_on_demand`

Same shape: the gmail call gates on `approval` before reaching the http credential-injection layer that performs the OAuth refresh. Without approving, refresh_count never went above 0, so the test failed with "Timed out waiting for OAuth refresh request".

`test_wasm_tool_first_chat_auth_attempt_emits_auth_url`

Tested OLD engine-v2 behavior — that an LLM-emitted call to a not-yet-authed extension would surface a `gate_required` Authentication event with an auth URL.
After #2868, the engine returns "action 'gmail' is not callable in this execution context" instead, and `tool_activate` became the model-facing enablement path. The mock LLM is canned to emit tool calls directly, so this scenario can't be reproduced from a scripted LLM until the canned response is updated.

## Fixes

- `_wait_for_tool_call`: accept a `token` kwarg so multi-user tests can poll/approve through a per-user identity. Backwards-compatible.
- `test_mcp_same_server_multi_user_via_browser`: drive approval through the per-user API while waiting for the tool to land. Drop the broken `expected_text_contains` predicate and the tied "Mock MCP search result" text assertions; the bearer-token isolation assertion (what this test actually exists to prove) is retained and unaffected.
- `test_wasm_tool_oauth_refresh_on_demand`: insert a `_wait_for_tool_call` approval step between `_send_chat` and `_wait_for_refresh_request` so the http credential layer actually runs.
- `test_wasm_tool_first_chat_auth_attempt_emits_auth_url`: marked xfail with an inline reason pointing at #2868 and the replacement coverage (`test_v2_tool_activate_surface.py`, `test_settings_first_gmail_auth_then_chat_runs`).
- Drop the now-unused `send_chat_and_wait_for_terminal_message` import.

## conftest fix

`ironclaw_server` now sets `SECRETS_MASTER_KEY` in the spawned env. On macOS without it, `auto_generate_and_persist` blocks on a Keychain authorization prompt that no one's home to click, so `wait_for_ready` times out at 60s and the fixture kills the process with SIGKILL — making any session-scoped browser test impossible to run locally. On Linux, the keychain backend errors fast and the auto-generate fallback writes to `.env`, so CI was unaffected. Setting the key explicitly matches the pattern already used in `auth_matrix_server`, `test_v2_engine_auth_cancel`, `test_v2_tool_activate_surface`, etc.

## Verification

Local repro confirmed each failure mode (HTTP-only repro for the non-browser tests, server-side log inspection for the multi-user test). Reproduced the exact pending_gate=approval pattern, fixed it, verified the assertion semantics still hold:

```
$ pytest tests/e2e/scenarios/test_v2_auth_oauth_matrix.py::test_wasm_tool_oauth_refresh_on_demand
PASSED in 6.74s
$ pytest tests/e2e/scenarios/test_v2_auth_oauth_matrix.py::test_wasm_tool_first_chat_auth_attempt_emits_auth_url
XFAIL in 93s
$ pytest tests/e2e/scenarios/test_v2_auth_oauth_matrix.py -v --timeout=120
12 passed, 2 skipped, 3 xfailed (browser tests errored locally; they'll run cleanly in CI)
```

The browser-driven `test_mcp_same_server_multi_user_via_browser` couldn't be exercised locally (chromium can't launch under this shell sandbox), but the API + auto-approve flow it now relies on is exercised by an HTTP-equivalent repro and matches the pattern used in `test_settings_first_gmail_auth_then_chat_runs`.

* test(e2e): set LLM_API_KEY in auth_sse_server fixture

`test_auth_required_sse_without_duplicate_response` was failing in the merge queue on every PR (most recently bouncing #3197 and #3203 from the queue) because the `auth_sse_server` fixture never set `LLM_API_KEY` in the spawned ironclaw env. After #2572 added a missing-API-key check to the openai_compatible config validator (Apr 22), ironclaw rejected the env-supplied openai_compatible config, fell back to the NearAI default, hit "missing session token", and failed the turn before the github skill could even fire its 401. The chat thus reached `state: Failed` with no tool calls and no `onboarding_state/auth_required` event — which is exactly what the test asserted on, hence the consistent failure. Adding `LLM_API_KEY=mock-api-key` matches the value already used in every other e2e fixture (auth_matrix, conftest's ironclaw_server, v2_engine, etc.) and unblocks the assertion. Local run: PASSED in 8s.
* fix(gateway): suppress duplicate assistant bubble after streamed response [skip-regression-check]

The SSE `response` handler unconditionally called addMessage('assistant', data.content) even when stream_chunks had already populated and finalized a bubble for the same response. This stayed invisible in the common case but surfaced as a hard test failure under the path test_switching_back_preserves_in_progress_turn:

1. Send "What is 2+2?" on thread A — stream chunks start filling an assistant bubble with "data-streaming".
2. Switch to thread B mid-stream — container clears (history reload).
3. Switch back to thread A — history rehydration shows the in-progress turn with no response yet, so 0 assistant bubbles in DOM.
4. Stream chunks continue to fire for A — appendToLastAssistant creates a new bubble and accumulates the response into it.
5. response event fires — flushes any remaining buffer, removes the data-streaming flag (good) — then addMessage('assistant', content) creates a SECOND identical bubble.

Result: locator(".message.assistant").filter(has_text="4") matches two elements, Playwright strict mode rejects the wait_for, and the test fails. Outside the test, two identical bubbles render to the user.

Fix: only call addMessage in the response handler when there was no in-flight streaming bubble. If one existed, the streamed content is already correct (chunks accumulate `data.content` verbatim) and the data-streaming flag has just been cleared. Non-streaming responses (no chunks fired) still take the addMessage branch.

Regression coverage: tests/e2e/scenarios/test_message_persistence.py::test_switching_back_preserves_in_progress_turn already reproduces this exact scenario and was failing in the merge queue. With this fix it passes; skip-regression-check is used because the existing E2E test is the regression test, and the gateway doesn't have a JS unit test harness for SSE handler state.
* fix(gateway): dedupe history-rendered SSE responses
* test(e2e): set mock LLM API key in standalone fixtures
* fix(e2e): make v2 approval tests deterministic
* test(e2e): stabilize duplicate skill install assertion
* test(e2e): assert duplicate install stays ungated
* fix(skills): skip approval for disk-installed duplicates
* fix(v2): honor no-op skill installs without approval
* test(e2e): wait for pending send marker to clear

---------

Co-authored-by: Firat Sertgoz <f@nuff.tech>
Summary
Validate provider config completeness before activation and harden backend security.
- Extract the `isConfigured` check into a shared `isProviderConfigured()` helper to prevent render/activate logic divergence
- Guard `setActiveProvider()`: missing API key, missing base URL (custom providers), and missing model — each shows a toast and auto-opens the config dialog
- Add `env_model` to the model restoration chain (override → env → default)
- Replace `#[derive(Debug)]` with a manual `Debug` impl on `CustomLlmProviderSettings` and `LlmBuiltinOverride` to redact `api_key` from log output after secrets hydration
- Validate custom provider `base_url` against SSRF at save time (`validate_operator_base_url`) instead of deferring to `LlmConfig::resolve()`
- Change the `test_connection` 400/422 response from `ok: true` to `ok: false` to avoid a misleading green badge when model name or endpoint is wrong
- Downgrade `tracing::info!` to `debug!` in config resolution paths to avoid polluting REPL/TUI

Change Type
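The shared configured-check lives in gateway JS, but its decision logic can be sketched as a pure predicate (rendered in Rust here for illustration). The "only ollama needs no key" rule comes from the review discussion; the struct fields and function shape are assumptions.

```rust
// Sketch of the configured-provider predicate. Mirrors the review note:
// the adapter type decides whether a key is needed (ollama runs locally),
// rather than a nonexistent api_key_required field on the settings struct.
struct ProviderState {
    adapter: &'static str, // e.g. "ollama", "openai_compatible" (illustrative)
    is_custom: bool,       // custom providers must also carry a base_url
    has_api_key: bool,
    has_base_url: bool,
    has_model: bool,
}

fn is_provider_configured(p: &ProviderState) -> bool {
    // ollama needs no key; every other adapter does.
    let needs_key = p.adapter != "ollama";
    if needs_key && !p.has_api_key {
        return false;
    }
    if p.is_custom && !p.has_base_url {
        return false;
    }
    p.has_model
}

fn main() {
    let ollama = ProviderState {
        adapter: "ollama", is_custom: true,
        has_api_key: false, has_base_url: true, has_model: true,
    };
    assert!(is_provider_configured(&ollama)); // local adapter: no key required

    let custom = ProviderState {
        adapter: "openai_compatible", is_custom: true,
        has_api_key: true, has_base_url: false, has_model: true,
    };
    assert!(!is_provider_configured(&custom)); // custom provider missing base_url
    println!("configured checks pass");
}
```

Because render and activation call the same predicate, the "Not Configured" badge and the `setActiveProvider()` guard can never disagree about a provider's state.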
Linked Issue
Fixes #2514
Validation
- cargo fmt --all -- --check
- cargo clippy --all --benches --tests --examples --all-features -- -D warnings
- cargo build
- cargo test --features integration if database-backed or integration behavior changed
- review-pro / rpr-shepherd --fix was run before requesting review

Security Impact
Database Impact
Blast Radius
Rollback Plan
Review Follow-Through
Review track: