
fix: model provider config hardening#2572

Merged
think-in-universe merged 20 commits into staging from feat/provider-config-hardening
Apr 22, 2026

Conversation

@italic-jinxin
Contributor

@italic-jinxin italic-jinxin commented Apr 17, 2026

Summary

Validate provider config completeness before activation and harden backend security.

  • Extract duplicated isConfigured check into shared isProviderConfigured() helper to prevent render/activate logic divergence
  • Show "Not Configured" badge on unconfigured providers and hide the "Use" button until API key is set
  • Add three activation guards in setActiveProvider(): missing API key, missing base URL (custom providers), and missing model — each shows a toast and auto-opens the config dialog
  • Add env_model to the model restoration chain (override → env → default)
  • Replace #[derive(Debug)] with manual Debug impl on CustomLlmProviderSettings and LlmBuiltinOverride to redact api_key from log output after secrets hydration
  • Validate custom provider base_url against SSRF at save time (validate_operator_base_url) instead of deferring to LlmConfig::resolve()
  • Change test_connection 400/422 response from ok: true to ok: false to avoid misleading green badge when model name or endpoint is wrong
  • Downgrade tracing::info! to debug! in config resolution paths to avoid polluting REPL/TUI
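The api_key redaction described above can be sketched as a manual `Debug` impl. This is a minimal illustration, not the PR's actual code: the field names on `CustomLlmProviderSettings` are assumptions, and the real struct likely carries more fields.

```rust
use std::fmt;

// Hypothetical field layout; the real struct in src/settings.rs may differ.
pub struct CustomLlmProviderSettings {
    pub name: String,
    pub base_url: String,
    pub api_key: Option<String>,
}

impl fmt::Debug for CustomLlmProviderSettings {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.debug_struct("CustomLlmProviderSettings")
            .field("name", &self.name)
            .field("base_url", &self.base_url)
            // Never print the secret itself; only show whether one is set.
            .field("api_key", &self.api_key.as_ref().map(|_| "[REDACTED]"))
            .finish()
    }
}
```

The point of replacing `#[derive(Debug)]` is that a derived impl would happily print the hydrated secret whenever the settings struct flows through `tracing` or a panic message.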

Change Type

  • Bug fix
  • New feature
  • Refactor
  • Documentation
  • CI/Infrastructure
  • Security
  • Dependencies

Linked Issue

Fixes #2514

Validation

  • cargo fmt --all -- --check
  • cargo clippy --all --benches --tests --examples --all-features -- -D warnings
  • cargo build
  • Relevant tests pass:
  • cargo test --features integration if database-backed or integration behavior changed
  • Manual testing:
  • If a coding agent was used and supports it, review-pr or pr-shepherd --fix was run before requesting review

Security Impact

Database Impact

Blast Radius

Rollback Plan

Review Follow-Through


Review track:

italic-jinxin and others added 3 commits April 17, 2026 15:17
…early validation

- Extract duplicated isConfigured logic into shared isProviderConfigured()
  helper in gateway app.js to prevent render/activate divergence
- Replace derive(Debug) with manual impl on CustomLlmProviderSettings and
  LlmBuiltinOverride to redact api_key from log output after hydration
- Add base_url SSRF validation at save time in validate_custom_providers()
  instead of deferring to LlmConfig::resolve()
- Downgrade tracing::info! to debug! in resolve_custom_provider and
  BEDROCK_REGION default (config resolution is not user-facing status)
- Change test_connection 400/422 from ok:true to ok:false to avoid
  misleading green badge when model name or endpoint is wrong

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added scope: channel/web Web gateway channel scope: config Configuration size: M 50-199 changed lines risk: medium Business logic, config, or moderate-risk modules contributor: regular 2-5 merged PRs labels Apr 17, 2026
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces several enhancements to the LLM provider configuration and validation logic. Key changes include a new client-side check for provider configuration status, UI updates to display unconfigured badges, and stricter guards in the setActiveProvider function to ensure API keys, base URLs, and models are present before activation. On the backend, SSRF validation is now performed on custom provider base URLs, connection test error handling is improved to prevent misleading status badges, and sensitive API keys are now redacted in debug logs. Feedback suggests using const instead of var in JavaScript for better scoping and refactoring duplicated dialog logic into a helper function.

Comment thread crates/ironclaw_gateway/static/app.js Outdated
Comment thread crates/ironclaw_gateway/static/app.js Outdated
@github-actions github-actions Bot added the scope: channel/wasm WASM channel runtime label Apr 17, 2026
@github-actions github-actions Bot added size: L 200-499 changed lines and removed size: M 50-199 changed lines labels Apr 17, 2026
Comment thread crates/ironclaw_gateway/static/app.js Outdated
Comment thread src/channels/web/handlers/llm.rs Outdated
- isProviderConfigured() treated all custom providers as requiring an API
  key because CustomLlmProviderSettings has no api_key_required field
  (undefined !== false -> true). Derive needsKey from the adapter type
  instead: ollama runs locally and needs no key, other adapters do.
- Extract status-code interpretation from interpret_chat_response() into
  a pure interpret_chat_status() helper so it can be tested without
  mocking reqwest::Response.
- Add regression tests covering 400/422 -> ok:false (the bug that green
  badge was shown for wrong model/endpoint), plus 200/401 for coverage.
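The `interpret_chat_status()` extraction described in this commit can be sketched as a pure function over the HTTP status code. The name comes from the commit message, but the signature and return type here are assumptions made for the sake of a self-contained example:

```rust
// Sketch: map a test-connection status code to the `ok` flag shown in the UI.
// 400/422 typically mean a wrong model name or a malformed endpoint, so
// reporting ok:true there would paint the misleading green badge this PR fixes.
fn interpret_chat_status(status: u16) -> bool {
    match status {
        // 2xx: the provider accepted the chat request.
        200..=299 => true,
        // 400, 401, 422, and everything else: surface the failure.
        _ => false,
    }
}
```

Keeping this logic free of `reqwest::Response` is what makes the regression tests cheap: the 400/422 cases can be asserted directly without mocking an HTTP client.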
Contributor

Copilot AI left a comment


Pull request overview

Hardens LLM model provider configuration handling to prevent activation of incomplete providers (avoiding restart/crash loops) and improves security around secrets, SSRF validation, and UI feedback for configuration state.

Changes:

  • Redacts API keys from Rust Debug output for settings structs after secrets hydration, with regression tests.
  • Adds SSRF validation for custom provider base_url at settings-save time and improves /api/llm/test_connection status interpretation (400/422 now ok:false).
  • Updates gateway UI to detect configured providers consistently, show a "Not Configured" badge, hide "Use" until configured, add activation guards, and include env_model in model restoration.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/settings.rs Manual Debug impls to redact api_key; adds tests to ensure redaction.
src/config/llm.rs Lowers log level in config resolution paths to reduce noise.
src/channels/web/handlers/settings.rs Adds SSRF validation for custom provider base_url during save; adds tests.
src/channels/web/handlers/llm.rs Refactors status handling for test connection; fixes 400/422 to ok:false; adds tests.
src/channels/wasm/setup.rs Suppresses clippy too_many_arguments warning for startup registration helper.
crates/ironclaw_gateway/static/style.css Adds styling for “unconfigured provider” badge.
crates/ironclaw_gateway/static/i18n/en.js Adds strings for “Not Configured” and activation guard toasts.
crates/ironclaw_gateway/static/i18n/zh-CN.js Adds corresponding zh-CN strings for new provider UI states.
crates/ironclaw_gateway/static/i18n/ko.js Adds corresponding ko strings for new provider UI states.
crates/ironclaw_gateway/static/app.js Adds shared isProviderConfigured() + activation guards; updates provider rendering and model restore chain.


Comment thread crates/ironclaw_gateway/static/app.js Outdated
Comment thread src/channels/web/handlers/settings.rs Outdated
@think-in-universe
Contributor

As discussed today, @italic-jinxin will help fix the crash loop that occurs when a model is misconfigured in this PR. This is a work in progress.

@github-actions github-actions Bot added scope: channel/cli TUI / CLI channel size: XL 500+ changed lines and removed size: L 200-499 changed lines labels Apr 17, 2026
Comment thread crates/ironclaw_gateway/static/app.js Outdated
@github-actions github-actions Bot added contributor: experienced 6-19 merged PRs and removed contributor: regular 2-5 merged PRs labels Apr 20, 2026
Contributor

Copilot AI left a comment


Pull request overview

Hardens LLM provider configuration so incomplete/unsafe provider setups don’t lead to misleading UI state or backend startup failures (issue #2514), while also tightening SSRF defenses and secret handling.

Changes:

  • Add NearAI fallback resolution for unusable LLM configs and sync post-fallback backend/model state back to the DB.
  • Harden provider UX/activation logic (configured badges, “Use” gating, model restoration chain incl. env model) and fix test-connection status interpretation.
  • Redact API keys from Debug output and validate custom provider base_url for SSRF at save time.
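The fallback-aware resolution summarized above (demoting an unusable config to the NearAI default instead of failing startup) can be sketched roughly as follows. The types, the `is_unusable` criterion, and the tuple return are all assumptions for illustration; `resolve_with_fallback()` in src/config/llm.rs will look different:

```rust
#[derive(Debug, PartialEq)]
enum Backend {
    NearAi,
    OpenAiCompatible,
}

struct LlmConfig {
    backend: Backend,
    api_key: Option<String>,
}

impl LlmConfig {
    // Assumed rule: openai_compatible without an API key cannot start.
    fn is_unusable(&self) -> bool {
        matches!(self.backend, Backend::OpenAiCompatible) && self.api_key.is_none()
    }

    // Returns the resolved config plus whether the fallback fired, so the
    // caller can sync the post-fallback backend state back to the DB.
    fn resolve_with_fallback(self) -> (LlmConfig, bool) {
        if self.is_unusable() {
            // Demote to the NearAI default rather than crash-looping at startup.
            (LlmConfig { backend: Backend::NearAi, api_key: None }, true)
        } else {
            (self, false)
        }
    }
}
```

Surfacing the "fallback fired" bit separately matters because, per the later commit on this PR, persisting the demotion must not be confused with mere alias normalization of the stored backend name.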

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
src/settings.rs Manual Debug impls to redact api_key; adds tests to ensure no secret leaks via logs.
src/config/mod.rs Switch final LLM resolve to fallback-aware path; persist post-fallback backend to DB and clear stale selected_model; adds alias normalization helpers + tests.
src/config/llm.rs Introduces resolve_with_fallback() (NearAI demotion), adds “unusable config” detection, and downgrades some config logs to debug!; adds extensive fallback tests.
src/channels/web/handlers/llm.rs Extracts status-code interpretation for testability, fixes 400/422 to report ok:false, and adds base_url_required to provider metadata.
src/channels/web/features/settings/mod.rs Validates custom provider base_url against SSRF on save and adds tests for safe/unsafe URLs.
crates/ironclaw_gateway/static/styles/surfaces/config.css Adds styling for an “unconfigured provider” badge.
crates/ironclaw_gateway/static/js/surfaces/config.js Centralizes “is configured” logic, gates activation, adds model/env-model restoration, and improves user feedback + config dialog opening.
crates/ironclaw_gateway/static/i18n/en.js Adds strings for new provider-state/toast messages and restart tooltips.
crates/ironclaw_gateway/static/i18n/ko.js Same i18n additions (Korean).
crates/ironclaw_gateway/static/i18n/zh-CN.js Same i18n additions (Simplified Chinese).


Comment thread src/config/mod.rs Outdated
Comment thread src/channels/web/features/settings/mod.rs Outdated
…oid spurious DB rewrites

normalize_backend() hardcoded only a handful of aliases, but
providers.json declares many more (claude → anthropic, bigmodel →
zai, github-copilot → github_copilot, open_router → openrouter, ...).
A DB value like `claude` resolves to `anthropic` via
resolve_registry_provider, which my fallback_fired() then mistook for
a cross-backend demotion — overwriting the user's `llm_backend` row
on every startup.
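The bug this commit describes can be sketched with two small helpers. The alias pairs are taken verbatim from the commit message; the function names mirror the ones it mentions, but their signatures here are assumptions (the real code reads the aliases from providers.json rather than hardcoding them):

```rust
// Sketch: normalize provider aliases before comparing backends, so a stored
// "claude" that resolves to "anthropic" is not mistaken for a demotion.
fn normalize_backend(name: &str) -> &str {
    match name {
        "claude" => "anthropic",
        "bigmodel" => "zai",
        "github-copilot" => "github_copilot",
        "open_router" => "openrouter",
        other => other,
    }
}

// Only treat the resolution as a cross-backend demotion (worth rewriting the
// user's llm_backend row) when the normalized names actually differ.
fn fallback_fired(stored: &str, resolved: &str) -> bool {
    normalize_backend(stored) != normalize_backend(resolved)
}
```

With only a partial alias table, `fallback_fired("claude", "anthropic")` would return true and overwrite the DB row on every startup, which is exactly the spurious rewrite being fixed.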
Contributor

@think-in-universe think-in-universe left a comment


LGTM. The PR has been well tested end-to-end.

We're ready to merge.

@think-in-universe think-in-universe merged commit d4d5263 into staging Apr 22, 2026
17 checks passed
@think-in-universe think-in-universe deleted the feat/provider-config-hardening branch April 22, 2026 03:15
@henrypark133 henrypark133 mentioned this pull request Apr 29, 2026
ilblackdragon added a commit that referenced this pull request May 3, 2026
`test_auth_required_sse_without_duplicate_response` was failing in the
merge queue on every PR (most recently bouncing #3197 and #3203 from
the queue) because the `auth_sse_server` fixture never set
`LLM_API_KEY` in the spawned ironclaw env. After #2572 added a missing-
API-key check to the openai_compatible config validator (Apr 22),
ironclaw rejected the env-supplied openai_compatible config, fell back
to the NearAI default, hit "missing session token", and failed the
turn before the github skill could even fire its 401.

The chat thus reached `state: Failed` with no tool calls and no
`onboarding_state/auth_required` event — which is exactly what the
test asserted on, hence the consistent failure.

Adding `LLM_API_KEY=mock-api-key` matches the value already used in
every other e2e fixture (auth_matrix, conftest's ironclaw_server,
v2_engine, etc.) and unblocks the assertion. Local run: PASSED in 8s.
serrrfirat added a commit that referenced this pull request May 5, 2026
…ange (#3235)

* ci(e2e): replace deleted preflight test with tool_activate surface

The v2-engine E2E group still listed test_v2_kernel_auth_preflight.py,
but that file was removed in #2868 (engine-v2: callable-only available
actions) and replaced with test_v2_tool_activate_surface.py for the new
tool_activate / Activatable Integrations contract.

The Web E2E Full job is skipped on PR-level CI but runs in the merge
queue, so the bad path filter dequeued #3197 and #3203 with
"file or directory not found: test_v2_kernel_auth_preflight.py".

* test(e2e): unblock Live Canary auth lanes after engine-v2 contract change

The Live Canary "Auth Smoke", "Auth Full", and "Auth Live Seeded" jobs
have failed every scheduled run since 2026-05-01 (when the canary cut
over to main). Three tests in test_v2_auth_oauth_matrix.py drive the
failures, all rooted in the engine-v2 callable-only contract from #2868
that didn't exist when these tests were written.

## What was broken

`test_mcp_same_server_multi_user_via_browser`
After OAuth completes, sending "check mock mcp search" through each
user's browser opens an `approval` pending_gate on the first MCP tool
call (engine v2 default). The browser fixture has no auto-approve UI,
so the chat sat in `pending_gate` for the full 5-min Playwright
timeout — `expected_text_contains="Mock MCP search result"` could
never match because the assistant bubble never received any text.

`test_wasm_tool_oauth_refresh_on_demand`
Same shape: gmail call gates on `approval` before reaching the http
credential-injection layer that performs the OAuth refresh. Without
approving, refresh_count never went above 0, so the test failed with
"Timed out waiting for OAuth refresh request".

`test_wasm_tool_first_chat_auth_attempt_emits_auth_url`
Tested OLD engine-v2 behavior — that an LLM-emitted call to a not-yet-
authed extension would surface a `gate_required` Authentication event
with an auth URL. After #2868, the engine returns "action 'gmail' is
not callable in this execution context" instead, and `tool_activate`
became the model-facing enablement path. The mock LLM is canned to
emit tool calls directly, so this scenario can't be reproduced from a
scripted LLM until the canned response is updated.

## Fixes

- `_wait_for_tool_call`: accept a `token` kwarg so multi-user tests can
  poll/approve through a per-user identity. Backwards-compatible.
- `test_mcp_same_server_multi_user_via_browser`: drive approval through
  the per-user API while waiting for the tool to land. Drop the broken
  `expected_text_contains` predicate and the tied "Mock MCP search
  result" text assertions; the bearer-token isolation assertion (what
  this test actually exists to prove) is retained and unaffected.
- `test_wasm_tool_oauth_refresh_on_demand`: insert a `_wait_for_tool_call`
  approval step between `_send_chat` and `_wait_for_refresh_request`
  so the http credential layer actually runs.
- `test_wasm_tool_first_chat_auth_attempt_emits_auth_url`: marked xfail
  with an inline reason pointing at #2868 and the replacement coverage
  (`test_v2_tool_activate_surface.py`,
  `test_settings_first_gmail_auth_then_chat_runs`).
- Drop the now-unused `send_chat_and_wait_for_terminal_message` import.

## conftest fix

`ironclaw_server` now sets `SECRETS_MASTER_KEY` in the spawned env.
On macOS without it, `auto_generate_and_persist` blocks on a Keychain
authorization prompt that no one's home to click, so `wait_for_ready`
times out at 60s and the fixture kills the process with SIGKILL —
making any session-scoped browser test impossible to run locally.
On Linux, the keychain backend errors fast and the auto-generate
fallback writes to `.env`, so CI was unaffected. Setting the key
explicitly matches the pattern already used in
`auth_matrix_server`, `test_v2_engine_auth_cancel`,
`test_v2_tool_activate_surface`, etc.

## Verification

Local repro confirmed each failure mode (HTTP-only repro for the
non-browser tests, server-side log inspection for the multi-user
test). Reproduced the exact pending_gate=approval pattern, fixed it,
verified the assertion semantics still hold:

```
$ pytest tests/e2e/scenarios/test_v2_auth_oauth_matrix.py::test_wasm_tool_oauth_refresh_on_demand
PASSED in 6.74s

$ pytest tests/e2e/scenarios/test_v2_auth_oauth_matrix.py::test_wasm_tool_first_chat_auth_attempt_emits_auth_url
XFAIL in 93s

$ pytest tests/e2e/scenarios/test_v2_auth_oauth_matrix.py -v --timeout=120
12 passed, 2 skipped, 3 xfailed (browser tests errored locally;
they'll run cleanly in CI)
```

The browser-driven `test_mcp_same_server_multi_user_via_browser`
couldn't be exercised locally (chromium can't launch under this
shell sandbox), but the API + auto-approve flow it now relies on is
exercised by an HTTP-equivalent repro and matches the pattern used
in `test_settings_first_gmail_auth_then_chat_runs`.

* test(e2e): set LLM_API_KEY in auth_sse_server fixture

`test_auth_required_sse_without_duplicate_response` was failing in the
merge queue on every PR (most recently bouncing #3197 and #3203 from
the queue) because the `auth_sse_server` fixture never set
`LLM_API_KEY` in the spawned ironclaw env. After #2572 added a missing-
API-key check to the openai_compatible config validator (Apr 22),
ironclaw rejected the env-supplied openai_compatible config, fell back
to the NearAI default, hit "missing session token", and failed the
turn before the github skill could even fire its 401.

The chat thus reached `state: Failed` with no tool calls and no
`onboarding_state/auth_required` event — which is exactly what the
test asserted on, hence the consistent failure.

Adding `LLM_API_KEY=mock-api-key` matches the value already used in
every other e2e fixture (auth_matrix, conftest's ironclaw_server,
v2_engine, etc.) and unblocks the assertion. Local run: PASSED in 8s.

* fix(gateway): suppress duplicate assistant bubble after streamed response

[skip-regression-check]

The SSE `response` handler unconditionally called addMessage('assistant',
data.content) even when stream_chunks had already populated and
finalized a bubble for the same response. This stayed invisible in the
common case but surfaced as a hard test failure under the path
test_switching_back_preserves_in_progress_turn:

1. Send "What is 2+2?" on thread A — stream chunks start filling an
   assistant bubble with "data-streaming".
2. Switch to thread B mid-stream — container clears (history reload).
3. Switch back to thread A — history rehydration shows the in-progress
   turn with no response yet, so 0 assistant bubbles in DOM.
4. Stream chunks continue to fire for A — appendToLastAssistant creates
   a new bubble and accumulates the response into it.
5. response event fires — flushes any remaining buffer, removes the
   data-streaming flag (good) — then addMessage('assistant', content)
   creates a SECOND identical bubble.

Result: locator(".message.assistant").filter(has_text="4") matches
two elements, Playwright strict mode rejects the wait_for, the test
fails. Outside the test, two identical bubbles render to the user.

Fix: only call addMessage in the response handler when there was no
in-flight streaming bubble. If one existed, the streamed content is
already correct (chunks accumulate `data.content` verbatim) and the
data-streaming flag has just been cleared. Non-streaming responses
(no chunks fired) still take the addMessage branch.

Regression coverage: tests/e2e/scenarios/test_message_persistence.py::
test_switching_back_preserves_in_progress_turn already reproduces this
exact scenario and was failing in the merge queue. With this fix it
passes; skip-regression-check used because the existing E2E test is
the regression test, and the gateway doesn't have a JS unit test
harness for SSE handler state.

* fix(gateway): dedupe history-rendered SSE responses

* test(e2e): set mock LLM API key in standalone fixtures

* fix(e2e): make v2 approval tests deterministic

* test(e2e): stabilize duplicate skill install assertion

* test(e2e): assert duplicate install stays ungated

* fix(skills): skip approval for disk-installed duplicates

* fix(v2): honor no-op skill installs without approval

* test(e2e): wait for pending send marker to clear

---------

Co-authored-by: Firat Sertgoz <f@nuff.tech>
This was referenced May 7, 2026
@italic-jinxin italic-jinxin self-assigned this May 11, 2026

Labels

contributor: experienced 6-19 merged PRs risk: medium Business logic, config, or moderate-risk modules scope: channel/cli TUI / CLI channel scope: channel/wasm WASM channel runtime scope: channel/web Web gateway channel scope: config Configuration size: XL 500+ changed lines


Development

Successfully merging this pull request may close these issues.

Issue: Incomplete Provider Configuration Causes Instance Startup Failure (Non-TEE)

4 participants