
fix(config): keep startup LLM fallback in-memory only (#3229) #3324

Merged: ilblackdragon merged 3 commits into main from fix/3229-llm-fallback-no-persist on May 7, 2026

Conversation

@ilblackdragon (Member) commented May 7, 2026

Closes #3229.

Summary

  • A transient hydration failure on startup (a DB read race, a secrets-decryption hiccup) used to overwrite the user's persisted llm_backend with nearai and delete selected_model from the DB, turning a one-off fallback into a permanent reversion of the configured provider. After the fix, the fallback stays in-memory: the runtime still demotes to NearAI to keep the instance usable (preserving the crash-loop prevention from #2514, "Incomplete Provider Configuration Causes Instance Startup Failure (Non-TEE)"), but the user's DB row is left untouched, so the next restart with credentials available picks up the original config.
  • Removed the post-fallback DB write block in Config::resolve_llm_with_secrets_inner, plus the now-unused fallback_fired / normalize_backend helpers and their five unit tests (those existed solely to gate the deleted write). Net -88 lines on the bug fix.
  • Updated the stale comment in LlmConfig::resolve_nearai_fallback that referenced the removed DB sync.
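
The behavior change above can be sketched in miniature. This is an illustration only, with made-up types: `PersistedLlmConfig`, `resolve_effective_backend`, and the `hydration_ok` flag are hypothetical stand-ins for the real `Config::resolve_llm_with_secrets_inner` path.

```rust
// Minimal sketch of the fix: all names are illustrative, not the
// project's real API. The key property is that the persisted row is
// taken by shared reference and never written back.
#[derive(Clone, Debug, PartialEq)]
pub struct PersistedLlmConfig {
    pub llm_backend: String,
    pub selected_model: Option<String>,
}

/// Returns the backend the runtime should use for this process.
/// On hydration failure we demote to NearAI in memory only; the
/// persisted row is left untouched.
pub fn resolve_effective_backend(persisted: &PersistedLlmConfig, hydration_ok: bool) -> String {
    if hydration_ok {
        persisted.llm_backend.clone()
    } else {
        // Keep the instance usable (crash-loop prevention), but do NOT
        // persist this choice: the next restart re-reads the original row.
        "nearai".to_string()
    }
}

fn main() {
    let row = PersistedLlmConfig {
        llm_backend: "openrouter".to_string(),
        selected_model: Some("openai/gpt-4o-mini".to_string()),
    };
    // Transient failure: the runtime falls back, the row does not change.
    assert_eq!(resolve_effective_backend(&row, false), "nearai");
    assert_eq!(row.llm_backend, "openrouter");
    println!("fallback stayed in-memory; persisted row unchanged");
}
```

The pre-fix bug corresponds to taking `&mut PersistedLlmConfig` and assigning `nearai` into it inside the `else` branch; the fix is precisely that no such write exists.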

Scope expansion: workflow brittleness fix

While debugging this PR's CI, GitHub failed to publish refs/pull/3324/head (confirmed by git ls-remote against origin — only pull/3324/merge exists; comparable open PRs have both refs). The regression-test-check.yml workflow was hard-failing on git fetch origin pull/N/head:pr-head before any of its actual checks could run. This was a pre-existing brittleness — the workflow assumes refs/pull/N/head is always published, which GitHub does not guarantee.

Bundled fix: derive the head SHA from github.event.pull_request.head.sha (always present in the event payload) instead of fetching refs/pull/N/head. The merge commit actions/checkout already produces with fetch-depth: 0 has the head SHA as its second parent, so it's locally reachable. Future PRs hit by the same GitHub ref-publishing bug will not need this manual unsticking.
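
The reachability argument above can be sketched as data: GitHub's PR merge commit has the base branch tip as its first parent and the PR head as its second. The types below are illustrative only, not the workflow's actual code.

```rust
// Illustration of why the PR head SHA is locally reachable without
// refs/pull/N/head: actions/checkout checks out GitHub's merge commit,
// whose parents are [base_tip, pr_head]. With fetch-depth: 0 the full
// history behind that commit is fetched, including the head commit.
struct Commit {
    sha: String,
    parents: Vec<String>, // [first = base branch tip, second = PR head]
}

/// The PR head is the merge commit's second parent.
fn pr_head_sha(merge_commit: &Commit) -> Option<&str> {
    merge_commit.parents.get(1).map(String::as_str)
}

fn main() {
    let merge = Commit {
        sha: "mergeabc".to_string(),
        parents: vec!["base123".to_string(), "head456".to_string()],
    };
    println!(
        "merge {} has head parent {:?}",
        merge.sha,
        pr_head_sha(&merge)
    );
    assert_eq!(pr_head_sha(&merge), Some("head456"));
}
```

In the workflow itself, the same value arrives ready-made as `github.event.pull_request.head.sha`, so no parent-walking or extra fetch is needed.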

The empty ci: re-trigger workflows commit in the history is residue from earlier debugging — it pushed a refreshed SHA to nudge GitHub into publishing the missing ref (didn't work, hence the workflow change).

Test plan

  • cargo fmt
  • cargo clippy --all --benches --tests --examples --all-features — zero warnings
  • cargo test --lib — 5614 passed
  • New regression test config::tests::startup_fallback_must_not_overwrite_persisted_user_backend — seeds llm_backend = "openrouter" + selected_model = "openai/gpt-4o-mini", calls the startup path with no secrets store (forces the unusable-config fallback), asserts the runtime falls back to NearAI and the DB row is unchanged. Would have failed on main (the row got rewritten to nearai and selected_model deleted).
  • scripts/pre-commit-safety.sh passes
  • Workflow change verified in CI on this PR (the very thing it's trying to fix)

🤖 Generated with Claude Code

A transient hydration failure (DB read race, secrets decryption hiccup)
during startup used to overwrite the user's persisted llm_backend with
"nearai" and delete selected_model from the DB, turning a one-off
fallback into a permanent reversion of the configured provider. The
fallback now stays in-memory: the runtime still demotes to NearAI to
keep the instance usable (#2514), but the user's DB-persisted selection
is preserved so the next restart with credentials available picks up
their original config.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@github-actions Bot added labels size: L (200-499 changed lines), risk: low (Changes to docs, tests, or low-risk modules), and contributor: core (20+ merged PRs) on May 7, 2026
ilblackdragon and others added 2 commits May 7, 2026 05:37
ci: re-trigger workflows

Empty commit to refresh GitHub's pull/3324/head ref, which was not
published when the PR was opened, blocking the regression-test
enforcement workflow's git fetch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stop fetching `refs/pull/${PR_NUMBER}/head:pr-head` and source the head
SHA from `github.event.pull_request.head.sha` instead. GitHub occasionally
fails to publish `refs/pull/N/head` for in-repo PRs (#3324 hit this — the
workflow could not get past its first git command, blocking the
regression-test enforcement check before it ran any of its actual
heuristics). The merge commit produced by actions/checkout has the head
SHA as its second parent and is fetched in full by fetch-depth: 0, so
the SHA is always locally reachable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 7, 2026 06:21
@github-actions Bot added labels scope: ci (CI/CD workflows) and risk: medium (Business logic, config, or moderate-risk modules), and removed risk: low (Changes to docs, tests, or low-risk modules) on May 7, 2026
Copilot AI (Contributor) left a comment

Pull request overview

Prevents the startup-time “fallback to NearAI” safety mechanism from permanently overwriting a user’s persisted LLM provider/model in the DB when secrets/config hydration is temporarily unavailable, while keeping the crash-loop prevention behavior (#2514) intact.

Changes:

  • Removed the post-fallback DB synchronization in Config::resolve_llm_with_secrets_inner so the fallback remains in-memory only (#3229).
  • Updated NearAI fallback commentary in LlmConfig::resolve_nearai_fallback to reflect the new persistence behavior.
  • Adjusted the regression-test enforcement workflow to use the PR head SHA directly instead of fetching refs/pull/N/head.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

Files reviewed:

  • src/config/mod.rs: stops persisting the fallback backend/model into the DB; adds a regression test ensuring the DB row is preserved on startup fallback.
  • src/config/llm.rs: updates fallback comments to reflect the in-memory-only behavior for selected_model.
  • .github/workflows/regression-test-check.yml: makes the workflow more robust by comparing against the PR head SHA from the event payload.


@ilblackdragon ilblackdragon merged commit 89443ac into main May 7, 2026
35 checks passed
@ilblackdragon ilblackdragon deleted the fix/3229-llm-fallback-no-persist branch May 7, 2026 15:23

Labels

contributor: core (20+ merged PRs), risk: medium (Business logic, config, or moderate-risk modules), scope: ci (CI/CD workflows), size: L (200-499 changed lines)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

LLM provider fallback persists to DB on startup, permanently destroying user's model/provider config

2 participants