
fix(config): keep startup LLM fallback in-memory only (#3229) #3324

Merged: ilblackdragon merged 3 commits into main from fix/3229-llm-fallback-no-persist on May 7, 2026

Conversation

@ilblackdragon (Member) commented May 7, 2026

Closes #3229.

Summary

  • A transient hydration failure on startup (a DB read race, a secrets-decryption hiccup) used to overwrite the user's persisted llm_backend with nearai and delete selected_model from the DB, turning a one-off fallback into a permanent reversion of the configured provider. After the fix, the fallback stays in-memory: the runtime still demotes to NearAI to keep the instance usable (preserving the crash-loop prevention from #2514, "Incomplete Provider Configuration Causes Instance Startup Failure (Non-TEE)"), but the user's DB row is left untouched, so the next restart with credentials available picks up the original config.
  • Removed the post-fallback DB write block in Config::resolve_llm_with_secrets_inner, plus the now-unused fallback_fired / normalize_backend helpers and their five unit tests (those existed solely to gate the deleted write). Net -88 lines on the bug fix.
  • Updated the stale comment in LlmConfig::resolve_nearai_fallback that referenced the removed DB sync.
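
The behavior change above can be sketched in miniature. This is an illustration only, with made-up types: `PersistedLlmConfig`, `resolve_effective_backend`, and the `hydration_ok` flag are hypothetical stand-ins for the real `Config::resolve_llm_with_secrets_inner` path.

```rust
// Minimal sketch of the fix: all names are illustrative, not the
// project's real API. The key property is that the persisted row is
// taken by shared reference and never written back.
#[derive(Clone, Debug, PartialEq)]
pub struct PersistedLlmConfig {
    pub llm_backend: String,
    pub selected_model: Option<String>,
}

/// Returns the backend the runtime should use for this process.
/// On hydration failure we demote to NearAI in memory only; the
/// persisted row is left untouched.
pub fn resolve_effective_backend(persisted: &PersistedLlmConfig, hydration_ok: bool) -> String {
    if hydration_ok {
        persisted.llm_backend.clone()
    } else {
        // Keep the instance usable (crash-loop prevention), but do NOT
        // persist this choice: the next restart re-reads the original row.
        "nearai".to_string()
    }
}

fn main() {
    let row = PersistedLlmConfig {
        llm_backend: "openrouter".to_string(),
        selected_model: Some("openai/gpt-4o-mini".to_string()),
    };
    // Transient failure: the runtime falls back, the row does not change.
    assert_eq!(resolve_effective_backend(&row, false), "nearai");
    assert_eq!(row.llm_backend, "openrouter");
    println!("fallback stayed in-memory; persisted row unchanged");
}
```

The pre-fix bug corresponds to taking `&mut PersistedLlmConfig` and assigning `nearai` into it inside the `else` branch; the fix is precisely that no such write exists.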

Scope expansion: workflow brittleness fix

While debugging this PR's CI, GitHub failed to publish refs/pull/3324/head (confirmed by git ls-remote against origin — only pull/3324/merge exists; comparable open PRs have both refs). The regression-test-check.yml workflow was hard-failing on git fetch origin pull/N/head:pr-head before any of its actual checks could run. This was a pre-existing brittleness — the workflow assumes refs/pull/N/head is always published, which GitHub does not guarantee.

Bundled fix: derive the head SHA from github.event.pull_request.head.sha (always present in the event payload) instead of fetching refs/pull/N/head. The merge commit actions/checkout already produces with fetch-depth: 0 has the head SHA as its second parent, so it's locally reachable. Future PRs hit by the same GitHub ref-publishing bug will not need this manual unsticking.
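
The reachability argument above can be sketched as data: GitHub's PR merge commit has the base branch tip as its first parent and the PR head as its second. The types below are illustrative only, not the workflow's actual code.

```rust
// Illustration of why the PR head SHA is locally reachable without
// refs/pull/N/head: actions/checkout checks out GitHub's merge commit,
// whose parents are [base_tip, pr_head]. With fetch-depth: 0 the full
// history behind that commit is fetched, including the head commit.
struct Commit {
    sha: String,
    parents: Vec<String>, // [first = base branch tip, second = PR head]
}

/// The PR head is the merge commit's second parent.
fn pr_head_sha(merge_commit: &Commit) -> Option<&str> {
    merge_commit.parents.get(1).map(String::as_str)
}

fn main() {
    let merge = Commit {
        sha: "mergeabc".to_string(),
        parents: vec!["base123".to_string(), "head456".to_string()],
    };
    println!(
        "merge {} has head parent {:?}",
        merge.sha,
        pr_head_sha(&merge)
    );
    assert_eq!(pr_head_sha(&merge), Some("head456"));
}
```

In the workflow itself, the same value arrives ready-made as `github.event.pull_request.head.sha`, so no parent-walking or extra fetch is needed.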

The empty ci: re-trigger workflows commit in the history is residue from earlier debugging — it pushed a refreshed SHA to nudge GitHub into publishing the missing ref (didn't work, hence the workflow change).

Test plan

  • cargo fmt
  • cargo clippy --all --benches --tests --examples --all-features — zero warnings
  • cargo test --lib — 5614 passed
  • New regression test config::tests::startup_fallback_must_not_overwrite_persisted_user_backend — seeds llm_backend = "openrouter" + selected_model = "openai/gpt-4o-mini", calls the startup path with no secrets store (forces the unusable-config fallback), asserts the runtime falls back to NearAI and the DB row is unchanged. Would have failed on main (the row got rewritten to nearai and selected_model deleted).
  • scripts/pre-commit-safety.sh passes
  • Workflow change verified in CI on this PR (the very thing it's trying to fix)

🤖 Generated with Claude Code

A transient hydration failure (DB read race, secrets decryption hiccup)
during startup used to overwrite the user's persisted llm_backend with
"nearai" and delete selected_model from the DB, turning a one-off
fallback into a permanent reversion of the configured provider. The
fallback now stays in-memory: the runtime still demotes to NearAI to
keep the instance usable (#2514), but the user's DB-persisted selection
is preserved so the next restart with credentials available picks up
their original config.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

@github-actions Bot added labels size: L (200-499 changed lines), risk: low (Changes to docs, tests, or low-risk modules), and contributor: core (20+ merged PRs) on May 7, 2026
ilblackdragon and others added 2 commits May 7, 2026 05:37
ci: re-trigger workflows

Empty commit to refresh GitHub's pull/3324/head ref, which was not
published when the PR was opened, blocking the regression-test
enforcement workflow's git fetch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stop fetching `refs/pull/${PR_NUMBER}/head:pr-head` and source the head
SHA from `github.event.pull_request.head.sha` instead. GitHub occasionally
fails to publish `refs/pull/N/head` for in-repo PRs (#3324 hit this — the
workflow could not get past its first git command, blocking the
regression-test enforcement check before it ran any of its actual
heuristics). The merge commit produced by actions/checkout has the head
SHA as its second parent and is fetched in full by fetch-depth: 0, so
the SHA is always locally reachable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 7, 2026 06:21
@github-actions Bot added labels scope: ci (CI/CD workflows) and risk: medium (Business logic, config, or moderate-risk modules), and removed risk: low (Changes to docs, tests, or low-risk modules) on May 7, 2026
Copilot AI (Contributor) left a comment

Pull request overview

Prevents the startup-time “fallback to NearAI” safety mechanism from permanently overwriting a user’s persisted LLM provider/model in the DB when secrets/config hydration is temporarily unavailable, while keeping the crash-loop prevention behavior (#2514) intact.

Changes:

  • Removed the post-fallback DB synchronization in Config::resolve_llm_with_secrets_inner so the fallback remains in-memory only (#3229).
  • Updated NearAI fallback commentary in LlmConfig::resolve_nearai_fallback to reflect the new persistence behavior.
  • Adjusted the regression-test enforcement workflow to use the PR head SHA directly instead of fetching refs/pull/N/head.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

Files reviewed:

  • src/config/mod.rs: stops persisting the fallback backend/model into the DB; adds a regression test ensuring the DB row is preserved on startup fallback.
  • src/config/llm.rs: updates fallback comments to reflect the in-memory-only behavior for selected_model.
  • .github/workflows/regression-test-check.yml: makes the workflow more robust by comparing against the PR head SHA from the event payload.


@ilblackdragon ilblackdragon merged commit 89443ac into main May 7, 2026
35 checks passed
@ilblackdragon ilblackdragon deleted the fix/3229-llm-fallback-no-persist branch May 7, 2026 15:23

Labels

contributor: core (20+ merged PRs), risk: medium (Business logic, config, or moderate-risk modules), scope: ci (CI/CD workflows), size: L (200-499 changed lines)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

LLM provider fallback persists to DB on startup, permanently destroying user's model/provider config

2 participants