Skip to content

fix(auth): bypass auth for local API requests during onboarding#386

Merged
penso merged 7 commits intomainfrom
stt-401-during-onboarding
Mar 10, 2026
Merged

fix(auth): bypass auth for local API requests during onboarding#386
penso merged 7 commits intomainfrom
stt-401-during-onboarding

Conversation

@penso
Copy link
Copy Markdown
Collaborator

@penso penso commented Mar 10, 2026

Summary

  • During first-time onboarding, the STT "Test" button fails with 401 AUTH_NOT_AUTHENTICATED because transcribeAudio() uses HTTP fetch (POST /api/sessions/{key}/upload) which goes through auth_gate, unlike the WS-based voice config RPCs
  • Adds a narrow bypass in auth_gate's Unauthorized branch: when the request is local, targets /api/* or /ws/*, and onboarding hasn't completed yet (wizard_status reports onboarded=false), allow the request through with Loopback identity
  • Updates NoopOnboardingService::wizard_status() to return "onboarded": true so the bypass doesn't activate in existing tests

Validation

Completed

  • cargo test -p moltis-gateway --test auth_middleware — all 53 tests pass (3 new)
  • cargo +nightly-2025-11-30 fmt --all -- --check — clean
  • cargo +nightly-2025-11-30 clippy -p moltis-gateway --all-targets -- -D warnings — clean

Remaining

  • ./scripts/local-validate.sh — full CI validation
  • Manual QA: fresh install → onboarding → voice step → STT test works without 401

Manual QA

  1. Start with a fresh data directory (no .onboarded sentinel)
  2. Complete auth setup step (password)
  3. Navigate to voice configuration step
  4. Click "Test" on STT provider — should succeed (previously got 401)
  5. Complete onboarding
  6. Verify that unauthenticated API requests now correctly return 401

Closes #378

During first-time onboarding, the STT "Test" button fails with 401
because transcribeAudio() uses HTTP fetch (POST /api/sessions/{key}/upload)
which goes through auth_gate, unlike the WS-based voice config RPCs.

Add a narrow bypass in auth_gate's Unauthorized branch: when the request
is local, targets /api/* or /ws/*, and onboarding hasn't completed yet
(wizard_status reports onboarded=false), allow the request through with
Loopback identity.

Also fix a pre-existing missing .send().await.unwrap() in the /ws
public route test and two nightly rustfmt formatting issues.

Closes #378

Entire-Checkpoint: 333257de8c57
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Mar 10, 2026

Greptile Summary

This PR fixes a 401 AUTH_NOT_AUTHENTICATED error on the STT "Test" button during onboarding by adding a narrow auth bypass in auth_gate: when a request is local, targets /api/* or /ws/*, and OnboardingService::wizard_status() reports onboarded=false, the request is allowed through with Loopback identity. NoopOnboardingService::wizard_status() is updated to return "onboarded": true so the bypass never activates in test environments. Three new integration tests (local during onboarding, local after onboarding, remote during onboarding) validate the boundary conditions, and a pre-existing missing .send().await.unwrap() in public_routes_accessible_without_auth is also fixed.

  • Critical: unwrap_or(false) in the wizard_status() result chain (line 171 of auth_middleware.rs) creates a fail-open condition — if the onboarding service returns an error or omits the "onboarded" key, !onboarded becomes true and the bypass activates. In a post-onboarding production environment, a transient service failure would allow all unauthenticated local API requests through. This should be unwrap_or(true).
  • Style: start_auth_server_with_onboarding and start_proxied_server_with_onboarding are near-identical; collapsing them into a single helper with a behind_proxy parameter would reduce maintenance burden.
  • Style: Both test helper functions call std::mem::forget(tmp), leaking the temporary directory until OS cleanup; moving tmp into the spawned task or returning it to the caller would be cleaner.

Confidence Score: 2/5

  • Not safe to merge: a fail-open default in the bypass condition can permanently disable local auth if the onboarding service errors post-onboarding.
  • The core idea (narrow local-only bypass conditioned on onboarded=false) is sound, but unwrap_or(false) inverts the safe-failure direction: a service error leaves auth permanently disabled for all local API requests rather than requiring authentication. This is a security regression that must be fixed before merging.
  • crates/gateway/src/auth_middleware.rs — specifically line 171's unwrap_or(false) in the onboarding bypass check.

Important Files Changed

Filename Overview
crates/gateway/src/auth_middleware.rs Adds onboarding auth bypass in the Unauthorized branch of auth_gate; unwrap_or(false) on the wizard_status() call creates a fail-open security hole if the onboarding service errors post-onboarding.
crates/gateway/tests/auth_middleware.rs Adds three targeted integration tests for the onboarding bypass (local during onboarding, local after onboarding, remote during onboarding); also fixes a pre-existing missing .send().await.unwrap() in public_routes_accessible_without_auth. Duplicate helper functions and std::mem::forget leaks are minor quality concerns.
crates/service-traits/src/lib.rs Updates NoopOnboardingService::wizard_status() to return "onboarded": true, correctly preventing the new onboarding bypass from activating in all tests that use the noop service.
crates/cli/src/node_commands.rs Formatting-only change: wraps a long node_id field assignment across multiple lines; no logic changes.

Sequence Diagram

sequenceDiagram
    participant C as Client (local)
    participant AG as auth_gate
    participant CA as check_auth
    participant OS as OnboardingService

    C->>AG: POST /api/sessions/{key}/upload (no session cookie)
    AG->>CA: check_auth(store, headers, is_local=true)
    CA-->>AG: AuthResult::Unauthorized

    Note over AG: New bypass logic (Unauthorized branch)
    AG->>AG: is_local=true AND path starts with /api/?
    AG->>OS: wizard_status()
    OS-->>AG: {"onboarded": false}
    AG->>AG: !onboarded → true → bypass activates
    AG->>AG: insert AuthIdentity { method: Loopback }
    AG->>C: 200 OK (request passes through)

    Note over AG: After onboarding completes
    C->>AG: POST /api/sessions/{key}/upload (no session cookie)
    AG->>CA: check_auth(store, headers, is_local=true)
    CA-->>AG: AuthResult::Unauthorized
    AG->>OS: wizard_status()
    OS-->>AG: {"onboarded": true}
    AG->>AG: !onboarded → false → bypass skipped
    AG->>C: 401 AUTH_NOT_AUTHENTICATED
Loading

Comments Outside Diff (2)

  1. crates/gateway/tests/auth_middleware.rs, line 637-641 (link)

    Duplicate test-server setup functions

    start_auth_server_with_onboarding and start_proxied_server_with_onboarding are identical except for the single behind_proxy boolean passed to GatewayState::with_options. Consider collapsing them into one helper that accepts a behind_proxy: bool parameter to avoid keeping two bodies in sync as the with_options signature evolves:

    async fn start_server_with_onboarding(
        onboarded: bool,
        behind_proxy: bool,
    ) -> (SocketAddr, Arc<CredentialStore>, Arc<GatewayState>) {
        // … single body …
    }

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

  2. crates/gateway/tests/auth_middleware.rs, line 627-628 (link)

    std::mem::forget(tmp) leaks the temporary directory

    Both start_auth_server_with_onboarding and start_proxied_server_with_onboarding call std::mem::forget(tmp) to prevent the TempDir destructor from deleting the directory while the background server task is still referencing it. This effectively leaks the directory until the OS cleans it up.

    The idiomatic approach is to move tmp into the spawned task (or into the returned GatewayState) so it is dropped together with the server. Alternatively, passing it back to the caller so the test owns the lifetime avoids the leak without unsafe forgetting.

Last reviewed commit: b64ff28

Comment thread crates/gateway/src/auth_middleware.rs Outdated
.await
.ok()
.and_then(|v| v.get("onboarded").and_then(|v| v.as_bool()))
.unwrap_or(false);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fail-open default makes bypass permanent if onboarding service errors

unwrap_or(false) means that if wizard_status() returns an Err or a response without an "onboarded" boolean key, onboarded defaults to false, so !onboarded is true and the bypass activates. In a post-onboarding production deployment, if the onboarding service becomes unavailable (crash, misconfiguration, etc.) at any point, every unauthenticated local GET /api/* or POST /api/* request will be allowed through with Loopback identity — completely bypassing auth.

The safe default should be unwrap_or(true): if we cannot determine whether onboarding is done, assume it is done and require authentication.

Suggested change
.unwrap_or(false);
.unwrap_or(true);

@codspeed-hq
Copy link
Copy Markdown
Contributor

codspeed-hq Bot commented Mar 10, 2026

Merging this PR will not alter performance

✅ 39 untouched benchmarks
⏩ 5 skipped benchmarks1


Comparing stt-401-during-onboarding (5e5ba40) with main (16585c0)

Open in CodSpeed

Footnotes

  1. 5 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b64ff28a60

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread crates/gateway/src/auth_middleware.rs Outdated
// session cookie (e.g. STT test button uses HTTP fetch, not WS).
// Allow them through with Loopback identity so the onboarding
// flow can complete without requiring a login first.
if is_local && (path.starts_with("/api/") || path.starts_with("/ws/")) {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Restrict onboarding auth bypass to required routes only

This branch bypasses authentication for every local /api/* and /ws/* request while onboarded=false, which is much broader than the STT upload use case and exposes privileged handlers (for example /api/config and /api/restart in crates/web/src/lib.rs) without credentials during onboarding. I checked crates/auth/src/locality.rs and it already documents a known case where proxied traffic can be misclassified as local (bare_proxy_without_env_var_appears_local), so this new broad bypass can turn that misclassification into unauthenticated remote API access until onboarding completes.

Useful? React with 👍 / 👎.

penso added 4 commits March 10, 2026 17:20
Prefix unused `config` params with underscore and replace `vec!` with
array literal in `generate_launchd_plist` / `generate_systemd_unit`.

Entire-Checkpoint: 6b182a1aca6f
Revert NoopOnboardingService to not include "onboarded" field — the
chat_ui test depends on noop returning a not-onboarded state.

Instead, change the auth bypass default from unwrap_or(false) to
unwrap_or(true): when the onboarded field is absent (noop service)
or wizard_status() fails, assume onboarded (safe default = require
auth). The bypass only activates when onboarded is explicitly false.

Revert node-host service.rs to main — the clippy warnings and test
failure are pre-existing and unrelated to this PR.

Entire-Checkpoint: a6217c77e389
The generate_launchd_plist() and generate_systemd_unit() functions
accepted a ServiceConfig parameter but never used it, causing clippy
unused-variable warnings. Wire the config fields (gateway-url,
device-token, node-id, name, working-dir, timeout) into the generated
service templates as CLI arguments.

Entire-Checkpoint: dc2070e5181f
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7fad7cc001

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread crates/node-host/src/service.rs Outdated
Comment on lines +425 to +427
let mut extra_args = format!(
" --gateway-url {} --device-token {}",
config.gateway_url, config.device_token
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Remove unsupported flags from node service command

moltis node run only accepts the optional --timeout flag (crates/cli/src/node_commands.rs), but this unit template now always appends --gateway-url and --device-token (plus other optional flags), so the service process will exit on argument parsing before it can start the node host. This breaks newly installed node services on Linux (and the same pattern is present in the launchd generator), so the service command should stay compatible with the Run subcommand or the CLI needs matching Run options.

Useful? React with 👍 / 👎.

@codecov
Copy link
Copy Markdown

codecov Bot commented Mar 10, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

- Restrict onboarding auth bypass to narrow path set (sessions, bootstrap,
  gon, tailscale, ws) instead of all /api/* and /ws/* paths
- Add test verifying privileged endpoints remain protected during onboarding
- Consolidate duplicate test helpers into single start_server_with_onboarding()
- Fix node-host: only pass --timeout to service templates (the only flag
  `moltis node run` accepts); fix pre-existing broken tests that asserted
  config fields in template output

Entire-Checkpoint: 38493b795e8f
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 702bb6476c

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

path.starts_with("/api/sessions/") // STT upload / media
|| path.starts_with("/api/bootstrap")
|| path == "/api/gon"
|| path.starts_with("/api/tailscale/")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Remove tailscale routes from onboarding auth bypass

Fresh evidence after the earlier narrowing: is_onboarding_bypass_path() still allows "/api/tailscale/", and tailscale_router() exposes POST /configure (configure_handler) that applies Serve/Funnel mode changes. Because this bypass runs before the normal 401 path, an unauthenticated request is accepted whenever onboarded=false and the request is classified local; with the documented bare-proxy locality limitation, that can permit remote callers to mutate networking config during onboarding.

Useful? React with 👍 / 👎.

/// `/api/restart` are never reachable without credentials.
#[cfg(feature = "web-ui")]
fn is_onboarding_bypass_path(path: &str) -> bool {
path.starts_with("/api/sessions/") // STT upload / media
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Limit sessions bypass to upload/media endpoints only

Using path.starts_with("/api/sessions/") bypasses auth for the whole sessions namespace, not just STT upload/media; this includes /api/sessions/{session_key}/history, which returns conversation history from api_session_history_handler. That means unauthenticated local requests can read chat data while onboarding is incomplete, which is broader than the stated upload use case and can expose existing data in installs that are still onboarded=false (for example when config/data exists without the onboarding sentinel).

Useful? React with 👍 / 👎.

Resolve conflict in crates/node-host/src/service.rs — main also fixed
the unused-config clippy warning (prefixed _config). Kept our version
that passes --timeout to the service templates since `node run` accepts
it, and restored the `config` parameter name accordingly.

Entire-Checkpoint: 2a076d9d6f26
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5e5ba407d8

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

#[cfg(feature = "web-ui")]
fn is_onboarding_bypass_path(path: &str) -> bool {
path.starts_with("/api/sessions/") // STT upload / media
|| path.starts_with("/api/bootstrap")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Restrict bootstrap bypass to onboarding-safe payload

Allowlisting path.starts_with("/api/bootstrap") for unauthenticated local requests during onboarding exposes more data than the wizard needs: BootstrapQuery defaults every include_* flag to true in crates/web/src/api.rs, so calling /api/bootstrap without query params returns sessions/projects/models metadata and identity. The onboarding UI only needs the reduced payload (include_* = false in crates/web/src/assets/js/onboarding-view.js), so this bypass lets any unauthenticated local caller read existing workspace metadata while onboarded=false (e.g., partial/migrated installs).

Useful? React with 👍 / 👎.

@penso penso merged commit 679f700 into main Mar 10, 2026
50 of 53 checks passed
@penso penso deleted the stt-401-during-onboarding branch March 10, 2026 22:27
penso added a commit that referenced this pull request Mar 23, 2026
* fix(auth): bypass auth for local API requests during onboarding

During first-time onboarding, the STT "Test" button fails with 401
because transcribeAudio() uses HTTP fetch (POST /api/sessions/{key}/upload)
which goes through auth_gate, unlike the WS-based voice config RPCs.

Add a narrow bypass in auth_gate's Unauthorized branch: when the request
is local, targets /api/* or /ws/*, and onboarding hasn't completed yet
(wizard_status reports onboarded=false), allow the request through with
Loopback identity.

Also fix a pre-existing missing .send().await.unwrap() in the /ws
public route test and two nightly rustfmt formatting issues.

Closes #378

Entire-Checkpoint: 333257de8c57

* fix: resolve pre-existing clippy warnings in node-host service

Prefix unused `config` params with underscore and replace `vec!` with
array literal in `generate_launchd_plist` / `generate_systemd_unit`.

Entire-Checkpoint: 6b182a1aca6f

* style: fix nightly rustfmt formatting in node-host service

Entire-Checkpoint: cc7596feb984

* fix: revert NoopOnboardingService and node-host changes

Revert NoopOnboardingService to not include "onboarded" field — the
chat_ui test depends on noop returning a not-onboarded state.

Instead, change the auth bypass default from unwrap_or(false) to
unwrap_or(true): when the onboarded field is absent (noop service)
or wizard_status() fails, assume onboarded (safe default = require
auth). The bypass only activates when onboarded is explicitly false.

Revert node-host service.rs to main — the clippy warnings and test
failure are pre-existing and unrelated to this PR.

Entire-Checkpoint: a6217c77e389

* fix(node-host): wire ServiceConfig fields into launchd/systemd templates

The generate_launchd_plist() and generate_systemd_unit() functions
accepted a ServiceConfig parameter but never used it, causing clippy
unused-variable warnings. Wire the config fields (gateway-url,
device-token, node-id, name, working-dir, timeout) into the generated
service templates as CLI arguments.

Entire-Checkpoint: dc2070e5181f

* fix: address PR review comments

- Restrict onboarding auth bypass to narrow path set (sessions, bootstrap,
  gon, tailscale, ws) instead of all /api/* and /ws/* paths
- Add test verifying privileged endpoints remain protected during onboarding
- Consolidate duplicate test helpers into single start_server_with_onboarding()
- Fix node-host: only pass --timeout to service templates (the only flag
  `moltis node run` accepts); fix pre-existing broken tests that asserted
  config fields in template output

Entire-Checkpoint: 38493b795e8f
jmikedupont2 pushed a commit to meta-introspector/moltis that referenced this pull request Mar 23, 2026
…is-org#386)

* fix(auth): bypass auth for local API requests during onboarding

During first-time onboarding, the STT "Test" button fails with 401
because transcribeAudio() uses HTTP fetch (POST /api/sessions/{key}/upload)
which goes through auth_gate, unlike the WS-based voice config RPCs.

Add a narrow bypass in auth_gate's Unauthorized branch: when the request
is local, targets /api/* or /ws/*, and onboarding hasn't completed yet
(wizard_status reports onboarded=false), allow the request through with
Loopback identity.

Also fix a pre-existing missing .send().await.unwrap() in the /ws
public route test and two nightly rustfmt formatting issues.

Closes moltis-org#378

Entire-Checkpoint: 333257de8c57

* fix: resolve pre-existing clippy warnings in node-host service

Prefix unused `config` params with underscore and replace `vec!` with
array literal in `generate_launchd_plist` / `generate_systemd_unit`.

Entire-Checkpoint: 6b182a1aca6f

* style: fix nightly rustfmt formatting in node-host service

Entire-Checkpoint: cc7596feb984

* fix: revert NoopOnboardingService and node-host changes

Revert NoopOnboardingService to not include "onboarded" field — the
chat_ui test depends on noop returning a not-onboarded state.

Instead, change the auth bypass default from unwrap_or(false) to
unwrap_or(true): when the onboarded field is absent (noop service)
or wizard_status() fails, assume onboarded (safe default = require
auth). The bypass only activates when onboarded is explicitly false.

Revert node-host service.rs to main — the clippy warnings and test
failure are pre-existing and unrelated to this PR.

Entire-Checkpoint: a6217c77e389

* fix(node-host): wire ServiceConfig fields into launchd/systemd templates

The generate_launchd_plist() and generate_systemd_unit() functions
accepted a ServiceConfig parameter but never used it, causing clippy
unused-variable warnings. Wire the config fields (gateway-url,
device-token, node-id, name, working-dir, timeout) into the generated
service templates as CLI arguments.

Entire-Checkpoint: dc2070e5181f

* fix: address PR review comments

- Restrict onboarding auth bypass to narrow path set (sessions, bootstrap,
  gon, tailscale, ws) instead of all /api/* and /ws/* paths
- Add test verifying privileged endpoints remain protected during onboarding
- Consolidate duplicate test helpers into single start_server_with_onboarding()
- Fix node-host: only pass --timeout to service templates (the only flag
  `moltis node run` accepts); fix pre-existing broken tests that asserted
  config fields in template output

Entire-Checkpoint: 38493b795e8f
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: STT test fails with 401 during onboarding

1 participant