Commit 5b3c170

ilblackdragon and claude authored and committed
feat: DB-backed user management, admin secrets provisioning, and multi-tenant isolation (nearai#1626)
* feat: complete multi-tenant isolation - per-user budgets, model selection, heartbeat cycling

Finishes the remaining isolation work from phases 2–4 of nearai#59:

Phase 2 (DB scoping): Fix /status and /list commands to use _for_user DB variants instead of global queries that leaked cross-user job data.

Phase 3 (Runtime isolation): Per-user workspace in routine engine's spawn_fire so lightweight routines run in the correct user context. Per-user daily cost tracking in CostGuard with configurable budget via MAX_COST_PER_USER_PER_DAY_CENTS. Multi-user heartbeat that cycles through all users with routines, auto-detected from GATEWAY_USER_TOKENS.

Phase 4 (Provider/tools): Per-user model selection via preferred_model setting, looked up from SettingsStore on first iteration and threaded through ReasoningContext.model_override to CompletionRequest. Works with providers that support per-request model overrides (NearAI).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use selected_model setting key to match /model command persistence

The dispatcher was reading "preferred_model" but the /model command (merged from staging) persists to "selected_model". Since set_setting is already per-user scoped, using the same key makes /model work as the per-user model override in multi-tenant mode.

* fix: heartbeat hygiene, /model multi-tenant guard, RigAdapter model override

Three follow-up fixes for multi-tenant isolation:

1. Multi-user heartbeat now runs memory hygiene per user before each heartbeat check, matching single-user heartbeat behavior.
2. /model command in multi-tenant mode only persists to per-user settings (selected_model) without calling set_model() on the shared LlmProvider. The per-request model_override in the dispatcher reads from the same setting. Added multi_tenant flag to AgentConfig (auto-detected from GATEWAY_USER_TOKENS).
3. RigAdapter now supports per-request model overrides by injecting the model name into rig-core's additional_params. OpenAI/Anthropic/Ollama API servers use last-key-wins for duplicate JSON keys, so the override takes effect via serde's flatten serialization order.

* fix: address PR review - cost model attribution, heartbeat concurrency, pruning

Fixes from review comments on nearai#1614:
- Cost tracking now uses the override model name (not active_model_name) when a per-user model override is active, for accurate attribution.
- Multi-user heartbeat runs per-user checks concurrently via JoinSet instead of sequentially, preventing one slow user from blocking others.
- Per-user failure counts tracked independently; users exceeding max_failures are skipped (matching single-user semantics).
- per_user_daily_cost HashMap pruned on day rollover to prevent unbounded growth in long-lived deployments.
- Doc comment fixed: says "routines" not "active routines".

* fix: /status ownership, model persistence scoping, heartbeat robustness

Addresses second round of PR review on nearai#1614:
- /status <job_id> DB path now validates job.user_id == requesting user before returning data (was missing ownership check, security fix).
- persist_selected_model takes user_id param instead of owner_id, and skips .env/TOML writes in multi-tenant mode (these are shared global files). handle_system_command now receives user_id from caller.
- JoinSet collection handles Err(JoinError) explicitly instead of silently dropping panicked tasks.
- Notification forwarder extracts owner_id from response metadata in multi-tenant mode for per-user routing instead of broadcasting to the agent owner.
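
The per-user daily budget with day-rollover pruning described above could look roughly like the sketch below. This is an illustration only, not the project's CostGuard: the type `PerUserDailyCost`, its methods, and the caller-supplied `today` day counter are all hypothetical names chosen for the example.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the per-user part of a cost guard.
/// Amounts are in cents; `today` is a day counter supplied by the caller
/// (for example, days since the Unix epoch).
pub struct PerUserDailyCost {
    budget_cents: u64,
    current_day: u64,
    spent_cents: HashMap<String, u64>,
}

impl PerUserDailyCost {
    pub fn new(budget_cents: u64, today: u64) -> Self {
        Self {
            budget_cents,
            current_day: today,
            spent_cents: HashMap::new(),
        }
    }

    /// Drop all per-user entries when the day changes, so the map
    /// cannot grow without bound in a long-lived process.
    fn roll_over(&mut self, today: u64) {
        if today != self.current_day {
            self.spent_cents.clear();
            self.current_day = today;
        }
    }

    /// Add `cents` to the user's running total for `today`.
    pub fn record(&mut self, user_id: &str, cents: u64, today: u64) {
        self.roll_over(today);
        let entry = self.spent_cents.entry(user_id.to_string()).or_insert(0);
        *entry = entry.saturating_add(cents);
    }

    /// True when the user has reached or exceeded the daily budget.
    pub fn is_over_budget(&mut self, user_id: &str, today: u64) -> bool {
        self.roll_over(today);
        self.spent_cents.get(user_id).copied().unwrap_or(0) >= self.budget_cents
    }

    /// Number of users currently tracked (useful for observing pruning).
    pub fn tracked_users(&self) -> usize {
        self.spent_cents.len()
    }
}
```

Clearing the whole map on rollover is the simplest pruning strategy; it assumes only the current day's totals matter for budget enforcement.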
* fix: cost pricing, fire_manual workspace, heartbeat concurrency cap

Round 3 review fixes:
- Cost tracking passes None for cost_per_token when model override is active, letting CostGuard look up pricing by model name instead of using the default provider's rates (serrrfirat).
- fire_manual() now uses per-user workspace, matching spawn_fire() pattern (serrrfirat).
- Removed MULTI_TENANT env var; multi-tenant mode is auto-detected solely from GATEWAY_USER_TOKENS presence (serrrfirat + Copilot).
- Multi-user heartbeat capped at 8 concurrent tasks to avoid flooding the LLM provider (serrrfirat + Copilot).
- Fixed inject_model_override doc comment accuracy (Copilot).
- Added comment explaining multi-tenant notification routing priority (Copilot).

* feat: user-scoped webhook endpoint for multi-tenant isolation

Adds POST /api/webhooks/u/{user_id}/{path}, a user-scoped webhook endpoint that filters the routine lookup by user_id, preventing cross-user webhook triggering when paths collide. The existing /api/webhooks/{path} endpoint remains unchanged for backward compatibility in single-user deployments.

Changes:
- get_webhook_routine_by_path gains user_id: Option<&str> param
- Both postgres and libsql implementations add AND user_id = ? filter when user_id is provided
- New webhook_trigger_user_scoped_handler extracts (user_id, path) from URL and passes to shared fire_webhook_inner logic
- Route registered on public router (webhooks are called by external services that can't send bearer tokens)

* feat(db): add UserStore trait with users, api_tokens, invitations tables

Foundation for DB-backed user management (nearai#1605):
- UserRecord, ApiTokenRecord, InvitationRecord types in db/mod.rs
- UserStore sub-trait (17 methods) added to Database supertrait
- PostgreSQL migration V14__users.sql (users, api_tokens, invitations)
- libSQL schema + incremental migration V14
- Full implementations for both PgBackend (via Store delegation) and LibSqlBackend (direct SQL in libsql/users.rs)
- authenticate_token JOINs api_tokens+users with active/non-revoked checks; has_any_users for bootstrap detection

* feat(web): DB-backed auth, user/token/invitation API handlers

Adds the web gateway layer for DB-backed user management (nearai#1605):

Auth refactor:
- CombinedAuthState wraps env-var tokens (MultiAuthState) + optional DbAuthenticator for DB-backed token lookup with LRU cache (60s TTL, 1024 max entries)
- auth_middleware tries env-var tokens first, then DB fallback
- From<MultiAuthState> impl for backward compatibility
- main.rs wires with_db_auth when database is available

API handlers (12 new endpoints):
- /api/admin/users (CRUD): create, list, detail, update, suspend, activate
- /api/tokens: create (returns plaintext once), list, revoke
- /api/invitations: create, list, accept (creates user + first token)

Token creation: 32 random bytes → hex plaintext, SHA-256 hash stored.
Invitation accept: validates hash + pending + not expired, creates user record and first API token atomically.

All test files updated for CombinedAuthState type change.
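
A minimal sketch of the token flow described above (random bytes → hex plaintext → stored hash), under loud assumptions: `TokenStore`, `to_hex`, and this `hash_token` body are hypothetical, and the hash here is `std`'s `DefaultHasher`, a non-cryptographic stand-in used only to keep the example dependency-free. The commit message itself specifies SHA-256. The sketch's one point is that storage and lookup both hash the same hex string.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Lowercase hex encoding of raw bytes.
pub fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Stand-in digest over the hex plaintext. NOT cryptographic:
/// a real deployment would use SHA-256 here, as the commit describes.
pub fn hash_token(plaintext: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    plaintext.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical in-memory token table: digest -> user id.
pub struct TokenStore {
    by_hash: HashMap<u64, String>,
}

impl TokenStore {
    pub fn new() -> Self {
        Self { by_hash: HashMap::new() }
    }

    /// Stores only the digest of the hex plaintext and returns the
    /// plaintext once, for display to the user.
    pub fn issue(&mut self, user_id: &str, raw: &[u8; 32]) -> String {
        let plaintext = to_hex(raw);
        self.by_hash.insert(hash_token(&plaintext), user_id.to_string());
        plaintext
    }

    /// Hashes the candidate the same way `issue` did before lookup.
    pub fn authenticate(&self, candidate: &str) -> Option<&str> {
        self.by_hash.get(&hash_token(candidate)).map(String::as_str)
    }
}
```

In real code the 32 bytes would come from a cryptographically secure random source; the fixed-size array parameter only makes the example deterministic.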
* feat: startup env-var user migration + UserStore integration tests

Completes the DB-backed user management feature (nearai#1605):
- Startup migration: when GATEWAY_USER_TOKENS is set and the users table is empty, inserts env-var users + hashed tokens into DB. Logs deprecation notice when DB already has users.
- hash_token made pub for reuse in migration code.
- 10 integration tests for UserStore (libsql file-backed):
  - has_any_users bootstrap detection
  - create/get/get_by_email/list/update user lifecycle
  - token create → authenticate → revoke → reject cycle
  - suspended user tokens rejected
  - wrong-user token revoke returns false
  - invitation create → accept → user created
  - record_login and record_token_usage timestamps
- libSQL migration: removed FK constraints from V14 (incompatible with execute_batch inside transactions). Tables in both base SCHEMA and incremental migration for fresh and existing databases.

* refactor: remove GATEWAY_USER_TOKENS, fix review feedback

GATEWAY_USER_TOKENS never went to production and was replaced entirely by DB-backed user management via /api/admin/users and /api/tokens.
Removed:
- UserTokenConfig struct and GATEWAY_USER_TOKENS env var parsing
- user_tokens field from GatewayConfig
- GatewayChannel::new_multi_auth() constructor
- Env-var user migration block in main.rs (~90 lines)
- multi_tenant auto-detection from GATEWAY_USER_TOKENS (now runtime via db.has_any_users() in app.rs)

Review fixes (zmanian):
- User ID generation: UUID instead of display-name derivation (nearai#1)
- Invitation accept moved to public router (no auth needed) (nearai#3)
- libSQL get_invitation_by_hash aligned with postgres: filters status='pending' AND expires_at > now (nearai#4)
- UUID parse: returns DatabaseError::Serialization instead of unwrap_or_default (nearai#7)
- PostgreSQL SELECT * replaced with explicit column lists (nearai#8)
- Sort order aligned (both backends use DESC) (nearai#6)

* feat: add role-based access control (admin/member)

Adds a `role` field (admin|member) to user management:

Schema:
- `role TEXT NOT NULL DEFAULT 'member'` added to users table in both PostgreSQL V14 migration and libSQL schema/incremental migration
- UserRecord gains `role: String` field
- UserIdentity gains `role: String` field, populated from DB in DbAuthenticator and defaulting to "admin" for single-user mode

Access control:
- AdminUser extractor: returns 403 Forbidden if role != "admin"
- /api/admin/users/* handlers: require AdminUser (create, list, detail, update, suspend, activate)
- POST /api/invitations: requires AdminUser (only admins can invite)
- User creation accepts optional "role" param (defaults to "member")
- Invitation acceptance creates users with "member" role

* feat(web): add Users admin tab to web UI

Adds a Users tab to the web gateway UI for managing users, tokens, and roles without needing direct API calls.
Features:
- User list table with ID, name, email, role, status, created date
- Create user form with display name, email, role selector
- Suspend/activate actions per user
- Create API token for any user (shows plaintext once with copy button)
- Role badges (admin highlighted, member muted)
- Non-admin users see "Admin access required" message
- Keyboard shortcut: Cmd/Ctrl+5 switches to Users tab

CSS:
- Reuses routines-table styles for the user list
- Badge, token-display, btn-small, btn-danger, btn-primary components

* fix: move Users to Settings subtab, bootstrap admin user on first run

- Moved Users from top-level tab to Settings sidebar subtab (under Skills, before Theme toggle)
- On first startup with empty users table, automatically creates an admin user from GATEWAY_USER_ID config with a corresponding API token from GATEWAY_AUTH_TOKEN. This ensures the owner appears in the Users panel immediately.

* fix: user creation shows token, + Token works, no password save popup

Three UI/UX fixes:

1. Create user now generates an initial API token and shows it in a copy-able banner instead of triggering the browser's password save dialog. Uses autocomplete="off" and type="text" for email field.
2. "+ Token" button works: exposed createTokenForUser/suspendUser/activateUser on window for inline onclick handlers in dynamically generated table rows. Token creation uses showTokenBanner helper.
3. Admin token creation: POST /api/tokens now accepts optional "user_id" field when the requesting user is admin, allowing token creation for other users from the Users panel.

* fix: use event delegation for user action buttons (CSP compliance)

Inline onclick handlers are blocked by the Content-Security-Policy (script-src 'self' without 'unsafe-inline').
Switched to data-action attributes with a delegated click listener on the users table.

* fix: add i18n for Users subtab, show login link on user creation

- Added 'settings.users' i18n key for English and Chinese
- Token banner now shows a full login link (domain/?token=xxx) with a Copy Link button, plus the raw token below
- Login link works automatically via existing ?token= auto-auth

* fix: token hash mismatch - hash hex string, not raw bytes

Critical auth bug: token creation hashed the raw 32 bytes (hasher.update(token_bytes)) but authentication hashed the hex-encoded string (hash_token(candidate) where candidate is the hex string the user sends). This meant newly created tokens could never authenticate.

Fixed all 4 token creation sites (users, tokens, invitations create, invitations accept) to use hash_token(&plaintext_token), which hashes the hex string consistently with the auth lookup path. Removed now-unused sha2::Digest imports from handlers.

* refactor: remove invitation system

The invitation flow is redundant: admin create user already generates a token and shows a login link. Invitations add complexity without value until email integration exists.

Removed:
- InvitationRecord struct and 4 UserStore trait methods
- invitations table from V14 migration (postgres + both libsql schemas)
- PostgreSQL Store methods (create/get/accept/list invitations)
- libSQL UserStore invitation methods + row_to_invitation helper
- invitations.rs handler file (212 lines)
- /api/invitations routes (create, list, accept)
- test_invitation_lifecycle test

* feat: user deletion, self-service profile, per-user job limits, usage API

Four multi-tenancy improvements:

1. User deletion cascade (DELETE /api/admin/users/{id}): Deletes user and all data across 11 user-scoped tables (settings, secrets, routines, memory, jobs, conversations, etc.). Admin only.
2. Self-service profile (GET/PATCH /api/profile): Users can read and update their own display_name and metadata without admin privileges.
3. Per-user job concurrency (MAX_JOBS_PER_USER env var): Scheduler checks active_jobs_for(user_id) before dispatch. Prevents one user from exhausting all job slots.
4. Usage reporting (GET /api/admin/usage?user_id=X&period=day|week|month): Aggregates LLM costs from llm_calls via agent_jobs.user_id. Returns per-user, per-model breakdown of calls, tokens, and cost.

* feat: add TenantCtx for compile-time tenant isolation

Implements zmanian's architectural proposal from nearai#1614 review: two-tier scoped database access (TenantScope/AdminScope) so handler code cannot accidentally bypass tenant scoping.

TenantScope (default): wraps user_id + Arc<dyn Database>, auto-binds user_id on every operation. ID-based lookups return None for cross-tenant resources. No escape hatch: forgetting to scope is a compile error.

AdminScope (explicit opt-in): cross-tenant access for system-level components (heartbeat, routine engine, self-repair, scheduler, worker).

TenantCtx bundles TenantScope + workspace + cost guard + per-user rate limiting. Constructed once per request in handle_message, threaded through all command handlers and ChatDelegate.
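
The two-tier scoping idea above can be sketched as follows. This is a simplified illustration, not the contents of src/tenant.rs: `Job`, `JobDb`, and the method names are hypothetical, and an in-memory map stands in for `Arc<dyn Database>`.

```rust
use std::collections::HashMap;
use std::sync::Arc;

#[derive(Debug, Clone, PartialEq)]
pub struct Job {
    pub id: u32,
    pub user_id: String,
    pub title: String,
}

/// Hypothetical in-memory stand-in for the shared database handle.
pub struct JobDb {
    jobs: HashMap<u32, Job>,
}

impl JobDb {
    pub fn new(jobs: Vec<Job>) -> Self {
        Self { jobs: jobs.into_iter().map(|j| (j.id, j)).collect() }
    }
}

/// Default scope: every operation is bound to one user id, and there is
/// no method on this type that takes a different user id.
pub struct TenantScope {
    user_id: String,
    db: Arc<JobDb>,
}

impl TenantScope {
    pub fn new(user_id: &str, db: Arc<JobDb>) -> Self {
        Self { user_id: user_id.to_string(), db }
    }

    /// ID-based lookup: returns None for jobs owned by another tenant.
    pub fn get_job(&self, id: u32) -> Option<&Job> {
        self.db.jobs.get(&id).filter(|j| j.user_id == self.user_id)
    }

    /// Lists only this tenant's jobs, ordered by id.
    pub fn list_jobs(&self) -> Vec<&Job> {
        let mut jobs: Vec<&Job> = self
            .db
            .jobs
            .values()
            .filter(|j| j.user_id == self.user_id)
            .collect();
        jobs.sort_by_key(|j| j.id);
        jobs
    }
}

/// Explicit opt-in scope for system components that need cross-tenant access.
pub struct AdminScope {
    db: Arc<JobDb>,
}

impl AdminScope {
    pub fn new(db: Arc<JobDb>) -> Self {
        Self { db }
    }

    pub fn get_job_any_tenant(&self, id: u32) -> Option<&Job> {
        self.db.jobs.get(&id)
    }
}
```

Because handlers would receive only a `TenantScope`, an unscoped query is not something they can express; cross-tenant reads require constructing an `AdminScope` on purpose.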
Key changes:
- New src/tenant.rs (~920 lines): TenantScope, AdminScope, TenantCtx, TenantRateState, TenantRateRegistry
- All command handlers: user_id: &str → ctx: &TenantCtx
- ChatDelegate: cost check/record/settings via self.tenant
- System components: store field changed to AdminScope
- Config: TENANT_MAX_LLM_CONCURRENT, TENANT_MAX_JOBS_CONCURRENT env vars
- Fixes bug: /status <job_id> cross-tenant leak (now auto-filtered)

* fix: address PR nearai#1626 review feedback - bounded LRU cache, admin auth, FK cleanup

- Replace HashMap with lru::LruCache in DbAuthenticator so the token cache is hard-bounded at 1024 entries (evicts LRU, not just expired)
- Gate admin user endpoints (list/detail/update/suspend/activate) with AdminUser extractor so members get 403 instead of full access
- Add api_tokens to libSQL delete_user cleanup list to prevent orphaned tokens (libSQL has no FK cascade)
- Add regression tests for all three fixes

* fix: update CA certificates in runtime Docker image

Ensures the root certificate bundle is current so TLS handshakes to services like Supabase succeed on Railway.

* fix: resolve CI failures - formatting, no-panics check

- Run cargo fmt on test code
- Replace .expect() with const NonZeroUsize in DbAuthenticator
- Add // safety: comments for test-only code in multi_tenant.rs

* fix: switch PostgreSQL TLS from rustls to native-tls

rustls with rustls-native-certs fails TLS handshake on Railway's slim container (empty or stale root cert store). native-tls delegates to OpenSSL on Linux which handles system certs more reliably.
* Adding user management api

* feat: admin secrets provisioning API + API documentation

- Add PUT/GET/DELETE /api/admin/users/{id}/secrets/{name} endpoints for application backends to provision per-user secrets (AES-256-GCM encrypted)
- Add secrets_store field to GatewayState with builder wiring
- Create docs/USER_MANAGEMENT_API.md with full API spec covering users, secrets, tokens, profile, and usage endpoints
- Update web gateway CLAUDE.md route table

* fix: add CatchPanicLayer to capture handler panics

Without this, panics in async handlers silently drop the connection and the edge proxy returns a generic 503. Now panics are caught, logged, and returned as 500 with the panic message.

* fix: address second-round review - transactional delete, overflow, error logging

- C1: Wrap PostgreSQL delete_user() in a transaction so partial cleanup can't leave users in a half-deleted state
- M2: Add job_events to delete cleanup (both backends): FK to agent_jobs without CASCADE would cause FK violation
- H1/M4: Cap expires_in_days to 36500 before i64 cast (tokens + secrets)
- H2: Validate target user exists before creating admin token to prevent orphan tokens on libSQL
- H3: Log DB errors in DbAuthenticator::authenticate() instead of silently swallowing them as 401

* fix: revert to rustls with webpki-roots fallback for PostgreSQL TLS

native-tls/OpenSSL caused silent crashes (segfaults in C code) during DB writes on Railway containers. Switch back to rustls but add webpki-roots as a fallback when system certs are missing, which was the original TLS handshake failure on slim container images.
* chore: update Cargo.lock for rustls + webpki-roots

* debug: add /api/debug/db-write endpoint to diagnose user insert failure

Temporary diagnostic endpoint that tests DB INSERT to users table with full error logging. No auth required. Will be removed after debugging.

* perf: use cargo-chef in Dockerfile for dependency caching

Splits the build into planner/deps/builder stages. Dependencies are only recompiled when Cargo.toml or Cargo.lock change. Source-only changes skip straight to the final build stage.

* debug: add tracing to users_create_handler

* fix: guard created_by FK in user creation handler

The auth identity user_id (from owner_id scope) may not match any user row in the DB, causing a FK violation on the created_by column. Check that the referenced user exists before setting created_by.

* refactor: collapse GATEWAY_USER_ID into IRONCLAW_OWNER_ID

Remove the separate GATEWAY_USER_ID config. The gateway now uses IRONCLAW_OWNER_ID (config.owner_id) directly for auth identity, bootstrap user creation, and workspace scoping.

Previously, with_owner_scope() rebound the auth identity to owner_id while keeping default_sender_id as the gateway user_id. This caused a FK constraint violation when creating users because the auth identity ("default") didn't match any user in the DB ("nearai").
Changes:
- Remove GATEWAY_USER_ID env var and gateway_user_id from settings
- Remove user_id field from GatewayConfig
- Add owner_id parameter to GatewayChannel::new()
- Remove with_owner_scope() method
- Remove default_sender_id from GatewayState
- Remove sender override logic in chat/approval handlers
- Remove debug endpoint and tracing from prior debugging
- Update all tests and E2E fixtures

* fix: hide Users tab for non-admins, remove auth hint text

- Fetch /api/profile after login and hide the Users settings tab when the user's role is not admin
- Remove the "Enter the GATEWAY_AUTH_TOKEN" hint from the login page since tokens are now managed via the admin panel, not .env files

* fix: address review feedback (auth 503, token expiry, CORS PATCH)

- DB auth errors now return 503 instead of 401 so outages are distinguishable from invalid tokens (serrrfirat H3)
- Cap expires_in_days to 36500 before i64 cast to prevent negative duration from u64 overflow (serrrfirat H1)
- Add PATCH to CORS allowed methods for profile/user update endpoints (Copilot)
- Stop leaking panic details in CatchPanicLayer response body

* fix: harden multi-tenant isolation - review fixes from nearai#1614

- Add conversation ownership checks in TenantScope: add_conversation_message, touch_conversation, list_conversation_messages (+ paginated), update_conversation_metadata_field, get_conversation_metadata now return NotFound for conversations not owned by the tenant (cross-tenant data leak)
- Fix multi-user heartbeat: clear notify_user_id per runner so notifications persist to the correct user, not the shared config target
- Move hygiene tasks into bounded JoinSet instead of unbounded tokio::spawn
- Revert send_notification to private visibility (only used within module)
- Use effective_model_name() for cost attribution in dispatcher so providers that ignore per-request model overrides report the actual model used
- Fix inject_model_override doc comment; add 3 unit tests
- Fix heartbeat doc comment ("routines" not "active routines")

* feat: add Jobs, Cost, Last Active columns to admin Users table

Add UserSummaryStats struct and user_summary_stats() batch query to the UserStore trait (both PostgreSQL and libSQL backends). The admin users list endpoint now fetches per-user aggregates (job count, total LLM spend, most recent activity) in a single query and includes them inline in the response. The frontend Users table displays three new columns.

* fix: address review comments and CI formatting failures

CI fixes:
- cargo fmt fixes in cli/mod.rs and db/tls.rs

Security/correctness (from Copilot + serrrfirat + pranavraja99 reviews):
- Token create: reject expires_in_days > 36500 with 400 instead of silent clamp
- Token create: return 404 when admin targets non-existent user
- User create: map duplicate email constraint violations to 409 Conflict
- User create: remove unnecessary DB roundtrip for created_by (use AdminUser directly)
- DB auth: log warn on DB lookup failures instead of silently swallowing errors
- libSQL: add FK constraints on users.created_by and api_tokens.user_id

Config fixes:
- agent.multi_tenant: resolve from AGENT_MULTI_TENANT env var instead of hardcoding false
- heartbeat.multi_tenant: fix doc comment to match actual env-var-based behavior

UI fix:
- showTokenBanner: pass correct title ("Token created!" vs "User created!")

* fix: address remaining review comments (round 2)

- Secrets handlers: normalize name to lowercase before store operations, validate target user_id exists (returns 404 if not found)
- libSQL: propagate cost parsing errors instead of unwrap_or_default() in both user_usage_stats and user_summary_stats
- users_list_handler: propagate user_summary_stats DB errors (was silently swallowed with unwrap_or_default)
- loadUsers: distinguish 401/403 (admin required) from other errors
- Docs: fix users.id type (TEXT not UUID), remove "invitation flow" from V14 migration comment

* feat: i18n for Users tab, atomic user+token creation, transactional delete_user

i18n:
- Add 31 translation keys for all Users tab strings (en + zh-CN)
- Wire data-i18n attributes on HTML elements (headings, buttons, inputs, table headers, empty state)
- Replace all hard-coded strings in app.js with I18n.t() calls

Atomic user+token creation:
- Add create_user_with_token() to UserStore trait
- PostgreSQL: wraps both INSERTs in conn.transaction() with auto-rollback
- libSQL: wraps in explicit BEGIN/COMMIT with ROLLBACK on error
- Handler uses single atomic call instead of two separate operations

Transactional delete_user for libSQL:
- Wrap multi-table DELETE cascade in BEGIN/COMMIT transaction
- ROLLBACK on any error to prevent partial cleanup / inconsistent state
- Matches the PostgreSQL implementation which already used transactions

* fix: revert V14 migration to match deployed checksum [skip-regression-check]

Refinery checksums applied migrations, so editing V14__users.sql after it was already applied causes deployment failures. Revert the cosmetic comment changes (added in df40b22) to restore the original checksum.
* fix: bootstrap onboarding flow for multi-tenant users

The bootstrap greeting and workspace seeding only ran for the owner workspace at startup, so new users created via the admin API never received the welcome message or identity files (BOOTSTRAP.md, SOUL.md, AGENTS.md, USER.md, etc.).

Three fixes:
- tenant_ctx(): seed per-user workspace on first creation via seed_if_empty(), which writes identity files and sets bootstrap_pending when the workspace is truly fresh
- handle_message(): check take_bootstrap_pending() on the tenant workspace (not the owner workspace) and persist the greeting to the user's own assistant conversation + broadcast via SSE
- WorkspacePool: seed new per-user workspaces in the web gateway so memory tools also see identity files immediately

The existing single-user bootstrap in Agent::run() is preserved for non-multi-tenant deployments.

* fix: address remaining PR review comments (round 3)

- Docs: fix metadata description from "merge patch" to "full replacement"
- Secrets: reject expires_in_days > 36500 with 400 (was silently clamped)
- libSQL: CAST(SUM(cost) AS TEXT) in user_usage_stats and user_summary_stats to prevent SQLite numeric coercion from crashing get_text(); this was the root cause of the Copilot "SUM returns numeric type" comments
- Add 3 regression tests: user_summary_stats (empty + with data) and user_usage_stats (multi-model aggregation)

* feat: add role change support for users (admin/member toggle)

- Add update_user_role() to UserStore trait + both backends (PostgreSQL and libSQL)
- Extend PATCH /api/admin/users/{id} to accept optional "role" field with validation (must be "admin" or "member")
- Add "Make Admin" / "Make Member" toggle button in Users table actions
- Add i18n keys for role change (en + zh-CN)
- Update API docs to document the role field on PATCH
- Fix test helpers to use fmt_ts() for timestamps (was using SQLite datetime('now') which produces incompatible format for string comparison)

* fix: show live LLM spend in Users table instead of only DB-recorded costs [skip-regression-check]

Chat turns record LLM cost in CostGuard (in-memory) but don't create agent_jobs/llm_calls DB rows; those are only written for background jobs. The Users table was querying only from DB, so it showed $0.00 for users who only chatted.

Now supplements DB stats with CostGuard.daily_spend_for_user(), the same source displayed in the status bar token counter. Shows whichever is larger (DB historical total vs live daily spend).

Also falls back to last_login_at for "Last Active" when no DB job activity exists.

* fix: persist chat LLM calls to DB and fix usage stats query

Two root causes for zero usage stats:

1. ChatDelegate only recorded LLM costs to CostGuard (in-memory), never to the llm_calls DB table. Added DB persistence via TenantScope.record_llm_call() after each chat LLM call, with job_id=NULL and conversation_id=thread_id.
2. user_summary_stats query only joined agent_jobs→llm_calls, missing chat calls (which have job_id=NULL). Redesigned query to start from llm_calls and resolve user_id via COALESCE(agent_jobs.user_id, conversations.user_id), which covers both job and chat LLM calls.

Both PostgreSQL and libSQL queries updated. TenantScope gets record_llm_call() method. Tests updated for new query semantics.
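
The COALESCE-based attribution described above can be mirrored in plain Rust as a rough sketch. The struct `LlmCall` and function `cost_by_user` are hypothetical illustrations of the query semantics, not code from the repository: each call resolves its owner from the job first, then from the conversation.

```rust
use std::collections::BTreeMap;

/// Hypothetical row shape: a call belongs to a job, a conversation, or neither.
pub struct LlmCall {
    pub job_user: Option<String>,
    pub conversation_user: Option<String>,
    pub cost_cents: u64,
}

/// Sums cost per user, resolving the owner the way
/// COALESCE(job_user, conversation_user) would: the first non-null wins.
/// Calls with no resolvable owner are skipped.
pub fn cost_by_user(calls: &[LlmCall]) -> BTreeMap<String, u64> {
    let mut totals: BTreeMap<String, u64> = BTreeMap::new();
    for call in calls {
        let owner = call.job_user.as_ref().or(call.conversation_user.as_ref());
        if let Some(user) = owner {
            *totals.entry(user.clone()).or_insert(0) += call.cost_cents;
        }
    }
    totals
}
```

Starting the aggregation from the calls themselves, rather than from jobs, is what lets chat calls with no job contribute to a user's total.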

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address review comments: input validation, cost semantics, panic safety [skip-regression-check]

- Validate display_name: trim whitespace, reject empty strings (create + update)
- Validate metadata: must be a JSON object, return 400 if not (admin + profile)
- secrets_list_handler: verify target user_id exists before listing
- Cost display: use DB total directly (chat calls now persist to DB), remove confusing max(db,live) CostGuard fallback
- CatchPanicLayer: truncate panic payload to 200 chars in log to limit potential sensitive data exposure

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address Copilot round 5: docs, secrets consistency, token name, provider field [skip-regression-check]

- Docs: users.id note updated to "typically UUID v4 strings (bootstrap admin may use a custom ID)"
- secrets_list_handler: return 503 when DB store is None (was falling through to list secrets without user validation)
- tokens_create: trim + reject empty token name (matching display_name pattern)
- LlmCallRecord.provider: use llm_backend ("nearai","openai") instead of model_name() which returns the model identifier
- user_summary_stats zero-LLM users: acceptable; handler already falls back to 0 cost and last_login_at for missing entries

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: DB auth returns 503 on outage, scheduler counts only blocking jobs

From serrrfirat review:
- DB auth: return Err(()) on database errors so middleware returns 503 instead of silently returning Ok(None) → 401 (auth miss)
- Scheduler: add parallel_blocking_count_for() that uses is_parallel_blocking() (Pending/InProgress/Stuck) instead of is_active() for per-user concurrency; Completed/Submitted jobs no longer count against MAX_JOBS_PER_USER

From Copilot:
- CLAUDE.md: fix secrets route paths from {id} to {user_id}
- token_hash: use .as_slice() instead of .to_vec() to avoid heap allocation on every token auth/creation call

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: immediate auth cache invalidation on security-critical actions (zmanian review nearai#6)

Add DbAuthenticator::invalidate_user() that evicts all cached entries for a user. Called after:
- Suspend user (immediate lockout, was 60s delay)
- Activate user (immediate access restoration)
- Role change (admin↔member takes effect immediately)
- Token revocation (revoked token can't be reused from cache)

The DbAuthenticator is shared (via Clone, which Arc-clones the cache) between the auth middleware and GatewayState, so handlers can evict entries from the same cache the middleware reads.

Also from zmanian's review:
- Items 1-5, 7-11 were already resolved in prior commits
- Item 12 (String→enum for status/role) is deferred as a broader refactor

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: last-admin protection, usage stats for chat calls, UTF-8 safe panic truncation

Last-admin protection:
- Suspend, delete, and role-demotion of the last active admin now return 409 Conflict instead of succeeding and locking out the admin API
- Helper is_last_admin() checks active admin count before destructive ops

Usage stats:
- user_usage_stats() now includes chat LLM calls (job_id=NULL) by joining via conversations.user_id, matching user_summary_stats()
- Both PostgreSQL and libSQL queries updated

Panic handler:
- Use floor_char_boundary(200) instead of byte-index [..200] to prevent panic on multi-byte UTF-8 characters in panic messages

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: workspace seed race, bootstrap atomicity, email trim, secrets upsert response [skip-regression-check]

- WorkspacePool: await seed_if_empty() synchronously after inserting into cache (drop lock first to avoid blocking), so callers see identity files immediately instead of racing a background task
- Bootstrap admin: use create_user_with_token() for atomic user+token creation, matching the admin create endpoint
- Email: trim whitespace, treat empty as None to prevent " " being stored and breaking uniqueness
- Secrets PUT: report "updated" vs "created" based on prior existence
- Last token_hash.to_vec() → .as_slice() in authenticate_token

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: disable unscoped webhook endpoint in multi-tenant mode [skip-regression-check]

The original /api/webhooks/{path} endpoint looks up routines across all users. In multi-tenant mode, anyone who knows the webhook path + secret could trigger another user's routine.

Now returns 410 Gone with a message pointing to the scoped endpoint /api/webhooks/u/{user_id}/{path}. Detection uses state.db_auth.is_some(), which is present only when DB-backed auth is enabled (multi-tenant). Single-user deployments are unaffected.

From: standardtoaster review comment

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: webhook multi-tenant check, secrets error propagation, stale doc comment [skip-regression-check]

- Webhook: use workspace_pool.is_some() instead of db_auth.is_some() for multi-tenant detection; db_auth is set for any DB deployment, workspace_pool is only set when has_any_users() was true at startup
- Secrets: propagate exists() errors instead of unwrap_or(false) so backend outages surface as 500 rather than incorrect "created" status
- Config: fix stale workspace_read_scopes comment referencing user_id

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
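
The panic-handler fix in the message above replaces a byte-index slice ([..200]) with floor_char_boundary(200), because slicing a &str in the middle of a multi-byte character panics. The following is a minimal sketch of that idea, not the commit's actual code: the helper name truncate_utf8 is hypothetical, and it uses str::is_char_boundary so that it also compiles on toolchains where floor_char_boundary is not available.

```rust
/// Hypothetical helper (not taken from the commit): return the longest
/// prefix of `s` that is at most `max_bytes` long and ends on a UTF-8
/// character boundary, so the slice can never panic.
fn truncate_utf8(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    // Walk backwards from `max_bytes` until we hit a char boundary.
    // Index 0 is always a boundary, so this loop terminates.
    let mut end = max_bytes;
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```

With a payload of 199 ASCII bytes followed by a two-byte character, a plain [..200] slice would panic, while this helper returns the 199-byte prefix.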
1 parent 55e21b6 commit 5b3c170

Some content is hidden


49 files changed, +4620 -356 lines changed

Cargo.lock

Lines changed: 2 additions & 0 deletions
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 3 additions & 1 deletion
@@ -57,6 +57,7 @@ refinery = { version = "0.8", features = ["tokio-postgres"], optional = true }
 tokio-postgres-rustls = { version = "0.13", optional = true }
 rustls = { version = "0.23", optional = true, default-features = false }
 rustls-native-certs = { version = "0.8", optional = true }
+webpki-roots = { version = "0.26", optional = true }

 # Database - libSQL/Turso (optional embedded database)
 libsql = { version = "0.6", optional = true, default-features = false, features = ["core", "replication", "remote", "tls"] }
@@ -95,7 +96,7 @@ termimad = "0.34"
 # Channel integrations
 axum = { version = "0.8", features = ["ws"] }
 tower = "0.5"
-tower-http = { version = "0.6", features = ["trace", "cors", "set-header"] }
+tower-http = { version = "0.6", features = ["trace", "cors", "set-header", "catch-panic"] }

 # Cron scheduling for routines
 cron = "0.13"
@@ -219,6 +220,7 @@ postgres = [
     "dep:tokio-postgres-rustls",
     "dep:rustls",
     "dep:rustls-native-certs",
+    "dep:webpki-roots",
     "dep:postgres-types",
     "dep:refinery",
     "dep:pgvector",

Dockerfile

Lines changed: 34 additions & 8 deletions
@@ -1,45 +1,71 @@
 # Multi-stage Dockerfile for the IronClaw agent (cloud deployment).
 #
+# Uses cargo-chef for dependency caching — only rebuilds deps when
+# Cargo.toml/Cargo.lock change, not on every source edit.
+#
 # Build:
 # docker build --platform linux/amd64 -t ironclaw:latest .
 #
 # Run:
 # docker run --env-file .env -p 3000:3000 ironclaw:latest

-# Stage 1: Build
-FROM rust:1.92-slim-bookworm AS builder
+# Stage 1: Install cargo-chef
+FROM rust:1.92-slim-bookworm AS chef

 RUN apt-get update && apt-get install -y --no-install-recommends \
     pkg-config libssl-dev cmake gcc g++ \
     && rm -rf /var/lib/apt/lists/* \
     && rustup target add wasm32-wasip2 \
-    && cargo install wasm-tools
+    && cargo install cargo-chef wasm-tools

 WORKDIR /app

-# Copy manifests first for layer caching
+# Stage 2: Generate the dependency recipe (changes only when Cargo.toml/lock change)
+FROM chef AS planner
+
 COPY Cargo.toml Cargo.lock ./
 COPY crates/ crates/
-
-# Copy source, build script, tests, and supporting directories
 COPY build.rs build.rs
 COPY src/ src/
 COPY tests/ tests/
+COPY benches/ benches/
 COPY migrations/ migrations/
 COPY registry/ registry/
 COPY channels-src/ channels-src/
 COPY wit/ wit/
 COPY providers.json providers.json
-# [[bench]] entries in Cargo.toml require bench sources to exist for cargo to parse the manifest
+
+RUN cargo chef prepare --recipe-path recipe.json
+
+# Stage 3: Build dependencies (cached unless Cargo.toml/lock change)
+FROM chef AS deps
+
+COPY --from=planner /app/recipe.json recipe.json
+RUN cargo chef cook --release --recipe-path recipe.json
+
+# Stage 4: Build the actual binary (only recompiles ironclaw source)
+FROM deps AS builder
+
+COPY Cargo.toml Cargo.lock ./
+COPY crates/ crates/
+COPY build.rs build.rs
+COPY src/ src/
+COPY tests/ tests/
 COPY benches/ benches/
+COPY migrations/ migrations/
+COPY registry/ registry/
+COPY channels-src/ channels-src/
+COPY wit/ wit/
+COPY providers.json providers.json

 RUN cargo build --release --bin ironclaw

-# Stage 2: Runtime
+# Stage 5: Runtime
 FROM debian:bookworm-slim

 RUN apt-get update && apt-get install -y --no-install-recommends \
     ca-certificates libssl3 \
+    && update-ca-certificates \
     && rm -rf /var/lib/apt/lists/*

 COPY --from=builder /app/target/release/ironclaw /usr/local/bin/ironclaw
