fix(credential-proxy): proactively refresh expiring Anthropic OAuth tokens (v2 port of #1102)#2363
Open
chiptoe-svg wants to merge 86 commits into
Revert OneCLI integration and add built-in credential proxy that reads API key or OAuth token from .env, injecting credentials into container API requests without exposing secrets. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pino was replaced with a built-in logger on main. For branches with baileys (WhatsApp), pino resolves as a transitive dependency of @whiskeysockets/baileys. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Picks up main's changes while preserving native credential proxy:
- Built-in logger replacing pino/pino-pretty
- Removed unused deps (yaml, zod, @vitest/coverage-v8)
- CLAUDE.md template copy fix (nanocoai#1391)
- MAX_MESSAGES_PER_PROMPT config
- Kept credential proxy (not OneCLI) for credential injection
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
… into HEAD
# Conflicts:
# src/config.ts
# src/container-runner.test.ts
# src/container-runner.ts
# src/index.ts
- src/auth-switch.ts: ported from fork; toggles api-key/oauth by commenting/uncommenting ANTHROPIC_API_KEY in .env; adapted logger import to v2's log/log.js convention
- src/credential-proxy.ts: integrated fork's OAuth token refresh logic (getOAuthToken with 5-min buffer, ~/.claude/.credentials.json fallback, in-memory cache) and OpenAI routing (/openai/* prefix); fixed logger → log import to match v2 convention
- src/credential-proxy.test.ts: updated vi.mock from logger.js to log.js
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Port fork's photo/voice/PDF/auth features onto the v2 Chat SDK bridge adapter pattern via an onInbound interceptor chain. Also copies image.ts from the fork (logger import updated to v2 log module). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Recovered from the prior session that ran /add-codex and got
SIGTERM'd mid-build. The /add-codex skill had already:
- Copied codex provider source files into container/agent-runner/
and src/providers/
- Wired self-registration imports into both barrels
- Added codex CLI install to container/Dockerfile
Then SIGTERM hit during ./container/build.sh, leaving these in the
working tree. Carrying them as their own commit so the history shows
the codex install separately from the v2-startup auto-migration that
got bundled with them in the original safety pin.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
These deletes happened automatically at the first v2 host startup — src/claude-md-compose.ts:migrateGroupsToClaudeLocal() runs idempotently and renames each group's CLAUDE.md to CLAUDE.local.md (per-group memory the v2 spawn re-composes around). groups/global/ is removed entirely since shared global content moved into container/CLAUDE.md. The renamed CLAUDE.local.md files aren't tracked (they're gitignored under groups/<folder>/), so this commit just records the deletion of the old tracked files. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Was untracked at conversation start; bundled into the original safety pin commit by accident. Splitting into its own commit for clarity. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Brings in the migration tooling that was supposed to seed v2.db from a
v1 install but had never been run on this machine. Used in-place with
NANOCLAW_MIGRATE_SKIP=preflight,owner,guide,safety,copy,rebuild,verify
to seed the (empty) central DB from store/messages.db + .env.
Includes:
migrate-v2.sh v1→v2 entry point (sibling-clone or in-place)
setup/migrate.ts sequencer
setup/migrate/*.ts detect/extract/seed/jid/owner/guide modules
.nanoclaw-migrations/ audit trail of what was extracted
Three seeder bugs were patched in the resulting data after running:
- messaging_groups.platform_id stayed in v1 'tg:' format instead of
being normalized to v2 'telegram:' format
- users.id was 'telegram:tg:<id>' (double-prefixed) — owner-propose
bypasses userIdFromJid for is_main fallback path
- engage_mode='pattern',pattern='@felix' for v1 requires_trigger=0
case (which means "trigger optional"); should be pattern='.'
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…backup
The v2 rewrite reintroduced OneCLI gateway calls in container-runner and
the approvals module, which fail with 401s on this install (which
runs the native credential-proxy skill, not OneCLI). Without OneCLI
auth, every container spawn threw and the agent stopped responding.
Native credential proxy already existed in v2 (src/credential-proxy.ts,
PROXY_BIND_HOST in container-runtime.ts) but wasn't wired through to
container env injection or to the proxy listen address.
Changes:
- container-runner.ts: drop onecli.ensureAgent / applyContainerConfig;
inject ANTHROPIC_BASE_URL and OPENAI_BASE_URL pointing at
host.docker.internal:CREDENTIAL_PROXY_PORT so containers route
through the proxy with placeholder credentials.
- index.ts: pass PROXY_BIND_HOST to startCredentialProxy so on Linux
the proxy binds where containers can actually reach it (docker0 IP
or 0.0.0.0 fallback), not just 127.0.0.1.
- modules/approvals/index.ts: stop starting the OneCLI long-poll
approval handler — it 401s on app.onecli.sh and the credential
approval flow isn't used here.
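Here is a minimal sketch of the bind-address choice described for `index.ts`. It makes two assumptions: Docker Desktop on macOS/Windows reaches the host through loopback, and on Linux the `docker0` bridge IP is the address containers see as the host. `pickProxyBindHost` is a hypothetical name; it is not the actual `PROXY_BIND_HOST` code in `container-runtime.ts`.

```typescript
import * as os from "node:os";

type Interfaces = NodeJS.Dict<os.NetworkInterfaceInfo[]>;

// Hypothetical sketch: choose the address the credential proxy listens on.
// Assumption: off Linux, Docker Desktop forwards host.docker.internal to the
// host's loopback, so 127.0.0.1 is enough there.
export function pickProxyBindHost(
  platform: string = process.platform,
  ifaces: Interfaces = os.networkInterfaces(),
): string {
  if (platform !== "linux") return "127.0.0.1";
  // On Linux, containers reach the host through the docker0 bridge address.
  const bridge = (ifaces["docker0"] ?? []).find(
    (addr) => addr.family === "IPv4" && !addr.internal,
  );
  // No docker0 (custom or rootless networking): listen on all interfaces.
  return bridge?.address ?? "0.0.0.0";
}
```

The `0.0.0.0` fallback listens on every interface. In that case, only the host firewall keeps the proxy off the public network.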
Plus periodic central-DB backup (the original ask):
- db/backup.ts: SQLite online .backup() to data/backups/, ring of 60
timestamped files (~1 hour at sweep cadence). Failures logged, never
thrown — must not break the sweep.
- host-sweep.ts: call backupCentralDb() at the start of each tick.
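The retention half of the backup can be sketched as follows. The sketch assumes a hypothetical `v2-<ISO timestamp>.db` naming scheme. It leaves out the SQLite call itself: if the driver is better-sqlite3, that call would be `db.backup(file)`, which returns a promise. `backupFileName`, `filesToPrune`, `pruneBackups` and `safeBackup` are illustrative names, not the real `db/backup.ts` exports.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

const BACKUP_PREFIX = "v2-"; // hypothetical naming scheme
const BACKUP_SUFFIX = ".db";

// ISO-8601 UTC strings are fixed-width, so lexical order matches time order.
// ':' and '.' are replaced to keep the file name portable.
export function backupFileName(now: Date): string {
  return `${BACKUP_PREFIX}${now.toISOString().replace(/[:.]/g, "-")}${BACKUP_SUFFIX}`;
}

// Returns the backups older than the newest `keep`, i.e. the ones to delete.
export function filesToPrune(names: string[], keep: number): string[] {
  const backups = names
    .filter((n) => n.startsWith(BACKUP_PREFIX) && n.endsWith(BACKUP_SUFFIX))
    .sort();
  return backups.slice(0, Math.max(0, backups.length - keep));
}

// Deletes everything in `dir` except the newest `keep` backups.
export function pruneBackups(dir: string, keep = 60): void {
  for (const name of filesToPrune(fs.readdirSync(dir), keep)) {
    fs.unlinkSync(path.join(dir, name));
  }
}

// Runs one backup attempt; failures are logged and reported, never thrown,
// so the sweep tick that called it keeps going.
export async function safeBackup(
  run: () => Promise<void>,
  log: (msg: string) => void,
): Promise<boolean> {
  try {
    await run();
    return true;
  } catch (err) {
    log(`central DB backup failed: ${String(err)}`);
    return false;
  }
}
```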
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Pre-commit format:fix hook auto-reformatted these during a separate commit; carrying the diff into git as its own change so future diffs on these files don't carry unrelated noise. No semantic changes — purely line-collapse and import reflow. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two issues kept OpenAI tools (image gen, etc.) failing in containers
even after the native-proxy port:
1. OPENAI_BASE_URL was set to .../v1 — but the proxy multiplexes
providers via path prefix /openai/* (credential-proxy.ts:111). With
no /openai prefix, the proxy treated requests as Anthropic and
forwarded /v1/chat/completions to api.anthropic.com. Fix: set
OPENAI_BASE_URL to .../openai/v1 so the proxy strips /openai and
forwards /v1/<endpoint> to api.openai.com.
2. OPENAI_API_KEY was never set in container env. OpenAI SDKs refuse
to initialize without it even when OPENAI_BASE_URL is overridden
(the SDK's own env-presence check, not server-side). Set a
placeholder so the SDK is happy; the proxy substitutes the real
key in the Authorization header before forwarding upstream.
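Here is a rough sketch of the prefix routing and container env described above. The `/openai` prefix, the env var names and the placeholder key come from the commit message. `routeRequest`, `withRealKey` and `containerProxyEnv` are hypothetical helpers. The header code assumes lower-cased header names, which is how Node's `IncomingMessage.headers` provides them.

```typescript
export type Provider = "anthropic" | "openai";

export interface Route {
  provider: Provider;
  upstreamHost: string;
  upstreamPath: string;
}

const OPENAI_PREFIX = "/openai";

// "/openai/v1/chat/completions" -> api.openai.com "/v1/chat/completions";
// anything without the prefix is treated as an Anthropic request.
export function routeRequest(url: string): Route {
  if (url === OPENAI_PREFIX || url.startsWith(`${OPENAI_PREFIX}/`)) {
    const rest = url.slice(OPENAI_PREFIX.length) || "/";
    return { provider: "openai", upstreamHost: "api.openai.com", upstreamPath: rest };
  }
  return { provider: "anthropic", upstreamHost: "api.anthropic.com", upstreamPath: url };
}

// Replace the container's placeholder bearer with the real key upstream.
export function withRealKey(
  headers: Record<string, string>,
  realKey: string,
): Record<string, string> {
  return { ...headers, authorization: `Bearer ${realKey}` };
}

// Env injected at container spawn so SDKs talk to the proxy.
export function containerProxyEnv(host: string, port: number): Record<string, string> {
  const base = `http://${host}:${port}`;
  return {
    ANTHROPIC_BASE_URL: base,
    OPENAI_BASE_URL: `${base}/openai/v1`,
    // OpenAI SDKs refuse to start without a key; the proxy swaps it out.
    OPENAI_API_KEY: "placeholder",
  };
}
```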
Verified end-to-end: container makes POST to host:3001/openai/v1/...
with Authorization: Bearer placeholder, proxy returns a valid
chatcmpl-* response from gpt-4o-mini.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This install runs the native credential proxy (src/credential-proxy.ts),
not the OneCLI gateway. Earlier commits in this branch (72422af, 58edbc5)
removed OneCLI from the runtime path; this commit removes the rest.
Removed:
- src/modules/approvals/onecli-approvals.ts (handler module — was
no longer started; deleted)
- @onecli-sh/sdk dependency from package.json (lockfile regenerated;
-1 package, no transitives needed elsewhere)
- ONECLI_URL / ONECLI_API_KEY exports from src/config.ts
- resolveOneCLIApproval / ONECLI_ACTION import + branch in
src/modules/approvals/response-handler.ts (always returned false
once the handler stopped registering; removing simplifies the
handler down to its DB-backed-approvals path)
CLAUDE.md updates:
- Dropped the v1→v2 migration "STOP — READ THIS FIRST" banner —
migration is complete on this install
- Replaced the "Secrets / Credentials / OneCLI" section with a
native-proxy explanation matching what the code actually does
(proxy bind, container env vars, OAuth handling, rotation, how
to add a new provider)
- Dropped /init-onecli skill from the operational-skills list
- Updated container-runner.ts row in the file table; added a row
for src/credential-proxy.ts; dropped the dead src/onecli-approvals.ts
row (file never existed at that path on this branch anyway)
Verified host still boots clean with no "OneCLI approval handler
started" line, TypeScript build passes, agent round-trip still works.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a per-agent-group "draft" workspace, a web UI for editing it, and
glue plumbing (per-group model column, /model and /playground Telegram
commands).
DB:
- migration 014 adds `agent_groups.model` (per-group model override).
- createAgentGroup() now actually inserts the model column (was being
silently dropped previously, masked by the column not existing).
Core library (src/agent-builder/core.ts):
- Pure DB+filesystem API for draft lifecycle: createDraft, applyDraft,
discardDraft, listDrafts, listAgentGroups, diffDraftAgainstTarget,
getDraftStatus.
- Channel helpers: ensureDraftMessagingGroup, ensureDraftWiring —
auto-create the messaging_group + wiring per draft so test sessions
flow through the standard router.
- 18 vitest cases + a CLI smoke script (scripts/agent-builder-smoke.ts).
Channel adapter (src/channels/playground.ts + public/):
- Registers as channel_type='playground'. Each draft gets its own
auto-created messaging_group. Test chat reuses the standard
router/container/delivery path; adapter.deliver() pushes outbound
messages over Server-Sent Events to the connected browser.
- Lazy-start: HTTP server NOT bound at host boot. /playground on
Telegram calls startPlaygroundServer() which binds the port and
issues a magic-link URL. /playground stop closes it.
- Magic-link auth: per-restart random token, single-use, sets a 7-day
HttpOnly cookie. /playground stop or 30-min idle scrubs the cookie.
- 0.0.0.0 by default with magic-link auth; PLAYGROUND_BIND_HOST=
127.0.0.1 forces SSH-tunnel-only access.
- Public host autodetected from os.networkInterfaces(), preferring
public over private IPv4. PLAYGROUND_PUBLIC_HOST overrides.
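Public-host autodetection could look roughly like this. The commit only says that public IPv4 addresses are preferred over private ones. Two rules here are assumptions of this sketch:
- Treating RFC 1918, loopback, link-local and CGNAT (100.64.0.0/10) addresses as non-public.
- Falling back to `127.0.0.1`.

`detectPublicHost` is a hypothetical name.

```typescript
import * as os from "node:os";

type Interfaces = NodeJS.Dict<os.NetworkInterfaceInfo[]>;

// Ranges this sketch treats as non-public (an assumption, see lead-in).
function isNonPublicIPv4(addr: string): boolean {
  const [a, b] = addr.split(".").map(Number);
  return (
    a === 10 ||
    a === 127 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 169 && b === 254) ||
    (a === 100 && b >= 64 && b <= 127)
  );
}

// An explicit override wins; otherwise prefer a public IPv4, then any
// non-internal IPv4, then loopback as a last resort.
export function detectPublicHost(
  ifaces: Interfaces = os.networkInterfaces(),
  override: string | undefined = process.env.PLAYGROUND_PUBLIC_HOST,
): string {
  if (override) return override;
  const v4 = Object.values(ifaces)
    .flatMap((list) => list ?? [])
    .filter((addr) => addr.family === "IPv4" && !addr.internal);
  const publicAddr = v4.find((addr) => !isNonPublicIPv4(addr.address));
  return (publicAddr ?? v4[0])?.address ?? "127.0.0.1";
}
```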
UI (5 panes via topbar tabs):
- Picker: list drafts + non-draft agent groups, create/discard/open.
- Chat: SSE-streamed conversation with the draft agent.
- Persona: CLAUDE.local.md editor + reload + save.
- Skills: enable/disable per-draft, anthropic/skills library browser
with compatibility badges (compatible/partial/incompatible). Library
cached at data/playground/library-cache/.
- Files: file tree + textarea editor. Path-traversal guarded.
- Diff: side-by-side draft vs target.
- Topbar provider toggle: claude/codex; switching kills the running
container and bumps sessions.agent_provider so the next message uses
the new provider.
- Status badge: ● unsaved / ✓ in sync / ⚠ target deleted.
/model Telegram command (src/channels/telegram.ts + src/model-switch.ts):
- /model — show current provider + model + suggested-models hint list.
- /model <name> — persist to agent_groups.model, kill running
container so next message uses it. Trust-first: any string accepted,
server validates.
Provider sync fix (src/container-runner.ts):
- ensureRuntimeFields now also writes the resolved provider + model
into container.json, so the in-container runner picks the right
runtime. Without this, host-side resolveProviderName picked codex
correctly but the container's loadConfig fell through to 'claude'
because container.json didn't have a provider field.
Codex provider:
- Default model bumped from gpt-5.4-mini to gpt-5.5.
- container/agent-runner/src/index.ts forwards container.json's
`model` into CODEX_MODEL/ANTHROPIC_MODEL env so providers honor it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SKILL.md updated for the Phase 12 multi-tier role system on the
classroom branch:
- description + summary advertise admin/instructor/TA/student tiers
- prerequisites note the Phase 12.1 main-side dependency
(gate signature change, commit 0441eaf) — needed for role-aware
playground gating
- copy list adds class-pair-instructor.ts and class-pair-ta.ts
- imports list grows to five lines (greeting + instructor + ta +
playground-gate + container-env)
- provision example shows --instructors and --tas flags
- "What members experience after pairing" section split by role
(student / TA / instructor get different greeting text)
- customization section explains where each role's persona lives
+ that the class-shared.md is symlinked from data/
REMOVE.md and VERIFY.md don't need changes — they already describe
the file set as a list rather than enumerating individual files,
and the verify script just checks tsc/tests pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two changes:
1. plans/gws-mcp.md (new): Phase 13 design. A thin host-side Node MCP server
using per-API @googleapis/* packages (small, Google-published,
no monolith bloat), fronted by a per-agent-scoping relay. V1
surface is exactly two tools — drive_doc_read_as_markdown and
drive_doc_write_from_markdown — closing the gap rclone leaves
(rclone gives binary .gdoc pointers; this gives editable text).
Reuses ~/.config/gws/credentials.json (already minted).
Architecture rejects three alternatives explicitly:
- @googleworkspace/cli backend → subprocess overhead, no benefit
over googleapis directly.
- googleapis monolith → 250+ auto-generated clients, dragged the
VPS to a halt at install time. Per-API packages instead.
- Community Python MCP (taylorwilsdon's) → adds Python runtime
and we don't control the surface; usable but not preferred.
Per-agent role scoping is the security boundary (uses existing
canAccessAgentGroup primitive from Phase 12). V2 expansions
(Sheet/Calendar/Gmail) gated on actual use cases.
2. Remove .claude/skills/add-gmail-tool/ and .claude/skills/add-gcal-tool/
(the OneCLI-only Google MCP wrappers). Both required the OneCLI
gateway to inject OAuth tokens; this install uses the native
credential proxy and never installed OneCLI. The skills couldn't
run here. Phase 13's /add-gws-tool will replace them with a
working skill that uses the credential proxy.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
V1 of the Google Workspace MCP layer: two tools, with all of the auth
plumbing living in the existing credential proxy.
src/credential-proxy.ts:
- New `/googleapis/*` route. Strip prefix, forward to
googleapis.com.
- Reads ~/.config/gws/credentials.json (authorized_user OAuth
format with refresh_token). Caches access_token in memory
until 5 min before expiry.
- On miss, POSTs to oauth2.googleapis.com/token with grant_type=
refresh_token to mint a new access token. Standard Google OAuth
— no library needed; raw https.request keeps the proxy single-
file and dependency-light.
- Substitutes Authorization: Bearer placeholder → real token on
every forwarded request.
- 502 with actionable message if no creds present.
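The cache-then-refresh logic above can be sketched as follows. The 5-minute buffer, the token endpoint and `grant_type=refresh_token` come from the commit. `getGoogleAccessToken` is hypothetical. So is the injected `postForm`, which stands in for the raw `https.request` call and keeps the HTTP plumbing out of the example.

```typescript
// Refresh once the token is within 5 minutes of expiry (from the commit).
const REFRESH_BUFFER_MS = 5 * 60 * 1000;
const GOOGLE_TOKEN_URL = "https://oauth2.googleapis.com/token";

// Fields of the authorized_user credentials file used for a refresh.
export interface AuthorizedUser {
  client_id: string;
  client_secret: string;
  refresh_token: string;
}

export interface TokenResponse {
  access_token: string;
  expires_in: number; // seconds, as returned by Google's token endpoint
}

export interface CachedToken {
  accessToken: string;
  expiresAt: number; // epoch ms
}

export function isFresh(cache: CachedToken | null, now: number): cache is CachedToken {
  return cache !== null && now < cache.expiresAt - REFRESH_BUFFER_MS;
}

// application/x-www-form-urlencoded body for a refresh_token grant.
export function buildRefreshBody(creds: AuthorizedUser): string {
  return new URLSearchParams({
    grant_type: "refresh_token",
    client_id: creds.client_id,
    client_secret: creds.client_secret,
    refresh_token: creds.refresh_token,
  }).toString();
}

// Hypothetical wrapper: serve from cache, else mint a new access token.
export async function getGoogleAccessToken(
  creds: AuthorizedUser,
  cache: CachedToken | null,
  now: number,
  postForm: (url: string, body: string) => Promise<TokenResponse>,
): Promise<CachedToken> {
  if (isFresh(cache, now)) return cache;
  const json = await postForm(GOOGLE_TOKEN_URL, buildRefreshBody(creds));
  return { accessToken: json.access_token, expiresAt: now + json.expires_in * 1000 };
}
```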
src/container-runner.ts:
- Inject GWS_BASE_URL=http://<gateway>:3001/googleapis at spawn,
alongside the existing ANTHROPIC_BASE_URL / OPENAI_BASE_URL.
container/agent-runner/src/mcp-tools/gws.ts:
- drive_doc_read_as_markdown({ fileId }): GETs Drive's export
endpoint with mimeType=text/markdown, returns the markdown.
- drive_doc_write_from_markdown({ markdown, title?, fileId? }):
multipart upload to Drive's resumable-upload endpoint with
metadata { mimeType: application/vnd.google-apps.document }.
POST creates a new Doc; PATCH (when fileId given) replaces.
Returns { fileId, webViewLink, name }.
- Uses fetch() against GWS_BASE_URL with Authorization: Bearer
placeholder. The proxy substitutes the real token.
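One way to build those Drive requests is sketched below. Several assumptions are worth flagging:
- It uses Drive v3's simpler `uploadType=multipart` form rather than the resumable flow the commit mentions.
- It assumes the proxy forwards `/googleapis/upload/...` to Google's upload host.
- It sends `mimeType` only on create.
- It adds `fields=id,name,webViewLink`, because otherwise Drive returns only a default field set.

All function names are hypothetical.

```typescript
export interface UploadRequest {
  url: string;
  method: "POST" | "PATCH";
  headers: Record<string, string>;
  body: string;
}

// multipart/related body: JSON metadata part first, then the media part.
export function multipartBody(metadata: object, markdown: string, boundary: string): string {
  return [
    `--${boundary}`,
    "Content-Type: application/json; charset=UTF-8",
    "",
    JSON.stringify(metadata),
    `--${boundary}`,
    "Content-Type: text/markdown; charset=UTF-8",
    "",
    markdown,
    `--${boundary}--`,
    "",
  ].join("\r\n");
}

// POST creates a new Doc; PATCH (when fileId is given) replaces its content.
export function buildDocUpload(
  baseUrl: string,
  markdown: string,
  opts: { title?: string; fileId?: string } = {},
  boundary = "nanoclaw-gws-boundary",
): UploadRequest {
  const metadata: Record<string, string> = {};
  if (opts.title) metadata.name = opts.title;
  // Ask Drive to convert to a native Google Doc only when creating.
  if (!opts.fileId) metadata.mimeType = "application/vnd.google-apps.document";
  const target = opts.fileId ? `/files/${encodeURIComponent(opts.fileId)}` : "/files";
  return {
    url: `${baseUrl}/upload/drive/v3${target}?uploadType=multipart&fields=id,name,webViewLink`,
    method: opts.fileId ? "PATCH" : "POST",
    headers: {
      Authorization: "Bearer placeholder", // the proxy substitutes the real token
      "Content-Type": `multipart/related; boundary=${boundary}`,
    },
    body: multipartBody(metadata, markdown, boundary),
  };
}

// Export URL used by the read tool.
export function exportMarkdownUrl(baseUrl: string, fileId: string): string {
  return `${baseUrl}/drive/v3/files/${encodeURIComponent(fileId)}/export?mimeType=text%2Fmarkdown`;
}
```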
container/agent-runner/src/mcp-tools/index.ts: appends `import './gws.js';`.
container/skills/google-workspace/SKILL.md: rewritten end-to-end.
Was previously documenting a `gws` CLI that wasn't actually
installed in the Dockerfile (Felix has been reading misleading
instructions). Now describes the two MCP tools above + workflow
examples + explicit list of what's NOT in V1 (Sheets / Calendar /
Gmail / Slides come later, gated on real use cases).
Phase 13.3 (per-agent role gating at the proxy URL layer) is
deferred — V1 is full-access for the instructor. When class roles
need it, we add URL-pattern matching to the proxy that consults
canAccessAgentGroup.
345/345 host tests green, host + container tsc clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Codex CLI (and similar tools) look for AGENTS.md at the project root.
This file imports CLAUDE.md so all the project-level instructions
(architecture, file map, conventions, supply-chain rules, gotchas)
apply regardless of which agent is editing — none of NanoClaw's
structure depends on whether the developer is using Claude Code or
Codex.
Adds a small Codex-specific notes section covering:
- apply_patch for edits (vs. Claude Code's Edit tool)
- bash + ripgrep for search (vs. Grep tool)
- cat/head/sed for reading (vs. Read tool)
- update_plan is the in-session widget; plans/<feature>.md is the
durable on-disk plan that survives sessions
- Pre-commit prettier hook leaves uncommitted reformat output —
every agent hits this; commit a follow-up "chore: apply prettier
formatting" when it bites
- Push proactively at phase boundaries
Everything that doesn't change between Claude Code and Codex
(architecture, file paths, no-stash rule, supply-chain policy,
container/host runtime split, branch model) is just listed
explicitly so a Codex-driven session doesn't second-guess the
existing rules.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
src/gws-auth.ts — reusable OAuth helpers:
- loadOAuthClient: read existing client_id/secret from
~/.config/gws/credentials.json
- buildAuthorizationUrl: Google OAuth consent URL with prompt=
consent + access_type=offline (required to get a refresh_token
back; without these Google omits it on re-auth)
- exchangeCodeForTokens: code → access_token + refresh_token via
POST to oauth2.googleapis.com/token
- writeCredentialsJson: atomic write at 0600 with merged
client_id/secret + new tokens. Defensive against missing
refresh_token (preserves old if Google declines to issue new).
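Here is a sketch of two of those helpers plus the refresh-token merge. The consent endpoint and the `access_type=offline` / `prompt=consent` parameters are standard Google OAuth. The following are illustrative assumptions, not the contents of `src/gws-auth.ts`:
- The function bodies and signatures, including the `state` parameter.
- The temp-file naming.
- `DEFAULT_CREDENTIALS_PATH`.

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical constant for the path named in the commit.
export const DEFAULT_CREDENTIALS_PATH = path.join(os.homedir(), ".config", "gws", "credentials.json");

export interface AuthorizedUserCreds {
  type: "authorized_user";
  client_id: string;
  client_secret: string;
  refresh_token: string;
}

// access_type=offline + prompt=consent make Google return a refresh_token
// even if the user already authorized this client earlier.
export function buildAuthorizationUrl(
  clientId: string,
  redirectUri: string,
  scopes: string[],
  state: string,
): string {
  const url = new URL("https://accounts.google.com/o/oauth2/v2/auth");
  const params: Record<string, string> = {
    client_id: clientId,
    redirect_uri: redirectUri,
    response_type: "code",
    scope: scopes.join(" "),
    access_type: "offline",
    prompt: "consent",
    state,
  };
  for (const [key, value] of Object.entries(params)) url.searchParams.set(key, value);
  return url.toString();
}

// Keep the previous refresh_token when Google's response omits one.
export function mergeCredentials(
  old: AuthorizedUserCreds,
  tokens: { refresh_token?: string },
): AuthorizedUserCreds {
  return { ...old, refresh_token: tokens.refresh_token ?? old.refresh_token };
}

// Atomic write: a 0600 temp file in the same directory, renamed over the
// target so readers never see a partially written file.
export function writeCredentialsJson(file: string, creds: AuthorizedUserCreds): void {
  fs.mkdirSync(path.dirname(file), { recursive: true, mode: 0o700 });
  const tmp = path.join(path.dirname(file), `.${path.basename(file)}.${process.pid}.tmp`);
  fs.writeFileSync(tmp, JSON.stringify(creds, null, 2) + "\n", { mode: 0o600 });
  fs.renameSync(tmp, file);
}
```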
No HTTP server, no CLI logic — pure helpers, reusable across the
one-off CLI today and Phase 14's magic-link server.
scripts/gws-authorize.ts — one-off CLI that wraps the helpers:
- Spins up a localhost HTTP server (default :8765)
- Prints the consent URL for the user to open
- Receives Google's redirect, exchanges code, writes credentials
- Documents the SSH-port-forward workflow for VPS setups
Solves "Google OAuth not configured" / 502 errors when the cached
refresh token has expired or been revoked (typical after ~6 months
of disuse for unverified clients, or when the user revokes access
in Google Account settings).
plans/gws-mcp.md — adds Phase 14 section:
- Per-student Google OAuth, mirroring Phase 9's Codex auth pattern
- student_google-auth.ts storage (analog to student-auth.ts)
- Magic-link flow added to existing student-auth-server (port 3003)
- Per-student bearer lookup in credential-proxy keyed on agent
group's student_user_id metadata
- /gauth Telegram command (analog to /login)
- GCP Console one-time: add NANOCLAW_PUBLIC_URL/google-auth/callback
as authorized redirect URI
Why Phase 14 matters: Phase 13's V1 routes every agent's GWS calls
through the instructor's bearer. In the single-instructor case that's fine.
For class deploy it's a real boundary problem — student agents
could read instructor's Docs by guessing fileIds. Per-student OAuth
makes Google enforce the boundary instead of relying on URL parsing
in our proxy.
Today's gws-authorize.ts is the foundation: when Phase 14 lands, the
magic-link flow imports the same exchangeCodeForTokens +
writeCredentialsJson helpers; only the storage path and redirect URI
differ.
345/345 tests green, tsc clean.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the always-on "Web hosting" block in container/CLAUDE.md with a discoverable skill. The old block forbade `cloudflared`/`ngrok`/`localtunnel`, but agents routed around it via `npx cloudflared` and their own `node server.js`. Piling on more prohibitions wasn't helping. The new skill names the loophole tools explicitly (npx, npm exec, trycloudflare.com, pages.dev, etc.) and pairs the publish recipe with positive design guidance (typography, color, motion, layout) adapted from Anthropic's frontend-design skill (Apache 2.0, attributed). The goal is to make the right path the obvious path, not just the permitted one. The shared prompt drops from ~35 lines on this topic to one pointer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bug: when an outbound message had an odd count of `*` or `_`, the legacy-Markdown sanitizer dropped EVERY occurrence of those chars to keep Telegram's parser happy. That silently mangled URLs whose path contained an underscore: e.g. `http://host/telegram_main/the-view/` became `http://host/telegrammain/the-view/` after sanitizing, and the user got a 404 from a link they couldn't have mistyped (they clicked it). Fix: backslash-escape stray `*`/`_` (emitting `\*`/`\_`) instead of dropping them. Telegram's legacy Markdown renders `\_` as a literal underscore, so URLs survive verbatim. Same logic for `\*`. Even-balanced messages still pass through untouched, so legitimate `_italic_` and `*bold*` rendering is preserved. This unblocks every group folder slug containing an underscore, including the classroom convention (`student_01`, `ta_01`, `instructor_01`). Regression test added for the original `telegram_main` failure case. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
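The fixed sanitizer's behavior can be sketched like this. It is a hypothetical reimplementation, not the real code. It counts only unescaped markers, and when a marker's count is odd it escapes every unescaped occurrence instead of deleting it. It deliberately ignores code spans, links, and escaped backslashes.

```typescript
// A marker counts as unescaped unless the previous char is a backslash.
function isUnescaped(text: string, i: number): boolean {
  return i === 0 || text[i - 1] !== "\\";
}

function countUnescaped(text: string, marker: string): number {
  let n = 0;
  for (let i = 0; i < text.length; i++) {
    if (text[i] === marker && isUnescaped(text, i)) n++;
  }
  return n;
}

// Prefix every unescaped occurrence of `marker` with a backslash.
function escapeMarker(text: string, marker: string): string {
  let out = "";
  for (let i = 0; i < text.length; i++) {
    const c = text[i];
    out += c === marker && isUnescaped(text, i) ? `\\${c}` : c;
  }
  return out;
}

export function sanitizeLegacyMarkdown(text: string): string {
  let out = text;
  for (const marker of ["*", "_"]) {
    // Balanced markers are left alone so *bold* and _italic_ still render.
    if (countUnescaped(out, marker) % 2 === 1) out = escapeMarker(out, marker);
  }
  return out;
}
```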
Two follow-ups from a real-world miss where the agent (a) didn't invoke the skill and used a tunnel anyway, and (b) sent the URL before writing the files, giving the user a blank page.
- container/CLAUDE.md: shared prompt now says "first action is `Skill: make-website`" instead of "invoke the skill"; imperative, hard to read as optional. Names trycloudflare/ngrok explicitly so an ambitious agent can't loophole into them.
- skill: publish recipe is now a 4-step ordered list with an explicit curl verification before sending the URL. The prior wording let the agent post a URL optimistically while assets were still being written.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Container skill that turns an agent into the librarian for a persistent, interlinked markdown wiki under /workspace/agent/wiki/, with raw inputs under /workspace/agent/sources/. Implements the three operations from the pattern:
- Ingest: per-source, sequential, 5–15 pages touched per source (summary + entities + concepts + cross-refs + index + log).
- Query: read index.md first, synthesize with citations, file noteworthy answers back as new pages.
- Lint: contradictions, orphans, stale claims, missing cross-refs, data gaps. Append findings to log.md.
Spelled out so the agent doesn't fall back to RAG-style "summarize each file in isolation" behavior; the whole point is per-source integration into a compounding artifact, not a parallel skim. Installed via /add-karpathy-llm-wiki. Per-group activation requires scaffolding wiki/ + sources/ trees and a CLAUDE.local.md section (both gitignored under groups/*); this commit only ships the skill.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rk cross-provider Codex doesn't have Claude Code's discoverable Skill tool. With only CLAUDE.md/CLAUDE.local.md inlined into baseInstructions, agents running on Codex couldn't act on phrases like "your first action is Skill: make-website" — the tool didn't exist, the skill bodies weren't loaded, and the per-group prompt's references just dangled. This adds composeAvailableSkills(): scans the per-group skill symlinks at /home/node/.claude/skills/ (the same set Claude Code sees, scoped by container.json's skill selection), parses each SKILL.md's frontmatter for name + description, and emits a markdown discovery list as part of baseInstructions. The list directs Codex agents to Read /app/skills/<name>/SKILL.md when a description matches the user's request — mirroring the "lazy-load full body" approach Claude Code uses internally rather than inlining tens of KB up front. Net effect: persona, CLAUDE.local.md, and the skill catalog all work the same on Claude or Codex (or any future non-Claude provider that uses the same agent-runner shim). Switching providers is now a config change, not a content rewrite. Tests cover frontmatter parsing edge cases (missing description, missing name field, no frontmatter at all), determinism (alphabetical sort), and the empty-dir/no-eligible-skills paths. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
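Here is a simplified sketch of the discovery list. It takes already-read `SKILL.md` contents instead of scanning `/home/node/.claude/skills/`, and it handles only flat `key: value` frontmatter. Several choices are assumptions; the commit says those cases are tested but not how they resolve:
- Skip a skill that has no description.
- Fall back to the folder name when `name` is missing.
- The output format.

```typescript
export interface SkillSource {
  dir: string; // folder name under the skills directory
  skillMd: string; // contents of that folder's SKILL.md
}

// Minimal frontmatter reader: flat "key: value" lines, optional quotes.
export function parseFrontmatter(md: string): Record<string, string> | null {
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(md);
  if (!match) return null;
  const fields: Record<string, string> = {};
  for (const line of match[1].split(/\r?\n/)) {
    const colon = line.indexOf(":");
    if (colon <= 0) continue;
    let value = line.slice(colon + 1).trim();
    const q = value[0];
    if (value.length >= 2 && (q === '"' || q === "'") && value[value.length - 1] === q) {
      value = value.slice(1, -1);
    }
    fields[line.slice(0, colon).trim()] = value;
  }
  return fields;
}

// Markdown discovery list pointing agents at each skill's full body.
export function composeAvailableSkills(sources: SkillSource[]): string {
  const entries: { dir: string; name: string; description: string }[] = [];
  for (const { dir, skillMd } of sources) {
    const fm = parseFrontmatter(skillMd);
    if (!fm || !fm.description) continue; // nothing to match a request against
    entries.push({ dir, name: fm.name || dir, description: fm.description });
  }
  if (entries.length === 0) return "";
  // Code-unit comparison keeps the order deterministic across locales.
  entries.sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0));
  const lines = entries.map(
    (e) => `- **${e.name}**: ${e.description} (Read /app/skills/${e.dir}/SKILL.md)`,
  );
  return ["## Available skills", "", ...lines].join("\n");
}
```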
Switching providers required editing container.json AND updating sessions.agent_provider in v2.db AND stopping the running container. Three places, three different commands; forgetting any one leaves the system in a half-switched state (running container still on the old provider, or session row disagreeing with config file). `pnpm exec tsx scripts/switch-provider.ts <group> <provider>` does all three in order, prints what changed, and is idempotent (no-op when already on the requested provider). Resolves the group folder to its agent_groups row, updates sessions.agent_provider for every session in that group, and stops every running container whose name matches the group prefix. Provider name is intentionally not whitelisted — registered providers are an open set determined at runtime by which provider modules the barrel imports, and this script shouldn't gate that. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds `/provider` to the Telegram slash-command registry, mirroring `/model` and `/auth`:
- `/provider`: show current provider + hint list
- `/provider codex`: switch group to Codex
- `/provider claude`: switch back to Claude
Behind it, factor the switch logic out of scripts/switch-provider.ts into src/provider-switch.ts (setProvider/getCurrentProvider/listProviderHints) so the CLI and the Telegram handler share one implementation. Trust-first: any string is accepted; an unregistered provider surfaces server-side at next spawn rather than being whitelisted at command time. Idempotent: already-on-provider returns ok:false reason:no-change so the chat reply can honestly say "no change" instead of a misleading "switched". The update path covers all three places provider state lives: container.json, sessions.agent_provider, and any running container. Tests cover container.json read/write, the no-change path, the no-container-json path, the group-not-found path, the sessions.agent_provider update, and that unrelated container.json fields (skills, packages, mcpServers) survive the switch. Uses the TEST_GROUPS_DIR env var to point at a tmpdir without mocking the config module.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
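Here is the `container.json` step of the switch, sketched as a pure function. `switchProviderInConfig` is a hypothetical name. The sessions update and the container stop are left out. Only the `ok:false`/`reason:no-change` result shape comes from the commit.

```typescript
export type SwitchResult =
  | { ok: true; from: string | null; to: string; json: string }
  | { ok: false; reason: "no-change" };

// Rewrites only the "provider" field; skills, packages, mcpServers and any
// other keys are carried over unchanged by the object spread.
export function switchProviderInConfig(containerJson: string, provider: string): SwitchResult {
  const config = JSON.parse(containerJson) as Record<string, unknown>;
  const from = typeof config.provider === "string" ? config.provider : null;
  // Idempotent: asking for the current provider is reported, not applied.
  if (from === provider) return { ok: false, reason: "no-change" };
  const updated = { ...config, provider };
  return { ok: true, from, to: provider, json: JSON.stringify(updated, null, 2) + "\n" };
}
```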
Pre-existing whitespace cleanup from the GWS work. No behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Conflicts:
# .claude/skills/add-gmail-tool/SKILL.md
# CLAUDE.md
# container/Dockerfile
# migrate-v2.sh
# package.json
# pnpm-lock.yaml
# setup/verify.ts
# src/index.ts
Upstream's AgentGroup interface requires model: string|null, but a few callers (channel-approval.ts:272, agent-route.test.ts cross-agent-group guard, host-core.test.ts ~10 spots) construct AgentGroup objects without that field. Build broke after merging upstream/main; this fixes the call sites with model: null. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…okens
OAuth tokens stored in `~/.claude/.credentials.json` (or the macOS
keychain) expire ~1 hour after issuance. Today the proxy only re-reads
the file when the in-memory cache hits expiry, then trusts whatever's
in the file. On a host that doesn't have Claude CLI actively keeping
the file fresh — the typical NanoClaw-as-systemd-service deployment on
a Linux server — the file goes stale, the proxy returns an expired
access token, and containers start getting 401s with no recovery path.
Adds a self-sufficient refresh flow that the proxy owns:
- `readFullOAuthCredentials()` — reads `~/.claude/.credentials.json`
first; on macOS, falls back to the `Claude Code-credentials` keychain
entry. Keychain branch is platform-gated (`process.platform ===
'darwin'`) so Linux installs are a clean no-op.
- `saveOAuthCredentials()` — atomic write back to the credentials file
(tmp + rename, 0600), so process restarts pick up the latest token.
- `refreshAnthropicOAuthToken()` — POST to platform.claude.com's
/v1/oauth/token with grant_type=refresh_token. Single-flight guarded
so concurrent in-flight requests share one refresh.
- `getOAuthToken()` is now async and triggers a refresh when:
* token is past `expiresAt - REFRESH_BUFFER_MS` (5 min), or
* `expiresAt` is undefined (the macOS keychain path doesn't store it
— refresh now so we learn the real expiry).
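The refresh trigger and the single-flight guard can be sketched as below. The 5-minute buffer and the undefined-`expiresAt` rule come from the commit. The refresh call itself is injected as `doRefresh`, because the request fields for `platform.claude.com` are not spelled out here. This sketch differs from the PR in two ways:
- It clears the slot with `Promise.prototype.finally` rather than `try/finally`.
- It starts the refresh via `.then()`, so the shared promise is stored before any refresh code runs.

```typescript
const REFRESH_BUFFER_MS = 5 * 60 * 1000;

export interface OAuthCredentials {
  accessToken: string;
  refreshToken: string;
  expiresAt?: number; // epoch ms; undefined when the source didn't store it
}

export function needsRefresh(creds: OAuthCredentials, now: number = Date.now()): boolean {
  // Unknown expiry: refresh now so the real expiry becomes known.
  if (creds.expiresAt === undefined) return true;
  return now >= creds.expiresAt - REFRESH_BUFFER_MS;
}

let refreshInFlight: Promise<OAuthCredentials> | null = null;

// Concurrent callers share one refresh; the slot is cleared on success or
// failure so a failed refresh never wedges later callers.
export function refreshOnce(
  doRefresh: () => Promise<OAuthCredentials>,
): Promise<OAuthCredentials> {
  if (!refreshInFlight) {
    refreshInFlight = Promise.resolve()
      .then(doRefresh)
      .finally(() => {
        refreshInFlight = null;
      });
  }
  return refreshInFlight;
}

export async function getOAuthToken(
  current: OAuthCredentials,
  doRefresh: () => Promise<OAuthCredentials>,
): Promise<string> {
  const creds = needsRefresh(current) ? await refreshOnce(doRefresh) : current;
  return creds.accessToken;
}
```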
Static tokens from `.env` (CLAUDE_CODE_OAUTH_TOKEN / ANTHROPIC_AUTH_TOKEN)
still win and are never refreshed. The Google OAuth path is unchanged.
Adapted from PR nanocoai#1102, which was authored
against v1; ported to v2's credential-proxy.ts shape and naming.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced May 10, 2026
Scope
This fix is for users on the native credential proxy only. OneCLI users (`/init-onecli`) have a separate vault gateway that handles credential refresh in its own daemon, so `src/credential-proxy.ts` is not in their request path. The functions added here are no-ops for that audience.
The audience that benefits:
- Installs running the native credential proxy (`/use-native-credential-proxy`, or any fork that didn't install OneCLI)
- Installs authenticated through `~/.claude/.credentials.json` via `claude login` (not `claude setup-token`, which already issues long-lived tokens)
- Installs running NanoClaw as a service (`systemd --user` / launchd, where the Claude Code CLI is not running and therefore not refreshing the file)
Summary
Adapts PR #1102 to the current v2 credential-proxy. Same problem, same approach, different file shape.
OAuth tokens from `~/.claude/.credentials.json` (or the macOS keychain) issued by `claude login` expire ~1 hour after issuance. Today the proxy only re-reads the file on cache expiry and trusts whatever's there. On a host that doesn't have Claude CLI actively keeping the file fresh (the typical NanoClaw-as-systemd-service deployment on a Linux server), the file goes stale, the proxy returns an expired access token, and containers start getting 401s with no recovery path.
(`claude setup-token` issues year-long tokens specifically for unattended use, so single-instructor installs that authenticated that way don't see the bug, but multi-user forks where each user does `claude login` will hit it within an hour.)
Changes
All in `src/credential-proxy.ts` (+210/-22):
- `readFullOAuthCredentials()`: reads `~/.claude/.credentials.json` first; on macOS, falls back to the `Claude Code-credentials` keychain entry. The keychain branch is platform-gated (`process.platform === 'darwin'`), so Linux installs are a clean no-op.
- `saveOAuthCredentials()`: atomic write back (tmp + rename, 0600), so process restarts pick up the latest token. Creates `~/.claude` with `0700` if missing.
- `refreshAnthropicOAuthToken()`: POST to `platform.claude.com/v1/oauth/token` with `grant_type=refresh_token`. Single-flight guarded via a module-level `refreshInFlight` promise, so concurrent callers share one refresh.
- `getOAuthToken()` is now async and triggers a refresh when:
  * the token is past `expiresAt - REFRESH_BUFFER_MS` (5 min), OR
  * `expiresAt` is undefined (the macOS keychain path doesn't store it; refresh immediately so we learn the real expiry).
What's preserved
- Static tokens from `.env` (`CLAUDE_CODE_OAUTH_TOKEN`, `ANTHROPIC_AUTH_TOKEN`) still win and are never refreshed.
- The 5-minute refresh buffer (`REFRESH_BUFFER_MS`) is unchanged.
- Codex auth is untouched (`codex app-server` inside the container handles its own refresh).
Differences from #1102
- v2 already had `getOAuthToken()` and a `cachedOAuthToken`/`cachedExpiresAt` pair; reused those instead of inventing a new shape (a `tokenCache` interface). Result: smaller surface area.
- v2's logger is called as `log.warn(...)`, not `logger.warn({...}, '...')`. Adapted accordingly.
- The single-flight guard is `refreshInFlight` (module-level) instead of being embedded in `startCredentialProxy`. Two reasons: cleaner type inference, and `getOAuthToken` is callable without going through `startCredentialProxy` (e.g. unit tests).
Test plan
- `pnpm run build`: clean.
- `pnpm test`: 418/418 pass. Existing credential-proxy tests still pass. They don't exercise the OAuth refresh path, because that requires mocking `platform.claude.com`, but the static, cached, and read-from-file paths are covered.
Risk
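- A failed refresh can't wedge the single-flight guard: it uses `try/finally` to clear `refreshInFlight` regardless of outcome.
- The keychain read uses `execSync`, which is synchronous and could block the event loop briefly on the keychain query. In practice this only runs on `darwin` and only when the file is absent. The typical case is the file path, which is async-friendly.
- No new dependencies (only Node built-ins: `child_process`, `fs`, `os`, `path`, `https`).
🤖 Generated with Claude Code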