fix: restore container/hooks for CEO tool logging deployment #8
Closed
Jeffrey-Keyser wants to merge 1094 commits into main from
First default channel that ships with main. Unix-socket adapter + thin
client; plugs into the running daemon rather than spawning its own host.
## src/channels/cli.ts
- ChannelAdapter with channelType='cli', platformId='local'.
- setup() unlinks any stale socket, listens on $DATA_DIR/cli.sock (mode 0600
so only the local user can connect).
- On client connect: reads newline-delimited JSON ({"text": "..."}) and
calls config.onInbound('local', null, {id, kind:'chat', content, ts}).
- deliver() writes {"text": <body>} back to the connected socket; silently
no-ops when no client is attached (outbound row still persists).
- Single-client policy: a second connection supersedes the first with a
[superseded] notice.
- teardown() closes the client, closes the server, removes the socket file.
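The newline-delimited JSON framing described above can be sketched roughly as follows. This is an illustration only, assuming socket chunks arrive as strings: `LineDecoder` and `ChatLine` are hypothetical names that do not come from src/channels/cli.ts, and silently skipping malformed lines is an assumption of the sketch rather than documented behavior.

```typescript
// Hypothetical shape of one inbound line: {"text": "..."}
interface ChatLine {
  text: string;
}

// Hypothetical helper: buffers partial chunks and returns each complete JSON line.
class LineDecoder {
  private buffer = "";

  // Feed one chunk; returns the messages completed by this chunk.
  push(chunk: string): ChatLine[] {
    this.buffer += chunk;
    const out: ChatLine[] = [];
    let idx = this.buffer.indexOf("\n");
    while (idx >= 0) {
      const line = this.buffer.slice(0, idx).trim();
      this.buffer = this.buffer.slice(idx + 1);
      if (line !== "") {
        try {
          const parsed: unknown = JSON.parse(line);
          const text = (parsed as { text?: unknown } | null)?.text;
          if (typeof text === "string") out.push({ text });
        } catch {
          // Assumption: a malformed line is dropped rather than closing the socket.
        }
      }
      idx = this.buffer.indexOf("\n");
    }
    return out;
  }
}
```

A decoder like this would be fed from the socket's data events, with each returned `text` passed on to the inbound callback.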
## scripts/chat.ts + pnpm run chat
One-shot client:
- pnpm run chat <message...>
- Connects to the socket, writes one JSON line with the message.
- Reads replies; exits 2s after the first reply lands (hard timeout 120s).
- ENOENT/ECONNREFUSED prints a hint to start the daemon.
## scripts/init-first-agent.ts
- Fix stale imports after earlier module extractions (permissions +
agent-to-agent moved their DB helpers into modules/).
- After wiring the DM channel, also create cli/local messaging_group
(unknown_sender_policy='public' — local socket perms handle auth) and
wire it to the same agent. User can `pnpm run chat` immediately.
## package.json
- Add "chat": "tsx scripts/chat.ts" script.
## Validation
- pnpm run build clean.
- pnpm test — 137 host tests pass.
- bun test in container/agent-runner — 17 pass.
- Service boot logs: "CLI channel listening" + "Channel adapter started
channel=cli type=cli". Clean SIGTERM shutdown; socket file removed.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
feat(channels): add CLI channel — talk to your agent from the terminal
Single forward-looking reference that replaces the two untracked planning docs (REFACTOR_PLAN.md + REFACTOR_EXECUTION.md) which had become a mix of historical PR timeline and still-relevant decisions.
Keeps only what's actionable going forward:
- Module tiers, the four registries, and the module distribution model (architecture summary).
- Remaining work: Phase 5 (v2 → main) and the modules-branch decision.
- Operational patterns worth preserving (standing checks, TDZ rule, branch-sync file-presence diff procedure, prettier drift pattern).
- 17 curated open questions across design, distribution, core slotting, and documentation.
Canonical references (docs/module-contract.md, docs/architecture.md, etc.) are linked but not duplicated. This doc is transient: retire when the refactor is fully behind us.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a new operational skill that routes any agent group to a local Ollama instance instead of the Anthropic API. Ollama speaks the Anthropic /v1/messages endpoint natively, so no new provider code is needed: just env var overrides and a model setting in the shared settings file.
The skill also documents and applies two prerequisite source changes:
- ContainerConfig gains env and blockedHosts fields (container-config.ts)
- container-runner wires those fields as -e and --add-host Docker flags
- Dockerfile home dir set to chmod 777 so containers running as the host uid can write ~/.claude config (discovered during implementation)
docs/ollama.md covers the architecture, OneCLI proxy bypass rationale, network isolation via blockedHosts, model selection tradeoffs for Apple Silicon, and revert instructions.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Aggregates the loose OneCLI install, secret registration, and first-agent wiring commands from /setup into three new dispatcher steps. Adds --cli-only mode to init-first-agent so /new-setup can reach a working 2-way CLI chat with the bare minimum.
- setup/onecli.ts: idempotent install + PATH + api-host + .env, polls /health
- setup/auth.ts: --check verifies secret; --create --value registers it
- setup/cli-agent.ts: wraps init-first-agent --cli-only
- scripts/init-first-agent.ts: --cli-only mode; DM mode unchanged
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Single upfront parallel scan the SKILL.md renders via `!`...`` so Claude sees system state before generating its first response. Each field maps to a routing decision (skip/run/ask) for a downstream step.
Reports: OS, SHELL, DOCKER + IMAGE_PRESENT, ONECLI_STATUS + ONECLI_URL, ANTHROPIC_SECRET, SERVICE_STATUS, CLI_AGENT_WIRED, INFERRED_DISPLAY_NAME, TZ_STATUS + TZ_ENV + TZ_SYSTEM. Runs in ~200ms on a fully-set-up host.
Not a replacement for per-step idempotency: each step keeps its own checks since probe is a snapshot and can go stale by execution time.
Uses /api/health (OneCLI's actual endpoint). Anthropic secret check uses the CLI client so it works whenever onecli is installed, even if the direct HTTP health probe fails (different network paths).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Shortest path from zero to a working two-way agent chat via the CLI channel.
Renders `!`pnpm exec tsx setup/index.ts --step probe`` at the top for dynamic context injection: Claude sees current system state before generating its first response and routes each subsequent step (skip/ask/run) off the probe snapshot. Pre-approves the Bash patterns it needs via `allowed-tools` so setup runs without per-step prompts.
Lives alongside /setup for now; will replace it once proven.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Port probe to zero-dep plain ESM (setup/probe.mjs) so /new-setup can inject dynamic context on a fresh machine where pnpm/node_modules don't yet exist. Skill falls back to a STATUS: unavailable block if Node itself isn't on PATH, and the flow treats that as "run every step from 1" (each step is idempotent).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ion check
The chained `&& / ||` inline command tripped Claude Code's per-operation permission check. Move the Node-missing fallback into setup/probe.sh so the skill's `!` block is a single `bash setup/probe.sh` call.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Node
Two fixes to the fresh-install path:
1. setup.sh: when `corepack enable` runs as a non-root user against a system-wide Node install (apt-installed to /usr/bin), it fails EACCES trying to symlink /usr/bin/pnpm, leaving pnpm off PATH. Retry with sudo when pnpm is still missing, gated to Linux/WSL so macOS Homebrew prefixes aren't polluted with root-owned shims.
2. SKILL.md step 1: if the probe reports STATUS: unavailable (Node not installed), install Node BEFORE invoking `bash setup.sh`. The old flow ran setup.sh first as a diagnostic, which always failed fast, installed Node, then re-ran: two bootstraps for no reason.
Combined: fresh Linux box now goes Node install -> single setup.sh run.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ners
Two flow fixes:
1. Add "Ordering and parallelism" section making explicit that step 4
(auth) must block until step 3 (OneCLI) is complete — auth writes
the secret into the vault, so firing an AskUserQuestion while
OneCLI is still installing asks the user for a credential the
system can't store. Step 2 (container build) is safe to run past
step 4, joined before step 6 (first CLI agent).
2. Drop the per-step quoted one-liners. They duplicated Claude's own
natural narration ("While those build, let's get your credential
set up." → immediately echoed by the scripted "Your agent needs an
Anthropic credential..."). Each step now has a short description
instead; Claude narrates in its own voice.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…alled
Probe now emits HOST_DEPS (ok|missing) based on whether node_modules/better-sqlite3/build/Release/better_sqlite3.node exists, the canonical proof that `pnpm install` ran and the native build step succeeded. Step 1 (Node bootstrap) skips when HOST_DEPS=ok instead of always re-running setup.sh. Probe now genuinely routes step 1 the same way it routes every other step.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
onecli step:
- Poll /api/health (was /health) so the step's health check matches the probe's. On hosted OneCLI (app.onecli.sh) the old path returned non-ok, flagging the gateway as "degraded" even though install succeeded.
- Drop the "try `onecli start`" hint: no such subcommand exists and it sent the skill off chasing fabricated commands. A failed health poll is demoted to a soft warning; the auth step surfaces a real outage via `onecli secrets list`.
SKILL.md step 4: rewrite to match the /setup skill's pattern. The user generates the token themselves, picks dashboard or CLI to register it with OneCLI, and the skill verifies via `auth --check`. Tokens no longer travel through chat.
Co-Authored-By: Koshkoshinsk <daniel.milliner@gmail.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…lback
The probe now returns a real snapshot from second zero, so every step consults real probe fields instead of falling back to "run every step blindly" when Node isn't installed. Also drops the redundant CLI_AGENT_WIRED field (it gated the last step on its own end-state) and scopes timezone out of the probe (timezone is not part of /new-setup).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…wiring
Update SKILL.md with tested setup: dedicated bot account prerequisite, GITHUB_BOT_USERNAME env var for @-mention detection, private vs public repo sender policy guidance, member registration for strict mode, per-thread session mode, and wiring example.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rewrite SKILL.md with tested setup: OAuth app with client credentials (recommended), bridge catchAll patch for platforms without @-mention, LINEAR_TEAM_KEY for team-based routing, webhook setup with delay note, private vs public sender policy, and wiring example.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…on and env vars
- Pin @opencode-ai/sdk and opencode-ai CLI both to 1.4.17; warn against latest (1.14.x has a breaking session API rewrite incompatible with the current provider code)
- Add step 7: propagate provider files into existing per-group overlays (data/v2-sessions/*/agent-runner-src/providers/) which override the image at runtime and are never auto-updated by rebuilds
- Add build cache gotcha: prune builder if "Unknown provider" after rebuild
- Document ANTHROPIC_BASE_URL as required for non-anthropic providers, with correct base URL per provider (DeepSeek, OpenRouter examples)
- Add OPENCODE_SMALL_MODEL to all examples
- Document OneCLI credential grant (set-secrets replaces, not appends)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- docs/v1-vs-v2/: full v1→v2 regression analysis (SUMMARY + 21 per-module docs + ACTION-ITEMS rollup with decisions + timezone recreation spec).
- container/agent-runner/scripts/sdk-signal-probe.ts: empirical harness used to characterise Claude Agent SDK event/hook/stderr timing for the stuck-detection design in item 9.
- src/channels/chat-sdk-bridge.ts: document the conversations Map staleness in a code comment; fix deferred to when dynamic group registration lands (ACTION-ITEMS item 17).
No runtime behavior change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…IPC_POLL_INTERVAL
These three constants were carried over from v1's polling + IPC architecture and have zero callers in the v2 runtime:
- POLL_INTERVAL (2000ms): v1 message loop; replaced by event-driven delivery + delivery.ts's ACTIVE_POLL_MS (hardcoded 1000ms)
- SCHEDULER_POLL_INTERVAL (60000ms): v1 task scheduler; replaced by host-sweep.ts's SWEEP_INTERVAL_MS (hardcoded 60_000)
- IPC_POLL_INTERVAL (1000ms): v1 file-based IPC; meaningless in v2's session-DB architecture
Grep confirms no imports in src/, container/, or tests. Docs/SPEC.md updated to match.
Ref: docs/v1-vs-v2/ACTION-ITEMS.md item 15.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The agent needs to perceive times in the user's timezone, not UTC.
Dropping this in the v1→v2 port produced a class of bugs where the agent
would schedule tasks for the wrong hour, suggest dinner at midnight, etc.
This restores v1 parity.
Container side:
- New container/agent-runner/src/timezone.ts mirrors src/timezone.ts with
isValidTimezone / resolveTimezone / formatLocalTime, plus:
* TIMEZONE constant resolved at load from process.env.TZ (host sets this
from src/container-runner.ts:254)
* parseZonedToUtc(input, tz) — treats a naive ISO as wall-clock time in
`tz`, returns the corresponding UTC Date. Strings with Z or offset
are passed through.
- formatter.ts:
* formatMessages() now prepends <context timezone="IANA"/>\n — matches
v1 src/v1/router.ts:20-22
* formatSingleChat uses formatLocalTime(ts, TIMEZONE) instead of a
home-rolled HH:MM 24h formatter → outputs like "Jun 15, 2026, 8:00 AM"
* reply_to="<id>" attribute + <quoted_message from="X">Y</quoted_message>
element — matches v1 format exactly; old <reply-to/> shape is gone
* stripInternalTags() exported for the dispatch path to reuse
- poll-loop.ts uses the exported stripInternalTags() instead of inline regex.
- mcp-tools/scheduling.ts:
* schedule_task/update_task descriptions now explicitly document that
processAfter accepts either UTC or naive local time (interpreted in
the user's TZ from the context header)
* handlers normalize through parseZonedToUtc() and store a UTC ISO
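As a rough illustration of the naive-local-to-UTC conversion described for parseZonedToUtc(input, tz), the sketch below uses only the built-in Intl API. It is not the code in container/agent-runner/src/timezone.ts: the helper `tzOffsetMs`, the accepted input shapes, and the two-pass offset correction are assumptions made for the example.

```typescript
// Hypothetical helper: offset of `tz` from UTC (local wall clock minus UTC) at an instant, in ms.
function tzOffsetMs(instant: Date, tz: string): number {
  const parts = new Intl.DateTimeFormat("en-US", {
    timeZone: tz,
    hour12: false,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
    hour: "2-digit",
    minute: "2-digit",
    second: "2-digit",
  }).formatToParts(instant);
  const get = (type: string): number =>
    Number(parts.find((p) => p.type === type)?.value);
  // Some engines render midnight as hour 24 when hour12 is false, hence the modulo.
  const asUtc = Date.UTC(
    get("year"), get("month") - 1, get("day"),
    get("hour") % 24, get("minute"), get("second"),
  );
  return asUtc - Math.floor(instant.getTime() / 1000) * 1000;
}

// Sketch: treat a naive ISO string as wall-clock time in `tz` and return the UTC instant.
function parseZonedToUtc(input: string, tz: string): Date {
  // Strings carrying Z or an explicit offset already denote an instant: pass through.
  if (/(Z|[+-]\d{2}:?\d{2})$/i.test(input)) return new Date(input);
  const m = /^(\d{4})-(\d{2})-(\d{2})[T ](\d{2}):(\d{2})(?::(\d{2}))?/.exec(input);
  if (!m) throw new Error(`unparseable timestamp: ${input}`);
  const wall = Date.UTC(+m[1], +m[2] - 1, +m[3], +m[4], +m[5], +(m[6] ?? 0));
  // First guess uses the offset at the wall time read as UTC;
  // the second pass corrects inputs that fall near a DST transition.
  let utc = wall - tzOffsetMs(new Date(wall), tz);
  utc = wall - tzOffsetMs(new Date(utc), tz);
  return new Date(utc);
}
```

With a helper along these lines, a scheduling handler could accept either form of processAfter and always store `parseZonedToUtc(value, tz).toISOString()`.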
Host side:
- src/modules/scheduling/recurrence.ts passes { tz: TIMEZONE } to
CronExpressionParser.parse. Without this, "0 9 * * *" fires at 09:00
UTC instead of 09:00 user-local — this was the v1 behavior
(src/v1/task-scheduler.ts:20-49).
Tests:
- container/agent-runner/src/timezone.test.ts — mirror of src/timezone.test.ts
+ new parseZonedToUtc cases
- container/agent-runner/src/formatter.test.ts — context header, reply_to,
quoted_message, XML escaping, stripInternalTags (ported from v1
formatting.test.ts)
- src/modules/scheduling/recurrence.test.ts — cron TZ respected, completed
rows only cloned when recurrence is set
Ref: docs/v1-vs-v2/ACTION-ITEMS.md item 18 + timezone-formatting-v1-recreation.md
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…cklist
Replaces the two overlapping old mechanisms (30-min setTimeout kill in
container-runner, 10-min heartbeat STALE_THRESHOLD reset in host-sweep)
with message-scoped stuck detection anchored to the processing_ack claim
age + an absolute 30-min ceiling that extends for long-declared Bash
tools.
Old model problems:
- IDLE_TIMEOUT setTimeout fired on plain wall-clock time; slow-but-alive
agents got killed at 30min regardless of activity
- 10-min STALE_THRESHOLD in the sweep was unreliable — the heartbeat is
only touched on SDK events, so legitimate silent tool work (sleep 30,
long WebFetch, npm install) looked identical to a hung container
- Two overlapping sources of truth for "when to let go of a container"
New model:
- Host sweep is the single source of truth.
- Container exposes a new `container_state` single-row table in outbound.db
(schema added; container writes, host reads). PreToolUse hook writes
current_tool + tool_declared_timeout_ms (read from Bash's tool_input);
PostToolUse / PostToolUseFailure clear it.
- Sweep decides with a pure helper `decideStuckAction`:
* absolute ceiling — kill if heartbeat age > max(30min, bash_timeout)
* per-claim stuck — kill if any processing_ack row has claim_age >
max(60s, bash_timeout) AND heartbeat hasn't been touched since claim
* otherwise ok
Kill paths reset leftover processing rows with exponential backoff,
reusing the existing retry machinery.
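The two kill rules above can be sketched as a pure function. This is an approximation under stated assumptions, not the helper in host-sweep.ts: the input shape (`StuckInput`), the return type, the epoch-millisecond timestamps, and the handling of an absent heartbeat (skipped by the ceiling rule, treated as "not touched since claim" by the per-claim rule) are all hypothetical.

```typescript
type StuckDecision =
  | { action: "ok" }
  | { action: "kill"; reason: "absolute-ceiling" | "claim-stuck" };

// Hypothetical input shape; all times are epoch milliseconds.
interface StuckInput {
  nowMs: number;
  heartbeatMs: number | null;   // last heartbeat touch, null if never written
  claimMs: number[];            // claim times of processing_ack rows
  bashTimeoutMs: number | null; // tool_declared_timeout_ms when a Bash tool is running
}

const ABSOLUTE_CEILING_MS = 30 * 60_000;
const CLAIM_TOLERANCE_MS = 60_000;

function decideStuckAction(i: StuckInput): StuckDecision {
  const declared = i.bashTimeoutMs ?? 0;

  // Absolute ceiling: heartbeat age > max(30min, bash_timeout).
  if (
    i.heartbeatMs !== null &&
    i.nowMs - i.heartbeatMs > Math.max(ABSOLUTE_CEILING_MS, declared)
  ) {
    return { action: "kill", reason: "absolute-ceiling" };
  }

  // Per-claim stuck: claim age > max(60s, bash_timeout) AND no heartbeat since the claim.
  const tolerance = Math.max(CLAIM_TOLERANCE_MS, declared);
  for (const claimedAt of i.claimMs) {
    const touchedSinceClaim = i.heartbeatMs !== null && i.heartbeatMs > claimedAt;
    if (i.nowMs - claimedAt > tolerance && !touchedSinceClaim) {
      return { action: "kill", reason: "claim-stuck" };
    }
  }
  return { action: "ok" };
}
```

Keeping the decision free of I/O is what makes it easy to cover with table-style unit tests for each case.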
Tool blocklist expanded:
- AskUserQuestion (SDK placeholder; we have mcp__nanoclaw__ask_user_question)
- EnterPlanMode, ExitPlanMode, EnterWorktree, ExitWorktree (Claude Code UI
affordances; would hang in headless containers)
PreToolUse hook is also defense-in-depth: if a disallowed tool name slips
through, it returns `{ decision: 'block' }` so the agent sees a clear
error instead of appearing stuck.
Removed:
- container-runner.ts: IDLE_TIMEOUT setTimeout, resetIdle callback on
activeContainers entry, resetContainerIdleTimer export.
- delivery.ts: the resetContainerIdleTimer call on successful delivery.
- poll-loop.ts: IDLE_END_MS + its setInterval. Keeping the query open is
cheaper than close+reopen (no cold prompt cache). Liveness is now a
host-side concern.
- host-sweep.ts: 10-min STALE_THRESHOLD_MS + getStuckProcessingIds in the
stale-detection path (still exported for kill reset).
Tests:
- src/host-sweep.test.ts — 9 tests for decideStuckAction covering: fresh
heartbeat, absolute ceiling, absent heartbeat, Bash-timeout extension
(both ceiling and per-claim), claim age below tolerance, heartbeat
touched after claim, unparseable timestamps.
Ref: docs/v1-vs-v2/ACTION-ITEMS.md items 9, 6a, 10.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ent fan-out
Replaces the opaque trigger_rules JSON + response_scope enum on
messaging_group_agents with four explicit orthogonal columns:
engage_mode 'pattern' | 'mention' | 'mention-sticky'
engage_pattern regex source; required when mode='pattern';
'.' is the "always" sentinel
sender_scope 'all' | 'known'
ignored_message_policy 'drop' | 'accumulate'
Inbound routing becomes a fan-out — every wired agent is evaluated
independently. A match gets its own session + container wake. A miss
with accumulate keeps the message as context-only (trigger=0) in that
agent's session, so when the agent does eventually engage it sees the
prior chatter.
## Schema
- Migration 010 (`engage-modes`): adds the 4 new columns, backfills
from trigger_rules.pattern + requiresTrigger + response_scope, drops
the legacy columns.
- messages_in gains `trigger INTEGER NOT NULL DEFAULT 1` (session DB
schema + `migrateMessagesInTable` forward-compat).
- countDueMessages gates waking on `trigger = 1`.
## Routing
- `pickAgent` (returns one) → loop over all wired agents. Per agent:
evaluate engage_mode; run access gate + sender-scope gate; on full
match → resolveSession + writeSessionMessage(trigger=1) + wake. On
miss with accumulate → writeSessionMessage(trigger=0), no wake. On
miss with drop → skip.
- New `findSessionForAgent(agentGroupId, mgId, threadId)` scopes
session lookup by agent so fan-out doesn't cross sessions.
- `messageIdForAgent` namespaces inbound message ids by agent_group_id
so PRIMARY KEY doesn't collide across per-agent session DBs.
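The per-agent fan-out can be pictured with a small pure function. This is a simplified sketch: `Wiring`, `Outcome`, `engages`, and `fanOut` are hypothetical names, the access gate and sender-scope gate are left out, and `mention-sticky` is treated like `mention` because thread subscription state is not modeled here.

```typescript
type EngageMode = "pattern" | "mention" | "mention-sticky";

// Hypothetical projection of one messaging_group_agents row.
interface Wiring {
  agentGroupId: string;
  engageMode: EngageMode;
  engagePattern?: string; // regex source; '.' is the "always" sentinel
  ignoredMessagePolicy: "drop" | "accumulate";
}

interface Outcome {
  agentGroupId: string;
  kind: "wake" | "accumulate" | "skip";
}

function engages(w: Wiring, text: string, isMention: boolean): boolean {
  if (w.engageMode === "pattern") {
    // Assumption: a missing pattern falls back to the '.' sentinel.
    return new RegExp(w.engagePattern ?? ".").test(text);
  }
  return isMention;
}

// Every wired agent is evaluated independently; no agent's result affects another's.
function fanOut(wirings: Wiring[], text: string, isMention: boolean): Outcome[] {
  return wirings.map((w): Outcome => {
    if (engages(w, text, isMention)) {
      return { agentGroupId: w.agentGroupId, kind: "wake" }; // trigger=1 + wake
    }
    if (w.ignoredMessagePolicy === "accumulate") {
      return { agentGroupId: w.agentGroupId, kind: "accumulate" }; // trigger=0, no wake
    }
    return { agentGroupId: w.agentGroupId, kind: "skip" };
  });
}
```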
## Adapter layer
- `ConversationConfig` replaces `triggerPattern` + `requiresTrigger`
with `engageMode` + `engagePattern`.
- Chat SDK bridge stores `Map<platformId, ConversationConfig[]>` (multi-
agent per conversation) and applies union gating pre-onInbound:
* onSubscribedMessage: engage if any wiring keeps firing in
subscribed state (mention-sticky or pattern)
* onNewMention: engage on mention; only subscribes the thread if
at least one wiring is `mention-sticky`
* onDirectMessage: engage per mode; sticky follows same rule
- Bridge no longer unconditionally calls `thread.subscribe()`.
## Sender scope
- Permissions module registers a second hook `setSenderScopeGate` that
runs per-wiring after the existing access gate. `sender_scope='known'`
requires canAccessAgentGroup(); `'all'` is a no-op. Not installed →
no-op everywhere (default allow).
## Container side
- Host passes `NANOCLAW_MAX_MESSAGES_PER_PROMPT` (reuses existing
MAX_MESSAGES_PER_PROMPT config; was dead code from v1).
- `getPendingMessages` queries `ORDER BY seq DESC LIMIT N`, reverses to
chronological order for the prompt — accumulated context rides along
with trigger rows up to the cap.
- `MessageInRow` gains `trigger: number` so the container can tell them
apart in downstream code (container still processes both; only the
host uses `trigger=0` for don't-wake).
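The "newest N, then chronological" selection in `getPendingMessages` can be illustrated with an in-memory stand-in for the SQL query. The function name `latestChronological` and the trimmed-down row shape are hypothetical; the real query runs against the session DB.

```typescript
// Trimmed-down, hypothetical version of the row shape.
interface MessageInRow {
  seq: number;
  content: string;
  trigger: number; // 1 = wakes the agent, 0 = context-only
}

// Equivalent of ORDER BY seq DESC LIMIT N, reversed back to chronological order.
function latestChronological(rows: MessageInRow[], limit: number): MessageInRow[] {
  return [...rows]
    .sort((a, b) => b.seq - a.seq) // newest first
    .slice(0, limit)               // cap at the per-prompt maximum
    .reverse();                    // oldest of the kept rows first
}
```

Taking the newest rows first means that when the cap is hit, the oldest accumulated context is what gets dropped.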
## Defaults (per ACTION-ITEMS item 1 decision)
- DM (is_group=0): `engage_mode='pattern'`, `engage_pattern='.'` (always)
- Threaded group: `engage_mode='mention-sticky'` (seed-discord)
- Non-threaded group / CLI: pattern '.' in bootstrap scripts
## Tests
- src/host-core.test.ts: 3 new cases — fan-out (2 agents, 2 sessions,
2 wakes), accumulate (trigger=0 + no wake), drop (no session created).
- Existing 10 host-core tests still pass.
- Migration 010 runs on an empty DB in 0-row path — verified.
Closes: ACTION-ITEMS items 1, 4.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ault policy
When an unknown sender writes into a wired messaging group, surface the
situation to an admin instead of silently dropping. Flow:
1. Router → access gate → handleUnknownSender (policy='request_approval')
2. Fire-and-forget requestSenderApproval: pickApprover + pickApprovalDelivery
pick a reachable admin DM; deliver an Approve / Deny card; insert a
pending_sender_approvals row carrying the original InboundEvent JSON.
3. In-flight dedup: UNIQUE(messaging_group_id, sender_identity) — a retry
from the same stranger while pending is silently dropped, not re-carded.
4. Admin clicks → Chat SDK bridge → onAction → host response-registry.
The new handleSenderApprovalResponse in the permissions module claims
responses whose questionId matches a pending_sender_approvals row.
5. approve: addMember(stranger, agent_group) + replay the stored event via
routeInbound — the second attempt clears the gate because the user is
now known.
6. deny: delete the pending row. No denial persistence (ACTION-ITEMS item 5
decision) — a future attempt triggers a fresh card.
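The in-flight dedup and the approve/deny outcomes in the flow above can be modeled with a small in-memory store. This is a sketch only: `PendingSenderApprovals` is a hypothetical class standing in for the pending_sender_approvals table, and removing the row on approve is an assumption (the text only states that deny deletes it).

```typescript
// Hypothetical in-memory model of one pending_sender_approvals row.
interface PendingApproval {
  messagingGroupId: string;
  senderIdentity: string;
  originalMessage: string; // stored InboundEvent JSON
}

class PendingSenderApprovals {
  private rows = new Map<string, PendingApproval>();

  // Mirrors UNIQUE(messaging_group_id, sender_identity).
  private key(mg: string, sender: string): string {
    return JSON.stringify([mg, sender]);
  }

  // "carded" on first contact; "deduped" while a request is still pending.
  request(row: PendingApproval): "carded" | "deduped" {
    const k = this.key(row.messagingGroupId, row.senderIdentity);
    if (this.rows.has(k)) return "deduped";
    this.rows.set(k, row);
    return "carded";
  }

  // Returns the stored event to replay on approve, or null on deny or unknown key.
  resolve(mg: string, sender: string, decision: "approve" | "deny"): string | null {
    const k = this.key(mg, sender);
    const row = this.rows.get(k);
    if (!row) return null;
    this.rows.delete(k); // Assumption: the row is cleared on approve as well as deny.
    return decision === "approve" ? row.originalMessage : null;
  }
}
```

Because a deny leaves nothing behind, a later message from the same sender produces a fresh card, which matches the no-denial-persistence decision.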
Schema:
- Migration 011 adds pending_sender_approvals (id, mg_id, agent_group_id,
sender_identity, sender_name, original_message JSON, approver_user_id,
created_at, UNIQUE(mg_id, sender_identity)).
- Also flips messaging_groups.unknown_sender_policy default from 'strict'
to 'request_approval' (rebuild-table). Existing rows unchanged — only
the default applied to new rows flips.
- Router auto-create for unknown platform/chat drops the hardcoded
'strict' override; schema default applies.
- src/db/schema.ts reference updated to match.
Why default-flip: users wire their DM during setup and don't discover that
'strict' means "silent drop of everyone not in user_roles/members". The
approval flow is the safe default — the admin sees the stranger, explicitly
decides. 'public' stays opt-in for truly open channels.
Failure modes (row NOT created so a future attempt can try again):
- No eligible approver configured (fresh install before first owner).
- No reachable DM for any approver.
- Delivery adapter missing.
Tests (src/modules/permissions/sender-approval.test.ts, 4 cases):
- First unknown message → card delivered + row created
- Retry while pending → dedup'd (1 card, 1 row)
- Approve → member added + message replayed + container woken
- Deny → row cleared + no member added
Closes: ACTION-ITEMS item 5.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Covers the gap item 5 left open: request_approval presupposes a wired channel, so unknown-channel cases (new DM, @mention in unwired group, bot added to fresh group) short-circuit at no_agent_wired before the approval flow runs.
Design:
- Owner-sender auto-wire fast path (exactly one agent group → wire silently; multiple → card)
- Card with one button per existing agent group + "Create new" + "Ignore"
- New pending_channel_approvals table, UNIQUE(messaging_group_id)
- nca- action-id prefix paralleling nsa- / ncq-
- Handler lives alongside handleSenderApprovalResponse
- "Create new" sub-flow is intentionally open scope
Cross-reference added to item 5 so the scope boundary is explicit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tems
# Conflicts:
# scripts/init-first-agent.ts
…items
Land v1→v2 action-items (5 implementation items)
Match v1 behavior: drop getApiHost() (which was returning the CLI default https://app.onecli.sh) and always extract the gateway URL from the install script's stdout, then apply it via onecli config set api-host and .env.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The engage-mode gating added in qwibitai#1869 read `message.content` from the Chat SDK's ChatMessage in all three inbound handlers (onSubscribedMessage, onNewMention, onDirectMessage). ChatMessage exposes the user-visible string as `.text`; `.content` exists on the underlying nested structure but isn't the plain-text field. Result: `shouldEngage` always saw an empty string, pattern gating never matched, non-wildcard regex wirings silently dropped every inbound.
Fix: use `message.text` in all three gates.
Discovered during live smoke-test on v2 post-merge.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rovals
The original 011 also rebuilt `messaging_groups` to flip the `unknown_sender_policy` column DEFAULT from "strict" to "request_approval". On live DBs the DROP TABLE step fails SQLite's foreign-key integrity check because `sessions`, `user_dms`, and `pending_sender_approvals` all reference `messaging_groups(id)`. `PRAGMA foreign_keys=OFF` / `defer_foreign_keys` can't be toggled inside the implicit migration transaction, so the rebuild can't be made to apply cleanly.
The default-flip was cosmetic anyway: every `createMessagingGroup` callsite passes `unknown_sender_policy` explicitly. Router auto-create was already updated to hardcode "request_approval" (router.ts:151), and setup / seed scripts pick per context.
Changes:
- Migration 011 now only creates the `pending_sender_approvals` table + index. The rebuild block is gone.
- Reference `SCHEMA` in src/db/schema.ts updated to reflect what the DB actually has: DEFAULT 'strict' (from migration 001), with a note about the effective policy applied at insert sites.
Discovered on v2 post-merge during live restart.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…irings
The engage modes shipped in qwibitai#1869 included `pattern` (regex match any message) and the `accumulate` ignored-message policy, but neither could fire in group chats because Chat SDK only surfaces:
- DMs (onDirectMessage)
- @mentions in unsubscribed threads (onNewMention)
- every message in subscribed threads (onSubscribedMessage)
A bot sitting in a Discord/Slack channel hears *nothing* from a plain message unless the thread is already subscribed. So `pattern '.'` on a group wiring → silent. `pattern /urgent/i` → silent. `mention + accumulate` → the non-mention messages that should be stored as context were never received, so nothing to accumulate.
Fix: call `chat.subscribe(platformId)` at setup time for every wiring whose `engageMode === 'pattern'` or `ignoredMessagePolicy === 'accumulate'`. Failures logged + swallowed per-conversation so one un-subscribable channel doesn't crash startup.
## Knock-on: SDK stops firing onNewMention once subscribed
Per SDK types:1468, `onNewMention` only fires in unsubscribed threads. Once we pre-subscribe a channel for a pattern wiring, subsequent mentions arrive as `onSubscribedMessage` with `message.isMention === true`.
Before: a `mention` wiring coexisting with a `pattern` wiring in the same channel would silently stop firing after pre-subscribe.
After: `shouldEngage` accepts the `isMention` flag independently from `source`, so the `mention` mode matches on (dm OR mention-new OR subscribed-with-isMention).
Source shape changed `'subscribed' | 'mention' | 'dm'` → `'subscribed' | 'mention-new' | 'dm'` to make the "unsubscribed-mention event" distinction explicit.
## New fields
- `ConversationConfig.ignoredMessagePolicy`: projected from the messaging_group_agents row so the bridge knows which wirings need pre-subscription. buildConversationConfigs in src/index.ts populates it.
Tests: host 153/153, container 46/46. No new tests yet: the subscribe call path needs a Chat mock, deferred.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
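The two rules stated in this commit (which wirings get pre-subscribed, and when the `mention` mode matches) are small enough to sketch as pure predicates. The function names `needsPreSubscribe` and `mentionModeMatches` are hypothetical, and the `ConversationConfig` shape below is a reduced stand-in for the bridge's real type.

```typescript
type Source = "subscribed" | "mention-new" | "dm";
type EngageMode = "pattern" | "mention" | "mention-sticky";

// Reduced, hypothetical view of the bridge's per-wiring config.
interface ConversationConfig {
  engageMode: EngageMode;
  ignoredMessagePolicy: "drop" | "accumulate";
}

// Pre-subscribe at setup when the wiring must see plain (non-mention) messages.
function needsPreSubscribe(c: ConversationConfig): boolean {
  return c.engageMode === "pattern" || c.ignoredMessagePolicy === "accumulate";
}

// `mention` mode: dm OR mention-new OR subscribed-with-isMention.
function mentionModeMatches(source: Source, isMention: boolean): boolean {
  return (
    source === "dm" ||
    source === "mention-new" ||
    (source === "subscribed" && isMention)
  );
}
```

Checking `isMention` separately from `source` is what keeps a `mention` wiring working after a sibling `pattern` wiring has caused the channel to be pre-subscribed.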
Two long-line violations introduced in d121cd1 (isGroup plumbing) exceed the printWidth limit. CI format:check fails on every PR opened against main until this is fixed; the fix is isolated here so no behavior change is mixed in.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…plate
On a fork PR, GITHUB_TOKEN is demoted to read-only regardless of the workflow's permissions: block, so issues.addLabels() returns 403. The label workflow silently works for PRs that skip the template (no checkboxes ticked → no API call) and fails for PRs that actually follow it, a hostile incentive against contributors who do the right thing.
pull_request_target runs in the context of the base branch with full declared permissions, which is the documented fix for this case. Safe here because the workflow is metadata-only: it reads context.payload.pull_request.body and calls addLabels. No checkout, no PR-supplied code executes. A SECURITY comment is added above the trigger to keep it that way.
Refs:
- https://docs.github.com/en/actions/reference/events-that-trigger-workflows#pull_request_target
- https://securitylab.github.com/resources/github-actions-preventing-pwn-requests/
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v2: fix setup verify for CLI-only installs
…bridge
chore(format): apply prettier to chat-sdk-bridge.ts
…pport
fix(workflows): label PRs from forks that follow the contributing template
…rrors-in-setup
[codex] detect setup auth ping failures
…ngage-mode-schema
fix(setup): register step uses engage_mode columns dropped by migration 010
Phase 2 of the SKILL.md already contains the Dockerfile + TOOL_ALLOWLIST edit instructions with an "ALREADY APPLIED" short-circuit. Keeping those edits out of trunk means users who never run /add-gmail-tool don't carry the Gmail MCP package in their image.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… reads
The upstream precedence fix (5845a5a) made agent_groups.agent_provider and sessions.agent_provider authoritative for host-side provider contribution (per-session mount, env passthrough), but those DB values don't propagate into the group's container.json, and the in-container runner reads `provider` from container.json, not from the DB.
That caused a confusing failure mode: flipping the DB column to 'codex', rebuilding, and restarting still spawned a Claude runner because container.json had no provider field. The old skill wording ("container receives AGENT_PROVIDER from the resolved value") overstated the integration.
Update add-codex and add-opencode "Per group / per session" sections to say: set `"provider": "<name>"` in the group's container.json; that's the source the runner reads. Keep the DB columns documented for the host-side contribution they actually drive, and spell out the session → group → container.json → 'claude' fallback so the precedence is still discoverable.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
skill(add-gmail-tool): OneCLI-native Gmail MCP tool
Adds /add-gcal-tool, a sibling of /add-gmail-tool that installs @cocal/google-calendar-mcp with the same OneCLI stub-file pattern. Skill applies the Dockerfile + TOOL_ALLOWLIST changes at install time; trunk stays clean so users who never run the skill don't carry the calendar MCP in their image.
Dropped the Phase 5 dry-run section since it hardcoded a per-install image tag slug and duplicated Phase 4's live agent test.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
skill(add-gcal-tool): OneCLI-native Google Calendar MCP tool
…for native IDs
setup/register.ts had two bugs that prevented new channels from being
registered via `/manage-channels`:
1. createMessagingGroupAgent was called with the legacy field names
`trigger_rules` and `response_scope`. The SQL INSERT expects
`engage_mode` / `engage_pattern` / `sender_scope` / `ignored_message_policy`
(migration 010). Every register call failed with
`RangeError: Missing named parameter "engage_mode"` after the agent
and messaging group were partially created — leaving an orphaned pair.
Now mirrors scripts/init-first-agent.ts:wireIfMissing:
- Groups (is_group=1) default to engage_mode='mention' (bot only
responds when addressed).
- DMs (is_group=0) default to engage_mode='pattern' with '.' (respond
to every message).
- An explicit --trigger overrides the pattern regex.
2. The "normalize platform_id" block unconditionally prefixed
"<channel>:" even for native IDs like WhatsApp JIDs
("120363408974444974@g.us"), iMessage emails ("user@example.com"),
or Signal phones ("+15551234567") / Signal groups ("group:abc"). But
the router (src/router.ts:158) looks up messaging_groups by the raw
event.platformId from the adapter, which for these native adapters
never has a prefix. So the prefixed row was never matched — the
message was silently dropped with no "Message routed" log.
Extracted scripts/init-first-agent.ts:namespacedPlatformId into
src/platform-id.ts so both setup paths use the same heuristic (skip
the prefix for IDs containing '@', starting with '+', or starting
with 'group:'). Prevents future drift between the two paths.
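The prefix heuristic from item 2 can be sketched as follows. This is an illustrative reconstruction from the description above, not the contents of src/platform-id.ts: the signature is assumed, and the guard against double-prefixing is an added assumption that the commit message does not mention.

```typescript
// Hypothetical reconstruction; the real helper may differ in signature and edge cases.
function namespacedPlatformId(channel: string, rawId: string): string {
  // Native IDs are stored as-is so they match the raw event.platformId
  // that the adapter hands to the router.
  const isNative =
    rawId.includes("@") ||       // WhatsApp JIDs, iMessage emails
    rawId.startsWith("+") ||     // Signal phone numbers
    rawId.startsWith("group:");  // Signal groups
  if (isNative) {
    return rawId;
  }

  // Assumption (not in the commit message): leave already-prefixed IDs alone.
  const prefix = `${channel}:`;
  return rawId.startsWith(prefix) ? rawId : prefix + rawId;
}

// Example inputs taken from the commit message, plus one made-up non-native ID.
const examples: Array<[string, string]> = [
  ["whatsapp", "120363408974444974@g.us"],
  ["imessage", "user@example.com"],
  ["signal", "+15551234567"],
  ["signal", "group:abc"],
  ["telegram", "12345"], // hypothetical non-native ID
];
for (const [channel, id] of examples) {
  console.log(`${channel} ${id} -> ${namespacedPlatformId(channel, id)}`);
}
```

Only the last example gains a prefix; the four native IDs pass through unchanged, so a lookup by the raw adapter ID finds the stored row.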
Tested by: re-running `setup/index.ts --step register` for a WhatsApp
group JID, confirming the row is created with correct engage fields
and matching platform_id, then sending a test message and observing
"Message routed" with the right agent group.
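The engage defaults from item 1, which the test above checks on the created row, can be sketched like this. The function name and return shape are hypothetical, and treating an explicit trigger as switching a group to 'pattern' mode is an assumption; the commit message only says the trigger overrides the pattern regex. The `sender_scope` and `ignored_message_policy` columns are left out because their default values are not stated.

```typescript
type EngageMode = "mention" | "pattern";

// Hypothetical subset of the columns named in the commit message.
interface EngageFields {
  engage_mode: EngageMode;
  engage_pattern: string | null;
}

function defaultEngageFields(isGroup: boolean, trigger?: string): EngageFields {
  // An explicit --trigger replaces the pattern regex.
  // Assumption: it also implies 'pattern' mode for groups.
  if (trigger !== undefined && trigger !== "") {
    return { engage_mode: "pattern", engage_pattern: trigger };
  }
  if (isGroup) {
    // Groups: the bot only responds when addressed.
    return { engage_mode: "mention", engage_pattern: null };
  }
  // DMs: '.' matches any non-empty message, so every message gets a response.
  return { engage_mode: "pattern", engage_pattern: "." };
}

console.log(defaultEngageFields(true));
console.log(defaultEngageFields(false));
console.log(defaultEngageFields(false, "^!bot"));
```

Returning every engage field from one function means the INSERT always receives the named parameters it expects, which is the failure the original bug report describes.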
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-per-provider-and-agent-route-files Two independent correctness fixes: per-provider continuations + agent-route file forwarding
fix(register): wire channels with correct engage fields, skip prefix for native JIDs
Install Telegram polling adapter (HummaMummaBot), add Agency HQ API skill for CEO agent, and add credential fallback when OneCLI gateway is unavailable, passing CLAUDE_CODE_OAUTH_TOKEN directly to containers.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
Restores `container/hooks/service-guard.sh` and `container/hooks/tool-observer.sh` on the deployment branch. They exist on `main` but were never committed to this branch, breaking the CEO tool-logging deployment.
- `service-guard.sh` blocks agents from restarting/stopping the nanoclaw service
- `tool-observer.sh` writes PostToolUse events to the IPC pipeline that feeds `tool_call_events` in `store/messages.db`
Followup work tracked separately: schema migration of `tool_call_events` (already complete) and reprocessing of quarantined files in `data/ipc/errors/` (already drained: 0 files, 137 rows in table).
Test plan
- `container/hooks/{service-guard.sh,tool-observer.sh}`
- `container/hooks/deploy-fix.sh`
- `tool_call_events` schema includes `event_type` and `payload` columns; INSERT succeeds
- `data/ipc/errors/` is empty
- `tool_call_events` row count > 0