
fix: restore container/hooks for CEO tool logging deployment#8

Closed
Jeffrey-Keyser wants to merge 1094 commits into main from
dev-inbox/295a4425-fix/ceo-tool-logging-deployment

Conversation

@Jeffrey-Keyser
Owner

Summary

Restores container/hooks/service-guard.sh and container/hooks/tool-observer.sh on the deployment branch — they exist on main but were never committed to this branch, breaking the CEO tool-logging deployment.

  • service-guard.sh blocks agents from restarting/stopping the nanoclaw service
  • tool-observer.sh writes PostToolUse events to the IPC pipeline that feeds tool_call_events in store/messages.db

Followup work tracked separately: schema migration of tool_call_events (already complete) and reprocessing of quarantined files in data/ipc/errors/ (already drained — 0 files, 137 rows in table).

Test plan

  • Files restored at container/hooks/{service-guard.sh,tool-observer.sh}
  • No leftover container/hooks/deploy-fix.sh
  • tool_call_events schema includes event_type and payload columns; INSERT succeeds
  • data/ipc/errors/ is empty
  • tool_call_events row count > 0

gavrielc and others added 30 commits April 18, 2026 21:51
First default channel that ships with main. Unix-socket adapter + thin
client; plugs into the running daemon rather than spawning its own host.

## src/channels/cli.ts

- ChannelAdapter with channelType='cli', platformId='local'.
- setup() unlinks any stale socket, listens on $DATA_DIR/cli.sock (mode 0600
  so only the local user can connect).
- On client connect: reads newline-delimited JSON ({"text": "..."}) and
  calls config.onInbound('local', null, {id, kind:'chat', content, ts}).
- deliver() writes {"text": <body>} back to the connected socket; silently
  no-ops when no client is attached (outbound row still persists).
- Single-client policy: a second connection supersedes the first with a
  [superseded] notice.
- teardown() closes the client, closes the server, removes the socket file.
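
The newline-delimited framing above has one subtlety worth sketching: the socket delivers arbitrary chunks, so a JSON line can arrive split across reads. A minimal decoder (names and the message shape are illustrative, not the adapter's actual identifiers):

```typescript
// Sketch of the inbound framing: buffer partial chunks until a newline
// completes a JSON line, then hand the parsed message to the callback.
function makeLineDecoder(onMessage: (msg: { text: string }) => void) {
  let buffer = '';
  return (chunk: string) => {
    buffer += chunk;
    let nl: number;
    while ((nl = buffer.indexOf('\n')) !== -1) {
      const line = buffer.slice(0, nl).trim();
      buffer = buffer.slice(nl + 1);
      if (line) onMessage(JSON.parse(line) as { text: string });
    }
  };
}
```

A client that writes `{"text": "he` in one chunk and `llo"}\n` in the next still produces exactly one message.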

## scripts/chat.ts + pnpm run chat

One-shot client:
- pnpm run chat <message...>
- Connects to the socket, writes one JSON line with the message.
- Reads replies; exits 2s after the first reply lands (hard timeout 120s).
- ENOENT/ECONNREFUSED prints a hint to start the daemon.

## scripts/init-first-agent.ts

- Fix stale imports after earlier module extractions (permissions +
  agent-to-agent moved their DB helpers into modules/).
- After wiring the DM channel, also create cli/local messaging_group
  (unknown_sender_policy='public' — local socket perms handle auth) and
  wire it to the same agent. User can `pnpm run chat` immediately.

## package.json

- Add "chat": "tsx scripts/chat.ts" script.

## Validation

- pnpm run build clean.
- pnpm test — 137 host tests pass.
- bun test in container/agent-runner — 17 pass.
- Service boot logs: "CLI channel listening" + "Channel adapter started
  channel=cli type=cli". Clean SIGTERM shutdown; socket file removed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
feat(channels): add CLI channel — talk to your agent from the terminal
Single forward-looking reference that replaces the two untracked planning
docs (REFACTOR_PLAN.md + REFACTOR_EXECUTION.md) which had become a mix of
historical PR timeline and still-relevant decisions.

Keeps only what's actionable going forward:
- Module tiers, the four registries, and the module distribution model
  (architecture summary).
- Remaining work: Phase 5 (v2 → main) and the modules-branch decision.
- Operational patterns worth preserving (standing checks, TDZ rule,
  branch-sync file-presence diff procedure, prettier drift pattern).
- 17 curated open questions across design, distribution, core slotting,
  and documentation.

Canonical references (docs/module-contract.md, docs/architecture.md, etc.)
are linked but not duplicated. This doc is transient — retire when the
refactor is fully behind us.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a new operational skill that routes any agent group to a local
Ollama instance instead of the Anthropic API. Ollama speaks the
Anthropic /v1/messages endpoint natively, so no new provider code is
needed — just env var overrides and a model setting in the shared
settings file.

The skill also documents and applies two prerequisite source changes:
- ContainerConfig gains env and blockedHosts fields (container-config.ts)
- container-runner wires those fields as -e and --add-host Docker flags
- Dockerfile home dir set to chmod 777 so containers running as the
  host uid can write ~/.claude config (discovered during implementation)

docs/ollama.md covers the architecture, OneCLI proxy bypass rationale,
network isolation via blockedHosts, model selection tradeoffs for Apple
Silicon, and revert instructions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Aggregates the loose OneCLI install, secret registration, and first-agent
wiring commands from /setup into three new dispatcher steps. Adds
--cli-only mode to init-first-agent so /new-setup can reach a working
2-way CLI chat with the bare minimum.

- setup/onecli.ts: idempotent install + PATH + api-host + .env, polls /health
- setup/auth.ts: --check verifies secret; --create --value registers it
- setup/cli-agent.ts: wraps init-first-agent --cli-only
- scripts/init-first-agent.ts: --cli-only mode; DM mode unchanged

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
A single upfront parallel scan that the SKILL.md renders via its `!`...``
inline-command syntax, so Claude sees system state before generating its
first response. Each field maps to a routing decision (skip/run/ask) for
a downstream step.

Reports: OS, SHELL, DOCKER + IMAGE_PRESENT, ONECLI_STATUS + ONECLI_URL,
ANTHROPIC_SECRET, SERVICE_STATUS, CLI_AGENT_WIRED, INFERRED_DISPLAY_NAME,
TZ_STATUS + TZ_ENV + TZ_SYSTEM. Runs in ~200ms on a fully-set-up host.

Not a replacement for per-step idempotency — each step keeps its own
checks since probe is a snapshot and can go stale by execution time.

Uses /api/health (OneCLI's actual endpoint). Anthropic secret check
uses the CLI client so it works whenever onecli is installed, even if
the direct HTTP health probe fails (different network paths).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Shortest path from zero to a working two-way agent chat via the CLI
channel. Renders `!`pnpm exec tsx setup/index.ts --step probe`` at the
top for dynamic context injection — Claude sees current system state
before generating its first response and routes each subsequent step
(skip/ask/run) off the probe snapshot. Pre-approves the Bash patterns
it needs via `allowed-tools` so setup runs without per-step prompts.

Lives alongside /setup for now; will replace it once proven.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Port probe to zero-dep plain ESM (setup/probe.mjs) so /new-setup can
inject dynamic context on a fresh machine where pnpm/node_modules
don't yet exist. Skill falls back to a STATUS: unavailable block if
Node itself isn't on PATH, and the flow treats that as "run every
step from 1" (each step is idempotent).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ion check

The chained `&& / ||` inline command tripped Claude Code's per-operation
permission check. Move the Node-missing fallback into setup/probe.sh so
the skill's `!` block is a single `bash setup/probe.sh` call.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…Node

Two fixes to the fresh-install path:

1. setup.sh: when `corepack enable` runs as a non-root user against a
   system-wide Node install (apt-installed to /usr/bin), it fails EACCES
   trying to symlink /usr/bin/pnpm, leaving pnpm off PATH. Retry with
   sudo when pnpm is still missing — gated to Linux/WSL so macOS
   Homebrew prefixes aren't polluted with root-owned shims.

2. SKILL.md step 1: if the probe reports STATUS: unavailable (Node not
   installed), install Node BEFORE invoking `bash setup.sh`. The old
   flow ran setup.sh first as a diagnostic, which always failed fast,
   installed Node, then re-ran — two bootstraps for no reason.

Combined: fresh Linux box now goes Node install -> single setup.sh run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ners

Two flow fixes:

1. Add "Ordering and parallelism" section making explicit that step 4
   (auth) must block until step 3 (OneCLI) is complete — auth writes
   the secret into the vault, so firing an AskUserQuestion while
   OneCLI is still installing asks the user for a credential the
   system can't store. Step 2 (container build) is safe to run past
   step 4, joined before step 6 (first CLI agent).

2. Drop the per-step quoted one-liners. They duplicated Claude's own
   natural narration ("While those build, let's get your credential
   set up." → immediately echoed by the scripted "Your agent needs an
   Anthropic credential..."). Each step now has a short description
   instead; Claude narrates in its own voice.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…alled

Probe now emits HOST_DEPS (ok|missing) based on whether
node_modules/better-sqlite3/build/Release/better_sqlite3.node exists
— the canonical proof that `pnpm install` ran and the native build
step succeeded. Step 1 (Node bootstrap) skips when HOST_DEPS=ok
instead of always re-running setup.sh. Probe now genuinely routes
step 1 the same way it routes every other step.
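
The check itself is a single file-existence probe; a sketch of the idea (the function name and output line are illustrative, the path is the one named above):

```typescript
import { existsSync } from 'node:fs';

// The native .node binary exists only after `pnpm install` has run AND
// better-sqlite3's native build step succeeded, so its presence is the signal.
const NATIVE_BINARY =
  'node_modules/better-sqlite3/build/Release/better_sqlite3.node';

function hostDeps(): 'ok' | 'missing' {
  return existsSync(NATIVE_BINARY) ? 'ok' : 'missing';
}

console.log(`HOST_DEPS=${hostDeps()}`);
```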

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
onecli step:
- Poll /api/health (was /health) so the step's health check matches
  the probe's. On hosted OneCLI (app.onecli.sh) the old path returned
  non-ok, flagging the gateway as "degraded" even though install
  succeeded.
- Drop the "try `onecli start`" hint — no such subcommand exists and
  it sent the skill off chasing fabricated commands. A failed health
  poll is demoted to a soft warning; the auth step surfaces a real
  outage via `onecli secrets list`.

SKILL.md step 4: rewrite to match the /setup skill's pattern — the
user generates the token themselves, picks dashboard or CLI to
register it with OneCLI, and the skill verifies via `auth --check`.
Tokens no longer travel through chat.

Co-Authored-By: Koshkoshinsk <daniel.milliner@gmail.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…lback

The probe now returns a real snapshot from second zero, so every step
consults real probe fields instead of falling back to "run every step
blindly" when Node isn't installed. Also drops the redundant
CLI_AGENT_WIRED field (it gated the last step on its own end-state) and
scopes timezone out of the probe (timezone is not part of /new-setup).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…wiring

Update SKILL.md with tested setup: dedicated bot account prerequisite,
GITHUB_BOT_USERNAME env var for @-mention detection, private vs public
repo sender policy guidance, member registration for strict mode,
per-thread session mode, and wiring example.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Rewrite SKILL.md with tested setup: OAuth app with client credentials
(recommended), bridge catchAll patch for platforms without @-mention,
LINEAR_TEAM_KEY for team-based routing, webhook setup with delay note,
private vs public sender policy, and wiring example.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…on and env vars

- Pin @opencode-ai/sdk and opencode-ai CLI both to 1.4.17; warn against
  latest (1.14.x has a breaking session API rewrite incompatible with
  the current provider code)
- Add step 7: propagate provider files into existing per-group overlays
  (data/v2-sessions/*/agent-runner-src/providers/) which override the
  image at runtime and are never auto-updated by rebuilds
- Add build cache gotcha: prune builder if "Unknown provider" after rebuild
- Document ANTHROPIC_BASE_URL as required for non-anthropic providers,
  with correct base URL per provider (DeepSeek, OpenRouter examples)
- Add OPENCODE_SMALL_MODEL to all examples
- Document OneCLI credential grant (set-secrets replaces, not appends)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- docs/v1-vs-v2/: full v1→v2 regression analysis (SUMMARY + 21 per-module
  docs + ACTION-ITEMS rollup with decisions + timezone recreation spec).
- container/agent-runner/scripts/sdk-signal-probe.ts: empirical harness
  used to characterise Claude Agent SDK event/hook/stderr timing for the
  stuck-detection design in item 9.
- src/channels/chat-sdk-bridge.ts: document the conversations Map staleness
  in a code comment; fix deferred to when dynamic group registration lands
  (ACTION-ITEMS item 17).

No runtime behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…IPC_POLL_INTERVAL

These three constants were carried over from v1's polling + IPC architecture
and have zero callers in the v2 runtime:

- POLL_INTERVAL (2000ms) — v1 message loop; replaced by event-driven
  delivery + delivery.ts's ACTIVE_POLL_MS (hardcoded 1000ms)
- SCHEDULER_POLL_INTERVAL (60000ms) — v1 task scheduler; replaced by
  host-sweep.ts's SWEEP_INTERVAL_MS (hardcoded 60_000)
- IPC_POLL_INTERVAL (1000ms) — v1 file-based IPC; meaningless in v2's
  session-DB architecture

Grep confirms no imports in src/, container/, or tests. Docs/SPEC.md
updated to match.

Ref: docs/v1-vs-v2/ACTION-ITEMS.md item 15.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The agent needs to perceive times in the user's timezone, not UTC.
Dropping this in the v1→v2 port produced a class of bugs where the agent
would schedule tasks for the wrong hour, suggest dinner at midnight, etc.
This restores v1 parity.

Container side:
- New container/agent-runner/src/timezone.ts mirrors src/timezone.ts with
  isValidTimezone / resolveTimezone / formatLocalTime, plus:
  * TIMEZONE constant resolved at load from process.env.TZ (host sets this
    from src/container-runner.ts:254)
  * parseZonedToUtc(input, tz) — treats a naive ISO as wall-clock time in
    `tz`, returns the corresponding UTC Date. Strings with Z or offset
    are passed through.
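
The parseZonedToUtc semantics can be sketched with only built-in Intl round-tripping (an approximation, not the actual implementation — this trick is imprecise right at DST transitions, where a real implementation would need a timezone library):

```typescript
// Interpret a naive ISO string as wall-clock time in `tz`; pass through
// strings that already carry Z or an explicit offset.
function parseZonedToUtc(input: string, tz: string): Date {
  if (/([zZ]|[+-]\d{2}:?\d{2})$/.test(input)) return new Date(input);
  // First guess: read the wall-clock fields as if they were UTC.
  const guess = new Date(input + 'Z');
  // Measure tz's offset at that instant by formatting the guess both ways;
  // the runner's local zone cancels out of the difference.
  const inTz = new Date(guess.toLocaleString('en-US', { timeZone: tz })).getTime();
  const inUtc = new Date(guess.toLocaleString('en-US', { timeZone: 'UTC' })).getTime();
  return new Date(guess.getTime() - (inTz - inUtc));
}
```

So `parseZonedToUtc('2026-06-15T08:00:00', 'America/New_York')` lands on 12:00 UTC (EDT is UTC-4 in June), while a `Z`-suffixed input is returned unchanged.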

- formatter.ts:
  * formatMessages() now prepends <context timezone="IANA"/>\n — matches
    v1 src/v1/router.ts:20-22
  * formatSingleChat uses formatLocalTime(ts, TIMEZONE) instead of a
    home-rolled HH:MM 24h formatter → outputs like "Jun 15, 2026, 8:00 AM"
  * reply_to="<id>" attribute + <quoted_message from="X">Y</quoted_message>
    element — matches v1 format exactly; old <reply-to/> shape is gone
  * stripInternalTags() exported for the dispatch path to reuse

- poll-loop.ts uses the exported stripInternalTags() instead of inline regex.

- mcp-tools/scheduling.ts:
  * schedule_task/update_task descriptions now explicitly document that
    processAfter accepts either UTC or naive local time (interpreted in
    the user's TZ from the context header)
  * handlers normalize through parseZonedToUtc() and store a UTC ISO

Host side:
- src/modules/scheduling/recurrence.ts passes { tz: TIMEZONE } to
  CronExpressionParser.parse. Without this, "0 9 * * *" fires at 09:00
  UTC instead of 09:00 user-local — this was the v1 behavior
  (src/v1/task-scheduler.ts:20-49).

Tests:
- container/agent-runner/src/timezone.test.ts — mirror of src/timezone.test.ts
  + new parseZonedToUtc cases
- container/agent-runner/src/formatter.test.ts — context header, reply_to,
  quoted_message, XML escaping, stripInternalTags (ported from v1
  formatting.test.ts)
- src/modules/scheduling/recurrence.test.ts — cron TZ respected, completed
  rows only cloned when recurrence is set

Ref: docs/v1-vs-v2/ACTION-ITEMS.md item 18 + timezone-formatting-v1-recreation.md

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…cklist

Replaces the two overlapping old mechanisms (30-min setTimeout kill in
container-runner, 10-min heartbeat STALE_THRESHOLD reset in host-sweep)
with message-scoped stuck detection anchored to the processing_ack claim
age + an absolute 30-min ceiling that extends for long-declared Bash
tools.

Old model problems:
- IDLE_TIMEOUT setTimeout fired on plain wall-clock time; slow-but-alive
  agents got killed at 30min regardless of activity
- 10-min STALE_THRESHOLD in the sweep was unreliable — the heartbeat is
  only touched on SDK events, so legitimate silent tool work (sleep 30,
  long WebFetch, npm install) looked identical to a hung container
- Two overlapping sources of truth for "when to let go of a container"

New model:
- Host sweep is the single source of truth.
- Container exposes a new `container_state` single-row table in outbound.db
  (schema added; container writes, host reads). PreToolUse hook writes
  current_tool + tool_declared_timeout_ms (read from Bash's tool_input);
  PostToolUse / PostToolUseFailure clear it.
- Sweep decides with a pure helper `decideStuckAction`:
    * absolute ceiling — kill if heartbeat age > max(30min, bash_timeout)
    * per-claim stuck  — kill if any processing_ack row has claim_age >
      max(60s, bash_timeout) AND heartbeat hasn't been touched since claim
    * otherwise ok
  Kill paths reset leftover processing rows with exponential backoff,
  reusing the existing retry machinery.
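
The two kill rules above can be sketched as a pure function (a sketch only — the input shapes and the exact signature are assumptions; just the two rules and their thresholds come from the description):

```typescript
type StuckAction = 'ok' | 'kill';

interface SweepSnapshot {
  heartbeatAgeMs: number; // ms since last heartbeat write; Infinity if absent
  bashTimeoutMs: number;  // tool-declared timeout from container_state; 0 if none
  claims: { claimAgeMs: number; heartbeatTouchedSinceClaim: boolean }[];
}

const ABSOLUTE_CEILING_MS = 30 * 60_000;
const CLAIM_TOLERANCE_MS = 60_000;

function decideStuckAction(s: SweepSnapshot): StuckAction {
  // Absolute ceiling, extended when a long Bash timeout was declared.
  if (s.heartbeatAgeMs > Math.max(ABSOLUTE_CEILING_MS, s.bashTimeoutMs)) {
    return 'kill';
  }
  // Per-claim stuck: a processing_ack claim has outlived its tolerance and
  // the heartbeat hasn't moved since the claim was taken.
  for (const c of s.claims) {
    if (
      c.claimAgeMs > Math.max(CLAIM_TOLERANCE_MS, s.bashTimeoutMs) &&
      !c.heartbeatTouchedSinceClaim
    ) {
      return 'kill';
    }
  }
  return 'ok';
}
```

Note how a declared 45-minute Bash timeout extends both the ceiling and the per-claim tolerance, so a long `npm install` no longer looks like a hang.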

Tool blocklist expanded:
- AskUserQuestion (SDK placeholder; we have mcp__nanoclaw__ask_user_question)
- EnterPlanMode, ExitPlanMode, EnterWorktree, ExitWorktree (Claude Code UI
  affordances; would hang in headless containers)
PreToolUse hook is also defense-in-depth: if a disallowed tool name slips
through, it returns `{ decision: 'block' }` so the agent sees a clear
error instead of appearing stuck.

Removed:
- container-runner.ts: IDLE_TIMEOUT setTimeout, resetIdle callback on
  activeContainers entry, resetContainerIdleTimer export.
- delivery.ts: the resetContainerIdleTimer call on successful delivery.
- poll-loop.ts: IDLE_END_MS + its setInterval. Keeping the query open is
  cheaper than close+reopen (no cold prompt cache). Liveness is now a
  host-side concern.
- host-sweep.ts: 10-min STALE_THRESHOLD_MS + getStuckProcessingIds in the
  stale-detection path (still exported for kill reset).

Tests:
- src/host-sweep.test.ts — 9 tests for decideStuckAction covering: fresh
  heartbeat, absolute ceiling, absent heartbeat, Bash-timeout extension
  (both ceiling and per-claim), claim age below tolerance, heartbeat
  touched after claim, unparseable timestamps.

Ref: docs/v1-vs-v2/ACTION-ITEMS.md items 9, 6a, 10.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ent fan-out

Replaces the opaque trigger_rules JSON + response_scope enum on
messaging_group_agents with four explicit orthogonal columns:

    engage_mode            'pattern' | 'mention' | 'mention-sticky'
    engage_pattern         regex source; required when mode='pattern';
                           '.' is the "always" sentinel
    sender_scope           'all' | 'known'
    ignored_message_policy 'drop' | 'accumulate'

Inbound routing becomes a fan-out — every wired agent is evaluated
independently. A match gets its own session + container wake. A miss
with accumulate keeps the message as context-only (trigger=0) in that
agent's session, so when the agent does eventually engage it sees the
prior chatter.

## Schema

- Migration 010 (`engage-modes`): adds the 4 new columns, backfills
  from trigger_rules.pattern + requiresTrigger + response_scope, drops
  the legacy columns.
- messages_in gains `trigger INTEGER NOT NULL DEFAULT 1` (session DB
  schema + `migrateMessagesInTable` forward-compat).
- countDueMessages gates waking on `trigger = 1`.

## Routing

- `pickAgent` (returns one) → loop over all wired agents. Per agent:
  evaluate engage_mode; run access gate + sender-scope gate; on full
  match → resolveSession + writeSessionMessage(trigger=1) + wake. On
  miss with accumulate → writeSessionMessage(trigger=0), no wake. On
  miss with drop → skip.
- New `findSessionForAgent(agentGroupId, mgId, threadId)` scopes
  session lookup by agent so fan-out doesn't cross sessions.
- `messageIdForAgent` namespaces inbound message ids by agent_group_id
  so PRIMARY KEY doesn't collide across per-agent session DBs.
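
The per-agent loop above reduces to a small fan-out decision (hypothetical shapes — field names follow the schema section, but the function name, return shape, and the omission of the access/sender-scope gates are illustrative):

```typescript
interface Wiring {
  agentGroupId: string;
  engageMode: 'pattern' | 'mention' | 'mention-sticky';
  engagePattern?: string; // '.' is the "always" sentinel
  ignoredMessagePolicy: 'drop' | 'accumulate';
}

interface RoutedWrite {
  agentGroupId: string;
  trigger: 0 | 1; // 1 = write row + wake container, 0 = context-only row
}

function routeFanOut(
  wirings: Wiring[],
  msg: { text: string; isMention: boolean },
): RoutedWrite[] {
  const writes: RoutedWrite[] = [];
  for (const w of wirings) {
    const engaged =
      w.engageMode === 'pattern'
        ? new RegExp(w.engagePattern ?? '.').test(msg.text)
        : msg.isMention; // mention / mention-sticky fire on a mention
    if (engaged) {
      writes.push({ agentGroupId: w.agentGroupId, trigger: 1 });
    } else if (w.ignoredMessagePolicy === 'accumulate') {
      writes.push({ agentGroupId: w.agentGroupId, trigger: 0 });
    }
    // miss + drop -> no session row, no wake
  }
  return writes;
}
```

One inbound message can thus produce a wake for one agent, a trigger=0 context row for another, and nothing at all for a third.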

## Adapter layer

- `ConversationConfig` replaces `triggerPattern` + `requiresTrigger`
  with `engageMode` + `engagePattern`.
- Chat SDK bridge stores `Map<platformId, ConversationConfig[]>` (multi-
  agent per conversation) and applies union gating pre-onInbound:
    * onSubscribedMessage: engage if any wiring keeps firing in
      subscribed state (mention-sticky or pattern)
    * onNewMention: engage on mention; only subscribes the thread if
      at least one wiring is `mention-sticky`
    * onDirectMessage: engage per mode; sticky follows same rule
- Bridge no longer unconditionally calls `thread.subscribe()`.

## Sender scope

- Permissions module registers a second hook `setSenderScopeGate` that
  runs per-wiring after the existing access gate. `sender_scope='known'`
  requires canAccessAgentGroup(); `'all'` is a no-op. Not installed →
  no-op everywhere (default allow).

## Container side

- Host passes `NANOCLAW_MAX_MESSAGES_PER_PROMPT` (reuses existing
  MAX_MESSAGES_PER_PROMPT config; was dead code from v1).
- `getPendingMessages` queries `ORDER BY seq DESC LIMIT N`, reverses to
  chronological order for the prompt — accumulated context rides along
  with trigger rows up to the cap.
- `MessageInRow` gains `trigger: number` so the container can tell them
  apart in downstream code (container still processes both; only the
  host uses `trigger=0` for don't-wake).

## Defaults (per ACTION-ITEMS item 1 decision)

- DM (is_group=0): `engage_mode='pattern'`, `engage_pattern='.'` (always)
- Threaded group: `engage_mode='mention-sticky'` (seed-discord)
- Non-threaded group / CLI: pattern '.' in bootstrap scripts

## Tests

- src/host-core.test.ts: 3 new cases — fan-out (2 agents, 2 sessions,
  2 wakes), accumulate (trigger=0 + no wake), drop (no session created).
- Existing 10 host-core tests still pass.
- Migration 010 runs on an empty DB in 0-row path — verified.

Closes: ACTION-ITEMS items 1, 4.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ault policy

When an unknown sender writes into a wired messaging group, surface the
situation to an admin instead of silently dropping. Flow:

  1. Router → access gate → handleUnknownSender (policy='request_approval')
  2. Fire-and-forget requestSenderApproval: pickApprover + pickApprovalDelivery
     pick a reachable admin DM; deliver an Approve / Deny card; insert a
     pending_sender_approvals row carrying the original InboundEvent JSON.
  3. In-flight dedup: UNIQUE(messaging_group_id, sender_identity) — a retry
     from the same stranger while pending is silently dropped, not re-carded.
  4. Admin clicks → Chat SDK bridge → onAction → host response-registry.
     The new handleSenderApprovalResponse in the permissions module claims
     responses whose questionId matches a pending_sender_approvals row.
  5. approve: addMember(stranger, agent_group) + replay the stored event via
     routeInbound — the second attempt clears the gate because the user is
     now known.
  6. deny: delete the pending row. No denial persistence (ACTION-ITEMS item 5
     decision) — a future attempt triggers a fresh card.

Schema:
- Migration 011 adds pending_sender_approvals (id, mg_id, agent_group_id,
  sender_identity, sender_name, original_message JSON, approver_user_id,
  created_at, UNIQUE(mg_id, sender_identity)).
- Also flips messaging_groups.unknown_sender_policy default from 'strict'
  to 'request_approval' (rebuild-table). Existing rows unchanged — only
  the default applied to new rows flips.
- Router auto-create for unknown platform/chat drops the hardcoded
  'strict' override; schema default applies.
- src/db/schema.ts reference updated to match.

Why default-flip: users wire their DM during setup and don't discover that
'strict' means "silent drop of everyone not in user_roles/members". The
approval flow is the safe default — the admin sees the stranger, explicitly
decides. 'public' stays opt-in for truly open channels.

Failure modes (row NOT created so a future attempt can try again):
- No eligible approver configured (fresh install before first owner).
- No reachable DM for any approver.
- Delivery adapter missing.

Tests (src/modules/permissions/sender-approval.test.ts, 4 cases):
- First unknown message → card delivered + row created
- Retry while pending → dedup'd (1 card, 1 row)
- Approve → member added + message replayed + container woken
- Deny → row cleared + no member added

Closes: ACTION-ITEMS item 5.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Covers the gap item 5 left open: request_approval presupposes a wired
channel, so unknown-channel cases (new DM, @mention in unwired group,
bot added to fresh group) short-circuit at no_agent_wired before the
approval flow runs.

Design:
- Owner-sender auto-wire fast path (exactly one agent group → wire
  silently; multiple → card)
- Card with one button per existing agent group + "Create new" + "Ignore"
- New pending_channel_approvals table, UNIQUE(messaging_group_id)
- nca- action-id prefix paralleling nsa- / ncq-
- Handler lives alongside handleSenderApprovalResponse
- "Create new" sub-flow is intentionally open scope

Cross-reference added to item 5 so the scope boundary is explicit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tems

# Conflicts:
#	scripts/init-first-agent.ts
…items

Land v1→v2 action-items (5 implementation items)
Match v1 behavior: drop getApiHost() (which was returning the CLI default
https://app.onecli.sh) and always extract the gateway URL from the install
script's stdout, then apply it via onecli config set api-host and .env.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The engage-mode gating added in qwibitai#1869 read `message.content` from the
Chat SDK's ChatMessage in all three inbound handlers (onSubscribedMessage,
onNewMention, onDirectMessage). ChatMessage exposes the user-visible
string as `.text` — `.content` exists on the underlying nested structure
but isn't the plain-text field. Result: `shouldEngage` always saw an
empty string, pattern gating never matched, non-wildcard regex wirings
silently dropped every inbound.

Fix: use `message.text` in all three gates.

Discovered during live smoke-test on v2 post-merge.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…rovals

The original 011 also rebuilt `messaging_groups` to flip the
`unknown_sender_policy` column DEFAULT from "strict" to "request_approval".
On live DBs the DROP TABLE step fails SQLite's foreign-key integrity
check because `sessions`, `user_dms`, and `pending_sender_approvals` all
reference `messaging_groups(id)`. `PRAGMA foreign_keys=OFF` /
`defer_foreign_keys` can't be toggled inside the implicit migration
transaction, so the rebuild can't be made to apply cleanly.

The default-flip was cosmetic anyway: every `createMessagingGroup`
callsite passes `unknown_sender_policy` explicitly. Router auto-create
was already updated to hardcode "request_approval" (router.ts:151), and
setup / seed scripts pick per context.

Changes:
- Migration 011 now only creates the `pending_sender_approvals` table +
  index. The rebuild block is gone.
- Reference `SCHEMA` in src/db/schema.ts updated to reflect what the
  DB actually has: DEFAULT 'strict' (from migration 001), with a note
  about the effective policy applied at insert sites.

Discovered on v2 post-merge during live restart.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…irings

The engage modes shipped in qwibitai#1869 included `pattern` (regex match any
message) and the `accumulate` ignored-message policy, but neither could
fire in group chats because Chat SDK only surfaces:

  - DMs (onDirectMessage)
  - @mentions in unsubscribed threads (onNewMention)
  - every message in subscribed threads (onSubscribedMessage)

A bot sitting in a Discord/Slack channel hears *nothing* from a plain
message unless the thread is already subscribed. So `pattern '.'` on a
group wiring → silent. `pattern /urgent/i` → silent. `mention +
accumulate` → the non-mention messages that should be stored as context
were never received, so nothing to accumulate.

Fix: call `chat.subscribe(platformId)` at setup time for every wiring
whose `engageMode === 'pattern'` or `ignoredMessagePolicy === 'accumulate'`.
Failures logged + swallowed per-conversation so one un-subscribable
channel doesn't crash startup.

## Knock-on: SDK stops firing onNewMention once subscribed

Per SDK types:1468, `onNewMention` only fires in unsubscribed threads.
Once we pre-subscribe a channel for a pattern wiring, subsequent mentions
arrive as `onSubscribedMessage` with `message.isMention === true`.

Before: a `mention` wiring coexisting with a `pattern` wiring in the
same channel would silently stop firing after pre-subscribe.

After: `shouldEngage` accepts the `isMention` flag independently from
`source`, so the `mention` mode matches on (dm OR mention-new OR
subscribed-with-isMention). Source shape changed
`'subscribed' | 'mention' | 'dm'` → `'subscribed' | 'mention-new' | 'dm'`
to make the "unsubscribed-mention event" distinction explicit.
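
The resulting gate can be sketched as follows (a sketch under assumed types — the real `shouldEngage` signature is not shown in this log; the point is that `isMention` is consulted independently of `source`):

```typescript
type Source = 'subscribed' | 'mention-new' | 'dm';
type EngageMode = 'pattern' | 'mention' | 'mention-sticky';

function shouldEngage(
  mode: EngageMode,
  pattern: string | null,
  source: Source,
  isMention: boolean,
  text: string,
): boolean {
  if (mode === 'pattern') return new RegExp(pattern ?? '.').test(text);
  // 'mention' and 'mention-sticky': fire on a DM, an unsubscribed-thread
  // mention event, or a subscribed-thread message flagged as a mention —
  // the last case is what pre-subscription would otherwise have silenced.
  return source === 'dm' || source === 'mention-new' || isMention;
}
```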

## New fields

- `ConversationConfig.ignoredMessagePolicy` — projected from the
  messaging_group_agents row so the bridge knows which wirings need
  pre-subscription. buildConversationConfigs in src/index.ts populates
  it.

Tests: host 153/153, container 46/46. No new tests yet — the subscribe
call path needs a Chat mock, deferred.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
glifocat and others added 29 commits April 24, 2026 11:44
Two long-line violations introduced in d121cd1 (isGroup plumbing)
exceed the printWidth limit. CI format:check fails on every PR
opened against main until this is fixed; the fix is isolated here
so no behavior change is mixed in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…plate

On a fork PR, GITHUB_TOKEN is demoted to read-only regardless of the
workflow's permissions: block, so issues.addLabels() returns 403. The
label workflow silently works for PRs that skip the template (no
checkboxes ticked → no API call) and fails for PRs that actually
follow it — a hostile incentive against contributors who do the right
thing.

pull_request_target runs in the context of the base branch with full
declared permissions, which is the documented fix for this case. Safe
here because the workflow is metadata-only: it reads
context.payload.pull_request.body and calls addLabels. No checkout,
no PR-supplied code executes. A SECURITY comment is added above the
trigger to keep it that way.

Refs:
- https://docs.github.com/en/actions/reference/events-that-trigger-workflows#pull_request_target
- https://securitylab.github.com/resources/github-actions-preventing-pwn-requests/

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v2: fix setup verify for CLI-only installs
…bridge

chore(format): apply prettier to chat-sdk-bridge.ts
…pport

fix(workflows): label PRs from forks that follow the contributing template
…rrors-in-setup

[codex] detect setup auth ping failures
…ngage-mode-schema

fix(setup): register step uses engage_mode columns dropped by migration 010
Phase 2 of the SKILL.md already contains the Dockerfile + TOOL_ALLOWLIST
edit instructions with an "ALREADY APPLIED" short-circuit. Keeping those
edits out of trunk means users who never run /add-gmail-tool don't carry
the Gmail MCP package in their image.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… reads

The upstream precedence fix (5845a5a) made agent_groups.agent_provider and
sessions.agent_provider authoritative for host-side provider contribution
(per-session mount, env passthrough), but those DB values don't propagate
into the group's container.json — and the in-container runner reads
`provider` from container.json, not from the DB. That caused a confusing
failure mode: flipping the DB column to 'codex', rebuilding, and
restarting still spawned a Claude runner because container.json had no
provider field. The old skill wording ("container receives AGENT_PROVIDER
from the resolved value") overstated the integration.

Update add-codex and add-opencode "Per group / per session" sections to
say: set `"provider": "<name>"` in the group's container.json — that's
the source the runner reads. Keep the DB columns documented for the
host-side contribution they actually drive, and spell out the
session → group → container.json → 'claude' fallback so the precedence
is still discoverable.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
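Per the updated skill wording, the per-group override lives in the group's container.json. A minimal fragment — only the `provider` key comes from the commit message; any sibling keys a real container.json carries are omitted here:

```json
{
  "provider": "codex"
}
```

When the field is absent, the in-container runner falls back to 'claude'; the DB columns still drive the host-side contribution (per-session mount, env passthrough) as documented.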
skill(add-gmail-tool): OneCLI-native Gmail MCP tool
Adds /add-gcal-tool — a sibling of /add-gmail-tool that installs
@cocal/google-calendar-mcp with the same OneCLI stub-file pattern. Skill
applies the Dockerfile + TOOL_ALLOWLIST changes at install time; trunk
stays clean so users who never run the skill don't carry the calendar
MCP in their image.

Dropped the Phase 5 dry-run section since it hardcoded a per-install
image tag slug and duplicated Phase 4's live agent test.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
skill(add-gcal-tool): OneCLI-native Google Calendar MCP tool
…for native IDs

setup/register.ts had two bugs that prevented new channels from being
registered via `/manage-channels`:

1. createMessagingGroupAgent was called with the legacy field names
   `trigger_rules` and `response_scope`. The SQL INSERT expects
   `engage_mode` / `engage_pattern` / `sender_scope` / `ignored_message_policy`
   (migration 010). Every register call failed with
   `RangeError: Missing named parameter "engage_mode"` after the agent
   and messaging group were partially created — leaving an orphaned pair.

   Now mirrors scripts/init-first-agent.ts:wireIfMissing:
   - Groups (is_group=1) default to engage_mode='mention' (bot only
     responds when addressed).
   - DMs (is_group=0) default to engage_mode='pattern' with '.' (respond
     to every message).
   - An explicit --trigger overrides the pattern regex.

2. The "normalize platform_id" block unconditionally prefixed
   "<channel>:" even for native IDs like WhatsApp JIDs
   ("120363408974444974@g.us"), iMessage emails ("user@example.com"),
   or Signal phones ("+15551234567") / Signal groups ("group:abc"). But
   the router (src/router.ts:158) looks up messaging_groups by the raw
   event.platformId from the adapter, which for these native adapters
   never has a prefix. So the prefixed row was never matched — the
   message was silently dropped with no "Message routed" log.

   Extracted scripts/init-first-agent.ts:namespacedPlatformId into
   src/platform-id.ts so both setup paths use the same heuristic (skip
   the prefix for IDs containing '@', starting with '+', or starting
   with 'group:'). Prevents future drift between the two paths.

Tested by: re-running `setup/index.ts --step register` for a WhatsApp
group JID, confirming the row is created with correct engage fields
and matching platform_id, then sending a test message and observing
"Message routed" with the right agent group.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
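The two fixes above can be sketched as follows — a hypothetical reconstruction, with function names and return shapes chosen for illustration rather than copied from setup/register.ts:

```typescript
// 1. Engage defaults mirroring scripts/init-first-agent.ts:wireIfMissing.
function engageDefaults(isGroup: boolean, trigger?: string) {
  if (trigger) {
    // An explicit --trigger overrides the pattern regex.
    return { engage_mode: "pattern", engage_pattern: trigger };
  }
  return isGroup
    ? { engage_mode: "mention", engage_pattern: null } // groups: respond only when addressed
    : { engage_mode: "pattern", engage_pattern: "." }; // DMs: respond to every message
}

// 2. Shared namespacing heuristic (src/platform-id.ts): leave native IDs
// unprefixed so the router's lookup on the adapter's raw platformId matches.
function namespacedPlatformId(channel: string, id: string): string {
  const isNative =
    id.includes("@") ||      // WhatsApp JIDs, iMessage emails
    id.startsWith("+") ||    // Signal phone numbers
    id.startsWith("group:"); // Signal groups
  return isNative ? id : `${channel}:${id}`;
}
```

Keeping both setup paths on the same `namespacedPlatformId` is what prevents the silent-drop failure mode: a prefixed row like `whatsapp:120363...@g.us` would never match the raw JID the adapter emits.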
…-per-provider-and-agent-route-files

Two independent correctness fixes: per-provider continuations + agent-route file forwarding
fix(register): wire channels with correct engage fields, skip prefix for native JIDs
Install Telegram polling adapter (HummaMummaBot), add Agency HQ API
skill for CEO agent, and add credential fallback when OneCLI gateway
is unavailable — passes CLAUDE_CODE_OAUTH_TOKEN directly to containers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>