feat: multi-modal agent — voice, file sending, vision#644
Closed
brandontan wants to merge 74 commits into qwibitai:main
Core features built on NanoClaw:
- Delegation system: agents spawn sub-agents via IPC (max 3 concurrent workers)
- BM25 memory search: pure JS, zero deps, with EMBEDDING_URL hook for semantic search
- x402 payments: host-side handler, private keys never enter containers
- Credential scrubbing: API keys/tokens auto-redacted from logs and outbound messages
- DM allowlist: restrict who can interact with the agent
- Per-task model override: cheap models for grunt work, smart models for conversations
- Cron auto-pause: auto-disables after 5 consecutive failures
- Discord channel support
- Agent OS template: universal agent config inherited by all agents
- Expanded container: python3, ffmpeg, imagemagick, 15+ npm packages pre-installed
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
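For readers unfamiliar with BM25, the ranking behind the memory search can be sketched in a few dozen lines of dependency-free TypeScript. This is only an illustration of the standard formula: the tokenizer, the `k1`/`b` defaults, and the names `Doc`, `Hit`, and `bm25Search` are assumptions, not taken from the PR.

```typescript
// Minimal BM25 sketch. Tokenizer, k1 and b are illustrative assumptions.
interface Doc { id: string; text: string }
interface Hit { id: string; score: number }

function tokenize(text: string): string[] {
  return text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
}

function bm25Search(docs: Doc[], query: string, k1 = 1.2, b = 0.75): Hit[] {
  const indexed = docs.map((d) => ({ id: d.id, tokens: tokenize(d.text) }));
  const avgLen =
    indexed.reduce((sum, d) => sum + d.tokens.length, 0) / Math.max(indexed.length, 1);

  // Document frequency: how many documents contain each term.
  const df = new Map<string, number>();
  indexed.forEach((d) => {
    Array.from(new Set(d.tokens)).forEach((term) => df.set(term, (df.get(term) ?? 0) + 1));
  });

  const queryTerms = Array.from(new Set(tokenize(query)));
  return indexed
    .map((d) => {
      let score = 0;
      queryTerms.forEach((term) => {
        const tf = d.tokens.filter((t) => t === term).length;
        if (tf === 0) return;
        const n = df.get(term) ?? 0;
        const idf = Math.log(1 + (indexed.length - n + 0.5) / (n + 0.5));
        const lengthNorm = 1 - b + (b * d.tokens.length) / avgLen;
        score += (idf * tf * (k1 + 1)) / (tf + k1 * lengthNorm);
      });
      return { id: d.id, score };
    })
    .filter((hit) => hit.score > 0)
    .sort((x, y) => y.score - x.score);
}
```

Documents that share no term with the query are dropped rather than returned with a zero score.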
…ed observations
Implements issue #1 (v0.2.0 milestone). After substantial Discord conversations
(5+ user messages), the observer compresses messages into dated, prioritized
observations (🔴 Critical / 🟡 Useful / 🟢 Noise) via Sonnet 4.6. Observations
are appended to daily/observer/{date}.md and found by BM25 recall.
Security: credential scrubbing, hard delimiters, injection validation, output
scrubbing. Operational: 1-call step budget, ~$0.03 cost ceiling, 30s timeout,
circuit breaker (3 failures + 15min auto-reset), pino trace logging, kill switch
(OBSERVER_ENABLED=false). Fire-and-forget — never blocks conversation delivery.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
feat: observer agent — auto-compress conversations into observations
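The circuit breaker described above (3 failures, 15-minute auto-reset) could look roughly like the sketch below. It assumes the failures are consecutive and that the clock is injectable for testing; the class and method names are hypothetical.

```typescript
// Circuit breaker sketch: opens after maxFailures, auto-resets after resetMs.
// Assumes "3 failures" means consecutive failures; names are hypothetical.
class CircuitBreaker {
  private failures = 0;
  private openedAt: number | null = null;

  constructor(
    private readonly maxFailures = 3,
    private readonly resetMs = 15 * 60 * 1000,
    private readonly now: () => number = Date.now,
  ) {}

  /** True while calls should be skipped. Resets itself once resetMs has elapsed. */
  isOpen(): boolean {
    if (this.openedAt === null) return false;
    if (this.now() - this.openedAt >= this.resetMs) {
      this.failures = 0;
      this.openedAt = null;
      return false;
    }
    return true;
  }

  recordSuccess(): void {
    this.failures = 0;
  }

  recordFailure(): void {
    this.failures += 1;
    if (this.failures >= this.maxFailures) this.openedAt = this.now();
  }
}
```

A caller would check `isOpen()` before each LLM call and skip the call while the breaker is open.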
Add central schema registry (schemas.ts) with Zod schemas for all 7 agent types, plus a reusable LLM output validation utility (validate-llm.ts) that parses JSON, validates against Zod, and retries once with error feedback. Retrofit observer to request structured JSON from LLM instead of freeform markdown, validate with Zod before writing to disk. Fix credential scrubbing to run after parse (scrubbing raw JSON breaks structure due to greedy regex). 46 tests pass (16 schema + 10 validation + 15 observer + 5 eval assertions), 4 eval scaffolds skipped. Typecheck and build clean. Closes #20 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
feat: Zod-validated LLM output schemas (#20)
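The parse, validate, retry-once flow can be sketched as follows. To keep the example dependency-free, a plain `Validator` function stands in for a Zod schema; that type, `parseAndValidate`, `validateWithRetry`, and the `callModel` signature are all assumptions for illustration, not the contents of validate-llm.ts.

```typescript
// Sketch of "parse JSON, validate, retry once with error feedback".
// Validator is a stand-in for a Zod schema; all names are hypothetical.
interface Validation<T> { ok: boolean; value?: T; error?: string }
type Validator<T> = (data: unknown) => Validation<T>;

function parseAndValidate<T>(raw: string, validate: Validator<T>): Validation<T> {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch (err) {
    return { ok: false, error: `Invalid JSON: ${(err as Error).message}` };
  }
  return validate(data);
}

async function validateWithRetry<T>(
  callModel: (errorFeedback?: string) => Promise<string>,
  validate: Validator<T>,
): Promise<Validation<T>> {
  const first = parseAndValidate(await callModel(), validate);
  if (first.ok) return first;
  // Retry exactly once, handing the validation error back to the model.
  return parseAndValidate(await callModel(first.error), validate);
}
```

Scrubbing credentials from the validated object, rather than from the raw JSON string, would slot in after `parseAndValidate` succeeds, which matches the ordering fix described in the commit.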
Add JSONL conversation logger that scores each interaction with
heuristic quality signals (positive/negative/neutral) based on user
language patterns. Enables offline analysis of which topics the bot
handles well vs poorly.
- Heuristic signal extraction (no LLM call, zero cost)
- JSONL append to {group}/store/conversations.jsonl
- Credential scrubbing on all logged content
- 1MB file size cap, kill switch, message truncation
- Fire-and-forget hook in processGroupMessages
- ConversationLogEntrySchema added to central schema registry
18 tests pass. Typecheck and build clean.
Closes #6
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
feat: conversation quality tracker with JSONL logging (#6)
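A heuristic scorer of this kind might look like the sketch below. The regex patterns, the 500-character truncation, and the entry shape are placeholders; the PR does not list the real patterns or the fields of ConversationLogEntrySchema.

```typescript
// Heuristic quality signal sketch. Patterns and entry shape are hypothetical.
type Signal = "positive" | "negative" | "neutral";

const POSITIVE = /\b(thanks|thank you|perfect|great|exactly)\b/i;
const NEGATIVE = /\b(wrong|not what i|doesn't work|useless|try again)\b/i;

function scoreMessage(text: string): Signal {
  // Negative wins when both match, so "great, but wrong" is logged as negative.
  if (NEGATIVE.test(text)) return "negative";
  if (POSITIVE.test(text)) return "positive";
  return "neutral";
}

// Builds one JSONL line; the message is truncated before it is logged.
function toJsonlLine(group: string, text: string, maxChars = 500): string {
  const entry = { group, signal: scoreMessage(text), text: text.slice(0, maxChars) };
  return JSON.stringify(entry) + "\n";
}
```

Because no model is called, scoring every message costs only a couple of regex tests.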
When a user corrects the bot ("No, it's X not Y", "Actually...",
"I meant..."), detect the correction via regex, call LLM to extract
structured learning (wrong → right, knowledge file, context), and
append to {group}/learnings/LEARNINGS.md.
Regex gate before LLM call ensures zero cost for non-correction
messages. Same operational safeguards as observer: circuit breaker,
cooldown, file size cap, credential scrubbing, kill switch.
21 tests pass. Typecheck and build clean.
Closes #4
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
feat: add auto-learner — detect corrections and log learnings
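The regex gate is what keeps non-correction messages free: the LLM extractor is only awaited when a cheap pattern check passes. A sketch under assumed patterns (modelled on the three example phrasings in the commit message) and a hypothetical `extract` callback:

```typescript
// Regex gate sketch. Patterns are guesses based on the commit's examples.
const CORRECTION_PATTERNS: RegExp[] = [
  /^no[,.!]?\s+(it'?s|that'?s|it is|that is)\b/i, // "No, it's X not Y"
  /^actually\b/i,                                  // "Actually..."
  /\bi meant\b/i,                                  // "I meant..."
];

function looksLikeCorrection(message: string): boolean {
  const text = message.trim();
  return CORRECTION_PATTERNS.some((pattern) => pattern.test(text));
}

// `extract` stands in for the LLM call; it is never invoked when the gate fails.
async function maybeLearn(
  message: string,
  extract: (message: string) => Promise<string>,
): Promise<string | null> {
  if (!looksLikeCorrection(message)) return null;
  return extract(message);
}
```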
Combines Zod schema validation with content-level checks (length bounds, input grounding) to catch hallucinations and bad data before they're written to disk. Retry-once logic with error feedback on failure. 32 tests covering helpers, validators, schema/content validation flows, retry behavior, and StepValidation schema conformance. Closes #22 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
feat: add per-step evaluation — validate agent outputs before use
Prunes observer entries by priority + age: noise at 30 days, useful at 90 days, critical kept forever. Parses observer markdown blocks, applies configurable retention policy, rewrites or deletes files. No LLM calls. 30 tests covering parser, age computation, filtering, reassembly, integration, and ReflectorOutput schema conformance. Closes #2 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
feat: add reflector — deterministic memory garbage collection
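The retention policy (noise at 30 days, useful at 90 days, critical forever) reduces to a small pure function. In this sketch the `Observation` shape is hypothetical, and treating an entry as expired once its age reaches the limit is an assumption about the boundary.

```typescript
// Retention sketch: drop entries whose age has reached the limit for their priority.
type Priority = "critical" | "useful" | "noise";
interface Observation { date: string; priority: Priority; text: string } // date: YYYY-MM-DD

const MAX_AGE_DAYS: Record<Priority, number> = {
  noise: 30,
  useful: 90,
  critical: Infinity, // kept forever
};

function ageInDays(date: string, now: Date): number {
  const ms = now.getTime() - new Date(`${date}T00:00:00Z`).getTime();
  return Math.floor(ms / 86400000);
}

function prune(observations: Observation[], now: Date): Observation[] {
  return observations.filter((o) => ageInDays(o.date, now) < MAX_AGE_DAYS[o.priority]);
}
```

Keeping the limits in one table is also what makes the policy easy to expose as configuration.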
Splits memory into domain files (operational, people, incidents, decisions) with regex-based categorization. Append-only writes with credential scrubbing, 200KB cap per domain, and migration helper for single-file to structured split. No LLM calls. 23 tests covering categorization, CRUD, migration, schema conformance, and credential scrubbing. Closes #3 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
feat: add structured memory — categorized knowledge storage
…ions (#5) (#36) Detects frustration/failure signals (explicit frustration, abandonment, repeated corrections) via regex gate requiring >= 2 signals before triggering LLM analysis. Extracts structured HindsightReport (failureType, whatWentWrong, whatShouldHaveBeen, actionableLearning, severity) and appends to LEARNINGS.md. Operational safeguards: kill switch, circuit breaker (3 failures / 15min auto-reset), 10min per-group cooldown, 200KB file cap, credential scrubbing, 30s LLM timeout, message truncation, never-throws. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
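The two-signal gate can be illustrated as below. The patterns are invented for the example, and counting distinct signal types (rather than total matches) is an assumption about what ">= 2 signals" means.

```typescript
// Frustration gate sketch. Patterns are hypothetical; counts distinct signal types.
const FRUSTRATION_SIGNALS: { name: string; pattern: RegExp }[] = [
  { name: "explicit", pattern: /\b(frustrat\w*|useless|this is (so )?annoying)\b/i },
  { name: "abandonment", pattern: /\b(forget it|never ?mind|i give up)\b/i },
  { name: "repeated-correction", pattern: /\bi already (said|told you)\b/i },
];

function detectedSignals(messages: string[]): string[] {
  return FRUSTRATION_SIGNALS
    .filter((signal) => messages.some((m) => signal.pattern.test(m)))
    .map((signal) => signal.name);
}

// Only trigger the (costly) LLM analysis when enough distinct signals fire.
function shouldAnalyze(messages: string[], threshold = 2): boolean {
  return detectedSignals(messages).length >= threshold;
}
```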
…#37) Replaces 4 hardcoded fire-and-forget blocks in index.ts with a single rule engine entry point. Rules are evaluated in order against conversation context (message count, correction patterns, frustration signals). Config-driven via optional router-rules.json per group (Zod-validated, falls back to defaults). Supports composable conditions (always, minMessages, correctionDetected, frustrationDetected, all, any). Every routing decision is trace-logged with input → rule matched → action taken. No LLM calls — all decisions are code-based. Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
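The composable conditions map naturally onto a recursive evaluator. The condition names below come from the commit message; the field names (`count`, `conditions`, `when`, `action`) and the choice to collect every matching rule in order are assumptions, since the router-rules.json format is not shown.

```typescript
// Rule engine sketch. Condition names are from the commit; field names are hypothetical.
type Condition =
  | { type: "always" }
  | { type: "minMessages"; count: number }
  | { type: "correctionDetected" }
  | { type: "frustrationDetected" }
  | { type: "all"; conditions: Condition[] }
  | { type: "any"; conditions: Condition[] };

interface RouterContext {
  messageCount: number;
  correctionDetected: boolean;
  frustrationDetected: boolean;
}

interface Rule { name: string; when: Condition; action: string }

function matches(condition: Condition, ctx: RouterContext): boolean {
  switch (condition.type) {
    case "always": return true;
    case "minMessages": return ctx.messageCount >= condition.count;
    case "correctionDetected": return ctx.correctionDetected;
    case "frustrationDetected": return ctx.frustrationDetected;
    case "all": return condition.conditions.every((c) => matches(c, ctx));
    case "any": return condition.conditions.some((c) => matches(c, ctx));
  }
}

// Evaluates rules in order and returns the actions of every rule that matched.
function route(rules: Rule[], ctx: RouterContext): string[] {
  return rules.filter((rule) => matches(rule.when, ctx)).map((rule) => rule.action);
}
```

Since the evaluation is pure, a trace log of input, matched rule, and action can be produced from the same data without extra state.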
…re (#24) (#38)
Stage 1: recall tool returns compact summaries (file, category, score, first line)
Stage 2: recall_detail tool fetches full file content on demand
- Add src/progressive-recall.ts with pure functions for summary extraction
- Add src/progressive-recall.test.ts (35 tests)
- Add mode param (layered/full) to container recall tool, default layered
- Add recall_detail tool with path traversal protection
- Remove duplicate grep-based recall tool (bug fix)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
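Both stages can be sketched with two small functions: one that reduces a search hit to a summary, and one that resolves a requested path while refusing anything outside the memory directory. The type and function names here are hypothetical, and the sketch does not resolve symlinks, which a real implementation may also need to handle.

```typescript
import * as path from "node:path";

// Stage 1 sketch: compact summary of a search hit. Shapes are hypothetical.
interface RecallHit { file: string; category: string; score: number; content: string }
interface RecallSummary { file: string; category: string; score: number; firstLine: string }

function toSummary(hit: RecallHit): RecallSummary {
  const firstLine = hit.content.split("\n").find((line) => line.trim() !== "") ?? "";
  return { file: hit.file, category: hit.category, score: hit.score, firstLine: firstLine.trim() };
}

// Stage 2 sketch: resolve a requested file, returning null if it escapes baseDir.
function resolveInside(baseDir: string, requested: string): string | null {
  const base = path.resolve(baseDir);
  const target = path.resolve(base, requested);
  const rel = path.relative(base, target);
  const escapes = rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
  if (rel === "" || escapes) return null;
  return target;
}
```

Checking the relative path after resolution catches both `../` sequences and absolute paths in one place.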
… MCP tool calls (#25) (#39)
Wraps every MCP tool handler with timing, credential scrubbing, and async JSONL logging. Daily-rotated files at /workspace/ipc/tool-calls-YYYY-MM-DD.jsonl.
- Add src/tool-observability.ts with pure functions for log entries and scrubbing
- Add src/tool-observability.test.ts (19 tests)
- Monkey-patch server.tool in container to inject observability wrapper
- Fire-and-forget async append: zero impact on tool execution latency
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
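A handler wrapper of this shape is one way to get timing, scrubbing, and logging without touching each tool. This sketch does not use the MCP SDK: `ToolHandler`, `ToolLogEntry`, the secret pattern, and the `log` sink are all hypothetical stand-ins.

```typescript
// Observability wrapper sketch. Types, secret pattern, and log sink are hypothetical.
type ToolHandler<A, R> = (args: A) => Promise<R>;
interface ToolLogEntry { tool: string; args: string; durationMs: number; ok: boolean; error?: string }

const SECRET = /\b(sk-[A-Za-z0-9_-]{8,}|Bearer\s+[A-Za-z0-9._-]+)/g;

function scrub(text: string): string {
  return text.replace(SECRET, "[REDACTED]");
}

function withObservability<A, R>(
  tool: string,
  handler: ToolHandler<A, R>,
  log: (entry: ToolLogEntry) => void,
): ToolHandler<A, R> {
  const safeLog = (entry: ToolLogEntry): void => {
    try { log(entry); } catch { /* logging must never break the tool call */ }
  };
  return async (args: A): Promise<R> => {
    const start = Date.now();
    const scrubbedArgs = scrub(JSON.stringify(args));
    try {
      const result = await handler(args);
      safeLog({ tool, args: scrubbedArgs, durationMs: Date.now() - start, ok: true });
      return result;
    } catch (err) {
      safeLog({ tool, args: scrubbedArgs, durationMs: Date.now() - start, ok: false, error: scrub(String(err)) });
      throw err; // the caller still sees the original failure
    }
  };
}
```

In the PR the sink is an asynchronous JSONL append; here it is a plain callback so the example stays self-contained.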
) (#40)
Keyword classifier determines task type (research, grunt, conversation, analysis, content, code, quick-check) then routes to the best model. Explicit model override always wins. Configurable per-group via model-routing.json.
- Add src/model-router.ts with classifier, selector, Zod config schema
- Add src/model-router.test.ts (40 tests)
- Wire into task-scheduler.ts and delegation-handler.ts
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
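A keyword classifier with an override can be sketched as follows. The seven task types are from the commit message; the keyword patterns, the fallback to `conversation`, and the placeholder model names are assumptions, not the contents of model-routing.json.

```typescript
// Model router sketch. Task types are from the commit; keywords and models are hypothetical.
type TaskType = "research" | "grunt" | "conversation" | "analysis" | "content" | "code" | "quick-check";

const KEYWORDS: [TaskType, RegExp][] = [
  ["code", /\b(code|bug|refactor|function|compile)\b/i],
  ["research", /\b(research|investigate|sources?)\b/i],
  ["analysis", /\b(analy[sz]e|compare|evaluate)\b/i],
  ["content", /\b(write|draft|blog|post)\b/i],
  ["quick-check", /\b(check|status|ping)\b/i],
  ["grunt", /\b(rename|reformat|convert|cleanup)\b/i],
];

function classify(prompt: string): TaskType {
  for (const [type, pattern] of KEYWORDS) {
    if (pattern.test(prompt)) return type;
  }
  return "conversation"; // assumed default when no keyword matches
}

function selectModel(prompt: string, routing: Record<TaskType, string>, override?: string): string {
  if (override) return override; // explicit override always wins
  return routing[classify(prompt)];
}
```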
…rk (#28) Rate limits: send_sms 10/hr, make_call 5/hr. Daily spend cap $10 on x402_fetch. In-memory per-session state. Completes 5-axis audit (UX, guardrails, concurrency, observability, autonomy) across all 16 MCP tools. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
feat: tool guardrails audit — rate limits, spend caps (#28)
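In-memory limits like these need very little code. The sketch below uses a sliding window for the hourly limits and a per-UTC-day counter for the spend cap; both choices, and the class names, are assumptions, since the commit only states the limits themselves.

```typescript
// Guardrail sketch: sliding-window rate limit plus a daily spend cap (in-memory).
// Window strategy and UTC day boundary are assumptions.
class SlidingWindowLimiter {
  private readonly timestamps: number[] = [];

  constructor(private readonly limit: number, private readonly windowMs: number) {}

  tryAcquire(now: number = Date.now()): boolean {
    // Drop calls that have fallen out of the window.
    while (this.timestamps.length > 0 && now - this.timestamps[0] >= this.windowMs) {
      this.timestamps.shift();
    }
    if (this.timestamps.length >= this.limit) return false;
    this.timestamps.push(now);
    return true;
  }
}

class DailySpendCap {
  private day = "";
  private spentUsd = 0;

  constructor(private readonly capUsd: number) {}

  trySpend(amountUsd: number, now: Date = new Date()): boolean {
    const today = now.toISOString().slice(0, 10);
    if (today !== this.day) {
      this.day = today;
      this.spentUsd = 0;
    }
    if (this.spentUsd + amountUsd > this.capUsd) return false;
    this.spentUsd += amountUsd;
    return true;
  }
}

// Limits as stated in the commit: send_sms 10/hr, make_call 5/hr, $10/day on x402_fetch.
const smsLimiter = new SlidingWindowLimiter(10, 60 * 60 * 1000);
const callLimiter = new SlidingWindowLimiter(5, 60 * 60 * 1000);
const x402Cap = new DailySpendCap(10);
```

Because the state lives in memory, the counters reset whenever the session restarts, which matches the per-session behaviour described above.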
Progressive disclosure MCP tool: overview when asked "what can you do?", detailed section on request. Reads structured capabilities.json from workspace. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
feat: self-knowledge — agent explains its own capabilities (#27)
Implements Agent Client Protocol so Sovereign agents can be driven from Zed, Cursor, and other ACP-compatible clients. Bridges ACP sessions to the container-runner pipeline. Off by default (ACP_ENABLED=true to enable). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Immutable release snapshots in releases/<sha>/, atomic symlink switch via rename, instant rollback to previous release, auto-prune keeping last 5 releases. Pure functions with 20 tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
feat: atomic rollback deploys — symlink-based release management (#11)
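The symlink switch relies on `rename` replacing the old link in a single step on POSIX filesystems. A sketch with Node's `fs` module follows; the `current` link name, the temp-link naming, and the `Release` shape are assumptions, while `releases/<sha>/` and the keep-last-5 policy come from the commit.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

interface Release { sha: string; createdAt: number }

// Points root/current at releases/<sha> by creating a temp link and renaming it into place.
function switchRelease(root: string, sha: string, linkName = "current"): void {
  const tmpLink = path.join(root, `${linkName}.tmp-${process.pid}`);
  fs.rmSync(tmpLink, { force: true }); // clear a stale temp link from a crashed run
  fs.symlinkSync(path.join("releases", sha), tmpLink);
  fs.renameSync(tmpLink, path.join(root, linkName)); // atomic replace on POSIX
}

// Returns the shas to delete, keeping the newest `keep` releases and never the active one.
function releasesToPrune(releases: Release[], keep = 5, active?: string): string[] {
  return releases
    .slice()
    .sort((a, b) => b.createdAt - a.createdAt)
    .slice(keep)
    .map((release) => release.sha)
    .filter((sha) => sha !== active);
}
```

Rollback is then the same `switchRelease` call with the previous release's sha.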
The Claude Agent SDK writes debug logs to /home/node/.claude/debug/ inside the container. This directory was never created, causing every agent invocation to crash with ENOENT. Found during live deployment testing on Hetzner. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Claude Agent SDK writes to /home/node/.claude/debug/ but container-runner.ts mounts host sessions dir over /home/node/.claude/, hiding the debug dir created in the Dockerfile. Create it host-side. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Host creates the dir as root but the container runs as UID 1000 (node). Without world-writable permissions the agent SDK gets EACCES. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Host creates volume-mounted directories as root but the container runs as UID 1000 (node). This caused EACCES on IPC file unlink and .claude debug writes. Applies chmod 777 to group dir, sessions dir, debug dir, and all IPC subdirectories. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
6-step guided onboarding flow for non-technical users:
Welcome → Identity → AI Engine → Channel → Build → Done
- Validates API keys live (Anthropic + OpenRouter)
- Validates Discord/Slack tokens via platform APIs
- WhatsApp path with QR code explanation
- Personalized build phases ("Compiling Adam's brain...")
- Human-readable error messages throughout
- Localhost-only, first-run-only security
- State persisted to store/wizard-state.json
- Writes .env, model-routing.json, groups/main/CLAUDE.md
- Dashboard starts automatically during wizard mode
UX flow refined with Gemini 2.5 Pro review feedback:
combined 8 steps to 6, momentum-based ordering,
humanized build step, error message philosophy.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
dash (default /bin/sh on Ubuntu) interprets parentheses in `process.exit(0)` as a subshell, causing the validate phase to fail with exit code 2. Removing shell: true also eliminates shell injection risk. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
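The fix can be illustrated with `spawnSync` from Node's `child_process`: when arguments are passed as an array and no shell is involved, parentheses in the snippet reach `node` untouched. The helper name `runNodeSnippet` is hypothetical and the real validate phase may invoke a different command.

```typescript
import { spawnSync } from "node:child_process";

// Runs a snippet with `node -e` and returns its exit code.
// No `shell: true`: the snippet is a single argv entry, so /bin/sh never parses it.
function runNodeSnippet(snippet: string): number | null {
  const result = spawnSync(process.execPath, ["-e", snippet], { stdio: "ignore" });
  return result.status;
}

const exitCode = runNodeSnippet("process.exit(0)");
```

With `shell: true`, the same text would first be parsed by the shell, which is where dash rejected the parentheses.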
- Weekly GitHub Action checks NanoClaw upstream for new commits, opens PR with merge (flags conflicts if any)
- Separate job checks for Claude Code SDK version bumps in the container Dockerfile
- Deploy script sets up launchd (macOS) or systemd (Linux) service automatically
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Quickstart now leads with the setup wizard instead of manual .env editing
- Added bug report and feature request issue templates with structured forms
- Added SECURITY.md at repo root for GitHub Security tab
- Manual setup preserved as "advanced" fallback section
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Animated GIF showing 5-step wizard flow (identity → provider → channel → build → done)
- README Quick Start now points to wizard instead of manual .env editing
- Fix: skip Docker check during wizard mode so new users without Docker can still see the wizard
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Brings in CJK font support, WhatsApp message normalization fix, /update-nanoclaw skill (replaces old /update engine, -1508 lines), and docs updates. Kept Sovereign name/version, SignalWire/wallet secrets, and Mac commands in README. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New users without .env get the wizard immediately instead of crashing on WhatsApp/Discord auth. Early return in main() when wizard is incomplete — only init DB and start dashboard. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Previous OpenClaw comparison had several inaccurate claims (false: "single-threaded", "no payments", "manual setup", "Mac only"). Rewritten to be honest and focus on real differentiators: security by default, self-improving memory, codebase simplicity, revenue tools. Also adds FAQ section and corrects line count to ~20K. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… skills, Telegram channel
5 features for agent autonomy:
1. Identity tools (update_identity, read_identity) — agent can modify its own
CLAUDE.md with guardrails: immutable sections, injection blocking, audit log
2. Runtime npm packages — NODE_PATH extended to include persistent
/workspace/group/.packages/, npm_config_ignore_scripts=true for supply chain safety
3. Skills tool (list_skills, create_skill) — agent can create persistent custom
skills that survive across sessions, built-in skills protected from overwrite
4. Telegram channel — grammy-based, same Channel interface as Discord/Slack,
JID format tg:{chat_id}, activated via TELEGRAM_BOT_TOKEN env var
5. Container memory limits — --memory 1536m prevents OOM crashes on VPS,
configurable via CONTAINER_MEMORY_LIMIT env var
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Reverse NODE_PATH order so trusted /app/node_modules loads first
- Move npm_config_ignore_scripts after build-time installs (fixes sharp)
- Add ALLOWED_USERS check to Telegram channel
- Block immutable heading injection in update_identity content
- Remove read_identity (redundant), BLOCKED_PATTERNS (bypassable), dead code
- Sanitize YAML newlines in skills.ts description
- Add 50KB content size limit to create_skill
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Keep only Discord + Telegram as active channels
- Remove bump-version, skill-drift, update-tokens, upstream-sync workflows (require upstream GitHub App secrets, always fail on fork)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Voice: transcribe audio/voice messages via OpenRouter (Gemini Flash) before storing, so the agent sees text instead of placeholders. File sending: extend send_message tool with optional filePath param, pipe through IPC with path traversal protection, send via platform APIs (Discord AttachmentBuilder, Telegram sendPhoto/sendDocument). Vision: download images to .attachments/, persist metadata in DB, pass as base64 content blocks to Claude so the agent can see photos. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
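For the vision path, the Anthropic Messages API accepts images as content blocks with a base64 `source`. A sketch of assembling such blocks is below; the `buildContent` helper, its input shape, and the ordering of images before text are assumptions, and the 5MB limit is the one quoted in the PR summary.

```typescript
// Vision sketch: turn downloaded images plus message text into content blocks.
// Helper name and input shape are hypothetical; block format follows the Messages API.
interface ImageBlock {
  type: "image";
  source: { type: "base64"; media_type: string; data: string };
}
interface TextBlock { type: "text"; text: string }
type ContentBlock = ImageBlock | TextBlock;

const MAX_IMAGE_BYTES = 5 * 1024 * 1024; // 5MB max, per the PR summary

function buildContent(text: string, images: { mediaType: string; bytes: Uint8Array }[]): ContentBlock[] {
  const blocks: ContentBlock[] = [];
  for (const image of images) {
    if (image.bytes.byteLength > MAX_IMAGE_BYTES) continue; // skip oversized images
    blocks.push({
      type: "image",
      source: {
        type: "base64",
        media_type: image.mediaType,
        data: Buffer.from(image.bytes).toString("base64"),
      },
    });
  }
  blocks.push({ type: "text", text });
  return blocks;
}
```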
Author: Merged directly into fork's main. Not contributing upstream.
This was referenced Mar 3, 2026
Summary
- File sending: `send_message` tool accepts optional `filePath` (relative to `/workspace/group/`). IPC resolves paths with traversal protection. Discord uses AttachmentBuilder, Telegram uses sendPhoto/sendDocument.
- Vision: images are saved to `groups/{folder}/.attachments/`, metadata stored in DB (new `attachments` column), passed as base64 content blocks to Claude. 5MB max, auto-cleanup of files >24h.
Files changed
- `src/transcription.ts`: new shared transcription utility
- `src/channels/telegram.ts`: voice/audio/photo handling + file sending
- `src/channels/discord.ts`: audio/image handling + file sending
- `src/types.ts`: FileAttachment, MessageAttachment, updated Channel interface
- `src/router.ts`: attachment metadata in formatted messages, file param on routeOutbound
- `src/ipc.ts`: filePath handling with path traversal protection
- `src/db.ts`: attachments column migration + serialization
- `src/index.ts`: attachment collection, file param passthrough
- `src/container-runner.ts`: ContainerInputAttachment type
- `container/agent-runner/src/index.ts`: multi-part content blocks, image cleanup
- `container/agent-runner/src/tools/messaging.ts`: filePath param on send_message
- `.env.example`: multi-modal documentation
Test plan
🤖 Generated with Claude Code