Closed
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds reaction receiving, sending, storage, and search. Includes StatusTracker for message lifecycle signaling and react_to_message MCP tool for container agents. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Syncs with upstream main (on schedule, dispatch, or push), then merges main into all skill/* branches with build+test validation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
# Conflicts:
#	src/channels/whatsapp.test.ts
#	src/channels/whatsapp.ts
Add an evolving behavioral skills system where agents browse and apply relevant guidelines per task, with automated evaluation and evolution.
Phase 1: DB schema (6 tables), skill deployer, container mount, report_skills_used MCP tool, IPC handler, task run recording.
Phase 2: Evaluator loop that calls the Anthropic API directly (Sonnet), with a 30-minute deadline before automated scoring.
Phase 3: Evolution agent with cold start, candidate lifecycle, drift validation, auto-rollback, and version management.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
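One reading of the Phase 2 deadline is that a recorded task run only becomes eligible for automated scoring once 30 minutes have passed since it was recorded. The sketch below illustrates that reading only; TaskRunRecord, runsDueForScoring, and the ISO-8601 timestamp format are hypothetical and not taken from the PR.

```typescript
// Hypothetical sketch: a task run becomes eligible for automated scoring
// only after a fixed deadline has elapsed since it was recorded.
const EVALUATION_DEADLINE_MS = 30 * 60 * 1000; // 30 minutes, per the commit message

interface TaskRunRecord {
  id: string;
  createdAt: string; // ISO-8601 timestamp (assumed storage format)
  evaluated: boolean; // true once a score has been recorded
}

/** Returns unscored runs whose deadline has already passed at `now`. */
function runsDueForScoring(
  runs: TaskRunRecord[],
  now: Date,
  deadlineMs: number = EVALUATION_DEADLINE_MS,
): TaskRunRecord[] {
  return runs.filter((run) => {
    if (run.evaluated) return false; // already scored, skip
    const ageMs = now.getTime() - new Date(run.createdAt).getTime();
    return ageMs >= deadlineMs;
  });
}
```

A periodic evaluator loop could call this on each tick and score only the returned runs, leaving younger runs for a later pass.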
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolved conflicts in src/db.ts (kept both behavioral skills and reactions tables), src/index.ts (kept both runStartTime and firstOutputSeen, both onTasksChanged and statusHeartbeat), src/ipc.ts (kept both skills-used handling and status heartbeat, preserved update_task case). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Groups consecutive agent turns (up to 6) into rollouts that are evaluated as a unit. The evaluator now scores the full conversation window, including tool call inputs/outputs, and produces a reasoning field that is passed to the evolution agent.
Key changes:
- config.ts: ROLLOUT_SIZE (6) and ROLLOUT_INACTIVITY_MS (30min)
- types.ts: ToolCall, Rollout interfaces; updated SkillTaskRun/SkillEvaluation
- db.ts: rollouts table, rollout_id/tool_calls on runs, evaluator_reasoning on evaluations, rollout CRUD + getLowScoringRollouts
- session-reader.ts: extract tool calls from Claude Code JSONL transcripts
- rollout-manager.ts: getOrCreateRollout / closeStaleRollouts
- evaluator.ts: evaluate closed rollouts with tool_selection dimension
- evolution.ts: consume rollout context with per-turn tool calls + reasoning
- index.ts: accumulate response text, assign rollout_id, extract tool calls
- evaluator-prompt.md / evolution-prompt.md: updated for multi-turn format
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
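A minimal in-memory sketch of the rollout grouping rule, assuming a rollout is keyed by group folder and closes when it reaches ROLLOUT_SIZE turns or sits idle for ROLLOUT_INACTIVITY_MS. The two method names mirror those listed for rollout-manager.ts, but the Rollout shape, the RolloutStore class, and the id format are hypothetical; the PR keeps rollouts in a database table rather than in memory.

```typescript
// Hypothetical in-memory sketch of rollout grouping (not the PR's code).
const ROLLOUT_SIZE = 6;
const ROLLOUT_INACTIVITY_MS = 30 * 60 * 1000;

interface Rollout {
  id: string;
  groupFolder: string; // assumed grouping key
  turnCount: number;
  lastActivityMs: number;
  status: "open" | "closed";
}

class RolloutStore {
  private rollouts: Rollout[] = [];
  private nextId = 1;

  /** Adds a turn to the group's open rollout, or opens a new one if it is full or stale. */
  getOrCreateRollout(groupFolder: string, nowMs: number): Rollout {
    const open = this.rollouts.find(
      (r) => r.groupFolder === groupFolder && r.status === "open",
    );
    if (open) {
      const stale = nowMs - open.lastActivityMs >= ROLLOUT_INACTIVITY_MS;
      const full = open.turnCount >= ROLLOUT_SIZE;
      if (!stale && !full) {
        open.turnCount += 1;
        open.lastActivityMs = nowMs;
        return open;
      }
      open.status = "closed"; // closed rollouts become eligible for evaluation
    }
    const created: Rollout = {
      id: `rollout-${this.nextId++}`,
      groupFolder,
      turnCount: 1,
      lastActivityMs: nowMs,
      status: "open",
    };
    this.rollouts.push(created);
    return created;
  }

  /** Closes every open rollout idle for at least the inactivity window. */
  closeStaleRollouts(nowMs: number): Rollout[] {
    const closed: Rollout[] = [];
    for (const r of this.rollouts) {
      if (r.status === "open" && nowMs - r.lastActivityMs >= ROLLOUT_INACTIVITY_MS) {
        r.status = "closed";
        closed.push(r);
      }
    }
    return closed;
  }
}
```

Under this rule the seventh turn in a busy conversation starts a fresh rollout, while a quiet rollout is closed by the periodic sweep even if it never fills up.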
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
feat: multi-turn rollout evaluation windows
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add worker_tasks and wall DB tables
- WorkerTask and WallEntry types
- IPC handlers: create_worker_task, post_wall (with depth guard)
- Worker manager: spawns containers for pending tasks, collects results, propagates completion up parent chain
- Root task completion triggers orchestrator synthesis + user notification
- ContainerInput extended with isWorkerTask / workerTaskId / workerDepth
- container/worker-guide.md: agent instructions for delegating and acting as worker
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
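Propagating completion up the parent chain might look roughly like the sketch below, assuming tasks form a tree through a parentId field and that a parent counts as done once every child has reached a terminal state. The loop is iterative and capped at the number of tasks, so a deep or accidentally cyclic chain cannot overflow the stack or spin forever. The WorkerTask fields, canDelegate, and MAX_WORKER_DEPTH are hypothetical stand-ins for the PR's depth guard.

```typescript
// Hypothetical sketch of worker-task completion propagation (not the PR's code).
type TaskStatus = "pending" | "running" | "completed" | "failed";

interface WorkerTask {
  id: string;
  parentId: string | null;
  depth: number;
  status: TaskStatus;
}

const MAX_WORKER_DEPTH = 3; // hypothetical delegation limit

/** Depth guard: may this task spawn a child one level deeper? */
function canDelegate(parent: WorkerTask): boolean {
  return parent.depth + 1 <= MAX_WORKER_DEPTH;
}

function isTerminal(status: TaskStatus): boolean {
  return status === "completed" || status === "failed";
}

/**
 * Walks up from a finished task, marking each parent completed once all of its
 * children are terminal (this sketch does not distinguish failed children).
 * Returns the root id if the whole tree is finished, otherwise null.
 */
function propagateCompletion(tasks: Map<string, WorkerTask>, finishedId: string): string | null {
  let current = tasks.get(finishedId);
  let steps = 0;
  while (current !== undefined && steps++ <= tasks.size) {
    if (current.parentId === null) {
      return isTerminal(current.status) ? current.id : null;
    }
    const parentId = current.parentId;
    const parent = tasks.get(parentId);
    if (!parent) return null; // orphaned task: stop quietly
    const siblings = Array.from(tasks.values()).filter((t) => t.parentId === parentId);
    if (!siblings.every((t) => isTerminal(t.status))) return null;
    parent.status = "completed";
    current = parent;
  }
  return null;
}
```

A non-null return value is the point at which the host could trigger the orchestrator synthesis for the root task.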
Host derives chatJid from registered groups by folder name when the agent omits it. Simplifies delegation — agents no longer need to know or pass their own chat JID. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
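A sketch of that host-side fallback, assuming registered groups are stored as a map from chat JID to a record with a folder field. The function name and the data shape are hypothetical.

```typescript
// Hypothetical sketch: fall back to the JID of the registered group whose
// folder matches the requesting agent's folder.
interface RegisteredGroup {
  folder: string;
}

function resolveChatJid(
  sourceFolder: string,
  registeredGroups: Record<string, RegisteredGroup>,
  requestedJid?: string,
): string | undefined {
  if (requestedJid) return requestedJid; // the agent supplied it explicitly
  for (const jid of Object.keys(registeredGroups)) {
    if (registeredGroups[jid].folder === sourceFolder) return jid;
  }
  return undefined; // unknown folder: the caller should reject the request
}
```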
Main agent enqueues all requests as worker tasks. Only exception: answering status questions about in-progress work. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Critical:
- SQL operator precedence bug in getSkillVersionCount
- JSON.parse crash on container_config in group lookup
- Evolution skill modifications wrapped in transactions (atomic)
- checkParentCompletion converted to iterative (no stack overflow)
- Worker spawn failure now always marks task as failed
High:
- Silent migration failures now log warnings
- getRootTaskId has cycle detection + iteration cap
- Dynamic SQL UPDATE field names whitelisted
- 30s timeout + 429 handling on evaluator/evolution API calls
- LLM response structure validated before use
- User content wrapped in code fences (prompt injection)
- Streaming parse buffer capped at 10MB
- Worker synthesis callback has full error handling
- parentDepth read from DB, not from the IPC message
- Wall entries verify group ownership
Medium:
- foreign_keys = ON enabled in SQLite
- MAX() on TEXT timestamps replaced with CASE WHEN
- Evolution candidate/rollback ops in transactions
- Missed selection logging uses skill ID, not name
- IPC write failures logged (group-queue)
- tool_calls/dimensions parse errors logged
Low:
- Index on skill_task_runs.created_at
- Unused _ipcDir param removed
- Worker result capped at 500KB
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
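Two of the High items can be illustrated briefly: walking parent links to the root with a visited set and an iteration cap, and building a dynamic UPDATE only from whitelisted column names while passing values as bound parameters. MAX_PARENT_HOPS, UPDATABLE_TASK_FIELDS, buildTaskUpdate, and the column names are assumptions, not values taken from the PR.

```typescript
// Hypothetical sketches of two hardening items (not the PR's code).
const MAX_PARENT_HOPS = 100; // assumed iteration cap

/** Follows parent links to the root; throws on a cycle or an overly long chain. */
function getRootTaskId(taskId: string, parentOf: (id: string) => string | null): string {
  const seen = new Set<string>();
  let current = taskId;
  for (let hops = 0; hops < MAX_PARENT_HOPS; hops++) {
    if (seen.has(current)) throw new Error(`cycle detected at task ${current}`);
    seen.add(current);
    const parent = parentOf(current);
    if (parent === null) return current;
    current = parent;
  }
  throw new Error(`parent chain exceeds ${MAX_PARENT_HOPS} hops`);
}

// Only these column names may be interpolated into the SQL text.
const UPDATABLE_TASK_FIELDS = new Set(["status", "result", "completed_at"]);

/** Builds a parameterized UPDATE, rejecting any field that is not whitelisted. */
function buildTaskUpdate(
  id: string,
  fields: Record<string, unknown>,
): { sql: string; params: unknown[] } {
  const keys = Object.keys(fields);
  if (keys.length === 0) throw new Error("no fields to update");
  for (const key of keys) {
    if (!UPDATABLE_TASK_FIELDS.has(key)) throw new Error(`field not allowed: ${key}`);
  }
  const assignments = keys.map((k) => `${k} = ?`).join(", ");
  return {
    sql: `UPDATE worker_tasks SET ${assignments} WHERE id = ?`,
    params: [...keys.map((k) => fields[k]), id], // values are bound, never interpolated
  };
}
```

Whitelisting matters because column names cannot be bound as parameters, so any name that reaches the SQL string must already be known to be safe.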
Workers now participate in the full skill feedback loop:
- Each root task tree gets a worker rollout (rollout_type='worker',
id=worker-{rootTaskId}), created idempotently when the first worker
task is spawned
- Each worker task completion records a skill_task_runs row linked to
the rollout (worker_task_id FK, root_outcome_score propagated later)
- After synthesis is sent, the synthesis is scored via claude-haiku and
the score is propagated back to all contributing worker runs as
root_outcome_score; the rollout is then closed
- Evaluator processes closed worker rollouts separately using a
worker-specific rubric (task_completion, accuracy, efficiency,
decomposition_quality, result_quality) via claude-haiku
- Evolution context now includes low-scoring worker task trees alongside
conversation rollouts so skill evolution can address worker behavior
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
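The worker rollout bookkeeping can be sketched in memory as follows, assuming runs reference their rollout by id and that the synthesis score is a single number whose scale the commit does not state. Only the worker-{rootTaskId} id format comes from the commit message; ensureWorkerRollout, propagateRootScore, and the record shapes are hypothetical.

```typescript
// Hypothetical in-memory sketch of the worker feedback loop (not the PR's code).
interface WorkerRun {
  workerTaskId: string;
  rolloutId: string;
  rootOutcomeScore: number | null; // filled in after synthesis is scored
}

interface WorkerRollout {
  id: string;
  rolloutType: "worker";
  status: "open" | "closed";
}

const workerRolloutId = (rootTaskId: string): string => `worker-${rootTaskId}`;

/** Idempotent: returns the rollout for a root task, creating it only once. */
function ensureWorkerRollout(rollouts: Map<string, WorkerRollout>, rootTaskId: string): WorkerRollout {
  const id = workerRolloutId(rootTaskId);
  let rollout = rollouts.get(id);
  if (!rollout) {
    rollout = { id, rolloutType: "worker", status: "open" };
    rollouts.set(id, rollout);
  }
  return rollout;
}

/** Copies the synthesis score onto every run in the rollout, then closes it. */
function propagateRootScore(rollout: WorkerRollout, runs: WorkerRun[], synthesisScore: number): number {
  let updated = 0;
  for (const run of runs) {
    if (run.rolloutId === rollout.id) {
      run.rootOutcomeScore = synthesisScore;
      updated++;
    }
  }
  rollout.status = "closed"; // closed worker rollouts are picked up by the evaluator
  return updated;
}
```

Deriving the id from the root task id is what makes creation idempotent: every worker in the same tree computes the same key, so a second spawn finds the existing rollout instead of creating a duplicate.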
Type of Change
.claude/skills/<name>/, no source changes)
Description
For Skills