
Fix/audit findings #1314

Closed
Wmaxlees wants to merge 57 commits into qwibitai:main from Wmaxlees:fix/audit-findings



@Wmaxlees

Type of Change

  • Feature skill - adds a channel or integration (source code changes + SKILL.md)
  • Utility skill - adds a standalone tool (code files in .claude/skills/<name>/, no source changes)
  • Operational/container skill - adds a workflow or agent skill (SKILL.md only, no source changes)
  • Fix - bug fix or security fix to source code
  • Simplification - reduces or simplifies source code
  • Documentation - docs, README, or CONTRIBUTING changes only

Description

For Skills

  • SKILL.md contains instructions, not inline code (code goes in separate files)
  • SKILL.md is under 500 lines
  • I tested this skill on a fresh clone

gavrielc and others added 30 commits March 8, 2026 22:59
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Adds reaction receiving, sending, storage, and search. Includes
StatusTracker for message lifecycle signaling and react_to_message
MCP tool for container agents.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Syncs with upstream main (on schedule, dispatch, or push), then
merges main into all skill/* branches with build+test validation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
# Conflicts:
#	src/channels/whatsapp.test.ts
#	src/channels/whatsapp.ts
github-actions bot and others added 26 commits March 11, 2026 10:26
Add an evolving behavioral skills system where agents browse and apply
relevant guidelines per-task, with automated evaluation and evolution.

Phase 1: DB schema (6 tables), skill deployer, container mount,
report_skills_used MCP tool, IPC handler, task run recording.
Phase 2: Evaluator loop using direct Anthropic API (Sonnet) with
30-minute deadline before automated scoring.
Phase 3: Evolution agent with cold start, candidate lifecycle,
drift validation, auto-rollback, and version management.
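One reading of the Phase 2 deadline is that a finished run becomes eligible for automated scoring only after 30 minutes have passed. The TypeScript sketch below assumes that reading. `PendingRun`, `runsDueForScoring`, and the field names are hypothetical and do not come from the PR.

```typescript
// Hypothetical record for a finished agent run awaiting evaluation.
interface PendingRun {
  id: string;
  endedAt: number; // epoch milliseconds when the run finished
  scored: boolean;
}

// The 30-minute window mentioned in the Phase 2 description.
const EVALUATION_DEADLINE_MS = 30 * 60 * 1000;

// Returns the unscored runs whose 30-minute window has elapsed,
// i.e. the ones the evaluator loop would score on this pass.
function runsDueForScoring(runs: PendingRun[], now: number): PendingRun[] {
  return runs.filter(
    (run) => !run.scored && now - run.endedAt >= EVALUATION_DEADLINE_MS,
  );
}
```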

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Resolved conflicts in src/db.ts (kept both behavioral skills and
reactions tables), src/index.ts (kept both runStartTime and
firstOutputSeen, both onTasksChanged and statusHeartbeat),
src/ipc.ts (kept both skills-used handling and status heartbeat,
preserved update_task case).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Groups consecutive agent turns (up to 6) into rollouts that are
evaluated as a unit. The evaluator now scores the full conversation
window including tool call inputs/outputs and produces a reasoning
field passed to the evolution agent.

Key changes:
- config.ts: ROLLOUT_SIZE (6) and ROLLOUT_INACTIVITY_MS (30min)
- types.ts: ToolCall, Rollout interfaces; updated SkillTaskRun/SkillEvaluation
- db.ts: rollouts table, rollout_id/tool_calls on runs, evaluator_reasoning
  on evaluations, rollout CRUD + getLowScoringRollouts
- session-reader.ts: extract tool calls from Claude Code JSONL transcripts
- rollout-manager.ts: getOrCreateRollout / closeStaleRollouts
- evaluator.ts: evaluate closed rollouts with tool_selection dimension
- evolution.ts: consume rollout context with per-turn tool calls + reasoning
- index.ts: accumulate response text, assign rollout_id, extract tool calls
- evaluator-prompt.md / evolution-prompt.md: updated for multi-turn format
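A minimal in-memory sketch of the rollout windowing, assuming a rollout closes either when it reaches `ROLLOUT_SIZE` turns or after `ROLLOUT_INACTIVITY_MS` with no new turn. The method names mirror `getOrCreateRollout` and `closeStaleRollouts` from the commit, but the bodies, the `Rollout` shape, and the `InMemoryRolloutStore` class are illustrative guesses (this version of `getOrCreateRollout` also records the turn). The PR itself keeps rollouts in a SQLite table.

```typescript
const ROLLOUT_SIZE = 6;
const ROLLOUT_INACTIVITY_MS = 30 * 60 * 1000;

// Hypothetical rollout shape; the real table has more columns.
interface Rollout {
  id: string;
  groupFolder: string;
  turnCount: number;
  lastTurnAt: number; // epoch milliseconds of the latest turn
  status: "open" | "closed";
}

class InMemoryRolloutStore {
  private rollouts: Rollout[] = [];
  private nextId = 1;

  // Assigns a new turn to the group's open rollout, opening one if needed.
  // A rollout is closed as soon as it holds ROLLOUT_SIZE turns.
  getOrCreateRollout(groupFolder: string, now: number): Rollout {
    let current = this.rollouts.find(
      (r) => r.groupFolder === groupFolder && r.status === "open",
    );
    if (!current) {
      current = {
        id: `rollout-${this.nextId++}`,
        groupFolder,
        turnCount: 0,
        lastTurnAt: now,
        status: "open",
      };
      this.rollouts.push(current);
    }
    current.turnCount += 1;
    current.lastTurnAt = now;
    if (current.turnCount >= ROLLOUT_SIZE) current.status = "closed";
    return current;
  }

  // Closes open rollouts that have seen no turn for ROLLOUT_INACTIVITY_MS
  // and returns them so the evaluator can pick them up.
  closeStaleRollouts(now: number): Rollout[] {
    const closed: Rollout[] = [];
    for (const r of this.rollouts) {
      if (r.status === "open" && now - r.lastTurnAt >= ROLLOUT_INACTIVITY_MS) {
        r.status = "closed";
        closed.push(r);
      }
    }
    return closed;
  }
}
```

Closing a rollout the moment it fills lets the evaluator score it without waiting out the inactivity timeout; whether the PR behaves exactly this way is not stated in the commit.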

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
feat: multi-turn rollout evaluation windows
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add worker_tasks and wall DB tables
- WorkerTask and WallEntry types
- IPC handlers: create_worker_task, post_wall (with depth guard)
- Worker manager: spawns containers for pending tasks, collects results,
  propagates completion up parent chain
- Root task completion triggers orchestrator synthesis + user notification
- ContainerInput extended with isWorkerTask / workerTaskId / workerDepth
- container/worker-guide.md: agent instructions for delegating and acting as worker
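The depth guard on `create_worker_task` can be sketched as below. It assumes depth is derived from the stored parent task (a later audit commit in this PR reads `parentDepth` from the DB rather than the IPC message) and uses a hypothetical `MAX_WORKER_DEPTH` of 3, since the real limit is not given. `WorkerTask` and `createWorkerTask` are illustrative names.

```typescript
// Hypothetical limit; the PR does not state its maximum depth.
const MAX_WORKER_DEPTH = 3;

interface WorkerTask {
  id: string;
  parentId: string | null;
  depth: number;
  status: "pending" | "running" | "done" | "failed";
}

// Creates a task whose depth comes from the stored parent, never from
// a value supplied by the requesting agent, and rejects tasks nested too deep.
function createWorkerTask(
  tasks: Map<string, WorkerTask>,
  id: string,
  parentId: string | null,
): WorkerTask {
  let depth = 0;
  if (parentId !== null) {
    const parent = tasks.get(parentId);
    if (!parent) throw new Error(`unknown parent task ${parentId}`);
    depth = parent.depth + 1;
  }
  if (depth > MAX_WORKER_DEPTH) {
    throw new Error(`worker depth ${depth} exceeds limit ${MAX_WORKER_DEPTH}`);
  }
  const task: WorkerTask = { id, parentId, depth, status: "pending" };
  tasks.set(id, task);
  return task;
}
```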

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Host derives chatJid from registered groups by folder name when
the agent omits it. Simplifies delegation — agents no longer need
to know or pass their own chat JID.
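A sketch of the chatJid fallback, assuming registered groups are kept in a record keyed by JID with a `folder` field on each entry. `RegisteredGroup` and `resolveChatJid` are hypothetical names, and the example JIDs are made-up test data.

```typescript
// Hypothetical shape of a registered group entry.
interface RegisteredGroup {
  jid: string;
  folder: string;
  name: string;
}

// Prefers a JID the agent passed explicitly; otherwise looks up the
// group that owns the agent's folder. Returns undefined if none matches.
function resolveChatJid(
  groups: Record<string, RegisteredGroup>,
  folder: string,
  explicitJid?: string,
): string | undefined {
  if (explicitJid) return explicitJid;
  const match = Object.values(groups).find((g) => g.folder === folder);
  return match?.jid;
}
```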

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Main agent enqueues all requests as worker tasks. Only exception:
answering status questions about in-progress work.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Critical:
- SQL operator precedence bug in getSkillVersionCount
- JSON.parse crash on container_config in group lookup
- Evolution skill modifications wrapped in transactions (atomic)
- checkParentCompletion converted to iterative (no stack overflow)
- Worker spawn failure now always marks task as failed
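Converting `checkParentCompletion` to a loop keeps a deep task tree from growing the call stack. The version below is hypothetical: it assumes a parent counts as complete once every child is `done` or `failed`, marks the parent `done` for simplicity (the real code may run synthesis first), and keeps a visited set so a corrupted parent chain cannot loop forever. The `TaskNode` shape is an assumption.

```typescript
interface TaskNode {
  id: string;
  parentId: string | null;
  status: "pending" | "running" | "done" | "failed";
}

// Walks up from a just-finished task with a while loop instead of recursion.
// Returns the IDs of parents that were completed along the way.
function checkParentCompletion(tasks: Map<string, TaskNode>, finishedId: string): string[] {
  const completed: string[] = [];
  const visited = new Set<string>();
  let current = tasks.get(finishedId);
  while (current && current.parentId !== null && !visited.has(current.id)) {
    visited.add(current.id);
    const parent = tasks.get(current.parentId);
    if (!parent) break;
    const children = Array.from(tasks.values()).filter((t) => t.parentId === parent.id);
    const allFinished = children.every((c) => c.status === "done" || c.status === "failed");
    if (!allFinished) break; // a sibling is still running; stop here
    parent.status = "done";
    completed.push(parent.id);
    current = parent;
  }
  return completed;
}
```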

High:
- Silent migration failures now log warnings
- getRootTaskId has cycle detection + iteration cap
- Dynamic SQL UPDATE field names whitelisted
- 30s timeout + 429 handling on evaluator/evolution API calls
- LLM response structure validated before use
- User content wrapped in code fences to mitigate prompt injection
- Streaming parse buffer capped at 10MB
- Worker synthesis callback has full error handling
- parentDepth read from DB not IPC message
- Wall entries verify group ownership
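Whitelisting field names for a dynamic `UPDATE` ensures that only known column names are ever interpolated into the SQL string, while values still go through `?` placeholders. This sketch targets the `worker_tasks` table named earlier in the PR; the allowed column set and `buildTaskUpdate` are assumptions.

```typescript
// Hypothetical allowlist; the PR's real column set is not shown.
const UPDATABLE_TASK_FIELDS = new Set(["status", "result", "completed_at"]);

// Builds a parameterized UPDATE, refusing any field not on the allowlist.
function buildTaskUpdate(
  id: string,
  updates: Record<string, unknown>,
): { sql: string; params: unknown[] } {
  const fields = Object.keys(updates);
  if (fields.length === 0) throw new Error("no fields to update");
  for (const field of fields) {
    if (!UPDATABLE_TASK_FIELDS.has(field)) {
      throw new Error(`field not allowed in UPDATE: ${field}`);
    }
  }
  const setClause = fields.map((f) => `${f} = ?`).join(", ");
  return {
    sql: `UPDATE worker_tasks SET ${setClause} WHERE id = ?`,
    params: [...fields.map((f) => updates[f]), id],
  };
}
```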

Medium:
- foreign_keys = ON enabled in SQLite
- MAX() on TEXT timestamps replaced with CASE WHEN
- Evolution candidate/rollback ops in transactions
- Missed selection logging uses skill ID not name
- IPC write failures logged (group-queue)
- tool_calls/dimensions parse errors logged
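Logging JSON parse failures instead of letting them throw (as described for `tool_calls`/`dimensions`, and for the earlier `container_config` crash) can be centralized in a small helper. `parseJsonOrDefault` is a hypothetical name, not a function from the PR, and the caller-supplied fallback is an assumption about how failures are handled.

```typescript
// Parses a JSON column, returning the fallback (and logging why) when the
// stored value is missing or malformed instead of crashing the caller.
function parseJsonOrDefault<T>(
  raw: string | null,
  fallback: T,
  context: string,
  log: (msg: string) => void = console.warn,
): T {
  if (raw === null || raw === "") return fallback;
  try {
    return JSON.parse(raw) as T;
  } catch (err) {
    log(`failed to parse ${context}: ${(err as Error).message}`);
    return fallback;
  }
}
```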

Low:
- Index on skill_task_runs.created_at
- Unused _ipcDir param removed
- Worker result capped at 500KB
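Capping the worker result should cut on a UTF-8 character boundary so the stored text stays valid. This sketch assumes "500KB" means 500 * 1024 bytes and that the cap applies to UTF-8 bytes rather than JavaScript string length; both are guesses, and `capWorkerResult` is a hypothetical name.

```typescript
const MAX_WORKER_RESULT_BYTES = 500 * 1024; // assumption: KiB, not 500,000

// Truncates a string to at most maxBytes of UTF-8 without splitting a character.
function capWorkerResult(
  result: string,
  maxBytes: number = MAX_WORKER_RESULT_BYTES,
): { text: string; truncated: boolean } {
  const bytes = new TextEncoder().encode(result);
  if (bytes.length <= maxBytes) return { text: result, truncated: false };
  let end = maxBytes;
  // bytes[end] is the first excluded byte; if it is a continuation byte
  // (10xxxxxx), step back so the whole character is dropped.
  while (end > 0 && (bytes[end] & 0xc0) === 0x80) end--;
  return { text: new TextDecoder().decode(bytes.subarray(0, end)), truncated: true };
}
```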

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Workers now participate in the full skill feedback loop:

- Each root task tree gets a worker rollout (rollout_type='worker',
  id=worker-{rootTaskId}), created idempotently when the first worker
  task is spawned
- Each worker task completion records a skill_task_runs row linked to
  the rollout (worker_task_id FK, root_outcome_score propagated later)
- After synthesis is sent, the synthesis is scored via claude-haiku and
  the score is propagated back to all contributing worker runs as
  root_outcome_score; the rollout is then closed
- Evaluator processes closed worker rollouts separately using a
  worker-specific rubric (task_completion, accuracy, efficiency,
  decomposition_quality, result_quality) via claude-haiku
- Evolution context now includes low-scoring worker task trees alongside
  conversation rollouts so skill evolution can address worker behavior

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Labels

None yet

Projects

None yet


2 participants