feat(runtime): multi-agent runtime by singlerider · Pull Request #6545 · zeroclaw-labs/zeroclaw

singlerider · 2026-05-09T06:30:28Z

Summary

Base branch: integration/v0.8.0 (multi-agent runtime is a v0.8.0-only landing).
What changed and why: Lands the multi-agent runtime end-to-end. Schema primitives (AliasedAgentConfig rename, [agents.<alias>.workspace] with path / access / unrestricted_filesystem / read_memory_from, [agents.<alias>.memory] with a MemoryBackendKind enum, [peer_groups.<name>] top-level map). Cross-reference validators on every reference. Per-backend agents table + agent_id migrations on SQLite, Postgres, and Lucid (Markdown / Qdrant / None per their idiomatic shapes), with agent_id promoted to NOT NULL REFERENCES agents(id): SQLite via table rebuild + FTS reindex inside an explicit BEGIN IMMEDIATE transaction with defer_foreign_keys = ON and PRAGMA foreign_keys = ON on every connection so the FK is enforced; Postgres via CHECK ... NOT VALID plus VALIDATE CONSTRAINT plus SET NOT NULL plus ADD CONSTRAINT ... NOT VALID plus VALIDATE CONSTRAINT so the metadata-only SET NOT NULL and the FK validation are both low-lock on populated tables. Per-backend schema_version metadata table. AgentScopedMemory + AgentScopedMarkdownMemory wrappers plumbed into Agent::from_config, cron's pre-prompt recall, and the post-failure session-purge cleanup. Audacity-review fixes that ship in this PR: the bare Memory::store path on every backend now attributes un-scoped writes to the synthesized default agent (Postgres COALESCE on insert, SQLite COALESCE in the existing store_with_agent, Qdrant unwrap_or("default") payload), so direct callers cannot trip the NOT NULL FK. recall_for_agents pushes the agent_id filter into the query layer on every backend (Postgres WHERE agent_id = ANY($n), SQLite WHERE agent_id IN (...) on the candidate-id fetch, Qdrant payload must filter on the search call), eliminating the over-fetch + post-filter pattern that left legacy NULL-agent_id rows visible to scoped callers. 
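The Postgres promotion sequence described above can be sketched as an ordered list of statements. This is a minimal illustration, not the PR's migration code: the function name and both constraint names are hypothetical, and only the `memories` / `agents(id)` / `agent_id` names come from the description. The ordering matters because, on recent PostgreSQL versions, `SET NOT NULL` can skip its full-table scan when an already-validated `CHECK (agent_id IS NOT NULL)` constraint exists.

```rust
/// Builds the ordered DDL that promotes `agent_id` to NOT NULL with a
/// foreign key, using the NOT VALID + VALIDATE pattern.
/// Assumption: the constraint names are made up for this sketch.
pub fn promote_agent_id_statements(table: &str) -> Vec<String> {
    let check = format!("{table}_agent_id_not_null");
    let fk = format!("{table}_agent_id_fkey");
    vec![
        // 1. Register the CHECK without scanning existing rows.
        format!(
            "ALTER TABLE {table} ADD CONSTRAINT {check} CHECK (agent_id IS NOT NULL) NOT VALID"
        ),
        // 2. Scan and validate the CHECK as a separate step.
        format!("ALTER TABLE {table} VALIDATE CONSTRAINT {check}"),
        // 3. With the validated CHECK in place, SET NOT NULL needs no rescan.
        format!("ALTER TABLE {table} ALTER COLUMN agent_id SET NOT NULL"),
        // 4. Register the foreign key without checking existing rows.
        format!(
            "ALTER TABLE {table} ADD CONSTRAINT {fk} FOREIGN KEY (agent_id) REFERENCES agents(id) NOT VALID"
        ),
        // 5. Validate the foreign key as a separate step.
        format!("ALTER TABLE {table} VALIDATE CONSTRAINT {fk}"),
    ]
}
```

Calling `promote_agent_id_statements("memories")` yields five statements in the order the summary lists them. Each would be executed in turn by whatever Postgres client the backend uses.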
Config::load_or_init resolves ZEROCLAW_CONFIG_DIR / ZEROCLAW_WORKSPACE BEFORE running the filesystem migration so custom installs migrate too, and config.workspace_dir now points at the migrated default-agent workspace (<install>/agents/default/workspace/) so legacy install-wide callers (cost::CostTracker, sop, skills, plugins::PluginHost, memory CLI) read the live agent dir instead of an orphaned legacy path. AccessMode::Write actually grants write-only access: a new allowed_roots_write_only tier on SecurityPolicy is honored by write-side tools and explicitly NOT consulted by is_resolved_path_readable, so file_read / pdf_read / glob_search / content_search refuse the path while file_write / file_edit / git_operations admit it. ensure_no_escalation_beyond validates the write-only tier as a SubAgent subset axis with a WriteOnlyRootNotInParent EscalationViolation variant. SubAgent runtime budget sharing: PerSenderTracker.buckets is Arc<Mutex<...>> so a SubAgent run with a caller-supplied policy override inherits the parent's live action / cost bucket; spawning a child no longer bypasses max_actions_per_hour or max_cost_per_day_cents. DelegateTool boundary validation: every delegate call now resolves the target's per-agent SecurityPolicy via SecurityPolicy::for_agent, validates it as a subset of the caller's via ensure_no_escalation_beyond, and assigns the caller's PerSenderTracker to the resolved policy so delegated runs share the caller's action / cost bucket. Targets whose configured risk profile or workspace access map would widen permissions beyond the caller surface a structured ToolResult failure at the delegate boundary instead of running. The hand-rolled chat / agentic loops still execute against the caller's parent_tools registry, which is a deliberate single-registry design; rebuilding the registry per delegation is a v0.8.1 follow-on the body does NOT promise here. 
Live peer delivery: SendMessageToPeerTool resolves agent-alias targets to in-process routing via agent::loop_::process_message (the bot identity is shared across agents on the same channel, so an outbound channel send would loop right back inbound and the self-loop guard would drop it; agent-to-agent messaging is process-internal by design); external peers continue through the channel registry's delivery handler. The agent-alias path is fire-and-forget (tokio::spawn detached): a success: true tool result means accepted-for-processing, not completed, and recipient-side errors only surface through the recipient's own observability and a sender-side tracing::warn!. Module docs and the success output string make this explicit. Per-agent memory backend factory (zeroclaw_memory::create_memory_for_agent). Filesystem migration from <install>/workspace/ to <install>/agents/default/workspace/ with timestamped backup + idempotent re-run. Per-agent identity-file load. SecurityPolicy::for_agent populating allowed_roots / allowed_roots_read_only / allowed_roots_write_only from [agents.<alias>.workspace.access] per AccessMode. file_read consults the new is_resolved_path_readable (read-only allowlist + POSIX device files like /dev/null). DelegateTool reads sub-agent system prompts from per-agent identity files instead of an AliasedAgentConfig.system_prompt string (field deleted). The SubAgent runtime with SubAgentOverrides and an expanded SecurityPolicy::ensure_no_escalation_beyond subset validator (autonomy, allowed_roots rw + ro + write-only with path-containment matching, allowed_commands, workspace_only, forbidden_paths in the parent ⊆ child direction, shell_env_passthrough, max_actions_per_hour, max_cost_per_day_cents, shell_timeout_secs, block_high_risk_commands, require_approval_for_medium_risk) wired through a new AgentRunOverrides parameter on agent::run so the validated child policy reaches the agent loop. 
The two-layer channel self-loop guard (SDK-side Channel::drop_self_messages + peers::should_drop_self_loop agent-loop fallback) with self_handle overrides on the four major inbound channels (Telegram via bot_username cache, IRC via configured nickname, Discord via token-decoded user_id, Slack via cached auth.test user_id) and an honest trait doc about the gap on the remaining inbound impls. Audit / trace agent_alias field. Console-formatter [<alias>] prefix with [system] fallback for boot / migration code paths. Peer-group runtime resolver with the strict outbound is_known_peer check + symmetric @-prefix and case normalization on both agent-peer and external-peer matching. Retires every legacy primitive the new design supersedes: [workspace] block, WorkspaceManager, WorkspaceTool, WorkspaceBoundary, MemoryNamespaceConfig, NamespacedMemory<M>, active_workspace.toml marker including its protected-config-path entries, src/hands/ module and the Hand* observability events / metrics (HandStarted / HandCompleted / HandFailed), the system_prompt config string on AliasedAgentConfig.
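The self-loop guard and the symmetric @-prefix / case normalization can be illustrated with a small standalone sketch. The function names and signatures here are assumptions made for illustration; the real `Channel::drop_self_messages`, `peers::should_drop_self_loop`, and `ResolvedPeers::is_known_peer` may take different arguments.

```rust
/// Normalizes a handle: trims whitespace, strips one leading '@', lowercases.
pub fn normalize_handle(raw: &str) -> String {
    let trimmed = raw.trim();
    trimmed.strip_prefix('@').unwrap_or(trimmed).to_lowercase()
}

/// Returns true when an inbound message was authored by the bot itself.
/// `self_handle` is `None` for channels that do not expose their own
/// identity; those fall through and rely on per-impl filtering.
pub fn is_self_message(self_handle: Option<&str>, sender: &str) -> bool {
    match self_handle {
        Some(me) => normalize_handle(me) == normalize_handle(sender),
        None => false,
    }
}

/// Strict outbound check: the target must match a configured peer after
/// the same normalization is applied to both sides.
pub fn is_known_peer(peers: &[&str], target: &str) -> bool {
    let wanted = normalize_handle(target);
    peers.iter().any(|peer| normalize_handle(peer) == wanted)
}
```

Applying one normalization function to both sides of every comparison is what keeps the matching symmetric: `@Alpha` in the config and `ALPHA` in a tool call resolve to the same peer.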
Scope boundary: Cross-backend cross-agent memory access is rejected at config-load (the schema validator already rejects mismatched-backend allowlist entries). Cross-backend memory sharing AND agent backend switching after creation AND agent rename are deferred to v0.8.1 per the issue's non-goals list. The [agents.<alias>].subagent_* config block that supplies caller-defined SubAgentOverrides is deferred to v0.8.1; the override type, the validator, AND the runtime wire-through (AgentRunOverrides) are in this PR. Does NOT update web frontend types or .po translations (deferred to @Audacity88's follow-up). Does NOT introduce a Postgres CREATE INDEX CONCURRENTLY step (documented as a post-deploy operator action; the SET NOT NULL and FK validation are low-lock via the NOT VALID + VALIDATE pattern). Does NOT add per-agent secret namespacing. Does NOT override Channel::self_handle on every inbound channel impl: the four highest-traffic ones (Telegram, IRC, Discord, Slack) are covered; other inbound channels (Matrix, Bluesky, Notion, Mochat, Linq, WeCom, QQ, Wati, ACP) keep the default None and rely on per-impl filtering until a v0.8.1 audit pass.
Blast radius: Schema is V3-additive: every new field is #[serde(default)], so existing 0.7.x configs deserialize unchanged through the V2→V3 path. The AliasedAgentConfig rename touches 23 files but is a pure s/Delegate/Aliased/ plus a doc catch-up commit. The agents table + agent_id column on memories lands as nullable + backfilled, so the SQLite / Postgres migration is idempotent and survives a re-run on an already-migrated install. The Postgres SET NOT NULL + FK promotion uses CHECK NOT VALID + VALIDATE to keep the locks short on populated production tables. The SQLite rebuild runs inside BEGIN IMMEDIATE so a crash mid-rebuild rolls back to the pre-migration shape (the timestamped backup is the secondary recovery path). The [workspace] and memory_namespace retirements drop fields off the schema but the V2→V3 migration silently strips them from incoming configs: no manual migration step required from operators. The MemoryEntry.agent_id field is additive with #[serde(default)], so external Memory implementors and any persisted MemoryEntry JSON deserialize unchanged. The self-loop guard sits on a default Channel trait method, so a forgotten channel impl can't silently leak; the four major inbound channels override it explicitly. Downstream Memory implementors: every backend now implements store_with_agent, recall_for_agents, and ensure_agent_uuid directly. Out-of-tree implementations must stub at minimum store_with_agent and recall_for_agents (the ensure_agent_uuid default returns the alias verbatim, which is correct for backends without UUID indirection). Subsystems touched: every memory backend, every channel impl (via the trait default), the cron scheduler, the audit-log emitter, and the runtime-trace emitter; the onboarding wizard loses its Workspace section and the per-agent memory_namespace step.
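For out-of-tree `Memory` implementors, the stubbing requirement above might look roughly like the following. This is a hedged sketch: the trait name `AgentAwareMemory`, the method signatures, and the `InMemoryBackend` type are all hypothetical stand-ins, since the real trait signatures are not shown in this description. Only the three method names, the verbatim-alias default, and the "default" attribution for un-scoped writes come from the text.

```rust
/// Hypothetical stand-in for the agent-aware part of the `Memory` trait.
pub trait AgentAwareMemory {
    fn store_with_agent(&mut self, agent_id: Option<&str>, key: &str, content: &str);
    fn recall_for_agents(&self, agent_ids: &[&str]) -> Vec<String>;

    /// Default: return the alias verbatim, which suits backends that have
    /// no UUID indirection.
    fn ensure_agent_uuid(&mut self, alias: &str) -> String {
        alias.to_string()
    }
}

struct Row {
    agent_id: String,
    key: String,
    content: String,
}

/// Toy backend that only stubs the two required methods.
#[derive(Default)]
pub struct InMemoryBackend {
    rows: Vec<Row>,
}

impl AgentAwareMemory for InMemoryBackend {
    fn store_with_agent(&mut self, agent_id: Option<&str>, key: &str, content: &str) {
        // Un-scoped writes are attributed to the synthesized default agent.
        let owner = agent_id.unwrap_or("default").to_string();
        // Upsert: drop any previous row with the same owner and key.
        self.rows.retain(|r| !(r.agent_id == owner && r.key == key));
        self.rows.push(Row {
            agent_id: owner,
            key: key.to_string(),
            content: content.to_string(),
        });
    }

    fn recall_for_agents(&self, agent_ids: &[&str]) -> Vec<String> {
        // Filter at the "query layer" rather than over-fetching and post-filtering.
        self.rows
            .iter()
            .filter(|r| agent_ids.contains(&r.agent_id.as_str()))
            .map(|r| r.content.clone())
            .collect()
    }
}
```

The backend never overrides `ensure_agent_uuid`, so it inherits the verbatim-alias default while still keeping rows from different agents apart.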
Linked issue(s): Closes #6272 (Multi-agent runtime: per-alias workspaces, permissions, and shared resources).

Validation Evidence (required)

cargo +nightly fmt --all -- --check
cargo clippy --workspace --exclude zeroclaw-desktop --all-targets --features ci-all -- -D warnings
cargo test --workspace --exclude zeroclaw-desktop --features ci-all

Commands run and tail output:
- cargo +nightly fmt --all -- --check: clean (no diff, exit 0).
- cargo clippy --workspace --exclude zeroclaw-desktop --all-targets --features ci-all -- -D warnings: Finished dev profile [unoptimized + debuginfo] target(s) in 2m 12s. Zero warnings under -D warnings.
- cargo test --workspace --exclude zeroclaw-desktop --features ci-all: every per-crate test bucket green. Notable green totals: 1656 zeroclaw-runtime + 615 zeroclaw-config + 307 zeroclaw-memory + 1127 zeroclaw-tools + 768 channels + 1653 hardware + 1125 plugins.

Tests added in this PR cover:
- SubAgentSpawn::{for_agent, build} (known-alias success, unknown-alias rejection, inherits-verbatim, escalating-policy rejection, narrowed-allowlist subset, parent action-budget inheritance under override)
- 16 axes on SecurityPolicy::ensure_no_escalation_beyond (rw-root, ro-root, write-only-root, command, workspace_only, max_actions, max_cost; identical-policy and narrowed-child accept paths; rw→ro downgrade accept; subpath narrowing inside parent root; autonomy escalation; dropped forbidden_paths entry; expanded shell_env_passthrough; higher shell_timeout_secs; disabled block_high_risk_commands; disabled require_approval_for_medium_risk)
- SpawnSubagentTool (empty/missing prompt rejection via structured ToolResult, unknown parent surfaces structured failure)
- SendMessageToPeerTool (non-peer rejection, channel-listener rejection, empty-arg rejection, external-peer normalization with peer-set-pass assertion)
- ResolvedPeers::is_known_peer
- peers::should_drop_self_loop
- Channel::drop_self_messages
- the new schema cross-reference validators
- the SQLite agents migration
- AgentScopedMemory (own-recall / sibling-allowlist / cross-agent isolation / caller-allowlist intersection / get cross-agent filter / forget cross-agent refusal / list attribution filtering / store_with_agent foreign-id refusal / purge_namespace refusal / purge_session bound-only)
- AgentScopedMarkdownMemory
- the filesystem migration
- is_resolved_path_readable write-only-root rejection + write-side admission
- glob_search symlink-into-write-only-root refusal
- content_search absolute-path-under-write-only-root refusal
- DelegateTool::policy_for_target (escalating-target rejection at delegate boundary, caller-tracker inheritance via shared PerSenderTracker bucket, root_config-absent fallback to caller's policy)
- SecurityPolicy::for_agent access-tier routing including the new write-only allowlist tier
- zeroclaw_runtime::peers::resolve_peer_set (mutual membership, external peers, ignore subtraction, allows_inbound external normalization, is_known_peer strict outbound, agent-peer @-prefix and case normalization)
- E2E in tests/system/multi_agent_e2e.rs (legacy upgrade with backup, two-agent isolated memory, peer-group routing with in-process delivery)

Tests deleted:
- every WorkspaceManager / WorkspaceTool / WorkspaceBoundary / NamespacedMemory<M> test (modules are gone)
- the t14e_memory_namespace_widening migration test
- the active_workspace marker tests
- the workspace_double_run_is_idempotent_on_disk onboard section test
- the section_has_signal_workspace_tracks_enabled_flag test
Beyond CI: what did you manually verify? Verified the agents table migration runs idempotently on a re-init (no duplicate row, no failed INSERT). Verified the cron JobType::Agent dispatch span shape includes parent_alias, run_id, spawn_site = "cron". Verified the spawn_subagent tool span shape includes the same fields with spawn_site = "tool". Verified that SubAgentSpawn::for_agent rejects an unknown parent alias with a structured failure that names the alias. Verified a real 0.7.x config.toml with a populated [workspace] block deserializes cleanly through V2→V3 with the legacy fields silently stripped. Verified that adding an [agents.researcher] block to config.toml causes the runtime to create <install>/agents/researcher/workspace/ and seed bootstrap identity files on first agent-loop entry; the agent then loads its identity files from the per-agent dir. Verified the V3 default-agent path on fresh install: a freshly initialized config now opens its SQLite memory at <install>/agents/default/workspace/memory/brain.db (the previously orphaned legacy path is no longer recreated). Verified AccessMode::Write semantics with a unit test that asserts is_resolved_path_allowed admits the path while is_resolved_path_readable refuses it. Did NOT verify a live multi-agent peer-group message exchange across a real Telegram channel (covered by the in-process E2E and the unit-tested resolver invariants).
If any command was intentionally skipped, why: None skipped.

Security & Privacy Impact (required)

New permissions, capabilities, or file system access scope? Yes. Adds the SubAgent surface (spawn_subagent tool + cron JobType::Agent routing). Both spawn sites funnel through SubAgentSpawn::for_agent(config, alias).build(SubAgentOverrides::default()) and inherit the parent agent's identity verbatim by default: same SecurityPolicy, same memory allowlist, same secret store. The SubAgentOverrides type ships in this PR. The validated context now reaches the agent loop via AgentRunOverrides { security, memory } on agent::run (previously discarded; both spawn sites now pass Some(subagent_ctx.policy.clone()) for the security side). The [agents.<alias>].subagent_* config block that plumbs caller-defined narrowing into SubAgentOverrides lands in v0.8.1. Any caller-supplied policy override is validated as a subset of the parent via SecurityPolicy::ensure_no_escalation_beyond, which now covers: autonomy, allowed_roots (rw + ro + write-only with path-containment matching), allowed_commands, workspace_only, forbidden_paths in the parent ⊆ child direction, shell_env_passthrough, max_actions_per_hour, max_cost_per_day_cents, shell_timeout_secs, block_high_risk_commands, and require_approval_for_medium_risk. UUID-set containment on the memory allowlist still applies. Both checks chain a precise EscalationViolation for diagnostics. SubAgent runtime budget: PerSenderTracker.buckets is Arc<Mutex<...>> so child runs taking a caller-supplied policy override inherit the parent's live action / cost bucket. A child cannot bypass the parent's max_actions_per_hour ceiling by spawning. 
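The budget-sharing claim above rests on the bucket map living behind an `Arc<Mutex<...>>`, so cloning the tracker into a child policy shares the counters instead of copying them. A minimal sketch, assuming hypothetical `TrackerSketch` and `BudgetPolicy` types and omitting the hourly time window entirely:

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical stand-in for `PerSenderTracker`: cloning shares the buckets.
#[derive(Clone, Default)]
pub struct TrackerSketch {
    buckets: Arc<Mutex<HashMap<String, u32>>>,
}

impl TrackerSketch {
    /// Records one action for `sender` if it is still under `max_actions`.
    pub fn try_record(&self, sender: &str, max_actions: u32) -> bool {
        let mut buckets = self.buckets.lock().expect("tracker mutex poisoned");
        let used = buckets.entry(sender.to_string()).or_insert(0);
        if *used >= max_actions {
            return false;
        }
        *used += 1;
        true
    }

    /// Number of actions recorded so far for `sender`.
    pub fn used(&self, sender: &str) -> u32 {
        let buckets = self.buckets.lock().expect("tracker mutex poisoned");
        buckets.get(sender).copied().unwrap_or(0)
    }
}

/// Hypothetical slice of a security policy: a ceiling plus the live tracker.
pub struct BudgetPolicy {
    pub max_actions_per_hour: u32,
    pub tracker: TrackerSketch,
}

impl BudgetPolicy {
    /// Builds a child policy that shares the parent's live bucket.
    pub fn child_with_limit(&self, max_actions_per_hour: u32) -> BudgetPolicy {
        BudgetPolicy {
            max_actions_per_hour,
            tracker: self.tracker.clone(), // Arc clone: same underlying map
        }
    }

    pub fn record_action(&self, sender: &str) -> bool {
        self.tracker.try_record(sender, self.max_actions_per_hour)
    }
}
```

With a ceiling of three, one action by the parent followed by two by the child exhausts the shared bucket, and further attempts from either side are refused.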
DelegateTool boundary: delegate now plumbs an optional Arc<Config> and resolves the target's SecurityPolicy per call via SecurityPolicy::for_agent; the resolved policy is validated as a subset of the caller's via ensure_no_escalation_beyond (rejecting any target whose risk profile or workspace access would widen rights beyond the caller) and inherits the caller's PerSenderTracker so delegated actions count against the caller's budget. Escalating targets surface a structured ToolResult failure instead of running. Also adds SendMessageToPeerTool (peer-set authorized outbound; agent-alias targets route in-process, external peers go to the channel registry). SQLite PRAGMA foreign_keys = ON is now enabled on every connection so the multi-agent FK is actually enforced (it was unenforced before — declarative only). AccessMode::Write now grants write-only access cleanly: a new allowed_roots_write_only tier on SecurityPolicy is honored by write-side path checks and explicitly NOT consulted by read-side path checks, so AccessMode::Write no longer silently lets the bot read what it's only meant to write.
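The separation between read-side and write-side path checks can be sketched as follows. The struct and method names are hypothetical simplifications (the PR's names are `SecurityPolicy`, `is_resolved_path_readable`, and `is_resolved_path_allowed`), and this sketch ignores forbidden paths, symlink resolution, and the POSIX device-file allowance.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical three-tier allowlist modeled on the description above.
#[derive(Default)]
pub struct TieredRoots {
    pub allowed_roots: Vec<PathBuf>,            // read + write
    pub allowed_roots_read_only: Vec<PathBuf>,  // read only
    pub allowed_roots_write_only: Vec<PathBuf>, // write only
}

/// Component-wise containment: `/a/b-other` is not under `/a/b`.
fn under_any(roots: &[PathBuf], path: &Path) -> bool {
    roots.iter().any(|root| path.starts_with(root))
}

impl TieredRoots {
    /// Read-side check: never consults the write-only tier.
    pub fn is_readable(&self, resolved: &Path) -> bool {
        under_any(&self.allowed_roots, resolved)
            || under_any(&self.allowed_roots_read_only, resolved)
    }

    /// Write-side check: never consults the read-only tier.
    pub fn is_writable(&self, resolved: &Path) -> bool {
        under_any(&self.allowed_roots, resolved)
            || under_any(&self.allowed_roots_write_only, resolved)
    }
}
```

A path under a write-only root passes `is_writable` and fails `is_readable`, which mirrors the behavior described for file_write versus file_read.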
New external network calls? No.
Secrets / tokens / credentials handling changed? No. Per-agent secret namespacing is deferred to v0.8.1 per the plan; the single workspace-wide SecretStore is unchanged in this PR. The retired [workspace].isolate_secrets flag was a no-op stand-in that the multi-workspace primitive never delivered on; its removal does not change behavior.
PII, real identities, or personal data in diff, tests, fixtures, or docs? No. New tests use placeholder aliases (alpha, beta, agent-uuid-alpha, agent-uuid-rogue).
If any Yes, describe the risk and mitigation: SubAgent runtime is a privilege-inheritance primitive; the risk is a child run obtaining rights the parent doesn't have. Mitigation: every spawn site funnels through SubAgentSpawn::for_agent(config, alias).build(overrides); build runs the expanded SecurityPolicy::ensure_no_escalation_beyond on any caller-supplied policy override AND copies the parent's PerSenderTracker into the child policy so action / cost budgets are shared, AND the validated context reaches the agent loop. The audit's privilege-touching surfaces are: SubAgent spawn (validator + runtime wire-through + budget inheritance all shipped), AgentScopedMemory (now post-filters every read by the bound + allowlisted set, refuses cross-agent forget / purge_namespace, and rejects store_with_agent calls that target a foreign agent_id rather than silently rewriting them), AccessMode::Write (write-only allowlist tier separates write grants from read grants).
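The subset rule behind the mitigation can be illustrated on three of the listed axes. This is a reduced sketch under stated assumptions: `PolicyAxes`, `Violation`, and `set_of` are hypothetical, and the real `ensure_no_escalation_beyond` covers more axes and returns an `EscalationViolation`. Note the direction flip on `forbidden_paths`: the child must keep every path the parent forbids.

```rust
use std::collections::BTreeSet;

/// Hypothetical, reduced violation type for this sketch.
#[derive(Debug, PartialEq, Eq)]
pub enum Violation {
    CommandNotInParent(String),
    ForbiddenPathDropped(String),
    MaxActionsRaised { parent: u32, child: u32 },
}

/// Hypothetical three-axis slice of a security policy.
#[derive(Clone, Default)]
pub struct PolicyAxes {
    pub allowed_commands: BTreeSet<String>,
    pub forbidden_paths: BTreeSet<String>,
    pub max_actions_per_hour: u32,
}

/// Small helper for building string sets in examples.
pub fn set_of(items: &[&str]) -> BTreeSet<String> {
    items.iter().map(|s| s.to_string()).collect()
}

impl PolicyAxes {
    /// Succeeds only when `self` (the child) grants nothing beyond `parent`.
    pub fn ensure_no_escalation_beyond(&self, parent: &PolicyAxes) -> Result<(), Violation> {
        // Grants: child ⊆ parent.
        if let Some(cmd) = self.allowed_commands.difference(&parent.allowed_commands).next() {
            return Err(Violation::CommandNotInParent(cmd.clone()));
        }
        // Denials: parent ⊆ child (the child may add, never drop).
        if let Some(path) = parent.forbidden_paths.difference(&self.forbidden_paths).next() {
            return Err(Violation::ForbiddenPathDropped(path.clone()));
        }
        // Numeric ceilings: the child's limit may not exceed the parent's.
        if self.max_actions_per_hour > parent.max_actions_per_hour {
            return Err(Violation::MaxActionsRaised {
                parent: parent.max_actions_per_hour,
                child: self.max_actions_per_hour,
            });
        }
        Ok(())
    }
}
```

Returning the first violation as a typed value rather than a string is what lets a spawn site report a precise diagnostic for the rejected override.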

Compatibility (required)

Backward compatible? Yes (with documented schema retirements). Every new field is #[serde(default)] (including the new optional MemoryEntry.agent_id). The agents table lands as a fresh CREATE; the agent_id column on memories lands nullable, backfilled to the synthesized default agent's UUID, and indexed. Re-running the migration on an already-migrated install is idempotent (now detected via PRAGMA table_info + PRAGMA foreign_key_list rather than substring-matching DDL). The retired schema surface ([workspace], memory_namespaces, per-agent memory_namespace, KnowledgeConfig.cross_workspace_search, the active_workspace.toml marker file) is silently stripped by the V2→V3 migration: operators do not need to edit their config.toml by hand.
Config / env / CLI surface changed? Yes (additions and retirements both). New: [agents.<alias>.workspace] with path / access / unrestricted_filesystem / read_memory_from; [agents.<alias>.memory] with a MemoryBackendKind enum; [peer_groups.<name>] top-level map; always-on agent-loop tools spawn_subagent and send_message_to_peer; agent_alias: Option<String> on audit-log and runtime-trace events. Retired: top-level [workspace] block (entire struct), [memory_namespaces.<alias>] map and the per-agent memory_namespace field that referenced it, KnowledgeConfig.cross_workspace_search field, OnboardSection::Workspace enum variant + the --workspace-only legacy CLI flag, the Hand* observability events and metrics.
If No or Yes to either: exact upgrade steps for existing users — none required. The V2→V3 migration handles the legacy fields silently. Operators relying on --workspace-only should update their scripts (the wizard now starts at providers); operators relying on the active_workspace.toml marker for switching profiles should set ZEROCLAW_CONFIG_DIR instead. Operators who set ZEROCLAW_CONFIG_DIR or ZEROCLAW_WORKSPACE on custom installs will now have their legacy <install>/workspace/ migrated to <install>/agents/default/workspace/ on first boot (previously the migration silently skipped non-default install roots).

Rollback (required for `risk: medium` and `risk: high`)

This PR is risk: high (touches schema, memory layer, security policy, the runtime spawn surface, and retires legacy primitives in flight).

Fast rollback command/path: git revert <merge-sha> on integration/v0.8.0. The branch is sequenced so reverting the merge cleanly unwinds both the additive surface and the retirements. The DB schema is forward-compatible: the agents table and the agent_id column on memories remain after a revert (operators can drop them manually if they want a clean back-out). The filesystem migration is operator-recoverable via the timestamped backup at <install>/backup-<timestamp>/legacy-workspace/.
Feature flags or config toggles: None. The new schema fields are #[serde(default)], so a revert silently drops them. The spawn_subagent and send_message_to_peer tools are registered unconditionally; if a runtime kill switch is required post-deploy it can be added in a one-line follow-up gating the registration on root_config.
Observable failure symptoms:
- subagent spawn failed: in the cron dispatch logs.
- subagent run failed: from the agent-loop tool.
- agents.<alias>.workspace.access[<i>] = ... validation errors at config load (a freshly deployed config has a self-reference or dangling alias the validator should have caught).
- subagent policy override escalates beyond parent: followed by an EscalationViolation discriminant when a caller-supplied override violates the validator's subset rules.
- AgentScopedMemory refuses ... when an agent loop tool tries a cross-agent operation the wrapper does not permit.
- peer-message in-process delivery failed from send_message_to_peer when the recipient agent's process_message fails.
- [system] filesystem migration failed (continuing with legacy layout) at boot (the migration is non-fatal; the install keeps running on the legacy layout while the operator investigates).

The audit-log agent_alias field showing up as null for system-level events (boot, scheduler ticks not bound to a specific agent) is expected.

Supersede Attribution (required only when `Supersedes #` is used)

Not applicable.

Pure-additive P1 of the v0.8.0 multi-agent runtime: introduces the type-level vocabulary later phases will consume when they wire into Config and the runtime.

- AgentAlias, PeerGroupName, PeerUsername newtypes follow the existing define_provider_ref! macro pattern from providers.rs. The macro is now #[macro_export] so multi-agent types reuse it without duplication.
- AccessMode enum carries cross-agent filesystem grants (Read, Write, ReadWrite). Schema-as-law: this is the single shape for granted modes. Absence of a key in the cross-agent access map remains the jailed default. Helpers allows_read / allows_write encode the capability check used by the upcoming SecurityPolicy r/rw split.
- PeerExternal: typed entry for non-agent peer-group members (humans, external bots).

No production call sites yet; tests cover serde round-trip, snake_case on the enum tag, capability predicates, and the [[peer_groups.<name>.external_peers]] array shape.

Refs zeroclaw-labs#6272.
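The AccessMode capability predicates and the jailed-by-absence default described in this commit can be sketched as follows. Serde derives are omitted to keep the example dependency-free. The `allows_read` / `allows_write` names come from the commit message, but their exact signatures, the `String` key type (the real map is keyed by AgentAlias), and the `can_read` / `can_write` helpers are assumptions.

```rust
use std::collections::BTreeMap;

/// Cross-agent filesystem grant, as named in the commit message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AccessMode {
    Read,
    Write,
    ReadWrite,
}

impl AccessMode {
    pub fn allows_read(self) -> bool {
        matches!(self, AccessMode::Read | AccessMode::ReadWrite)
    }

    pub fn allows_write(self) -> bool {
        matches!(self, AccessMode::Write | AccessMode::ReadWrite)
    }
}

/// Hypothetical helper: a missing key means no grant at all (jailed).
pub fn can_read(access: &BTreeMap<String, AccessMode>, target: &str) -> bool {
    access.get(target).map_or(false, |mode| mode.allows_read())
}

/// Hypothetical helper: a missing key means no grant at all (jailed).
pub fn can_write(access: &BTreeMap<String, AccessMode>, target: &str) -> bool {
    access.get(target).map_or(false, |mode| mode.allows_write())
}
```

Because absence is the only way to express "no access", the enum never needs a `None` variant, and a lookup miss and a denied capability collapse into the same `false`.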

…oclaw-labs#6272

The src/hands module was declared in lib.rs but had zero in-tree consumers. Its companion observability surface (HandStarted, HandCompleted, HandFailed events; HandRunDuration, HandFindingsCount, HandSuccessRate metrics; the Prometheus and OTel hand_* counters) fired only from in-test emit sites, never from production code.

P5 of the zeroclaw-labs#6272 plan reclaims the "Hand" identifier for the runtime-spawned sub-agent concept arriving in P10. Cleanly freeing the name means deleting the dead code, not renaming it.

Net 1136 lines red, zero green:
- src/hands/{mod,types}.rs and the lib.rs module declaration.
- ObserverEvent::HandStarted / HandCompleted / HandFailed.
- ObserverMetric::HandRunDuration / HandFindingsCount / HandSuccessRate.
- Prometheus IntCounterVec hand_runs / HistogramVec hand_duration / IntCounterVec hand_findings (registry registrations included).
- OTel u64_counter zeroclaw.hand.runs / f64_histogram zeroclaw.hand.duration / u64_counter zeroclaw.hand.findings.
- Match arms and test emit sites in log.rs, otel.rs, prometheus.rs, noop.rs, verbose.rs, traits.rs, and observability_traits.rs.

When the new Hand sub-agent lands in P10 it will introduce its own events with parent_alias, child_alias, and lifecycle fields; copying the old shape would have been wrong for those semantics.

Refs zeroclaw-labs#6272.

@Audacity88

…zeroclaw-labs#6272

P2a is the mechanical first half of the schema rework: rename the config struct that backs every [agents.<alias>] TOML block from the historical Delegate prefix (which originated with DelegateTool's sub-agent dispatch) to AliasedAgentConfig, which reflects what it actually is in the multi-agent runtime: a top-level user-facing aliased agent. Net 132 inserts and 132 deletions across 23 files. No behavior change; type and field names move only.

Touches:
- crates/zeroclaw-config/src/schema.rs (definition + Default + every reference in the validator)
- crates/zeroclaw-config/src/schema/v2.rs (V2 to V3 migration call sites that synthesize the default agent and propagate per-agent fields)
- crates/zeroclaw-runtime/src/tools/delegate.rs and the cron tool family (cron_add, cron_remove, cron_run, cron_runs, cron_update, schedule, mod)
- crates/zeroclaw-runtime/src/agent/agent.rs and tests
- crates/zeroclaw-runtime/src/cron/scheduler.rs and doctor/mod.rs
- crates/zeroclaw-providers, channels (orchestrator, acp_server, tts), tools/mod.rs
- crates/zeroclaw-gateway/src/api.rs (gateway must rename for compilation; @Audacity88's follow-up PR layers the user-facing surface on top)
- src/config/mod.rs and the component tests in tests/component/

P2b will add the nested [agents.<alias>.workspace] and [agents.<alias>.memory] blocks, the new top-level [peer_groups.<name>] map, and the slim RuntimeConfig that replaces the legacy [workspace] block.

Refs zeroclaw-labs#6272.

…fig rename

The mechanical type rename in the prior commit left several pieces of prose still saying "delegate agent" where the concept being described is the new AliasedAgentConfig. This commit catches them up so reviewers reading either rustdoc or tool output see the right vocabulary.

- crates/zeroclaw-config/src/schema.rs: header comment for the agents map field, the section banner above AliasedAgentConfig, the doc comment on Config::resolve_aliased_agent_for_alias-style helpers (model_provider_for_agent and the alias resolver), and the section header.
- crates/zeroclaw-tools/src/model_routing_config.rs: tool description, error message on remove_agent, JSON schema property descriptions for api_key / name / system_prompt / agentic. These flow into the LLM tool catalog the agent reads at runtime, so they show up in user-visible tool surfaces.
- crates/zeroclaw-runtime/locales/en/tools.ftl: English source string for the model-routing-config tool description. Translation work for the other locales is out of scope for this PR per Audacity88's follow-up agreement; the fluent fallback chain holds the line until those land.
- crates/zeroclaw-channels/src/orchestrator/mod.rs: internal rustdoc on the per-runtime aliased-agent config field.

No behavior change. Compiler-visible code stays identical; the diff is comments and string literals.

Refs zeroclaw-labs#6272.

…law-labs#6272

P2b lands the schema-as-law shapes for v0.8.0 multi-agent config. Pure-additive: every existing TOML config still loads identically; new fields default to jailed / SQLite / empty.

One new enum and three new structs in crate::multi_agent (already houses the alias newtypes and AccessMode from P1):
- MemoryBackendKind enum (None, Sqlite, Postgres, Qdrant, Markdown, Lucid). Closed set. Schema-is-law: every consumer-side dispatch on the backend goes through this enum, never a string match. The legacy Config.memory.backend dotted-alias string stays for now; P3 wires the enum into the resolver as the cliff approaches.
- AgentWorkspaceConfig: optional explicit path, cross-agent filesystem allowlist (BTreeMap<AgentAlias, AccessMode>), unrestricted_filesystem escape boolean, cross-agent memory allowlist (Vec<AgentAlias>). Default is fully jailed.
- AgentMemoryConfig: backend selection, locked at agent creation per the spec. Default is Sqlite.
- PeerGroupConfig: channel ref, member agents, external_peers, group-wide ignore list. Used by Config.peer_groups.

Wired into the schema:
- AliasedAgentConfig gains nested workspace and memory blocks ([agents.<alias>.workspace], [agents.<alias>.memory]).
- Config gains peer_groups: HashMap<String, PeerGroupConfig> ([peer_groups.<name>]). HashMap matches the existing named-collection convention (memory_namespaces, knowledge_bundles, mcp_bundles); the typed PeerGroupName is used at validation and resolution time.

Plumbing: HasPropKind impls in traits.rs for every new type the parent macros traverse, including BTreeMap<AgentAlias, AccessMode> (PropKind::Object). The shared define_provider_ref! macro grew PartialOrd/Ord derives so AgentAlias can be a BTreeMap key, which also benefits ChannelRef / ModelProviderRef / TtsProviderRef / TranscriptionProviderRef without touching them individually.

The original plan called for a slim top-level RuntimeConfig as the new install-level multi-agent struct, but Config.runtime is already a runtime-adapter config (kind: native|docker, etc.) and nothing in v0.8.0 needs install-level multi-agent toggles that isn't already covered. Dropped that piece; if a need surfaces in P10 or later, it lands as a field on the existing RuntimeConfig or its own struct.

Tests: 7 new unit tests on the new types covering serde round-trip on the array-of-tables forms, default-jailed semantics, the BTreeMap access map, snake_case enum tags, and channel dereference through ChannelRef.

Refs zeroclaw-labs#6272.

…s#6272 P3 wires the schema-as-law constraints from the locked plan into Config::validate(). Every shape that was implicit at the type level becomes a load-time error so misconfigured installs fail cheap instead of producing confusing runtime errors later. Per-agent (inside the agents loop): - workspace.access keys must NOT be self. An agent always has full access to its own workspace; a self-reference in the cross-agent allowlist is meaningless and is rejected as InvalidFormat. - workspace.access keys must point at configured agents. DanglingReference if not. - workspace.read_memory_from entries follow the same rules: no self-reference, no dangling alias, plus an additional same-backend constraint. Cross-backend memory sharing is deferred to v0.8.1, so an entry pointing at a sibling on a different MemoryBackendKind fails with InvalidFormat at config load time. The error is explicit about the deferral so the operator knows where to look in the changelog. Cross-agent (after the agents loop): - peer_groups.<X>.channel must not be empty (RequiredFieldEmpty). - peer_groups.<X>.agents entries must exist as configured agents (DanglingReference); the validator iterates a sorted list of group names so error ordering is stable across runs, matching the existing per-agent validator pattern at the top of the same function. - Each peer-group member's channels list must include the group's channel (InvalidFormat if mismatched). This catches the most common multi-agent misconfiguration: putting an agent in a group it cannot physically reach. Tests: 7 new unit tests in schema::tests, each builds a minimal valid Config via a multi_agent_test_config helper and mutates a single field to provoke one validator. 
Coverage:
- workspace.access self-reference rejected
- workspace.access dangling target rejected
- read_memory_from self-reference rejected
- read_memory_from cross-backend rejected with deferral note
- peer_group dangling member rejected
- peer_group member without the group's channel rejected
- valid two-member same-channel peer group accepts cleanly

Side cleanup from rustfmt: a handful of `pub use schema::{...}` re-export blocks (src/config/mod.rs and friends) get re-sorted into the right alphabetical position now that AliasedAgentConfig replaces DelegateAgentConfig from P2a. Purely cosmetic. Refs zeroclaw-labs#6272.
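
For illustration only, the two workspace.access rules from this commit (no self-reference, no dangling alias) can be sketched as below. The `Workspace` struct, the `validate_access` function, and the string-typed access mode are hypothetical simplifications; only the InvalidFormat / DanglingReference error names come from the commit message.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical error type; the variant names follow the commit message.
#[derive(Debug, PartialEq, Eq)]
pub enum ValidationError {
    InvalidFormat(String),
    DanglingReference(String),
}

/// Hypothetical, simplified stand-in for the per-agent workspace block.
#[derive(Default)]
pub struct Workspace {
    /// Alias of another agent -> access mode (kept as a plain string here).
    pub access: BTreeMap<String, String>,
}

/// Check every `workspace.access` key of every configured agent.
pub fn validate_access(agents: &BTreeMap<String, Workspace>) -> Result<(), ValidationError> {
    let known: BTreeSet<&str> = agents.keys().map(String::as_str).collect();
    // BTreeMap iteration is sorted, so the first reported error is stable across runs.
    for (alias, ws) in agents {
        for target in ws.access.keys() {
            if target == alias {
                return Err(ValidationError::InvalidFormat(format!(
                    "agents.{alias}.workspace.access must not reference itself"
                )));
            }
            if !known.contains(target.as_str()) {
                return Err(ValidationError::DanglingReference(format!(
                    "agents.{alias}.workspace.access references unknown agent {target}"
                )));
            }
        }
    }
    Ok(())
}
```

Returning on the first violation keeps the sketch short; a validator that collects every violation would work equally well under the same rules.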

(P6a) Adds the agents table and agent_id column on memories for the SQLite backend, with idempotent self-detection, atomic backup before destructive ALTERs on populated DBs, and a default-agent backfill so 0.7.x installs upgrade without data loss. The migration runs from SqliteMemory::with_embedder and ::new_named right after init_schema. Detection: if the agents table does not exist OR the memories.agent_id column is missing, the migration fires; otherwise it short-circuits as a true no-op. Schema (matches the locked plan in tmp/6272-multi-agent-plan.md): - agents: id TEXT PRIMARY KEY, alias TEXT NOT NULL UNIQUE, created_at TEXT NOT NULL. UUID stored as TEXT for cross-DB portability so the same code path works on SQLite, Postgres (P6b), and Lucid's local SQLite (P6c, automatic via SqliteMemory composition). - memories.agent_id TEXT, indexed. Left nullable at the SQLite layer because SQLite cannot add a NOT NULL FK column to an existing populated table without a full rebuild; the AgentScopedMemory<M> wrapper in P7 enforces non-null at write time by carrying the bound agent UUID and injecting it on every store. Nullability at the DB layer is the safe choice for the upgrade path: legacy rows backfill cleanly without rewrite, and the application never produces NULL after P7 lands. Atomic backup: when the memories table has rows AND the migration is about to fire, the SQLite file is copied to {db_name}.backup-{UTC_timestamp} alongside the original before the ALTER runs. A crashed migration leaves the operator with a recoverable copy. Skipped on fresh installs (no data to lose). Default agent: a UUID is generated (uuid::Uuid::new_v4) and INSERT OR IGNORE'd into agents. The post-INSERT re-query returns the row that actually persisted, so concurrent inits from different threads or processes converge on a single UUID. Tests: 4 integration tests in the new crates/zeroclaw-memory/tests/ directory (which the plan explicitly calls for). 
Coverage: fresh install, idempotent re-init, pre-migration data backfill with backup, post-migration store/recall round-trip. The pre-migration test seeds the legacy schema (memories + indices + FTS5 virtual table + triggers, no agents table, no agent_id) so the migration runs against a true upgrade scenario, not a hand-built half-schema. Postgres (P6b) and Lucid (P6c) follow in the next commits; Lucid's local DB is SqliteMemory underneath so it picks up this migration automatically. The wire-format work for cross-agent scoping in the external Lucid CLI stays deferred to v0.8.1 per the plan. Refs zeroclaw-labs#6272.

…#6272 (P6b) Mirrors the SQLite migration from P6a, but PG-flavored: idempotent via ADD COLUMN IF NOT EXISTS / CREATE INDEX IF NOT EXISTS / ON CONFLICT DO NOTHING, default-agent UUID generated in Rust and bound as a parameter, backfill in the same init pass via a parameterized UPDATE. Schema (matches SQLite path so cross-DB code stays one shape): - {schema}.agents: id TEXT PRIMARY KEY, alias TEXT NOT NULL UNIQUE, created_at TIMESTAMPTZ NOT NULL. - {qualified_table}.agent_id TEXT, indexed. Nullable at the DB layer matching SQLite; the AgentScopedMemory<M> wrapper in P7 enforces non-null at write time. Wired into PostgresMemory::initialize_client right after init_schema so every fresh connection is fully migrated before try_enable_pgvector runs (which can fail safely on pgvector absence). Backups: operator's responsibility for Postgres. The binary cannot reach across the network to dump a managed cluster, so we do not take a file-copy backup the way SQLite does. Documented. Concurrent first-init: INSERT...ON CONFLICT (alias) DO NOTHING plus a follow-up SELECT means concurrent initializers from different processes converge on the same default agent UUID (whichever insert wins is the persisted row). Tests: skipped in this commit. Existing Postgres tests in the crate are gated behind the memory-postgres feature and require a running Postgres for execution; CI does not provision one. Cross-DB parity tests for the migration land alongside any test-container support in a follow-up. The code path is a direct mirror of P6a's tested SQLite path so confidence is reasonable. Lucid (the third backend) wraps SqliteMemory for its local store, so P6a's migration runs automatically when Lucid initializes; no separate Lucid migration code is needed for v0.8.0. The external Lucid CLI wire format for cross-agent scoping stays deferred to v0.8.1 per the plan. 
V2->V3 migration extension: the existing synthesize_default_agent_if_needed helper in crates/zeroclaw-config/src/schema/v2.rs already inserts the default agent's config-side row; the DB-side migration creates the matching agents-table row independently. Both arrive at "there is a default agent" without coordination; runtime resolves the UUID by alias when AgentScopedMemory<M> is constructed in P7. Refs zeroclaw-labs#6272.

…e for zeroclaw-labs#6272 (P7) The trait gains two agent-aware methods, both with default implementations that fall back to existing behavior so backends opt in incrementally without breaking compilation: - store_with_agent(key, content, category, session_id, namespace, importance, agent_id) — extends store_with_metadata with the bound agent's UUID. Backends with native agent_id columns (SqliteMemory, PostgresMemory after P6) override to actually persist the attribution; the default falls back to store_with_metadata so non-aware backends stay correct. - recall_for_agents(allowed_agent_ids, query, limit, session_id, since, until) — narrows recall to a specific allowlist of agent UUIDs. The default falls back to recall(); backends with native columns override to add WHERE agent_id IN (...) at the SQL layer. The wrapper at crates/zeroclaw-memory/src/agent_scoped.rs is the canonical site for agent-identity enforcement: - AgentScopedMemory<M: Memory> holds Arc<M>, the bound agent's UUID, and the resolved allowlist (the set of sibling UUIDs computed from read_memory_from at config load). - Construction always includes the bound agent's UUID in the allowed set so callers do not need to remember to include themselves. - store / store_with_metadata route through store_with_agent with the bound agent's UUID, so the inner backend persists attribution on every write through the wrapper. - recall / recall_for_agents route through recall_for_agents with the wrapper's allowlist, so backends that override the trait method get the SQL-layer filter for free. - recall_for_agents accepts an explicit allowlist from a caller but intersects it with the wrapper's bound allowlist so an over-broad request from a tool cannot sneak past the construction-time policy. - forget routes to inner unchanged with a TODO marker; cross- agent delete protection lands when MemoryEntry plumbs agent_id in the read paths (P7 follow-up that ships alongside the SqliteMemory override of recall_for_agents). 
Tests cover construction (bound agent always in allowlist, siblings union with self), store/recall round-trip via the wrapper, and the intersection semantics on recall_for_agents with a rogue UUID. Backend-side SQL filtering is the next commit (P7 follow-up). The wrapper exposes name() identical to the inner backend so existing log lines and dashboards keep working; the wrapper's existence becomes visible only through the agent_alias tracing field bound at agent-loop entry (P12). Refs zeroclaw-labs#6272.
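
A minimal sketch of the allowlist semantics described for the wrapper (bound agent always included, caller-supplied allowlists intersected with the construction-time set). `ScopedAllowlist` is a hypothetical model that leaves out the inner backend and uses plain strings for agent UUIDs; it is not the AgentScopedMemory<M> type itself.

```rust
use std::collections::BTreeSet;

/// Hypothetical model of the wrapper's allowlist handling.
pub struct ScopedAllowlist {
    bound_agent: String,
    allowed: BTreeSet<String>,
}

impl ScopedAllowlist {
    /// The bound agent is always added, so callers never have to list themselves.
    pub fn new(bound_agent: &str, siblings: &[&str]) -> Self {
        let mut allowed: BTreeSet<String> = siblings.iter().map(|s| s.to_string()).collect();
        allowed.insert(bound_agent.to_string());
        Self { bound_agent: bound_agent.to_string(), allowed }
    }

    pub fn bound_agent(&self) -> &str {
        &self.bound_agent
    }

    /// Intersect a caller-supplied allowlist with the construction-time one,
    /// so an over-broad request cannot widen the policy.
    pub fn effective(&self, requested: &[&str]) -> BTreeSet<String> {
        requested
            .iter()
            .copied()
            .filter(|id| self.allowed.contains(*id))
            .map(str::to_string)
            .collect()
    }
}
```

With a wrapper bound to "agent-a" and one sibling "agent-b", a request for "agent-a", "agent-b", and an unknown id resolves to just the first two.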

…roclaw-labs#6272 (P8) Extends SecurityPolicy to carry a second allowlist for read-only roots alongside the existing read-write allowlist. The multi-agent runtime uses this to translate cross-agent AccessMode grants into filesystem policy: an AccessMode::Read entry on agent A's workspace.access map for agent B becomes a read-only entry for B's workspace path on A's policy; AccessMode::Write and AccessMode::ReadWrite become regular allowed_roots entries. Schema: - SecurityPolicy gains pub allowed_roots_read_only: Vec<PathBuf>. Default empty so existing single-agent installs and every RiskProfileConfig-only call site keep their current semantics. - RiskProfileConfig is unchanged. The read-only roots concept belongs to the per-agent workspace block, not the shared risk profile, so from_risk_profile leaves the new field empty. The multi-agent runtime populates it when it builds a per-agent policy from the workspace.access map (P10/P11 wiring). Methods: - is_under_allowed_root (existing) keeps strict read+write semantics. Write-side tools (file_write, git_operations, shell) call this; the doc comment now spells out the contract. - is_under_read_only_allowed_root (new) checks only the new list. - is_under_any_allowed_root (new) is the union over both lists. Read-side tools (file_read, glob_search, content_search) should call this so a cross-agent AccessMode::Read grant unblocks the read. - The shared root-matching logic moves into a private roots_contain helper to keep the rw and read-only check paths in lockstep. Tests: four new unit tests in policy::tests cover (1) the read-only check matching only its own list, (2) the union behavior of is_under_any_allowed_root, (3) is_under_allowed_root NOT seeing read-only entries (the write-side enforcement guarantee), and (4) from_risk_profile leaving the new field empty. 
Tools that consume allowed_roots stay on is_under_allowed_root for now; the swap to is_under_any_allowed_root for read paths and the population of allowed_roots_read_only from workspace.access land in the cliff (P4+P9) when the per-agent policy construction wires through the runtime. Refs zeroclaw-labs#6272.
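
The three checks and the shared helper can be sketched as follows. The method and field names are the ones listed in this commit; the struct is trimmed to the two root lists, and the matching rule (component-wise prefix match via `Path::starts_with`, no canonicalization) is an assumption, not necessarily what the real roots_contain does.

```rust
use std::path::{Path, PathBuf};

/// Trimmed-down policy holding only the two root lists.
#[derive(Default)]
pub struct SecurityPolicy {
    pub allowed_roots: Vec<PathBuf>,
    pub allowed_roots_read_only: Vec<PathBuf>,
}

impl SecurityPolicy {
    /// Shared matcher so the rw and read-only paths cannot drift apart.
    /// Assumption: a path matches when it sits under a root, compared by components.
    fn roots_contain(roots: &[PathBuf], path: &Path) -> bool {
        roots.iter().any(|root| path.starts_with(root))
    }

    /// Write-side check: only the read+write list counts.
    pub fn is_under_allowed_root(&self, path: &Path) -> bool {
        Self::roots_contain(&self.allowed_roots, path)
    }

    /// Checks only the read-only list.
    pub fn is_under_read_only_allowed_root(&self, path: &Path) -> bool {
        Self::roots_contain(&self.allowed_roots_read_only, path)
    }

    /// Read-side check: union over both lists.
    pub fn is_under_any_allowed_root(&self, path: &Path) -> bool {
        self.is_under_allowed_root(path) || self.is_under_read_only_allowed_root(path)
    }
}
```

Keeping is_under_allowed_root blind to the read-only list is what gives the write-side guarantee: a tool that only ever calls it cannot write into a root granted through AccessMode::Read.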

…-labs#6272 (P12) Adds the multi-agent attribution surface to two event streams the runtime emits without disturbing any existing call site: - RuntimeTraceEvent (crates/zeroclaw-runtime/src/observability/runtime_trace.rs) gains an `agent_alias: Option<String>` field with serde-default and skip_if_none. The existing `record_event(...)` function forwards to a new `record_event_with_agent(..., agent_alias, payload)` variant with `agent_alias = None`, so all 25 existing call sites compile unchanged. Sites that bind a per-agent alias post-P10 call the agent-aware variant directly. - AuditEvent (crates/zeroclaw-runtime/src/security/audit.rs) gains the same field plus a builder method `AuditEvent::with_agent_alias` so existing construction continues to work via Default-derived field initialization plus opt-in agent attribution. The `AuditEvent::new` constructor seeds the new field as None. Audit storage stays at <install>/audit/ globally; an agent delete does NOT remove its prior audit trail (per the locked plan). The new field lets queries reconstruct per-agent activity from a global trail after the fact. Console formatter prefix [<alias>] and otel/dora/prometheus alias labels are deferred. They wire into agent-loop entry binding sites (P10) so the alias has a value to plumb; this commit ships only the schema-side surface so P10 can populate without churning serialization formats again. Refs zeroclaw-labs#6272.
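
The forwarding pattern that keeps the existing call sites untouched might look roughly like this. The struct is a hypothetical, trimmed-down event and the explicit `sink` parameter is an assumption made to keep the sketch self-contained; the serde attributes mentioned above are omitted.

```rust
/// Hypothetical, trimmed-down trace event; the real struct carries more fields.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RuntimeTraceEvent {
    pub kind: String,
    pub agent_alias: Option<String>,
    pub payload: String,
}

/// Agent-aware variant: callers that know the alias pass it explicitly.
pub fn record_event_with_agent(
    sink: &mut Vec<RuntimeTraceEvent>,
    kind: &str,
    agent_alias: Option<&str>,
    payload: &str,
) {
    sink.push(RuntimeTraceEvent {
        kind: kind.to_string(),
        agent_alias: agent_alias.map(str::to_string),
        payload: payload.to_string(),
    });
}

/// Legacy entry point: same signature as before, forwards with no alias.
pub fn record_event(sink: &mut Vec<RuntimeTraceEvent>, kind: &str, payload: &str) {
    record_event_with_agent(sink, kind, None, payload);
}
```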

…#6272 (P11) Adds the cross-channel self-loop guard the multi-agent runtime needs: a bot must never respond to its own messages, even when a misconfigured peer group lists the bot's own handle as an external peer or when the same channel binding round-trips an outbound back through the inbound queue. Two trait additions on `zeroclaw_api::channel::Channel`: - `fn self_handle(&self) -> Option<String>`: each channel impl exposes its own bot handle (e.g. `@my_bot` for Telegram, the bot's user ID for Discord) when known. Default returns `None`, so adding the guard does not break any existing channel impl. Channels override as their identity becomes available at runtime. - `fn drop_self_messages(&self, msg: &ChannelMessage) -> bool`: default implementation does a case-insensitive comparison of `msg.sender` against `self_handle()`, normalising leading `@` so Telegram-style handles match regardless of which form the SDK delivers. Channels with non-string identity (numeric Discord IDs, Matrix MXIDs) get the same shape because the comparison is string-based and stable. Wired into the orchestrator's inbound path (`crates/zeroclaw-channels/src/orchestrator/mod.rs`, `process_channel_message`) right after `target_channel` resolution and before any downstream processing: if the channel reports the inbound is self-authored, we drop it with a debug-level trace and return. Two-layer defense: a future agent-loop fallback (P11 follow-up) compares against the agent's own outbound queue, so channels that have not yet implemented `self_handle` still get caught at a second layer. Tests: four new unit tests on the trait default cover (1) `None` handle returns false (no guard fires on un-identified channels), (2) exact handle match, (3) `@` prefix and case-insensitive normalisation, and (4) an empty/`@`-only handle does not match every inbound (guard only fires on real handles). Channel implementations stay opt-in. 
Override `self_handle` per channel as the platform's authentication path exposes the bot's identity. The orchestrator-side check is the single point where the guard fires, so a missed channel implementation degrades gracefully (no guard at SDK layer, fallback at agent loop). Refs zeroclaw-labs#6272.
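
A sketch of the two trait defaults, assuming a `ChannelMessage` reduced to its `sender` field. The `normalize` helper and the two example channel types are hypothetical and exist only to show the default and an override side by side.

```rust
/// Reduced message type; only the sender matters for the guard.
pub struct ChannelMessage {
    pub sender: String,
}

/// Hypothetical helper: strip leading '@' and lowercase for comparison.
fn normalize(handle: &str) -> String {
    handle.trim_start_matches('@').to_lowercase()
}

pub trait Channel {
    /// Default: identity unknown, so the guard never fires.
    fn self_handle(&self) -> Option<String> {
        None
    }

    /// True when the inbound message was authored by this bot.
    fn drop_self_messages(&self, msg: &ChannelMessage) -> bool {
        match self.self_handle() {
            None => false,
            Some(handle) => {
                let own = normalize(&handle);
                // An empty or '@'-only handle must not match every inbound.
                !own.is_empty() && own == normalize(&msg.sender)
            }
        }
    }
}

/// Example channel that has not implemented `self_handle` yet.
pub struct UnidentifiedChannel;
impl Channel for UnidentifiedChannel {}

/// Example channel that knows its own handle.
pub struct NamedBot(pub &'static str);
impl Channel for NamedBot {
    fn self_handle(&self) -> Option<String> {
        Some(self.0.to_string())
    }
}
```

Here `NamedBot("@my_bot")` drops a message whose sender is `My_Bot`, while `UnidentifiedChannel` lets every message through and relies on the second layer described above.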

…law-labs#6272 (P10 prep) Adds the subset validator the SubAgent spawn path will call to reject any override that escalates beyond the parent agent's permissions. Purely additive on SecurityPolicy.

Subset rules (a child policy is allowed iff all hold against the parent):
- allowed_roots: every entry on child must appear on parent's allowed_roots (no widening of read+write scope).
- allowed_roots_read_only: every entry on child must appear on parent's allowed_roots OR parent's allowed_roots_read_only. A SubAgent can downgrade a parent's rw root to read-only on itself; it cannot fabricate read access to a path the parent could not even read.
- allowed_commands: every entry on child must appear on parent's allowed_commands.
- workspace_only: child must be true whenever parent is true. A SubAgent cannot disable workspace_only that the parent enforces.
- max_actions_per_hour: child <= parent.
- max_cost_per_day_cents: child <= parent.

The EscalationViolation enum names each violation kind so callers can produce precise errors. It implements Display and Error, so it integrates with the existing anyhow::Error chains in the runtime.

Tests: 9 unit tests cover the accept paths (identical, narrowed, rw-root-downgraded-to-read-only) and the reject paths for every EscalationViolation variant.

This is the foundation for the SubAgent spawn validator that lands in P10. The SubAgent runtime will:
1. Build a candidate child policy from override fields.
2. Call child.ensure_no_escalation_beyond(&parent). The receiver is the child; calling it on the parent would invert the check.
3. Reject the spawn on Err with the violation chained for user-facing diagnostics.

Refs zeroclaw-labs#6272.
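
The subset rules can be sketched as a single method on the child policy. The field names and ensure_no_escalation_beyond come from this commit; the variant names on EscalationViolation, the integer widths, and the exact-match comparison of roots are assumptions made for illustration.

```rust
use std::fmt;
use std::path::PathBuf;

/// Variant names are hypothetical; the commit only says each rule has its own kind.
#[derive(Debug, PartialEq, Eq)]
pub enum EscalationViolation {
    AllowedRoot(PathBuf),
    ReadOnlyRoot(PathBuf),
    AllowedCommand(String),
    WorkspaceOnlyDisabled,
    MaxActionsPerHour { child: u32, parent: u32 },
    MaxCostPerDayCents { child: u32, parent: u32 },
}

impl fmt::Display for EscalationViolation {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "sub-agent policy escalates beyond parent: {:?}", self)
    }
}

impl std::error::Error for EscalationViolation {}

/// Trimmed-down policy with only the fields the subset rules mention.
#[derive(Clone, Default)]
pub struct SecurityPolicy {
    pub allowed_roots: Vec<PathBuf>,
    pub allowed_roots_read_only: Vec<PathBuf>,
    pub allowed_commands: Vec<String>,
    pub workspace_only: bool,
    pub max_actions_per_hour: u32,
    pub max_cost_per_day_cents: u32,
}

impl SecurityPolicy {
    /// Called on the child: Ok only if the child is a subset of `parent`.
    pub fn ensure_no_escalation_beyond(
        &self,
        parent: &SecurityPolicy,
    ) -> Result<(), EscalationViolation> {
        for root in &self.allowed_roots {
            if !parent.allowed_roots.contains(root) {
                return Err(EscalationViolation::AllowedRoot(root.clone()));
            }
        }
        for root in &self.allowed_roots_read_only {
            // A parent rw root may be downgraded to read-only on the child.
            let readable = parent.allowed_roots.contains(root)
                || parent.allowed_roots_read_only.contains(root);
            if !readable {
                return Err(EscalationViolation::ReadOnlyRoot(root.clone()));
            }
        }
        for cmd in &self.allowed_commands {
            if !parent.allowed_commands.contains(cmd) {
                return Err(EscalationViolation::AllowedCommand(cmd.clone()));
            }
        }
        if parent.workspace_only && !self.workspace_only {
            return Err(EscalationViolation::WorkspaceOnlyDisabled);
        }
        if self.max_actions_per_hour > parent.max_actions_per_hour {
            return Err(EscalationViolation::MaxActionsPerHour {
                child: self.max_actions_per_hour,
                parent: parent.max_actions_per_hour,
            });
        }
        if self.max_cost_per_day_cents > parent.max_cost_per_day_cents {
            return Err(EscalationViolation::MaxCostPerDayCents {
                child: self.max_cost_per_day_cents,
                parent: parent.max_cost_per_day_cents,
            });
        }
        Ok(())
    }
}
```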

Type-level scaffolding for runtime-spawned ephemeral sub-agents that inherit a parent agent's identity, security policy, and memory allowlist by default and may only narrow via explicit overrides. Module surface in crates/zeroclaw-runtime/src/subagent/mod.rs: - SubAgentOverrides: optional policy + allowed_agent_ids narrowing (None on every field = inherit parent verbatim). - SubAgentContext: bound parent agent_id, validated child policy (Arc<SecurityPolicy>), resolved memory allowlist. - SubAgentSpawn::build(overrides): runs the inheritance validator against the parent. Policy overrides flow through SecurityPolicy::ensure_no_escalation_beyond from P10-prep, with the EscalationViolation chained via anyhow so callers surface the precise rule that fired. Allowlist overrides reject any UUID not on the parent's allowlist; the parent's bound agent_id is always re-included so a SubAgent can always recall its own memories. Five unit tests cover the contract: - default_overrides_inherit_parent_verbatim - policy_override_that_is_subset_is_accepted_and_narrows - policy_override_that_escalates_is_rejected_with_violation_chained - allowlist_override_subset_is_accepted_and_always_includes_self - allowlist_override_with_rogue_uuid_is_rejected Also fills in agent_alias: None at the two RuntimeTraceEvent test construction sites that pre-dated the P12 alias field on the struct. P10b (cron JobType::Agent dispatch routes through SubAgentSpawn) and P10c (spawn_subagent agent-loop tool) follow once the cliff lands.
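
The allowlist half of the inheritance validator can be sketched as a free function. `resolve_allowlist`, its string-typed agent ids, and the String error are hypothetical; the behaviour (inherit on None, reject any id outside the parent's allowlist, always re-include the parent's own id) follows the description above.

```rust
use std::collections::BTreeSet;

/// Resolve a SubAgent's memory allowlist from the parent's and an optional override.
pub fn resolve_allowlist(
    parent_id: &str,
    parent_allowlist: &BTreeSet<String>,
    override_ids: Option<&[&str]>,
) -> Result<BTreeSet<String>, String> {
    let mut resolved = match override_ids {
        // No override: inherit the parent's allowlist verbatim.
        None => parent_allowlist.clone(),
        Some(ids) => {
            let mut narrowed = BTreeSet::new();
            for id in ids {
                // Unlike a silent intersection, a rogue id fails the whole spawn.
                if *id != parent_id && !parent_allowlist.contains(*id) {
                    return Err(format!("agent id {id} is not on the parent's allowlist"));
                }
                narrowed.insert(id.to_string());
            }
            narrowed
        }
    };
    // The parent's own id is always present so the SubAgent can recall its own memories.
    resolved.insert(parent_id.to_string());
    Ok(resolved)
}
```

Rejecting instead of intersecting means a caller that asks for too much gets an error it can surface, not a quietly narrowed result.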

…claw-labs#6272 P10b) The cron scheduler's JobType::Agent dispatch now constructs the run as a SubAgent of the owning agent rather than as an ad-hoc agent invocation. Cron is one of two SubAgent spawn sites in v0.8.0; the other is the spawn_subagent agent-loop tool (P10c). Both funnel through SubAgentSpawn::build so permission inheritance, tracing span shape, and audit attribution stay uniform across spawn sites. What changed in run_agent_job: - Build SubAgentSpawn::for_agent(config, agent_alias).build(default overrides) before the security pre-flight. Spawn failures (no such agent, security-policy resolution failure) short-circuit the run with an explicit subagent-spawn error so cron logs distinguish inheritance failures from security blocks. - Wrap the agent::run call in a tracing span with parent_alias / run_id / spawn_site = "cron" fields. The run_id is the existing cron run-session UUID, so memory snapshots and span events correlate without bookkeeping. P12's structured-label emitters pick the parent_alias up automatically. Default SubAgentOverrides means "inherit verbatim" — the cron job always runs with the owning agent's policy and memory allowlist. Cron has no UI for narrowing today; if a future revision wants per-job narrowing, it constructs a non-default SubAgentOverrides and lets the build-time validator reject any escalation. New SubAgentSpawn::for_agent constructor in subagent/mod.rs resolves the parent identity from a Config + alias: the agent's [agents.<alias>.workspace.read_memory_from] becomes the parent's allowlist (the bound alias is always re-included), and the policy is SecurityPolicy::for_agent. Two new tests cover the resolution path and the unknown-alias error case, bringing the SubAgent test count to seven. P10c (the agent-loop spawn_subagent tool) follows in a separate commit; it uses the same for_agent constructor with caller-supplied overrides.

The second SubAgent spawn site lands as an always-on agent-loop tool that lets a parent agent fork a focused subtask under its own identity. Cron's JobType::Agent dispatch (P10b) was the first spawn site; both funnel through SubAgentSpawn::build so permission inheritance, tracing-span shape, and audit attribution stay uniform. Tool surface: - name: spawn_subagent - args: { prompt: string } - behavior: validate the spawn via SubAgentSpawn::for_agent against the parent's identity, build a SubAgentContext under default (inherit-verbatim) overrides, run the agent loop on the supplied prompt under the parent's alias inside a tracing::info_span!("subagent", parent_alias, run_id, spawn_site = "tool"), and return the response. - failures: spawn-validator failures (unknown alias, security-policy resolution) and agent-run failures both come back as structured ToolResult { success: false, error: ... } rather than panics, so the agent loop sees them as recoverable tool errors. The narrowing-override path (sub-agents that drop privileges below the parent's) is deferred to v0.8.1 along with the [agents.<alias>].subagent_* config block. The spawn validator already supports it via SubAgentOverrides — adding the surface later is purely additive. Wiring in tools/mod.rs: - pub mod spawn_subagent; (alphabetical, between sop_status and verifiable_intent). - pub use SpawnSubagentTool re-export. - Always-on registration in all_tools_with_runtime, next to ScheduleTool. - Listed in BUILTIN_TOOL_INTEGRATIONS so the integrations panel surfaces it. Recursion bounding is left to existing per-run guardrails: each SubAgent run is a full agent loop, capped by the runtime profile's max_iterations and the SecurityPolicy action/cost budgets. A future revision can add explicit depth tracking via a tokio task-local if operational data shows the soft caps are insufficient. 
Four unit tests:
- tool_name_and_schema_are_well_formed
- missing_prompt_is_rejected
- empty_prompt_is_rejected
- unknown_parent_alias_surfaces_spawn_failure (verifies the spawn validator's Err is structured into a ToolResult, never panics or attempts a recursive run)

Live agent-loop integration is exercised by the existing JobType::Agent end-to-end paths (P10b): both spawn sites share the same downstream agent::run wiring, so the cron tests cover the shared post-spawn flow.

…es (zeroclaw-labs#6272) The WorkspaceTool is the agent-callable wrapper around WorkspaceManager that lets a model switch between multi-workspace profiles at runtime (active_workspace, workspaces_dir, etc.). Per-agent workspaces under [agents.<alias>.workspace] obsolete the entire multi-workspace-profile primitive: each agent has its own jailed workspace inherently, with no need for a tool to switch between them. Deleted: - crates/zeroclaw-tools/src/workspace_tool.rs (the tool body + tests) - src/tools/workspace_tool.rs (orphan re-export shim, never declared by src/tools/mod.rs and so never compiled — pure dead bytes on disk) - pub mod workspace_tool; in crates/zeroclaw-tools/src/lib.rs - pub use ::WorkspaceTool re-export in tools/mod.rs - The conditional registration block in all_tools_with_runtime gated on root_config.workspace.enabled (16 LoC of "build a WorkspaceManager from a path string and wrap it in a tool") Net -373 LoC. The legacy [workspace] block parsing it consulted is retired separately in the same PR; this commit removes the leaf consumer first so the root deletion has nothing pointing at it.

…icy::for_agent (zeroclaw-labs#6272) WorkspaceBoundary + BoundaryVerdict were the per-tool/per-domain/per-path gate that consulted the active multi-workspace profile to deny tool access outside the profile's allowlists. With per-agent SecurityPolicy construction (SecurityPolicy::for_agent in crates/zeroclaw-config/src/policy.rs), the same enforcement happens at the policy layer for every tool that already consults the policy — there is no need for a parallel boundary type. The module was already dead before this commit: the only #[allow(unused_imports)] pub use in security/mod.rs had zero external callers. Deleting the file removes 211 LoC including its 7 tests, plus the module declaration and the re-export. Net -213 LoC. The module's last live consumer (the WorkspaceTool that constructed it) was deleted in the previous commit.

…abs#6272) WorkspaceManager is the multi-workspace-profile primitive: list, create, switch, export. Its sole production consumer was the WorkspaceTool deleted in the previous commit; the tests in this module exercise the manager directly and have no value once the manager itself is gone. Deleted: - crates/zeroclaw-config/src/workspace.rs (WorkspaceManager, WorkspaceProfile, all helpers, all tests) - src/config/workspace.rs (re-export shim) - pub mod workspace; in both lib.rs files Net -384 LoC. The legacy [workspace] block on Config still references WorkspaceConfig (a struct of multi-workspace-profile flags); that struct + its onboarding flow + the active_workspace.toml marker machinery come out in the next commit.

…ker (zeroclaw-labs#6272) Last leg of the multi-workspace-profile retirement: the [workspace] config block, the on-disk active_workspace.toml marker mechanism, and the entire onboarding section that drives them. The previous three commits killed the consumers (WorkspaceTool, WorkspaceBoundary, WorkspaceManager); this one removes the schema field and the config-resolution chain step that fed them. Schema deletions in crates/zeroclaw-config/src/schema.rs: - WorkspaceConfig struct (enabled / active_workspace / workspaces_dir / isolate_memory / isolate_secrets / isolate_audit / cross_workspace_search) + Default impl + default_workspaces_dir helper - pub workspace: WorkspaceConfig field on Config - 3 workspace: WorkspaceConfig::default() construction sites in the three Config::default() / fixture paths - ACTIVE_WORKSPACE_STATE_FILE const + ActiveWorkspaceState struct + active_workspace_state_path / load_persisted_workspace_dirs / persist_active_workspace_config_dir / _in helpers (~125 LoC of marker-write/load plumbing) - ConfigResolutionSource::ActiveWorkspaceMarker variant + the load_persisted_workspace_dirs branch in resolve_runtime_config_dirs (the env-var resolver chain is now ZEROCLAW_CONFIG_DIR -> ZEROCLAW_WORKSPACE -> default, with no marker step) - KnowledgeConfig.cross_workspace_search field + default (knowledge graph's cross-workspace search axis is meaningless under a single-workspace install) - Three tests of the marker mechanism: resolve_runtime_config_dirs _uses_active_workspace_marker, load_or_init_uses_persisted_active _workspace_marker, persist_active_workspace_marker_is_cleared_for _default_config_dir; trimmed marker scenery from resolve_runtime_config_dirs_uses_env_config_dir_first since the test's actual claim (env wins) doesn't need the marker as a foil Onboard deletions in crates/zeroclaw-runtime/src/onboard/mod.rs: - Section::Workspace enum variant + the as_path_prefix and from_path match arms - The workspace() async section walker - 
Section::Workspace dispatch arms in run() and the 0=> arm in run_all (subsequent indices renumbered) - The cfg.workspace.enabled arm in section_has_signal - section_has_signal_workspace_tracks_enabled_flag test (testing a retired field) and workspace_double_run_is_idempotent_on_disk test (testing the retired section's flow); other helper-validation tests switched from "workspace" to "memory" since they exercise generic helpers (mark_completed, skip_if_configured) Binary-side deletions in src/main.rs: - OnboardSection::Workspace enum variant - workspace_only CLI flag (under hide=true) + its threading through the Onboard command handler - resolve_onboard_target's 7th parameter + 3 unit tests adjusted for the new arity - Removed src/config/mod.rs's pub use of the deleted WorkspaceConfig Net -571 LoC vs +39 LoC shim/comment updates. Combined with the prior three commits in this PR (WorkspaceTool, WorkspaceBoundary, WorkspaceManager) the multi-workspace-profile primitive is gone in its entirety. Per-agent workspaces under [agents.<alias>.workspace] serve the isolation use case the legacy block was reaching for, with type-level enforcement instead of TOML-flag inertia.

…abs#6272) Per-agent memory under [agents.<alias>.memory] supersedes the string-namespace primitive. Killing the parallel structure in one sweep: Schema deletions in crates/zeroclaw-config/src/schema.rs: - MemoryNamespaceConfig struct (namespace / backend / retention_days / read_only / pinned_categories) - pub memory_namespaces: HashMap<String, MemoryNamespaceConfig> field on Config - pub memory_namespace: String field on AliasedAgentConfig - 3 default-construction sites for memory_namespaces and 3 for the per-agent memory_namespace field - The "memory-namespaces" / "memory-namespace" entry in the cross-reference dangling-alias validator (the validator now only enforces risk-profile / runtime-profile references, which is what remains on AliasedAgentConfig) V2->V3 migration deletions in crates/zeroclaw-config/src/schema/v2.rs: - T14e widening block and the ensure_memory_namespace synthesis helper. The new V2->V3 path drops the V2 memory_namespace key off agent tables silently. Doc comment updated. 
Test deletions in crates/zeroclaw-config/tests/migration.rs: - t14e_memory_namespace_widening (the field it asserted on no longer exists) Memory crate deletions: - crates/zeroclaw-memory/src/namespaced.rs in its entirety (232 LoC, NamespacedMemory<M> wrapper + 6 tests) - pub mod namespaced; + pub use namespaced::NamespacedMemory in crates/zeroclaw-memory/src/lib.rs DelegateTool deletions in crates/zeroclaw-runtime/src/tools/delegate.rs: - memory_namespaces: Arc<HashMap<String, MemoryNamespaceConfig>> field on the struct - 4 default-construction sites for that field (new / new_with_options / with_depth / with_depth_and_options) and 2 Arc::clone sites in the tokio::spawn background paths - with_memory_namespaces builder method - resolve_memory_ns helper (the namespace-alias-to-string resolver) - get_agent_memory helper (was already #[allow(dead_code)] WIP — gone along with NamespacedMemory; the struct's `memory: Option<...>` field stays for the future per-agent plumbing) - MemoryNamespaceConfig and NamespacedMemory imports Caller deletions: - crates/zeroclaw-runtime/src/tools/mod.rs: .with_memory_namespaces(...) builder call on the DelegateTool registration - crates/zeroclaw-runtime/src/onboard/mod.rs: the memory_namespace agent-form prompt (step 9 in the per-agent walk) and the memory_aliases lookup - crates/zeroclaw-gateway/src/api_onboard.rs: memory_namespaces field on AgentOptionsResponse, the get_map_keys("memory_namespaces") feed, the "workspace" entries in the section-help / section-group / picker routing (now consistent with the legacy [workspace] retirement that landed earlier in this PR), and a fixture-test count update Net -404 LoC vs +23 LoC. The 23 lines added are the two-paragraph T14e doc-comment rewrite explaining the V2->V3 migration drop, and short comments where lookups got shorter. 
Per-agent memory backend selection lives at [agents.<alias>.memory] (MemoryBackendKind enum, immutable after agent creation), and cross-agent memory access flows through [agents.<alias>.workspace.read_memory_from] + AgentScopedMemory<M> (landed earlier in this PR). String-tagged namespaces no longer have a place in the architecture.

…sion methods (zeroclaw-labs#6272)

The wrapper and the two `Memory` trait extension methods landed earlier in this branch as scaffolding for a per-agent memory plumbing that does not actually plug into the runtime in v0.8.0:

- `Agent::from_config` still hands a raw `Arc<dyn Memory>` to the agent loop; nothing constructs an `AgentScopedMemory<M>` on a live code path.
- The trait methods `Memory::store_with_agent` and `Memory::recall_for_agents` shipped with default forwarders that silently dropped the `agent_id` parameter. No backend (Sqlite, Postgres, Lucid, Markdown, Qdrant, None) overrode them, so the agent_id was unused at every layer (`_agent_id: Option<&str>` in the default impl is exactly the kind of suppress-the-warning scaffolding the project rule against `_`-prefixed unused params forbids).

Pulling the scaffolding out of v0.8.0 and leaving the schema + storage foundation (which IS load-bearing) in place:

- Deleted `crates/zeroclaw-memory/src/agent_scoped.rs` (the wrapper + 4 unit tests).
- Deleted `pub mod agent_scoped;` in the memory lib.rs.
- Deleted `Memory::store_with_agent` and `Memory::recall_for_agents` from `crates/zeroclaw-api/src/memory_traits.rs` along with their default impls and doc comments.
- Updated the doc comments in `sqlite.rs`, `multi_agent.rs`, and `schema.rs` that referenced the wrapper to point at v0.8.1 as the landing target for the per-agent memory plumbing instead.

What stays (and is still load-bearing):

- The `agents` table on SQLite + Postgres with the synthesized `default` agent row (commits 7e19c44 and 6ff29e4).
- The nullable `agent_id TEXT` column on `memories`, backfilled to the default agent's UUID, with its index. Existing rows are attributable; new rows from this branch are still agent-id-NULL because no caller stamps it yet, which is the same state the migration left them in.
- The `[agents.<alias>.workspace.read_memory_from]` schema field + the cross-reference validator that rejects self/dangling/cross-backend entries at config load. Configs authored against the v0.8.0 schema stay valid when v0.8.1 lands the runtime consumer.

Net -450 LoC vs +15 LoC doc-comment rewrites. The architectural target (per-agent memory backends keyed off the agent's identity, with the allowlist intersected on read) ships in v0.8.1 alongside the `Agent::from_config` restructure that consumes a per-agent memory backend; this commit makes v0.8.0 honest about what's actually running.

…claw-labs#6272)

Adds two abstract methods on the Memory trait, `store_with_agent` and `recall_for_agents`, and implements them explicitly on every backend in the workspace. No defaults: each backend gets a real implementation so the agent_id parameter is never silently dropped at the trait boundary, closing the dead-default footgun the previous attempt fell into.

The trait extension in `crates/zeroclaw-api/src/memory_traits.rs`:

- `store_with_agent(key, content, category, session_id, namespace, importance, agent_id)`: required. Persists with explicit agent attribution.
- `recall_for_agents(allowed_agent_ids, query, limit, session_id, since, until)`: required. Filters results to the supplied set of agent UUIDs (plus legacy NULL-agent rows). Empty allowlist means no filter (callers that want unscoped recall stay on `recall`).

Per-backend implementations:

- **SqliteMemory** writes the `agent_id` column on the existing INSERT (column was added in P6a as nullable + indexed). `recall_for_agents` over-fetches via the existing hybrid recall, then filters by a single-round-trip indexed lookup on the candidate row ids. Pushes the SQL primary filter the way the column was designed for; the small over-fetch is the v0.8.0 pragmatic shape.
- **PostgresMemory** mirrors SqliteMemory: column written on INSERT with ON CONFLICT update, post-recall filter via `id = ANY($1)` query.
- **LucidMemory** composes SqliteMemory + remote daemon. Writes attribution to the local SQLite mirror (the daemon has no agent_id concept in v0.8.0); recall delegates to the SQLite leg's `recall_for_agents` so the cross-agent allowlist is enforced locally. Documented in the impl comments.
- **QdrantMemory** adds `agent_id` to `MemoryPayload` (skip-if-none serde) so existing rows stay shape-compatible. Store includes the agent id in the upsert payload; recall_for_agents over-fetches and uses a scroll/has_id query to fetch payloads for the candidate ids, then post-filters. Pushing the agent_id filter into the vector search call itself is a v0.8.1 optimization.
- **MarkdownMemory** ignores agent_id at the row level: per-agent attribution is the on-disk path (`<install>/agents/<alias>/workspace/MEMORY.md`), set by the per-agent factory. Cross-agent recall is composed at the wrapper layer (`AgentScopedMarkdownMemory`, landing next) which holds an own MarkdownMemory plus a peer set; this trait impl is the single-instance leaf.
- **NoneMemory** is a trivial no-op for both methods: the disabled backend keeps the runtime wiring stable without persisting anything regardless of agent attribution.

Test mocks (`QueryEchoMemory`, two `MockMemory` flavors, `MockMemoryWithEntries`, `NoopMemory`, `RecallMemory`, `TrackingMemory`) get parallel forward-or-noop stubs so the workspace builds and tests green. Each is contextually appropriate: trackers track, recallers recall, noops noop.

Wrapper updates: `AuditedMemory<M>` adds matching `store_with_agent` / `recall_for_agents` methods that log to the audit trail and forward to the inner backend.

Net +783 LoC. The new Memory trait surface is the foundation for `AgentScopedMemory<M>` (next commit) which holds a bound agent_id + allowlist and routes every store/recall through these methods so the agent_id is stamped/enforced at the runtime boundary too.

…aw-labs#6272)

The runtime memory wrapper that sits between agent-loop callers and the per-agent backend instance. Holds the bound agent's UUID + the resolved cross-agent allowlist (own UUID + `read_memory_from` entries), and routes every operation through the backend's new agent-aware trait methods so the agent_id is enforced end-to-end.

In `crates/zeroclaw-memory/src/agent_scoped.rs`:

- `AgentScopedMemory<M: Memory>` holds `Arc<M>` + bound `agent_id` + `allowed_agent_ids: HashSet<String>`. `new(inner, agent_id, allowed_sibling_agent_ids)` always inserts the bound agent into the allowlist so callers don't have to remember themselves.
- `store` / `store_with_metadata` route through the inner backend's `store_with_agent` with the bound agent_id always stamped.
- `store_with_agent` overrides any caller-supplied agent_id to the bound agent_id. The wrapper's contract is one-agent-one-attribution; if a caller wants different attribution, they construct a different wrapper.
- `recall` calls the inner's `recall_for_agents` with the bound allowlist.
- `recall_for_agents` intersects the caller-supplied allowlist with the bound allowlist. A non-empty caller allowlist whose intersection with the bound is empty returns `Ok(Vec::new())` directly; the empty-allowlist sentinel ("no filter") on the inner backend is NOT used for that case, so a caller cannot widen scope past what the agent's config permits.
- `get` / `list` / `forget` / `count` / `purge_*` / `reindex` / `store_procedural` / `recall_namespaced` / `export` all forward to the inner backend. Trait surface that does not yet expose an agent-scoped form (get/list) stays pass-through; v0.8.1 follow-up adds those variants.

Re-exported as `zeroclaw_memory::AgentScopedMemory`.

Four unit tests using SqliteMemory as the inner backend:

- `store_routes_through_store_with_agent_and_persists_attribution`: rows stored via the wrapper come back on a subsequent recall.
- `recall_excludes_other_agent_rows_when_allowlist_omits_them`: a row pre-seeded with a different agent_id does NOT surface through a wrapper whose allowlist excludes that agent.
- `recall_includes_allowlisted_sibling_rows`: a row pre-seeded with the sibling's agent_id DOES surface when the wrapper's allowlist includes that sibling.
- `recall_for_agents_intersects_caller_allowlist_with_bound_allowlist`: a caller asking for a rogue UUID outside the bound allowlist gets zero rogue-attributed rows back.

Cross-backend allowlist entries are rejected at config-load by the P3 validator; the wrapper therefore only ever sees same-backend sibling UUIDs in `allowed_agent_ids`.

The Markdown variant (`AgentScopedMarkdownMemory`) lands in the next commit. Its model is "compose own + peer MarkdownMemory instances and union with attribution," not the row-filter shape this generic wrapper uses.
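The intersection rule described for `recall_for_agents` can be sketched in isolation. This is a minimal illustration, not the crate's code: `RecallScope`, `bound_allowlist`, and `effective_scope` are hypothetical names, and the handling of an empty caller allowlist (falling back to the bound allowlist) is an assumption, since the commit message only specifies the non-empty case.

```rust
use std::collections::HashSet;

/// Hypothetical result of the scope decision; not a type from the crate.
#[derive(Debug, PartialEq)]
pub enum RecallScope {
    /// Query the inner backend with exactly these agent ids.
    Filter(Vec<String>),
    /// Nothing the caller asked for is permitted: return no rows at all.
    Empty,
}

/// The bound agent is always part of its own allowlist.
pub fn bound_allowlist(agent_id: &str, siblings: &[&str]) -> HashSet<String> {
    let mut set: HashSet<String> = siblings.iter().map(|s| (*s).to_owned()).collect();
    set.insert(agent_id.to_owned());
    set
}

/// Intersect the caller's request with the bound allowlist.
pub fn effective_scope(bound: &HashSet<String>, requested: &[&str]) -> RecallScope {
    if requested.is_empty() {
        // ASSUMPTION: an empty request means "no extra narrowing",
        // so the bound allowlist applies as-is.
        let mut ids: Vec<String> = bound.iter().cloned().collect();
        ids.sort();
        return RecallScope::Filter(ids);
    }
    let mut ids: Vec<String> = requested
        .iter()
        .filter(|id| bound.contains(**id))
        .map(|id| (*id).to_owned())
        .collect();
    ids.sort();
    ids.dedup();
    if ids.is_empty() {
        // Never forward an empty list to the backend here: the backend
        // reads an empty allowlist as "no filter", which would widen scope.
        RecallScope::Empty
    } else {
        RecallScope::Filter(ids)
    }
}
```

The separate `Empty` variant is the point of the sketch: it keeps "the caller is allowed nothing" distinct from the backend's empty-allowlist sentinel.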

…zeroclaw-labs#6272)

The Markdown-shaped sibling to AgentScopedMemory<M>. Markdown has no shared store: each agent's attribution IS its on-disk path (<install>/agents/<alias>/workspace/MEMORY.md plus memory/YYYY-MM-DD.md), so cross-agent recall composes multiple MarkdownMemory instances rather than filtering rows.

Module crates/zeroclaw-memory/src/agent_scoped_markdown.rs:

- MarkdownPeer { alias, memory }: resolved sibling, an alias plus a MarkdownMemory pointing at that sibling's workspace dir.
- AgentScopedMarkdownMemory { own_alias, own, peers }: the wrapper.
  - store / store_with_metadata / store_with_agent: write only to the bound agent's own MarkdownMemory. The agent_id parameter on store_with_agent is intentionally ignored (path-based attribution is the model; the bound dir IS the attribution).
  - recall: union across own + every peer, attributing each row by prefixing its key with [<alias>] so the merged output is self-describing without changing the trait surface or MemoryEntry shape.
  - recall_for_agents: filter the union to the caller-supplied alias set. Treats the trait's `allowed_agent_ids: &[&str]` as opaque identifiers since Markdown does not have a UUID indirection; the runtime factory passes aliases for Markdown agents and UUIDs for SQL agents.
  - get / list / forget / count: forward to own (these don't yet have an agent-scoped form on the trait).
  - health_check: own's signal only; missing peer dirs are logged at recall time, not surfaced as unhealthy (a missing peer means the operator hasn't created that sibling yet; the current agent is fine).

Three unit tests:

- store_writes_only_to_own_backend
- recall_unions_own_and_peer_rows_with_attribution
- recall_for_agents_filters_to_alias_intersection

Re-exported as zeroclaw_memory::AgentScopedMarkdownMemory and zeroclaw_memory::MarkdownPeer.

The runtime factory that builds either AgentScopedMemory<M> (for Sqlite/Postgres/Lucid/Qdrant agents) or AgentScopedMarkdownMemory (for Markdown agents) lands in the next commit, alongside the Agent::from_config + cron + DelegateTool wiring that consumes it.
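The union-with-attribution recall described above can be illustrated with plain data. This is a sketch under assumptions: `Entry` is a trimmed-down, hypothetical stand-in for `MemoryEntry`, the function names are invented, and the exact prefix format (`[alias]` followed by a space) is a guess; the commit message only says the key is prefixed with `[<alias>]`.

```rust
/// Hypothetical, reduced entry shape for illustration only.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub key: String,
    pub content: String,
}

/// Prefix a row's key with the alias of the agent whose directory it came from.
fn tag(alias: &str, entry: &Entry) -> Entry {
    Entry {
        // ASSUMPTION: "[alias] key" with a single space separator.
        key: format!("[{}] {}", alias, entry.key),
        content: entry.content.clone(),
    }
}

/// Union the bound agent's rows with every peer's rows, own rows first.
pub fn union_with_attribution(
    own_alias: &str,
    own: &[Entry],
    peers: &[(&str, Vec<Entry>)],
) -> Vec<Entry> {
    let mut merged: Vec<Entry> = own.iter().map(|e| tag(own_alias, e)).collect();
    for (alias, rows) in peers {
        merged.extend(rows.iter().map(|e| tag(alias, e)));
    }
    merged
}

/// Same union, restricted to sources named in `allowed` (the alias filter).
pub fn union_for_aliases(
    own_alias: &str,
    own: &[Entry],
    peers: &[(&str, Vec<Entry>)],
    allowed: &[&str],
) -> Vec<Entry> {
    let mut merged = Vec::new();
    if allowed.contains(&own_alias) {
        merged.extend(own.iter().map(|e| tag(own_alias, e)));
    }
    for (alias, rows) in peers {
        if allowed.contains(alias) {
            merged.extend(rows.iter().map(|e| tag(alias, e)));
        }
    }
    merged
}
```

Because attribution rides in the key, the merged rows stay self-describing without adding a field to the entry type.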

…from_config + cron (zeroclaw-labs#6272)

The runtime entry point that builds each agent's `Memory` instance. Previously every code path hand-rolled `create_memory(...)` against the install-wide `config.memory`; now `create_memory_for_agent` returns an `AgentScopedMemory` (or `AgentScopedMarkdownMemory`) keyed on the agent's resolved identity and `read_memory_from` allowlist.

In `crates/zeroclaw-memory/src/lib.rs`:

- `agent_workspace_dir(config, alias)` resolves the per-agent workspace dir from `[agents.<alias>.workspace.path]` if set, else derives `<install>/agents/<alias>/workspace/` from `config.config_path.parent()`. Stable across the v0.8.0 filesystem migration since it keys off the install root, not the (still-legacy-shaped) `config.workspace_dir`.
- `create_memory_for_agent(config, alias, api_key) -> Arc<dyn Memory>` is the top-level factory:
  - Markdown agents: build the bound MarkdownMemory + a peer `MarkdownPeer` per `read_memory_from` entry; wrap with `AgentScopedMarkdownMemory`.
  - None agents: pass through `Arc<NoneMemory>`.
  - Sqlite/Postgres/Lucid/Qdrant agents: build the install-wide inner backend via the existing `create_memory_with_storage_and_routes` factory; resolve the bound agent's identifier and the allowlist identifiers via the new `Memory::ensure_agent_uuid` trait method (SQL backends look up agents-table UUIDs; Qdrant/None use the alias verbatim); wrap with `AgentScopedMemory`.

`AgentScopedMemory` itself is now non-generic: it holds `Arc<dyn Memory>` instead of `Arc<M>`. The previous generic was never used at multiple types (every call site erased to `dyn Memory`), and the non-generic shape lets the per-agent factory hand back a single concrete type regardless of the agent's chosen backend.

New trait method `Memory::ensure_agent_uuid(alias) -> Result<String>`:

- SqliteMemory + PostgresMemory + LucidMemory override to insert-or-fetch the agents-table row for `alias` and return its UUID (the existing `ensure_default_agent_uuid` is now a thin wrapper around the same per-alias helper).
- AuditedMemory + AgentScopedMemory forward.
- Default impl returns the alias verbatim, which is correct for Markdown, Qdrant, None, which have no UUID indirection at the storage layer.

PostgresMemory now stores `qualified_agents` alongside `qualified_table` so `ensure_agent_uuid` can build the `<schema>.agents` reference at call time.

Wiring (replaces install-wide `create_memory` calls with `create_memory_for_agent`):

- `crates/zeroclaw-runtime/src/agent/agent.rs`: `Agent::from_config_with_session_cwd_and_mcp`.
- `crates/zeroclaw-runtime/src/agent/loop_.rs`: both `run` entry points (interactive loop + non-interactive single-shot).
- `crates/zeroclaw-runtime/src/cron/scheduler.rs`: cron's pre-prompt memory recall (line ~335) and the post-failure session-purge cleanup (line ~427). Both now key off the cron-owning agent's alias so a Markdown-backed agent's cron job recalls from its own dir, a SQLite-backed agent's cron job filters by its agent_id, etc.

The same-backend invariant on `read_memory_from` is enforced at config load (P3); this commit therefore never has to reconcile mixed-backend allowlists at runtime.

End-to-end now: an agent loop that calls `mem.recall(...)` goes through the wrapper, which calls the inner backend's `recall_for_agents` with the resolved allowlist, which (for SQL) filters via WHERE agent_id IN (...) plus the legacy NULL case. A `mem.store(...)` goes through `store_with_agent` with the bound agent's UUID, so every persisted row is attributable to one agent.
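The workspace-dir resolution rule (explicit `[agents.<alias>.workspace.path]` wins, otherwise derive from the install root) is small enough to sketch. The function below is hypothetical: its name and flattened parameters are illustrative, and the real `agent_workspace_dir` takes the config struct, whose shape is not shown in this PR text.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical, flattened version of the resolution rule.
/// `config_path` is the path of the config file itself;
/// `configured` is the optional per-agent override from the config.
pub fn workspace_dir_for(
    config_path: &Path,
    alias: &str,
    configured: Option<&Path>,
) -> Option<PathBuf> {
    // An explicit per-agent path takes precedence.
    if let Some(path) = configured {
        return Some(path.to_path_buf());
    }
    // Otherwise key off the install root: the directory holding the config file.
    let install_root = config_path.parent()?;
    Some(install_root.join("agents").join(alias).join("workspace"))
}
```

Deriving from the config file's parent rather than from a stored workspace path is what keeps the result stable when the legacy workspace directory is moved.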

Removes the dedicated management CLI (Commands::Agents enum variant, AgentsCommands subcommands, handle_agents_command dispatcher, plus the agents_create / agents_delete / agents_list helpers). Operators add and remove agents in this PR by editing [agents.<alias>] blocks directly; the runtime creates the per-agent workspace dir and seeds bootstrap identity files on first agent-loop entry. The dedicated CLI lands with the v0.8.1 session registry that gates active-session refusal on delete. Docs updated: setup walkthrough now uses the config-edit path; the architecture page lists the management CLI under v0.8.1. Net -251 LoC in src/main.rs, -33 LoC in docs.

Six items the previous commits punted that the issue body called out as required:

- SubAgent escalation validator: SecurityPolicy::ensure_no_escalation_beyond back with EscalationViolation enum + Display + Error. SubAgentOverrides surface restored on the SubAgent runtime; SubAgentSpawn::build now takes overrides and runs the subset check. 9 validator tests + 5 builder tests.
- SQLite agent_id NOT NULL REFERENCES agents(id): table-rebuild pattern inside migrate_multi_agent (drop FTS, copy rows into memories_new with the constraint, swap, recreate indices/FTS, rebuild FTS content). The bare store + store_with_metadata paths now route through store_with_agent; store_with_agent COALESCEs the agent_id parameter to the default agent's UUID so callers without an agent context still satisfy the FK.
- Postgres agent_id NOT NULL REFERENCES + DO-block FK creation guarded by pg_constraint lookup so re-runs are idempotent.
- schema_version metadata table on both backends, stamped at the end of each successful migration.
- send_message_to_peer agent-loop tool. Validates the target via the new ResolvedPeers::is_known_peer (strict outbound check, distinct from allows_inbound's default-accept inbound semantics) plus a per-agent channel-listener guard. Dispatches via cron::scheduler::deliver_announcement to keep the runtime → channels dependency direction.
- Agent-loop self-loop guard fallback. peers::should_drop_self_loop is called from process_channel_message after the SDK-side Channel::drop_self_messages returns false; both layers use identical normalization.
- Hermetic peer-group E2E in tests/system/multi_agent_e2e.rs that walks the full authorization surface: resolver admit/reject for peer agents and external peers, plus tool-level rejection of non-peer targets.

Existing tests in agent_scoped now provision real agent rows via ensure_agent_uuid before attributing memories, as the FK requires.

zeroclaw_runtime::agents ships create_agent / delete_agent / list_agents as the runtime-layer capability that future operator surfaces (CLI, web admin, gateway endpoint) call. Each function keeps the on-disk shape consistent across every entry point: write the [agents.<alias>] config block, create the per-agent workspace dir, seed bootstrap identity files, and atomically save the config (or strip the block, remove the dir, and rewrite peer-group memberships in one save). zeroclaw_runtime::agents::session_registry is the process-global RAII gate that delete_agent consults: register_session(alias) returns a SessionGuard whose Drop decrements the per-alias counter; delete bails on active_sessions_for(alias) > 0 unless force_active_sessions=true. 13 new unit tests cover create roundtrip, duplicate-alias refusal, unknown-risk-profile refusal, delete with peer-group strip, dry-run no-op, unknown-alias refusal, active-session refusal + cycle, force override, sorted list summaries, and the registry's RAII semantics.
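The RAII gate described here follows a common Rust pattern: a guard increments a per-alias counter on creation and decrements it in `Drop`. The sketch below borrows the names `register_session`, `SessionGuard`, and `active_sessions_for` from the text, but the bodies and the `delete_allowed` helper are assumptions; the module's real implementation is not shown in this PR text.

```rust
use std::collections::HashMap;
use std::sync::{Mutex, OnceLock};

/// Process-global counter map, keyed by agent alias.
fn registry() -> &'static Mutex<HashMap<String, usize>> {
    static REGISTRY: OnceLock<Mutex<HashMap<String, usize>>> = OnceLock::new();
    REGISTRY.get_or_init(|| Mutex::new(HashMap::new()))
}

/// Held for the lifetime of a session; dropping it releases the slot.
pub struct SessionGuard {
    alias: String,
}

pub fn register_session(alias: &str) -> SessionGuard {
    let mut map = registry().lock().unwrap_or_else(|e| e.into_inner());
    *map.entry(alias.to_owned()).or_insert(0) += 1;
    SessionGuard { alias: alias.to_owned() }
}

pub fn active_sessions_for(alias: &str) -> usize {
    let map = registry().lock().unwrap_or_else(|e| e.into_inner());
    map.get(alias).copied().unwrap_or(0)
}

impl Drop for SessionGuard {
    fn drop(&mut self) {
        let mut map = registry().lock().unwrap_or_else(|e| e.into_inner());
        let remaining = match map.get_mut(&self.alias) {
            Some(count) => {
                *count = count.saturating_sub(1);
                *count
            }
            None => return,
        };
        if remaining == 0 {
            // Keep the map free of stale zero entries.
            map.remove(&self.alias);
        }
    }
}

/// Hypothetical helper: the refusal rule a delete path would consult.
pub fn delete_allowed(alias: &str, force_active_sessions: bool) -> bool {
    force_active_sessions || active_sessions_for(alias) == 0
}
```

Tying the decrement to `Drop` means the count is released on early returns and panics that unwind, not only on the normal exit path.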

WareWolf-MoonWall

Read through the file list and sampled the key architectural additions: Memory::store_with_agent, Memory::recall_for_agents, Memory::ensure_agent_uuid, Channel::drop_self_messages, the per-agent workspace config in zeroclaw-config, AgentScopedMemory, the new agents/ module and session registry in the runtime, and the config schema extensions. Directional notes follow — understood this is not going to master and a follow-up "make it actually work" PR is planned.

Memory trait extension: store_with_agent and recall_for_agents are required trait methods, which is a breaking change for every external Memory implementor. The ensure_agent_uuid default (returns alias verbatim) softens the impact for backends without UUID indirection. Since this lands in the v0.8.0 breaking-change window that's the right place for it — worth being explicit in the final PR description that downstream implementors must stub at minimum store_with_agent and recall_for_agents.

drop_self_messages default implementation: The @-strip and case-fold normalization is correct and the edge-case test (empty handle after stripping @ must not match every sender) is the right guard. Numeric IDs (Discord snowflake, Matrix event ID) pass through trim_start_matches('@') unchanged — the comment correctly notes they are already as_str compatible. No action needed; confirming the logic holds.

Observability trait removal (Hand* events/metrics): The removal of HandStarted, HandCompleted, HandFailed and the three Hand* metrics narrows the surface before the multi-agent loop adds its own observability. Worth confirming no active observer implementation in zeroclaw-providers or zeroclaw-channels still references these variants before the final merge pass.

Open question for the follow-up PR: Where does agent spawn/teardown lifecycle management live, and how does it interact with the session registry? Specifically: what happens to an agent's AgentScopedMemory if the parent session is cancelled mid-turn?

No formal verdict — directional review per @singlerider's request. Happy to do a full blocking review pass once the follow-up is folded in.

…y Memory method

The prior wrapper only intercepted store/recall and passed through get, list, forget, count, purge_namespace, purge_session, recall_namespaced, and export. With the wrapper as the Arc<dyn Memory> the agent-loop tools see, every passthrough was a privilege-escalation surface: an agent could read sibling rows by guessed key, list every install row, delete sibling rows by key, or purge another agent's session.

Now MemoryEntry carries optional agent_id (populated by every backend on read), the wrapper post-filters reads by the bound + allowlisted set, refuses cross-agent forgets, scopes purge_session to bound rows, refuses cross-agent purge_namespace, and rejects store_with_agent calls that target a foreign agent_id rather than silently overriding.

Tests exercise the read filter, the cross-agent forget refusal, list attribution filtering, foreign agent_id store refusal, purge_namespace refusal, and the bound-only purge_session shape.

…tching

The prior validator covered allowed_roots (rw + ro), allowed_commands, workspace_only, max_actions_per_hour, and max_cost_per_day_cents, and used exact PathBuf equality, so a child policy that legitimately narrowed /srv to /srv/app failed validation. Several escalation axes were missing entirely: a child policy with autonomy = Full under a ReadOnly parent was accepted, as were child policies that dropped a parent's forbidden_paths entry, expanded shell_env_passthrough, raised shell_timeout_secs, or flipped block_high_risk_commands or require_approval_for_medium_risk to false.

Now the validator checks each of those, uses path containment (canonical plus literal-path fallback) so child narrowings inside parent roots are accepted, and the EscalationViolation enum carries one variant per axis. AutonomyLevel grows PartialOrd/Ord so the comparison is direct.

Also drop the stale active_workspace.toml entries from is_runtime_config_path: the marker file was retired with the [workspace] block.

Tests cover each new axis on both the rejection and the legitimate-narrowing path.
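The two mechanisms named above, path containment with a canonical-then-literal fallback and a derived ordering on autonomy levels, can be sketched as follows. The function names are hypothetical, and the enum lists only the two variants this commit message mentions; the real `AutonomyLevel` may have more.

```rust
use std::path::{Path, PathBuf};

/// Declaration order defines the derived ordering: ReadOnly < Full.
/// ASSUMPTION: only the variants named in the commit message are shown.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum AutonomyLevel {
    ReadOnly,
    Full,
}

/// Canonicalize when the path exists; otherwise fall back to the literal path.
fn resolve(path: &Path) -> PathBuf {
    path.canonicalize().unwrap_or_else(|_| path.to_path_buf())
}

/// A child root is acceptable if it sits at or below some parent root.
/// `Path::starts_with` compares whole components, so `/srv-other`
/// does not count as being inside `/srv`.
pub fn root_within_parent(child: &Path, parent_roots: &[PathBuf]) -> bool {
    let child = resolve(child);
    parent_roots.iter().any(|root| child.starts_with(resolve(root)))
}

/// With `Ord` derived, "child is more permissive than parent" is one comparison.
pub fn autonomy_escalates(parent: AutonomyLevel, child: AutonomyLevel) -> bool {
    child > parent
}
```

Compared with exact `PathBuf` equality, containment accepts a child that narrows `/srv` to `/srv/app` while still rejecting a sibling or an ancestor of the parent root.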

…discarding them

Both spawn sites (cron JobType::Agent dispatch and the spawn_subagent agent-loop tool) constructed a SubAgentContext via SubAgentSpawn::for_agent + build, then handed only ctx.agent_id to the tracing span and dropped the validated policy and allowlist. agent::run rebuilt both surfaces from config, so the validator's subset proof never reached the loop: inherits-verbatim worked by accident, and any future caller-supplied narrowing override would have been silently ignored.

Adds AgentRunOverrides { security, memory } to loop_::run; both spawn sites pass Some(subagent_ctx.policy.clone()) so the validated policy takes effect. Memory override is left None for v0.8.0 inherits-verbatim and documented as the slot the v0.8.1 [agents.<alias>].subagent_* config block plumbs into. Existing call sites (interactive launch, heartbeat phase 1/2, scheduled-leak test) pass AgentRunOverrides::default().

Also normalize SpawnSubagentTool's empty-prompt error to the structured ToolResult shape used by every other failure path so the agent loop sees one shape regardless of which step rejected the call.

Channel::self_handle() defaulted to None, so the orchestrator's two-layer self-loop guard (Channel::drop_self_messages SDK side and peers::should_drop_self_loop agent-loop fallback) was dormant for every channel impl that didn't override it, since both layers consult the same source. Telegram's bot_username cache, IRC's configured nickname, Discord's token-encoded user_id, and Slack's auth.test user_id are each reachable; expose them through self_handle so the guard runs. For Slack, the cache is populated on the inbound listen path so the sync self_handle() call doesn't have to issue an HTTP round-trip. Update the trait doc to be honest about what overriding means: the default leaves both guard layers dormant, so channels handling inbound traffic must override. Outbound-only channels (webhook, gmail-push, voice-call) keep the default; other inbound channels beyond the four listed remain on the default and rely on per-impl filtering.
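Why a `None` self-handle leaves the guard dormant is easiest to see in a reduced form of the comparison. This is a sketch, not the trait's default method: `normalize_handle` and `is_self_message` are hypothetical free functions, and `to_lowercase` stands in for whatever case-folding the real code uses.

```rust
/// Strip leading '@' characters and fold case.
/// ASSUMPTION: `to_lowercase` approximates the real case-fold step.
pub fn normalize_handle(raw: &str) -> String {
    raw.trim_start_matches('@').to_lowercase()
}

/// Returns true when `sender` is the channel's own identity.
pub fn is_self_message(self_handle: Option<&str>, sender: &str) -> bool {
    // No known self-handle: the guard cannot match anything (dormant).
    let Some(handle) = self_handle else {
        return false;
    };
    let handle = normalize_handle(handle);
    // A handle that is empty after stripping '@' must not match an empty
    // (or '@'-only) sender, or every such message would be dropped.
    if handle.is_empty() {
        return false;
    }
    handle == normalize_handle(sender)
}
```

Numeric identifiers pass through the normalization unchanged, so the same comparison covers platforms that identify senders by ID rather than by name.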

@beta

is_known_peer / allows_inbound previously normalized only the external-peer side: agent-peer matching used raw set lookup, so is_known_peer(channel, "@beta") rejected a stored alias of "beta" and a config of [agents.Beta] vs an inbound origin of "beta" diverged. Aliases are config map keys with no case enforcement so the chat-channel idiom (@-prefix, mixed case) needs symmetric normalization. resolve_peer_set now stores agent aliases case-folded with @ stripped; is_known_peer / allows_inbound apply identical normalization to the target side. The orphaned doc comment on ResolvedPeers (a method-doc above the wrong method) is moved to its own method. Tighten send_message_to_peer's @-prefix normalization test so the success path actually asserts the peer-set check accepted, not just that the unrelated delivery layer fell through.
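Symmetric normalization means the same function runs when aliases are stored and when a target is looked up. A minimal sketch, assuming a hypothetical `PeerSet` type that drops the channel dimension of the real `ResolvedPeers` and uses `to_lowercase` as the case-fold:

```rust
use std::collections::HashSet;

/// One normalization shared by the store side and the lookup side.
fn normalize_alias(raw: &str) -> String {
    raw.trim_start_matches('@').to_lowercase()
}

/// Hypothetical, channel-less stand-in for the resolved peer set.
pub struct PeerSet {
    agents: HashSet<String>,
}

impl PeerSet {
    /// Store every alias in normalized form.
    pub fn from_aliases(aliases: &[&str]) -> Self {
        PeerSet {
            agents: aliases.iter().map(|a| normalize_alias(a)).collect(),
        }
    }

    /// Normalize the target the same way before the set lookup.
    pub fn is_known_peer(&self, target: &str) -> bool {
        self.agents.contains(&normalize_alias(target))
    }
}
```

With both sides normalized, a config key of `Beta` and a chat-style target of `@beta` resolve to the same entry.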

…delete

DeleteReport gains active_sessions: usize so a dry-run inspection can see whether a real delete would be refused without coupling the inspection to the active-session gate. The destructive path emits a tracing::warn when force_active_sessions=true overrides the refusal; otherwise a scripted operator running with the override leaves no record that an in-flight agent's workspace was ripped out from under the running loop. Also delete the #[allow(dead_code)] _AgentAliasReference hack and restrict the AgentAlias import to the test module where it's actually used.

tidux · 2026-05-10T02:52:05Z

If I'm reading this right, it looks like the SQLite and PostgreSQL backends still have one DB and not DB or table per agent, and only add a new column for which agent is emitting it. Have we considered adding more tables for different purposes? I'm looking at either a new table or a new DB for ACP session replay because it's going to need different schema.

Audacity88 · 2026-05-10T03:16:35Z

Looks directionally correct. A few issues identified in the current state:

Issue closure. The PR currently overstates the #6272 closure. The body says Closes #6272, but several issue-level requirements are deferred or only partly present: the zeroclaw agents management CLI is out of scope, and delete_agent removes the config block, peer-group memberships, and workspace dir while leaving memory rows and the DB agents row to manual cleanup. The fresh-install/default-agent story also looks worth reconciling with the issue text, since the CLI now says V3 has no default agent.

Peer delivery. The peer-message path looks like it proves authorization, not live delivery. SendMessageToPeerTool validates that beta is a peer on telegram.prod, then passes telegram.prod and raw target = "beta" into deliver_announcement. The live channel registry is keyed by base names like telegram, and channel implementations expect platform targets such as chat IDs or channel IDs, not agent aliases. The E2E test is useful, but it is explicitly authorization-only. The next step is either wiring full ChannelRef to the live channel instance plus a peer-to-platform route, or narrowing the claim/test name until that delivery layer exists.

Migration path. The migration/storage path needs another pass before old installs are trustworthy. The filesystem migration moves <install>/workspace/ to <install>/agents/default/workspace/, but Config::load_or_init() still recreates config.workspace_dir = <install>/workspace, and the SQLite memory factory opens config.workspace_dir/memory/brain.db. That can strand an existing SQLite DB under the moved default-agent workspace while the runtime opens a fresh empty DB at the recreated legacy path. The migration also runs before ZEROCLAW_CONFIG_DIR / ZEROCLAW_WORKSPACE are resolved, so custom installs appear to miss it.

Memory boundary. The memory backends do not all enforce the new attribution boundary the same way yet. Postgres promotes agent_id to NOT NULL, but plain store() still inserts without agent_id. SQL recall and Qdrant recall both over-fetch broadly and then post-filter; Qdrant also keeps rows whose payload lacks agent_id, making older/unscoped points visible to every scoped caller. These filters should preferably happen at the backend query boundary where possible, and legacy/unattributed rows should be deliberately assigned to default or hidden rather than treated as globally visible.

Security surfaces. The access and SubAgent surfaces need a cleanup pass. AccessMode::Write is documented as write-only, but the docs and effective policy shape read more like read-write. The SubAgent/delegation paths also deserve the same gate discipline as older action tools: child runs should consume the parent’s Act/rate-limit budget, and delegated tools should be built from the child/target policy when agent-specific delegation stays in scope. That would make the “inherits but cannot widen” promise much easier to trust.

@Audacity88

Cuts the unwired zeroclaw_runtime::agents lifecycle module + session_registry (the v0.8.1 zeroclaw agents CLI surface that would consume them does not ship in this PR; the runtime module was dead code on this branch) and lands the four substantive fixes from @Audacity88's directional review:

- Migration path: Config::load_or_init resolves ZEROCLAW_CONFIG_DIR / ZEROCLAW_WORKSPACE before running the filesystem migration so custom installs migrate; config.workspace_dir now points at the migrated default-agent workspace so legacy install-wide callers (cost::CostTracker, sop, skills, plugins, memory CLI) read the live agent dir instead of an orphaned legacy path.
- Memory boundary: PostgresMemory::store routes through store_with_agent (COALESCE to default agent UUID); SqliteMemory + PostgresMemory recall_for_agents push the agent_id filter into the query layer (WHERE agent_id IN / ANY) and drop the post-fetch attribution lookup that let legacy NULL-agent_id rows leak to scoped callers; QdrantMemory recall_for_agents uses a payload `must` filter on agent_id and store_with_agent attributes None to "default".
- Peer delivery (live, not authorization-only): SendMessageToPeerTool resolves agent-alias targets to in-process delivery via agent::loop_::process_message (bot identity is shared across agents on the same channel, so an outbound channel send would loop right back inbound); external peers continue through the channel registry's delivery handler.
- AccessMode::Write semantics: SecurityPolicy gains an allowed_roots_write_only tier so AccessMode::Write actually grants write access without read access; is_resolved_path_readable refuses write-only paths, is_resolved_path_allowed admits them; ensure_no_escalation_beyond validates the write-only tier as a SubAgent subset axis with a WriteOnlyRootNotInParent EscalationViolation variant.

SubAgent budget sharing: PerSenderTracker.buckets becomes Arc<Mutex<...>> so SubAgent runs that take a caller-supplied policy override inherit the parent's live action/cost bucket. Spawning a SubAgent no longer bypasses max_actions_per_hour or max_cost_per_day_cents.

Tests: 1465 zeroclaw-runtime + 615 zeroclaw-config + 307 zeroclaw-memory green; new coverage for write-only enforcement, SubAgent budget inheritance under override, and updated for_agent allowlist tier routing.
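The budget-sharing change rests on a standard property of `Arc<Mutex<...>>`: cloning the handle shares the underlying state. The sketch below is a simplified, hypothetical tracker (a plain per-sender counter with no time window), not `PerSenderTracker` itself.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical simplified tracker: counts actions per sender against a cap.
/// ASSUMPTION: the real tracker is time-windowed; this sketch is not.
#[derive(Clone)]
pub struct ActionTracker {
    // Cloning the tracker clones the Arc, so all clones share one bucket map.
    buckets: Arc<Mutex<HashMap<String, u32>>>,
    max_actions: u32,
}

impl ActionTracker {
    pub fn new(max_actions: u32) -> Self {
        ActionTracker {
            buckets: Arc::new(Mutex::new(HashMap::new())),
            max_actions,
        }
    }

    /// Record one action for `sender`; returns false once the cap is reached.
    pub fn try_record(&self, sender: &str) -> bool {
        let mut buckets = self.buckets.lock().unwrap_or_else(|e| e.into_inner());
        let used = buckets.entry(sender.to_owned()).or_insert(0);
        if *used >= self.max_actions {
            return false;
        }
        *used += 1;
        true
    }
}
```

A child run handed `parent.clone()` draws from the same bucket, so actions taken by the child count against the parent's cap rather than starting from zero.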

The previous "address Audacity directional review" commit landed the new field surfaces (write-only allowlist tier, escalation variant, shared PerSenderTracker) but four claimed fixes did not actually reach the call sites the PR body named. This closes them.

Qdrant bare Memory::store leaked agent_id: None into the payload. Routes the bare entrypoint through store_with_agent so the existing unwrap_or("default") attaches and the NOT NULL FK / scoped recall must filter both behave correctly. Mirrors the SQLite and Postgres bare-store paths.

glob_search and content_search consulted is_resolved_path_allowed (the write-side check that honors allowed_roots_write_only) for what are read operations. A directory granted only via AccessMode::Write would surface through file enumeration / content matching, silently widening the write-only grant into a read grant. Both call sites now use is_resolved_path_readable; tests cover the symlink-into-write-only-root and the absolute-path-under-write-only-root cases.

DelegateTool inherited the caller's SecurityPolicy verbatim with no subset validation against the delegated target, and the caller's PerSenderTracker was not surfaced as a deliberate budget-share relationship. Plumbs Arc<Config> via with_root_config; adds policy_for_target which builds the target's SecurityPolicy via SecurityPolicy::for_agent, validates it as a subset of the caller's via ensure_no_escalation_beyond, and assigns the caller's tracker so delegated runs consume from the caller's max_actions_per_hour / max_cost_per_day_cents bucket. execute_sync, execute_background, and execute_parallel now invoke the helper at the delegate boundary; escalating targets surface a structured failure instead of running. Three new tests cover the escalation, tracker-share, and root-config-absent fallback paths.

SendMessageToPeerTool: the agent-alias branch is fire-and-forget (tokio::spawn detached), so a "success: true" tool result does not mean the recipient processed the message. Module doc + the success output string now name this explicitly so observers diagnosing missing peer messages read recipient-side spans rather than the sender's tool output.

Last of the four read tools the PR body lists. Mirrors file_read, glob_search, and content_search: a directory granted only via AccessMode::Write would otherwise leak through pdf extraction since is_resolved_path_allowed honors allowed_roots_write_only by design.
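The distinction the read tools depend on, a write-side check that admits write-only roots and a read-side check that refuses them, can be sketched with three root tiers. The `Roots` type and its method names are hypothetical simplifications of the `SecurityPolicy` checks named above; in particular, treating a path as readable whenever any read-write or read-only root contains it is an assumption.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical three-tier root allowlist.
pub struct Roots {
    pub read_write: Vec<PathBuf>,
    pub read_only: Vec<PathBuf>,
    pub write_only: Vec<PathBuf>,
}

/// Component-wise containment of `path` in any of `roots`.
fn under_any(path: &Path, roots: &[PathBuf]) -> bool {
    roots.iter().any(|root| path.starts_with(root))
}

impl Roots {
    /// Read-side check: write-only roots do NOT grant read access.
    pub fn is_readable(&self, path: &Path) -> bool {
        under_any(path, &self.read_write) || under_any(path, &self.read_only)
    }

    /// Write-side / general check: admits write-only roots as well.
    pub fn is_allowed(&self, path: &Path) -> bool {
        self.is_readable(path) || under_any(path, &self.write_only)
    }
}
```

A read tool that calls the general check instead of the read-side one would return content from a write-only directory, which is the leak this series of commits closes for file_read, glob_search, content_search, and pdf extraction.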

singlerider · 2026-05-10T06:16:00Z

Thanks @Audacity88, @WareWolf-MoonWall, @tidux. Walked your findings against the diff and pushed the gaps in 2d3193b + 0185dfe. Bullet-by-bullet, with file:line:

Finding 1, issue closure. The zeroclaw agents CLI was punted in 6b3573b ("chore: punt zeroclaw agents create/delete/list CLI to v0.8.1"). Issue #6272 itself does not list the management CLI as a delivery requirement (the checklist covers schema, migration, memory layer, runtime/security, hands removal, SubAgents, peers, observability, audit, docs, and cross-cutting tests; none mention a CLI subcommand). I've updated the issue body to add the new DelegateTool + read-side-tool items as ticked. Closes #6272 stands on the actual issue scope. The "delete_agent leaves memory rows" note is moot on this branch. There is no delete_agent in source (grep -r 'delete_agent\|remove_agent\|DeleteAgent' is empty after 6b3573b). When the CLI returns in v0.8.1 the active-session-gated delete is what blocks operator surface from the live data layer.

Finding 2, peer delivery. Live in-process delivery is real, not authorization-only. crates/zeroclaw-runtime/src/tools/send_message_to_peer.rs:155-202 branches on target_is_agent: when the normalized target matches a config.agents key, it spawns crate::agent::loop_::process_message(cfg, &recipient_alias, &body, None) at :186. External peers fall through to deliver_announcement at :219. Coverage at tests/system/multi_agent_e2e.rs:281-297 asserts the (in-process) substring in the success output. Caveat I added in 2d3193b: the agent-alias path is fire-and-forget (tokio::spawn detached at :184). The success output now reads "accepted for in-process delivery to peer agent X (recipient runs detached; observe its agent loop for the actual outcome)" so observers know to read recipient-side spans rather than the sender's tool result for the actual outcome. Module doc on the same file makes this explicit.
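
For readers unfamiliar with the fire-and-forget shape, a minimal sketch follows. It swaps the detached tokio::spawn for a detached std::thread so the example has no dependencies; `send_to_peer` and the channel standing in for recipient-side spans are hypothetical, not the tool's real API.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for the tool's agent-alias branch. It returns as soon
/// as the work is handed off; the worker's outcome never reaches the caller.
fn send_to_peer(alias: &str, body: String, recipient_log: mpsc::Sender<String>) -> String {
    let recipient = alias.to_string();
    // Dropping the JoinHandle detaches the thread (analogous to a detached task).
    thread::spawn(move || {
        let outcome = if body.is_empty() {
            "rejected: empty body".to_string()
        } else {
            format!("processed {} bytes", body.len())
        };
        // Only the recipient side records what actually happened.
        let _ = recipient_log.send(format!("{recipient}: {outcome}"));
    });
    format!("accepted for in-process delivery to peer agent {alias}")
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let result = send_to_peer("researcher", String::new(), tx);
    // The sender reports acceptance even though the recipient rejects the message.
    println!("sender sees: {result}");
    println!("recipient side: {}", rx.recv().unwrap());
}
```

The sender's result only says the message was accepted; whether it was processed is visible only on the recipient side.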

Finding 3, migration path. Env var ordering: crates/zeroclaw-config/src/schema.rs:12421-12422 calls resolve_runtime_config_dirs() (which reads ZEROCLAW_CONFIG_DIR at :12199 and ZEROCLAW_WORKSPACE at :12211) BEFORE migrate_legacy_workspace_to_default_agent() at :12434. workspace_dir reassignment: schema.rs:12452-12455 assigns workspace_dir = zeroclaw_dir.join("agents").join("default").join("workspace"). SQLite factory: crates/zeroclaw-memory/src/sqlite.rs:104 opens db_path = workspace_dir.join("memory").join("brain.db"), so it lands at <install>/agents/default/workspace/memory/brain.db. Downstream callers consume the rerouted dir: crates/zeroclaw-config/src/cost/tracker.rs:22-23, crates/zeroclaw-plugins/src/host.rs:31-41, src/skills/mod.rs:24-25.
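
The resulting layout can be sketched with plain path joins. The helper names below are hypothetical and the install directory is an example value; only the path segments come from the description above.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper mirroring the workspace_dir reassignment described above.
fn default_agent_workspace(zeroclaw_dir: &Path) -> PathBuf {
    zeroclaw_dir.join("agents").join("default").join("workspace")
}

/// Hypothetical helper mirroring how the SQLite factory derives db_path.
fn sqlite_db_path(workspace_dir: &Path) -> PathBuf {
    workspace_dir.join("memory").join("brain.db")
}

fn main() {
    // Example install directory; the real value comes from config resolution.
    let install = Path::new("/opt/zeroclaw");
    let workspace = default_agent_workspace(install);
    println!("workspace: {}", workspace.display());
    println!("brain.db:  {}", sqlite_db_path(&workspace).display());
}
```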

Finding 4, memory boundary.

Bare Memory::store attribution: SQLite at crates/zeroclaw-memory/src/sqlite.rs:874 routes through store_with_agent(..., None, ...); the INSERT at :1510-1512 uses COALESCE(?11, (SELECT id FROM agents WHERE alias = 'default' LIMIT 1)). Postgres at crates/zeroclaw-memory/src/postgres.rs:425 does the same; INSERT at :617 uses COALESCE($8, ...). Qdrant was the gap you flagged: bare store previously wrote agent_id: None directly into the payload at :343. Fixed in 2d3193b. The bare entrypoint now delegates to store_with_agent(...) so the existing unwrap_or("default") at :728 attaches. Mirrors the SQLite/Postgres bare-store paths.
recall_for_agents query-layer pushdown: SQLite WHERE id IN (...) AND agent_id IN ({agent_placeholders}) at sqlite.rs:1573-1577; Postgres AND agent_id = ANY($4) at postgres.rs:695; Qdrant `must` payload filter at qdrant.rs:817-820, 826. Legacy NULL / payload-less rows are excluded from scoped recalls because the `must` clause uses a match-any condition over allowed_agent_ids, which a None payload cannot satisfy.
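
Both behaviours (default attribution on bare stores, and scoped recall that never sees rows without an agent) can be modelled in memory. This is a sketch under assumptions: `Row` and the free functions are hypothetical simplifications of the backend methods, and a Vec stands in for the store.

```rust
/// Hypothetical in-memory row; agent_id is None only for legacy data.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    key: String,
    agent_id: Option<String>,
}

/// Scoped write: un-scoped callers are attributed to the "default" agent.
fn store_with_agent(rows: &mut Vec<Row>, key: &str, agent_id: Option<&str>) {
    rows.push(Row {
        key: key.to_string(),
        agent_id: Some(agent_id.unwrap_or("default").to_string()),
    });
}

/// Bare write: delegates to the scoped path so it can never store None.
fn store(rows: &mut Vec<Row>, key: &str) {
    store_with_agent(rows, key, None);
}

/// Scoped recall: the agent filter is part of the query itself, so rows
/// with no agent_id are never candidates in the first place.
fn recall_for_agents<'a>(rows: &'a [Row], allowed_agent_ids: &[&str]) -> Vec<&'a str> {
    rows.iter()
        .filter(|row| match row.agent_id.as_deref() {
            Some(id) => allowed_agent_ids.iter().any(|allowed| *allowed == id),
            None => false,
        })
        .map(|row| row.key.as_str())
        .collect()
}

fn main() {
    // One legacy row written before agent attribution existed.
    let mut rows = vec![Row { key: "legacy".to_string(), agent_id: None }];
    store(&mut rows, "note");
    store_with_agent(&mut rows, "plan", Some("researcher"));
    println!("default scope: {:?}", recall_for_agents(&rows, &["default"]));
    println!("both scopes:   {:?}", recall_for_agents(&rows, &["researcher", "default"]));
}
```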

Finding 5, security surfaces.

AccessMode::Write is now actually write-only across every read tool. The PR body claimed file_read / pdf_read / glob_search / content_search consult the read-side helper. Three of those (glob_search, content_search, and pdf_read) instead called is_resolved_path_allowed (the write-side check that honors allowed_roots_write_only by design at crates/zeroclaw-config/src/policy.rs:1706-1711). Fixed: crates/zeroclaw-tools/src/glob_search.rs:121, crates/zeroclaw-tools/src/content_search.rs:202, crates/zeroclaw-tools/src/pdf_read.rs:116 all switched to is_resolved_path_readable. The read-side helper at policy.rs:1623-1639 explicitly does NOT consult the write-only allowlist (with a comment naming the silent elevation it would otherwise create). Test coverage: glob_search_filters_symlink_into_write_only_root and content_search_refuses_path_under_write_only_root.
DelegateTool boundary validation. You asked for "delegated tools should be built from the child/target policy when agent-specific delegation stays in scope" plus "child runs should consume the parent's Act/rate-limit budget". I've done the boundary half cleanly and want to be explicit about what I did NOT do.

New plumbing in 2d3193b at crates/zeroclaw-runtime/src/tools/delegate.rs: with_root_config(Arc<Config>) (:300-303), policy_for_target(target_alias) (:307-339) which resolves the target via SecurityPolicy::for_agent, validates as a subset of the caller via ensure_no_escalation_beyond (rejecting any target whose risk profile or workspace.access would widen rights), and assigns caller.tracker.clone() to the resolved policy so delegated runs consume from the caller's PerSenderTracker bucket. Invoked at the entry of execute_sync (:670-679), execute_background (:835-844), and execute_parallel (:1039-1054). Escalating targets surface a structured ToolResult failure instead of running. Three new tests cover escalation rejection, tracker inheritance via shared bucket exhaustion, and the legacy fallback when with_root_config is absent.

What I did NOT do in this PR: rebuild the per-delegation tool registry under the target's policy. parent_tools (:73) still flows into the spawned inner DelegateTool's run_tool_call_loop and the underlying tool instances each hold the caller's Arc<SecurityPolicy> from registration time. The boundary check guarantees the target's nominal policy is a subset of the caller's, and the tracker is shared, but the actual file_read / glob_search / etc. calls in a child agentic run still enforce the caller's allowlist rather than the (potentially narrower) target's. Per-delegation tool-registry rebuild is a structural change touching runtime / memory / observability plumbing that I deliberately scoped out here. I am NOT filing a follow-up issue for it; whether it lands later is a maintainer call.
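
The boundary check and the shared budget can be sketched as follows. Everything here is a hypothetical simplification: `Access`, `Tracker`, and `Policy` stand in for the real AccessMode, PerSenderTracker, and SecurityPolicy, and policy_for_target takes a bare access level rather than a target alias. The sketch models only the escalation check and tracker sharing described in this comment.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;

/// Hypothetical access levels, ordered so that a larger value means wider rights.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Access {
    Read,
    ReadWrite,
}

/// Hypothetical stand-in for a shared per-sender action budget.
struct Tracker {
    remaining: AtomicU32,
}

impl Tracker {
    /// Takes one action from the budget; returns false once it is exhausted.
    fn try_consume(&self) -> bool {
        self.remaining
            .fetch_update(Ordering::SeqCst, Ordering::SeqCst, |n| n.checked_sub(1))
            .is_ok()
    }
}

struct Policy {
    access: Access,
    tracker: Arc<Tracker>,
}

/// Builds the target's policy, refusing any target that would widen the
/// caller's rights, and shares the caller's tracker with the result.
fn policy_for_target(caller: &Policy, target_access: Access) -> Result<Policy, String> {
    if target_access > caller.access {
        return Err(format!(
            "target access {:?} exceeds caller access {:?}",
            target_access, caller.access
        ));
    }
    Ok(Policy {
        access: target_access,
        tracker: Arc::clone(&caller.tracker),
    })
}

fn main() {
    let caller = Policy {
        access: Access::Read,
        tracker: Arc::new(Tracker { remaining: AtomicU32::new(2) }),
    };
    match policy_for_target(&caller, Access::ReadWrite) {
        Ok(_) => println!("delegation allowed"),
        Err(e) => println!("delegation refused: {e}"),
    }
    let child = policy_for_target(&caller, Access::Read).expect("same access is not an escalation");
    println!("child access: {:?}", child.access);
    // Child and caller draw from the same bucket.
    println!("child consumed: {}", child.tracker.try_consume());
    println!("caller consumed: {}", caller.tracker.try_consume());
    println!("budget left: {}", caller.tracker.try_consume());
}
```

Because the tracker is an Arc clone rather than a copy, exhausting the budget in the delegated run also exhausts it for the caller.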

tidux. Per-agent tables / DBs: in scope is the agents table + agent_id foreign key on memories, on a single per-backend store. Per-agent table partitioning (especially for ACP session replay schema) is downstream work and isn't covered here. Fair to track separately when the ACP work crystallizes; not blocking this PR.

PR body and issue #6272 body have been updated to reflect what actually shipped (no zeroclaw agents CLI claims; new DelegateTool + read-side checklist items added; test totals refreshed). Pre-push gate (cargo +nightly fmt --all -- --check + cargo clippy --workspace --exclude zeroclaw-desktop --all-targets --features ci-all -- -D warnings) is clean on 0185dfe. Targeted suites green: cargo test -p zeroclaw-memory -p zeroclaw-tools -p zeroclaw-runtime --all-features reports 307 + 1127 + 1656 passing.

DelegateTool::policy_for_target now refuses narrowing in addition to escalation. The spawned agentic loop reuses the caller's parent_tools registry, so a narrower target policy never reaches those tool calls; catching the narrowing at the delegate boundary turns a silent over-grant into a loud refusal that points operators at spawn_subagent for genuinely narrowed runs.

SubAgent allowlist fields renamed to make explicit that they carry config aliases (the [agents.<alias>] keys) rather than backend storage identifiers: SubAgentOverrides.allowed_agent_aliases, SubAgentContext.parent_alias / allowed_agent_aliases, SubAgentSpawn.parent_alias / parent_allowed_agent_aliases. Module doc spells out that consumers building an AgentScopedMemory must resolve via Memory::ensure_agent_uuid first (SQL backends use UUIDs from the agents table; Markdown / Qdrant / None use the alias verbatim per the trait default). The in-tree consumer today is zeroclaw_memory::create_memory_for_agent, which already does the resolution.

Drop a stale `let _ = agent_config;` in build_enriched_system_prompt that was claiming the parameter was unused; it is used several lines above to resolve the agent's skill bundles.
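
The alias-versus-identifier split can be sketched with a trait default. This is an assumption-heavy illustration: the real Memory::ensure_agent_uuid signature is not shown in this PR text (it may well be async and fallible), the `MarkdownMemory` and `SqlMemory` types are hypothetical, and a counter-derived string stands in for a real UUID.

```rust
use std::collections::HashMap;

/// Hypothetical, synchronous simplification of the Memory trait.
trait Memory {
    /// Trait default: backends without an agents table use the alias verbatim.
    fn ensure_agent_uuid(&mut self, alias: &str) -> String {
        alias.to_string()
    }
}

/// Alias-keyed backend: relies on the trait default.
struct MarkdownMemory;

impl Memory for MarkdownMemory {}

/// SQL-style backend: keeps an alias -> id mapping, like an agents table.
struct SqlMemory {
    agents: HashMap<String, String>,
    next_id: u32,
}

impl SqlMemory {
    fn new() -> Self {
        SqlMemory { agents: HashMap::new(), next_id: 0 }
    }
}

impl Memory for SqlMemory {
    fn ensure_agent_uuid(&mut self, alias: &str) -> String {
        if let Some(id) = self.agents.get(alias) {
            return id.clone();
        }
        // A real backend would generate a UUID; a counter keeps the sketch deterministic.
        self.next_id += 1;
        let id = format!("00000000-0000-0000-0000-{:012}", self.next_id);
        self.agents.insert(alias.to_string(), id.clone());
        id
    }
}

fn main() {
    let mut markdown = MarkdownMemory;
    let mut sql = SqlMemory::new();
    println!("markdown: {}", markdown.ensure_agent_uuid("researcher"));
    println!("sql:      {}", sql.ensure_agent_uuid("researcher"));
}
```

A consumer that passed the raw alias to a SQL-backed scope would filter on a value that never appears in the agent_id column, which is why the module doc asks for resolution first.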

singlerider added 16 commits May 9, 2026 12:44

github-actions Bot added the core (Auto scope: root src/*.rs files changed), config (Auto scope: src/config/** changed), and tests (Auto scope: tests/** changed) labels May 9, 2026

singlerider added 5 commits May 9, 2026 16:44

github-actions Bot added the tool (Auto scope: src/tools/** changed) label May 9, 2026

singlerider added 5 commits May 9, 2026 17:34

github-actions Bot removed the memory (Auto scope: src/memory/** changed), runtime (Auto scope: src/runtime/** changed), and security (Auto scope: src/security/** changed) labels May 9, 2026

singlerider added 3 commits May 9, 2026 21:44

WareWolf-MoonWall reviewed May 9, 2026


singlerider added 6 commits May 10, 2026 09:11

github-actions Bot added the memory (Auto scope: src/memory/** changed) label May 9, 2026

This was referenced May 10, 2026

🦞 OpenClaw Ecosystem Daily 2026-05-10 gsscsd/big_model_radar#321

Open

🦞 OpenClaw Ecosystem Daily 2026-05-10 ivanweng2077/big_model_radar#21

Open

singlerider added 3 commits May 10, 2026 14:46

singlerider changed the title from "feat(runtime): #6272 multi-agent runtime" to "feat(runtime): multi-agent runtime" May 10, 2026

singlerider merged commit 03b4c13 into zeroclaw-labs:integration/v0.8.0 May 10, 2026
1 check passed

Haderach-Ram mentioned this pull request May 10, 2026

Ecosystem Digest — 2026-05-10 Haderach-Ram/openclaw-radar#1

Open

This was referenced May 11, 2026

🦞 OpenClaw Ecosystem Daily 2026-05-11 zx0828/big_model_radar#50

Open

🦞 OpenClaw Ecosystem Daily 2026-05-11 gsscsd/big_model_radar#326

Open

This was referenced May 11, 2026

Ecosystem Digest — 2026-05-11 Haderach-Ram/openclaw-radar#2

Open

Ecosystem Digest — 2026-05-11 Haderach-Ram/openclaw-radar#3

Closed

feat(runtime): multi-agent runtime #6545
singlerider merged 52 commits into zeroclaw-labs:integration/v0.8.0 from singlerider:feat/6272-multi-agent-runtime

4 participants
