feat(runtime): multi-agent runtime #6545
Pure-additive P1 of the v0.8.0 multi-agent runtime: introduces the type-level vocabulary later phases will consume when they wire into Config and the runtime.
- AgentAlias, PeerGroupName, PeerUsername newtypes follow the existing define_provider_ref! macro pattern from providers.rs. The macro is now #[macro_export] so multi-agent types reuse it without duplication.
- AccessMode enum carries cross-agent filesystem grants (Read, Write, ReadWrite). Schema-as-law: this is the single shape for granted modes. Absence of a key in the cross-agent access map remains the jailed default. Helpers allows_read / allows_write encode the capability check used by the upcoming SecurityPolicy r/rw split.
- PeerExternal: typed entry for non-agent peer-group members (humans, external bots).
No production call sites yet; tests cover serde round-trip, snake_case on the enum tag, capability predicates, and the [[peer_groups.<name>.external_peers]] array shape.
Refs zeroclaw-labs#6272.
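The capability predicates and the jailed default described above can be sketched roughly as follows. The variant names come from the commit message; the exact truth table of allows_read / allows_write, the use of plain String keys instead of AgentAlias, and the may_read helper are assumptions made for illustration only.

```rust
use std::collections::BTreeMap;

/// Cross-agent filesystem grant. Variant names are taken from the commit text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AccessMode {
    Read,
    Write,
    ReadWrite,
}

impl AccessMode {
    /// Assumption: Read and ReadWrite permit reads.
    pub fn allows_read(self) -> bool {
        matches!(self, AccessMode::Read | AccessMode::ReadWrite)
    }

    /// Assumption: Write and ReadWrite permit writes.
    pub fn allows_write(self) -> bool {
        matches!(self, AccessMode::Write | AccessMode::ReadWrite)
    }
}

/// Hypothetical helper: a missing key in the access map means the
/// jailed default, so the peer gets no read access at all.
pub fn may_read(access: &BTreeMap<String, AccessMode>, peer: &str) -> bool {
    access.get(peer).map_or(false, |mode| mode.allows_read())
}
```

Keeping "no grant" as the absence of a key, rather than a fourth enum variant, means a config cannot spell out a denial in two different ways.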
…oclaw-labs#6272
The src/hands module was declared in lib.rs but had zero in-tree consumers. Its companion observability surface (HandStarted, HandCompleted, HandFailed events; HandRunDuration, HandFindingsCount, HandSuccessRate metrics; the Prometheus and OTel hand_* counters) fired only from in-test emit sites, never from production code.
P5 of the zeroclaw-labs#6272 plan reclaims the "Hand" identifier for the runtime-spawned sub-agent concept arriving in P10. Cleanly freeing the name means deleting the dead code, not renaming it.
Net 1136 lines red, zero green:
- src/hands/{mod,types}.rs and the lib.rs module declaration.
- ObserverEvent::HandStarted / HandCompleted / HandFailed.
- ObserverMetric::HandRunDuration / HandFindingsCount / HandSuccessRate.
- Prometheus IntCounterVec hand_runs / HistogramVec hand_duration / IntCounterVec hand_findings (registry registrations included).
- OTel u64_counter zeroclaw.hand.runs / f64_histogram zeroclaw.hand.duration / u64_counter zeroclaw.hand.findings.
- Match arms and test emit sites in log.rs, otel.rs, prometheus.rs, noop.rs, verbose.rs, traits.rs, and observability_traits.rs.
When the new Hand sub-agent lands in P10 it will introduce its own events with parent_alias, child_alias, and lifecycle fields; copying the old shape would have been wrong for that semantics.
Refs zeroclaw-labs#6272.
…zeroclaw-labs#6272
P2a is the mechanical first half of the schema rework: rename the config struct that backs every [agents.<alias>] TOML block from the historical Delegate prefix (which originated with DelegateTool's sub-agent dispatch) to AliasedAgentConfig, which reflects what it actually is in the multi-agent runtime: a top-level user-facing aliased agent.
Net 132 inserts and 132 deletions across 23 files. No behavior change; type and field names move only.
Touches:
- crates/zeroclaw-config/src/schema.rs (definition + Default + every reference in the validator)
- crates/zeroclaw-config/src/schema/v2.rs (V2 to V3 migration call sites that synthesize the default agent and propagate per-agent fields)
- crates/zeroclaw-runtime/src/tools/delegate.rs and the cron tool family (cron_add, cron_remove, cron_run, cron_runs, cron_update, schedule, mod)
- crates/zeroclaw-runtime/src/agent/agent.rs and tests
- crates/zeroclaw-runtime/src/cron/scheduler.rs and doctor/mod.rs
- crates/zeroclaw-providers, channels (orchestrator, acp_server, tts), tools/mod.rs
- crates/zeroclaw-gateway/src/api.rs (gateway must rename for compilation; @Audacity88's follow-up PR layers the user-facing surface on top)
- src/config/mod.rs and the component tests in tests/component/
P2b will add the nested [agents.<alias>.workspace] and [agents.<alias>.memory] blocks, the new top-level [peer_groups.<name>] map, and the slim RuntimeConfig that replaces the legacy [workspace] block.
Refs zeroclaw-labs#6272.
…fig rename
The mechanical type rename in the prior commit left several pieces of prose still saying "delegate agent" where the concept being described is the new AliasedAgentConfig. This commit catches them up so reviewers reading either rustdoc or tool output see the right vocabulary.
- crates/zeroclaw-config/src/schema.rs: header comment for the agents map field, the section banner above AliasedAgentConfig, the doc comment on Config::resolve_aliased_agent_for_alias-style helpers (model_provider_for_agent and the alias resolver), and the section header.
- crates/zeroclaw-tools/src/model_routing_config.rs: tool description, error message on remove_agent, JSON schema property descriptions for api_key / name / system_prompt / agentic. These flow into the LLM tool catalog the agent reads at runtime, so they show up in user-visible tool surfaces.
- crates/zeroclaw-runtime/locales/en/tools.ftl: English source string for the model-routing-config tool description. Translation work for the other locales is out of scope for this PR per Audacity88's follow-up agreement; the fluent fallback chain holds the line until those land.
- crates/zeroclaw-channels/src/orchestrator/mod.rs: internal rustdoc on the per-runtime aliased-agent config field.
No behavior change. Compiler-visible code stays identical; the diff is comments and string literals.
Refs zeroclaw-labs#6272.
…law-labs#6272
P2b lands the schema-as-law shapes for v0.8.0 multi-agent config. Pure-additive: every existing TOML config still loads identically; new fields default to jailed / SQLite / empty.
Four new types in crate::multi_agent (which already houses the alias newtypes and AccessMode from P1):
- MemoryBackendKind enum (None, Sqlite, Postgres, Qdrant, Markdown, Lucid). Closed set. Schema-as-law: every consumer-side dispatch on the backend goes through this enum, never a string match. The legacy Config.memory.backend dotted-alias string stays for now; P3 wires the enum into the resolver as the cliff approaches.
- AgentWorkspaceConfig: optional explicit path, cross-agent filesystem allowlist (BTreeMap<AgentAlias, AccessMode>), unrestricted_filesystem escape boolean, cross-agent memory allowlist (Vec<AgentAlias>). Default is fully jailed.
- AgentMemoryConfig: backend selection, locked at agent creation per the spec. Default is Sqlite.
- PeerGroupConfig: channel ref, member agents, external_peers, group-wide ignore list. Used by Config.peer_groups.
Wired into the schema:
- AliasedAgentConfig gains nested workspace and memory blocks ([agents.<alias>.workspace], [agents.<alias>.memory]).
- Config gains peer_groups: HashMap<String, PeerGroupConfig> ([peer_groups.<name>]). HashMap matches the existing named-collection convention (memory_namespaces, knowledge_bundles, mcp_bundles); the typed PeerGroupName is used at validation and resolution time.
Plumbing: HasPropKind impls in traits.rs for every new type the parent macros traverse, including BTreeMap<AgentAlias, AccessMode> (PropKind::Object). The shared define_provider_ref! macro grew PartialOrd/Ord derives so AgentAlias can be a BTreeMap key, which also benefits ChannelRef / ModelProviderRef / TtsProviderRef / TranscriptionProviderRef without touching them individually.
The original plan called for a slim top-level RuntimeConfig as the new install-level multi-agent struct, but Config.runtime is already a runtime-adapter config (kind: native|docker, etc.) and nothing in v0.8.0 needs install-level multi-agent toggles that aren't already covered. Dropped that piece; if a need surfaces in P10 or later, it lands as a field on the existing RuntimeConfig or its own struct.
Tests: 7 new unit tests on the new types covering serde round-trip on the array-of-tables forms, default-jailed semantics, the BTreeMap access map, snake_case enum tags, and channel dereference through ChannelRef.
Refs zeroclaw-labs#6272.
…s#6272
P3 wires the schema-as-law constraints from the locked plan into Config::validate(). Every shape that was implicit at the type level becomes a load-time error so misconfigured installs fail cheap instead of producing confusing runtime errors later.
Per-agent (inside the agents loop):
- workspace.access keys must NOT be self. An agent always has full access to its own workspace; a self-reference in the cross-agent allowlist is meaningless and is rejected as InvalidFormat.
- workspace.access keys must point at configured agents. DanglingReference if not.
- workspace.read_memory_from entries follow the same rules: no self-reference, no dangling alias, plus an additional same-backend constraint. Cross-backend memory sharing is deferred to v0.8.1, so an entry pointing at a sibling on a different MemoryBackendKind fails with InvalidFormat at config load time. The error is explicit about the deferral so the operator knows where to look in the changelog.
Cross-agent (after the agents loop):
- peer_groups.<X>.channel must not be empty (RequiredFieldEmpty).
- peer_groups.<X>.agents entries must exist as configured agents (DanglingReference); the validator iterates a sorted list of group names so error ordering is stable across runs, matching the existing per-agent validator pattern at the top of the same function.
- Each peer-group member's channels list must include the group's channel (InvalidFormat if mismatched). This catches the most common multi-agent misconfiguration: putting an agent in a group it cannot physically reach.
Tests: 7 new unit tests in schema::tests; each builds a minimal valid Config via a multi_agent_test_config helper and mutates a single field to provoke one validator.
Coverage:
- workspace.access self-reference rejected
- workspace.access dangling target rejected
- read_memory_from self-reference rejected
- read_memory_from cross-backend rejected with deferral note
- peer_group dangling member rejected
- peer_group member without the group's channel rejected
- valid two-member same-channel peer group accepts cleanly
Side cleanup from rustfmt: a handful of `pub use schema::{...}` re-export blocks (src/config/mod.rs and friends) get re-sorted into the right alphabetical position now that AliasedAgentConfig replaces DelegateAgentConfig from P2a. Purely cosmetic.
Refs zeroclaw-labs#6272.
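The two workspace.access rules above (no self-reference, no dangling alias) might be checked along these lines. This is a minimal sketch, not the actual Config::validate() code: the ValidationError enum, the validate_access_keys function, and the use of plain string sets are hypothetical stand-ins; only the InvalidFormat and DanglingReference names come from the commit message.

```rust
use std::collections::BTreeSet;

/// Hypothetical error type; the variant names mirror the commit text.
#[derive(Debug, PartialEq, Eq)]
pub enum ValidationError {
    InvalidFormat(String),
    DanglingReference(String),
}

/// Checks one agent's `workspace.access` keys against the set of
/// configured agent aliases. BTreeSet iteration is sorted, so the first
/// error reported is stable across runs.
pub fn validate_access_keys(
    alias: &str,
    access_keys: &BTreeSet<String>,
    configured: &BTreeSet<String>,
) -> Result<(), ValidationError> {
    for key in access_keys {
        // An agent already has full access to its own workspace.
        if key.as_str() == alias {
            return Err(ValidationError::InvalidFormat(format!(
                "agents.{}.workspace.access must not reference itself",
                alias
            )));
        }
        // Every grant must point at an agent that exists in the config.
        if !configured.contains(key) {
            return Err(ValidationError::DanglingReference(format!(
                "agents.{}.workspace.access references unknown agent {}",
                alias, key
            )));
        }
    }
    Ok(())
}
```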
(P6a)
Adds the agents table and agent_id column on memories for the SQLite backend, with idempotent self-detection, atomic backup before destructive ALTERs on populated DBs, and a default-agent backfill so 0.7.x installs upgrade without data loss.
The migration runs from SqliteMemory::with_embedder and ::new_named right after init_schema. Detection: if the agents table does not exist OR the memories.agent_id column is missing, the migration fires; otherwise it short-circuits as a true no-op.
Schema (matches the locked plan in tmp/6272-multi-agent-plan.md):
- agents: id TEXT PRIMARY KEY, alias TEXT NOT NULL UNIQUE, created_at TEXT NOT NULL. UUID stored as TEXT for cross-DB portability so the same code path works on SQLite, Postgres (P6b), and Lucid's local SQLite (P6c, automatic via SqliteMemory composition).
- memories.agent_id TEXT, indexed. Left nullable at the SQLite layer because SQLite cannot add a NOT NULL FK column to an existing populated table without a full rebuild; the AgentScopedMemory<M> wrapper in P7 enforces non-null at write time by carrying the bound agent UUID and injecting it on every store. Nullability at the DB layer is the safe choice for the upgrade path: legacy rows backfill cleanly without rewrite, and the application never produces NULL after P7 lands.
Atomic backup: when the memories table has rows AND the migration is about to fire, the SQLite file is copied to {db_name}.backup-{UTC_timestamp} alongside the original before the ALTER runs. A crashed migration leaves the operator with a recoverable copy. Skipped on fresh installs (no data to lose).
Default agent: a UUID is generated (uuid::Uuid::new_v4) and INSERT OR IGNORE'd into agents. The post-INSERT re-query returns the row that actually persisted, so concurrent inits from different threads or processes converge on a single UUID.
Tests: 4 integration tests in the new crates/zeroclaw-memory/tests/ directory (which the plan explicitly calls for).
Coverage: fresh install, idempotent re-init, pre-migration data backfill with backup, post-migration store/recall round-trip. The pre-migration test seeds the legacy schema (memories + indices + FTS5 virtual table + triggers, no agents table, no agent_id) so the migration runs against a true upgrade scenario, not a hand-built half-schema.
Postgres (P6b) and Lucid (P6c) follow in the next commits; Lucid's local DB is SqliteMemory underneath so it picks up this migration automatically. The wire-format work for cross-agent scoping in the external Lucid CLI stays deferred to v0.8.1 per the plan.
Refs zeroclaw-labs#6272.
…#6272 (P6b)
Mirrors the SQLite migration from P6a, but PG-flavored: idempotent via ADD COLUMN IF NOT EXISTS / CREATE INDEX IF NOT EXISTS / ON CONFLICT DO NOTHING, default-agent UUID generated in Rust and bound as a parameter, backfill in the same init pass via a parameterized UPDATE.
Schema (matches SQLite path so cross-DB code stays one shape):
- {schema}.agents: id TEXT PRIMARY KEY, alias TEXT NOT NULL UNIQUE, created_at TIMESTAMPTZ NOT NULL.
- {qualified_table}.agent_id TEXT, indexed. Nullable at the DB layer matching SQLite; the AgentScopedMemory<M> wrapper in P7 enforces non-null at write time.
Wired into PostgresMemory::initialize_client right after init_schema so every fresh connection is fully migrated before try_enable_pgvector runs (which can fail safely on pgvector absence).
Backups: operator's responsibility for Postgres. The binary cannot reach across the network to dump a managed cluster, so we do not take a file-copy backup the way SQLite does. Documented.
Concurrent first-init: INSERT...ON CONFLICT (alias) DO NOTHING plus a follow-up SELECT means concurrent initializers from different processes converge on the same default agent UUID (whichever insert wins is the persisted row).
Tests: skipped in this commit. Existing Postgres tests in the crate are gated behind the memory-postgres feature and require a running Postgres for execution; CI does not provision one. Cross-DB parity tests for the migration land alongside any test-container support in a follow-up. The code path is a direct mirror of P6a's tested SQLite path so confidence is reasonable.
Lucid (the third backend) wraps SqliteMemory for its local store, so P6a's migration runs automatically when Lucid initializes; no separate Lucid migration code is needed for v0.8.0. The external Lucid CLI wire format for cross-agent scoping stays deferred to v0.8.1 per the plan.
V2->V3 migration extension: the existing synthesize_default_agent_if_needed helper in crates/zeroclaw-config/src/schema/v2.rs already inserts the default agent's config-side row; the DB-side migration creates the matching agents-table row independently. Both arrive at "there is a default agent" without coordination; runtime resolves the UUID by alias when AgentScopedMemory<M> is constructed in P7.
Refs zeroclaw-labs#6272.
…e for zeroclaw-labs#6272 (P7)
The trait gains two agent-aware methods, both with default implementations that fall back to existing behavior so backends opt in incrementally without breaking compilation:
- store_with_agent(key, content, category, session_id, namespace, importance, agent_id): extends store_with_metadata with the bound agent's UUID. Backends with native agent_id columns (SqliteMemory, PostgresMemory after P6) override to actually persist the attribution; the default falls back to store_with_metadata so non-aware backends stay correct.
- recall_for_agents(allowed_agent_ids, query, limit, session_id, since, until): narrows recall to a specific allowlist of agent UUIDs. The default falls back to recall(); backends with native columns override to add WHERE agent_id IN (...) at the SQL layer.
The wrapper at crates/zeroclaw-memory/src/agent_scoped.rs is the canonical site for agent-identity enforcement:
- AgentScopedMemory<M: Memory> holds Arc<M>, the bound agent's UUID, and the resolved allowlist (the set of sibling UUIDs computed from read_memory_from at config load).
- Construction always includes the bound agent's UUID in the allowed set so callers do not need to remember to include themselves.
- store / store_with_metadata route through store_with_agent with the bound agent's UUID, so the inner backend persists attribution on every write through the wrapper.
- recall / recall_for_agents route through recall_for_agents with the wrapper's allowlist, so backends that override the trait method get the SQL-layer filter for free.
- recall_for_agents accepts an explicit allowlist from a caller but intersects it with the wrapper's bound allowlist so an over-broad request from a tool cannot sneak past the construction-time policy.
- forget routes to inner unchanged with a TODO marker; cross-agent delete protection lands when MemoryEntry plumbs agent_id in the read paths (P7 follow-up that ships alongside the SqliteMemory override of recall_for_agents).
Tests cover construction (bound agent always in allowlist, siblings union with self), store/recall round-trip via the wrapper, and the intersection semantics on recall_for_agents with a rogue UUID. Backend-side SQL filtering is the next commit (P7 follow-up).
The wrapper exposes name() identical to the inner backend so existing log lines and dashboards keep working; the wrapper's existence becomes visible only through the agent_alias tracing field bound at agent-loop entry (P12).
Refs zeroclaw-labs#6272.
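The allowlist rules the wrapper enforces (the bound agent is always included, and a caller-supplied allowlist is intersected with the construction-time one) can be illustrated with a small stand-alone sketch. It models only the set logic, not the real AgentScopedMemory<M>; the ScopedAllowlist type, its method names, and the u64 stand-in for the agent UUID are assumptions.

```rust
use std::collections::BTreeSet;

/// Stand-in for the agent UUID; an integer keeps the sketch dependency-free.
pub type AgentId = u64;

/// Hypothetical model of the wrapper's allowlist state.
pub struct ScopedAllowlist {
    bound: AgentId,
    allowed: BTreeSet<AgentId>,
}

impl ScopedAllowlist {
    /// Construction always adds the bound agent to the allowed set, so
    /// callers never have to remember to include themselves.
    pub fn new(bound: AgentId, siblings: &[AgentId]) -> Self {
        let mut allowed: BTreeSet<AgentId> = siblings.iter().copied().collect();
        allowed.insert(bound);
        ScopedAllowlist { bound, allowed }
    }

    /// The agent whose identity is attached to every write.
    pub fn bound_agent(&self) -> AgentId {
        self.bound
    }

    /// Intersects a caller-supplied allowlist with the construction-time
    /// one, so an over-broad request cannot widen the policy.
    pub fn effective(&self, requested: &[AgentId]) -> BTreeSet<AgentId> {
        requested
            .iter()
            .copied()
            .filter(|id| self.allowed.contains(id))
            .collect()
    }
}
```

With a wrapper bound to agent 1 and siblings 2 and 3, a request for agents 1, 2 and 99 resolves to just 1 and 2; the rogue id is silently dropped rather than rejected, matching the intersection semantics described above.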
…roclaw-labs#6272 (P8)
Extends SecurityPolicy to carry a second allowlist for read-only roots alongside the existing read-write allowlist. The multi-agent runtime uses this to translate cross-agent AccessMode grants into filesystem policy: an AccessMode::Read entry on agent A's workspace.access map for agent B becomes a read-only entry for B's workspace path on A's policy; AccessMode::Write and AccessMode::ReadWrite become regular allowed_roots entries.
Schema:
- SecurityPolicy gains pub allowed_roots_read_only: Vec<PathBuf>. Default empty so existing single-agent installs and every RiskProfileConfig-only call site keep their current semantics.
- RiskProfileConfig is unchanged. The read-only roots concept belongs to the per-agent workspace block, not the shared risk profile, so from_risk_profile leaves the new field empty. The multi-agent runtime populates it when it builds a per-agent policy from the workspace.access map (P10/P11 wiring).
Methods:
- is_under_allowed_root (existing) keeps strict read+write semantics. Write-side tools (file_write, git_operations, shell) call this; the doc comment now spells out the contract.
- is_under_read_only_allowed_root (new) checks only the new list.
- is_under_any_allowed_root (new) is the union over both lists. Read-side tools (file_read, glob_search, content_search) should call this so a cross-agent AccessMode::Read grant unblocks the read.
- The shared root-matching logic moves into a private roots_contain helper to keep the rw and read-only check paths in lockstep.
Tests: four new unit tests in policy::tests cover (1) the read-only check matching only its own list, (2) the union behavior of is_under_any_allowed_root, (3) is_under_allowed_root NOT seeing read-only entries (the write-side enforcement guarantee), and (4) from_risk_profile leaving the new field empty.
Tools that consume allowed_roots stay on is_under_allowed_root for now; the swap to is_under_any_allowed_root for read paths and the population of allowed_roots_read_only from workspace.access land in the cliff (P4+P9) when the per-agent policy construction wires through the runtime.
Refs zeroclaw-labs#6272.
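The three root checks can be sketched as below. The field and method names follow the commit message, but the struct is cut down to the two lists, and the matching rule inside roots_contain (a component-wise prefix test with no canonicalisation or symlink handling) is an assumption; the real helper may well be stricter.

```rust
use std::path::{Path, PathBuf};

/// Cut-down stand-in for SecurityPolicy holding only the two root lists.
#[derive(Default)]
pub struct SecurityPolicy {
    pub allowed_roots: Vec<PathBuf>,
    pub allowed_roots_read_only: Vec<PathBuf>,
}

/// Shared matcher so the rw and read-only paths cannot drift apart.
/// Assumption: a plain component-wise prefix check.
fn roots_contain(roots: &[PathBuf], path: &Path) -> bool {
    roots.iter().any(|root| path.starts_with(root))
}

impl SecurityPolicy {
    /// Strict read+write check; write-side tools call this.
    pub fn is_under_allowed_root(&self, path: &Path) -> bool {
        roots_contain(&self.allowed_roots, path)
    }

    /// Checks only the read-only list.
    pub fn is_under_read_only_allowed_root(&self, path: &Path) -> bool {
        roots_contain(&self.allowed_roots_read_only, path)
    }

    /// Union over both lists; read-side tools would call this.
    pub fn is_under_any_allowed_root(&self, path: &Path) -> bool {
        self.is_under_allowed_root(path) || self.is_under_read_only_allowed_root(path)
    }
}
```

Because the write-side check never consults the read-only list, a cross-agent Read grant cannot be used to write, which is the enforcement guarantee test (3) pins down.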
…-labs#6272 (P12)
Adds the multi-agent attribution surface to two event streams the runtime emits without disturbing any existing call site:
- RuntimeTraceEvent (crates/zeroclaw-runtime/src/observability/runtime_trace.rs) gains an `agent_alias: Option<String>` field with serde-default and skip_if_none. The existing `record_event(...)` function forwards to a new `record_event_with_agent(..., agent_alias, payload)` variant with `agent_alias = None`, so all 25 existing call sites compile unchanged. Sites that bind a per-agent alias post-P10 call the agent-aware variant directly.
- AuditEvent (crates/zeroclaw-runtime/src/security/audit.rs) gains the same field plus a builder method `AuditEvent::with_agent_alias` so existing construction continues to work via Default-derived field initialization plus opt-in agent attribution. The `AuditEvent::new` constructor seeds the new field as None.
Audit storage stays at <install>/audit/ globally; an agent delete does NOT remove its prior audit trail (per the locked plan). The new field lets queries reconstruct per-agent activity from a global trail after the fact.
Console formatter prefix [<alias>] and otel/dora/prometheus alias labels are deferred. They wire into agent-loop entry binding sites (P10) so the alias has a value to plumb; this commit ships only the schema-side surface so P10 can populate without churning serialization formats again.
Refs zeroclaw-labs#6272.
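The backward-compatible shape described above (an old entry point forwarding to an agent-aware variant, plus an opt-in builder) might look roughly like this. The structs are heavily simplified, the function signatures and the kind / payload / action fields are hypothetical, and the serde attributes are omitted to keep the sketch dependency-free; only the function, method, and agent_alias names come from the commit message.

```rust
/// Simplified stand-in for the runtime trace record.
#[derive(Debug, Clone, PartialEq)]
pub struct RuntimeTraceEvent {
    pub kind: String,
    pub agent_alias: Option<String>,
    pub payload: String,
}

/// Agent-aware variant: callers that know the alias pass it explicitly.
pub fn record_event_with_agent(
    kind: &str,
    agent_alias: Option<&str>,
    payload: &str,
) -> RuntimeTraceEvent {
    RuntimeTraceEvent {
        kind: kind.to_string(),
        agent_alias: agent_alias.map(String::from),
        payload: payload.to_string(),
    }
}

/// Legacy entry point: forwards with no alias, so old call sites are untouched.
pub fn record_event(kind: &str, payload: &str) -> RuntimeTraceEvent {
    record_event_with_agent(kind, None, payload)
}

/// Simplified stand-in for the audit record.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct AuditEvent {
    pub action: String,
    pub agent_alias: Option<String>,
}

impl AuditEvent {
    /// The constructor seeds the new field as None.
    pub fn new(action: &str) -> Self {
        AuditEvent { action: action.to_string(), agent_alias: None }
    }

    /// Opt-in builder for agent attribution.
    pub fn with_agent_alias(mut self, alias: &str) -> Self {
        self.agent_alias = Some(alias.to_string());
        self
    }
}
```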
…#6272 (P11)
Adds the cross-channel self-loop guard the multi-agent runtime needs: a bot must never respond to its own messages, even when a misconfigured peer group lists the bot's own handle as an external peer or when the same channel binding round-trips an outbound back through the inbound queue.
Two trait additions on `zeroclaw_api::channel::Channel`:
- `fn self_handle(&self) -> Option<String>`: each channel impl exposes its own bot handle (e.g. `@my_bot` for Telegram, the bot's user ID for Discord) when known. Default returns `None`, so adding the guard does not break any existing channel impl. Channels override as their identity becomes available at runtime.
- `fn drop_self_messages(&self, msg: &ChannelMessage) -> bool`: default implementation does a case-insensitive comparison of `msg.sender` against `self_handle()`, normalising leading `@` so Telegram-style handles match regardless of which form the SDK delivers. Channels with non-string identity (numeric Discord IDs, Matrix MXIDs) get the same shape because the comparison is string-based and stable.
Wired into the orchestrator's inbound path (`crates/zeroclaw-channels/src/orchestrator/mod.rs`, `process_channel_message`) right after `target_channel` resolution and before any downstream processing: if the channel reports the inbound is self-authored, we drop it with a debug-level trace and return.
Two-layer defense: a future agent-loop fallback (P11 follow-up) compares against the agent's own outbound queue, so channels that have not yet implemented `self_handle` still get caught at a second layer.
Tests: four new unit tests on the trait default cover (1) `None` handle returns false (no guard fires on un-identified channels), (2) exact handle match, (3) `@` prefix and case-insensitive normalisation, and (4) an empty/`@`-only handle does not match every inbound (guard only fires on real handles).
Channel implementations stay opt-in. Override `self_handle` per channel as the platform's authentication path exposes the bot's identity. The orchestrator-side check is the single point where the guard fires, so a missed channel implementation degrades gracefully (no guard at SDK layer, fallback at agent loop).
Refs zeroclaw-labs#6272.
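A minimal sketch of the two trait defaults follows, assuming a ChannelMessage reduced to its sender field. The method names and signatures are the ones quoted above; the normalise_handle helper, the NamedChannel test double, and the exact normalisation (trim, strip leading `@`, lowercase) are assumptions about how the default could be written.

```rust
/// Reduced stand-in: only the field the guard inspects.
pub struct ChannelMessage {
    pub sender: String,
}

/// Hypothetical helper: strips whitespace and leading '@', then lowercases.
fn normalise_handle(raw: &str) -> String {
    raw.trim().trim_start_matches('@').to_lowercase()
}

pub trait Channel {
    /// Default: identity unknown, so the guard never fires.
    fn self_handle(&self) -> Option<String> {
        None
    }

    /// Returns true when the inbound message was authored by this bot.
    fn drop_self_messages(&self, msg: &ChannelMessage) -> bool {
        match self.self_handle() {
            Some(handle) => {
                let own = normalise_handle(&handle);
                // An empty or "@"-only handle must not match every inbound.
                !own.is_empty() && own == normalise_handle(&msg.sender)
            }
            None => false,
        }
    }
}

/// Hypothetical channel used only to exercise the defaults.
pub struct NamedChannel(pub Option<String>);

impl Channel for NamedChannel {
    fn self_handle(&self) -> Option<String> {
        self.0.clone()
    }
}
```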
…law-labs#6272 (P10 prep)
Adds the subset validator the SubAgent spawn path will call to reject any override that escalates beyond the parent agent's permissions. Pure additive on SecurityPolicy.
Subset rules (a child policy is allowed iff all hold against the parent):
- allowed_roots: every entry on child must appear on parent's allowed_roots (no widening of read+write scope).
- allowed_roots_read_only: every entry on child must appear on parent's allowed_roots OR parent's allowed_roots_read_only. A SubAgent can downgrade a parent's rw root to read-only on itself; it cannot fabricate read access to a path the parent could not even read.
- allowed_commands: every entry on child must appear on parent's allowed_commands.
- workspace_only: child must be true whenever parent is true. A SubAgent cannot disable workspace_only that the parent enforces.
- max_actions_per_hour: child <= parent.
- max_cost_per_day_cents: child <= parent.
EscalationViolation enum names each violation kind so callers can produce precise errors. impl Display + Error so it integrates with the existing anyhow::Error chains in the runtime.
Tests: 9 unit tests cover the accept paths (identical, narrowed, rw-root-downgraded-to-read-only) and the reject paths for every EscalationViolation variant.
This is the foundation for the SubAgent spawn validator that lands in P10. The SubAgent runtime will:
1. Build a candidate child policy from override fields.
2. Call child.ensure_no_escalation_beyond(&parent). The check runs on the child with the parent as the argument, not the other way around.
3. Reject the spawn on Err with the violation chained for user-facing diagnostics.
Refs zeroclaw-labs#6272.
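The subset rules listed in this commit could be expressed along these lines. This is an illustrative sketch only: the struct keeps just the six fields the rules mention, roots are plain strings compared by exact membership, the numeric types are guesses, and the EscalationViolation variant names are hypothetical (the commit names the enum but not its variants). The Display and Error impls are left out for brevity.

```rust
/// Hypothetical variant names; one per rule in the commit message.
#[derive(Debug, PartialEq, Eq)]
pub enum EscalationViolation {
    AllowedRoot(String),
    ReadOnlyRoot(String),
    Command(String),
    WorkspaceOnly,
    ActionsPerHour,
    CostPerDay,
}

/// Cut-down policy holding only the fields the subset rules inspect.
#[derive(Debug, Clone)]
pub struct SecurityPolicy {
    pub allowed_roots: Vec<String>,
    pub allowed_roots_read_only: Vec<String>,
    pub allowed_commands: Vec<String>,
    pub workspace_only: bool,
    pub max_actions_per_hour: u32,
    pub max_cost_per_day_cents: u32,
}

impl SecurityPolicy {
    /// Called on the child with the parent as argument.
    pub fn ensure_no_escalation_beyond(
        &self,
        parent: &SecurityPolicy,
    ) -> Result<(), EscalationViolation> {
        for root in &self.allowed_roots {
            if !parent.allowed_roots.contains(root) {
                return Err(EscalationViolation::AllowedRoot(root.clone()));
            }
        }
        // A read-only root may come from either parent list (rw downgrade).
        for root in &self.allowed_roots_read_only {
            let readable = parent.allowed_roots.contains(root)
                || parent.allowed_roots_read_only.contains(root);
            if !readable {
                return Err(EscalationViolation::ReadOnlyRoot(root.clone()));
            }
        }
        for cmd in &self.allowed_commands {
            if !parent.allowed_commands.contains(cmd) {
                return Err(EscalationViolation::Command(cmd.clone()));
            }
        }
        if parent.workspace_only && !self.workspace_only {
            return Err(EscalationViolation::WorkspaceOnly);
        }
        if self.max_actions_per_hour > parent.max_actions_per_hour {
            return Err(EscalationViolation::ActionsPerHour);
        }
        if self.max_cost_per_day_cents > parent.max_cost_per_day_cents {
            return Err(EscalationViolation::CostPerDay);
        }
        Ok(())
    }
}
```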
Type-level scaffolding for runtime-spawned ephemeral sub-agents that inherit a parent agent's identity, security policy, and memory allowlist by default and may only narrow via explicit overrides.
Module surface in crates/zeroclaw-runtime/src/subagent/mod.rs:
- SubAgentOverrides: optional policy + allowed_agent_ids narrowing (None on every field = inherit parent verbatim).
- SubAgentContext: bound parent agent_id, validated child policy (Arc<SecurityPolicy>), resolved memory allowlist.
- SubAgentSpawn::build(overrides): runs the inheritance validator against the parent. Policy overrides flow through SecurityPolicy::ensure_no_escalation_beyond from P10-prep, with the EscalationViolation chained via anyhow so callers surface the precise rule that fired. Allowlist overrides reject any UUID not on the parent's allowlist; the parent's bound agent_id is always re-included so a SubAgent can always recall its own memories.
Five unit tests cover the contract:
- default_overrides_inherit_parent_verbatim
- policy_override_that_is_subset_is_accepted_and_narrows
- policy_override_that_escalates_is_rejected_with_violation_chained
- allowlist_override_subset_is_accepted_and_always_includes_self
- allowlist_override_with_rogue_uuid_is_rejected
Also fills in agent_alias: None at the two RuntimeTraceEvent test construction sites that pre-dated the P12 alias field on the struct.
P10b (cron JobType::Agent dispatch routes through SubAgentSpawn) and P10c (spawn_subagent agent-loop tool) follow once the cliff lands.
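The allowlist half of the inheritance validator (inherit verbatim on None, reject any id the parent lacks, always re-include the parent's own id) can be sketched as a free function. The name resolve_child_allowlist, the String error, and the u64 stand-in for UUIDs are hypothetical; the real SubAgentSpawn::build presumably reports through anyhow.

```rust
use std::collections::BTreeSet;

/// Stand-in for the agent UUID.
pub type AgentId = u64;

/// Resolves a sub-agent's memory allowlist from the parent's allowlist and
/// an optional narrowing override.
pub fn resolve_child_allowlist(
    parent_id: AgentId,
    parent_allowlist: &BTreeSet<AgentId>,
    requested: Option<&BTreeSet<AgentId>>,
) -> Result<BTreeSet<AgentId>, String> {
    let mut resolved = match requested {
        // No override: inherit the parent's allowlist verbatim.
        None => parent_allowlist.clone(),
        Some(ids) => {
            // Unlike recall-time intersection, a rogue id is a hard error here.
            if let Some(rogue) = ids.iter().find(|id| !parent_allowlist.contains(*id)) {
                return Err(format!(
                    "agent id {} is not on the parent's allowlist",
                    rogue
                ));
            }
            ids.clone()
        }
    };
    // The sub-agent can always recall memories written under its parent's identity.
    resolved.insert(parent_id);
    Ok(resolved)
}
```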
…claw-labs#6272 P10b)
The cron scheduler's JobType::Agent dispatch now constructs the run as a SubAgent of the owning agent rather than as an ad-hoc agent invocation. Cron is one of two SubAgent spawn sites in v0.8.0; the other is the spawn_subagent agent-loop tool (P10c). Both funnel through SubAgentSpawn::build so permission inheritance, tracing span shape, and audit attribution stay uniform across spawn sites.
What changed in run_agent_job:
- Build SubAgentSpawn::for_agent(config, agent_alias).build(default overrides) before the security pre-flight. Spawn failures (no such agent, security-policy resolution failure) short-circuit the run with an explicit subagent-spawn error so cron logs distinguish inheritance failures from security blocks.
- Wrap the agent::run call in a tracing span with parent_alias / run_id / spawn_site = "cron" fields. The run_id is the existing cron run-session UUID, so memory snapshots and span events correlate without bookkeeping. P12's structured-label emitters pick the parent_alias up automatically.
Default SubAgentOverrides means "inherit verbatim": the cron job always runs with the owning agent's policy and memory allowlist. Cron has no UI for narrowing today; if a future revision wants per-job narrowing, it constructs a non-default SubAgentOverrides and lets the build-time validator reject any escalation.
New SubAgentSpawn::for_agent constructor in subagent/mod.rs resolves the parent identity from a Config + alias: the agent's [agents.<alias>.workspace.read_memory_from] becomes the parent's allowlist (the bound alias is always re-included), and the policy is SecurityPolicy::for_agent. Two new tests cover the resolution path and the unknown-alias error case, bringing the SubAgent test count to seven.
P10c (the agent-loop spawn_subagent tool) follows in a separate commit; it uses the same for_agent constructor with caller-supplied overrides.
The second SubAgent spawn site lands as an always-on agent-loop tool
that lets a parent agent fork a focused subtask under its own
identity. Cron's JobType::Agent dispatch (P10b) was the first spawn
site; both funnel through SubAgentSpawn::build so permission
inheritance, tracing-span shape, and audit attribution stay uniform.
Tool surface:
- name: spawn_subagent
- args: { prompt: string }
- behavior: validate the spawn via SubAgentSpawn::for_agent against
the parent's identity, build a SubAgentContext under default
(inherit-verbatim) overrides, run the agent loop on the supplied
prompt under the parent's alias inside a
tracing::info_span!("subagent", parent_alias, run_id,
spawn_site = "tool"), and return the response.
- failures: spawn-validator failures (unknown alias, security-policy
resolution) and agent-run failures both come back as structured
ToolResult { success: false, error: ... } rather than panics, so
the agent loop sees them as recoverable tool errors.
The narrowing-override path (sub-agents that drop privileges below
the parent's) is deferred to v0.8.1 along with the
[agents.<alias>].subagent_* config block. The spawn validator
already supports it via SubAgentOverrides — adding the surface later
is purely additive.
Wiring in tools/mod.rs:
- pub mod spawn_subagent; (alphabetical, between sop_status and
verifiable_intent).
- pub use SpawnSubagentTool re-export.
- Always-on registration in all_tools_with_runtime, next to
ScheduleTool.
- Listed in BUILTIN_TOOL_INTEGRATIONS so the integrations panel
surfaces it.
Recursion bounding is left to existing per-run guardrails: each
SubAgent run is a full agent loop, capped by the runtime profile's
max_iterations and the SecurityPolicy action/cost budgets. A future
revision can add explicit depth tracking via a tokio task-local if
operational data shows the soft caps are insufficient.
Four unit tests:
- tool_name_and_schema_are_well_formed
- missing_prompt_is_rejected
- empty_prompt_is_rejected
- unknown_parent_alias_surfaces_spawn_failure (verifies the spawn
validator's Err is structured into a ToolResult, never panics or
attempts a recursive run)
Live agent-loop integration is exercised by the existing
JobType::Agent end-to-end paths (P10b) — both spawn sites share the
same downstream agent::run wiring so the cron tests cover the
shared post-spawn flow.
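The argument handling behind missing_prompt_is_rejected and empty_prompt_is_rejected might reduce to something like the sketch below. The ToolResult here carries only the two fields shown in the commit message; check_prompt, the failure helper, the Option<&str> stand-in for the parsed JSON argument, and the choice to treat whitespace-only prompts as empty are all assumptions.

```rust
/// Reduced stand-in for the tool result; the real type likely has more fields.
#[derive(Debug, PartialEq)]
pub struct ToolResult {
    pub success: bool,
    pub error: Option<String>,
}

/// Hypothetical helper that builds a structured failure instead of panicking.
fn failure(message: &str) -> ToolResult {
    ToolResult {
        success: false,
        error: Some(message.to_string()),
    }
}

/// Validates the `prompt` argument before any spawn is attempted.
/// Assumption: a whitespace-only prompt counts as empty.
pub fn check_prompt(prompt: Option<&str>) -> Result<&str, ToolResult> {
    match prompt.map(str::trim) {
        Some(text) if !text.is_empty() => Ok(text),
        Some(_) => Err(failure("prompt must not be empty")),
        None => Err(failure("missing required argument: prompt")),
    }
}
```

Returning the failure as a value keeps bad arguments on the same recoverable path as spawn-validator and agent-run errors.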
…es (zeroclaw-labs#6272)
The WorkspaceTool is the agent-callable wrapper around WorkspaceManager that lets a model switch between multi-workspace profiles at runtime (active_workspace, workspaces_dir, etc.). Per-agent workspaces under [agents.<alias>.workspace] obsolete the entire multi-workspace-profile primitive: each agent has its own jailed workspace inherently, with no need for a tool to switch between them.
Deleted:
- crates/zeroclaw-tools/src/workspace_tool.rs (the tool body + tests)
- src/tools/workspace_tool.rs (orphan re-export shim, never declared by src/tools/mod.rs and so never compiled; pure dead bytes on disk)
- pub mod workspace_tool; in crates/zeroclaw-tools/src/lib.rs
- pub use ::WorkspaceTool re-export in tools/mod.rs
- The conditional registration block in all_tools_with_runtime gated on root_config.workspace.enabled (16 LoC of "build a WorkspaceManager from a path string and wrap it in a tool")
Net -373 LoC. The legacy [workspace] block parsing it consulted is retired separately in the same PR; this commit removes the leaf consumer first so the root deletion has nothing pointing at it.
…icy::for_agent (zeroclaw-labs#6272)
WorkspaceBoundary + BoundaryVerdict were the per-tool/per-domain/per-path gate that consulted the active multi-workspace profile to deny tool access outside the profile's allowlists. With per-agent SecurityPolicy construction (SecurityPolicy::for_agent in crates/zeroclaw-config/src/policy.rs), the same enforcement happens at the policy layer for every tool that already consults the policy, so there is no need for a parallel boundary type.
The module was already dead before this commit: the only #[allow(unused_imports)] pub use in security/mod.rs had zero external callers. Deleting the file removes 211 LoC including its 7 tests, plus the module declaration and the re-export.
Net -213 LoC. The module's last live consumer (the WorkspaceTool that constructed it) was deleted in the previous commit.
…abs#6272) WorkspaceManager is the multi-workspace-profile primitive: list, create, switch, export. Its sole production consumer was the WorkspaceTool deleted in the previous commit; the tests in this module exercise the manager directly and have no value once the manager itself is gone. Deleted: - crates/zeroclaw-config/src/workspace.rs (WorkspaceManager, WorkspaceProfile, all helpers, all tests) - src/config/workspace.rs (re-export shim) - pub mod workspace; in both lib.rs files Net -384 LoC. The legacy [workspace] block on Config still references WorkspaceConfig (a struct of multi-workspace-profile flags); that struct + its onboarding flow + the active_workspace.toml marker machinery come out in the next commit.
…ker (zeroclaw-labs#6272) Last leg of the multi-workspace-profile retirement: the [workspace] config block, the on-disk active_workspace.toml marker mechanism, and the entire onboarding section that drives them. The previous three commits killed the consumers (WorkspaceTool, WorkspaceBoundary, WorkspaceManager); this one removes the schema field and the config-resolution chain step that fed them. Schema deletions in crates/zeroclaw-config/src/schema.rs: - WorkspaceConfig struct (enabled / active_workspace / workspaces_dir / isolate_memory / isolate_secrets / isolate_audit / cross_workspace_search) + Default impl + default_workspaces_dir helper - pub workspace: WorkspaceConfig field on Config - 3 workspace: WorkspaceConfig::default() construction sites in the three Config::default() / fixture paths - ACTIVE_WORKSPACE_STATE_FILE const + ActiveWorkspaceState struct + active_workspace_state_path / load_persisted_workspace_dirs / persist_active_workspace_config_dir / _in helpers (~125 LoC of marker-write/load plumbing) - ConfigResolutionSource::ActiveWorkspaceMarker variant + the load_persisted_workspace_dirs branch in resolve_runtime_config_dirs (the env-var resolver chain is now ZEROCLAW_CONFIG_DIR -> ZEROCLAW_WORKSPACE -> default, with no marker step) - KnowledgeConfig.cross_workspace_search field + default (knowledge graph's cross-workspace search axis is meaningless under a single-workspace install) - Three tests of the marker mechanism: resolve_runtime_config_dirs_uses_active_workspace_marker, load_or_init_uses_persisted_active_workspace_marker, persist_active_workspace_marker_is_cleared_for_default_config_dir; trimmed marker scenery from resolve_runtime_config_dirs_uses_env_config_dir_first since the test's actual claim (env wins) doesn't need the marker as a foil Onboard deletions in crates/zeroclaw-runtime/src/onboard/mod.rs: - Section::Workspace enum variant + the as_path_prefix and from_path match arms - The workspace() async section walker -
Section::Workspace dispatch arms in run() and the 0=> arm in run_all (subsequent indices renumbered) - The cfg.workspace.enabled arm in section_has_signal - section_has_signal_workspace_tracks_enabled_flag test (testing a retired field) and workspace_double_run_is_idempotent_on_disk test (testing the retired section's flow); other helper-validation tests switched from "workspace" to "memory" since they exercise generic helpers (mark_completed, skip_if_configured) Binary-side deletions in src/main.rs: - OnboardSection::Workspace enum variant - workspace_only CLI flag (under hide=true) + its threading through the Onboard command handler - resolve_onboard_target's 7th parameter + 3 unit tests adjusted for the new arity - Removed src/config/mod.rs's pub use of the deleted WorkspaceConfig Net -571 LoC vs +39 LoC shim/comment updates. Combined with the prior three commits in this PR (WorkspaceTool, WorkspaceBoundary, WorkspaceManager) the multi-workspace-profile primitive is gone in its entirety. Per-agent workspaces under [agents.<alias>.workspace] serve the isolation use case the legacy block was reaching for, with type-level enforcement instead of TOML-flag inertia.
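The resolver chain described above (ZEROCLAW_CONFIG_DIR, then ZEROCLAW_WORKSPACE, then the default, with no marker step) can be sketched roughly as follows. This is a minimal illustration, not the code in schema.rs: `resolve_config_dir`, the `Source` enum, the empty-means-unset rule, and using the ZEROCLAW_WORKSPACE value directly as the directory are all assumptions of the sketch.

```rust
use std::path::PathBuf;

/// Which step of the chain produced the directory (hypothetical names).
#[derive(Debug, PartialEq, Eq)]
enum Source {
    EnvConfigDir,
    EnvWorkspace,
    Default,
}

/// Treat empty or whitespace-only values as unset (an assumption of this sketch).
fn non_empty(value: Option<&str>) -> Option<&str> {
    value.map(str::trim).filter(|s| !s.is_empty())
}

/// Resolve the config dir from already-read environment values.
/// Passing the values in keeps the sketch testable without touching
/// the process environment.
fn resolve_config_dir(
    config_dir_env: Option<&str>,
    workspace_env: Option<&str>,
    default_dir: &str,
) -> (PathBuf, Source) {
    // Step 1: ZEROCLAW_CONFIG_DIR wins when present.
    if let Some(dir) = non_empty(config_dir_env) {
        return (PathBuf::from(dir), Source::EnvConfigDir);
    }
    // Step 2: ZEROCLAW_WORKSPACE is consulted next.
    if let Some(dir) = non_empty(workspace_env) {
        return (PathBuf::from(dir), Source::EnvWorkspace);
    }
    // Step 3: fall through to the default; there is no marker-file step.
    (PathBuf::from(default_dir), Source::Default)
}
```

A caller would read the two variables with `std::env::var(...).ok()` and pass them in as `Option<&str>` via `as_deref()`.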
…abs#6272) Per-agent memory under [agents.<alias>.memory] supersedes the string-namespace primitive. Killing the parallel structure in one sweep: Schema deletions in crates/zeroclaw-config/src/schema.rs: - MemoryNamespaceConfig struct (namespace / backend / retention_days / read_only / pinned_categories) - pub memory_namespaces: HashMap<String, MemoryNamespaceConfig> field on Config - pub memory_namespace: String field on AliasedAgentConfig - 3 default-construction sites for memory_namespaces and 3 for the per-agent memory_namespace field - The "memory-namespaces" / "memory-namespace" entry in the cross-reference dangling-alias validator (the validator now only enforces risk-profile / runtime-profile references, which is what remains on AliasedAgentConfig) V2->V3 migration deletions in crates/zeroclaw-config/src/schema/v2.rs: - T14e widening block and the ensure_memory_namespace synthesis helper. The new V2->V3 path drops the V2 memory_namespace key off agent tables silently. Doc comment updated. 
Test deletions in crates/zeroclaw-config/tests/migration.rs: - t14e_memory_namespace_widening (the field it asserted on no longer exists) Memory crate deletions: - crates/zeroclaw-memory/src/namespaced.rs in its entirety (232 LoC, NamespacedMemory<M> wrapper + 6 tests) - pub mod namespaced; + pub use namespaced::NamespacedMemory in crates/zeroclaw-memory/src/lib.rs DelegateTool deletions in crates/zeroclaw-runtime/src/tools/delegate.rs: - memory_namespaces: Arc<HashMap<String, MemoryNamespaceConfig>> field on the struct - 4 default-construction sites for that field (new / new_with_options / with_depth / with_depth_and_options) and 2 Arc::clone sites in the tokio::spawn background paths - with_memory_namespaces builder method - resolve_memory_ns helper (the namespace-alias-to-string resolver) - get_agent_memory helper (was already #[allow(dead_code)] WIP — gone along with NamespacedMemory; the struct's `memory: Option<...>` field stays for the future per-agent plumbing) - MemoryNamespaceConfig and NamespacedMemory imports Caller deletions: - crates/zeroclaw-runtime/src/tools/mod.rs: .with_memory_namespaces(...) builder call on the DelegateTool registration - crates/zeroclaw-runtime/src/onboard/mod.rs: the memory_namespace agent-form prompt (step 9 in the per-agent walk) and the memory_aliases lookup - crates/zeroclaw-gateway/src/api_onboard.rs: memory_namespaces field on AgentOptionsResponse, the get_map_keys("memory_namespaces") feed, the "workspace" entries in the section-help / section-group / picker routing (now consistent with the legacy [workspace] retirement that landed earlier in this PR), and a fixture-test count update Net -404 LoC vs +23 LoC. The 23 lines added are the two-paragraph T14e doc-comment rewrite explaining the V2->V3 migration drop, and short comments where lookups got shorter. 
Per-agent memory backend selection lives at [agents.<alias>.memory] (MemoryBackendKind enum, immutable after agent creation), and cross-agent memory access flows through [agents.<alias>.workspace.read_memory_from] + AgentScopedMemory<M> (landed earlier in this PR). String-tagged namespaces no longer have a place in the architecture.
…sion methods (zeroclaw-labs#6272) The wrapper and the two `Memory` trait extension methods landed earlier in this branch as scaffolding for a per-agent memory plumbing that does not actually plug into the runtime in v0.8.0: - `Agent::from_config` still hands a raw `Arc<dyn Memory>` to the agent loop; nothing constructs an `AgentScopedMemory<M>` on a live code path. - The trait methods `Memory::store_with_agent` and `Memory::recall_for_agents` shipped with default forwarders that silently dropped the `agent_id` parameter. No backend (Sqlite, Postgres, Lucid, Markdown, Qdrant, None) overrode them, so the agent_id was unused at every layer (`_agent_id: Option<&str>` in the default impl is exactly the kind of suppress-the-warning scaffolding the project rule against `_`-prefixed unused params forbids). Pulling the scaffolding out of v0.8.0 and leaving the schema + storage foundation (which IS load-bearing) in place: - Deleted `crates/zeroclaw-memory/src/agent_scoped.rs` (the wrapper + 4 unit tests). - Deleted `pub mod agent_scoped;` in the memory lib.rs. - Deleted `Memory::store_with_agent` and `Memory::recall_for_agents` from `crates/zeroclaw-api/src/memory_traits.rs` along with their default impls and doc comments. - Updated the doc comments in `sqlite.rs`, `multi_agent.rs`, and `schema.rs` that referenced the wrapper to point at v0.8.1 as the landing target for the per-agent memory plumbing instead. What stays (and is still load-bearing): - The `agents` table on SQLite + Postgres with the synthesized `default` agent row (commits 7e19c44 and 6ff29e4). - The nullable `agent_id TEXT` column on `memories`, backfilled to the default agent's UUID, with its index. Existing rows are attributable; new rows from this branch are still agent-id-NULL because no caller stamps it yet, which is the same state the migration left them in. 
- The `[agents.<alias>.workspace.read_memory_from]` schema field + the cross-reference validator that rejects self/dangling/cross-backend entries at config load. Configs authored against the v0.8.0 schema stay valid when v0.8.1 lands the runtime consumer. Net -450 LoC vs +15 LoC doc-comment rewrites. The architectural target (per-agent memory backends keyed off the agent's identity, with the allowlist intersected on read) ships in v0.8.1 alongside the `Agent::from_config` restructure that consumes a per-agent memory backend; this commit makes v0.8.0 honest about what's actually running.
…claw-labs#6272) Adds two abstract methods on the Memory trait — `store_with_agent` and `recall_for_agents` — and implements them explicitly on every backend in the workspace. No defaults: each backend gets a real implementation so the agent_id parameter is never silently dropped at the trait boundary, closing the dead-default footgun the previous attempt fell into. The trait extension in `crates/zeroclaw-api/src/memory_traits.rs`: - `store_with_agent(key, content, category, session_id, namespace, importance, agent_id)` — required. Persists with explicit agent attribution. - `recall_for_agents(allowed_agent_ids, query, limit, session_id, since, until)` — required. Filters results to the supplied set of agent UUIDs (plus legacy NULL-agent rows). Empty allowlist means no filter (callers that want unscoped recall stay on `recall`). Per-backend implementations: - **SqliteMemory** writes the `agent_id` column on the existing INSERT (column was added in P6a as nullable + indexed). `recall_for_agents` over-fetches via the existing hybrid recall, then filters by a single-round-trip indexed lookup on the candidate row ids. Pushes the SQL primary filter the way the column was designed for; the small over-fetch is the v0.8.0 pragmatic shape. - **PostgresMemory** mirrors SqliteMemory: column written on INSERT with ON CONFLICT update, post-recall filter via `id = ANY($1)` query. - **LucidMemory** composes SqliteMemory + remote daemon. Writes attribution to the local SQLite mirror (the daemon has no agent_id concept in v0.8.0); recall delegates to the SQLite leg's `recall_for_agents` so the cross-agent allowlist is enforced locally. Documented in the impl comments. - **QdrantMemory** adds `agent_id` to `MemoryPayload` (skip-if-none serde) so existing rows stay shape-compatible. Store includes the agent id in the upsert payload; recall_for_agents over-fetches and uses a scroll/has_id query to fetch payloads for the candidate ids, then post-filters. 
Pushing the agent_id filter into the vector search call itself is a v0.8.1 optimization. - **MarkdownMemory** ignores agent_id at the row level: per-agent attribution is the on-disk path (`<install>/agents/<alias>/workspace/MEMORY.md`), set by the per-agent factory. Cross-agent recall is composed at the wrapper layer (`AgentScopedMarkdownMemory`, landing next) which holds an own MarkdownMemory plus a peer set; this trait impl is the single-instance leaf. - **NoneMemory** is a trivial no-op for both methods — the disabled backend keeps the runtime wiring stable without persisting anything regardless of agent attribution. Test mocks (`QueryEchoMemory`, two `MockMemory` flavors, `MockMemoryWithEntries`, `NoopMemory`, `RecallMemory`, `TrackingMemory`) get parallel forward-or-noop stubs so the workspace builds and tests green. Each is contextually appropriate: trackers track, recallers recall, noops noop. Wrapper updates: `AuditedMemory<M>` adds matching `store_with_agent` / `recall_for_agents` methods that log to the audit trail and forward to the inner backend. Net +783 LoC. The new Memory trait surface is the foundation for `AgentScopedMemory<M>` (next commit) which holds a bound agent_id + allowlist and routes every store/recall through these methods so the agent_id is stamped/enforced at the runtime boundary too.
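The `recall_for_agents` filter semantics stated in this commit (keep rows attributed to an allowlisted agent, keep legacy NULL-agent rows, treat an empty allowlist as "no filter") can be sketched in isolation. The `Row` type and `filter_for_agents` function are hypothetical stand-ins for the post-recall filter step, not the backend code.

```rust
/// Hypothetical, simplified stand-in for a recalled memory row.
#[derive(Debug, Clone, PartialEq)]
struct Row {
    key: String,
    /// None models a legacy row written before agent attribution existed.
    agent_id: Option<String>,
}

/// Post-filter candidate rows the way this commit describes it.
fn filter_for_agents(rows: Vec<Row>, allowed_agent_ids: &[&str]) -> Vec<Row> {
    // Empty allowlist is the "no filter" sentinel.
    if allowed_agent_ids.is_empty() {
        return rows;
    }
    rows.into_iter()
        .filter(|row| match row.agent_id.as_deref() {
            // Legacy NULL-agent rows stay visible in this commit's shape.
            None => true,
            Some(id) => allowed_agent_ids.contains(&id),
        })
        .collect()
}
```

Because the empty slice means "unfiltered", any wrapper built on top has to avoid passing an empty allowlist when it actually means "nothing is permitted".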
…aw-labs#6272) The runtime memory wrapper that sits between agent-loop callers and the per-agent backend instance. Holds the bound agent's UUID + the resolved cross-agent allowlist (own UUID + `read_memory_from` entries), and routes every operation through the backend's new agent-aware trait methods so the agent_id is enforced end-to-end. In `crates/zeroclaw-memory/src/agent_scoped.rs`: - `AgentScopedMemory<M: Memory>` holds `Arc<M>` + bound `agent_id` + `allowed_agent_ids: HashSet<String>`. `new(inner, agent_id, allowed_sibling_agent_ids)` always inserts the bound agent into the allowlist so callers don't have to remember themselves. - `store` / `store_with_metadata` route through the inner backend's `store_with_agent` with the bound agent_id always stamped. - `store_with_agent` overrides any caller-supplied agent_id to the bound agent_id. The wrapper's contract is one-agent-one-attribution; if a caller wants different attribution, they construct a different wrapper. - `recall` calls the inner's `recall_for_agents` with the bound allowlist. - `recall_for_agents` intersects the caller-supplied allowlist with the bound allowlist. A non-empty caller allowlist whose intersection with the bound is empty returns `Ok(Vec::new())` directly — the empty-allowlist sentinel ("no filter") on the inner backend is NOT used for that case, so a caller cannot widen scope past what the agent's config permits. - `get` / `list` / `forget` / `count` / `purge_*` / `reindex` / `store_procedural` / `recall_namespaced` / `export` all forward to the inner backend. Trait surface that does not yet expose an agent-scoped form (get/list) stays pass-through; v0.8.1 follow-up adds those variants. Re-exported as `zeroclaw_memory::AgentScopedMemory`. Four unit tests using SqliteMemory as the inner backend: - `store_routes_through_store_with_agent_and_persists_attribution`: rows stored via the wrapper come back on a subsequent recall. 
- `recall_excludes_other_agent_rows_when_allowlist_omits_them`: a row pre-seeded with a different agent_id does NOT surface through a wrapper whose allowlist excludes that agent. - `recall_includes_allowlisted_sibling_rows`: a row pre-seeded with the sibling's agent_id DOES surface when the wrapper's allowlist includes that sibling. - `recall_for_agents_intersects_caller_allowlist_with_bound_allowlist`: a caller asking for a rogue UUID outside the bound allowlist gets zero rogue-attributed rows back. Cross-backend allowlist entries are rejected at config-load by the P3 validator; the wrapper therefore only ever sees same-backend sibling UUIDs in `allowed_agent_ids`. The Markdown variant (`AgentScopedMarkdownMemory`) lands in the next commit — its model is "compose own + peer MarkdownMemory instances and union with attribution," not the row-filter shape this generic wrapper uses.
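The allowlist handling described for the wrapper can be sketched as two small functions. This is an illustration under assumptions: `bound_allowlist` and `scoped_allowlist` are hypothetical names, and the empty-request fallback to the full bound allowlist is a guess the commit message does not spell out.

```rust
use std::collections::HashSet;

/// Build the bound allowlist; the bound agent is always a member.
fn bound_allowlist(agent_id: &str, siblings: &[&str]) -> HashSet<String> {
    let mut set: HashSet<String> = siblings.iter().map(|s| s.to_string()).collect();
    set.insert(agent_id.to_string());
    set
}

/// Intersect a caller-supplied allowlist with the bound one.
/// Returns None when a non-empty request has no permitted member, so the
/// caller can return an empty result instead of forwarding an empty slice
/// (which the inner backend would read as "no filter").
fn scoped_allowlist(bound: &HashSet<String>, requested: &[&str]) -> Option<Vec<String>> {
    if requested.is_empty() {
        // Assumption of this sketch: no request means "everything I may see".
        let mut all: Vec<String> = bound.iter().cloned().collect();
        all.sort();
        return Some(all);
    }
    let mut hits: Vec<String> = requested
        .iter()
        .filter(|id| bound.contains(**id))
        .map(|id| (*id).to_string())
        .collect();
    hits.sort();
    hits.dedup();
    if hits.is_empty() { None } else { Some(hits) }
}
```

The `None` branch is the part that prevents scope widening: a rogue UUID never reaches the backend as an empty list.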
…zeroclaw-labs#6272) The Markdown-shaped sibling to AgentScopedMemory<M>. Markdown has no shared store — each agent's attribution IS its on-disk path (<install>/agents/<alias>/workspace/MEMORY.md plus memory/YYYY-MM-DD.md), so cross-agent recall composes multiple MarkdownMemory instances rather than filtering rows. Module crates/zeroclaw-memory/src/agent_scoped_markdown.rs: - MarkdownPeer { alias, memory } — resolved sibling: alias plus a MarkdownMemory pointing at that sibling's workspace dir. - AgentScopedMarkdownMemory { own_alias, own, peers } — wrapper: - store / store_with_metadata / store_with_agent: write only to the bound agent's own MarkdownMemory. The agent_id parameter on store_with_agent is intentionally ignored (path-based attribution is the model — the bound dir IS the attribution). - recall: union across own + every peer, attributing each row by prefixing its key with [<alias>] so the merged output is self-describing without changing the trait surface or MemoryEntry shape. - recall_for_agents: filter the union to the caller-supplied alias set. Treats the trait's `allowed_agent_ids: &[&str]` as opaque identifiers since Markdown does not have a UUID indirection — the runtime factory passes aliases for Markdown agents and UUIDs for SQL agents. - get / list / forget / count: forward to own (these don't yet have an agent-scoped form on the trait). - health_check: own's signal only; missing peer dirs are logged at recall time, not surfaced as unhealthy (a missing peer means the operator hasn't created that sibling yet — current agent is fine). Three unit tests: - store_writes_only_to_own_backend - recall_unions_own_and_peer_rows_with_attribution - recall_for_agents_filters_to_alias_intersection Re-exported as zeroclaw_memory::AgentScopedMarkdownMemory and zeroclaw_memory::MarkdownPeer. 
The runtime factory that builds either AgentScopedMemory<M> (for Sqlite/Postgres/Lucid/Qdrant agents) or AgentScopedMarkdownMemory (for Markdown agents) lands in the next commit, alongside the Agent::from_config + cron + DelegateTool wiring that consumes it.
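A rough sketch of the Markdown composition model described above: recall unions the bound agent's entries with every peer's, attributing each row by prefixing its key with `[<alias>]`, and the alias-filtered form restricts which stores take part. `Store`, `Entry`, and the exact `"[alias] key"` spacing are assumptions of the sketch; the real types wrap on-disk MarkdownMemory instances.

```rust
/// Hypothetical in-memory stand-ins for a MarkdownMemory and its rows.
#[derive(Debug, Clone, PartialEq)]
struct Entry {
    key: String,
    content: String,
}

struct Store {
    alias: String,
    entries: Vec<Entry>,
}

/// Append one store's entries with the alias prefixed onto each key.
fn push_attributed(out: &mut Vec<Entry>, store: &Store) {
    for entry in &store.entries {
        out.push(Entry {
            key: format!("[{}] {}", store.alias, entry.key),
            content: entry.content.clone(),
        });
    }
}

/// recall: union across own + every peer, own rows first.
fn recall_union(own: &Store, peers: &[Store]) -> Vec<Entry> {
    let mut out = Vec::new();
    for store in std::iter::once(own).chain(peers.iter()) {
        push_attributed(&mut out, store);
    }
    out
}

/// recall_for_agents: same union, restricted to the supplied aliases.
fn recall_for_aliases(own: &Store, peers: &[Store], allowed: &[&str]) -> Vec<Entry> {
    let mut out = Vec::new();
    for store in std::iter::once(own).chain(peers.iter()) {
        if allowed.contains(&store.alias.as_str()) {
            push_attributed(&mut out, store);
        }
    }
    out
}
```

Carrying attribution in the key keeps the merged output self-describing without changing the entry shape, which matches the rationale given in the commit.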
…from_config + cron (zeroclaw-labs#6272) The runtime entry point that builds each agent's `Memory` instance. Previously every code path hand-rolled `create_memory(...)` against the install-wide `config.memory`; now `create_memory_for_agent` returns an `AgentScopedMemory` (or `AgentScopedMarkdownMemory`) keyed on the agent's resolved identity and `read_memory_from` allowlist. In `crates/zeroclaw-memory/src/lib.rs`: - `agent_workspace_dir(config, alias)`: resolves the per-agent workspace dir from `[agents.<alias>.workspace.path]` if set, else derives `<install>/agents/<alias>/workspace/` from `config.config_path.parent()`. Stable across the v0.8.0 filesystem migration since it keys off the install root, not the (still-legacy-shaped) `config.workspace_dir`. - `create_memory_for_agent(config, alias, api_key) -> Arc<dyn Memory>`: top-level factory: - Markdown agents: build the bound MarkdownMemory + a peer `MarkdownPeer` per `read_memory_from` entry; wrap with `AgentScopedMarkdownMemory`. - None agents: pass through `Arc<NoneMemory>`. - Sqlite/Postgres/Lucid/Qdrant agents: build the install-wide inner backend via the existing `create_memory_with_storage_and_routes` factory; resolve the bound agent's identifier and the allowlist identifiers via the new `Memory::ensure_agent_uuid` trait method (SQL backends look up agents-table UUIDs; Qdrant/None use the alias verbatim); wrap with `AgentScopedMemory`. `AgentScopedMemory` itself is now non-generic: it holds `Arc<dyn Memory>` instead of `Arc<M>`. The previous generic was never used at multiple types (every call site erased to `dyn Memory`), and the non-generic shape lets the per-agent factory hand back a single concrete type regardless of the agent's chosen backend.
New trait method `Memory::ensure_agent_uuid(alias) -> Result<String>`: - SqliteMemory + PostgresMemory + LucidMemory override to insert-or-fetch the agents-table row for `alias` and return its UUID (the existing `ensure_default_agent_uuid` is now a thin wrapper around the same per-alias helper). - AuditedMemory + AgentScopedMemory forward. - Default impl returns the alias verbatim — correct for Markdown, Qdrant, None, which have no UUID indirection at the storage layer. PostgresMemory now stores `qualified_agents` alongside `qualified_table` so `ensure_agent_uuid` can build the `<schema>.agents` reference at call time. Wiring (replaces install-wide `create_memory` calls with `create_memory_for_agent`): - `crates/zeroclaw-runtime/src/agent/agent.rs`: `Agent::from_config_with_session_cwd_and_mcp`. - `crates/zeroclaw-runtime/src/agent/loop_.rs`: both `run` entry points (interactive loop + non-interactive single-shot). - `crates/zeroclaw-runtime/src/cron/scheduler.rs`: cron's pre-prompt memory recall (line ~335) and the post-failure session-purge cleanup (line ~427) — both now key off the cron-owning agent's alias so a Markdown-backed agent's cron job recalls from its own dir, a SQLite-backed agent's cron job filters by its agent_id, etc. The same-backend invariant on `read_memory_from` is enforced at config load (P3); this commit therefore never has to reconcile mixed-backend allowlists at runtime. End-to-end now: an agent loop that calls `mem.recall(...)` goes through the wrapper, which calls the inner backend's `recall_for_agents` with the resolved allowlist, which (for SQL) filters via WHERE agent_id IN (...) plus the legacy NULL case. A `mem.store(...)` goes through `store_with_agent` with the bound agent's UUID — every persisted row is attributable to one agent.
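The workspace-dir resolution rule from this commit (explicit `[agents.<alias>.workspace.path]` if set, otherwise `<install>/agents/<alias>/workspace/` derived from the config file's parent) reduces to a few lines. The signature below is a simplification for illustration: the real `agent_workspace_dir(config, alias)` takes the whole config, and returning `None` when the config path has no parent is an assumption of this sketch.

```rust
use std::path::{Path, PathBuf};

/// Resolve a per-agent workspace dir.
/// `configured_path` models the optional per-agent path override from config.
fn agent_workspace_dir(
    config_path: &Path,
    alias: &str,
    configured_path: Option<&Path>,
) -> Option<PathBuf> {
    // An explicit per-agent path always wins.
    if let Some(path) = configured_path {
        return Some(path.to_path_buf());
    }
    // Otherwise key off the install root (the config file's directory).
    let install_root = config_path.parent()?;
    Some(install_root.join("agents").join(alias).join("workspace"))
}
```

Keying off the config file's parent rather than a stored workspace path is what makes the result stable across the filesystem migration.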
Removes the dedicated management CLI (Commands::Agents enum variant, AgentsCommands subcommands, handle_agents_command dispatcher, plus the agents_create / agents_delete / agents_list helpers). Operators add and remove agents in this PR by editing [agents.<alias>] blocks directly; the runtime creates the per-agent workspace dir and seeds bootstrap identity files on first agent-loop entry. The dedicated CLI lands with the v0.8.1 session registry that gates active-session refusal on delete. Docs updated: setup walkthrough now uses the config-edit path; the architecture page lists the management CLI under v0.8.1. Net -251 LoC in src/main.rs, -33 LoC in docs.
Six items the previous commits punted that the issue body called out as required: - SubAgent escalation validator: SecurityPolicy::ensure_no_escalation_beyond back with EscalationViolation enum + Display + Error. SubAgentOverrides surface restored on the SubAgent runtime; SubAgentSpawn::build now takes overrides and runs the subset check. 9 validator tests + 5 builder tests. - SQLite agent_id NOT NULL REFERENCES agents(id): table-rebuild pattern inside migrate_multi_agent (drop FTS, copy rows into memories_new with the constraint, swap, recreate indices/FTS, rebuild FTS content). The bare store + store_with_metadata paths now route through store_with_agent; store_with_agent COALESCEs the agent_id parameter to the default agent's UUID so callers without an agent context still satisfy the FK. - Postgres agent_id NOT NULL REFERENCES + DO-block FK creation guarded by pg_constraint lookup so re-runs are idempotent. - schema_version metadata table on both backends, stamped at the end of each successful migration. - send_message_to_peer agent-loop tool. Validates the target via the new ResolvedPeers::is_known_peer (strict outbound check, distinct from allows_inbound's default-accept inbound semantics) plus a per-agent channel-listener guard. Dispatches via cron::scheduler::deliver_announcement to keep the runtime → channels dependency direction. - Agent-loop self-loop guard fallback. peers::should_drop_self_loop is called from process_channel_message after the SDK-side Channel::drop_self_messages returns false; both layers use identical normalization. - Hermetic peer-group E2E in tests/system/multi_agent_e2e.rs that walks the full authorization surface: resolver admit/reject for peer agents and external peers, plus tool-level rejection of non-peer targets. Tests existing in agent_scoped now provision real agent rows via ensure_agent_uuid before attributing memories — required by the FK.
zeroclaw_runtime::agents ships create_agent / delete_agent / list_agents as the runtime-layer capability that future operator surfaces (CLI, web admin, gateway endpoint) call. Each function keeps the on-disk shape consistent across every entry point: write the [agents.<alias>] config block, create the per-agent workspace dir, seed bootstrap identity files, and atomically save the config (or strip the block, remove the dir, and rewrite peer-group memberships in one save). zeroclaw_runtime::agents::session_registry is the process-global RAII gate that delete_agent consults: register_session(alias) returns a SessionGuard whose Drop decrements the per-alias counter; delete bails on active_sessions_for(alias) > 0 unless force_active_sessions=true. 13 new unit tests cover create roundtrip, duplicate-alias refusal, unknown-risk-profile refusal, delete with peer-group strip, dry-run no-op, unknown-alias refusal, active-session refusal + cycle, force override, sorted list summaries, and the registry's RAII semantics.
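The RAII gate described here can be sketched as a counter map plus a guard whose Drop decrements it. This is a minimal illustration, not the module's code: the commit describes a process-global registry, while the sketch uses an explicit, cloneable `SessionRegistry` handle so it can be exercised in isolation, and `can_delete` is a hypothetical helper standing in for the check inside delete_agent.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, MutexGuard};

/// Shared per-alias session counters.
#[derive(Clone, Default)]
struct SessionRegistry {
    counts: Arc<Mutex<HashMap<String, usize>>>,
}

/// Held for the lifetime of a session; dropping it releases the slot.
struct SessionGuard {
    alias: String,
    registry: SessionRegistry,
}

impl SessionRegistry {
    /// Lock the map, recovering from poisoning so Drop never panics.
    fn lock(&self) -> MutexGuard<'_, HashMap<String, usize>> {
        self.counts.lock().unwrap_or_else(|e| e.into_inner())
    }

    fn register_session(&self, alias: &str) -> SessionGuard {
        *self.lock().entry(alias.to_string()).or_insert(0) += 1;
        SessionGuard { alias: alias.to_string(), registry: self.clone() }
    }

    fn active_sessions_for(&self, alias: &str) -> usize {
        self.lock().get(alias).copied().unwrap_or(0)
    }

    /// Delete is refused while sessions are active unless forced.
    fn can_delete(&self, alias: &str, force_active_sessions: bool) -> bool {
        force_active_sessions || self.active_sessions_for(alias) == 0
    }
}

impl Drop for SessionGuard {
    fn drop(&mut self) {
        let mut counts = self.registry.lock();
        let remaining = match counts.get_mut(&self.alias) {
            Some(n) => {
                *n = n.saturating_sub(1);
                *n
            }
            None => return,
        };
        // Remove empty entries so the map does not grow without bound.
        if remaining == 0 {
            counts.remove(&self.alias);
        }
    }
}
```

Tying the decrement to Drop means early returns and panics in the agent loop still release the session slot.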
WareWolf-MoonWall
left a comment
Read through the file list and sampled the key architectural additions: Memory::store_with_agent, Memory::recall_for_agents, Memory::ensure_agent_uuid, Channel::drop_self_messages, the per-agent workspace config in zeroclaw-config, AgentScopedMemory, the new agents/ module and session registry in the runtime, and the config schema extensions. Directional notes follow — understood this is not going to master and a follow-up "make it actually work" PR is planned.
Memory trait extension: store_with_agent and recall_for_agents are required trait methods, which is a breaking change for every external Memory implementor. The ensure_agent_uuid default (returns alias verbatim) softens the impact for backends without UUID indirection. Since this lands in the v0.8.0 breaking-change window that's the right place for it — worth being explicit in the final PR description that downstream implementors must stub at minimum store_with_agent and recall_for_agents.
drop_self_messages default implementation: The @-strip and case-fold normalization is correct and the edge-case test (empty handle after stripping @ must not match every sender) is the right guard. Numeric IDs (Discord snowflake, Matrix event ID) pass through trim_start_matches('@') unchanged — the comment correctly notes they are already as_str compatible. No action needed; confirming the logic holds.
Observability trait removal (Hand* events/metrics): The removal of HandStarted, HandCompleted, HandFailed and the three Hand* metrics narrows the surface before the multi-agent loop adds its own observability. Worth confirming no active observer implementation in zeroclaw-providers or zeroclaw-channels still references these variants before the final merge pass.
Open question for the follow-up PR: Where does agent spawn/teardown lifecycle management live, and how does it interact with the session registry? Specifically: what happens to an agent's AgentScopedMemory if the parent session is cancelled mid-turn?
No formal verdict — directional review per @singlerider's request. Happy to do a full blocking review pass once the follow-up is folded in.
…y Memory method The prior wrapper only intercepted store/recall and passthrough'd get, list, forget, count, purge_namespace, purge_session, recall_namespaced, and export. With the wrapper as the Arc<dyn Memory> the agent-loop tools see, every passthrough was a privilege-escalation surface: an agent could read sibling rows by guessed key, list every install row, delete sibling rows by key, or purge another agent's session. Now MemoryEntry carries optional agent_id (populated by every backend on read), the wrapper post-filters reads by the bound + allowlisted set, refuses cross-agent forgets, scopes purge_session to bound rows, refuses cross-agent purge_namespace, and rejects store_with_agent calls that target a foreign agent_id rather than silently overriding. Tests exercise the read filter, the cross-agent forget refusal, list attribution filtering, foreign agent_id store refusal, purge_namespace refusal, and the bound-only purge_session shape.
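Two of the checks this commit adds can be sketched as pure functions: refusing a `store_with_agent` call that names a foreign agent_id, and post-filtering reads by the bound + allowlisted set. Both function names are hypothetical, and hiding entries that carry no agent_id is an assumption of the sketch rather than something the commit states.

```rust
use std::collections::HashSet;

/// Decide which agent_id a store is stamped with, refusing foreign targets
/// instead of silently overriding them.
fn resolve_store_target<'a>(bound: &'a str, requested: Option<&str>) -> Result<&'a str, String> {
    match requested {
        // No explicit attribution: stamp the bound agent.
        None => Ok(bound),
        Some(id) if id == bound => Ok(bound),
        Some(id) => Err(format!("store refused: agent_id {id} is not the bound agent")),
    }
}

/// Read-side post-filter: an entry is visible only when its agent_id is in
/// the bound + allowlisted set. Unattributed entries are hidden here
/// (assumption of this sketch).
fn is_visible(entry_agent_id: Option<&str>, allowed: &HashSet<String>) -> bool {
    entry_agent_id.map_or(false, |id| allowed.contains(id))
}
```

Applying `is_visible` to get and list results is what closes the "read a sibling row by guessed key" path described above.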
…tching The prior validator covered allowed_roots (rw + ro), allowed_commands, workspace_only, max_actions_per_hour, and max_cost_per_day_cents, and used exact PathBuf equality, so a child policy that legitimately narrowed /srv to /srv/app failed validation. Several escalation axes were missing entirely: a child policy with autonomy = Full under a ReadOnly parent was accepted, as were child policies that dropped a parent's forbidden_paths entry, expanded shell_env_passthrough, raised shell_timeout_secs, or flipped block_high_risk_commands or require_approval_for_medium_risk to false. Now the validator checks each of those, uses path containment (canonical plus literal-path fallback) so child narrowings inside parent roots are accepted, and the EscalationViolation enum carries one variant per axis. AutonomyLevel grows PartialOrd/Ord so the comparison is direct. Also drop the stale active_workspace.toml entries from is_runtime_config_path: the marker file was retired with the [workspace] block. Tests cover each new axis on both the rejection and the legitimate-narrowing path.
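Two of the axes above, autonomy ordering and root containment, can be sketched as follows. This is an illustration only: the enum shows just the two levels named in this commit (the real AutonomyLevel may have more), the violation variant names are hypothetical, and the sketch uses literal component-wise containment without the canonicalization step the commit mentions.

```rust
use std::path::PathBuf;

/// Declaration order defines the derived ordering: ReadOnly < Full.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum AutonomyLevel {
    ReadOnly,
    Full,
}

/// One variant per axis (names are placeholders for this sketch).
#[derive(Debug, PartialEq, Eq)]
enum EscalationViolation {
    AutonomyExceedsParent,
    RootNotInParent(PathBuf),
}

/// Check that the child is a subset of the parent on two axes.
fn check_subset(
    parent_autonomy: AutonomyLevel,
    parent_roots: &[PathBuf],
    child_autonomy: AutonomyLevel,
    child_roots: &[PathBuf],
) -> Result<(), EscalationViolation> {
    // Deriving Ord makes the autonomy comparison direct.
    if child_autonomy > parent_autonomy {
        return Err(EscalationViolation::AutonomyExceedsParent);
    }
    for root in child_roots {
        // Path::starts_with compares whole components, so /srv/app is inside
        // /srv while /srv-other is not.
        if !parent_roots.iter().any(|parent| root.starts_with(parent)) {
            return Err(EscalationViolation::RootNotInParent(root.clone()));
        }
    }
    Ok(())
}
```

Containment instead of equality is what lets a child narrow /srv to /srv/app, the case the old exact-match validator rejected.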
…discarding them
Both spawn sites (cron JobType::Agent dispatch and the spawn_subagent
agent-loop tool) constructed a SubAgentContext via SubAgentSpawn::for_agent
+ build, then handed only ctx.agent_id to the tracing span and dropped
the validated policy and allowlist. agent::run rebuilt both surfaces
from config, so the validator's subset proof never reached the loop —
inherits-verbatim worked by accident, and any future caller-supplied
narrowing override would have been silently ignored.
Adds AgentRunOverrides { security, memory } to loop_::run; both spawn
sites pass Some(subagent_ctx.policy.clone()) so the validated policy
takes effect. Memory override is left None for v0.8.0 inherits-
verbatim and documented as the slot the v0.8.1 [agents.<alias>].subagent_*
config block plumbs into. Existing call sites (interactive launch,
heartbeat phase 1/2, scheduled-leak test) pass
AgentRunOverrides::default().
Also normalize SpawnSubagentTool's empty-prompt error to the
structured ToolResult shape used by every other failure path so the
agent loop sees one shape regardless of which step rejected the call.
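The override plumbing described in this commit amounts to "use the validated value when one is supplied, otherwise rebuild from config". A minimal sketch, assuming a one-field stand-in for SecurityPolicy and a placeholder type for the memory slot; `effective_policy` is a hypothetical helper, not a function in loop_.rs.

```rust
/// Simplified stand-in for the real policy type.
#[derive(Debug, Clone, PartialEq)]
struct SecurityPolicy {
    workspace_only: bool,
}

/// Optional per-run overrides; Default means "inherit everything from config".
#[derive(Debug, Clone, Default)]
struct AgentRunOverrides {
    security: Option<SecurityPolicy>,
    /// Placeholder type; left None in the inherits-verbatim case.
    memory: Option<String>,
}

/// Prefer the validated override, else fall back to the config-built policy.
fn effective_policy(from_config: SecurityPolicy, overrides: &AgentRunOverrides) -> SecurityPolicy {
    overrides.security.clone().unwrap_or(from_config)
}
```

Existing call sites keep their behaviour by passing `AgentRunOverrides::default()`, while spawn sites pass `Some(validated_policy)` so the subset proof actually reaches the loop.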
Channel::self_handle() defaulted to None, so the orchestrator's two-layer self-loop guard (Channel::drop_self_messages SDK side and peers::should_drop_self_loop agent-loop fallback) was dormant for every channel impl that didn't override, since both layers consult the same source. Telegram's bot_username cache, IRC's configured nickname, Discord's token-encoded user_id, and Slack's auth.test user_id are each reachable; expose them through self_handle so the guard runs. For Slack, the cache is populated on the inbound listen path so the sync self_handle() call doesn't have to issue an HTTP round-trip. Update the trait doc to be honest about what overriding means: the default leaves both guard layers dormant, so channels handling inbound traffic must override. Outbound-only channels (webhook, gmail-push, voice-call) keep the default; other inbound channels beyond the four listed remain on the default and rely on per-impl filtering.
is_known_peer / allows_inbound previously normalized only the external-peer side: agent-peer matching used raw set lookup, so is_known_peer(channel, "@beta") rejected a stored alias of "beta" and a config of [agents.Beta] vs an inbound origin of "beta" diverged. Aliases are config map keys with no case enforcement so the chat-channel idiom (@-prefix, mixed case) needs symmetric normalization. resolve_peer_set now stores agent aliases case-folded with @ stripped; is_known_peer / allows_inbound apply identical normalization to the target side. The orphaned doc comment on ResolvedPeers (a method-doc above the wrong method) is moved to its own method. Tighten send_message_to_peer's @-prefix normalization test so the success path actually asserts the peer-set check accepted, not just that the unrelated delivery layer fell through.
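The symmetric normalization described here (strip the @ prefix, case-fold, apply the same rule on both the stored and the target side) and the self-loop guard's empty-handle edge case can be sketched together. The function signatures are simplified for illustration (the real is_known_peer also takes a channel), and `to_lowercase` stands in for whatever case-folding the implementation uses.

```rust
use std::collections::HashSet;

/// Strip leading '@' characters and lowercase the remainder.
fn normalize_handle(raw: &str) -> String {
    raw.trim_start_matches('@').to_lowercase()
}

/// Store aliases already normalized so lookups only normalize the target.
fn resolve_peer_set(aliases: &[&str]) -> HashSet<String> {
    aliases.iter().map(|alias| normalize_handle(alias)).collect()
}

/// Apply the identical normalization to the target side.
fn is_known_peer(peers: &HashSet<String>, target: &str) -> bool {
    peers.contains(&normalize_handle(target))
}

/// Self-loop guard: a missing or empty-after-normalization handle must not
/// match any sender.
fn is_self_message(self_handle: Option<&str>, sender: &str) -> bool {
    match self_handle.map(normalize_handle) {
        Some(handle) if !handle.is_empty() => handle == normalize_handle(sender),
        _ => false,
    }
}
```

With this shape, a config key of `[agents.Beta]` and an inbound origin of `@beta` resolve to the same stored entry.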
…delete DeleteReport gains active_sessions: usize so a dry-run inspection can see whether a real delete would be refused without coupling the inspection to the active-session gate. The destructive path emits a tracing::warn when force_active_sessions=true overrides the refusal — otherwise a scripted operator running with the override leaves no record that an in-flight agent's workspace was ripped out from under the running loop. Also delete the #[allow(dead_code)] _AgentAliasReference hack and restrict the AgentAlias import to the test module where it's actually used.
If I'm reading this right, it looks like the SQLite and PostgreSQL backends still have one DB and not DB or table per agent, and only add a new column for which agent is emitting it. Have we considered adding more tables for different purposes? I'm looking at either a new table or a new DB for ACP session replay because it's going to need different schema.
Looks directionally correct. A few issues identified in the current state:
Issue closure. The PR currently overstates the #6272 closure. The body says
Peer delivery. The peer-message path looks like it proves authorization, not live delivery.
Migration path. The migration/storage path needs another pass before old installs are trustworthy. The filesystem migration moves
Memory boundary. The memory backends do not all enforce the new attribution boundary the same way yet. Postgres promotes
Security surfaces. The access and SubAgent surfaces need a cleanup pass.
Cuts the unwired zeroclaw_runtime::agents lifecycle module + session_registry (the v0.8.1 zeroclaw agents CLI surface that would consume them does not ship in this PR; the runtime module was dead code on this branch) and lands the four substantive fixes from @Audacity88's directional review:
- Migration path: Config::load_or_init resolves ZEROCLAW_CONFIG_DIR / ZEROCLAW_WORKSPACE before running the filesystem migration so custom installs migrate; config.workspace_dir now points at the migrated default-agent workspace so legacy install-wide callers (cost::CostTracker, sop, skills, plugins, memory CLI) read the live agent dir instead of an orphaned legacy path.
- Memory boundary: PostgresMemory::store routes through store_with_agent (COALESCE to default agent UUID); SqliteMemory + PostgresMemory recall_for_agents push the agent_id filter into the query layer (WHERE agent_id IN / ANY) and drop the post-fetch attribution lookup that let legacy NULL-agent_id rows leak to scoped callers; QdrantMemory recall_for_agents uses a payload `must` filter on agent_id and store_with_agent attributes None to "default".
- Peer delivery (live, not authorization-only): SendMessageToPeerTool resolves agent-alias targets to in-process delivery via agent::loop_::process_message (bot identity is shared across agents on the same channel, so an outbound channel send would loop right back inbound); external peers continue through the channel registry's delivery handler.
- AccessMode::Write semantics: SecurityPolicy gains an allowed_roots_write_only tier so AccessMode::Write actually grants write access without read access; is_resolved_path_readable refuses write-only paths, is_resolved_path_allowed admits them; ensure_no_escalation_beyond validates the write-only tier as a SubAgent subset axis with a WriteOnlyRootNotInParent EscalationViolation variant.
SubAgent budget sharing: PerSenderTracker.buckets becomes Arc<Mutex<...>> so SubAgent runs that take a caller-supplied policy override inherit the parent's live action/cost bucket. Spawning a SubAgent no longer bypasses max_actions_per_hour or max_cost_per_day_cents. Tests: 1465 zeroclaw-runtime + 615 zeroclaw-config + 307 zeroclaw-memory green; new coverage for write-only enforcement, SubAgent budget inheritance under override, and updated for_agent allowlist tier routing.
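A minimal sketch of why wrapping the buckets in `Arc<Mutex<...>>` makes the budget shared: cloning the tracker clones the pointer, not the counters. The bucket layout (sender name to action count) and the `try_record` method are assumptions for illustration; only the `PerSenderTracker.buckets` name comes from the commit message.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Simplified stand-in: the real bucket contents are not shown in this PR text.
#[derive(Clone)]
struct PerSenderTracker {
    buckets: Arc<Mutex<HashMap<String, u32>>>,
}

impl PerSenderTracker {
    fn new() -> Self {
        Self {
            buckets: Arc::new(Mutex::new(HashMap::new())),
        }
    }

    /// Hypothetical method: records one action for `sender` and returns
    /// false once `max_actions` has been reached.
    fn try_record(&self, sender: &str, max_actions: u32) -> bool {
        let mut buckets = self.buckets.lock().unwrap();
        let used = buckets.entry(sender.to_string()).or_insert(0);
        if *used >= max_actions {
            return false;
        }
        *used += 1;
        true
    }
}
```

A child that receives `parent.clone()` draws from the same counters, so one parent action plus one child action exhausts a budget of two for both of them.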
The previous "address Audacity directional review" commit landed the
new field surfaces (write-only allowlist tier, escalation variant,
shared PerSenderTracker) but four claimed fixes did not actually reach
the call sites the PR body named. This closes them.
Qdrant bare Memory::store leaked agent_id: None into the payload.
Routes the bare entrypoint through store_with_agent so the existing
unwrap_or("default") attaches and the NOT NULL FK / scoped recall must
filter both behave correctly. Mirrors the SQLite and Postgres bare-
store paths.
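The routing described here can be sketched with an in-memory stand-in. `InMemoryBackend` and `StoredPoint` are hypothetical; the sketch only shows the bare `store` entrypoint delegating to `store_with_agent` so the `unwrap_or("default")` attribution always applies.

```rust
#[derive(Debug, PartialEq)]
struct StoredPoint {
    key: String,
    agent_id: String,
}

/// Hypothetical in-memory stand-in for a memory backend.
#[derive(Default)]
struct InMemoryBackend {
    points: Vec<StoredPoint>,
}

impl InMemoryBackend {
    /// Attributes un-scoped writes to the synthesized "default" agent.
    fn store_with_agent(&mut self, key: &str, agent_id: Option<&str>) {
        let agent_id = agent_id.unwrap_or("default").to_string();
        self.points.push(StoredPoint { key: key.to_string(), agent_id });
    }

    /// Bare entrypoint: routes through `store_with_agent` instead of
    /// writing a payload without an agent id.
    fn store(&mut self, key: &str) {
        self.store_with_agent(key, None);
    }

    /// Scoped recall: only rows attributed to one of `agents` are returned.
    fn recall_for_agents(&self, agents: &[&str]) -> Vec<&StoredPoint> {
        self.points
            .iter()
            .filter(|p| agents.contains(&p.agent_id.as_str()))
            .collect()
    }
}
```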
glob_search and content_search consulted is_resolved_path_allowed (the
write-side check that honors allowed_roots_write_only) for what are
read operations. A directory granted only via AccessMode::Write would
surface through file enumeration / content matching, silently widening
the write-only grant into a read grant. Both call sites now use
is_resolved_path_readable; tests cover the symlink-into-write-only-
root and the absolute-path-under-write-only-root cases.
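The difference between the two checks can be sketched as below. The field and method names follow the commit message, but the bodies are assumptions: in particular, this sketch assumes paths are already resolved and that the write-side check does not admit read-only roots.

```rust
use std::path::{Path, PathBuf};

/// Cut-down stand-in for `SecurityPolicy` with the three allowlist tiers.
#[derive(Default)]
struct SecurityPolicy {
    allowed_roots: Vec<PathBuf>,
    allowed_roots_read_only: Vec<PathBuf>,
    allowed_roots_write_only: Vec<PathBuf>,
}

/// Component-wise containment: "/srv/dropbox" is not under "/srv/drop".
fn under_any(path: &Path, roots: &[PathBuf]) -> bool {
    roots.iter().any(|root| path.starts_with(root))
}

impl SecurityPolicy {
    /// Write-side check: honors the write-only tier.
    fn is_resolved_path_allowed(&self, path: &Path) -> bool {
        under_any(path, &self.allowed_roots)
            || under_any(path, &self.allowed_roots_write_only)
    }

    /// Read-side check: never consults the write-only tier.
    fn is_resolved_path_readable(&self, path: &Path) -> bool {
        under_any(path, &self.allowed_roots)
            || under_any(path, &self.allowed_roots_read_only)
    }
}
```

A read tool that calls `is_resolved_path_allowed` by mistake would admit `/srv/drop/report.txt` under a write-only grant of `/srv/drop`, which is the widening this commit removes.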
DelegateTool inherited the caller's SecurityPolicy verbatim with no
subset validation against the delegated target, and the caller's
PerSenderTracker was not surfaced as a deliberate budget-share
relationship. Plumbs Arc<Config> via with_root_config; adds
policy_for_target which builds the target's SecurityPolicy via
SecurityPolicy::for_agent, validates it as a subset of the caller's
via ensure_no_escalation_beyond, and assigns the caller's tracker so
delegated runs consume from the caller's max_actions_per_hour /
max_cost_per_day_cents bucket. execute_sync, execute_background, and
execute_parallel now invoke the helper at the delegate boundary;
escalating targets surface a structured failure instead of running.
Three new tests cover the escalation, tracker-share, and root-config-
absent fallback paths.
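As a rough sketch of the subset validation at the delegate boundary, the following checks two of the axes. `WriteOnlyRootNotInParent` is named in this PR, but its payload here is an assumption, and `MaxActionsExceedsParent` and the two-field `Policy` struct are hypothetical.

```rust
use std::path::PathBuf;

#[derive(Debug, PartialEq)]
enum EscalationViolation {
    /// Variant name from the PR; the payload is an assumption.
    WriteOnlyRootNotInParent(PathBuf),
    /// Hypothetical variant for the action-budget axis.
    MaxActionsExceedsParent { parent: u32, child: u32 },
}

/// Two-axis stand-in for the much larger real policy.
struct Policy {
    allowed_roots_write_only: Vec<PathBuf>,
    max_actions_per_hour: u32,
}

impl Policy {
    /// Accepts only when `self` grants nothing the parent does not.
    fn ensure_no_escalation_beyond(&self, parent: &Policy) -> Result<(), EscalationViolation> {
        for root in &self.allowed_roots_write_only {
            // Path containment: a child root may be a subpath of a parent root.
            let contained = parent
                .allowed_roots_write_only
                .iter()
                .any(|p| root.starts_with(p));
            if !contained {
                return Err(EscalationViolation::WriteOnlyRootNotInParent(root.clone()));
            }
        }
        if self.max_actions_per_hour > parent.max_actions_per_hour {
            return Err(EscalationViolation::MaxActionsExceedsParent {
                parent: parent.max_actions_per_hour,
                child: self.max_actions_per_hour,
            });
        }
        Ok(())
    }
}
```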
SendMessageToPeerTool: the agent-alias branch is fire-and-forget
(tokio::spawn detached), so a "success: true" tool result does not
mean the recipient processed the message. Module doc + the success
output string now name this explicitly so observers diagnosing missing
peer messages read recipient-side spans rather than the sender's tool
output.
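The accepted-versus-processed distinction can be shown with a small sketch. Here `std::thread::spawn` stands in for the detached `tokio::spawn` so the example needs no external crate, and the join handle is returned only so the recipient's outcome can be inspected; the real tool detaches and returns no handle. `send_to_agent_peer` and this `ToolResult` shape are hypothetical.

```rust
use std::thread::{self, JoinHandle};

#[derive(Debug, PartialEq)]
struct ToolResult {
    success: bool,
    output: String,
}

/// Hands `message` to the recipient on another thread and reports success
/// immediately, before the recipient has done anything with it.
fn send_to_agent_peer<F>(
    message: String,
    process_message: F,
) -> (ToolResult, JoinHandle<Result<(), String>>)
where
    F: FnOnce(String) -> Result<(), String> + Send + 'static,
{
    let handle = thread::spawn(move || process_message(message));
    let result = ToolResult {
        success: true,
        output: "accepted for processing (fire-and-forget)".to_string(),
    };
    (result, handle)
}
```

Even when the recipient closure returns an error, the sender's `ToolResult` still says `success: true`; the failure is visible only on the recipient side.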
Last of the four read tools the PR body lists. Mirrors file_read, glob_search, and content_search: a directory granted only via AccessMode::Write would otherwise leak through pdf extraction since is_resolved_path_allowed honors allowed_roots_write_only by design.
Thanks @Audacity88, @WareWolf-MoonWall, @tidux. Walked your findings against the diff and pushed the gaps in 2d3193b + 0185dfe. Bullet-by-bullet, with file:line:
Finding 1, issue closure. The
Finding 2, peer delivery. Live in-process delivery is real, not authorization-only.
Finding 3, migration path. Env var ordering:
Finding 4, memory boundary.
Finding 5, security surfaces.
tidux. Per-agent tables / DBs: in scope is the PR body and issue #6272 body have been updated to reflect what actually shipped (no zeroclaw agents CLI claims; new DelegateTool + read-side checklist items added; test totals refreshed). Pre-push gate (
DelegateTool::policy_for_target now refuses narrowing in addition to escalation. The spawned agentic loop reuses the caller's parent_tools registry, so a narrower target policy never reaches those tool calls; catching the narrowing at the delegate boundary turns a silent over-grant into a loud refusal that points operators at spawn_subagent for genuinely narrowed runs. SubAgent allowlist fields renamed to make explicit that they carry config aliases (the [agents.<alias>] keys) rather than backend storage identifiers: SubAgentOverrides.allowed_agent_aliases, SubAgentContext.parent_alias / allowed_agent_aliases, SubAgentSpawn.parent_alias / parent_allowed_agent_aliases. Module doc spells out that consumers building an AgentScopedMemory must resolve via Memory::ensure_agent_uuid first (SQL backends use UUIDs from the agents table; Markdown / Qdrant / None use the alias verbatim per the trait default). The in-tree consumer today is zeroclaw_memory::create_memory_for_agent, which already does the resolution. Drop a stale `let _ = agent_config;` in build_enriched_system_prompt that was claiming the parameter was unused; it is used several lines above to resolve the agent's skill bundles.
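The alias-versus-storage-identifier distinction can be sketched as a trait default. This is a synchronous, hypothetical simplification: the real `Memory::ensure_agent_uuid` signature is not shown here, and the counter-based identifiers below are placeholders for UUIDs from an `agents` table.

```rust
use std::collections::HashMap;

/// Simplified, synchronous stand-in for the `Memory` trait.
trait Memory {
    /// Default: backends without UUID indirection use the alias verbatim.
    fn ensure_agent_uuid(&mut self, alias: &str) -> String {
        alias.to_string()
    }
}

/// Markdown-style backend: keeps the trait default.
struct MarkdownMemory;
impl Memory for MarkdownMemory {}

/// SQL-style backend: maps each alias to a stable storage identifier.
struct SqlMemory {
    agents: HashMap<String, String>,
    next_id: u32,
}

impl Memory for SqlMemory {
    fn ensure_agent_uuid(&mut self, alias: &str) -> String {
        if let Some(id) = self.agents.get(alias) {
            return id.clone(); // already registered: same id every time
        }
        self.next_id += 1;
        // Placeholder id format; a real backend would mint a UUID.
        let id = format!("uuid-{:04}", self.next_id);
        self.agents.insert(alias.to_string(), id.clone());
        id
    }
}
```

A consumer that passes the raw alias to a SQL-backed scoped wrapper would filter on the wrong value, which is why resolution has to happen first.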
Summary
`integration/v0.8.0` (multi-agent runtime is a v0.8.0-only landing).
`AliasedAgentConfig` rename, `[agents.<alias>.workspace]` with `path` / `access` / `unrestricted_filesystem` / `read_memory_from`, `[agents.<alias>.memory]` with a `MemoryBackendKind` enum, `[peer_groups.<name>]` top-level map). Cross-reference validators on every reference. Per-backend `agents` table + `agent_id` migrations on SQLite, Postgres, and Lucid (Markdown / Qdrant / None per their idiomatic shapes), with `agent_id` promoted to `NOT NULL REFERENCES agents(id)`: SQLite via table rebuild + FTS reindex inside an explicit `BEGIN IMMEDIATE` transaction with `defer_foreign_keys = ON` and `PRAGMA foreign_keys = ON` on every connection so the FK is enforced; Postgres via `CHECK ... NOT VALID` plus `VALIDATE CONSTRAINT` plus `SET NOT NULL` plus `ADD CONSTRAINT ... NOT VALID` plus `VALIDATE CONSTRAINT` so the metadata-only `SET NOT NULL` and the FK validation are both low-lock on populated tables. Per-backend `schema_version` metadata table. `AgentScopedMemory` + `AgentScopedMarkdownMemory` wrappers plumbed into `Agent::from_config`, cron's pre-prompt recall, and the post-failure session-purge cleanup.
Audacity-review fixes that ship in this PR: the bare `Memory::store` path on every backend now attributes un-scoped writes to the synthesized `default` agent (Postgres COALESCE on insert, SQLite COALESCE in the existing `store_with_agent`, Qdrant `unwrap_or("default")` payload), so direct callers cannot trip the NOT NULL FK. `recall_for_agents` pushes the `agent_id` filter into the query layer on every backend (Postgres `WHERE agent_id = ANY($n)`, SQLite `WHERE agent_id IN (...)` on the candidate-id fetch, Qdrant payload `must` filter on the search call), eliminating the over-fetch + post-filter pattern that left legacy NULL-agent_id rows visible to scoped callers. `Config::load_or_init` resolves `ZEROCLAW_CONFIG_DIR` / `ZEROCLAW_WORKSPACE` BEFORE running the filesystem migration so custom installs migrate too, and `config.workspace_dir` now points at the migrated default-agent workspace (`<install>/agents/default/workspace/`) so legacy install-wide callers (`cost::CostTracker`, `sop`, `skills`, `plugins::PluginHost`, memory CLI) read the live agent dir instead of an orphaned legacy path. `AccessMode::Write` actually grants write-only access: a new `allowed_roots_write_only` tier on `SecurityPolicy` is honored by write-side tools and explicitly NOT consulted by `is_resolved_path_readable`, so `file_read` / `pdf_read` / `glob_search` / `content_search` refuse the path while `file_write` / `file_edit` / `git_operations` admit it. `ensure_no_escalation_beyond` validates the write-only tier as a SubAgent subset axis with a `WriteOnlyRootNotInParent` `EscalationViolation` variant. SubAgent runtime budget sharing: `PerSenderTracker.buckets` is `Arc<Mutex<...>>` so a SubAgent run with a caller-supplied policy override inherits the parent's live action / cost bucket; spawning a child no longer bypasses `max_actions_per_hour` or `max_cost_per_day_cents`.
DelegateTool boundary validation: every delegate call now resolves the target's per-agent `SecurityPolicy` via `SecurityPolicy::for_agent`, validates it as a subset of the caller's via `ensure_no_escalation_beyond`, and assigns the caller's `PerSenderTracker` to the resolved policy so delegated runs share the caller's action / cost bucket. Targets whose configured risk profile or workspace access map would widen permissions beyond the caller surface a structured `ToolResult` failure at the delegate boundary instead of running. The hand-rolled chat / agentic loops still execute against the caller's parent_tools registry, which is a deliberate single-registry design; rebuilding the registry per delegation is a v0.8.1 follow-on the body does NOT promise here. Live peer delivery: `SendMessageToPeerTool` resolves agent-alias targets to in-process routing via `agent::loop_::process_message` (the bot identity is shared across agents on the same channel, so an outbound channel send would loop right back inbound and the self-loop guard would drop it; agent-to-agent messaging is process-internal by design); external peers continue through the channel registry's delivery handler. The agent-alias path is fire-and-forget (`tokio::spawn` detached): a `success: true` tool result means accepted-for-processing, not completed, and recipient-side errors only surface through the recipient's own observability and a sender-side `tracing::warn!`. Module docs and the success output string make this explicit. Per-agent memory backend factory (`zeroclaw_memory::create_memory_for_agent`). Filesystem migration from `<install>/workspace/` to `<install>/agents/default/workspace/` with timestamped backup + idempotent re-run.
Per-agent identity-file load. `SecurityPolicy::for_agent` populating `allowed_roots` / `allowed_roots_read_only` / `allowed_roots_write_only` from `[agents.<alias>.workspace.access]` per `AccessMode`. `file_read` consults the new `is_resolved_path_readable` (read-only allowlist + POSIX device files like `/dev/null`). `DelegateTool` reads sub-agent system prompts from per-agent identity files instead of an `AliasedAgentConfig.system_prompt` string (field deleted). The SubAgent runtime with `SubAgentOverrides` and an expanded `SecurityPolicy::ensure_no_escalation_beyond` subset validator (autonomy, allowed_roots rw + ro + write-only with path-containment matching, allowed_commands, workspace_only, forbidden_paths in the parent ⊆ child direction, shell_env_passthrough, max_actions_per_hour, max_cost_per_day_cents, shell_timeout_secs, block_high_risk_commands, require_approval_for_medium_risk) wired through a new `AgentRunOverrides` parameter on `agent::run` so the validated child policy reaches the agent loop. The two-layer channel self-loop guard (SDK-side `Channel::drop_self_messages` + `peers::should_drop_self_loop` agent-loop fallback) with `self_handle` overrides on the four major inbound channels (Telegram via `bot_username` cache, IRC via configured nickname, Discord via token-decoded user_id, Slack via cached `auth.test` user_id) and an honest trait doc about the gap on the remaining inbound impls. Audit / trace `agent_alias` field. Console-formatter `[<alias>]` prefix with `[system]` fallback for boot / migration code paths. Peer-group runtime resolver with the strict outbound `is_known_peer` check + symmetric `@`-prefix and case normalization on both agent-peer and external-peer matching.
Retires every legacy primitive the new design supersedes: `[workspace]` block, `WorkspaceManager`, `WorkspaceTool`, `WorkspaceBoundary`, `MemoryNamespaceConfig`, `NamespacedMemory<M>`, `active_workspace.toml` marker including its protected-config-path entries, `src/hands/` module and the `Hand*` observability events / metrics (`HandStarted` / `HandCompleted` / `HandFailed`), the `system_prompt` config string on `AliasedAgentConfig`.
`[agents.<alias>].subagent_*` config block that supplies caller-defined `SubAgentOverrides` is deferred to v0.8.1; the override type, the validator, AND the runtime wire-through (`AgentRunOverrides`) are in this PR. Does NOT update web frontend types or `.po` translations (deferred to @Audacity88's follow-up). Does NOT introduce a Postgres `CREATE INDEX CONCURRENTLY` step (documented as a post-deploy operator action; the `SET NOT NULL` and FK validation are low-lock via the `NOT VALID` + `VALIDATE` pattern). Does NOT add per-agent secret namespacing. Does NOT override `Channel::self_handle` on every inbound channel impl: the four highest-traffic ones (Telegram, IRC, Discord, Slack) are covered; other inbound channels (Matrix, Bluesky, Notion, Mochat, Linq, WeCom, QQ, Wati, ACP) keep the default `None` and rely on per-impl filtering until a v0.8.1 audit pass.
`#[serde(default)]`, so existing 0.7.x configs deserialize unchanged through the V2→V3 path. The `AliasedAgentConfig` rename touches 23 files but is a pure `s/Delegate/Aliased/` plus a doc catch-up commit. The `agents` table + `agent_id` column on `memories` lands as nullable + backfilled, so the SQLite / Postgres migration is idempotent and survives a re-run on an already-migrated install. The Postgres `SET NOT NULL` + FK promotion uses `CHECK NOT VALID` + `VALIDATE` to keep the locks short on populated production tables. The SQLite rebuild runs inside `BEGIN IMMEDIATE` so a crash mid-rebuild rolls back to the pre-migration shape (the timestamped backup is the secondary recovery path).
The `[workspace]` and `memory_namespace` retirements drop fields off the schema but the V2→V3 migration silently strips them from incoming configs: no manual migration step required from operators. The `MemoryEntry.agent_id` field is additive with `#[serde(default)]`, so external `Memory` implementors and any persisted `MemoryEntry` JSON deserialize unchanged. The self-loop guard sits on a default `Channel` trait method, so a forgotten channel impl can't silently leak; the four major inbound channels override it explicitly. Downstream `Memory` implementors: every backend now implements `store_with_agent`, `recall_for_agents`, and `ensure_agent_uuid` directly. Out-of-tree implementations must stub at minimum `store_with_agent` and `recall_for_agents` (the `ensure_agent_uuid` default returns the alias verbatim, which is correct for backends without UUID indirection). Subsystems touched: every memory backend, every channel impl (via the trait default), the cron scheduler, the audit-log emitter, and the runtime-trace emitter; the onboarding wizard loses its `Workspace` section and the per-agent `memory_namespace` step.
Validation Evidence (required)
cargo +nightly fmt --all -- --check
cargo clippy --workspace --exclude zeroclaw-desktop --all-targets --features ci-all -- -D warnings
cargo test --workspace --exclude zeroclaw-desktop --features ci-all
`cargo +nightly fmt --all -- --check`: clean (no diff, exit 0).
`cargo clippy --workspace --exclude zeroclaw-desktop --all-targets --features ci-all -- -D warnings`: Finished dev profile [unoptimized + debuginfo] target(s) in 2m 12s. Zero warnings under `-D warnings`.
`cargo test --workspace --exclude zeroclaw-desktop --features ci-all`: every per-crate test bucket green. Notable green totals: 1656 zeroclaw-runtime + 615 zeroclaw-config + 307 zeroclaw-memory + 1127 zeroclaw-tools + 768 channels + 1653 hardware + 1125 plugins.
Tests added in this PR cover: `SubAgentSpawn::{for_agent, build}` (known-alias success, unknown-alias rejection, inherits-verbatim, escalating-policy rejection, narrowed-allowlist subset, parent action-budget inheritance under override); 16 axes on `SecurityPolicy::ensure_no_escalation_beyond` (rw-root, ro-root, write-only-root, command, workspace_only, max_actions, max_cost; identical-policy and narrowed-child accept paths; rw→ro downgrade accept; subpath narrowing inside parent root; autonomy escalation; dropped forbidden_paths entry; expanded shell_env_passthrough; higher shell_timeout_secs; disabled block_high_risk_commands; disabled require_approval_for_medium_risk); `SpawnSubagentTool` (empty/missing prompt rejection via structured `ToolResult`, unknown parent surfaces structured failure); `SendMessageToPeerTool` (non-peer rejection, channel-listener rejection, empty-arg rejection, external-peer normalization with peer-set-pass assertion); `ResolvedPeers::is_known_peer`; `peers::should_drop_self_loop`; `Channel::drop_self_messages`; the new schema cross-reference validators; the SQLite `agents` migration; `AgentScopedMemory` (own-recall / sibling-allowlist / cross-agent isolation / caller-allowlist intersection / get cross-agent filter / forget cross-agent refusal / list attribution
filtering / store_with_agent foreign-id refusal / purge_namespace refusal / purge_session bound-only); `AgentScopedMarkdownMemory`; the filesystem migration; `is_resolved_path_readable` write-only-root rejection + write-side admission; `glob_search` symlink-into-write-only-root refusal; `content_search` absolute-path-under-write-only-root refusal; `DelegateTool::policy_for_target` (escalating-target rejection at delegate boundary, caller-tracker inheritance via shared `PerSenderTracker` bucket, root_config-absent fallback to caller's policy); `SecurityPolicy::for_agent` access-tier routing including the new write-only allowlist tier; `zeroclaw_runtime::peers::resolve_peer_set` (mutual membership, external peers, ignore subtraction, `allows_inbound` external normalization, `is_known_peer` strict outbound, agent-peer `@`-prefix and case normalization); E2E in `tests/system/multi_agent_e2e.rs` (legacy upgrade with backup, two-agent isolated memory, peer-group routing with in-process delivery). Tests deleted: every `WorkspaceManager` / `WorkspaceTool` / `WorkspaceBoundary` / `NamespacedMemory<M>` test (modules are gone); the `t14e_memory_namespace_widening` migration test; the active_workspace marker tests; the `workspace_double_run_is_idempotent_on_disk` onboard section test; the `section_has_signal_workspace_tracks_enabled_flag` test.
`agents` table migration runs idempotently on a re-init (no duplicate row, no failed `INSERT`). Verified the cron `JobType::Agent` dispatch span shape includes `parent_alias`, `run_id`, `spawn_site = "cron"`. Verified the `spawn_subagent` tool span shape includes the same fields with `spawn_site = "tool"`. Verified that `SubAgentSpawn::for_agent` rejects an unknown parent alias with a structured failure that names the alias. Verified a real 0.7.x `config.toml` with a populated `[workspace]` block deserializes cleanly through V2→V3 with the legacy fields silently stripped.
Verified that adding an `[agents.researcher]` block to `config.toml` causes the runtime to create `<install>/agents/researcher/workspace/` and seed bootstrap identity files on first agent-loop entry; the agent then loads its identity files from the per-agent dir. Verified the V3 default-agent path on fresh install: a freshly initialized config now opens its SQLite memory at `<install>/agents/default/workspace/memory/brain.db` (the previously orphaned legacy path is no longer recreated). Verified `AccessMode::Write` semantics with a unit test that asserts `is_resolved_path_allowed` admits the path while `is_resolved_path_readable` refuses it. Did NOT verify a live multi-agent peer-group message exchange across a real Telegram channel (covered by the in-process E2E and the unit-tested resolver invariants).
Security & Privacy Impact (required)
Yes. Adds the SubAgent surface (`spawn_subagent` tool + cron `JobType::Agent` routing). Both spawn sites funnel through `SubAgentSpawn::for_agent(config, alias).build(SubAgentOverrides::default())` and inherit the parent agent's identity verbatim by default: same `SecurityPolicy`, same memory allowlist, same secret store. The `SubAgentOverrides` type ships in this PR. The validated context now reaches the agent loop via `AgentRunOverrides { security, memory }` on `agent::run` (previously discarded; both spawn sites now pass `Some(subagent_ctx.policy.clone())` for the security side). The `[agents.<alias>].subagent_*` config block that plumbs caller-defined narrowing into `SubAgentOverrides` lands in v0.8.1. Any caller-supplied policy override is validated as a subset of the parent via `SecurityPolicy::ensure_no_escalation_beyond`, which now covers: autonomy, allowed_roots (rw + ro + write-only with path-containment matching), allowed_commands, workspace_only, forbidden_paths in the parent ⊆ child direction, shell_env_passthrough, max_actions_per_hour, max_cost_per_day_cents, shell_timeout_secs, block_high_risk_commands, and require_approval_for_medium_risk. UUID-set containment on the memory allowlist still applies. Both checks chain a precise `EscalationViolation` for diagnostics. SubAgent runtime budget: `PerSenderTracker.buckets` is `Arc<Mutex<...>>` so child runs taking a caller-supplied policy override inherit the parent's live action / cost bucket. A child cannot bypass the parent's `max_actions_per_hour` ceiling by spawning. DelegateTool boundary: delegate now plumbs an optional `Arc<Config>` and resolves the target's `SecurityPolicy` per call via `SecurityPolicy::for_agent`; the resolved policy is validated as a subset of the caller's via `ensure_no_escalation_beyond` (rejecting any target whose risk profile or workspace access would widen rights beyond the caller) and inherits the caller's `PerSenderTracker` so delegated actions count against the caller's budget.
Escalating targets surface a structured `ToolResult` failure instead of running. Also adds `SendMessageToPeerTool` (peer-set authorized outbound; agent-alias targets route in-process, external peers go to the channel registry). SQLite `PRAGMA foreign_keys = ON` is now enabled on every connection so the multi-agent FK is actually enforced (it was unenforced before, declarative only). `AccessMode::Write` now grants write-only access cleanly: a new `allowed_roots_write_only` tier on `SecurityPolicy` is honored by write-side path checks and explicitly NOT consulted by read-side path checks, so `AccessMode::Write` no longer silently lets the bot read what it's only meant to write.
No.
No. Per-agent secret namespacing is deferred to v0.8.1 per the plan; the single workspace-wide `SecretStore` is unchanged in this PR. The retired `[workspace].isolate_secrets` flag was a no-op stand-in that the multi-workspace primitive never delivered on; its removal does not change behavior.
No. New tests use placeholder aliases (`alpha`, `beta`, `agent-uuid-alpha`, `agent-uuid-rogue`).
Yes, describe the risk and mitigation: SubAgent runtime is a privilege-inheritance primitive; the risk is a child run obtaining rights the parent doesn't have. Mitigation: every spawn site funnels through `SubAgentSpawn::for_agent(config, alias).build(overrides)`; `build` runs the expanded `SecurityPolicy::ensure_no_escalation_beyond` on any caller-supplied policy override AND copies the parent's `PerSenderTracker` into the child policy so action / cost budgets are shared, AND the validated context reaches the agent loop.
The audit's privilege-touching surfaces are: SubAgent spawn (validator + runtime wire-through + budget inheritance all shipped), `AgentScopedMemory` (now post-filters every read by the bound + allowlisted set, refuses cross-agent forget / purge_namespace, and rejects `store_with_agent` calls that target a foreign agent_id rather than silently rewriting them), `AccessMode::Write` (write-only allowlist tier separates write grants from read grants).
Compatibility (required)
`Yes` (with documented schema retirements). Every new field is `#[serde(default)]` (including the new optional `MemoryEntry.agent_id`). The `agents` table lands as a fresh CREATE; the `agent_id` column on `memories` lands nullable, backfilled to the synthesized `default` agent's UUID, and indexed. Re-running the migration on an already-migrated install is idempotent (now detected via `PRAGMA table_info` + `PRAGMA foreign_key_list` rather than substring-matching DDL). The retired schema surface (`[workspace]`, `memory_namespaces`, per-agent `memory_namespace`, `KnowledgeConfig.cross_workspace_search`, the `active_workspace.toml` marker file) is silently stripped by the V2→V3 migration: operators do not need to edit their `config.toml` by hand.
`Yes` (additions and retirements both). New: `[agents.<alias>.workspace]` with `path` / `access` / `unrestricted_filesystem` / `read_memory_from`; `[agents.<alias>.memory]` with a `MemoryBackendKind` enum; `[peer_groups.<name>]` top-level map; always-on agent-loop tools `spawn_subagent` and `send_message_to_peer`; `agent_alias: Option<String>` on audit-log and runtime-trace events. Retired: top-level `[workspace]` block (entire struct), `[memory_namespaces.<alias>]` map and the per-agent `memory_namespace` field that referenced it, `KnowledgeConfig.cross_workspace_search` field, `OnboardSection::Workspace` enum variant + the `--workspace-only` legacy CLI flag, the `Hand*` observability events and metrics.
`No` or `Yes` to either: exact upgrade steps for existing users: none required. The V2→V3 migration handles the legacy fields silently. Operators relying on `--workspace-only` should update their scripts (the wizard now starts at `providers`); operators relying on the `active_workspace.toml` marker for switching profiles should set `ZEROCLAW_CONFIG_DIR` instead. Operators who set `ZEROCLAW_CONFIG_DIR` or `ZEROCLAW_WORKSPACE` on custom installs will now have their legacy `<install>/workspace/` migrated to `<install>/agents/default/workspace/` on first boot (previously the migration silently skipped non-default install roots).
Rollback (required for
`risk: medium` and `risk: high`)
This PR is
`risk: high` (touches schema, memory layer, security policy, the runtime spawn surface, and retires legacy primitives in flight).
`git revert <merge-sha>` on `integration/v0.8.0`. The branch is sequenced so reverting the merge cleanly unwinds both the additive surface and the retirements. The DB schema is forward-compatible: the `agents` table and the `agent_id` column on `memories` remain after a revert (operators can drop them manually if they want a clean back-out). The filesystem migration is operator-recoverable via the timestamped backup at `<install>/backup-<timestamp>/legacy-workspace/`.
`#[serde(default)]`, so a revert silently drops them. The `spawn_subagent` and `send_message_to_peer` tools are registered unconditionally; if a runtime kill switch is required post-deploy it can be added in a one-line follow-up gating the registration on `root_config`.
`subagent spawn failed:` in the cron dispatch logs; `subagent run failed:` from the agent-loop tool; `agents.<alias>.workspace.access[<i>] = ...` validation errors at config load (a freshly deployed config has a self-reference or dangling alias the validator should have caught); `subagent policy override escalates beyond parent:` followed by an `EscalationViolation` discriminant when a caller-supplied override violates the validator's subset rules; `AgentScopedMemory refuses ...` when an agent loop tool tries a cross-agent operation the wrapper does not permit; `peer-message in-process delivery failed` from `send_message_to_peer` when the recipient agent's `process_message` fails; `[system] filesystem migration failed (continuing with legacy layout)` at boot (the migration is non-fatal; the install keeps running on the legacy layout while the operator investigates). The audit-log `agent_alias` field showing up as `null` for system-level events (boot, scheduler ticks not bound to a specific agent) is expected.
Supersede Attribution (required only when
`Supersedes #` is used)
Not applicable.