A2A late-reply fold-back to primary session #424

@rockfordlhotka

Description

Background

When a subagent invokes A2A and its reply arrives after the subagent terminates, the result is silently dropped (see A2ATaskResultHandler:78-82 and siblings). PendingA2ATask already carries PrimarySessionId — the routing data exists, just unused on the late path.

The existing a2aAwaiter already prevents most late arrivals — subagents block on outstanding A2A before publishing their result (SubagentRunner.cs:283-300). The fold-back triggers when the awaiter times out, the subagent is cancelled, or the primary cancels the subagent's parent task. This is the safety net, not the hot path.

(Tool-profile scoping for this new handler — and the rest of the in-process roles — is split out to #425 and will land after this issue. Until then, the new handler uses the unfiltered tool set, matching today's UserMessageHandler surface.)

Design

In the 4 A2A handlers (A2ATaskResultHandler, A2ATaskStatusHandler, A2ATaskErrorHandler, InputRequiredHandler), after tracker.TryRemove(correlationId) returns a PendingA2ATask:

  1. If PendingA2ATask.SubagentSessionId is set AND SubagentManager.IsActive(subagentSessionId) → route to subagent (current path; existing a2aAwaiter handles the still-running case).
  2. Else if PendingA2ATask.PrimarySessionId is set → fold back to primary (new path).
  3. Else → log Warning and drop (current behavior).
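The three-way decision above can be sketched as a pure function. This is a minimal illustration, not the repository's code: the PendingA2ATask record here is a simplified stand-in that keeps only the two property names mentioned in this issue, LateRoute and LateReplyRouter are hypothetical names, and the isSubagentActive delegate stands in for SubagentManager.IsActive.

```csharp
#nullable enable
using System;

// Hypothetical enum naming the three outcomes described in the Design section.
public enum LateRoute { Subagent, FoldBackToPrimary, Drop }

// Simplified stand-in for the real PendingA2ATask; only the two session-id
// property names are taken from the issue text.
public sealed record PendingA2ATask(string? SubagentSessionId, string? PrimarySessionId);

public static class LateReplyRouter
{
    // isSubagentActive stands in for SubagentManager.IsActive(subagentSessionId).
    public static LateRoute Decide(PendingA2ATask pending, Func<string, bool> isSubagentActive)
    {
        // 1. Subagent still running: keep the current path.
        if (!string.IsNullOrEmpty(pending.SubagentSessionId)
            && isSubagentActive(pending.SubagentSessionId))
        {
            return LateRoute.Subagent;
        }

        // 2. Subagent gone but a primary session is known: fold back (new path).
        if (!string.IsNullOrEmpty(pending.PrimarySessionId))
        {
            return LateRoute.FoldBackToPrimary;
        }

        // 3. Nothing to route to: the caller logs a Warning and drops the reply.
        return LateRoute.Drop;
    }
}
```

Keeping the decision free of side effects would let all four handlers share it and makes the "subagent active vs terminated" branches easy to cover in unit tests.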

Fold-back path:

  • Stash the reply payload to working memory under notifications/a2a/{subagentTaskId}/{kind} (where kind is one of result | status | error | input-required) in the primary's namespace.
  • Append a one-line entry to notifications/index so list_working_memory surfaces pending notifications.
  • Publish a new LateA2ANotificationMessage:
    record LateA2ANotificationMessage(
        string PrimarySessionId,
        string SubagentTaskId,
        string SubagentName,
        string PeerAgent,
        NotificationKind Kind,
        string WorkingMemoryKey);
  • LateA2ANotificationHandler (new, registered in RockBot.Agent/Program.cs) receives the message, loads the primary session, and runs AgentLoopRunner.RunAsync (unfiltered tool set; see the note above about #425) with a prompt of the form:
    "A late {kind} arrived from A2A peer {peer} for your completed subagent {name}. The payload is in working memory at {key}. Read it, decide whether to act on it, and inform the user if relevant."
    The prompt requires the model to surface the context to the user.
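The working-memory key and the handler prompt described in the fold-back path can be sketched as two small string builders. This is an illustration under assumptions: the NotificationKind member names and the LateNotificationText helper are hypothetical (the issue only lists the four kind strings), while the record shape, key layout, and prompt wording are taken from the text above.

```csharp
using System;

// Member names are an assumption; the issue only lists the kind strings
// result | status | error | input-required.
public enum NotificationKind { Result, Status, Error, InputRequired }

// Record shape as given in this issue.
public record LateA2ANotificationMessage(
    string PrimarySessionId,
    string SubagentTaskId,
    string SubagentName,
    string PeerAgent,
    NotificationKind Kind,
    string WorkingMemoryKey);

// Hypothetical helper; not part of the existing codebase.
public static class LateNotificationText
{
    // Maps the enum to the {kind} path segment used in the key and prompt.
    public static string KindSegment(NotificationKind kind) => kind switch
    {
        NotificationKind.Result => "result",
        NotificationKind.Status => "status",
        NotificationKind.Error => "error",
        NotificationKind.InputRequired => "input-required",
        _ => throw new ArgumentOutOfRangeException(nameof(kind))
    };

    // notifications/a2a/{subagentTaskId}/{kind}
    public static string WorkingMemoryKey(string subagentTaskId, NotificationKind kind) =>
        $"notifications/a2a/{subagentTaskId}/{KindSegment(kind)}";

    // Fills the prompt template from the message fields.
    public static string BuildPrompt(LateA2ANotificationMessage m) =>
        $"A late {KindSegment(m.Kind)} arrived from A2A peer {m.PeerAgent} " +
        $"for your completed subagent {m.SubagentName}. " +
        $"The payload is in working memory at {m.WorkingMemoryKey}. " +
        "Read it, decide whether to act on it, and inform the user if relevant.";
}
```

Building the key in one place means the stashing handler and the notification handler cannot drift apart on the path layout.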

Why a message + handler rather than a synchronous call: the bus handles queueing, multiple late arrivals serialize into separate primary turns without colliding with an in-progress turn, and the original A2A handler returns promptly.
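The queueing argument can be illustrated with an in-memory stand-in for the bus. The sketch below uses System.Threading.Channels rather than the project's real message bus, and SerializedTurns is a hypothetical name; it only shows the shape of the behavior: the producer returns immediately after enqueueing, while a single consumer runs one "primary turn" at a time in arrival order.

```csharp
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical demo type; the channel stands in for the real bus.
public static class SerializedTurns
{
    public static async Task<List<string>> RunAsync(IEnumerable<string> lateKeys)
    {
        var queue = Channel.CreateUnbounded<string>(
            new UnboundedChannelOptions { SingleReader = true });
        var turns = new List<string>();

        // Single consumer: each late notification becomes its own turn,
        // and turns never overlap because they are processed one by one.
        var consumer = Task.Run(async () =>
        {
            await foreach (var key in queue.Reader.ReadAllAsync())
            {
                await Task.Delay(10); // stands in for the agent loop run
                turns.Add($"turn for {key}");
            }
        });

        // Producer side (the original A2A handler): enqueue and return promptly.
        foreach (var key in lateKeys)
        {
            queue.Writer.TryWrite(key);
        }
        queue.Writer.Complete();

        await consumer;
        return turns;
    }
}
```

A synchronous call from the A2A handler would instead block for the whole agent turn and would need its own locking to avoid colliding with a turn already in progress.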

Primary-agent directive update: add a line telling the primary to check notifications/ on each turn as defense-in-depth.

Implementation order

  1. src/RockBot.Subagent/SubagentManager.cs — add IsActive(subagentSessionId) accessor backed by existing SubagentEntry registry
  2. src/RockBot.A2A/LateA2ANotificationMessage.cs (new) — record + NotificationKind enum
  3. src/RockBot.A2A/LateA2ANotificationHandler.cs (new) + registration in src/RockBot.Agent/Program.cs
  4. Fold-back routing in A2ATaskResultHandler.cs, A2ATaskStatusHandler.cs, A2ATaskErrorHandler.cs, InputRequiredHandler.cs: WM stash → publish LateA2ANotificationMessage → return
  5. Primary-agent directive update (src/RockBot.Agent/agent/directives.md) — line telling the primary to check notifications/ on each turn
  6. Tests:
    • Fold-back routing tests (subagent active vs terminated branches; missing primary session)
    • LateA2ANotificationHandler happy path + missing-WM-key path

Tests

  • MSTest + Rocks (per repo convention)
  • No ROCKBOT_RABBITMQ_HOST-gated integration test changes needed — this is in-process routing, not transport

Risks

  • Existing a2aAwaiter already prevents most late arrivals. This fix is the safety net.
  • Notification storms. Several late replies arriving back-to-back create N primary turns. Acceptable for v1; bus serializes them. If noisy in practice, add debounce later (handler reads notifications/index and consolidates pending entries into one turn).
  • User-side surprise. The primary speaks without a user prompt. The LateA2ANotificationHandler prompt requires the model to state the source so it is not opaque.
  • Cancellation correctness. If PrimarySessionId resolves to nothing (CLI one-shot, disconnected user), the handler drops cleanly with a Warning. Matches existing "session not found" patterns.
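If the debounce mentioned under notification storms is ever needed, the consolidation step might look roughly like the sketch below. Everything here is an assumption: the format of notifications/index is not specified in this issue, so the sketch treats each index line as a bare working-memory key, and NotificationDebounce is a hypothetical helper.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; the index line format (one key per line) is assumed.
public static class NotificationDebounce
{
    // Collapses all pending index entries into one prompt for a single turn.
    public static string ConsolidatedPrompt(IReadOnlyList<string> indexLines)
    {
        // Ignore blank lines and duplicate keys, keeping first-seen order.
        var keys = indexLines
            .Where(line => !string.IsNullOrWhiteSpace(line))
            .Select(line => line.Trim())
            .Distinct()
            .ToList();

        var joined = string.Join(", ", keys);
        return $"{keys.Count} late A2A notification(s) are pending. " +
               $"Payloads are in working memory at: {joined}.";
    }
}
```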

Out of scope

  • Per-invocation tool profiles: split out to #425 (Per-invocation tool profiles for in-process roles)
  • Notification debounce/coalescing (add only if storms materialize)
  • Surfacing notifications to the user without an intervening primary-agent turn
  • Changing AgentLoopRunner.RunAsync's signature
