The current agent loop (`pkg/agent/loop.go`) is a black box — external code cannot observe internal execution state, hook into the execution process, interrupt ongoing processing, or inject messages mid-turn. This proposal redesigns the agent loop as an event-driven, hookable, interruptible, and appendable system.
What is the problem?
The existing runAgentLoop → runLLMIteration pipeline has four architectural limitations:
No observability — The loop emits no events. External consumers (UI, logging, automation) cannot know what phase the loop is in, which tool is executing, or why a turn ended.
No hook points — There is no way for external code to intercept or modify LLM requests, approve/deny tool executions, or alter loop behavior at runtime.
No interrupt mechanism — Once a turn starts, it cannot be stopped. There is no graceful interrupt (skip remaining tools, let LLM summarize) or hard abort (cancel everything immediately).
No message injection — External code cannot inject messages into an ongoing turn (steering) or queue messages for after the turn ends (follow-up).
Additionally, tool execution is fully parallel via WaitGroup with no checkpoints between tools, making it impossible to check for interrupts or new steering messages between tool executions.
Goals & Non-goals
Goals
Event transparency: Every critical phase inside the loop emits events to external consumers
Hook control: External code can intercept and modify behavior at key phases
Interruptible: Support both Graceful Interrupt and Hard Abort
Appendable: Support Steering (affects current Turn) and FollowUp (queued for later)
Smart tool parallelism: Read-only tools run in parallel, mutating tools run sequentially
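The smart tool parallelism goal can be illustrated with a small sketch. It is not the proposal's implementation: `Tool`, `fakeTool`, `groupTools`, and `runGroups` are hypothetical names, and only `ReadOnlyIndicator` / `IsReadOnly()` come from the proposal (the exact interface shape is assumed). Consecutive read-only calls share a batch and run in parallel; each mutating call is its own batch, and the gap between batches serves as an interrupt checkpoint.

```go
package main

import (
	"fmt"
	"sync"
)

// Tool is a simplified, hypothetical stand-in for the real tool interface.
type Tool interface {
	Name() string
	Execute() string
}

// ReadOnlyIndicator is named in the proposal; this method set is assumed.
type ReadOnlyIndicator interface {
	IsReadOnly() bool
}

func isReadOnly(t Tool) bool {
	ro, ok := t.(ReadOnlyIndicator)
	return ok && ro.IsReadOnly()
}

// groupTools preserves call order: consecutive read-only tools share one
// batch, and every mutating tool gets a batch of its own.
func groupTools(calls []Tool) [][]Tool {
	var groups [][]Tool
	var batch []Tool
	for _, c := range calls {
		if isReadOnly(c) {
			batch = append(batch, c)
			continue
		}
		if len(batch) > 0 {
			groups = append(groups, batch)
			batch = nil
		}
		groups = append(groups, []Tool{c})
	}
	if len(batch) > 0 {
		groups = append(groups, batch)
	}
	return groups
}

// runGroups runs batches in order; members of one batch run in parallel.
// interrupted is polled between batches, which is the checkpoint.
func runGroups(groups [][]Tool, interrupted func() bool) []string {
	var results []string
	for _, g := range groups {
		if interrupted() {
			break
		}
		out := make([]string, len(g))
		var wg sync.WaitGroup
		for i, t := range g {
			wg.Add(1)
			go func(i int, t Tool) {
				defer wg.Done()
				out[i] = t.Execute()
			}(i, t)
		}
		wg.Wait()
		results = append(results, out...)
	}
	return results
}

// fakeTool is a test double used only for this example.
type fakeTool struct {
	name string
	ro   bool
}

func (f fakeTool) Name() string     { return f.name }
func (f fakeTool) Execute() string  { return "ran " + f.name }
func (f fakeTool) IsReadOnly() bool { return f.ro }

func main() {
	calls := []Tool{
		fakeTool{"read_a", true}, fakeTool{"read_b", true},
		fakeTool{"write_c", false}, fakeTool{"read_d", true},
	}
	groups := groupTools(calls)
	fmt.Println(len(groups)) // 3: [read_a read_b] [write_c] [read_d]
	fmt.Println(runGroups(groups, func() bool { return false }))
}
```

Results are written into a per-batch slice by index, so the output order matches the call order even though batch members finish in any order.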
Non-goals
Do not change the MessageBus inbound/outbound model
Do not change the Provider interface (LLMProvider.Chat)
Do not change the core Tool interface signatures (Execute / ExecuteAsync)
Do not change the ContextBuilder.BuildMessages input/output contract
Do not change the session storage format (JSON)
Do not introduce external dependencies
Boundary definitions
What is inside the Loop vs. outside
graph TD
subgraph AgentLoop["AgentLoop (outer shell)"]
Run["Run()<br/>Consume MessageBus, send final reply<br/>This layer unchanged, only adds EventBus init"]
subgraph ProcessMessage["processMessage() — preprocessing layer"]
Transcribe["transcribeAudio<br/><i>stays outside</i>"]
Route["resolveMessageRoute<br/><i>stays outside</i>"]
Command["handleCommand<br/><i>stays outside</i>"]
end
subgraph RunTurn["runTurn() — refactored core (formerly runAgentLoop)"]
subgraph TurnBoundary["Turn boundary — event + hook coverage"]
Context["Initial context assembly<br/>ContextBuilder.BuildMessages() remains a pure function<br/>resolveMediaRefs() called here<br/>Output: []providers.Message"]
subgraph LLMLoop["LLM iteration loop"]
DrainSteering["drain steering queue"]
BeforeLLM["Hook: BeforeLLMRequest"]
CallLLM["call Provider.Chat()"]
AfterLLM["Hook: AfterLLMResponse"]
ParseTools["parse tool calls"]
subgraph ToolExec["Tool execution (smart parallel)"]
BeforeTool["Hook: BeforeToolExecute"]
Approval["Hook: RequestApproval"]
ExecTool["Execute tool"]
AfterTool["Hook: AfterToolExecute"]
end
CheckInterrupt["check steering / interrupt"]
end
Compress["Context compression (retry branch)<br/>forceCompression()<br/>Hook: BeforeContextCompress"]
SessionSave["Session save<br/>Intermediate: AddFullMessage (memory only)<br/>End: AddMessage + Save (disk)<br/>Abort: rollback in-memory state"]
Summarize["Summary trigger (async after Turn ends)"]
end
end
API1["InjectSteering() ← public API"]
API2["InjectFollowUp() ← public API"]
API3["InterruptGraceful() ← public API"]
API4["InterruptHard() ← public API"]
API5["SubscribeEvents() ← public API"]
API6["RegisterHook() ← public API"]
API7["GetActiveTurn() ← public API"]
end
Run --> ProcessMessage
ProcessMessage --> RunTurn
Context --> LLMLoop
DrainSteering --> BeforeLLM --> CallLLM --> AfterLLM --> ParseTools --> ToolExec
BeforeTool --> Approval --> ExecTool --> AfterTool
ToolExec --> CheckInterrupt
CheckInterrupt -->|"continue iteration"| DrainSteering
LLMLoop --> Compress
LLMLoop --> SessionSave
SessionSave --> Summarize
Core concepts
Turn & Iteration
| Term | Definition |
| --- | --- |
| Turn | One complete "user input → LLM iterations → final reply" processing cycle |
| Iteration | One LLM call within a Turn (may produce tool calls, triggering the next Iteration) |
| Steering | A message injected during a Turn that affects the next Iteration |
| FollowUp | A message queued during a Turn, processed as a new inbound after the Turn ends |
| Graceful Interrupt | Skip remaining tools, optionally inject a steering hint, let LLM generate a summary, then end the Turn |
| Hard Abort | Immediately cancel the Provider call and all tool executions; Turn ends with an error |
| SubTurn | A child Turn spawned inside a parent Turn, running the full runTurn path with its own turnID |
| Ephemeral Session | An in-memory-only session used by SubTurns, never persisted to disk |
What is the EventBus?
The EventBus is a multi-subscriber broadcast system that the agent loop uses to emit structured events at every critical phase of execution.
You can think of it as: "a way for anyone to watch what the loop is doing, in real time, without affecting its behavior."
Key design points:
Non-blocking fan-out: Emit iterates subscribers with non-blocking sends; full channels drop events (never block the loop)
Per-EventKind drop counters: [EventKindCount]atomic.Int64 array enables precise diagnostics of which event types are being lost
No dedicated goroutine: Emit runs in the loop's own goroutine (fewer goroutines, suitable for $10 boards with <10MB RAM)
Buffer capacity 16 per subscriber: Sufficient given LLM call latency dominates iteration timing
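The design points above can be sketched in a few lines of Go. This is a minimal illustration under assumptions, not the proposed `eventbus.go`: the `Event` fields, the `Subscribe` and `Dropped` method names, and the two-kind enum are invented for the example. Only the non-blocking send, the `[EventKindCount]atomic.Int64` counter array, and the buffer size of 16 come from the text.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

type EventKind int

const (
	TurnStart EventKind = iota
	TurnEnd
	EventKindCount // sentinel: number of event kinds
)

// Event is a hypothetical minimal payload for this example.
type Event struct {
	Kind   EventKind
	TurnID string
}

type EventBus struct {
	mu      sync.RWMutex
	subs    []chan Event
	dropped [EventKindCount]atomic.Int64 // per-kind drop counters
}

// Subscribe registers a subscriber with a buffer of 16 events.
func (b *EventBus) Subscribe() <-chan Event {
	ch := make(chan Event, 16)
	b.mu.Lock()
	b.subs = append(b.subs, ch)
	b.mu.Unlock()
	return ch
}

// Emit runs in the caller's goroutine and never blocks: if a subscriber's
// buffer is full, the event is dropped for that subscriber and counted.
func (b *EventBus) Emit(ev Event) {
	b.mu.RLock()
	defer b.mu.RUnlock()
	for _, ch := range b.subs {
		select {
		case ch <- ev:
		default:
			b.dropped[ev.Kind].Add(1)
		}
	}
}

// Dropped reports how many events of kind k were lost so far.
func (b *EventBus) Dropped(k EventKind) int64 { return b.dropped[k].Load() }

func main() {
	bus := &EventBus{}
	sub := bus.Subscribe()
	// The subscriber never reads, so only the first 16 events fit.
	for i := 0; i < 20; i++ {
		bus.Emit(Event{Kind: TurnStart, TurnID: "t1"})
	}
	fmt.Println(len(sub), bus.Dropped(TurnStart)) // 16 4
}
```

A slow subscriber therefore only loses its own events; the loop and other subscribers are unaffected, and the counters show which kinds were lost.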
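Event types cover the full turn lifecycle:
TurnStart, TurnEnd
LLMRequest, LLMDelta, LLMResponse, LLMRetry
ContextCompress, SessionSummarize
ToolExecStart, ToolExecEnd, ToolExecSkipped
SteeringInjected, FollowUpQueued, InterruptReceived
SubTurnSpawn, SubTurnEnd, SubTurnResultDelivered
Error
What is the Hook system?
The Hook system allows external code to observe and intercept the loop's execution at well-defined points. Hooks are synchronous, priority-ordered, and timeout-protected.
There are five hook interfaces, each optional; a hook implements only the ones it needs:
| Interface | Purpose | Can modify? |
| --- | --- | --- |
| EventObserver | Passively receive events | No |
| LLMInterceptor | Intercept before/after LLM calls | Yes: can modify messages, model, tools; can abort turn |
| ToolInterceptor | Intercept before/after tool execution | Yes: can modify args, deny tool, abort turn |
| ToolApprover | Pause for external approval (e.g., UI confirmation) | |
The fifth interface is ContextCompressInterceptor.
Hook actions: Continue, Modify, AbortTurn, HardAbort, DenyTool.
Timeout tiers: depending on the tier, a hook that times out is treated as Continue or as deny (safety-first).
Approval runs in a separate goroutine with a channel+timeout pattern, so a misbehaving hook cannot block the loop indefinitely.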
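What is the Interrupt & Steering mechanism?
Interrupts
Two interrupt modes, exposed as public APIs on AgentLoop:
InterruptGraceful: sets interrupted=true, skips remaining tools, injects a steering hint, and allows one more LLM iteration for a summary
InterruptHard: sets aborted=true and cancels the provider context and turn context immediately; cascadeAbort propagates the abort recursively to all child Turns
Steering & Follow-up
Steering: drained from the steering queue at the start of each Iteration, so it affects the current Turn
FollowUp: returned from runTurn and published by the caller via bus.PublishInbound after runTurn fully returns and all cleanup is done, which avoids a race with defer cleanup
Tool execution strategy
The current WaitGroup-based parallel execution is replaced with smart grouping:
Read-only tools (declared via the ReadOnlyIndicator interface): consecutive read-only tools run in parallel
Mutating tools: run sequentially
IsReadOnly() is static (no args parameter); args-dependent read-only is intentionally not supported, for simplicity and safety
Multi-agent coordination (Sub-turns)
The current SubagentManager + RunToolLoop architecture has critical flaws:
RunToolLoop is completely outside the EventBus/Hook/interrupt system
Use of the agent:main:main session, which loses the original conversation context
Design: Sub-turns via runTurn
Sub-agents are refactored to use the same runTurn path as top-level turns:
spawnSubTurn(parentTS, config): single entry point for both sync and async sub-turns
Parent-child tracking: parentTurnID / childTurnIDs in turnState
Result delivery: sync → direct ToolResult; async → deliverSubTurnResult to parent's pendingResults queue (drained at next iteration) or session history (if parent finished)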
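The async result-delivery rule can be sketched as follows. The names `deliverSubTurnResult`, `pendingResults`, and `turnState` come from the proposal; the struct fields, the `session` type, and the `drainPending` / `finish` helpers are assumptions for illustration, and results are plain strings here.

```go
package main

import (
	"fmt"
	"sync"
)

// turnState is a pared-down, hypothetical version of the parent's state.
type turnState struct {
	mu             sync.Mutex
	finished       bool
	pendingResults []string
}

// session stands in for the session history store (assumed shape).
type session struct {
	mu      sync.Mutex
	history []string
}

// deliverSubTurnResult queues the result on the parent while it is still
// running; once the parent has finished, it appends to session history.
func deliverSubTurnResult(parent *turnState, s *session, result string) {
	parent.mu.Lock()
	if !parent.finished {
		parent.pendingResults = append(parent.pendingResults, result)
		parent.mu.Unlock()
		return
	}
	parent.mu.Unlock()

	s.mu.Lock()
	s.history = append(s.history, result)
	s.mu.Unlock()
}

// drainPending is what the parent would call before its next iteration.
func (ts *turnState) drainPending() []string {
	ts.mu.Lock()
	defer ts.mu.Unlock()
	out := ts.pendingResults
	ts.pendingResults = nil
	return out
}

// finish marks the parent turn as ended.
func (ts *turnState) finish() {
	ts.mu.Lock()
	ts.finished = true
	ts.mu.Unlock()
}

func main() {
	parent, sess := &turnState{}, &session{}

	deliverSubTurnResult(parent, sess, "result-1") // parent still running
	fmt.Println(parent.drainPending())             // [result-1]

	parent.finish()
	deliverSubTurnResult(parent, sess, "result-2") // parent already done
	fmt.Println(len(parent.drainPending()), sess.history) // 0 [result-2]
}
```

The parent lock is released before the session lock is taken, so the sketch never holds both at once.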
Relationship to existing code
| Current | After refactor |
| --- | --- |
| runAgentLoop() | Renamed to runTurn(), restructured with turnState + event emission |
| runLLMIteration() | Merged into runTurn() iteration loop |
| Tool execution (WaitGroup in runLLMIteration) | Extracted to tool_exec.go with smart grouping |
| SubagentTool → RunToolLoop | SubagentTool → spawnSubTurn(Sync) → runTurn |
| SpawnTool → RunToolLoop (async) | SpawnTool → spawnSubTurn(Async) → runTurn in goroutine |
| No observability | EventBus with 17 event types |
| No hooks | HookManager with 5 hook interfaces |
| No interrupts | InterruptGraceful / InterruptHard with session rollback |
| No steering | InjectSteering / InjectFollowUp with queue-based delivery |
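New files
pkg/agent/events.go
pkg/agent/eventbus.go
pkg/agent/hooks.go
pkg/agent/turn_state.go
pkg/agent/tool_exec.go
pkg/agent/subturn.go
Modified files
pkg/agent/loop.go: runTurn rewrite, public APIs
pkg/tools/base.go: ReadOnlyIndicator interface
pkg/tools/*.go: IsReadOnly() added to known read-only tools
pkg/agent/instance.go
pkg/tools/subagent.go: switched to spawnSubTurn
pkg/tools/spawn.go: switched to spawnSubTurn
Concurrency safety
Key invariants:
Lock order: turnState.mu → eventBus.mu (RLock) → sessionManager.mu, never reversed
Emit outside locks: EventBus.Emit is called after releasing turnState.mu to minimize lock hold time
Double-check on providerCancel: the loop sets providerCancel and then checks aborted; InterruptHard sets aborted and then calls providerCancel. Both steps run under ts.mu, so whichever side runs second always sees the other's write
FollowUp as return value: runTurn returns follow-ups instead of publishing them in defer, which eliminates the race between cleanup and new turn startup
Approval goroutine isolation: approval runs in a separate goroutine with a timeout, so it cannot block the loop even if a hook misbehaves
Session snapshot correction: ts.sessionSnapshot is updated after forceCompression() to prevent stale rollback targets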
Open questions
Should EventBus support filtered subscriptions (subscribe to specific EventKinds only), or should filtering be subscriber-side?
Should there be a built-in "debug hook" that logs all events to a file, enabled via config flag?
Should async sub-turn results that arrive after the parent turn ends trigger a user notification by default, or should this be opt-in?
Should ReadOnlyIndicator be extended to MCP tools (remote tools declaring read-only via MCP metadata)?