Skip to content

feat(agent): add LoopDetectHook and ReflectRetryHook for agent self-correction#3728

Open
MuataSr wants to merge 12 commits into
HKUDS:mainfrom
MuataSr:feat/agent-hooks-loop-detect-reflect-retry
Open

feat(agent): add LoopDetectHook and ReflectRetryHook for agent self-correction#3728
MuataSr wants to merge 12 commits into
HKUDS:mainfrom
MuataSr:feat/agent-hooks-loop-detect-reflect-retry

Conversation

@MuataSr
Copy link
Copy Markdown
Contributor

@MuataSr MuataSr commented May 10, 2026

Summary

Agents frequently get stuck in two failure modes:

  1. Tool-call loops — repeating identical tool calls without progress until hitting iteration limits
  2. Blind retries — failing with the same tool call and arguments, wasting iterations on the same error

This PR adds two lightweight hooks that plug into the existing AgentHook system to help agents self-correct before exhausting their iteration budget.

What Changed

AgentHook — Two new hook points

  • before_execute_tools(tools) — runs before tool batch execution
  • after_tool_call(name, args, result) — runs after each individual tool call

LoopDetectHook (new)

  • Tracks consecutive identical tool-call signatures (SHA256 fingerprint of name + args)
  • At warn_at (default 3): injects in-context warning nudging the model to change approach
  • At halt_at (default 5): injects stronger message explicitly telling the model to stop calling tools
  • Zero runner changes beyond the hook plumbing — works entirely through the hook system
  • Configurable thresholds for different tolerance levels

ReflectRetryHook (new)

  • Intercepts tool error results and enriches them with attempt history
  • Same call with same args = same dedup key (tool_name + sha256(args))
  • First error: adds structured hint with "Things to try differently"
  • Repeated same error: accumulates history showing the model what it already tried
  • After max_retries (default 2): hint pivots to "try a fundamentally different approach"
  • Pure transform — no re-execution, the model decides whether to retry

Testing

  • 32 tests, all passing — composite hook fan-out, error isolation, loop detection thresholds, retry accumulation, dedup, exhaustion
  • Stress tested with 70 parallel subagent tasks (JSON data analysis, math computations, file writes) — 100% success rate, zero loops
32 passed in 0.68s

Files Changed (7 files, +560 −5)

File Change
nanobot/agent/hook.py Added before_execute_tools and after_tool_call to AgentHook base class
nanobot/agent/loop_detect.py NewLoopDetectHook (142 lines)
nanobot/agent/reflect_retry.py NewReflectRetryHook (127 lines)
nanobot/agent/runner.py Hook plumbing for the two new hook points
nanobot/agent/loop.py Pass hooks through to runner
tests/agent/test_hook_composite.py 22 tests for composite hook system
tests/agent/test_reflect_retry.py 10 tests for ReflectRetryHook

Design Decisions

  • Hooks over core changes: These plug into the existing hook system rather than modifying the core loop. Users opt in by configuring hooks, not by changing framework behavior.
  • In-context nudges, not hard stops: Loop detection injects messages into the conversation. The model can still choose to continue — it just gets better information to decide.
  • Pure transforms: ReflectRetryHook doesn't re-execute anything. It enriches error messages so the model self-corrects on the next iteration.
  • Backward compatible: All new hook methods have no-op defaults. Existing hooks continue working unchanged.

Relates to

- Extend AgentHook with before_tool_call/after_tool_call lifecycle
- Add CompositeHook fan-out and pipeline for per-tool hooks
- Add LoopDetectHook to detect and break iteration loops
- Add ReflectRetryHook to enrich tool errors with attempt history
- Wire hooks into runner._run_tool execution path
- 22 new tests (all passing)
chengyongru and others added 11 commits May 11, 2026 08:54
…bility

- Append [Archived Context Summary] to system prompt instead of injecting
  it into the user message runtime context, improving KV cache reuse across
  turns and avoiding consecutive same-role messages.
- _last_summary persists in metadata (no pop) for restart survival;
  summary is re-injected every turn via the stable system prompt.
- Remove dynamic "Inactive for X minutes" from _format_summary — use
  static last_active timestamp instead to preserve KV cache stability.
- Pass session_summary through build_messages() so both normal and
  ask_user paths receive the archived summary in the system prompt.
- estimate_session_prompt_tokens now reads _last_summary from metadata
  to include the summary in token budget estimation.
- Remove obsolete session_summary parameter from
  maybe_consolidate_by_tokens and estimate_session_prompt_tokens
  call sites in loop.py (summary flows through build_messages instead).
- Ensure /new (session.clear()) clears _last_summary from metadata.
Six tests covering:
- AgentDefaults preserves 'nanobot' and the cat icon by default
- camelCase config keys (botName/botIcon) bind to the new fields
- Empty bot_icon is accepted (opt-out of the leading icon)
- ThinkingSpinner uses bot_name in its status text
- StreamRenderer header combines icon and name when icon is set
- StreamRenderer header is just the name when icon is empty
Two new fields with safe defaults that preserve current branding:
- bot_name: str = "nanobot"
- bot_icon: str = "🐈"

Empty string for bot_icon is allowed and lets users opt out of the
leading icon. camelCase keys (botName, botIcon) bind via the existing
to_camel alias generator.
Threads bot_name/bot_icon through ThinkingSpinner and StreamRenderer
with safe defaults that preserve current behavior.

- ThinkingSpinner uses bot_name in its status text
- StreamRenderer header is "<icon> <name>" when icon is set,
  or just "<name>" when icon is empty
- Removes the now-unused __logo__ import (the cat emoji is the
  default value of bot_icon, not a hardcoded constant)
…S#3650)

Both StreamRenderer instantiations in the agent command (single-message
mode and interactive mode) now read bot_name and bot_icon from
config.agents.defaults and forward them to the renderer.

This is the wiring step that makes the schema fields actually take
effect at runtime. With safe defaults of "nanobot" and "🐈", existing
users see no change.
…ng (HKUDS#3585)

The hosted Xiaomi MiMo API accepts {"thinking": {"type": "enabled"|"disabled"}}
to toggle reasoning, which is exactly the shape produced by the existing
thinking_type style. The xiaomi_mimo ProviderSpec just needed to opt in.

Before this fix, setting reasoning_effort="none" had no effect on MiMo
because no thinking_style was configured, so the disable signal never
reached the server. Default-on models (mimo-v2.5-pro and friends) kept
reasoning regardless of user configuration.

Source: https://platform.xiaomimimo.com/docs/en-US/api/chat/openai-api

Co-authored with Claude Opus 4.7. Strategy and review via Claude Desktop,
implementation via Claude Code.
…king mode

- Update AgentDefaults.reasoning_effort comment to document "none"
  (disable) and None (preserve provider default).
- Add configuration.md tip explaining MiMo thinking mode behavior.
`crypto.randomUUID` only exists in secure contexts (HTTPS or localhost).
Over LAN HTTP it is undefined, so `ChatPane`'s welcome-message flush and
streaming-message handlers crash mid-render with `TypeError`, unmounting
the React tree and leaving the user a blank page.

Install a Math.random-backed v4-ish fallback at app entry, gated on the
feature being missing. This mirrors the shim already used in the test
setup and covers all six call sites (`ChatPane.tsx`, `useNanobotStream.ts`)
without touching them. These IDs are client-side message keys with no
security role, so non-cryptographic randomness is fine.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Add a focused regression test for the non-secure-context WebUI entry shim so missing crypto.randomUUID no longer depends on manual verification.

Co-authored-by: Cursor <cursoragent@cursor.com>
…SDK facade

_extra_hooks is meant for user-injected hooks managed by the SDK layer.
Baking LoopDetectHook into it at init caused test_run_with_hooks to fail
because the SDK's prev/restore pattern couldn't clean it back to [].

LoopDetectHook is now an always-on default in the hook composition
(line 819), independent of the mutable _extra_hooks list.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants