An architectural pattern for building reliable, auditable AI agents by separating workflow control from execution
What you just saw:
- User asks "what do you recommend?" → Claude explains each option with reasoning
- User says "pick something for me" → Claude intelligently selects and explains why
- Questions follow a strict order (crust → category → toppings → size)
- Natural, helpful conversation, but protocol-enforced
The guide controls the workflow. Claude provides the intelligence.
What you just saw:
- Agent autonomously screens for emergency symptoms (protocol requirement)
- Checks medical history, classifies severity
- Makes escalation decision and saves audit trail
- All while the guide enforces clinical protocols
The guide ensures compliance. The agent does the work.
Autonomous AI agents are powerful but unreliable for critical workflows:
- ❌ Skip important steps
- ❌ Inconsistent behavior across runs
- ❌ Hard to audit or debug
- ❌ Can't guarantee protocol compliance
- ❌ Unsuitable for regulated domains
Current solutions don't solve this:
- Prompts: Too fragile ("please follow these steps...")
- Traditional workflows: LLM is passive, no agency
- Agent frameworks: Too unpredictable for critical systems
TL;DR: The Tool-as-Guide pattern is a workflow engine with inversion of control, acting as a protocol-driven supervisor for agentic systems.
When you're driving through a city:
- You're the intelligent, capable driver - You navigate traffic, make decisions, handle unexpected situations
- GPS provides turn-by-turn instructions - Tells you which route to follow, which turn to take next
- GPS doesn't drive for you - But it ensures you don't miss critical turns or get lost
Tool-as-Guide works the same way:
- Agent is the intelligent worker - Capable of reasoning, adapting, executing tasks
- Guide provides step-by-step instructions - What to do next, what protocol to follow
- Guide doesn't do the work - But it ensures critical steps aren't skipped and protocols are followed
The agent stays intelligent and autonomous. The guide ensures reliability and compliance.
Key insight: The workflow engine is a tool that the LLM actively queries.
sequenceDiagram
participant User
participant LLM as LLM/Agent
participant Guide as Workflow Guide (Tool)
User->>LLM: "I need help with X"
LLM->>Guide: guide.start_workflow()
Guide-->>LLM: {instruction: "Do step 1", required_data: [...]}
LLM->>User: Executes step 1
User->>LLM: Provides response
LLM->>Guide: guide.continue_workflow(session_id, response)
Guide-->>LLM: {instruction: "Do step 2", validation: "passed"}
LLM->>User: Executes step 2
Note over LLM,Guide: LLM drives execution<br/>Guide enforces protocol
The pattern in three points:
- LLM/Agent has autonomy - Drives conversation, executes tasks, makes decisions
- Guide provides instructions - What to do next, what data is needed, what to validate
- Protocol is enforced - Can't skip steps, consistent behavior, auditable trail
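The three points above can be sketched as a minimal agent loop. This is an illustrative sketch, not code from this repo: `Guide`, `agent_execute`, and the step names are all hypothetical stand-ins.

```python
# Hypothetical sketch: the agent drives the loop, the guide decides the steps.

def agent_execute(instruction: dict) -> str:
    """Stand-in for the LLM: turns the guide's instruction into an action."""
    return f"did: {instruction['instruction']}"

class Guide:
    """Toy guide that hands out one step at a time and logs an audit trail."""

    def __init__(self, steps):
        self.steps = list(steps)
        self.log = []   # audit trail: (step, result) pairs
        self.i = 0

    def start_workflow(self) -> dict:
        self.i = 0
        return {"instruction": self.steps[0], "done": False}

    def report(self, result: str) -> dict:
        # Record what the agent did, then reveal only the next step.
        self.log.append((self.steps[self.i], result))
        self.i += 1
        if self.i >= len(self.steps):
            return {"instruction": None, "done": True}
        return {"instruction": self.steps[self.i], "done": False}

guide = Guide(["ask crust", "ask category", "ask toppings", "ask size"])
step = guide.start_workflow()
while not step["done"]:
    result = agent_execute(step)   # agent autonomy lives here
    step = guide.report(result)    # guide enforces the protocol
```

Note the inversion of control: the loop belongs to the agent, but the guide alone decides what the next instruction is.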
Important: The LLM/Agent maintains its intelligence and flexibility:
- Interprets user intent - "I want something spicy" → translates to pepperoni/jalapeños
- Generates natural responses - Not just relaying template text
- Handles unexpected input - Clarifies, asks for details, adapts phrasing
- Executes complex tasks - Queries databases, calls APIs, processes data
- Reasons within bounds - Makes decisions at each step, guided by protocol
The Guide enforces the protocol. The LLM provides the intelligence.
It's like GPS: you're still the intelligent driver making decisions; GPS just ensures you don't miss critical turns.
Here's what the interaction looks like:
# Agent queries the guide
response = guide.start_workflow()
# → {"instruction": "Ask user for pizza crust preference",
# "options": ["thin", "regular", "thick"],
# "required": true}
# Agent uses intelligence to interact with user
agent.ask_user("What kind of crust would you like? We have thin, regular, or thick.")
# User replies: "Make it crispy!"
# Agent interprets intent ("crispy" → thin crust), sends to guide
response = guide.continue_workflow(session_id, "thin")
# → {"instruction": "Ask about toppings",
#    "category": "vegetarian", ...}
# Guide controls WHAT steps. Agent controls HOW they're executed.

| Aspect | Pizza Ordering | Medical Triage |
|---|---|---|
| Interface | Chat (Claude Desktop) | Autonomous Agent (Jupyter) |
| Use Case | Low stakes, conversational | High stakes, critical system |
| AI Role | Relays messages to user | Executes tasks autonomously |
| Guide Role | Returns prompts/questions | Returns tasks & protocol decisions |
| Tools Used | State machine only | State machine + DB + Classifier + Monitor |
| LLM | Claude (cloud) | Gemma (local, open) |
| Domain Knowledge | In guide (menu, options) | Separate classifier tool |
| Purpose | Show pattern basics | Show pattern for autonomous agents |
| Audit Trail | Session states | Full protocol compliance log |
Both demonstrate the same core pattern in different contexts, showing its versatility from simple chatbots to critical autonomous systems.
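For concreteness, the guide side of the pizza example might look like the sketch below: a plain state machine with per-session state and validation. All names here (`start_workflow`, `continue_workflow`, `STEPS`) are assumptions for illustration, not the repo's actual API.

```python
import uuid

# Hypothetical guide-side sketch for the pizza workflow. The step order,
# validation rules, and instructions live in plain, testable code.

STEPS = [
    {"field": "crust", "instruction": "Ask user for pizza crust preference",
     "options": ["thin", "regular", "thick"]},
    {"field": "category", "instruction": "Ask for pizza category",
     "options": ["classic", "vegetarian", "vegan"]},
    {"field": "toppings", "instruction": "Ask about toppings", "options": None},
]

_sessions = {}  # session_id -> {"step": index, "data": collected answers}

def start_workflow() -> dict:
    session_id = str(uuid.uuid4())
    _sessions[session_id] = {"step": 0, "data": {}}
    first = STEPS[0]
    return {"session_id": session_id, "instruction": first["instruction"],
            "options": first["options"], "required": True}

def continue_workflow(session_id: str, answer: str) -> dict:
    state = _sessions[session_id]
    step = STEPS[state["step"]]
    if step["options"] and answer not in step["options"]:
        # Invalid answer: re-issue the same instruction instead of advancing.
        return {"instruction": step["instruction"],
                "options": step["options"], "validation": "failed"}
    state["data"][step["field"]] = answer
    state["step"] += 1
    if state["step"] >= len(STEPS):
        return {"instruction": None, "validation": "passed",
                "complete": True, "data": state["data"]}
    nxt = STEPS[state["step"]]
    return {"instruction": nxt["instruction"], "options": nxt["options"],
            "validation": "passed"}
```

Because each session only ever receives the current step, the agent can't reorder or skip questions, no matter how the conversation wanders.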
graph TB
subgraph "Traditional Workflow Engine"
WE[Workflow Engine<br/>Controls Everything] -->|executes| L1[LLM<br/>Passive Task]
L1 -->|returns| WE
end
subgraph "Agent Framework"
A[LLM Agent<br/>Controls Everything] -->|calls| T[Tools<br/>Passive Utilities]
T -->|returns| A
end
subgraph "Tool-as-Guide (Novel)"
L2[LLM/Agent<br/>Active Driver] -->|queries| G[Guide Tool<br/>Workflow Engine]
G -->|returns instructions| L2
L2 -->|executes & reports| G
end
style L2 fill:#90EE90
style G fill:#87CEEB
Tool-as-Guide combines the best of both:
- ✅ Agent autonomy (LLM drives interaction)
- ✅ Workflow reliability (Guide enforces protocol)
Choose an example to explore:
Chat interface demo with Claude Desktop or Cursor. Perfect for understanding the basics.
Autonomous agent demo with Jupyter and local LLM. Shows the pattern for critical systems.
⚠️ PREVIEW: A tool that implements this pattern to help Cursor IDE generate workflow guides. Stay tuned!
Each example includes complete setup instructions and usage guide.
- **Separation of Concerns**
  - Guide: WHAT to do (workflow logic, validation, protocol)
  - Agent: HOW to do it (execution, reasoning, tool calls)
- **Inversion of Control**
  - Agent queries the guide (not controlled by it)
  - Guide returns instructions (not commands)
- **Stateful Sessions**
  - Each workflow has a unique session
  - State persists across interactions
  - Audit trail is automatic
- **Deterministic Protocols**
  - Workflow logic lives in code
  - Same input → same workflow
  - Testable, verifiable, compliant
- **Progressive Disclosure**
  - Agent receives only the current instruction
  - Next steps revealed when needed
  - Reduces context window, improves focus
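Because the protocol lives in ordinary code, "same input → same workflow" is something you can assert in a unit test. A toy illustration (the `TriageGuide` class and its step names are hypothetical):

```python
# Hypothetical, minimal protocol: a fixed step order plus an audit trail.

class TriageGuide:
    ORDER = ["screen_emergency", "check_history",
             "classify_severity", "decide_escalation"]

    def __init__(self):
        self.audit = []

    def run(self, responses):
        """Replay a fixed sequence of agent responses through the protocol."""
        for step, response in zip(self.ORDER, responses):
            self.audit.append({"step": step, "response": response})
        return self.audit

responses = ["no red flags", "no allergies", "mild", "no escalation"]
trail_a = TriageGuide().run(responses)
trail_b = TriageGuide().run(responses)

assert trail_a == trail_b                                  # deterministic
assert [e["step"] for e in trail_a] == TriageGuide.ORDER   # no steps skipped
```

The same replay technique works against a real guide: feed it a recorded transcript and assert the audit trail matches the protocol, step for step.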
This pattern is particularly powerful for:
- 🏥 Healthcare - Enforce clinical protocols, emergency escalation
- 💰 Finance - Regulatory compliance, risk checks
- ⚖️ Legal - Document review checklists, thoroughness requirements
- 🚀 Deployments - Quality gates, approval workflows
- 🛡️ Security - Incident response protocols
- 📞 Customer Support - Troubleshooting procedures
In short, use it wherever:
- Protocol compliance is required
- Steps can't be skipped
- Behavior must be auditable
- Consistency matters more than flexibility
Beyond reliability, this pattern also improves performance and reduces costs:
- Smaller context windows: Guide provides only the current instruction, not the entire workflow tree
- Progressive disclosure: Agent loads only what it needs at each step
- Faster execution: Deterministic protocol logic runs in the guide (not LLM inference)
- Lower token costs: Similar to Anthropic's code execution research, processing logic outside the model reduces token consumption by 90%+
The guide acts as an efficient coordinator: the LLM only sees "what to do next," not "all possible paths."
Most LLM applications today rely on carefully crafted prompts ("think step by step...", "call tools as needed"). This approach is:
- Brittle: Minor prompt changes, model updates, or LLM randomness can produce different results
- Opaque: No clear audit trail; workflow logic is implicit in prompts and model reasoning
- Unpredictable: Can't guarantee every step will be followed in regulated processes
Conventional workflow engines define steps externally but lack LLM adaptability:
- Rigid: The engine holds all logic; LLMs just fill in text generation
- Limited Intelligence: Little opportunity for adaptive reasoning
- Separated: Workflow and AI capabilities don't integrate well
Some architectures use supervising agents to check other agents' work. While this improves reliability:
- Still LLM-driven: Error correction is probabilistic, supervisors can miss violations
- Weakly auditable: Better than single-agent prompts, but no external testable flow
- Complex: Multiple LLMs increase cost and latency
The Tool-as-Guide pattern combines the strengths of both:
- Deterministic Protocol: Workflow state machine decides every step; the LLM executes but doesn't decide
- Context-Rich Guidance: Tool provides detailed instructions, templates, and context for each step
- True Auditability: Code-based workflow with explicit state transitions
- Debuggable: Failures isolated to specific logic/data steps, not buried in prompts
- LLM Intelligence: Agent can still reason and adapt within the bounds of each step
In essence: The workflow is deterministic and auditable (like traditional engines), but the execution is intelligent and adaptive (like LLM agents).
Feedback is appreciated! Open a PR or issue to discuss your ideas.
MIT License

