QwenLM
diff --git a/docs/design/fork-subagent/fork-subagent-design.md b/docs/design/fork-subagent/fork-subagent-design.md
Lines changed: 112 additions & 0 deletions
diff --git a/docs/users/features/sub-agents.md b/docs/users/features/sub-agents.md
Lines changed: 37 additions & 1 deletion
diff --git a/packages/core/src/agents/runtime/agent-core.ts b/packages/core/src/agents/runtime/agent-core.ts
Lines changed: 55 additions & 18 deletions
diff --git a/packages/core/src/agents/runtime/agent-headless.ts b/packages/core/src/agents/runtime/agent-headless.ts
Lines changed: 15 additions & 2 deletions
@@ -0,0 +1,112 @@
+# Fork Subagent Design
+
+> Implicit fork subagent that inherits the parent's full conversation context and shares prompt cache for cost-efficient parallel task execution.
+
+## Overview
+
+When the Agent tool is called without `subagent_type`, it triggers an implicit **fork** — a background subagent that inherits the parent's conversation history, system prompt, and tool definitions. The fork uses `CacheSafeParams` to ensure its API requests share the same prefix as the parent's, enabling DashScope prompt cache hits.
+
+## Architecture
+
+```
+Parent conversation: [SystemPrompt | Tools | Msg1 | Msg2 | ... | MsgN (model)]
+                              ↑ identical prefix for all forks ↑
+
+Fork A: [...MsgN | placeholder results | "Research A"]  ← shared cache
+Fork B: [...MsgN | placeholder results | "Modify B"]    ← shared cache
+Fork C: [...MsgN | placeholder results | "Test C"]      ← shared cache
+```
+
+## Key Components
+
+### 1. FORK_AGENT (`forkSubagent.ts`)
+
+A synthetic agent config that is not registered in `builtInAgents`. It has a fallback `systemPrompt`, but in practice the fork uses the parent's rendered system prompt via `generationConfigOverride`.
+
+### 2. CacheSafeParams Integration (`agent.ts` + `forkedQuery.ts`)
+
+```
+agent.ts (fork path)
+  │
+  ├── getCacheSafeParams()          ← parent's generationConfig snapshot
+  │     ├── generationConfig        ← systemInstruction + tools + temp/topP
+  │     └── history                 ← (not used — we build extraHistory instead)
+  │
+  ├── forkGenerationConfig          ← passed as generationConfigOverride
+  └── forkToolsOverride             ← FunctionDeclaration[] extracted from tools
+        │
+        ▼
+  AgentHeadless.execute(context, signal, {
+    extraHistory,                   ← parent conversation history
+    generationConfigOverride,       ← parent's exact systemInstruction + tools
+    toolsOverride,                  ← parent's exact tool declarations
+  })
+        │
+        ▼
+  AgentCore.createChat(context, {
+    extraHistory,                   ← already contains the env context
+    generationConfigOverride,       ← bypasses buildChatSystemPrompt()
+    skipEnvHistory: true,           ← skips getInitialChatHistory()
+  })
+        │
+        ▼
+  new GeminiChat(config, generationConfig, startHistory)
+                          ↑ byte-identical to parent's config
+```
+
+### 3. History Construction (`agent.ts` + `forkSubagent.ts`)
+
+The fork's `extraHistory` must end with a model message so that the Gemini API's user/model alternation is preserved when `agent-headless` sends the `task_prompt` as the next user message.
+
+Three cases:
+
+| Parent history ends with      | extraHistory construction                                              | task_prompt                    |
+| ----------------------------- | ---------------------------------------------------------------------- | ------------------------------ |
+| `model` (no function calls)   | `[...rawHistory]` (unchanged)                                          | `buildChildMessage(directive)` |
+| `model` (with function calls) | `[...rawHistory, model(clone), user(responses+directive), model(ack)]` | `'Begin.'`                     |
+| `user` (unusual)              | `rawHistory.slice(0, -1)` (drop trailing user)                         | `buildChildMessage(directive)` |
+
+### 4. Recursive Fork Prevention (`forkSubagent.ts`)
+
+`isInForkChild()` scans conversation history for the `<fork-boilerplate>` tag. If found, the fork attempt is rejected with an error message.
+
+### 5. Background Execution (`agent.ts`)
+
+Fork uses `void executeSubagent()` (fire-and-forget) and returns `FORK_PLACEHOLDER_RESULT` immediately to the parent. Errors in the background task are caught, logged, and reflected in the display state.
+
+## Data Flow
+
+```
+1. Model calls Agent tool (no subagent_type)
+2. agent.ts: import forkSubagent.js
+3. agent.ts: getCacheSafeParams() → forkGenerationConfig + forkToolsOverride
+4. agent.ts: build extraHistory from parent's getHistory(true)
+5. agent.ts: build forkTaskPrompt (directive or 'Begin.')
+6. agent.ts: createAgentHeadless(FORK_AGENT, ...)
+7. agent.ts: void executeSubagent() — background
+8. agent.ts: return FORK_PLACEHOLDER_RESULT to parent immediately
+9. Background:
+   a. AgentHeadless.execute(context, signal, {extraHistory, generationConfigOverride, toolsOverride, skipEnvHistory})
+   b. AgentCore.createChat() — uses parent's generationConfig (cache-shared)
+   c. runReasoningLoop() — uses parent's tool declarations
+   d. Fork executes tools, produces result
+   e. updateDisplay() with final status
+```
+
+## Graceful Degradation
+
+If `getCacheSafeParams()` returns null (first turn, no history yet), the fork falls back to:
+
+- `FORK_AGENT.systemPrompt` for system instruction
+- `prepareTools()` for tool declarations
+
+This ensures the fork always works, even without cache sharing.
+
+## Files
+
+| File                                                 | Role                                                                                  |
+| ---------------------------------------------------- | ------------------------------------------------------------------------------------- |
+| `packages/core/src/agents/runtime/forkSubagent.ts`   | FORK_AGENT config, buildForkedMessages(), isInForkChild(), buildChildMessage()        |
+| `packages/core/src/tools/agent.ts`                   | Fork path: CacheSafeParams retrieval, extraHistory construction, background execution |
+| `packages/core/src/agents/runtime/agent-headless.ts` | execute() options: generationConfigOverride, toolsOverride, skipEnvHistory            |
+| `packages/core/src/agents/runtime/agent-core.ts`     | CreateChatOptions: generationConfigOverride, skipEnvHistory                           |
+| `packages/core/src/followup/forkedQuery.ts`          | CacheSafeParams infrastructure (existing, no changes)                                 |
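
The three cases in the History Construction table can be told apart by looking only at the last message of the parent history. The following is a minimal sketch of that classification, not the PR's implementation: `classifyHistoryTail` and `simpleExtraHistory` are hypothetical helpers, and `Part` and `Content` are simplified local stand-ins for the `@google/genai` types. Only the two rows that leave the history unchanged or drop a trailing user message are built here. The function-call row is left out because it also needs the placeholder responses and acknowledgement message described in the table.

```typescript
// Simplified stand-ins for the @google/genai types (assumption: only the
// fields this sketch needs are modeled).
interface Part {
  text?: string;
  functionCall?: { name: string; args?: Record<string, unknown> };
}
interface Content {
  role: 'user' | 'model';
  parts: Part[];
}

type TailCase = 'model-text' | 'model-function-calls' | 'user' | 'empty';

// Hypothetical helper: report how the parent history ends.
function classifyHistoryTail(history: Content[]): TailCase {
  const last = history[history.length - 1];
  if (!last) return 'empty';
  if (last.role === 'user') return 'user';
  const hasCalls = last.parts.some((part) => part.functionCall !== undefined);
  return hasCalls ? 'model-function-calls' : 'model-text';
}

// Hypothetical helper: build extraHistory for the two simple table rows.
// Returns undefined when the tail needs the function-call handling instead.
function simpleExtraHistory(history: Content[]): Content[] | undefined {
  switch (classifyHistoryTail(history)) {
    case 'model-text':
      return [...history]; // unchanged copy, already ends with a model message
    case 'user':
      return history.slice(0, -1); // drop the trailing user message
    default:
      return undefined;
  }
}
```

In both simple cases the result ends with a model message (or is empty), so the directive can follow as the next user turn.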
@@ -12,18 +12,54 @@ Subagents are independent AI assistants that:
 - **Work autonomously** - Once given a task, they work independently until completion or failure
 - **Provide detailed feedback** - You can see their progress, tool usage, and execution statistics in real-time
 
+## Fork Subagent (Implicit Fork)
+
+In addition to named subagents, Qwen Code supports **implicit forking** — when the AI omits the `subagent_type` parameter, it triggers a fork that inherits the parent's full conversation context.
+
+### How Fork Differs from Named Subagents
+
+|               | Named Subagent                    | Fork Subagent                                         |
+| ------------- | --------------------------------- | ----------------------------------------------------- |
+| Context       | Starts fresh, no parent history   | Inherits parent's full conversation history           |
+| System prompt | Uses its own configured prompt    | Uses parent's exact system prompt (for cache sharing) |
+| Execution     | Blocks the parent until done      | Runs in background, parent continues immediately      |
+| Use case      | Specialized tasks (testing, docs) | Parallel tasks that need the current context          |
+
+### When Fork is Used
+
+The AI automatically uses fork when it needs to:
+
+- Run multiple research tasks in parallel (e.g., "investigate module A, B, and C")
+- Perform background work while continuing the main conversation
+- Delegate tasks that require understanding of the current conversation context
+
+### Prompt Cache Sharing
+
+All forks share the parent's exact API request prefix (system prompt, tools, conversation history), enabling DashScope prompt cache hits. When 3 forks run in parallel, the shared prefix is cached once and reused, saving 80%+ of token costs compared to independent subagents.
+
+### Recursive Fork Prevention
+
+Fork children cannot create further forks. This is enforced at runtime — if a fork attempts to spawn another fork, it receives an error instructing it to execute tasks directly.
+
+### Current Limitations
+
+- **No result feedback**: Fork results are reflected in the UI progress display but are not automatically fed back into the main conversation. The parent AI sees a placeholder message and cannot act on the fork's output.
+- **No worktree isolation**: Forks share the parent's working directory. Concurrent file modifications from multiple forks may conflict.
+
 ## Key Benefits
 
 - **Task Specialization**: Create agents optimized for specific workflows (testing, documentation, refactoring, etc.)
 - **Context Isolation**: Keep specialized work separate from your main conversation
+- **Context Inheritance**: Fork subagents inherit the full conversation for context-heavy parallel tasks
+- **Prompt Cache Sharing**: Fork subagents share the parent's cache prefix, reducing token costs
 - **Reusability**: Save and reuse agent configurations across projects and sessions
 - **Controlled Access**: Limit which tools each agent can use for security and focus
 - **Progress Visibility**: Monitor agent execution with real-time progress updates
 
 ## How Subagents Work
 
 1. **Configuration**: You create Subagents configurations that define their behavior, tools, and system prompts
-2. **Delegation**: The main AI can automatically delegate tasks to appropriate Subagents
+2. **Delegation**: The main AI can automatically delegate tasks to appropriate Subagents — or implicitly fork when no specific subagent type is needed
 3. **Execution**: Subagents work independently, using their configured tools to complete tasks
 4. **Results**: They return results and execution summaries back to the main conversation
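
The runtime check described under Recursive Fork Prevention can be pictured as a scan of the conversation history for the `<fork-boilerplate>` marker named in the design doc. The sketch below is an illustration under assumptions, not the code in `forkSubagent.ts`: `historyContainsForkMarker` is a hypothetical name, the `Part` and `Content` types are simplified local stand-ins, and scanning every message regardless of role is a guess.

```typescript
// Simplified stand-ins for the @google/genai types (assumption).
interface Part {
  text?: string;
}
interface Content {
  role: 'user' | 'model';
  parts?: Part[];
}

// Marker tag named in the design doc; how it is embedded in a message is assumed.
const FORK_BOILERPLATE_TAG = '<fork-boilerplate>';

// Hypothetical helper: true when any text part in the history carries the marker,
// which would mean the current conversation is already running inside a fork.
function historyContainsForkMarker(history: Content[]): boolean {
  return history.some((message) =>
    (message.parts ?? []).some(
      (part) => part.text?.includes(FORK_BOILERPLATE_TAG) ?? false,
    ),
  );
}
```

A caller would run a check like this before starting a fork and return an error to the model instead of spawning when it yields `true`.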
 
 
@@ -58,14 +58,25 @@ import type {
 import { type AgentEventEmitter, AgentEventType } from './agent-events.js';
 import { AgentStatistics, type AgentStatsSummary } from './agent-statistics.js';
 import { matchesMcpPattern } from '../../permissions/rule-parser.js';
-import { AgentTool } from '../../tools/agent.js';
 import { ToolNames } from '../../tools/tool-names.js';
 import { DEFAULT_QWEN_MODEL } from '../../config/models.js';
 import { type ContextState, templateString } from './agent-headless.js';
 
+/**
+ * Tools that must never be available to subagents (including forked agents).
+ * - Excluding the Agent tool prevents recursive subagent spawning.
+ * - Cron tools are session-scoped and should only run from the main session.
+ */
+export const EXCLUDED_TOOLS_FOR_SUBAGENTS: ReadonlySet<string> = new Set([
+  ToolNames.AGENT,
+  ToolNames.CRON_CREATE,
+  ToolNames.CRON_LIST,
+  ToolNames.CRON_DELETE,
+]);
+
 /**
  * Result of a single reasoning loop invocation.
  */
 export interface ReasoningLoopResult {
   /** The final model text response (empty if terminated by abort/limits). */
   text: string;
@@ -102,6 +113,26 @@ export interface CreateChatOptions {
    * conversational context (e.g., from the main session that spawned it).
    */
   extraHistory?: Content[];
+  /**
+   * When provided, replaces the auto-built generationConfig
+   * (systemInstruction, temperature, etc.) with this exact config.
+   * Used by fork subagents to share the parent conversation's cache
+   * prefix for DashScope prompt caching.
+   */
+  generationConfigOverride?: GenerateContentConfig & {
+    systemInstruction?: string | Content;
+  };
+  /**
+   * When true, skip injecting the env bootstrap messages from
+   * `getInitialChatHistory()`. Set by fork subagents because their
+   * `extraHistory` is the full parent history that already contains
+   * those env messages — re-injecting would duplicate them.
+   *
+   * Other callers (e.g. arena interactive agents) pass an
+   * env-stripped history and DO need fresh env init for their own
+   * working directory, so they must leave this unset.
+   */
+  skipEnvHistory?: boolean;
 }
 
 /**
@@ -223,30 +254,43 @@ export class AgentCore {
       );
     }
 
-    const envHistory = await getInitialChatHistory(this.runtimeContext);
+    // Skip env bootstrap when the caller (fork) explicitly says its
+    // extraHistory already contains those messages. Other callers that
+    // provide an env-stripped history (e.g. arena) still get fresh env init.
+    const envHistory = options?.skipEnvHistory
+      ? []
+      : await getInitialChatHistory(this.runtimeContext);
 
     const startHistory = [
       ...envHistory,
       ...(options?.extraHistory ?? []),
       ...(this.promptConfig.initialMessages ?? []),
     ];
 
-    const systemInstruction = this.promptConfig.systemPrompt
-      ? this.buildChatSystemPrompt(context, options)
-      : undefined;
+    // If an override is provided (fork path), use it directly for cache
+    // sharing. Otherwise, build the config from this agent's promptConfig.
+    // Note: buildChatSystemPrompt is called OUTSIDE the try/catch so template
+    // errors propagate to the caller (not swallowed by reportError).
+    let generationConfig: GenerateContentConfig & {
+      systemInstruction?: string | Content;
+    };
 
-    try {
-      const generationConfig: GenerateContentConfig & {
-        systemInstruction?: string | Content;
-      } = {
+    if (options?.generationConfigOverride) {
+      generationConfig = options.generationConfigOverride;
+    } else {
+      const systemInstruction = this.promptConfig.systemPrompt
+        ? this.buildChatSystemPrompt(context, options)
+        : undefined;
+      generationConfig = {
         temperature: this.modelConfig.temp,
         topP: this.modelConfig.top_p,
       };
-
       if (systemInstruction) {
         generationConfig.systemInstruction = systemInstruction;
       }
+    }
 
+    try {
       return new GeminiChat(
         this.runtimeContext,
         generationConfig,
@@ -275,14 +319,7 @@ export class AgentCore {
     const toolRegistry = this.runtimeContext.getToolRegistry();
     const toolsList: FunctionDeclaration[] = [];
 
-    // Tools excluded from subagents: AgentTool (prevent recursion) and
-    // cron tools (session-scoped, should only be used by the main session).
-    const excludedFromSubagents = new Set<string>([
-      AgentTool.Name,
-      ToolNames.CRON_CREATE,
-      ToolNames.CRON_LIST,
-      ToolNames.CRON_DELETE,
-    ]);
+    const excludedFromSubagents = EXCLUDED_TOOLS_FOR_SUBAGENTS;
 
     if (this.toolConfig) {
       const asStrings = this.toolConfig.tools.filter(
 
@@ -192,8 +192,21 @@ export class AgentHeadless {
   async execute(
     context: ContextState,
     externalSignal?: AbortSignal,
+    options?: {
+      extraHistory?: Array<import('@google/genai').Content>;
+      /** Override generationConfig for cache sharing (fork subagent). */
+      generationConfigOverride?: import('@google/genai').GenerateContentConfig;
+      /** Override tool declarations for cache sharing (fork subagent). */
+      toolsOverride?: Array<import('@google/genai').FunctionDeclaration>;
+      /** Skip env bootstrap injection (fork already inherits parent env). */
+      skipEnvHistory?: boolean;
+    },
   ): Promise<void> {
-    const chat = await this.core.createChat(context);
+    const chat = await this.core.createChat(context, {
+      extraHistory: options?.extraHistory,
+      generationConfigOverride: options?.generationConfigOverride,
+      skipEnvHistory: options?.skipEnvHistory,
+    });
 
     if (!chat) {
       this.terminateMode = AgentTerminateMode.ERROR;
@@ -212,7 +225,7 @@ export class AgentHeadless {
       abortController.abort();
     }
 
-    const toolsList = this.core.prepareTools();
+    const toolsList = options?.toolsOverride ?? this.core.prepareTools();
 
     const initialTaskText = String(
       (context.get('task_prompt') as string) ?? 'Get Started!',
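
The override handling in `createChat()` and `execute()` follows one pattern: use the caller-supplied value when present, otherwise build the default. A minimal standalone sketch of that pattern is shown below. `resolveForkInputs`, `ForkOverrides`, and the simplified `GenerationConfig` and `FunctionDeclaration` types are hypothetical names introduced for illustration. The real code uses the `@google/genai` types and the `AgentCore` methods shown in the diff.

```typescript
// Simplified stand-ins for the @google/genai types (assumption).
interface FunctionDeclaration {
  name: string;
  description?: string;
}
interface GenerationConfig {
  systemInstruction?: string;
  temperature?: number;
  topP?: number;
}

// Hypothetical shape mirroring the two override options added in this diff.
interface ForkOverrides {
  generationConfigOverride?: GenerationConfig;
  toolsOverride?: FunctionDeclaration[];
}

// Hypothetical helper: prefer the overrides and only call the builders as a fallback.
function resolveForkInputs(
  overrides: ForkOverrides | undefined,
  buildDefaultConfig: () => GenerationConfig,
  prepareTools: () => FunctionDeclaration[],
): { generationConfig: GenerationConfig; tools: FunctionDeclaration[] } {
  return {
    // The override is passed through by reference, never rebuilt, so the
    // request built from it matches the parent's configuration.
    generationConfig:
      overrides?.generationConfigOverride ?? buildDefaultConfig(),
    tools: overrides?.toolsOverride ?? prepareTools(),
  };
}
```

Because `??` short-circuits, the default builders never run on the fork path, which is also why a template error in the default system prompt cannot affect a fork that supplies its own config.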