[P1] Startup Optimization #3011

> **Priority**: P1  
> **Labels**: area:performance  
> **Reference**: [View Document](https://github.com/wenshao/codeagents/blob/main/docs/comparison/startup-optimization-deep-dive.md)

## Problem

A comparison with Claude Code's startup optimization strategies identified several performance bottlenecks in qwen-code startup:

1. **Core barrel export loads everything**: `@qwen-code/qwen-code-core` re-exports 50+ modules via `export *`, which forces evaluation of the entire dependency graph at startup.
2. **Fully serial startup path**: `main()` in `gemini.tsx` runs fully sequentially, with no parallelism.
3. **No layered entry point**: Every CLI invocation loads the full core, even `--version`.
4. **Monolithic esbuild bundle**: There is no code splitting, and the single large bundle defeats V8 lazy parsing.
5. **No API preconnect**: The first API call pays the full TCP+TLS handshake (100-200ms).
6. **Heavy dependencies not deferred**: OpenTelemetry, highlight.js, React/Ink, etc. are loaded eagerly.
7. **No input capture during startup**: Keystrokes typed during REPL initialization (200-500ms) are dropped.
8. **No conditional-compilation DCE**: Internal-only features are not removed by dead code elimination.
9. **Startup metrics not wired up**: `recordStartupPerformance()` exists but is never called from `main()`.


## Design

### Optimization Strategy Pyramid

```
           ┌────────────────────────────┐
           │  Zero-load fast path       │  --version with zero imports
           ├────────────────────────────┤
           │  Lazy tool loading         │  Tool factory pattern
           ├────────────────────────────┤
           │  Fire-and-forget prefetch  │  void someAsyncFn()
           ├────────────────────────────┤
           │  API preconnect            │  TCP+TLS preconnect
           ├────────────────────────────┤
           │  Lazy dependency loading   │  dynamic import()
           ├────────────────────────────┤
           │  Code splitting            │  esbuild splitting
           ├────────────────────────────┤
           │  UX optimization           │  Early input capture
           └────────────────────────────┘
```
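The fire-and-forget layer (`void someAsyncFn()`) starts work without blocking the critical path. A bare `void promise` still leaves a rejection unhandled, and since Node.js 15 an unhandled rejection terminates the process by default. A minimal sketch of a guard, assuming a hypothetical `fireAndForget` helper (not an existing qwen-code function), attaches a catch that logs instead:

```typescript
type Logger = (message: string) => void;

// Hypothetical helper: start an async task without awaiting it, while making
// sure a failure is logged instead of surfacing as an unhandled rejection.
function fireAndForget(
  label: string,
  task: () => Promise<unknown>,
  log: Logger = (message) => console.debug(message),
): void {
  let promise: Promise<unknown>;
  try {
    // The task starts immediately; the caller does not wait for it.
    promise = task();
  } catch (err) {
    // Treat a synchronous throw like a rejection.
    promise = Promise.reject(err);
  }
  void promise.catch((err: unknown) => {
    const reason = err instanceof Error ? err.message : String(err);
    log(`[prefetch:${label}] failed: ${reason}`);
  });
}
```

For example, `fireAndForget('telemetry', () => initTelemetry())` would keep a telemetry failure from crashing startup; `initTelemetry` is the name used in the target startup flow, and its signature is assumed here.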

### Target Startup Flow

```
User runs qwen
│
▼ cli.ts (bootstrap entry)                ← NEW: lightweight bootstrap entry
│  --version? → zero imports, exit immediately
│  --help?    → load only yargs help
│  mcp?       → load only the MCP handler
│  default path ↓
│
│  startCapturingEarlyInput()             ← NEW: capture user input early
│
▼ await import('./gemini.js')             ← dynamically load the main module
│  [during module evaluation]:
│    startKeychainPrefetch()              ← NEW: parallel subprocess prefetch
│    startGitDetection()                  ← NEW: parallel subprocess prefetch
│
▼ main()
│  [critical path, awaited]:
│    await Promise.all([settings, args])  ← NEW: load in parallel
│    await loadCliConfig()
│    await performInitialAuth()
│
│  [fire-and-forget prefetch, not awaited]:
│    void preconnectApi()                 ← NEW: TCP+TLS preconnect
│    void initTelemetry()                 ← NEW: lazy-load OTel
│    void extensionManager.refreshCache()
│    void connectMcpServers()
│    void checkForUpdates()
│
▼ startInteractiveUI()                    ← user sees the REPL
│  startDeferredPrefetches()              ← deferred prefetches after render
│  drainEarlyInput()                      ← replay input typed during startup
```
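A minimal sketch of the fast-path routing in `cli.ts`, assuming heavy modules are reached only through injected loader functions (in a real entry point, `loadMain` would be something like `() => import('./gemini.js')`). `routeArgs`, `bootstrap`, and the `Deps` shape are hypothetical names, and the argument checks are deliberately simpler than yargs parsing:

```typescript
type Route = 'version' | 'help' | 'mcp' | 'main';

// Decide the cheapest path from raw argv, before importing anything heavy.
function routeArgs(argv: readonly string[]): Route {
  if (argv.includes('--version')) return 'version';
  if (argv.includes('--help')) return 'help';
  if (argv[0] === 'mcp') return 'mcp';
  return 'main';
}

// Hypothetical dependency bundle; each loader stands in for a dynamic import().
interface Deps {
  version: string; // assumed to be a constant baked in at build time
  write: (text: string) => void;
  loadHelp: () => Promise<{ showHelp(): void }>;
  loadMcp: () => Promise<{ runMcp(args: readonly string[]): Promise<void> }>;
  loadMain: () => Promise<{ main(): Promise<void> }>;
}

async function bootstrap(argv: readonly string[], deps: Deps): Promise<Route> {
  const route = routeArgs(argv);
  switch (route) {
    case 'version':
      // Zero imports: nothing beyond this file is loaded.
      deps.write(`${deps.version}\n`);
      break;
    case 'help':
      (await deps.loadHelp()).showHelp();
      break;
    case 'mcp':
      await (await deps.loadMcp()).runMcp(argv.slice(1));
      break;
    case 'main':
      // Default path: early input capture would start here, then the full core loads.
      await (await deps.loadMain()).main();
      break;
  }
  return route;
}
```

Injecting the loaders keeps the routing testable and makes it explicit that only the `main` route pays for the full core.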

### Sub-issues Breakdown

| # | Issue | Priority | Deps | Est. Impact |
|---|-------|----------|------|-------------|
| #3219 | Add startup performance profiler | P1 | None | Foundation for all optimization |
| #3220 | Bootstrap entry point with fast paths | P0 | #3219 | `--version` < 50ms |
| #3221 | Lazy tool registration | P0 | #3219 | 30%+ reduction in init phase |
| #3222 | Fire-and-forget prefetch | P1 | #3219, #3221 | REPL renders before MCP/telemetry |
| #3223 | API TCP+TLS preconnect | P1 | #3219 | 100ms+ saved on first API call |
| #3224 | Early input capture during REPL init | P2 | None | No keystroke loss |
| #3225 | Lazy load heavy optional dependencies | P2 | #3221 | Reduced initial bundle parse |
| #3226 | esbuild code splitting | P2 | #3225 | V8 lazy parsing effective |
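Since #3219 is the foundation for every other sub-issue, here is a minimal sketch of a checkpoint profiler with the 1% sampling listed as the target in the comparison table. `StartupProfiler`, its options, and the checkpoint names are hypothetical; how its output would feed the existing `recordStartupPerformance()` is left open because that function's signature is not shown here:

```typescript
import { performance } from 'node:perf_hooks';

interface Checkpoint { name: string; atMs: number }

// Hypothetical checkpoint profiler: records named timestamps on sampled runs only.
class StartupProfiler {
  private readonly checkpoints: Checkpoint[] = [];
  readonly enabled: boolean;

  constructor(opts: { forceEnable?: boolean; sampleRate?: number; random?: () => number } = {}) {
    const rate = opts.sampleRate ?? 0.01; // 1% of runs by default
    const random = opts.random ?? Math.random;
    // Sampling is decided once, so unsampled runs pay only a boolean check.
    this.enabled = opts.forceEnable === true || random() < rate;
  }

  checkpoint(name: string, now: number = performance.now()): void {
    if (!this.enabled) return;
    this.checkpoints.push({ name, atMs: now });
  }

  /** Duration of each phase, i.e. the time between consecutive checkpoints. */
  phases(): Array<{ phase: string; ms: number }> {
    const result: Array<{ phase: string; ms: number }> = [];
    for (let i = 1; i < this.checkpoints.length; i++) {
      const prev = this.checkpoints[i - 1];
      const cur = this.checkpoints[i];
      result.push({ phase: `${prev.name} -> ${cur.name}`, ms: cur.atMs - prev.atMs });
    }
    return result;
  }
}
```

Placing checkpoints at `cli.ts` entry, after `import('./gemini.js')`, and after the REPL renders would give per-phase durations to compare before and after each sub-issue lands.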

### Dependency Graph

```
#3219 (Profiler) ──┬──→ #3220 (Bootstrap entry)   [P0, parallelizable]
                   ├──→ #3221 (Lazy tools)        [P0, parallelizable]
                   │       └──→ #3225 (Lazy deps) ──→ #3226 (Code splitting)
                   ├──→ #3222 (Fire-and-forget)   [P1]
                   └──→ #3223 (API preconnect)    [P1]

#3224 (Early input) [P2, independent]

Existing separate issue: #2996 (Bare Mode), orthogonal and can proceed in parallel
```

### Implementation Order

```
Phase 1: #3219 (Profiler)                               ← measure first
Phase 2: #3220 + #3221 (Bootstrap + Lazy tools)         ← parallel, largest gain
Phase 3: #3222 + #3223 (Fire-and-forget + Preconnect)   ← parallel
Phase 4: #3224 (Early input)                            ← independent
Phase 5: #3225 → #3226 (Lazy deps → Code splitting)     ← chained dependency
```
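The phase order can be checked mechanically against the Deps column of the sub-issues table. This sketch uses a hypothetical helper (not part of the repository) that reports any issue whose dependency is not scheduled in a strictly earlier phase; Phase 5's chain is split into two consecutive steps:

```typescript
type Plan = ReadonlyArray<ReadonlyArray<number>>;
type DepMap = ReadonlyMap<number, readonly number[]>;

// Returns human-readable violations; an empty array means the plan is valid.
function findOrderViolations(plan: Plan, deps: DepMap): string[] {
  const phaseOf = new Map<number, number>();
  plan.forEach((issues, phase) => issues.forEach((id) => phaseOf.set(id, phase)));

  const violations: string[] = [];
  deps.forEach((required, issue) => {
    const phase = phaseOf.get(issue);
    if (phase === undefined) {
      violations.push(`#${issue} is not scheduled`);
      return;
    }
    for (const dep of required) {
      const depPhase = phaseOf.get(dep);
      if (depPhase === undefined || depPhase >= phase) {
        violations.push(`#${issue} needs #${dep} in an earlier phase`);
      }
    }
  });
  return violations;
}

// Deps column from the sub-issues table.
const DEPS: DepMap = new Map<number, readonly number[]>([
  [3219, []], [3220, [3219]], [3221, [3219]], [3222, [3219, 3221]],
  [3223, [3219]], [3224, []], [3225, [3221]], [3226, [3225]],
]);

// Implementation order, with Phase 5 (#3225 → #3226) as two steps.
const PLAN: Plan = [[3219], [3220, 3221], [3222, 3223], [3224], [3225], [3226]];
```

Running `findOrderViolations(PLAN, DEPS)` on the documented order yields no violations; the same check would flag a reordering that, say, moved #3222 ahead of #3221.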

### Comparison with Claude Code

| Dimension | Claude Code | qwen-code (Current) | qwen-code (Target) |
|-----------|-------------|---------------------|---------------------|
| Entry point | 3-layer, fast path zero-import | Single entry, full load | 2-layer, fast path |
| Parallel I/O | keychain/MDM parallel with import | Fully sequential | Parallel subprocess prefetch |
| Lazy loading | Extensive dynamic import() | 5-6 spots only | Tool factory + deps lazy load |
| DCE | feature() compile-time elimination | None | esbuild define |
| Caching | Multi-level settings + prompt cache latch | Minimal | Startup settings cache |
| Preconnect | API TCP+TLS preconnect | None | Fire-and-forget preconnect |
| Input capture | Early input during startup | None | stdin raw mode buffering |
| Profiling | Dual-mode startupProfiler | recordStartupPerformance() not wired | Checkpoint profiler + 1% sampling |
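The "Tool factory" target for lazy loading can be sketched as a registry that stores factories and constructs each tool only on first use. `LazyToolRegistry` and the `Tool` shape are hypothetical; because the model likely still needs every tool's schema up front, this sketch assumes only the implementation module is deferred, not the declaration:

```typescript
interface Tool { name: string; run(input: string): Promise<string> }
type ToolFactory = () => Promise<Tool>;

// Hypothetical registry: registration is cheap; loading happens on first get().
class LazyToolRegistry {
  private readonly factories = new Map<string, ToolFactory>();
  private readonly instances = new Map<string, Promise<Tool>>();

  register(name: string, factory: ToolFactory): void {
    this.factories.set(name, factory);
  }

  /** Tool names are available immediately, without loading any tool module. */
  names(): string[] {
    return Array.from(this.factories.keys());
  }

  /** Loads the tool on first use; concurrent callers share one in-flight load. */
  get(name: string): Promise<Tool> {
    const cached = this.instances.get(name);
    if (cached) return cached;
    const factory = this.factories.get(name);
    if (!factory) return Promise.reject(new Error(`Unknown tool: ${name}`));
    const loading = factory();
    this.instances.set(name, loading);
    // Forget failed loads so a later call can retry.
    loading.catch(() => {
      if (this.instances.get(name) === loading) this.instances.delete(name);
    });
    return loading;
  }
}
```

A factory would typically wrap a dynamic import, e.g. `async () => new (await import('./tools/some-tool.js')).SomeTool()`, where the path and class are placeholders. Caching the in-flight promise rather than the resolved tool means two concurrent calls trigger only one load.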

## Benefits

- API TCP+TLS preconnect saves roughly 150ms (the 100-200ms handshake) on the first API call
- Early input capture preserves keystrokes typed during REPL initialization
- Lazy tool registration reduces the init phase by 30% or more
- The bootstrap entry brings `--version` under 50ms (it currently loads the full core)
- Fire-and-forget prefetch lets the REPL render before MCP connections and telemetry initialization complete
- Code splitting makes V8 lazy parsing effective, speeding up startup
- Overall perceived startup time is expected to drop by 40-60%
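Early input capture relies on stdin raw mode buffering, as listed in the comparison table. A minimal sketch follows: `startCapturingEarlyInput` and `drainEarlyInput` mirror the names in the target startup flow, but their signatures, the `InputSource` interface, and the Ctrl+C handling are assumptions:

```typescript
import { StringDecoder } from 'node:string_decoder';

// Hypothetical minimal view of stdin; process.stdin satisfies it.
interface InputSource {
  isTTY?: boolean;
  setRawMode?: (mode: boolean) => unknown;
  on(event: 'data', listener: (chunk: Buffer | string) => void): unknown;
  off(event: 'data', listener: (chunk: Buffer | string) => void): unknown;
}

const CTRL_C = '\u0003';

// Starts buffering keystrokes and returns the function that stops capturing.
function startCapturingEarlyInput(input: InputSource, onInterrupt: () => void): () => string {
  const decoder = new StringDecoder('utf8');
  let buffered = '';
  const usedRawMode = input.isTTY === true && typeof input.setRawMode === 'function';
  // Raw mode delivers keystrokes immediately instead of waiting for Enter.
  if (usedRawMode) input.setRawMode!(true);

  const onData = (chunk: Buffer | string) => {
    // The decoder keeps multi-byte characters split across chunks intact.
    const text = typeof chunk === 'string' ? chunk : decoder.write(chunk);
    // In raw mode Ctrl+C no longer raises SIGINT, so handle it explicitly.
    if (text.includes(CTRL_C)) {
      onInterrupt();
      return;
    }
    buffered += text;
  };
  input.on('data', onData);

  // Stops capturing and returns the buffered text for the REPL input box.
  return function drainEarlyInput(): string {
    input.off('data', onData);
    if (usedRawMode) input.setRawMode!(false);
    const text = buffered + decoder.end();
    buffered = '';
    return text;
  };
}
```

Returning `drainEarlyInput` as a closure keeps the buffer private to one capture session; `cli.ts` would call `startCapturingEarlyInput(process.stdin, ...)` before importing the main module and drain once the REPL has rendered.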

