Priority: P1
Labels: area:performance
Reference: View Document
Problem
qwen-code startup has multiple performance bottlenecks, identified by comparing it with Claude Code's startup optimizations:
- Core barrel export loads everything: `@qwen-code/qwen-code-core` re-exports 50+ modules via `export *`, forcing evaluation of the entire dependency graph at startup
- Fully serial startup path: `main()` in `gemini.tsx` runs every step sequentially, with no parallelism
- No layered entry point: every CLI invocation loads the full core, even `--version`
- Monolithic esbuild bundle: there is no code splitting, so the single large file defeats V8 lazy parsing
- No API preconnect: the first API call pays the full TCP+TLS handshake (100-200 ms)
- Heavy dependencies not deferred: OpenTelemetry, highlight.js, React/Ink, etc. are loaded eagerly
- No input capture during startup: keystrokes typed during REPL initialization (200-500 ms) are dropped
- No conditional-compilation DCE: internal-only features are not removed by dead-code elimination
- Startup metrics not wired up: `recordStartupPerformance()` exists but is never called from `main()`
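The last item is the foundation for everything else (#3219). Below is a minimal sketch of a checkpoint profiler, assuming hypothetical names (`StartupProfiler`, `checkpoint`, `shouldSample`) and the 1% sampling rate listed as the target in the Claude Code comparison. It uses Node's `performance.now()`, which counts milliseconds from process start, so the first phase also covers module loading. How the report would be handed to the existing `recordStartupPerformance()` is left out because that function's signature is not shown here.

```typescript
import { performance } from 'node:perf_hooks';

// Hypothetical checkpoint profiler; names and sampling rate are assumptions.
interface Checkpoint {
  name: string;
  atMs: number; // ms since the process's performance time origin (process start)
}

export class StartupProfiler {
  private readonly checkpoints: Checkpoint[] = [];

  // `enabled` is decided once per process, e.g. via shouldSample().
  constructor(private readonly enabled: boolean) {}

  // Record that startup has reached a named phase boundary.
  checkpoint(name: string): void {
    if (!this.enabled) return; // unsampled sessions pay almost nothing
    this.checkpoints.push({ name, atMs: performance.now() });
  }

  // Phase duration = gap since the previous checkpoint (or since process
  // start for the first one), which shows where startup time goes.
  report(): Array<{ phase: string; durationMs: number }> {
    let prev = 0;
    return this.checkpoints.map((cp) => {
      const durationMs = cp.atMs - prev;
      prev = cp.atMs;
      return { phase: cp.name, durationMs };
    });
  }
}

// Sample roughly `rate` of sessions (1% by default, per the target design).
export function shouldSample(rate = 0.01, rand: () => number = Math.random): boolean {
  return rand() < rate;
}
```

In `main()`, a call such as `profiler.checkpoint('settings_loaded')` after each awaited step would show which phase dominates; the phase names are illustrative.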
Design
Optimization Strategy Pyramid
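┌─────────────────────────────┐
│ Zero-import fast path       │ --version with zero imports
├─────────────────────────────┤
│ Lazy tool loading           │ Tool factory pattern
├─────────────────────────────┤
│ Fire-and-forget prefetch    │ void someAsyncFn()
├─────────────────────────────┤
│ API preconnect              │ TCP+TLS preconnect
├─────────────────────────────┤
│ Deferred dependency loading │ dynamic import()
├─────────────────────────────┤
│ Code splitting              │ esbuild splitting
├─────────────────────────────┤
│ UX optimization             │ Early input capture
└─────────────────────────────┘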
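The fire-and-forget layer is written as `void someAsyncFn()`. A bare `void` discards the promise, so a rejection surfaces as an unhandled rejection, which terminates the process by default on Node 15 and later. Below is a minimal sketch of a wrapper (the name `fireAndForget` and its logging default are assumptions) that starts work off the critical path and routes failures to a handler instead.

```typescript
// Hypothetical helper: start async work without awaiting it, while making
// sure a rejection can never crash the process.
export function fireAndForget(
  label: string,
  task: () => Promise<unknown>,
  onError: (label: string, err: unknown) => void = (l, e) =>
    console.debug(`[prefetch:${l}] failed`, e),
): void {
  // Promise.resolve().then(task) also turns a synchronous throw inside
  // `task` into a rejection, which the .catch below then handles.
  void Promise.resolve()
    .then(task)
    .catch((err: unknown) => onError(label, err));
}
```

With it, `void connectMcpServers()` in the target flow would become `fireAndForget('mcp', connectMcpServers)`, assuming that function takes no arguments.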
Target Startup Flow
User runs qwen
│
▼ cli.ts (bootstrap entry) ← NEW: lightweight bootstrap entry
│ --version? → zero imports, exit immediately
│ --help? → load only the yargs help
│ mcp? → load only the MCP handler
│ default path ↓
│
│ startCapturingEarlyInput() ← NEW: capture user input early
│
▼ await import('./gemini.js') ← load the main module dynamically
│ [during module evaluation]:
│ startKeychainPrefetch() ← NEW: parallel subprocess prefetch
│ startGitDetection() ← NEW: parallel subprocess prefetch
│
▼ main()
│ [critical path, awaited]:
│ await Promise.all([settings, args]) ← NEW: load in parallel
│ await loadCliConfig()
│ await performInitialAuth()
│
│ [fire-and-forget prefetch, not awaited]:
│ void preconnectApi() ← NEW: TCP+TLS preconnect
│ void initTelemetry() ← NEW: lazy-load OTel
│ void extensionManager.refreshCache()
│ void connectMcpServers()
│ void checkForUpdates()
│
▼ startInteractiveUI() ← user sees the REPL
│ startDeferredPrefetches() ← deferred prefetches after first render
│ drainEarlyInput() ← inject input typed during startup
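The bootstrap layer of the flow above can be sketched as follows, under stated assumptions: `routeInvocation`, `bootstrap`, `Loaders`, and the `VERSION` constant are hypothetical (a real build might inline the version via esbuild `define`), and the loaders are injected so the routing can be tested. The point is that `cli.ts` itself imports nothing heavy; only the chosen branch pays for its own modules.

```typescript
// Hypothetical bootstrap entry (cli.ts). It imports nothing from the core
// package, so `qwen --version` never evaluates the dependency graph.

// Assumption: a real build would inline this, e.g. via esbuild `define`.
const VERSION = '0.0.0-dev';

export type Route = 'version' | 'help' | 'mcp' | 'full';

export interface Loaders {
  help: () => Promise<void>; // loads only the yargs help
  mcp: () => Promise<void>; // loads only the MCP handler
  full: () => Promise<void>; // dynamic import of the main module
}

// Pick a route by scanning raw argv, without loading an argument parser.
export function routeInvocation(args: readonly string[]): Route {
  if (args.includes('--version')) return 'version';
  if (args.includes('--help')) return 'help';
  if (args[0] === 'mcp') return 'mcp';
  return 'full';
}

export async function bootstrap(args: readonly string[], loaders: Loaders): Promise<void> {
  switch (routeInvocation(args)) {
    case 'version':
      process.stdout.write(`${VERSION}\n`); // zero imports on this path
      return;
    case 'help':
      return loaders.help();
    case 'mcp':
      return loaders.mcp();
    case 'full':
      return loaders.full();
  }
}
```

In `cli.ts` this would be called as `await bootstrap(process.argv.slice(2), loaders)`, with the `full` loader perhaps `async () => (await import('./gemini.js')).main()`; whether `gemini.js` exports `main` under that name is an assumption. The naive argv scan is deliberate: it avoids loading yargs just to decide the route.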
Sub-issues Breakdown
| # | Issue | Priority | Deps | Est. Impact |
|---|---|---|---|---|
| #3219 | Add startup performance profiler | P1 | None | Foundation for all optimization |
| #3220 | Bootstrap entry point with fast paths | P0 | #3219 | `--version` < 50 ms |
| #3221 | Lazy tool registration | P0 | #3219 | 30%+ reduction in init phase |
| #3222 | Fire-and-forget prefetch | P1 | #3219, #3221 | REPL renders before MCP/telemetry |
| #3223 | API TCP+TLS preconnect | P1 | #3219 | 100 ms+ saved on first API call |
| #3224 | Early input capture during REPL init | P2 | None | No keystroke loss |
| #3225 | Lazy-load heavy optional dependencies | P2 | #3221 | Less of the bundle parsed at startup |
| #3226 | esbuild code splitting | P2 | #3225 | V8 lazy parsing takes effect |
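#3221 calls for a tool factory pattern. Below is a minimal sketch, assuming a registry that stores one async factory per tool and calls it only on first use; in practice the factory would wrap an `await import()` of the tool's module. `LazyToolRegistry`, `Tool`, and `ToolFactory` are hypothetical and do not mirror qwen-code's existing tool registry API.

```typescript
// Hypothetical minimal tool shape; the real tool interface is richer.
export interface Tool {
  name: string;
  execute(input: string): Promise<string>;
}

// A factory defers both module loading and construction.
export type ToolFactory = () => Promise<Tool>;

export class LazyToolRegistry {
  private readonly factories = new Map<string, ToolFactory>();
  private readonly instances = new Map<string, Promise<Tool>>();

  // Registration is cheap: no tool module is imported here.
  register(name: string, factory: ToolFactory): void {
    this.factories.set(name, factory);
  }

  // Tool names are known without loading any implementation.
  names(): string[] {
    return Array.from(this.factories.keys());
  }

  // Load on first use and cache the promise, so concurrent callers
  // share a single load.
  get(name: string): Promise<Tool> {
    let pending = this.instances.get(name);
    if (!pending) {
      const factory = this.factories.get(name);
      if (!factory) return Promise.reject(new Error(`Unknown tool: ${name}`));
      // Promise.resolve().then(...) also captures synchronous throws.
      pending = Promise.resolve().then(factory);
      this.instances.set(name, pending);
      // Forget a failed load so a later call can retry it.
      pending.catch(() => this.instances.delete(name));
    }
    return pending;
  }
}
```

Caching the promise rather than the resolved tool means two concurrent first calls share one import. If the model needs every tool's schema up front, the schemas would have to be registered next to the factories (not shown), otherwise that step would load every tool again.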
Dependency Graph
#3219 (Profiler) ──┬──→ #3220 (Bootstrap entry) [P0, can run in parallel]
                   ├──→ #3221 (Lazy tools) [P0, can run in parallel]
                   │    └──→ #3225 (Lazy deps) ──→ #3226 (Code splitting)
                   ├──→ #3222 (Fire-and-forget) [P1, also depends on #3221]
                   └──→ #3223 (API preconnect) [P1]
#3224 (Early input) [P2, independent]
Related existing issue: #2996 (Bare Mode), orthogonal to this work and can proceed in parallel.
Implementation Order
Phase 1: #3219 (Profiler) ← establish measurement first
Phase 2: #3220 + #3221 (Bootstrap + Lazy tools) ← in parallel, largest gain
Phase 3: #3222 + #3223 (Fire-and-forget + Preconnect) ← in parallel
Phase 4: #3224 (Early input) ← independent
Phase 5: #3225 → #3226 (Lazy deps → Code splitting) ← chained dependency
Comparison with Claude Code
| Dimension | Claude Code | qwen-code (Current) | qwen-code (Target) |
|---|---|---|---|
| Entry point | 3-layer, zero-import fast path | Single entry, full load | 2-layer, fast path |
| Parallel I/O | Keychain/MDM reads in parallel with imports | Fully sequential | Parallel subprocess prefetch |
| Lazy loading | Extensive dynamic `import()` | Only 5-6 sites | Tool factory + lazy-loaded deps |
| DCE | `feature()` compile-time elimination | None | esbuild `define` |
| Caching | Multi-level settings + prompt cache latch | Minimal | Startup settings cache |
| Preconnect | API TCP+TLS preconnect | None | Fire-and-forget preconnect |
| Input capture | Early input during startup | None | stdin raw-mode buffering |
| Profiling | Dual-mode startupProfiler | `recordStartupPerformance()` not wired | Checkpoint profiler + 1% sampling |
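The target "stdin raw-mode buffering" and the `startCapturingEarlyInput()` / `drainEarlyInput()` steps in the target flow can be sketched as below. Assumptions: the drain step is returned as a closure (rather than a module-level export) to keep the sketch testable, and injecting the drained text into the Ink input is not shown. Per Node's `tty` documentation, raw mode disables terminal echo and special character processing, so Ctrl+C no longer raises SIGINT while capturing; a real version would need to handle `\u0003` itself.

```typescript
import { Buffer } from 'node:buffer';
import { Readable } from 'node:stream';

// Anything stdin-like; `process.stdin` satisfies this shape.
type InputStream = Readable & {
  isTTY?: boolean;
  setRawMode?: (mode: boolean) => unknown;
};

// Sketch of startCapturingEarlyInput(): call it in cli.ts before the heavy
// `await import(...)`. Returning the drain step as a closure is an assumption.
export function startCapturingEarlyInput(
  stdin: InputStream = process.stdin,
): () => string {
  const chunks: Buffer[] = [];
  const onData = (chunk: Buffer | string): void => {
    chunks.push(typeof chunk === 'string' ? Buffer.from(chunk, 'utf8') : chunk);
  };

  // Raw mode delivers keys immediately instead of line by line and stops
  // the terminal from echoing them over the not-yet-rendered REPL.
  if (stdin.isTTY && stdin.setRawMode) stdin.setRawMode(true);
  stdin.on('data', onData); // also switches the stream into flowing mode

  // Sketch of drainEarlyInput(): stop capturing and return what was typed.
  return function drainEarlyInput(): string {
    stdin.off('data', onData);
    // Removing a 'data' listener does not pause a flowing stream, so pause
    // explicitly; later input stays buffered until the REPL reads it.
    stdin.pause();
    return Buffer.concat(chunks).toString('utf8');
  };
}
```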
Benefit
- TCP+TLS preconnect saves roughly 100-200 ms (one full handshake) on the first API call
- Early input capture preserves keystrokes typed during REPL initialization
- Lazy tool registration cuts the init phase by an estimated 30%+
- The bootstrap entry brings `--version` under 50 ms, down from a full core load
- Fire-and-forget prefetch lets the REPL render before MCP connections and telemetry setup complete
- Code splitting lets V8 lazy parsing take effect, shortening startup
- Overall perceived startup time drops by an estimated 40-60%
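The first benefit depends on the API client reusing a warm connection. Below is a minimal sketch of the `preconnectApi()` step from the target flow: it sends a `HEAD` request to the API origin through Node's built-in `fetch` (undici), whose connection pool keeps the socket alive afterwards. This only saves the handshake if later API calls go through the same pool; whether qwen-code's API client uses the global `fetch` dispatcher is an assumption to verify. The `baseUrl` parameter and the 2-second timeout are also assumptions.

```typescript
// Sketch of preconnectApi(): warm one keep-alive connection to the API
// origin using Node's built-in fetch (undici). Best-effort by design.
export async function preconnectApi(
  baseUrl: string,
  fetchImpl: typeof fetch = fetch,
  timeoutMs = 2000, // assumption: give up quickly on slow networks
): Promise<boolean> {
  try {
    const res = await fetchImpl(new URL(baseUrl).origin, {
      method: 'HEAD',
      signal: AbortSignal.timeout(timeoutMs),
    });
    // Consume the (empty) body so the connection returns to the pool.
    await res.arrayBuffer();
    // Any HTTP status, even 404, means DNS, TCP and TLS already completed.
    return true;
  } catch {
    // Network errors and timeouts must never affect startup.
    return false;
  }
}
```

Because it never rejects, calling it as `void preconnectApi(baseUrl)` in `main()` is safe.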