[P1] Startup Optimization #3011

@pomelo-nwu

Priority: P1
Labels: area:performance
Reference: View Document

Problem

qwen-code startup has multiple performance bottlenecks identified through comparison with Claude Code's optimization strategies:

  1. Core barrel export loads everything: @qwen-code/qwen-code-core re-exports 50+ modules via export *, forcing the entire dependency graph to be evaluated at startup
  2. Fully serial startup path: gemini.tsx main() runs sequentially with no parallelism
  3. No layered entry point: every CLI invocation loads the full core, even --version
  4. Monolithic esbuild bundle: no code splitting, so V8 lazy parsing is defeated by the single large file
  5. No API preconnect: the first API call pays the full TCP+TLS handshake (100-200ms)
  6. Heavy dependencies not deferred: OpenTelemetry, highlight.js, React/Ink, etc. are loaded eagerly
  7. No input capture during startup: REPL initialization (200-500ms) drops user keystrokes
  8. No conditional-compilation DCE: no dead code elimination for internal-only features
  9. Startup performance measurement not wired up: recordStartupPerformance() exists but is not called from main()

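Items 1 and 6 both come down to deferring module evaluation until first use. A minimal sketch of a memoized lazy loader is shown below; `lazy`, `loadHighlighter`, and `renderSnippet` are hypothetical names, and the stand-in module only imitates a heavy dependency such as highlight.js.

```typescript
// Memoized lazy loader: `load` runs at most once, on first use.
function lazy<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined;
  return () => {
    if (cached === undefined) {
      cached = load();
    }
    return cached;
  };
}

// Hypothetical stand-in for a heavy module. In real code the loader would be
// a dynamic import, for example `() => import('highlight.js')`.
let loadCount = 0;
const loadHighlighter = lazy(async () => {
  loadCount += 1;
  return { highlight: (code: string) => `<pre>${code}</pre>` };
});

// The first caller pays the load cost; later callers reuse the same promise.
async function renderSnippet(code: string): Promise<string> {
  const { highlight } = await loadHighlighter();
  return highlight(code);
}
```

One caveat with this design: a rejected load is cached too, so a production version might clear `cached` on failure to allow a retry.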

Design

Optimization Strategy Pyramid

           ┌──────────────────────────┐
           │ Zero-load fast path      │  --version with zero imports
           ├──────────────────────────┤
           │ Lazy tool loading        │  Tool factory pattern
           ├──────────────────────────┤
           │ Fire-and-forget prefetch │  void someAsyncFn()
           ├──────────────────────────┤
           │ API preconnect           │  TCP+TLS preconnect
           ├──────────────────────────┤
           │ Lazy dependency loading  │  dynamic import()
           ├──────────────────────────┤
           │ Code splitting           │  esbuild splitting
           ├──────────────────────────┤
           │ UX optimization          │  Early input capture
           └──────────────────────────┘
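The "tool factory pattern" layer can be sketched as a registry that stores factories rather than instances. This is an illustration under assumptions, not the qwen-code implementation: `Tool`, `ToolFactory`, and `LazyToolRegistry` are hypothetical names.

```typescript
interface Tool {
  name: string;
  run(input: string): string;
}

type ToolFactory = () => Tool;

class LazyToolRegistry {
  private readonly factories = new Map<string, ToolFactory>();
  private readonly instances = new Map<string, Tool>();

  // Registration stores only the factory; nothing is constructed yet.
  register(name: string, factory: ToolFactory): void {
    this.factories.set(name, factory);
  }

  // Listing names stays cheap because no tool is instantiated.
  names(): string[] {
    return Array.from(this.factories.keys());
  }

  // A tool is built on first lookup and cached afterwards.
  get(name: string): Tool {
    let tool = this.instances.get(name);
    if (tool === undefined) {
      const factory = this.factories.get(name);
      if (factory === undefined) {
        throw new Error(`Unknown tool: ${name}`);
      }
      tool = factory();
      this.instances.set(name, tool);
    }
    return tool;
  }
}

// Usage: the counter shows that construction is deferred until get().
let constructed = 0;
const registry = new LazyToolRegistry();
registry.register('echo', () => {
  constructed += 1;
  return { name: 'echo', run: (input: string) => input };
});
```

In a real registry the factory body would typically contain the dynamic import of the tool's module, so that unused tools never reach module evaluation.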

Target Startup Flow

User runs qwen
│
▼ cli.ts (bootstrap entry)              ← NEW: lightweight bootstrap entry
│  --version? → zero imports, exit immediately
│  --help?    → load only yargs help
│  mcp?       → load only the MCP handler
│  default path ↓
│
│  startCapturingEarlyInput()           ← NEW: capture user input early
│
▼ await import('./gemini.js')           ← dynamically load the main module
│  [during module evaluation]:
│    startKeychainPrefetch()            ← NEW: parallel subprocess prefetch
│    startGitDetection()                ← NEW: parallel subprocess prefetch
│
▼ main()
│  [critical path, awaited]:
│    await Promise.all([settings, args]) ← NEW: load in parallel
│    await loadCliConfig()
│    await performInitialAuth()
│
│  [fire-and-forget prefetch, not awaited]:
│    void preconnectApi()               ← NEW: TCP+TLS preconnect
│    void initTelemetry()               ← NEW: lazy-load OTel
│    void extensionManager.refreshCache()
│    void connectMcpServers()
│    void checkForUpdates()
│
▼ startInteractiveUI()                  ← user sees the REPL
│  startDeferredPrefetches()            ← deferred prefetch after render
│  drainEarlyInput()                    ← replay input typed during startup
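The first stage of the flow above (the bootstrap entry with fast paths) could look roughly like the sketch below. `selectStartupPath`, `bootstrap`, and the injected `loadMain` and `print` parameters are hypothetical; the version string is a placeholder that a real build would inline.

```typescript
type StartupPath = 'version' | 'help' | 'mcp' | 'full';

// Decide the startup path from raw arguments without importing anything.
function selectStartupPath(args: string[]): StartupPath {
  if (args.includes('--version')) return 'version';
  if (args.includes('--help')) return 'help';
  if (args[0] === 'mcp') return 'mcp';
  return 'full';
}

// Bootstrap sketch: only the default path pays for loading the main module.
async function bootstrap(
  args: string[],
  loadMain: () => Promise<{ main: () => Promise<void> }>,
  print: (line: string) => void,
): Promise<StartupPath> {
  const path = selectStartupPath(args);
  if (path === 'version') {
    print('0.0.0-example'); // placeholder, not the real version
  } else if (path === 'full') {
    // In cli.ts this loader would be `() => import('./gemini.js')`.
    const { main } = await loadMain();
    await main();
  }
  // The 'help' and 'mcp' paths would load only their own handlers here.
  return path;
}
```

Injecting the loader keeps the dispatch logic testable without touching the real module graph.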

Sub-issues Breakdown

| # | Issue | Priority | Deps | Est. Impact |
|---|---|---|---|---|
| #3219 | Add startup performance profiler | P1 | None | Foundation for all optimization |
| #3220 | Bootstrap entry point with fast paths | P0 | #3219 | --version < 50ms |
| #3221 | Lazy tool registration | P0 | #3219 | 30%+ reduction in init phase |
| #3222 | Fire-and-forget prefetch | P1 | #3219, #3221 | REPL renders before MCP/telemetry |
| #3223 | API TCP+TLS preconnect | P1 | #3219 | 100ms+ saved on first API call |
| #3224 | Early input capture during REPL init | P2 | None | No keystroke loss |
| #3225 | Lazy load heavy optional dependencies | P2 | #3221 | Reduced initial bundle parse |
| #3226 | esbuild code splitting | P2 | #3225 | V8 lazy parsing effective |
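Since the profiler (#3219) underpins every other sub-issue, a minimal checkpoint profiler is sketched here. It is an assumption about the shape such a tool might take, not the existing recordStartupPerformance() API; `StartupProfiler` and `Checkpoint` are hypothetical names.

```typescript
interface Checkpoint {
  name: string;
  sinceStartMs: number; // time since the profiler was created
  deltaMs: number;      // time since the previous checkpoint
}

class StartupProfiler {
  private readonly marks: Array<{ name: string; at: number }> = [];
  private readonly now: () => number;
  private readonly startedAt: number;

  // The clock is injectable so results can be tested deterministically.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
    this.startedAt = now();
  }

  checkpoint(name: string): void {
    this.marks.push({ name, at: this.now() });
  }

  report(): Checkpoint[] {
    let previous = this.startedAt;
    return this.marks.map(({ name, at }) => {
      const entry: Checkpoint = {
        name,
        sinceStartMs: at - this.startedAt,
        deltaMs: at - previous,
      };
      previous = at;
      return entry;
    });
  }
}
```

Calling `checkpoint('settings_loaded')`, `checkpoint('auth_done')`, and so on at each stage of main() would give the per-phase numbers needed to judge the other sub-issues.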

Dependency Graph

#3219 (Profiler) ──┬──→ #3220 (Bootstrap entry)   [P0, parallelizable]
                   ├──→ #3221 (Lazy tools)        [P0, parallelizable]
                   │       └──→ #3225 (Lazy deps) ──→ #3226 (Code splitting)
                   ├──→ #3222 (Fire-and-forget)   [P1]
                   └──→ #3223 (API preconnect)    [P1]

#3224 (Early input) [P2, independent]

Existing separate issue: #2996 (Bare Mode), orthogonal and can proceed in parallel

Implementation Order

Phase 1: #3219 (Profiler)                               ← build measurement first
Phase 2: #3220 + #3221 (Bootstrap + Lazy tools)         ← in parallel, largest gain
Phase 3: #3222 + #3223 (Fire-and-forget + Preconnect)   ← in parallel
Phase 4: #3224 (Early input)                            ← independent
Phase 5: #3225 → #3226 (Lazy deps → Code splitting)     ← chained dependency
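The Phase 3 fire-and-forget work depends on detached tasks never producing unhandled rejections. A minimal sketch follows, assuming hypothetical names (`fireAndForget`, `ErrorSink`, `StartupDeps`, `startup`) and a simplified startup sequence that omits config loading and auth.

```typescript
type ErrorSink = (label: string, error: unknown) => void;

// Start a task without awaiting it. Both synchronous throws and rejections
// are routed to `onError` instead of becoming unhandled rejections.
function fireAndForget(
  label: string,
  task: () => Promise<unknown>,
  onError: ErrorSink,
): void {
  void new Promise<unknown>((resolve) => resolve(task())).catch(
    (error: unknown) => onError(label, error),
  );
}

interface StartupDeps {
  loadSettings: () => Promise<string>;
  parseArguments: () => Promise<string[]>;
  sideTasks: Record<string, () => Promise<unknown>>;
  onError: ErrorSink;
}

async function startup(
  deps: StartupDeps,
): Promise<{ settings: string; args: string[] }> {
  // Critical path: independent loads are awaited together.
  const [settings, args] = await Promise.all([
    deps.loadSettings(),
    deps.parseArguments(),
  ]);
  // Non-critical work starts now but is never awaited.
  for (const [label, task] of Object.entries(deps.sideTasks)) {
    fireAndForget(label, task, deps.onError);
  }
  return { settings, args };
}
```

The `sideTasks` map would hold entries such as MCP connection, telemetry init, and update checks; the REPL can render as soon as `startup` resolves.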

Comparison with Claude Code

| Dimension | Claude Code | qwen-code (Current) | qwen-code (Target) |
|---|---|---|---|
| Entry point | 3-layer, fast path zero-import | Single entry, full load | 2-layer, fast path |
| Parallel I/O | keychain/MDM parallel with import | Fully sequential | Parallel subprocess prefetch |
| Lazy loading | Extensive dynamic import() | 5-6 spots only | Tool factory + deps lazy load |
| DCE | feature() compile-time elimination | None | esbuild define |
| Caching | Multi-level settings + prompt cache latch | Minimal | Startup settings cache |
| Preconnect | API TCP+TLS preconnect | None | Fire-and-forget preconnect |
| Input capture | Early input during startup | None | stdin raw mode buffering |
| Profiling | Dual-mode startupProfiler | recordStartupPerformance() not wired | Checkpoint profiler + 1% sampling |
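The "Input capture" row targets buffering stdin while the REPL initializes. The sketch below shows only the buffering and drain logic against a minimal event interface; the function signatures are assumptions (the issue names startCapturingEarlyInput() and drainEarlyInput() but does not define them), and `InputSource` and `FakeInput` are hypothetical.

```typescript
// Minimal event surface; a stream such as process.stdin (with an encoding
// set) offers on/off methods of this shape.
interface InputSource {
  on(event: 'data', listener: (chunk: string) => void): unknown;
  off(event: 'data', listener: (chunk: string) => void): unknown;
}

// Begin buffering input; the returned function stops capture and returns
// everything typed so far.
function startCapturingEarlyInput(source: InputSource): () => string {
  const chunks: string[] = [];
  const onData = (chunk: string): void => {
    chunks.push(chunk);
  };
  source.on('data', onData);

  return function drainEarlyInput(): string {
    source.off('data', onData);
    const buffered = chunks.join('');
    chunks.length = 0;
    return buffered;
  };
}

// Fake source used to demonstrate the behavior without a terminal.
class FakeInput implements InputSource {
  private listeners: Array<(chunk: string) => void> = [];

  on(_event: 'data', listener: (chunk: string) => void): this {
    this.listeners.push(listener);
    return this;
  }

  off(_event: 'data', listener: (chunk: string) => void): this {
    this.listeners = this.listeners.filter((l) => l !== listener);
    return this;
  }

  emit(chunk: string): void {
    for (const listener of this.listeners) listener(chunk);
  }
}
```

On a real terminal, per-keystroke delivery would additionally require raw mode on the TTY stream, and the drained text would be handed to the REPL's input handler once it is mounted.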

Benefit

  • TCP+TLS preconnect saves an estimated 100-200ms on the first API call
  • Early input capture preserves keystrokes during REPL init
  • Lazy tool registration reduces init phase by 30%+
  • Bootstrap entry makes --version < 50ms (from full core load)
  • Fire-and-forget prefetch makes REPL render before MCP/telemetry completion
  • Code splitting enables V8 lazy parsing for faster startup
  • Overall perceived startup time reduced by 40-60%

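The preconnect benefit assumes the warmed connection is actually reused by the first real request. One hedged way to sketch it is a best-effort HEAD request through the same fetch implementation the API client uses; `FetchLike` is a hypothetical type, the signature of preconnectApi() is an assumption, and whether the connection is reused depends on the client's connection pool and keep-alive timeouts.

```typescript
// Narrow view of a fetch-style function, so the sketch stays testable.
type FetchLike = (url: string, init: { method: string }) => Promise<unknown>;

// Warm up the TCP+TLS connection to `origin` without blocking startup.
function preconnectApi(origin: string, fetchImpl: FetchLike): void {
  void fetchImpl(origin, { method: 'HEAD' }).catch(() => {
    // Best effort only: the real request will surface any network error.
  });
}

// Usage sketch (the origin is a placeholder):
//   preconnectApi('https://api.example.com', fetch);
```

Measuring the first-request latency with and without the call, using the startup profiler, would confirm whether the expected saving materializes.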
