What's Changed
- Add core FSDP2 data-parallel runtime by @winglian in #4
- Fix FSDP2 sample generate registration by @winglian in #5
- Add FSDP2 pytest coverage and fix CI GPU-test duplication by @winglian in #6
- Flash Speculative Decoding for Sampling by @winglian in #7
- Add ScatterMoE kernel and Qwen3.6 DFlash wiring by @winglian in #9
- Register HFScatterMoEParallelExperts layer mapping for MoE replacement by @winglian in #11
- Fall back to AutoConfig.model_type when the repo string doesn't match by @winglian in #10
- Add loss-plugin registry for operator-installed losses by @winglian in #12
- chore: misc standalone scattermoe updates by @winglian in #13
- fix: reduce target logprob memory by @winglian in #14
- Fixes for FP8 with ao by @winglian in #15
- feat(tinker-compat): /api/v1/create_sampling_session + tinker-wire parity tests by @winglian in #16
Full Changelog: v0.1.0...v0.2.0