Qwen2 supports multimodal input, and integrating speech-to-text would enable voice-enabled Qwen2 applications. FunASR (17.8K+ stars, https://github.com/modelscope/FunASR) is from the same Alibaba/ModelScope ecosystem and provides:
- SenseVoice: Ultra-fast multilingual ASR (50x faster than Whisper-large, strong CJK + Cantonese)
- Paraformer: Non-streaming ASR with timestamps and punctuation
- Fun-ASR-Nano: Lightweight streaming ASR for edge deployment
- OpenAI-compatible API: POST /v1/audio/transcriptions
Since FunASR and Qwen2 are both ModelScope projects, they naturally complement each other — FunASR handles speech input while Qwen2 handles text understanding and generation. Together they form a complete voice-to-text-to-response pipeline, all self-hosted.
Would it be useful to add FunASR integration examples in Qwen2 docs?
Qwen2 supports multimodal input, and integrating speech-to-text would enable voice-enabled Qwen2 applications. FunASR (17.8K+ stars, https://github.com/modelscope/FunASR) is from the same Alibaba/ModelScope ecosystem and provides:
Since FunASR and Qwen2 are both ModelScope projects, they naturally complement each other — FunASR handles speech input while Qwen2 handles text understanding and generation. Together they form a complete voice-to-text-to-response pipeline, all self-hosted.
Would it be useful to add FunASR integration examples in Qwen2 docs?