Fish Speech S2 Pro Integration and Features:
Added Fish Speech S2 Pro (4B parameter TTS model) as a new engine in the UI, supporting 80+ languages and 15,000+ inline expression tags for fine-grained speech control. The model is downloaded automatically from Hugging Face. Inline [tag] syntax is passed through to Fish Speech and automatically stripped from text sent to other engines.
The TTS model manager (tts_manager.py) now provides get_fish_speech() and generate_voice_clone_fish_speech() methods, handling model loading, reference audio encoding, kernel cache status reporting, and batch/paragraph generation with all Fish Speech parameters exposed.
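As an illustration of the tag handling described above, here is a minimal sketch of how [tag] markers might be stripped before text reaches an engine that does not understand them. The helper names (strip_expression_tags, prepare_text), the engine identifier "fish_speech", and the 64-character tag limit are assumptions made for this example, not the project's actual code.

```python
import re

# Assumption: a tag is a short bracketed token such as [laugh] or [whisper].
# The 64-character cap is arbitrary and only guards against swallowing long
# bracketed passages that are ordinary text.
_TAG_RE = re.compile(r"\[[^\[\]\n]{1,64}\]")


def strip_expression_tags(text: str) -> str:
    """Remove [tag] markers and tidy the whitespace they leave behind."""
    without_tags = _TAG_RE.sub(" ", text)
    # Collapse the runs of spaces or tabs created by removing tags.
    collapsed = re.sub(r"[ \t]+", " ", without_tags)
    # Drop a space left stranded in front of punctuation.
    collapsed = re.sub(r" ([,.!?;:])", r"\1", collapsed)
    return collapsed.strip()


def prepare_text(text: str, engine: str) -> str:
    """Keep tags for Fish Speech; strip them for every other engine.

    "fish_speech" is a hypothetical engine identifier used for illustration.
    """
    if engine == "fish_speech":
        return text
    return strip_expression_tags(text)
```

Stripping at a single choke point like prepare_text would let the same script be reused across engines without manual editing, which appears to be the intent of the feature.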
Performance and Stability Enhancements:
Implemented Triton/Inductor GPU kernel compilation for Fish Speech with persistent caching in models/.cache, so compiled kernels are reused across sessions and repeat generations are significantly faster. Added Windows-specific patches for compilation compatibility and informative first-run cache warnings.
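One way such persistent caching could be wired up is sketched below. It relies on the TORCHINDUCTOR_CACHE_DIR and TRITON_CACHE_DIR environment variables, which PyTorch Inductor and Triton read to locate their on-disk caches. The function name configure_kernel_cache, the inductor/triton subdirectory layout, and the returned status dictionary are assumptions for illustration; the actual implementation may differ.

```python
import os
from pathlib import Path


def configure_kernel_cache(models_dir: str = "models") -> dict:
    """Point Inductor and Triton at a persistent cache under models/.cache.

    Intended to run before any compilation is triggered, so the environment
    variables are in place when the compilers look them up.
    """
    cache_root = Path(models_dir) / ".cache"
    # Assumption: one subdirectory per compiler; the real layout may differ.
    inductor_dir = cache_root / "inductor"
    triton_dir = cache_root / "triton"
    for directory in (inductor_dir, triton_dir):
        directory.mkdir(parents=True, exist_ok=True)

    os.environ["TORCHINDUCTOR_CACHE_DIR"] = str(inductor_dir.resolve())
    os.environ["TRITON_CACHE_DIR"] = str(triton_dir.resolve())

    # A cache containing no files means kernels will be compiled from scratch.
    warm = any(path.is_file() for path in cache_root.rglob("*"))
    if not warm:
        print("Kernel cache is empty: the first generation will be slow "
              "while GPU kernels compile.")
    return {"cache_dir": str(cache_root), "warm": warm}
```

The returned status could feed the kind of kernel cache status reporting mentioned for the model manager, for example to show a first-run notice in the UI.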