Version 1.12.5 - Fish Speech with Compilation Speed-Up

@FranckyB FranckyB released this 08 Apr 13:03
cff8079

Fish Speech S2 Pro Integration and Features:

Added Fish Speech S2 Pro (a 4B-parameter TTS model) as a new engine in the UI. It supports 80+ languages and 15,000+ inline expression tags for fine-grained speech control. The model is downloaded automatically from HuggingFace. The [tag] syntax is supported for Fish Speech and is stripped automatically from text sent to other engines.
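As a rough illustration of the tag stripping described above, here is a minimal sketch. It is not the project's actual code: the function names, the `"fish_speech"` engine identifier, and the regular expression (a short bracketed run with no nested brackets) are all assumptions made for the example.

```python
import re

# Assumed tag shape: a short bracketed run with no nested brackets or newlines.
_TAG_RE = re.compile(r"\[[^\[\]\n]{1,64}\]")


def strip_expression_tags(text: str) -> str:
    """Remove inline [tag] markers and tidy the whitespace they leave behind."""
    without_tags = _TAG_RE.sub(" ", text)
    # Collapse runs of spaces or tabs created by the removal.
    collapsed = re.sub(r"[ \t]{2,}", " ", without_tags)
    # Drop a space that now sits directly before punctuation.
    collapsed = re.sub(r" +([,.;:!?])", r"\1", collapsed)
    return collapsed.strip()


def prepare_text(text: str, engine: str) -> str:
    """Keep tags for the (hypothetical) 'fish_speech' engine, strip them otherwise."""
    return text if engine == "fish_speech" else strip_expression_tags(text)
```

With this sketch, `prepare_text("Hello [laugh] world.", "other")` yields plain text, while the same call with `"fish_speech"` leaves the tag in place for the model to interpret.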

The TTS model manager (tts_manager.py) now provides get_fish_speech() and generate_voice_clone_fish_speech() methods. These handle model loading, reference audio encoding, kernel cache status reporting, and batch/paragraph generation, with all Fish Speech parameters exposed.
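The paragraph-by-paragraph generation mentioned above could look roughly like the sketch below. The signatures of the real tts_manager.py methods are not given in these notes, so nothing here mirrors them: `split_paragraphs`, `generate_by_paragraph`, and the caller-supplied `synthesize` callable are hypothetical names used only to show the idea.

```python
import re
from typing import Callable, List


def split_paragraphs(text: str) -> List[str]:
    """Split text on blank lines and drop empty paragraphs."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def generate_by_paragraph(
    text: str, synthesize: Callable[[str], bytes]
) -> List[bytes]:
    """Call the supplied synthesis function once per paragraph, in order.

    `synthesize` stands in for whatever call actually produces audio.
    """
    return [synthesize(paragraph) for paragraph in split_paragraphs(text)]
```

Splitting on blank lines keeps each request short, which is one plausible reason to generate per paragraph; the actual splitting rule used by the project may differ.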

Performance and Stability Enhancements:

Implemented Triton/Inductor GPU kernel compilation for Fish Speech with persistent caching in models/.cache, so repeat generations reuse the compiled kernels and run significantly faster. Added Windows-specific patches for compilation compatibility and informative warnings on the first run, before the cache is populated.
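One way to get a persistent kernel cache of this kind is to point PyTorch Inductor and Triton at a fixed directory through their `TORCHINDUCTOR_CACHE_DIR` and `TRITON_CACHE_DIR` environment variables. The sketch below assumes that approach; the release notes do not say how the project does it. The helper name, the `inductor` and `triton` subdirectory names, and the warm-cache check are illustrative assumptions.

```python
import os
from pathlib import Path


def configure_kernel_cache(cache_root: str = "models/.cache") -> bool:
    """Point Inductor and Triton at a persistent cache directory.

    Returns True if compiled artifacts already exist (a warm cache) and
    False on a first run, so the caller can show a warning that the first
    generation will be slow while kernels compile.
    """
    root = Path(cache_root).resolve()
    # Subdirectory names are an assumption made for this sketch.
    cache_dirs = [root / "inductor", root / "triton"]

    # A cache counts as warm if any cache directory already holds files.
    is_warm = any(d.is_dir() and any(d.iterdir()) for d in cache_dirs)

    for d in cache_dirs:
        d.mkdir(parents=True, exist_ok=True)

    # Set these before any compilation happens; doing it before
    # `import torch` is the safe choice.
    os.environ["TORCHINDUCTOR_CACHE_DIR"] = str(cache_dirs[0])
    os.environ["TRITON_CACHE_DIR"] = str(cache_dirs[1])
    return is_warm
```

A caller might run `if not configure_kernel_cache(): print("First run: compiling GPU kernels, this may take a while.")` at startup, then import torch and compile the model as usual.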