Rust implementation of VibeVoice text-to-speech with voice cloning and multi-speaker synthesis.
- High-quality text-to-speech synthesis
- Voice cloning from audio samples
- Multi-speaker dialogue synthesis
- GPU acceleration (Metal/CUDA)
- Streaming audio generation (realtime model)
| Crate | Description | Documentation |
|---|---|---|
| vibevoice | Core library | README |
| vibevoice-cli | Command-line interface | README |
| vibevoice-server | HTTP server with SSE streaming | README |
| vibevoice-web | Leptos web frontend | README |
| vibevoice-tauri | Desktop application | README |
- Rust 1.85+
- HuggingFace account and token
- GPU recommended (Metal on Apple Silicon, CUDA on NVIDIA)
# Create the cache directory
mkdir -p ~/.cache/huggingface
# Paste your token (get from https://huggingface.co/settings/tokens)
echo "hf_yourTokenHere" > ~/.cache/huggingface/token
# Secure it
chmod 600 ~/.cache/huggingface/token

Very long inputs may need a larger buffer than Metal can allocate and fail with an error like:
Error: Metal error Failed to create metal resource: Buffer
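PyTorch's MPS backend uses an optimized SDPA (Scaled Dot Product Attention) that doesn't materialize the full attention matrix. Candle has flash attention, but it is CUDA-only and not available for Metal, so on Metal the full attention matrix is materialized and its size grows with the square of the sequence length.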
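As a rough illustration of why long inputs hit this limit: a materialized attention score matrix has one entry per pair of positions, so its size grows quadratically with sequence length. The sketch below estimates that size. The function names, the `[batch, heads, seq_len, seq_len]` layout, and the example figures (16 heads, 2-byte elements) are assumptions for illustration, not values taken from VibeVoice or Candle.

```rust
/// Bytes needed to hold one fully materialized attention score matrix.
/// Assumes a dense layout of shape [batch, heads, seq_len, seq_len].
/// (Hypothetical helper, not part of the vibevoice crate.)
pub fn attention_scores_bytes(batch: u64, heads: u64, seq_len: u64, bytes_per_elem: u64) -> u64 {
    batch * heads * seq_len * seq_len * bytes_per_elem
}

/// Largest sequence length whose score matrix fits in `budget_bytes`.
/// `budget_bytes` is a caller-supplied assumption, e.g. a device's
/// maximum single-buffer size.
pub fn max_seq_len(batch: u64, heads: u64, bytes_per_elem: u64, budget_bytes: u64) -> u64 {
    let per_pair = batch * heads * bytes_per_elem;
    if per_pair == 0 {
        return 0;
    }
    // Number of (query, key) position pairs that fit in the budget.
    let limit = budget_bytes / per_pair;
    // Start from a floating-point square root, then correct any rounding.
    let mut s = (limit as f64).sqrt() as u64;
    while s * s > limit {
        s -= 1;
    }
    while (s + 1) * (s + 1) <= limit {
        s += 1;
    }
    s
}
```

With these illustrative numbers, `attention_scores_bytes(1, 16, 4096, 2)` is 512 MiB, and doubling the sequence length to 8192 quadruples it to 2 GiB. An attention kernel that avoids materializing the matrix does not pay this cost.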
