Voice Upload & Library
The Chatterbox TTS API now includes a comprehensive voice library management system that allows users to upload, manage, and use custom voices across all speech generation endpoints. This feature enables you to create a persistent collection of voices that can be referenced by name in API calls.
- Persistent Voice Storage: Uploaded voices are stored persistently and survive container restarts
- Voice Selection by Name: Reference uploaded voices by name in any speech generation endpoint
- Multiple Audio Formats: Support for MP3, WAV, FLAC, M4A, and OGG files
- RESTful Voice Management: Full CRUD operations for voice management
- Docker & Local Support: Works seamlessly with both Docker and direct Python installations
- Frontend Integration: Complete voice management UI in the web frontend
The voice library is automatically configured when using Docker. Voices are stored in a persistent volume:
# Start with voice library enabled
docker-compose up -d
# Your voices will be persisted in the "chatterbox-voices" Docker volume
Create a voice library directory (default: ./voices):
# Create voices directory
mkdir voices
# Or set custom location
export VOICE_LIBRARY_DIR="/path/to/your/voices"
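For a client script or tool that needs to find the same directory, the lookup described above can be sketched as follows. This is a minimal illustration, not the server's actual code: the helper name `resolve_voice_dir` is hypothetical, and treating an empty variable the same as an unset one is an assumption.

```python
import os
from pathlib import Path

DEFAULT_VOICE_DIR = "./voices"  # documented default for local installs


def resolve_voice_dir(environ=None):
    """Return the voice library path from VOICE_LIBRARY_DIR, or the default.

    Hypothetical helper: the server may resolve the directory differently.
    """
    env = os.environ if environ is None else environ
    # Assumption: an empty value falls back to the default, like an unset one.
    return Path(env.get("VOICE_LIBRARY_DIR") or DEFAULT_VOICE_DIR)
```

Calling `resolve_voice_dir().mkdir(parents=True, exist_ok=True)` then has the same effect as the `mkdir voices` step.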
GET /v1/voices
Get a list of all voices in the library.
curl -X GET "http://localhost:4123/v1/voices"
Response:
{
"voices": [
{
"name": "sarah_professional",
"filename": "sarah_professional.mp3",
"original_filename": "sarah_recording.mp3",
"file_extension": ".mp3",
"file_size": 1024768,
"upload_date": "2024-01-15T10:30:00Z",
"path": "/voices/sarah_professional.mp3"
}
],
"count": 1
}
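A client will usually want only a few fields from this response. The sketch below pulls out each voice's name and size; the function name `summarize_voices` is hypothetical, and it assumes `file_size` is a byte count, which the response above suggests but does not state.

```python
import json


def summarize_voices(payload):
    """Return (name, size in MiB) pairs from a GET /v1/voices response body."""
    return [
        # Assumption: file_size is reported in bytes.
        (voice["name"], round(voice["file_size"] / (1024 * 1024), 2))
        for voice in payload.get("voices", [])
    ]


# The sample response shown above, trimmed to the fields used here.
sample = json.loads("""
{
  "voices": [
    {
      "name": "sarah_professional",
      "filename": "sarah_professional.mp3",
      "file_size": 1024768
    }
  ],
  "count": 1
}
""")

print(summarize_voices(sample))
```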
POST /v1/voices
Upload a new voice to the library.
curl -X POST "http://localhost:4123/v1/voices" \
-F "voice_name=sarah_professional" \
-F "voice_file=@/path/to/voice.mp3"
Parameters:
- voice_name (string): Name for the voice (used in API calls)
- voice_file (file): Audio file (MP3, WAV, FLAC, M4A, OGG, max 10MB)
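Checking a file on the client before uploading avoids a round trip for files the server would reject. The following is a sketch under stated assumptions: `check_upload` is a hypothetical helper, and "10MB" is read as 10 × 1024 × 1024 bytes, which may not match how the server counts.

```python
from pathlib import Path

SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".flac", ".m4a", ".ogg"}
# Assumption: "10MB" means 10 MiB; the server's exact limit may differ.
MAX_UPLOAD_BYTES = 10 * 1024 * 1024


def check_upload(filename, size_bytes):
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems
```

This only inspects the extension and size. The server's own validation remains authoritative.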
DELETE /v1/voices/{voice_name}
Delete a voice from the library.
curl -X DELETE "http://localhost:4123/v1/voices/sarah_professional"
PUT /v1/voices/{voice_name}
Rename an existing voice.
curl -X PUT "http://localhost:4123/v1/voices/sarah_professional" \
-F "new_name=sarah_business"
GET /v1/voices/{voice_name}
Get detailed information about a specific voice.
curl -X GET "http://localhost:4123/v1/voices/sarah_professional"
GET /v1/voices/{voice_name}/download
Download the original voice file.
curl -X GET "http://localhost:4123/v1/voices/sarah_professional/download" \
--output voice.mp3
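The voice endpoints above all hang off the same base path, so a client can build their URLs with one small helper. This is an illustrative sketch: `voice_url` and `BASE_URL` are hypothetical names, and percent-encoding the voice name is a defensive choice rather than something the API documents.

```python
from urllib.parse import quote

BASE_URL = "http://localhost:4123"  # host and port used in the examples above


def voice_url(voice_name=None, download=False, base_url=BASE_URL):
    """Build the URL for the voice collection, one voice, or its download."""
    url = f"{base_url}/v1/voices"
    if voice_name is not None:
        # Percent-encode the name so unexpected characters cannot alter the path.
        url += "/" + quote(voice_name, safe="")
        if download:
            url += "/download"
    return url
```

The HTTP method then selects the operation: GET and POST on the collection URL, GET, PUT and DELETE on a single voice.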
Use the voice name in the voice parameter:
curl -X POST "http://localhost:4123/v1/audio/speech" \
-H "Content-Type: application/json" \
-d '{
"input": "Hello! This is using my custom voice.",
"voice": "sarah_professional",
"exaggeration": 0.7,
"temperature": 0.8
}' \
--output speech.wav
curl -X POST "http://localhost:4123/v1/audio/speech/upload" \
-F "input=Hello! This is using my custom voice." \
-F "voice=sarah_professional" \
-F "exaggeration=0.7" \
--output speech.wav
curl -X POST "http://localhost:4123/v1/audio/speech/stream" \
-H "Content-Type: application/json" \
-d '{
"input": "This will stream with my custom voice.",
"voice": "sarah_professional"
}' \
--output stream.wav
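The JSON request from the first curl example can also be built with Python's standard library. This is a minimal sketch: `build_speech_request` is a hypothetical helper, it only constructs the request, and the extra keyword options are passed through as JSON fields without validation.

```python
import json
import urllib.request


def build_speech_request(text, voice, base_url="http://localhost:4123", **options):
    """Build (but do not send) a JSON POST to /v1/audio/speech."""
    body = {"input": text, "voice": voice, **options}
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_speech_request(
    "Hello! This is using my custom voice.",
    "sarah_professional",
    exaggeration=0.7,
    temperature=0.8,
)
# To send it and save the audio (requires a running server):
#   with urllib.request.urlopen(request) as response:
#       open("speech.wav", "wb").write(response.read())
```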
# Voice library directory (default: ./voices for local, /voices for Docker)
VOICE_LIBRARY_DIR=/path/to/voices
# For Docker, this is typically set to /voices and mounted as a volume
The voice library is automatically configured in Docker with a persistent volume:
volumes:
- chatterbox-voices:/voices
Allowed characters in voice names:
- Letters (a-z, A-Z)
- Numbers (0-9)
- Underscores (_)
- Hyphens (-)
- Spaces (converted to underscores)
Characters not allowed in voice names:
- Forward/backward slashes (/, \)
- Colons (:)
- Asterisks (*)
- Question marks (?)
- Quotes (", ')
- Angle brackets (<, >)
- Pipes (|)
✅ Good names:
- "sarah_professional"
- "john-voice-2024"
- "female_american"
- "narration_style"
❌ Invalid names:
- "sarah/professional" # Contains slash
- "voice:sample" # Contains colon
- "my voice?" # Contains question mark
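The naming rules can be checked on the client before a request is sent. The sketch below is one possible reading of them, not the server's sanitizer: the function names are hypothetical, and it takes the strict view that any character outside letters, digits, underscores, hyphens and spaces is invalid.

```python
import re

# Assumption: only the explicitly allowed characters are accepted.
_ALLOWED_NAME = re.compile(r"[A-Za-z0-9_\- ]+")


def is_valid_voice_name(name):
    """True if the name uses only letters, digits, underscores, hyphens, spaces."""
    return _ALLOWED_NAME.fullmatch(name) is not None


def normalize_voice_name(name):
    """Validate a name and convert spaces to underscores, as the rules describe."""
    if not is_valid_voice_name(name):
        raise ValueError(f"invalid voice name: {name!r}")
    return name.replace(" ", "_")
```

Under these assumptions `normalize_voice_name("my voice")` yields `my_voice`, while names such as `sarah/professional` raise `ValueError`.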
- Use high-quality audio samples (16-48kHz sample rate)
- Aim for 10-30 seconds of clean speech
- Avoid background noise and music
- Choose samples with consistent volume
- Use descriptive voice names
- Keep file sizes reasonable (< 10MB)
- Organize voices by speaker or style
- Clean up unused voices periodically
- Use the JSON API for better performance
- Cache voice lists on the client side
- Handle voice-not-found errors gracefully
- Test voices before production use
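The advice to cache voice lists on the client side could look like the small time-based cache below. It is a sketch only: `VoiceListCache` is a hypothetical class, the 60-second lifetime is an arbitrary choice, and `fetch` stands for whatever function calls GET /v1/voices in your client.

```python
import time


class VoiceListCache:
    """Cache the result of a voice-list fetch for a fixed number of seconds."""

    def __init__(self, fetch, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch          # callable that returns the voice list
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the cache can be tested
        self._value = None
        self._expires_at = None      # None means "nothing cached yet"

    def get(self):
        """Return the cached list, refetching if it is missing or expired."""
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            self._value = self._fetch()
            self._expires_at = now + self._ttl
        return self._value

    def invalidate(self):
        """Drop the cached list, for example after an upload, rename or delete."""
        self._expires_at = None
```

Calling `invalidate()` after any change to the library keeps the cache from serving a stale list.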
{
"error": {
"message": "Voice 'my_voice' not found in voice library. Use /voices endpoint to list available voices.",
"type": "voice_not_found_error"
}
}
Solution: Check available voices with GET /v1/voices or upload the voice first.
{
"error": {
"message": "Unsupported audio format: .txt. Supported formats: .mp3, .wav, .flac, .m4a, .ogg",
"type": "invalid_request_error"
}
}
Solution: Use a supported audio format and ensure the file is valid.
{
"error": {
"message": "Voice 'sarah_professional' already exists",
"type": "voice_exists_error"
}
}
Solution: Use a different name or delete the existing voice first.
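A client can turn these error bodies into actionable hints by looking at the `type` field. The sketch below covers only the three error types shown above; `explain_error` is a hypothetical helper, and `invalid_request_error` may well be returned for problems other than an unsupported format.

```python
# Hints taken from the solutions listed above, keyed by error type.
SUGGESTIONS = {
    "voice_not_found_error": "List voices with GET /v1/voices or upload the voice first.",
    "invalid_request_error": "Use a supported audio format and ensure the file is valid.",
    "voice_exists_error": "Use a different name or delete the existing voice first.",
}


def explain_error(payload):
    """Return (type, message, hint) for an error response body."""
    error = payload.get("error", {})
    error_type = error.get("type", "unknown")
    message = error.get("message", "")
    hint = SUGGESTIONS.get(error_type, "No specific suggestion for this error type.")
    return error_type, message, hint
```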
The web frontend includes a complete voice library management interface:
- Voice Library Panel: Browse and manage voices
- Upload Modal: Easy voice upload with drag-and-drop
- Voice Selection: Choose voices in the TTS interface
- Preview Playback: Listen to voice samples before use
- Rename/Delete: Manage voice metadata
If you were previously using the client-side voice library (localStorage), you'll need to re-upload your voices to the new server-side library for persistence and cross-device access.
All voice endpoints support multiple URL formats:
- /v1/voices (recommended)
- /voices
- /voice-library
- /voice_library
The voice parameter also accepts OpenAI voice names for compatibility:
- alloy, echo, fable, onyx, nova, shimmer
These will use the default configured voice sample, while custom names will use uploaded voices from the library.
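The distinction between OpenAI names and library names can be expressed as a simple lookup. This sketch mirrors the rule as described, not the server's code: `voice_source` is a hypothetical function, and matching the names exactly (case-sensitively) is an assumption.

```python
OPENAI_VOICE_NAMES = frozenset({"alloy", "echo", "fable", "onyx", "nova", "shimmer"})


def voice_source(voice):
    """Return "default" for OpenAI voice names, "library" for anything else.

    Assumption: names are compared exactly; the server may be more lenient.
    """
    return "default" if voice in OPENAI_VOICE_NAMES else "library"
```

Giving an uploaded voice one of the six OpenAI names is best avoided, since the text does not say which meaning would take precedence.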
- Voice files are stored on the server filesystem
- File uploads are validated for type and size
- Voice names are sanitized to prevent path traversal
- No authentication required (same as other endpoints)
- Voice library operations are fast (< 100ms typical)
- Voice files are loaded on-demand for TTS generation
- Large voice files may increase TTS processing time
- Consider voice file size vs. quality trade-offs
Planned features for future releases:
- Voice categorization and tagging
- Bulk voice operations
- Voice sharing between users
- Advanced voice metadata
- Voice quality analysis
- Automatic voice optimization
Voice file upload is now implemented for the Chatterbox TTS API: users can upload a custom voice sample with each request, and existing requests remain fully backward compatible.
python-multipart>=0.0.6 - Required for FastAPI multipart/form-data support
Files Updated:
- requirements.txt - Added python-multipart dependency
- pyproject.toml - Added python-multipart to project dependencies
- All Docker files - Added python-multipart to pip install commands
New Features:
- ✅ Voice file upload support - Optional voice_file parameter
- ✅ Multiple endpoint formats - Both JSON and form data support
- ✅ File validation - Format, size, and content validation
- ✅ Temporary file handling - Secure file processing with automatic cleanup
- ✅ Backward compatibility - Existing JSON requests continue to work
Supported File Formats:
- MP3 (.mp3)
- WAV (.wav)
- FLAC (.flac)
- M4A (.m4a)
- OGG (.ogg)
- Maximum size: 10MB
New Endpoints:
- POST /v1/audio/speech - Multipart form data (supports voice upload)
- POST /v1/audio/speech/json - Legacy JSON endpoint (backward compatibility)
New Test Files:
- tests/test_voice_upload.py - Dedicated voice upload testing
- Updated tests/test_api.py - Tests both JSON and form data endpoints
Test Coverage:
- ✅ Default voice (both endpoints)
- ✅ Custom voice upload
- ✅ File format validation
- ✅ Error handling
- ✅ Parameter validation
- ✅ Backward compatibility
README.md Updates:
- Added voice upload examples
- Documented supported file formats
- Provided usage examples in multiple languages (Python, cURL)
- Added file requirements and best practices
# JSON (legacy)
curl -X POST http://localhost:4123/v1/audio/speech/json \
-H "Content-Type: application/json" \
-d '{"input": "Hello world!"}' \
--output output.wav
# Form data (new)
curl -X POST http://localhost:4123/v1/audio/speech \
-F "input=Hello world!" \
--output output.wav
curl -X POST http://localhost:4123/v1/audio/speech \
-F "input=Hello with my custom voice!" \
-F "exaggeration=0.8" \
-F "voice_file=@my_voice.mp3" \
--output custom_voice.wav
import requests

# Upload a custom voice sample together with the text to synthesize
with open("my_voice.mp3", "rb") as voice_file:
    response = requests.post(
        "http://localhost:4123/v1/audio/speech",
        data={
            "input": "Hello with my custom voice!",
            "exaggeration": 0.8,
            "temperature": 1.0,
        },
        files={
            "voice_file": ("my_voice.mp3", voice_file, "audio/mpeg")
        },
    )

# Fail early on an error response instead of writing it to the output file
response.raise_for_status()

with open("output.wav", "wb") as f:
    f.write(response.content)
All Docker files updated with python-multipart:
- docker/Dockerfile - Standard Docker image
- docker/Dockerfile.cpu - CPU-only image
- docker/Dockerfile.gpu - GPU-enabled image
- docker/Dockerfile.uv - uv-optimized image
- docker/Dockerfile.uv.gpu - uv + GPU image
Docker Usage:
# Build and run with voice upload support
docker compose -f docker/docker-compose.yml up -d
# Test voice upload
curl -X POST http://localhost:4123/v1/audio/speech \
-F "input=Hello from Docker!" \
-F "voice_file=@my_voice.mp3" \
--output docker_test.wav
- Upload - Receive multipart form data with optional voice file
- Validate - Check file format, size, and content
- Store - Create temporary file with secure naming
- Process - Use uploaded file or default voice sample for TTS
- Cleanup - Automatically remove temporary files
- Temporary files are automatically cleaned up in finally blocks
- File validation prevents oversized uploads
- Secure temporary file creation with unique names
- File format validation with helpful error messages
- File size limits (10MB maximum)
- Graceful fallback to default voice on upload errors
- Comprehensive error responses with error codes
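The store, process and cleanup steps described above follow a common pattern that can be sketched with the standard library. This is an illustration of the pattern, not the project's implementation: `with_temp_voice` is a hypothetical helper and `process` stands in for whatever runs TTS on the file.

```python
import os
import tempfile


def with_temp_voice(audio_bytes, suffix, process):
    """Write uploaded bytes to a uniquely named temp file, run process(path),
    and delete the file afterwards even if process raises."""
    # mkstemp creates the file securely with a unique, unpredictable name.
    fd, path = tempfile.mkstemp(prefix="voice_", suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(audio_bytes)
        return process(path)
    finally:
        # Cleanup runs on success and on failure.
        if os.path.exists(path):
            os.remove(path)
```

Putting the removal in `finally` is what guarantees that a failed generation does not leave voice samples behind on disk.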
# Start the API
python main.py
# Run comprehensive tests
python tests/test_voice_upload.py
python tests/test_api.py
- ✅ Health check
- ✅ API documentation endpoints
- ✅ Legacy JSON endpoint compatibility
- ✅ New form data endpoint
- ✅ Voice file upload functionality
- ✅ Error handling and validation
The API documentation is automatically updated and available at:
- Swagger UI: http://localhost:4123/docs
- ReDoc: http://localhost:4123/redoc
- OpenAPI Schema: http://localhost:4123/openapi.json
The documentation now includes:
- Multipart form data support
- File upload parameters
- Example requests and responses
- Error codes and descriptions
100% backward compatible:
- Existing JSON requests work unchanged
- All previous API behavior preserved
- Legacy endpoint (/v1/audio/speech/json) maintains exact same interface
- No breaking changes to existing functionality
- File type validation prevents malicious uploads
- File size limits prevent DoS attacks
- Temporary files use secure random naming
- Automatic cleanup prevents file system bloat
- No persistent storage of uploaded files
- Minimal overhead for JSON requests (unchanged code path)
- Temporary file I/O only when voice files are uploaded
- Efficient memory management with automatic cleanup
- FastAPI's built-in multipart handling is highly optimized
Status: ✅ Complete and Production Ready
The voice upload feature is fully implemented, tested, and documented. Users can now upload custom voice files for personalized text-to-speech generation while maintaining full backward compatibility with existing implementations.