This repository provides a local HTTP API for voice-conditioned text-to-speech.
It is designed to be easy for AI agents, tool wrappers, and skills to use. The repo now runs a vendored legacy synthesis path locally instead of depending on a live upstream checkout at runtime.
If you are building an agent skill, the shortest useful summary is:
- call `GET /health` to verify the server is ready
- call `GET /api/v1/voices/list` to discover valid `voice_profile` names
- call `POST /api/v1/tts/synthesize` with `text` and `voice_profile`
- save the returned WAV bytes to a file
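That flow fits in a short standard-library client. A minimal sketch, assuming the default local port from the docker-compose setup below; the function names (`get_json`, `save_wav`, `speak`) are illustrative and not part of this repo:

```python
import json
import urllib.request

BASE = "http://localhost:8081"  # assumed local port from docker-compose

def get_json(path: str, token: str = "") -> dict:
    """GET a JSON endpoint, optionally with a bearer token."""
    headers = {"Authorization": f"Bearer {token}"} if token else {}
    req = urllib.request.Request(f"{BASE}{path}", headers=headers)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def save_wav(data: bytes, path: str) -> str:
    """Write returned WAV bytes to disk and return the saved path."""
    with open(path, "wb") as f:
        f.write(data)
    return path

def speak(text: str, voice_profile: str, token: str, out: str = "output.wav") -> str:
    """Run the whole flow: health check, voice discovery, synthesis, save."""
    if get_json("/health").get("status") != "healthy":
        raise RuntimeError("server not ready")
    profiles = get_json("/api/v1/voices/list", token)["profiles"]
    if voice_profile not in profiles:
        raise ValueError(f"unknown voice {voice_profile!r}; choose from {profiles}")
    req = urllib.request.Request(
        f"{BASE}/api/v1/tts/synthesize",
        data=json.dumps({"text": text, "voice_profile": voice_profile}).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return save_wav(resp.read(), out)
```

With a valid token, `speak("Hello.", "Wayne", token)` returns the path of the saved WAV file.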
This project exists to expose a stable local voice synthesis engine over HTTP so that:
- AI assistants can generate speech without shelling into an older CLI repo
- automation tools can discover available voice profiles dynamically
- local workflows can produce WAV output from a simple authenticated API
The API accepts plain text plus a named voice profile and returns a generated WAV file.
Voice profiles are stored on disk in `voice_profiles/`. Model assets are stored in `weights/`. The synthesis engine itself is local to this repo and exposed through FastAPI.
To synthesize speech, an agent needs:
- `text`: the text to speak
- `voice_profile`: the exact folder name of an available voice profile
- `Authorization` header: `Bearer <token>`
The synth endpoint returns:
- HTTP status: `200 OK`
- response body: `audio/wav`
- suggested filename: `synthesized_speech.wav`
Recommended agent flow:
- Check `GET /health`
- Get a token
- Call `GET /api/v1/voices/list`
- Pick a valid voice
- Call `POST /api/v1/tts/synthesize`
- Save the returned bytes as `.wav`
Simple readiness check.
Example response:
```json
{
  "status": "healthy"
}
```

Returns available voice profile names from the local `voice_profiles/` folder.
Example response:
```json
{
  "profiles": ["Wayne", "House", "Tony_Stark"]
}
```

Generate speech using a named voice profile.
Request body:
```json
{
  "text": "Hello from the voice API.",
  "voice_profile": "Wayne"
}
```

Successful response:

- status: `200`
- content type: `audio/wav`
Error response:
```json
{
  "detail": "error message"
}
```

The API expects a bearer token signed with the configured `SECRET_KEY`.

Local helper:

```
python scripts/generate_token.py
```

That script prints a token you can use as:

```
Authorization: Bearer <token>
```
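For a Python caller, the header can be assembled like this. The assumption that `generate_token.py` prints only the token to stdout should be verified against the script itself; both helper names here are illustrative:

```python
import subprocess

def bearer_header(token: str) -> dict:
    """Build the Authorization header the API expects."""
    return {"Authorization": f"Bearer {token}"}

def token_from_script() -> str:
    """Run the repo's helper; assumes it prints the token on stdout."""
    result = subprocess.run(
        ["python", "scripts/generate_token.py"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()
```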
From the project root:
```
python scripts/setup_directories.py
python scripts/verify_setup.py
docker compose build
docker compose up -d
```

Health check:

```
curl.exe -s http://localhost:8081/health
```

Included helper scripts:

- `scripts/test_health.bat`
- `scripts/test_list_voices.bat`
- `scripts/test_synthesize.bat`
Examples:
```
scripts\test_health.bat
scripts\test_list_voices.bat
scripts\test_synthesize.bat Wayne "Hello from Wayne."
scripts\test_synthesize.bat Wayne "Hello from Wayne." test_outputs\wayne.wav
```

Each voice profile is a folder under `voice_profiles/`.
Example:
```
voice_profiles/
  Wayne/
    1_Wayne.mp3
    1_Wayne.txt
    samples.txt
    generated/
```
The API reads the first line of `samples.txt`.

Required format:

```
1_Wayne.mp3|It's okay with like a quad though, like my buddy Big T's got a snorkel kit on his and that's pretty punk rock.
```
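A sketch of parsing that format, based only on the `<filename>|<transcript>` shape described here; the helper names are hypothetical, not the repo's actual loader:

```python
from pathlib import Path

def parse_sample_line(line: str) -> tuple[str, str]:
    """Split a samples.txt entry into (audio filename, transcript)."""
    filename, transcript = line.strip().split("|", 1)
    return filename, transcript

def load_reference(profile_dir: str) -> tuple[Path, str]:
    """Read the first line of samples.txt and resolve the audio path."""
    first = Path(profile_dir, "samples.txt").read_text(encoding="utf-8").splitlines()[0]
    filename, transcript = parse_sample_line(first)
    audio = Path(profile_dir, filename)
    if not audio.exists():
        raise FileNotFoundError(f"reference audio missing: {audio}")
    return audio, transcript
```

Splitting with `maxsplit=1` keeps any `|` characters inside the transcript intact.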
Important rules:
- the filename must exist inside the voice folder
- the transcript should match the spoken audio exactly
- the folder name is the `voice_profile` value agents must send
Expected files in `weights/`:

- `final_finetuned_model.pt`
- `model_1200000.pt`
- `F5TTS_Base_vocab.txt`

The service checks for `final_finetuned_model.pt` first, then falls back to `model_1200000.pt`.
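That fallback order can be mirrored in a caller-side pre-flight check. A sketch; `pick_checkpoint` is a hypothetical helper, not the service's actual code:

```python
from pathlib import Path

def pick_checkpoint(weights_dir: str = "weights") -> Path:
    """Mirror the service's order: fine-tuned checkpoint first, then base model."""
    for name in ("final_finetuned_model.pt", "model_1200000.pt"):
        candidate = Path(weights_dir) / name
        if candidate.exists():
            return candidate
    raise FileNotFoundError(f"no checkpoint found in {weights_dir}")
```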
Model weights are intentionally not part of this repository and should not be committed or published with it.
This repo expects you to supply the model assets locally in weights/.
Use the same base model assets referenced by the original F5-TTS project:
- checkpoint: `F5TTS_Base/model_1200000.pt`
- vocab: `F5TTS_Base/vocab.txt`
You can obtain them from the model release referenced by the original F5-TTS distribution:
- Hugging Face model repo: `SWivid/F5-TTS`
After downloading:
- place `model_1200000.pt` in `weights/`
- place `vocab.txt` in `weights/` as `F5TTS_Base_vocab.txt`
- optionally duplicate or rename the checkpoint to `final_finetuned_model.pt` if you want that path to be the primary file the API picks up
```
curl -X POST "http://localhost:8081/api/v1/tts/synthesize" ^
  -H "Authorization: Bearer YOUR_TOKEN" ^
  -H "Content-Type: application/json" ^
  -d "{\"text\":\"Hello from the API.\",\"voice_profile\":\"Wayne\"}" ^
  --output output.wav
```

```powershell
$token = "YOUR_TOKEN"
$body = @{
    text = "Hello from the API."
    voice_profile = "Wayne"
} | ConvertTo-Json -Compress
Invoke-WebRequest `
    -Uri "http://localhost:8081/api/v1/tts/synthesize" `
    -Method Post `
    -Headers @{ Authorization = "Bearer $token" } `
    -ContentType "application/json" `
    -Body $body `
    -OutFile "output.wav"
```

This section is intentionally written for people building agent skills, MCP wrappers, or tool-call adapters.
A good tool wrapper should:
- validate server health before synthesis
- fetch available voices instead of hardcoding them
- surface voice names exactly as returned by the API
- store audio output to disk and return the saved path
- return meaningful errors when auth fails or the voice name is missing
Minimal arguments for an agent tool:
{
"text": "string",
"voice_profile": "string",
"output_path": "optional string"
}Optional skill behavior:
- auto-list voices when the requested one is missing
- default `output_path` to a temp WAV path
- preserve the exact text instead of rewriting it unless asked
- Call `/health`
- Get or refresh the token
- Call `/api/v1/voices/list`
- Match the requested voice name
- Call `/api/v1/tts/synthesize`
- Save the WAV
- Return the file path and selected voice
If `voice_profile` is invalid:

- call `/api/v1/voices/list`
- show the valid choices
If synthesis returns 500:
- surface the error detail
- keep the original request text and voice name in the error context
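One way to satisfy both points is to package failures with the original request attached; `error_context` is a hypothetical helper, not repo code:

```python
def error_context(status: int, detail: str, text: str, voice_profile: str) -> dict:
    """Preserve the original request alongside the API's error detail."""
    return {
        "status": status,
        "error": detail,
        "request": {"text": text, "voice_profile": voice_profile},
    }
```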
If you are creating a skill for an AI model, a prompt seed like this works well:
```
Use the local Legacy Voice API.
Always check /health first.
Discover voices from /api/v1/voices/list before choosing a voice_profile.
When synthesizing, send exact user text unless the user asked for rewriting.
Save returned WAV bytes to a file and report the final file path.
```
```
app/
  api/
  core/
  services/
  model/
scripts/
test_outputs/
voice_profiles/
weights/
docker-compose.yml
Dockerfile
requirements.txt
```
Important implementation files:
- `app/services/tts_service.py`
- `app/services/legacy_f5_infer.py`
- `app/api/routes/voices.py`
- `app/api/routes/tts.py`
This repo intentionally prioritizes reproducing a known-good local synthesis path over tracking newer upstream behavior.
If you change synthesis logic:
```
docker compose build
docker compose up -d --force-recreate
docker compose logs --tail 200 api
```

This repo is a local API layer plus vendored legacy synthesis logic. You are responsible for ensuring your use of the model assets and voice material is appropriate for your environment and use case.