Merged
16 changes: 15 additions & 1 deletion .github/scripts/test-python.sh
@@ -8,6 +8,20 @@ log() {
  echo -e "$(date '+%Y-%m-%d %H:%M:%S') (${fname}:${BASH_LINENO[0]}:${FUNCNAME[1]}) $*"
}

log "test Supertonic TTS"

curl -SL -O https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/sherpa-onnx-supertonic-tts-int8-2026-03-06.tar.bz2
tar xvf sherpa-onnx-supertonic-tts-int8-2026-03-06.tar.bz2
rm sherpa-onnx-supertonic-tts-int8-2026-03-06.tar.bz2

python3 python-api-examples/supertonic-tts.py

rm -rf sherpa-onnx-supertonic-tts-int8-2026-03-06

mkdir -p tts
cp supertonic-en.wav tts/
ls -lh tts

log "test Moonshine v2"

curl -SL -O https://github.com/k2-fsa/sherpa-onnx/releases/download/asr-models/sherpa-onnx-moonshine-tiny-en-quantized-2026-02-27.tar.bz2
@@ -399,7 +413,7 @@ done

log "Offline TTS test"
# test waves are saved in ./tts
-mkdir ./tts
+mkdir -p ./tts

log "test kitten tts"

1 change: 1 addition & 0 deletions .gitignore
@@ -176,3 +176,4 @@ sherpa-onnx-sense-voice-zh-en-ja-ko-yue-int8-2024-07-17
sherpa-onnx-fire-red-asr2-ctc-zh_en-int8-2026-02-25
non-streaming-fire-red-asr-ctc-decode-files
sherpa-onnx-moonshine-*-quantized-2026-02-27
sherpa-onnx-supertonic-tts-int8-2026-03-06
102 changes: 102 additions & 0 deletions python-api-examples/supertonic-tts.py
@@ -0,0 +1,102 @@
#!/usr/bin/env python3
#
# Copyright (c) 2026 Xiaomi Corporation

"""
This file demonstrates how to use sherpa-onnx Python API
for SupertonicTTS.


Usage:

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/sherpa-onnx-supertonic-tts-int8-2026-03-06.tar.bz2
tar xvf sherpa-onnx-supertonic-tts-int8-2026-03-06.tar.bz2
rm sherpa-onnx-supertonic-tts-int8-2026-03-06.tar.bz2

python3 ./supertonic-tts.py

You can find more models at
https://github.com/k2-fsa/sherpa-onnx/releases/tag/tts-models

Please see
https://k2-fsa.github.io/sherpa/onnx/tts/supertonic.html
for details.

"""

import time

import sherpa_onnx
import soundfile as sf


def create_tts():
    tts_config = sherpa_onnx.OfflineTtsConfig(
        model=sherpa_onnx.OfflineTtsModelConfig(
            supertonic=sherpa_onnx.OfflineTtsSupertonicModelConfig(
                duration_predictor="./sherpa-onnx-supertonic-tts-int8-2026-03-06/duration_predictor.int8.onnx",
                text_encoder="./sherpa-onnx-supertonic-tts-int8-2026-03-06/text_encoder.int8.onnx",
                vector_estimator="./sherpa-onnx-supertonic-tts-int8-2026-03-06/vector_estimator.int8.onnx",
                vocoder="./sherpa-onnx-supertonic-tts-int8-2026-03-06/vocoder.int8.onnx",
                tts_json="./sherpa-onnx-supertonic-tts-int8-2026-03-06/tts.json",
                unicode_indexer="./sherpa-onnx-supertonic-tts-int8-2026-03-06/unicode_indexer.bin",
                voice_style="./sherpa-onnx-supertonic-tts-int8-2026-03-06/voice.bin",
            ),
Comment on lines +33 to +44
medium

The model directory path is hardcoded in multiple places. To improve maintainability and make it easier to update the model version in the future, it's better to define the model directory in a variable and use f-strings to construct the file paths.

Suggested change
-def create_tts():
-    tts_config = sherpa_onnx.OfflineTtsConfig(
-        model=sherpa_onnx.OfflineTtsModelConfig(
-            supertonic=sherpa_onnx.OfflineTtsSupertonicModelConfig(
-                duration_predictor="./sherpa-onnx-supertonic-tts-int8-2026-03-06/duration_predictor.int8.onnx",
-                text_encoder="./sherpa-onnx-supertonic-tts-int8-2026-03-06/text_encoder.int8.onnx",
-                vector_estimator="./sherpa-onnx-supertonic-tts-int8-2026-03-06/vector_estimator.int8.onnx",
-                vocoder="./sherpa-onnx-supertonic-tts-int8-2026-03-06/vocoder.int8.onnx",
-                tts_json="./sherpa-onnx-supertonic-tts-int8-2026-03-06/tts.json",
-                unicode_indexer="./sherpa-onnx-supertonic-tts-int8-2026-03-06/unicode_indexer.bin",
-                voice_style="./sherpa-onnx-supertonic-tts-int8-2026-03-06/voice.bin",
-            ),
+def create_tts():
+    model_dir = "sherpa-onnx-supertonic-tts-int8-2026-03-06"
+    tts_config = sherpa_onnx.OfflineTtsConfig(
+        model=sherpa_onnx.OfflineTtsModelConfig(
+            supertonic=sherpa_onnx.OfflineTtsSupertonicModelConfig(
+                duration_predictor=f"./{model_dir}/duration_predictor.int8.onnx",
+                text_encoder=f"./{model_dir}/text_encoder.int8.onnx",
+                vector_estimator=f"./{model_dir}/vector_estimator.int8.onnx",
+                vocoder=f"./{model_dir}/vocoder.int8.onnx",
+                tts_json=f"./{model_dir}/tts.json",
+                unicode_indexer=f"./{model_dir}/unicode_indexer.bin",
+                voice_style=f"./{model_dir}/voice.bin",
+            ),
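
As a variation on the suggestion above, the seven paths could also be built from a single table with `pathlib`. This is only a sketch: `MODEL_DIR`, `SUPERTONIC_FILES`, `supertonic_paths`, and `missing_files` are hypothetical names that are not part of sherpa-onnx, and the file names are copied from the example script.

```python
from pathlib import Path

# Hypothetical constant: the directory created by extracting the model tarball.
MODEL_DIR = Path("sherpa-onnx-supertonic-tts-int8-2026-03-06")

# Keyword argument name -> file name, both taken from the example script.
SUPERTONIC_FILES = {
    "duration_predictor": "duration_predictor.int8.onnx",
    "text_encoder": "text_encoder.int8.onnx",
    "vector_estimator": "vector_estimator.int8.onnx",
    "vocoder": "vocoder.int8.onnx",
    "tts_json": "tts.json",
    "unicode_indexer": "unicode_indexer.bin",
    "voice_style": "voice.bin",
}


def supertonic_paths(model_dir=MODEL_DIR):
    """Return a dict mapping each keyword argument to its path as a string."""
    base = Path(model_dir)
    return {key: str(base / name) for key, name in SUPERTONIC_FILES.items()}


def missing_files(paths):
    """Return the paths that do not exist as regular files on disk."""
    return [p for p in paths.values() if not Path(p).is_file()]
```

Under these assumptions the dict could be unpacked into the config, e.g. `sherpa_onnx.OfflineTtsSupertonicModelConfig(**supertonic_paths())`, and `missing_files` could be checked first to report a missing download before `validate()` runs.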

            debug=False,
            num_threads=2,
            provider="cpu",
        )
    )
    if not tts_config.validate():
        raise ValueError(
            "Please read the previous error messages and re-check your config"
        )

    return sherpa_onnx.OfflineTts(tts_config)


def main():
    tts = create_tts()

    text = "Today as always, men fall into two groups: slaves and free men. Whoever does not have two-thirds of his day for himself, is a slave, whatever he may be, a statesman, a businessman, an official, or a scholar."

    gen_config = sherpa_onnx.GenerationConfig()

    # This model has 10 speakers. Valid sid: 0-9
    gen_config.sid = 6
    gen_config.num_steps = 5
    gen_config.speed = 1.25  # larger -> faster

    # We use en for English.
    # You can also use es, pt, fr, ko.
    # This single model supports 5 languages.
    gen_config.extra["lang"] = "en"

    start = time.time()
    audio = tts.generate(text, gen_config)
    end = time.time()

    if len(audio.samples) == 0:
        print("Error in generating audios. Please read previous error messages.")
Copilot AI Mar 6, 2026

The wording is grammatically off here; “audio” is typically uncountable in this context. Consider changing “audios” to “audio”.

Suggested change
-        print("Error in generating audios. Please read previous error messages.")
+        print("Error in generating audio. Please read previous error messages.")

        return

    elapsed_seconds = end - start
    audio_duration = len(audio.samples) / audio.sample_rate
    real_time_factor = elapsed_seconds / audio_duration

    output_filename = "./supertonic-en.wav"
    sf.write(
        output_filename,
        audio.samples,
        samplerate=audio.sample_rate,
        subtype="PCM_16",
    )
    print(f"Saved to {output_filename}")
    print(f"The text is '{text}'")
    print(f"Elapsed seconds: {elapsed_seconds:.3f}")
    print(f"Audio duration in seconds: {audio_duration:.3f}")
    print(f"RTF: {elapsed_seconds:.3f}/{audio_duration:.3f} = {real_time_factor:.3f}")


if __name__ == "__main__":
    main()
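
The real-time factor (RTF) printed by the script is the synthesis time divided by the duration of the generated audio, so values below 1 mean faster than real time. A minimal standalone sketch of that arithmetic follows. The function name mirrors the script's variable, and the numbers are illustrative only (not measured, and the sample rate is an assumption rather than the model's actual rate).

```python
def real_time_factor(elapsed_seconds, num_samples, sample_rate):
    """Return elapsed time divided by audio duration (num_samples / sample_rate)."""
    if num_samples <= 0 or sample_rate <= 0:
        raise ValueError("num_samples and sample_rate must be positive")
    audio_duration = num_samples / sample_rate
    return elapsed_seconds / audio_duration


# Illustrative numbers: 2 s of compute for 441000 samples at an assumed 44100 Hz,
# i.e. 10 s of audio.
rtf = real_time_factor(2.0, 441000, 44100)
print(f"RTF: {rtf:.3f}")  # RTF: 0.200
```

Guarding against a zero sample count plays the same role as the script's early return when `audio.samples` is empty: it avoids a division by zero.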
4 changes: 4 additions & 0 deletions sherpa-onnx/csrc/offline-tts-supertonic-impl.cc
@@ -603,6 +603,10 @@ void OfflineTtsSupertonicImpl::InitVoiceStyle(const std::vector<char> &buf) {
  }
  num_speakers_ = num_speakers;
  full_style_ = std::move(style);

  if (config_.model.debug) {
    SHERPA_ONNX_LOGE("Number of speakers: %d", num_speakers_);
Copilot AI Mar 6, 2026

SHERPA_ONNX_LOGE logs at error level, but this message is emitted only when debug is enabled and isn’t an error condition. Consider switching to an info/debug-level macro (e.g., SHERPA_ONNX_LOGI/SHERPA_ONNX_LOGD) so debug runs don’t incorrectly surface errors in logs/CI.

Suggested change
-    SHERPA_ONNX_LOGE("Number of speakers: %d", num_speakers_);
+    SHERPA_ONNX_LOGD("Number of speakers: %d", num_speakers_);

  }
Comment on lines +607 to +609
medium

Using SHERPA_ONNX_LOGE for a debug message is misleading, as LOGE implies an error-level message. For debug information, it's better to use a logging level appropriate for debugging, such as SHERPA_ONNX_LOG(DEBUG). This improves clarity and allows for more granular control over log verbosity.

  if (config_.model.debug) {
    SHERPA_ONNX_LOG(DEBUG) << "Number of speakers: " << num_speakers_;
  }
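
The point both reviewers make about log levels can be illustrated with Python's standard `logging` module. This is an analogy only, not the sherpa-onnx C++ macros, and the logger name and `report_speakers` helper are hypothetical.

```python
import logging

# Hypothetical logger name, used only for this illustration.
logger = logging.getLogger("supertonic_demo")


def report_speakers(num_speakers, debug):
    """Emit the speaker count at DEBUG level, and only when debug is enabled."""
    if debug:
        logger.debug("Number of speakers: %d", num_speakers)


if __name__ == "__main__":
    # With the threshold at DEBUG the message is shown; raising it to
    # logging.ERROR would hide it without touching report_speakers.
    logging.basicConfig(level=logging.DEBUG)
    report_speakers(10, debug=True)
```

Because the message carries a debug level rather than an error level, tools that scan logs for errors would not flag it, which is the concern raised in the comments above.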

}

OfflineTtsSupertonicImpl::StyleSliceView
1 change: 1 addition & 0 deletions sherpa-onnx/python/sherpa_onnx/__init__.py
@@ -51,6 +51,7 @@
    OfflineTtsMatchaModelConfig,
    OfflineTtsModelConfig,
    OfflineTtsPocketModelConfig,
    OfflineTtsSupertonicModelConfig,
    OfflineTtsVitsModelConfig,
    OfflineTtsZipvoiceModelConfig,
    OfflineWenetCtcModelConfig,