Upload supertonic tts models #3263

Merged
csukuangfj merged 2 commits into k2-fsa:master from csukuangfj:ci-supertonic
Mar 6, 2026

@csukuangfj (Collaborator) commented Mar 6, 2026

cc @Wasser1462 @rodrigomatta

Usage

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/sherpa-onnx-supertonic-tts-int8-2026-03-06.tar.bz2
tar xvf sherpa-onnx-supertonic-tts-int8-2026-03-06.tar.bz2
rm sherpa-onnx-supertonic-tts-int8-2026-03-06.tar.bz2

ls -lh sherpa-onnx-supertonic-tts-int8-2026-03-06

You should see output like the following:

ls -lh sherpa-onnx-supertonic-tts-int8-2026-03-06
total 188624
-rw-r--r--@ 1 fangjun  staff   1.5M  6 Mar 14:35 duration_predictor.int8.onnx
-rw-r--r--@ 1 fangjun  staff    11K  6 Mar 14:27 LICENSE
-rw-r--r--@ 1 fangjun  staff    20K  6 Mar 14:35 README.md
-rw-r--r--@ 1 fangjun  staff    26M  6 Mar 14:35 text_encoder.int8.onnx
-rw-r--r--@ 1 fangjun  staff   8.5K  6 Mar 14:35 tts.json
-rw-r--r--@ 1 fangjun  staff   256K  6 Mar 14:35 unicode_indexer.bin
-rw-r--r--@ 1 fangjun  staff    39M  6 Mar 14:35 vector_estimator.int8.onnx
-rw-r--r--@ 1 fangjun  staff    25M  6 Mar 14:35 vocoder.int8.onnx
-rw-r--r--@ 1 fangjun  staff   505K  6 Mar 14:35 voice.bin
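
A quick sanity check after extraction can catch an incomplete download before anything is built. The sketch below is not part of this PR: `check_supertonic_dir` is a hypothetical helper name, and the file list is simply copied from the listing above.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of sherpa-onnx): verify that the extracted
# model directory contains every model file shown in the listing above.

check_supertonic_dir() {
  local dir="$1"
  local missing=0
  local f
  for f in \
    duration_predictor.int8.onnx \
    text_encoder.int8.onnx \
    vector_estimator.int8.onnx \
    vocoder.int8.onnx \
    tts.json \
    unicode_indexer.bin \
    voice.bin; do
    # -s is true only for files that exist and are non-empty.
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage: pass the extracted directory; defaults to the name used in this PR.
model_dir="${1:-./sherpa-onnx-supertonic-tts-int8-2026-03-06}"
if check_supertonic_dir "$model_dir"; then
  echo "all model files present in $model_dir"
else
  echo "model directory is incomplete: $model_dir"
fi
```

The check only looks at file presence and size; it does not validate the ONNX files themselves.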

Then build sherpa-onnx from master and run the script in the English section below.

Note that this single model supports 5 languages: English, Korean, Spanish, Portuguese, and French.
The example below uses --lang=en for English.
For the other languages, use:

  • ko for Korean
  • es for Spanish
  • pt for Portuguese
  • fr for French
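
To try every language code in one go, the command can be wrapped in a small function. This is a minimal sketch rather than anything shipped in the PR: the `synthesize` function, the `DRY_RUN` switch, and the placeholder text are assumptions for illustration, while the flags and paths are the ones used in the English example. By default it only prints the commands, so it can be inspected before the binary is built.

```shell
#!/usr/bin/env bash
# Sketch: print (or run) the offline TTS command once per supported language.
# MODEL_DIR and BIN follow the paths used in this PR; DRY_RUN and the
# placeholder text are assumptions for illustration.

MODEL_DIR="${MODEL_DIR:-./sherpa-onnx-supertonic-tts-int8-2026-03-06}"
BIN="${BIN:-build/bin/sherpa-onnx-offline-tts}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command, 0 = execute it

synthesize() {
  local lang="$1" sid="$2" text="$3"
  # Rebuild the positional parameters as the full command line.
  set -- "$BIN" \
    "--supertonic-duration-predictor=$MODEL_DIR/duration_predictor.int8.onnx" \
    "--supertonic-text-encoder=$MODEL_DIR/text_encoder.int8.onnx" \
    "--supertonic-vector-estimator=$MODEL_DIR/vector_estimator.int8.onnx" \
    "--supertonic-vocoder=$MODEL_DIR/vocoder.int8.onnx" \
    "--supertonic-tts-json=$MODEL_DIR/tts.json" \
    "--supertonic-unicode-indexer=$MODEL_DIR/unicode_indexer.bin" \
    "--supertonic-voice-style=$MODEL_DIR/voice.bin" \
    "--sid=$sid" \
    "--lang=$lang" \
    "--output-filename=./$lang-$sid.wav" \
    "$text"
  if [ "$DRY_RUN" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

for lang in en ko es pt fr; do
  # Replace the placeholder with a sentence written in the target language.
  synthesize "$lang" 0 "<your $lang text here>"
done
```

Set `DRY_RUN=0` once the binary exists and real text has been filled in.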

English

#!/usr/bin/env bash

for i in $(seq 0 9); do
  build/bin/sherpa-onnx-offline-tts \
    --supertonic-duration-predictor=./sherpa-onnx-supertonic-tts-int8-2026-03-06/duration_predictor.int8.onnx \
    --supertonic-text-encoder=./sherpa-onnx-supertonic-tts-int8-2026-03-06/text_encoder.int8.onnx \
    --supertonic-vector-estimator=./sherpa-onnx-supertonic-tts-int8-2026-03-06/vector_estimator.int8.onnx \
    --supertonic-vocoder=./sherpa-onnx-supertonic-tts-int8-2026-03-06/vocoder.int8.onnx \
    --supertonic-tts-json=./sherpa-onnx-supertonic-tts-int8-2026-03-06/tts.json \
    --supertonic-unicode-indexer=./sherpa-onnx-supertonic-tts-int8-2026-03-06/unicode_indexer.bin \
    --supertonic-voice-style=./sherpa-onnx-supertonic-tts-int8-2026-03-06/voice.bin \
    --sid="$i" \
    --lang=en \
    --output-filename="./en-$i.wav" \
    "Today as always, men fall into two groups: slaves and free men. Whoever does not have two-thirds of his day for himself, is a slave, whatever he may be, a statesman, a businessman, an official, or a scholar."
done
/Users/fangjun/open-source/sherpa-onnx/sherpa-onnx/csrc/parse-options.cc:Read:373 build/bin/sherpa-onnx-offline-tts --supertonic-duration-predictor=./sherpa-onnx-supertonic-tts-int8-2026-03-06/duration_predictor.int8.onnx --supertonic-text-encoder=./sherpa-onnx-supertonic-tts-int8-2026-03-06/text_encoder.int8.onnx --supertonic-vector-estimator=./sherpa-onnx-supertonic-tts-int8-2026-03-06/vector_estimator.int8.onnx --supertonic-vocoder=./sherpa-onnx-supertonic-tts-int8-2026-03-06/vocoder.int8.onnx --supertonic-tts-json=./sherpa-onnx-supertonic-tts-int8-2026-03-06/tts.json --supertonic-unicode-indexer=./sherpa-onnx-supertonic-tts-int8-2026-03-06/unicode_indexer.bin --supertonic-voice-style=./sherpa-onnx-supertonic-tts-int8-2026-03-06/voice.bin --sid=0 --lang=en --output-filename=./en-0.wav 'Today as always, men fall into two groups: slaves and free men. Whoever does not have two-thirds of his day for himself, is a slave, whatever he may be, a statesman, a businessman, an official, or a scholar.'

sample=706854, progress=1.000000
Number of threads: 1
Elapsed seconds: 1.139 s
Audio duration: 16.028 s
Real-time factor (RTF): 1.139/16.028 = 0.071
The text is: Today as always, men fall into two groups: slaves and free men. Whoever does not have two-thirds of his day for himself, is a slave, whatever he may be, a statesman, a businessman, an official, or a scholar.. Speaker ID: 0
Saved to ./en-0.wav successfully!

With an RTF of 0.071 on a single thread, synthesis runs about 14 times faster than real time.
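
For reference, the RTF in the log is just elapsed time divided by audio duration. The snippet below recomputes it from the two logged values; the variable names are illustrative.

```shell
#!/usr/bin/env bash
# Recompute the real-time factor (RTF) from the two numbers in the log above.
elapsed=1.139     # "Elapsed seconds" from the log
duration=16.028   # "Audio duration" from the log

# RTF = processing time / audio duration; values below 1 are faster than real time.
rtf="$(LC_ALL=C awk -v e="$elapsed" -v d="$duration" 'BEGIN { printf "%.3f", e / d }')"
# The inverse is the number of seconds of audio produced per second of compute.
speedup="$(LC_ALL=C awk -v e="$elapsed" -v d="$duration" 'BEGIN { printf "%.1f", d / e }')"

echo "RTF: $rtf"
echo "Speed-up over real time: ${speedup}x"
```

This prints an RTF of 0.071, matching the log, and a speed-up of 14.1x.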

The generated audio files are given below:

en-0.mov
en-1.mov
en-2.mov
en-3.mov
en-4.mov
en-5.mov
en-6.mov
en-7.mov
en-8.mov
en-9.mov

Summary by CodeRabbit

  • New Features

    • Publish models to an additional platform alongside HuggingFace and support audio (wav) artifacts.
  • Chores

    • CI now runs on macOS runners and supports multi-item publishing pipelines with standardized token handling.
    • Improved asset management: repository-based asset retrieval replaces prior download method; destination naming and artifact collection streamlined.

@dosubot (bot) added the size:L label (this PR changes 100-499 lines, ignoring generated files) on Mar 6, 2026
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request integrates new Supertonic Text-to-Speech models, significantly expanding the system's multilingual TTS capabilities. It also refines the model acquisition process by switching to a direct git clone from Hugging Face, streamlining the setup for users.

Highlights

  • New Supertonic TTS Models: Introduced new Supertonic Text-to-Speech models, supporting five languages: English, Korean, Spanish, Portuguese, and French.
  • Model Download Method Updated: Changed the method for acquiring ONNX models from modelscope to git clone directly from Hugging Face, simplifying the setup process.

Changelog

  • scripts/supertonic/run.sh
    • Updated the script to use git clone from Hugging Face for downloading Supertonic TTS ONNX models instead of modelscope.
    • Modified the directory check for existing ONNX models.

Ignored Files

  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/export-supertonic.yaml

Activity

  • No human activity has been recorded on this pull request yet.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@coderabbitai

coderabbitai Bot commented Mar 6, 2026

Caution

Review failed

Pull request was closed or merged during review

Walkthrough

Updates CI/CD: change trigger branch and runner to macOS, add brew-based tooling, implement multi-item publishing loops for HuggingFace and ModelScope (with LFS support), adjust artifact naming and retrieval, and refactor Stage 0 asset download from ModelScope to a git clone move.

Changes

| Cohort / File(s) | Summary |
|---|---|
| CI/CD Workflow Configuration (.github/workflows/export-supertonic.yaml) | Changed trigger branch and runner to macos-latest; added Homebrew/git-xet setup; reordered dependency steps; retrieve LICENSE/README from upstream; renamed destination directory; replaced single-dir publish with a multi-item loop (host changed to csukuangfj2), added wav tracking, and added a Publish-to-ModelScope stage that clones per artifact, copies files, tracks tar.bz2 with LFS, and pushes. |
| Asset Download Script (scripts/supertonic/run.sh) | Stage 0: replaced modelscope download with a git clone of the upstream Supertonic repo and move into assets/; changed existence check from ./assets/onnx to assets/onnx/; added ls -lh verification output. |

Sequence Diagram(s)

sequenceDiagram
    participant CI as CI Runner (macOS)
    participant Up as Upstream Repo (git)
    participant HF as HuggingFace Repo
    participant MS as ModelScope Repo
    participant Art as Artifacts (tar.bz2, wav, LICENSE, README)

    CI->>Up: git clone supertonic repo (stage0 assets)
    Up-->>CI: repository contents (assets/)
    CI->>Art: prepare artifacts, rename destination
    CI->>HF: clone/push per-item (track wav, README, LICENSE)
    HF-->>CI: confirm upload
    CI->>MS: clone modelscope repo per artifact, copy files, git lfs track tar.bz2, push
    MS-->>CI: confirm push

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

size:M

Poem

🐰 I hopped through CI on a macOS day,

Cloned assets and bundled them away.
Pushed to HF, then to ModelScope too,
Tar.bz2 and wavs in a bright new queue.
Little rabbit cheers — the pipeline hops anew!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Upload supertonic tts models' accurately reflects the main objective of the PR, which is to upload Supertonic TTS models to HuggingFace and modelscope as evidenced by the workflow and script changes. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
.github/workflows/export-supertonic.yaml (2)

145-163: Consider moving loop-invariant environment variables outside the loop.

GIT_LFS_SKIP_SMUDGE and GIT_CLONE_PROTECTION_ACTIVE are set on every iteration but their values don't change. Moving them before the loop improves clarity.

Proposed refactor
             git config --global user.email "csukuangfj@gmail.com"
             git config --global user.name "Fangjun Kuang"
+
+            export GIT_LFS_SKIP_SMUDGE=1
+            export GIT_CLONE_PROTECTION_ACTIVE=false
+
             for m in *.tar.bz2; do
-              export GIT_LFS_SKIP_SMUDGE=1
-              export GIT_CLONE_PROTECTION_ACTIVE=false
-
               rm -rf ms
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/export-supertonic.yaml around lines 145 - 163, Move the
loop-invariant environment exports out of the loop: set GIT_LFS_SKIP_SMUDGE and
GIT_CLONE_PROTECTION_ACTIVE once before the for m in *.tar.bz2; do loop instead
of exporting them inside the loop; update the script so the exports occur prior
to the git clone/copy/push sequence (refer to the variables GIT_LFS_SKIP_SMUDGE
and GIT_CLONE_PROTECTION_ACTIVE and the for m in *.tar.bz2; do loop) to improve
clarity and avoid redundant commands.

133-134: Remove redundant if: true condition.

The static analysis tool correctly identified that if: true is always true and serves no purpose. This should be removed.

Proposed fix
       - name: Publish to modelscope
-        if: true
         env:
           MS_TOKEN: ${{ secrets.MODEL_SCOPE_GIT_TOKEN }}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/export-supertonic.yaml around lines 133 - 134, Remove the
redundant always-true conditional from the GitHub Actions step named "Publish to
modelscope": delete the line "if: true" from that step block so the step relies
on normal job-level or step-level conditions (or no condition) instead of an
unnecessary tautology.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 9f60d8ca-5019-43f6-8c26-7fc36edf0a81

📥 Commits

Reviewing files that changed from the base of the PR and between 48cd9c3 and 9566bf6.

📒 Files selected for processing (2)
  • .github/workflows/export-supertonic.yaml
  • scripts/supertonic/run.sh

Comment thread scripts/supertonic/run.sh
Comment on lines +64 to +68
if [ ! -d assets/onnx/ ]; then
git clone https://huggingface.co/Supertone/supertonic-2
mv supertonic-2 assets

ls -lh assets/onnx/
⚠️ Potential issue | 🟡 Minor

Potential issue: mv behavior depends on pre-existing assets/ directory.

If assets/ exists but assets/onnx/ doesn't (e.g., from a partial/failed run), mv supertonic-2 assets will move the cloned directory into assets/ as assets/supertonic-2/, rather than renaming it to assets. This results in an incorrect structure.

Proposed fix to ensure consistent behavior
     if [ ! -d assets/onnx/ ]; then
+        rm -rf assets supertonic-2
         git clone https://huggingface.co/Supertone/supertonic-2
         mv supertonic-2 assets

Alternatively, clone directly to the target:

     if [ ! -d assets/onnx/ ]; then
-        git clone https://huggingface.co/Supertone/supertonic-2
-        mv supertonic-2 assets
+        rm -rf assets
+        git clone https://huggingface.co/Supertone/supertonic-2 assets
 
         ls -lh assets/onnx/
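
The `mv` behavior described in this comment can be reproduced without touching the network. The sketch below works in a throwaway directory; `fake_clone` is a hypothetical stand-in for `git clone`, and the assumption that the checkout contains an `onnx/` subdirectory is taken from the `ls -lh assets/onnx/` line in the script.

```shell
#!/usr/bin/env bash
# Demonstrate how `mv SRC DST` depends on whether DST already exists.
# fake_clone stands in for `git clone` and is not part of run.sh.

work="$(mktemp -d)"
cd "$work" || exit 1

fake_clone() {
  # Mimic a checkout that has an onnx/ subdirectory (assumed layout).
  mkdir -p supertonic-2/onnx
  : > supertonic-2/onnx/model.onnx
}

# Case 1: assets/ does not exist, so mv renames supertonic-2 to assets.
fake_clone
mv supertonic-2 assets
case1="$(find assets -name model.onnx)"
echo "case 1: $case1"

# Case 2: assets/ exists without onnx/ (e.g. after a partial run),
# so mv places the clone inside it instead of renaming.
rm -rf assets && mkdir assets
fake_clone
mv supertonic-2 assets
case2="$(find assets -name model.onnx)"
echo "case 2: $case2"

# Case 3: remove stale directories first, as the first proposed fix does.
rm -rf assets supertonic-2
fake_clone
mv supertonic-2 assets
case3="$(find assets -name model.onnx)"
echo "case 3: $case3"

cd - > /dev/null && rm -rf "$work"
```

Cases 1 and 3 end with the model under `assets/onnx/`, while case 2 ends with it under `assets/supertonic-2/onnx/`, which is the situation the comment warns about.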
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/supertonic/run.sh` around lines 64 - 68, The current sequence clones
into supertonic-2 and then runs "mv supertonic-2 assets", which will create
assets/supertonic-2 if an assets/ directory already exists; change the flow in
run.sh so the clone targets the intended assets/onnx path directly or explicitly
place the cloned repo into assets/onnx (e.g., clone into a temporary dir then
move/rename to assets/onnx or clone with the target directory argument), and
ensure you create assets/ or remove/rename any partial existing directories
before moving to guarantee the final structure contains assets/onnx as expected.


@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates the model download mechanism in run.sh to use git clone from Hugging Face. However, the current implementation has several critical issues that will cause the script to fail, including a less robust check for model existence, missing directory creation, and incorrect file paths. I've provided a suggestion to fix these issues.

Comment thread scripts/supertonic/run.sh
Comment on lines +64 to 70
if [ ! -d assets/onnx/ ]; then
git clone https://huggingface.co/Supertone/supertonic-2
mv supertonic-2 assets

ls -lh assets/onnx/

echo "Download completed"
critical

The new logic for downloading models has a few critical issues:

  1. Less robust check: The condition if [ ! -d assets/onnx/ ] does not handle the case where the directory exists but is empty, which would cause the download to be skipped incorrectly. The original check was more robust.
  2. Missing directory creation: The assets directory is not created before attempting to move files into it. If the directory doesn't exist, the mv command will fail.
  3. Incorrect destination path: The command mv supertonic-2 assets results in assets/supertonic-2, but the script expects models to be in assets/onnx. This will cause subsequent commands to fail.

To ensure the script runs correctly, I recommend revising this block to address these points.

Suggested change
- if [ ! -d assets/onnx/ ]; then
- git clone https://huggingface.co/Supertone/supertonic-2
- mv supertonic-2 assets
- ls -lh assets/onnx/
- echo "Download completed"
+ if [ ! -d assets/onnx/ ] || [ -z "$(ls -A assets/onnx/ 2>/dev/null)" ]; then
+ echo "ONNX models not found, downloading..."
+ git clone https://huggingface.co/Supertone/supertonic-2
+ mkdir -p assets
+ mv supertonic-2 assets/onnx
+ ls -lh assets/onnx/
+ echo "Download completed"
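
Both parts of this suggestion, the emptiness test and the `mv supertonic-2 assets/onnx` step, can be tried in a temporary directory before adopting them. The sketch below assumes the clone keeps its models under an `onnx/` subdirectory, as the `ls -lh assets/onnx/` line in run.sh implies; `dir_is_empty` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Try the emptiness test and the `mv supertonic-2 assets/onnx` step in a
# temp directory. The onnx/ subdirectory layout is an assumption.

work="$(mktemp -d)"

# dir_is_empty: hypothetical helper wrapping the `ls -A` test from the suggestion.
dir_is_empty() {
  [ -z "$(ls -A "$1" 2>/dev/null)" ]
}

simulate_clone() {
  mkdir -p "$work/supertonic-2/onnx"
  : > "$work/supertonic-2/onnx/model.onnx"
}

# An existing but empty assets/onnx is detected by the `ls -A` test.
mkdir -p "$work/assets/onnx"
if dir_is_empty "$work/assets/onnx"; then empty_detected=yes; else empty_detected=no; fi
echo "empty assets/onnx detected: $empty_detected"

# With assets/onnx already present, mv places the clone inside it.
simulate_clone
mv "$work/supertonic-2" "$work/assets/onnx"
existing_dst="$(cd "$work" && find assets -name model.onnx)"
echo "destination exists:  $existing_dst"

# With assets/onnx absent, mv renames the clone to assets/onnx.
rm -rf "$work/assets" && mkdir -p "$work/assets"
simulate_clone
mv "$work/supertonic-2" "$work/assets/onnx"
absent_dst="$(cd "$work" && find assets -name model.onnx)"
echo "destination missing: $absent_dst"

rm -rf "$work"
```

Under that layout assumption, the model lands one or two levels below `assets/onnx/` rather than directly in it, so the `mv` target in the suggestion may need adjusting before use.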
