Add C++ runtime for kitten-tts by csukuangfj · Pull Request #2460 · k2-fsa/sherpa-onnx

csukuangfj · 2025-08-07T12:33:32Z

RTF comparison among kitten-tts, piper tts, matcha tts, and kokoro tts on my MacBook Pro

(num_threads == 1 is used)

Model	weight type	RTF	model file size
kitten-nano-en-v0_1-fp16.tar.bz2	float16	0.389	23 MB
vits-piper-en_US-libritts_r-medium.tar.bz2	float32	0.114	75 MB
vits-piper-en_US-libritts_r-medium-int8.tar.bz2	int8	0.320	22 MB
vits-piper-en_US-libritts_r-medium-fp16.tar.bz2	float16	0.123	38 MB
kokoro-en-v0_19.tar.bz2	float32	1.128	330 MB
kokoro-int8-en-v0_19.tar.bz2	int8	1.972	128 MB
matcha-icefall-en_US-ljspeech.tar.bz2	float32	0.118	acoustic model (71 MB), vocoder (51 MB)
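
For readers unfamiliar with the metric in the table: RTF (real-time factor) is conventionally the time spent synthesizing divided by the duration of the audio produced, so values below 1 mean faster than real time. The sketch below only illustrates that definition. The function name and parameters are hypothetical and are not taken from the PR.

```cpp
#include <cassert>
#include <cstddef>

// Real-time factor: synthesis time divided by the duration of the generated
// audio. A value below 1 means synthesis runs faster than real time.
// Hypothetical helper for illustration; not part of sherpa-onnx.
double RealTimeFactor(double elapsed_seconds, std::size_t num_samples,
                      int sample_rate) {
  assert(num_samples > 0 && sample_rate > 0);
  double audio_seconds = static_cast<double>(num_samples) / sample_rate;
  return elapsed_seconds / audio_seconds;
}
```

For example, spending 1 second to produce 2 seconds of audio gives an RTF of 0.5.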

info about my mac

Usage

1. Download the model

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/kitten-nano-en-v0_1-fp16.tar.bz2
tar xvf kitten-nano-en-v0_1-fp16.tar.bz2
rm kitten-nano-en-v0_1-fp16.tar.bz2

2. Build sherpa-onnx

3. Run it

for sid in 0 1 2 3 4 5 6 7; do
  build/bin/sherpa-onnx-offline-tts \
    --kitten-model=./kitten-nano-en-v0_1-fp16/model.fp16.onnx \
    --kitten-voices=./kitten-nano-en-v0_1-fp16/voices.bin \
    --kitten-tokens=./kitten-nano-en-v0_1-fp16/tokens.txt \
    --kitten-data-dir=./kitten-nano-en-v0_1-fp16/espeak-ng-data \
    --debug=1 \
    --sid=$sid \
    --output-filename="./kitten-$sid.wav" \
    "Today as always, men fall into two groups: slaves and free men. Whoever does not have two-thirds of his day for himself, is a slave, whatever he may be, a statesman, a businessman, an official, or a scholar."
done

The mapping between speaker IDs (sid) and speaker names is:

sid	0	1	2	3
speaker name	expr-voice-2-m	expr-voice-2-f	expr-voice-3-m	expr-voice-3-f

sid	4	5	6	7
speaker name	expr-voice-4-m	expr-voice-4-f	expr-voice-5-m	expr-voice-5-f
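
If the mapping above is needed in application code, it could be kept as a small lookup table. This is a minimal sketch: the names are copied from the table, while `kSpeakerNames` and `SpeakerName` are hypothetical helpers rather than anything sherpa-onnx provides.

```cpp
#include <array>
#include <cassert>
#include <string>

// Speaker names in sid order, copied from the mapping table above.
const std::array<std::string, 8> kSpeakerNames = {
    "expr-voice-2-m", "expr-voice-2-f", "expr-voice-3-m", "expr-voice-3-f",
    "expr-voice-4-m", "expr-voice-4-f", "expr-voice-5-m", "expr-voice-5-f"};

// Returns the speaker name for a sid in [0, 8).
// Hypothetical helper for illustration only.
const std::string &SpeakerName(int sid) {
  assert(sid >= 0 && sid < static_cast<int>(kSpeakerNames.size()));
  return kSpeakerNames[sid];
}
```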

The generated audio files are given below.

sid 0 (expr-voice-2-m)

kitten-0.mov

sid 1 (expr-voice-2-f)

kitten-1.mov

sid 2 (expr-voice-3-m)

kitten-2.mov

sid 3 (expr-voice-3-f)

kitten-3.mov

sid 4 (expr-voice-4-m)

kitten-4.mov

sid 5 (expr-voice-4-f)

kitten-5.mov

sid 6 (expr-voice-5-m)

kitten-6.mov

sid 7 (expr-voice-5-f)

kitten-7.mov

Summary by CodeRabbit

New Features
- Added support for a new "Kitten" offline text-to-speech (TTS) model, including configuration, metadata, phonemization, and synthesis pipeline.
- Enabled selection and configuration of the Kitten model in both C++ and Python APIs with comprehensive validation and error handling.
- Provided Python bindings for the Kitten TTS model configuration.
- Extended phoneme-to-token conversion and lexicon handling to support the Kitten model.
Bug Fixes
- Corrected error messages and documentation comments for model metadata and output shapes.
Documentation
- Updated internal documentation to reflect new model support and clarify usage of model metadata scripts.

coderabbitai · 2025-08-07T12:33:41Z

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

This change introduces support for the Kitten TTS model in the codebase. It adds new configuration, metadata, implementation, and Python binding files for the Kitten model, updates build scripts, and integrates Kitten model handling into the TTS creation, frontend, and phonemization logic. Existing structures and methods are extended to recognize and process the Kitten model alongside other supported TTS models.

Changes

Cohort / File(s)	Change Summary
Kitten Model Core Implementation `sherpa-onnx/csrc/offline-tts-kitten-model.h`, `sherpa-onnx/csrc/offline-tts-kitten-model.cc`, `sherpa-onnx/csrc/offline-tts-kitten-model-meta-data.h`, `sherpa-onnx/csrc/offline-tts-kitten-impl.h`	Introduced Kitten TTS model implementation, metadata, and main interface for offline TTS synthesis, including text normalization, tokenization, model inference, and audio generation.
Kitten Model Configuration `sherpa-onnx/csrc/offline-tts-kitten-model-config.h`, `sherpa-onnx/csrc/offline-tts-kitten-model-config.cc`	Added Kitten model configuration struct and implementation, including validation, registration, and string representation methods.
Kitten Model Python Bindings `sherpa-onnx/python/csrc/offline-tts-kitten-model-config.h`, `sherpa-onnx/python/csrc/offline-tts-kitten-model-config.cc`	Added Python bindings for the Kitten model configuration, exposing its members and methods to Python via pybind11.
TTS Model Config Integration `sherpa-onnx/csrc/offline-tts-model-config.h`, `sherpa-onnx/csrc/offline-tts-model-config.cc`, `sherpa-onnx/python/csrc/offline-tts-model-config.cc`	Extended core TTS model configuration structures and Python bindings to include the Kitten model as an option, updating constructors, validation, and string output.
TTS Model Factory and Control Flow `sherpa-onnx/csrc/offline-tts-impl.cc`	Updated the TTS implementation factory to recognize and instantiate the Kitten model, handle configuration, and adjust error handling for missing models.
Build Integration `sherpa-onnx/csrc/CMakeLists.txt`, `sherpa-onnx/python/csrc/CMakeLists.txt`	Added Kitten model source files to the build process for both C++ and Python components.
Phonemizer and Frontend Support `sherpa-onnx/csrc/piper-phonemize-lexicon.h`, `sherpa-onnx/csrc/piper-phonemize-lexicon.cc`, `sherpa-onnx/csrc/offline-tts-frontend.h`, `sherpa-onnx/csrc/kokoro-multi-lang-lexicon.cc`	Extended phonemizer and frontend logic to support Kitten model metadata and tokenization, adding constructors, flags, and renaming functions for shared logic between Kitten and Kokoro models.
Documentation and Minor Updates `sherpa-onnx/csrc/offline-tts-kokoro-model-meta-data.h`, `sherpa-onnx/csrc/offline-tts-kokoro-model.h`, `sherpa-onnx/csrc/offline-tts-kokoro-model.cc`, `sherpa-onnx/csrc/sherpa-onnx-offline-tts.cc`	Updated comments, error messages, and added minor logging for clarity and accuracy.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant OfflineTtsImpl
    participant OfflineTtsKittenImpl
    participant OfflineTtsKittenModel
    participant PiperPhonemizeLexicon

    User->>OfflineTtsImpl: Create(config)
    OfflineTtsImpl->>OfflineTtsKittenImpl: (if config.kitten.model set)
    OfflineTtsKittenImpl->>OfflineTtsKittenModel: Initialize with config
    OfflineTtsKittenImpl->>PiperPhonemizeLexicon: Initialize with Kitten metadata
    User->>OfflineTtsKittenImpl: Generate(text, sid, speed)
    OfflineTtsKittenImpl->>PiperPhonemizeLexicon: Tokenize text
    OfflineTtsKittenImpl->>OfflineTtsKittenModel: Run(token_ids, sid, speed)
    OfflineTtsKittenModel-->>OfflineTtsKittenImpl: Audio samples
    OfflineTtsKittenImpl-->>User: GeneratedAudio

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Assessment against linked issues

Objective	Addressed	Explanation
Support for Kitten TTS model (#2450)	✅
Integration of Kitten model into TTS config, factory, and frontend (#2450)	✅
Python binding for Kitten TTS config (#2450)	✅

Assessment against linked issues: Out-of-scope changes

No out-of-scope changes found.

Poem

A Kitten now purrs in the TTS den,
With code and configs, it speaks once again.
From phonemes to samples, the pipeline runs neat,
Now English and Kokoro quality meet!
🐾 The build scripts meow, the bindings do too—
This rabbit applauds what the devs did pursue!

Copilot

Pull Request Overview

This PR adds C++ runtime support for the Kitten TTS model, a new text-to-speech model that provides efficient voice synthesis with multiple speaker support. The implementation follows the existing pattern used for other TTS models like Kokoro and Vits.

Implements complete Kitten TTS model support including configuration, model loading, and inference
Adds Python bindings for the new Kitten TTS model configuration
Integrates Kitten TTS into the existing TTS pipeline with shared phonemization logic

Reviewed Changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
sherpa-onnx/csrc/offline-tts-kitten-model.h	Header defining the Kitten TTS model class interface
sherpa-onnx/csrc/offline-tts-kitten-model.cc	Core implementation of Kitten TTS model inference
sherpa-onnx/csrc/offline-tts-kitten-model-config.h	Configuration structure for Kitten TTS models
sherpa-onnx/csrc/offline-tts-kitten-model-config.cc	Implementation of Kitten TTS configuration with validation
sherpa-onnx/csrc/offline-tts-kitten-impl.h	High-level Kitten TTS implementation with text processing
sherpa-onnx/csrc/offline-tts-model-config.h	Updated main TTS config to include Kitten support
sherpa-onnx/csrc/offline-tts-model-config.cc	Updated validation logic for multiple TTS models
sherpa-onnx/csrc/piper-phonemize-lexicon.h	Extended phonemization to support Kitten models
sherpa-onnx/csrc/piper-phonemize-lexicon.cc	Shared phonemization logic between Kokoro and Kitten
sherpa-onnx/python/csrc/offline-tts-kitten-model-config.cc	Python bindings for Kitten TTS configuration

Copilot · 2025-08-07T12:35:11Z

+  OfflineTtsKittenModelMetaData meta_data_;
+  std::vector<int32_t> style_dim_;
+
+  // (num_speakers, style_dim_[0], style_dim_[2])


The comment refers to 'style_dim_[2]' but the code uses 'style_dim_[1]' throughout. This inconsistency could mislead developers about the data structure.

Suggested change

// (num_speakers, style_dim_[0], style_dim_[2])

// (num_speakers, style_dim_[0], style_dim_[1])
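
To make the shape comment concrete, here is a sketch of how a per-speaker style vector can be located in a flat, row-major buffer. It assumes a simplified 2-D layout `(num_speakers, dim)`, in the spirit of the `styles_.data() + sid * dim1` expression quoted elsewhere in this review. The actual layout in the PR may differ, and `StyleForSpeaker` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns a pointer to the first element of speaker `sid`'s style vector in a
// row-major buffer of shape (num_speakers, dim).
// Assumed layout for illustration; not copied from the PR.
const float *StyleForSpeaker(const std::vector<float> &styles,
                             int num_speakers, int dim, int sid) {
  assert(num_speakers > 0 && dim > 0);
  assert(sid >= 0 && sid < num_speakers);
  assert(styles.size() == static_cast<std::size_t>(num_speakers) * dim);
  return styles.data() + static_cast<std::size_t>(sid) * dim;
}
```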

Copilot · 2025-08-07T12:35:11Z

@@ -277,7 +277,6 @@ static std::vector<int64_t> CoquiPhonemesToIds(
 void InitEspeak(const std::string &data_dir) {
  static std::once_flag init_flag;
  std::call_once(init_flag, [data_dir]() {


[nitpick] The empty line removal at line 280 changes formatting without adding value and may indicate an unintentional modification during development.

Suggested change

std::call_once(init_flag, [data_dir]() {

std::call_once(init_flag, [data_dir]() {

coderabbitai

Actionable comments posted: 6

🔭 Outside diff range comments (2)

sherpa-onnx/python/csrc/offline-tts-kitten-model-config.cc (1)
1-32: Missing pybind11 include for py::module.

The file uses py::module and py::class_ but doesn't include the necessary pybind11 headers. This could lead to compilation errors.

Add the missing include at the top of the file:
 #include "sherpa-onnx/python/csrc/offline-tts-kitten-model-config.h"
 
 #include <string>
+
+#include "pybind11/pybind11.h"
 
 #include "sherpa-onnx/csrc/offline-tts-kitten-model-config.h"
sherpa-onnx/csrc/piper-phonemize-lexicon.cc (1)
526-555: Add template instantiations for Kitten model constructors.

The template instantiations are missing for the new Kitten model constructors on Android and OHOS platforms. This could lead to linking errors when using Kitten models on these platforms.

Add the missing template instantiations:
 template PiperPhonemizeLexicon::PiperPhonemizeLexicon(
     AAssetManager *mgr, const std::string &tokens, const std::string &data_dir,
     const OfflineTtsKokoroModelMetaData &kokoro_meta_data);
+
+template PiperPhonemizeLexicon::PiperPhonemizeLexicon(
+    AAssetManager *mgr, const std::string &tokens, const std::string &data_dir,
+    const OfflineTtsKittenModelMetaData &kitten_meta_data);
 #endif
 
 #if __OHOS__
And similarly for OHOS:
 template PiperPhonemizeLexicon::PiperPhonemizeLexicon(
     NativeResourceManager *mgr, const std::string &tokens,
     const std::string &data_dir,
     const OfflineTtsKokoroModelMetaData &kokoro_meta_data);
+
+template PiperPhonemizeLexicon::PiperPhonemizeLexicon(
+    NativeResourceManager *mgr, const std::string &tokens,
+    const std::string &data_dir,
+    const OfflineTtsKittenModelMetaData &kitten_meta_data);
 #endif

🧹 Nitpick comments (4)

sherpa-onnx/csrc/offline-tts-kitten-impl.h (1)
360-362: Use consistent types in tensor shape array

The array is declared as int64_t but initialized with a casted int32_t. Use int64_t directly for consistency.
-    std::array<int64_t, 2> x_shape = {1, static_cast<int32_t>(x.size())};
+    std::array<int64_t, 2> x_shape = {1, static_cast<int64_t>(x.size())};
sherpa-onnx/csrc/offline-tts-kitten-model.cc (3)
72-72: Remove misleading const comment

The comment /*const*/ is confusing. Since the pointer p is not const and the data it points to may be modified, remove the comment to avoid confusion.
-    /*const*/ float *p = styles_.data() + sid * dim1;
+    float *p = styles_.data() + sid * dim1;
94-94: Remove unnecessary std::move on return value

The std::move on the return statement is unnecessary due to NRVO/RVO optimizations.
-    return std::move(out[0]);
+    return out[0];
226-226: Fix incorrect array index in comment

The comment mentions style_dim_[2] but the array only has 2 elements (indices 0 and 1).
-  // (num_speakers, style_dim_[0], style_dim_[2])
+  // (num_speakers, style_dim_[0], style_dim_[1])
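
A caveat on the 94-94 nitpick above: copy elision and the implicit move on return apply when the returned expression names a local object. `out[0]` is an element access, not the name of a local variable, so it is treated as an ordinary lvalue. If the element type is move-only (which, as far as I know, `Ort::Value` is), removing `std::move` would make the return attempt a copy and fail to compile. A minimal sketch of the situation, using `std::unique_ptr<int>` as a hypothetical stand-in for the tensor type:

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a move-only tensor handle.
using Handle = std::unique_ptr<int>;

Handle TakeFirst(int value) {
  std::vector<Handle> out;
  out.push_back(std::make_unique<int>(value));
  assert(!out.empty());
  // `out[0]` refers to an element, not to a named local variable, so it is
  // not implicitly moved on return. Writing `return out[0];` here would try
  // to copy the unique_ptr and would not compile.
  return std::move(out[0]);
}
```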

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 08aaa89 and 57ddf3d.

📒 Files selected for processing (22)

sherpa-onnx/csrc/CMakeLists.txt (1 hunks)
sherpa-onnx/csrc/kokoro-multi-lang-lexicon.cc (1 hunks)
sherpa-onnx/csrc/offline-tts-frontend.h (1 hunks)
sherpa-onnx/csrc/offline-tts-impl.cc (3 hunks)
sherpa-onnx/csrc/offline-tts-kitten-impl.h (1 hunks)
sherpa-onnx/csrc/offline-tts-kitten-model-config.cc (1 hunks)
sherpa-onnx/csrc/offline-tts-kitten-model-config.h (1 hunks)
sherpa-onnx/csrc/offline-tts-kitten-model-meta-data.h (1 hunks)
sherpa-onnx/csrc/offline-tts-kitten-model.cc (1 hunks)
sherpa-onnx/csrc/offline-tts-kitten-model.h (1 hunks)
sherpa-onnx/csrc/offline-tts-kokoro-model-meta-data.h (1 hunks)
sherpa-onnx/csrc/offline-tts-kokoro-model.cc (1 hunks)
sherpa-onnx/csrc/offline-tts-kokoro-model.h (1 hunks)
sherpa-onnx/csrc/offline-tts-model-config.cc (3 hunks)
sherpa-onnx/csrc/offline-tts-model-config.h (3 hunks)
sherpa-onnx/csrc/piper-phonemize-lexicon.cc (5 hunks)
sherpa-onnx/csrc/piper-phonemize-lexicon.h (4 hunks)
sherpa-onnx/csrc/sherpa-onnx-offline-tts.cc (1 hunks)
sherpa-onnx/python/csrc/CMakeLists.txt (1 hunks)
sherpa-onnx/python/csrc/offline-tts-kitten-model-config.cc (1 hunks)
sherpa-onnx/python/csrc/offline-tts-kitten-model-config.h (1 hunks)
sherpa-onnx/python/csrc/offline-tts-model-config.cc (2 hunks)

🧰 Additional context used

🧠 Learnings (2)

📚 Learning: the sherpa-onnx jni library files are stored in hugging face repository at https://huggingface.co/cs...

Learnt from: litongjava
PR: k2-fsa/sherpa-onnx#2440
File: sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/core/Core.java:4-6
Timestamp: 2025-08-06T04:23:50.237Z
Learning: The sherpa-onnx JNI library files are stored in Hugging Face repository at https://huggingface.co/csukuangfj/sherpa-onnx-libs under versioned directories like jni/1.12.7/, and the actual Windows JNI library filename is "sherpa-onnx-jni.dll" as defined in Core.java constants.

Applied to files:

sherpa-onnx/csrc/CMakeLists.txt
sherpa-onnx/python/csrc/CMakeLists.txt
sherpa-onnx/python/csrc/offline-tts-kitten-model-config.h
sherpa-onnx/csrc/offline-tts-impl.cc
sherpa-onnx/python/csrc/offline-tts-model-config.cc
sherpa-onnx/python/csrc/offline-tts-kitten-model-config.cc
sherpa-onnx/csrc/offline-tts-model-config.h
sherpa-onnx/csrc/offline-tts-kitten-model-meta-data.h
sherpa-onnx/csrc/offline-tts-kokoro-model-meta-data.h
sherpa-onnx/csrc/offline-tts-kitten-model.h
sherpa-onnx/csrc/offline-tts-kitten-model.cc
sherpa-onnx/csrc/piper-phonemize-lexicon.h

📚 Learning: in sherpa-onnx java api, the native library names in core.java (win_native_library_name = "sherpa-on...

Learnt from: litongjava
PR: k2-fsa/sherpa-onnx#2440
File: sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/core/Core.java:4-6
Timestamp: 2025-08-06T04:18:47.981Z
Learning: In sherpa-onnx Java API, the native library names in Core.java (WIN_NATIVE_LIBRARY_NAME = "sherpa-onnx-jni.dll", UNIX_NATIVE_LIBRARY_NAME = "libsherpa-onnx-jni.so", MACOS_NATIVE_LIBRARY_NAME = "libsherpa-onnx-jni.dylib") are copied directly from the compiled binary filenames and should not be changed to match other libraries' naming conventions.

Applied to files:

sherpa-onnx/csrc/CMakeLists.txt
sherpa-onnx/python/csrc/offline-tts-kitten-model-config.h
sherpa-onnx/csrc/offline-tts-impl.cc
sherpa-onnx/python/csrc/offline-tts-model-config.cc
sherpa-onnx/python/csrc/offline-tts-kitten-model-config.cc

🧬 Code Graph Analysis (7)

sherpa-onnx/python/csrc/offline-tts-kitten-model-config.h (1)

sherpa-onnx/python/csrc/offline-tts-kitten-model-config.cc (2)

PybindOfflineTtsKittenModelConfig (13-29)

PybindOfflineTtsKittenModelConfig (13-13)

sherpa-onnx/csrc/offline-tts-frontend.h (1)

sherpa-onnx/csrc/piper-phonemize-lexicon.cc (2)

ConvertTextToTokenIdsKokoroOrKitten (463-489)

ConvertTextToTokenIdsKokoroOrKitten (463-466)

sherpa-onnx/csrc/kokoro-multi-lang-lexicon.cc (1)

sherpa-onnx/csrc/piper-phonemize-lexicon.cc (2)

ConvertTextToTokenIdsKokoroOrKitten (463-489)

ConvertTextToTokenIdsKokoroOrKitten (463-466)

sherpa-onnx/csrc/offline-tts-kitten-model-config.cc (1)

sherpa-onnx/csrc/offline-tts-model-config.cc (6)

Register (11-25)

Register (11-11)

Validate (27-52)

Validate (27-27)

ToString (54-67)

ToString (54-54)

sherpa-onnx/python/csrc/offline-tts-kitten-model-config.cc (1)

sherpa-onnx/python/csrc/offline-tts-vits-model-config.cc (1)

sherpa_onnx (11-37)

sherpa-onnx/csrc/piper-phonemize-lexicon.cc (3)

sherpa-onnx/csrc/piper-phonemize-lexicon.h (1)

PiperPhonemizeLexicon (20-74)

sherpa-onnx/csrc/offline-tts-character-frontend.cc (5)

ReadTokens (29-80)

ReadTokens (29-29)

is (85-85)

ConvertTextToTokenIds (99-192)

ConvertTextToTokenIds (99-100)

sherpa-onnx/csrc/file-utils.cc (6)

ReadFile (27-31)

ReadFile (27-27)

ReadFile (34-49)

ReadFile (34-34)

ReadFile (53-81)

ReadFile (53-54)

sherpa-onnx/csrc/piper-phonemize-lexicon.h (1)

sherpa-onnx/csrc/piper-phonemize-lexicon.cc (14)

PiperPhonemizeLexicon (307-317)

PiperPhonemizeLexicon (320-334)

PiperPhonemizeLexicon (336-346)

PiperPhonemizeLexicon (348-358)

PiperPhonemizeLexicon (360-370)

PiperPhonemizeLexicon (373-387)

PiperPhonemizeLexicon (390-404)

PiperPhonemizeLexicon (407-421)

PiperPhonemizeLexicon (527-529)

PiperPhonemizeLexicon (531-533)

PiperPhonemizeLexicon (535-537)

PiperPhonemizeLexicon (541-544)

PiperPhonemizeLexicon (546-549)

PiperPhonemizeLexicon (551-554)

🔇 Additional comments (36)

sherpa-onnx/csrc/sherpa-onnx-offline-tts.cc (1)

104-104: LGTM! Good addition for performance monitoring.

The thread count information complements the existing RTF and timing metrics, which is valuable for performance analysis and debugging.

sherpa-onnx/csrc/offline-tts-kokoro-model-meta-data.h (1)

14-16: LGTM! Improved documentation accuracy.

The updated references to version-specific metadata scripts provide clearer guidance for developers working with different Kokoro model versions.

sherpa-onnx/python/csrc/CMakeLists.txt (1)

70-70: LGTM! Correct addition for Kitten model Python bindings.

The new source file is properly added to the TTS-enabled build configuration, ensuring Python access to Kitten model configuration.

sherpa-onnx/csrc/CMakeLists.txt (1)

195-196: LGTM! Proper integration of Kitten model source files.

Both the configuration and implementation files are correctly added to the TTS-enabled build, following the established pattern for other TTS models.

sherpa-onnx/csrc/offline-tts-kokoro-model.cc (1)

173-173: LGTM! Corrected misleading error message.

The error message now accurately reflects the dimension being validated (style_dim[1] instead of style_dim[0]), improving debugging clarity.

sherpa-onnx/csrc/kokoro-multi-lang-lexicon.cc (1)

263-264: LGTM! Clean function rename for dual model support.

The function call has been properly updated to use ConvertTextToTokenIdsKokoroOrKitten instead of the Kokoro-specific function. This change extends support to both Kokoro and Kitten models while maintaining the same interface and parameters.

sherpa-onnx/python/csrc/offline-tts-kitten-model-config.h (1)

1-17: LGTM! Well-structured Python binding header.

The header file follows all best practices with proper include guards, copyright notice, minimal includes, and a clean function declaration. The structure is consistent with other Python binding headers in the codebase.

sherpa-onnx/csrc/offline-tts-kokoro-model.h (1)

26-27: LGTM! Documentation update improves clarity.

The comment has been corrected to accurately reflect that the Run method returns audio samples rather than mel spectrogram data. This documentation improvement aligns with the expected TTS output interface and provides clearer guidance for users of this method.

sherpa-onnx/csrc/offline-tts-frontend.h (1)

62-62: LGTM! Function declaration updated for dual model support.

The function declaration has been properly renamed to ConvertTextToTokenIdsKokoroOrKitten to reflect support for both Kokoro and Kitten models. The signature remains unchanged, maintaining backward compatibility while clearly indicating the extended functionality.

sherpa-onnx/csrc/offline-tts-kitten-model-meta-data.h (1)

15-24: LGTM! Well-designed metadata struct for Kitten model.

The OfflineTtsKittenModelMetaData struct is well-structured with appropriate fields for TTS model configuration. The default values are sensible:

has_espeak = 1 enables espeak support by default

version = 1 provides a reasonable initial version

max_token_len = 256 sets a practical token sequence limit

Other fields defaulted to 0 will be populated from the model

The reference to the external script provides helpful context for understanding metadata generation.

sherpa-onnx/python/csrc/offline-tts-kitten-model-config.cc (1)

13-29: LGTM! Consistent Python binding implementation.

The binding implementation follows the established pattern from other TTS model configs, correctly exposing constructors, member variables, and methods. The parameter naming and structure are appropriate for the Kitten model configuration.

sherpa-onnx/csrc/offline-tts-model-config.cc (3)

15-15: LGTM! Proper integration of Kitten model registration.

The Kitten model is correctly integrated into the command-line option registration alongside other TTS models.

41-51: LGTM! Consistent validation logic for Kitten model.

The validation follows the same pattern as other TTS models, checking for non-empty model path and delegating to the specific model's validation method. The updated error message is appropriate for the general case.

61-61: LGTM! Proper inclusion in string representation.

The Kitten model configuration is correctly included in the ToString output, maintaining consistency with other model configurations.

sherpa-onnx/csrc/offline-tts-model-config.h (3)

10-10: LGTM! Proper header inclusion.

The include for the Kitten model config header is correctly placed in alphabetical order with other TTS model headers.

22-22: LGTM! Consistent member variable addition.

The Kitten model configuration member is properly added alongside other TTS model configurations.

33-42: LGTM! Proper constructor integration.

The constructor parameter and member initialization for the Kitten model configuration follow the established pattern and maintain consistency with other TTS models.

sherpa-onnx/csrc/offline-tts-impl.cc (3)

19-19: LGTM! Proper header inclusion.

The include for the Kitten implementation is correctly added to support the new model type.

44-52: LGTM! Improved factory method with proper error handling.

The factory method now correctly handles all TTS model types including Kitten, and properly returns null instead of defaulting to an incorrect model when no model is specified. This is a significant improvement in error handling.

62-69: LGTM! Consistent template method implementation.

The template factory method maintains the same improved logic as the regular factory method, ensuring consistent behavior across both overloads.

sherpa-onnx/csrc/offline-tts-kitten-model-config.cc (2)

15-25: LGTM! Well-structured command-line option registration.

The command-line options are properly named with consistent "kitten-" prefix and have clear, descriptive help text. The registration follows the established pattern.

84-95: LGTM! Standard ToString implementation.

The string representation includes all configuration fields and follows the established pattern used by other TTS model configurations.

sherpa-onnx/csrc/offline-tts-kitten-model-config.h (1)

1-44: LGTM! Well-structured configuration header.

The header follows established patterns from other model configurations in the codebase, with appropriate member variables, constructors, and method declarations. The structure is consistent and the default length_scale = 1.0 provides reasonable speed behavior.

sherpa-onnx/python/csrc/offline-tts-model-config.cc (3)

10-10: LGTM! Proper include for Kitten model bindings.

The include directive follows the established pattern for other model configuration headers.

21-21: LGTM! Consistent binding registration.

The call to PybindOfflineTtsKittenModelConfig(m) follows the same pattern as other model binding registrations.

29-30: LGTM! Complete Python API integration.

The constructor parameter addition and property exposure for the kitten model are consistent with the established patterns for other TTS models (vits, matcha, kokoro). The default value initialization ensures backward compatibility.

Also applies to: 35-35, 41-41

sherpa-onnx/csrc/offline-tts-kitten-model.h (1)

17-36: LGTM! Well-designed model class interface.

The OfflineTtsKittenModel class declaration follows established patterns:

Uses Pimpl idiom for implementation hiding

Provides both direct and template constructors for resource management flexibility

Run method signature aligns with TTS model expectations (tensor input, speaker ID, speed control)

Const reference return from GetMetaData() provides safe metadata access

The interface design is consistent with other TTS model classes in the codebase.

sherpa-onnx/csrc/piper-phonemize-lexicon.h (3)

13-13: LGTM! Appropriate include for Kitten metadata.

The include directive follows the established pattern for other model metadata headers.

31-32: LGTM! Consistent constructor declarations.

The constructor declarations for both regular and template versions follow the established patterns for other model types (Vits, Matcha, Kokoro). The template version enables resource manager flexibility for Android/OHOS platforms.

Also applies to: 49-52

70-70: LGTM! Proper member additions.

The kitten_meta_data_ member and is_kitten_ flag follow the established naming conventions and patterns used by other model types in the class.

Also applies to: 73-73

sherpa-onnx/csrc/piper-phonemize-lexicon.cc (5)

183-183: LGTM! Appropriate function renaming for model unification.

Renaming PiperPhonemesToIdsKokoro to PiperPhonemesToIdsKokoroOrKitten makes sense since the same phoneme processing logic can be shared between Kokoro and Kitten models.

360-370: LGTM! Consistent constructor implementation.

The Kitten model constructor follows the established pattern from other model constructors:

Properly initializes kitten_meta_data_ and sets is_kitten_ flag

Uses the same ReadTokens function for token loading

Calls InitEspeak for phoneme processing setup

406-421: LGTM! Consistent template constructor implementation.

The template constructor for resource manager support follows the same pattern as other model types:

Uses ReadFile with manager to load tokens from assets/resources

Creates std::istrstream for token parsing

Maintains the same initialization flow as the file-based constructor

428-432: LGTM! Proper branching logic for Kitten model.

The addition of the is_kitten_ branch follows the established pattern and correctly uses kitten_meta_data_.max_token_len parameter with the unified conversion function.

463-463: LGTM! Consistent function renaming and usage.

The function rename from ConvertTextToTokenIdsKokoro to ConvertTextToTokenIdsKokoroOrKitten properly reflects the unified functionality, and the call site is updated appropriately to use the renamed function.

Also applies to: 480-481
sherpa-onnx/csrc/offline-tts-kitten-model.cc (1)

204-206: No alignment issue with reinterpret_cast here

The voices_data pointer always comes from std::vector<char>::data(), which uses the default allocator (calling the global operator new), and that guarantees alignment to at least alignof(std::max_align_t). Since alignof(float) ≤ alignof(std::max_align_t), the cast reinterpret_cast<const float *>(voices_data) is safe on all standard-conforming C++ implementations. No runtime alignment check or change to memcpy is required here.
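The alignment argument above holds when the pointer is the start of the vector's buffer. For comparison, here is a minimal sketch of the memcpy alternative the comment mentions, which would also tolerate data starting at an arbitrary byte offset. CopyFloats is a hypothetical helper written for illustration; it is not part of sherpa-onnx.

```cpp
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not a sherpa-onnx function): copy `count` floats out
// of a raw byte buffer, starting at `byte_offset`. std::memcpy places no
// alignment requirement on the source, so odd offsets are fine.
std::vector<float> CopyFloats(const std::vector<char> &bytes,
                              std::size_t byte_offset, std::size_t count) {
  // Reject requests that would read past the end of the buffer.
  if (byte_offset > bytes.size() ||
      count > (bytes.size() - byte_offset) / sizeof(float)) {
    throw std::out_of_range("buffer too small for requested floats");
  }

  std::vector<float> out(count);
  if (count > 0) {
    std::memcpy(out.data(), bytes.data() + byte_offset,
                count * sizeof(float));
  }
  return out;
}
```

The trade-off is one extra copy in exchange for not having to reason about alignment at all.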

coderabbitai · 2025-08-07T12:42:22Z

+      SHERPA_ONNX_LOGE("Support only batch_size == 1. Given: %d",
+                       static_cast<int32_t>(x_shape[0]));
+      exit(-1);
+    }


⚠️ Potential issue

Replace exit(-1) with exception for better error handling

Using exit(-1) terminates the entire program abruptly. Consider throwing an exception to allow callers to handle the error gracefully.

     if (x_shape[0] != 1) {
-      SHERPA_ONNX_LOGE("Support only batch_size == 1. Given: %d",
-                       static_cast<int32_t>(x_shape[0]));
-      exit(-1);
+      std::string error_msg = "Support only batch_size == 1. Given: " +
+                              std::to_string(x_shape[0]);
+      SHERPA_ONNX_LOGE("%s", error_msg.c_str());
+      throw std::runtime_error(error_msg);
     }

🤖 Prompt for AI Agents

In sherpa-onnx/csrc/offline-tts-kitten-model.cc around lines 64 to 67, replace the call to exit(-1) with throwing an appropriate exception to signal the error. This change will allow the caller to catch and handle the error instead of terminating the program abruptly. Use a standard exception type or define a custom one if needed, and include a descriptive error message indicating that only batch_size == 1 is supported.
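To illustrate what "allow the caller to handle the error" could look like, here is a minimal sketch. CheckBatchSize and TryRun are hypothetical names made up for this example, and the logging macro is left out so the snippet compiles on its own; it is not the code in the PR.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the batch-size check in the model's Run method.
// It throws instead of terminating the process.
void CheckBatchSize(const std::vector<int64_t> &x_shape) {
  if (x_shape.empty()) {
    throw std::runtime_error("Input shape is empty");
  }
  if (x_shape[0] != 1) {
    throw std::runtime_error("Support only batch_size == 1. Given: " +
                             std::to_string(x_shape[0]));
  }
}

// Hypothetical caller: converts the exception into a status plus a message,
// so an application embedding the library can keep running.
bool TryRun(const std::vector<int64_t> &x_shape, std::string *error) {
  try {
    CheckBatchSize(x_shape);
    return true;
  } catch (const std::runtime_error &e) {
    if (error != nullptr) {
      *error = e.what();
    }
    return false;
  }
}
```

Whether exceptions are appropriate depends on how the library is built and consumed; builds with exceptions disabled, or callers across a C API boundary, would need a different mechanism such as an error code.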

coderabbitai · 2025-08-07T12:42:23Z

+    if (model_type != "kitten-tts") {
+      SHERPA_ONNX_LOGE(
+          "Please download the kitten tts model from us containing meta data");
+      SHERPA_ONNX_EXIT(-1);
+    }


🛠️ Refactor suggestion

Consider using exceptions instead of SHERPA_ONNX_EXIT(-1)

Multiple validation failures use SHERPA_ONNX_EXIT(-1) which terminates the program. Consider throwing exceptions for better error handling and recovery options.

Example refactor for lines 137-141:

     if (model_type != "kitten-tts") {
-      SHERPA_ONNX_LOGE(
-          "Please download the kitten tts model from us containing meta data");
-      SHERPA_ONNX_EXIT(-1);
+      std::string error_msg =
+          "Please download the kitten tts model from us containing meta data";
+      SHERPA_ONNX_LOGE("%s", error_msg.c_str());
+      throw std::runtime_error(error_msg);
     }

Also applies to: 150-152, 173-181, 201-201

🤖 Prompt for AI Agents

In sherpa-onnx/csrc/offline-tts-kitten-model.cc lines 137 to 141, replace the use of SHERPA_ONNX_EXIT(-1) with throwing a suitable exception to allow better error handling and recovery. Change the code to throw an exception after logging the error instead of terminating the program immediately. Apply the same refactor pattern to the similar cases at lines 150-152, 173-181, and 201.

coderabbitai

Actionable comments posted: 0

♻️ Duplicate comments (5)

sherpa-onnx/csrc/offline-tts-kitten-model.cc (3)
110-110: Debug output incorrectly identifies this as 'kitten model' when it should match the class name

This could be confusing when debugging multiple model types.
-      os << "---kitten model---\n";
+      os << "---OfflineTtsKittenModel---\n";
63-67: Replace SHERPA_ONNX_EXIT(-1) with exception for better error handling

Using SHERPA_ONNX_EXIT(-1) terminates the entire program abruptly. Consider throwing an exception to allow callers to handle the error gracefully.
     if (x_shape[0] != 1) {
-      SHERPA_ONNX_LOGE("Support only batch_size == 1. Given: %d",
-                       static_cast<int32_t>(x_shape[0]));
-      SHERPA_ONNX_EXIT(-1);
+      std::string error_msg = "Support only batch_size == 1. Given: " + 
+                             std::to_string(x_shape[0]);
+      SHERPA_ONNX_LOGE("%s", error_msg.c_str());
+      throw std::runtime_error(error_msg);
     }
137-141: Consider using exceptions instead of SHERPA_ONNX_EXIT(-1)

Multiple validation failures use SHERPA_ONNX_EXIT(-1) which terminates the program. Consider throwing exceptions for better error handling and recovery options.

Example refactor for lines 137-141:
     if (model_type != "kitten-tts") {
-      SHERPA_ONNX_LOGE(
-          "Please download the kitten tts model from us containing meta data");
-      SHERPA_ONNX_EXIT(-1);
+      std::string error_msg = 
+          "Please download the kitten tts model from us containing meta data";
+      SHERPA_ONNX_LOGE("%s", error_msg.c_str());
+      throw std::runtime_error(error_msg);
     }
Apply similar changes to lines 150-152, 173-181, and 201.

Also applies to: 150-152, 173-181, 201-201
sherpa-onnx/csrc/offline-tts-kitten-impl.h (2)
11-11: Replace deprecated <strstream> with <sstream>

The header <strstream> has been deprecated since C++98. Use <sstream> instead for better portability and future compatibility.
-#include <strstream>
+#include <sstream>
105-105: Replace deprecated std::istrstream with std::istringstream

The templated constructor uses deprecated std::istrstream. Replace with std::istringstream for modern C++ compliance.
         auto buf = ReadFile(mgr, f);
-        std::istrstream is(buf.data(), buf.size());
+        std::istringstream is(std::string(buf.data(), buf.size()));
         tn_list_.push_back(std::make_unique<kaldifst::TextNormalizer>(is));
And similarly on line 127:
         std::unique_ptr<std::istream> s(
-            new std::istrstream(buf.data(), buf.size()));
+            new std::istringstream(std::string(buf.data(), buf.size())));
Also applies to: 127-127
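As a self-contained illustration of the std::istringstream replacement, the sketch below parses an in-memory buffer the way a file read from an asset manager might be parsed. ParseTokens is a hypothetical function, and the "symbol id" line format is an assumption made for the example, not a statement about the actual tokens.txt layout.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical parser: reads whitespace-separated "symbol id" pairs from a
// byte buffer. The buffer format is assumed for illustration only.
std::unordered_map<std::string, int32_t> ParseTokens(
    const std::vector<char> &buf) {
  // std::istringstream owns a copy of the text, unlike the deprecated
  // std::istrstream, which reads from a caller-owned char array.
  const std::string text(buf.data(), buf.size());
  std::istringstream is(text);

  std::unordered_map<std::string, int32_t> token2id;
  std::string symbol;
  int32_t id = 0;
  while (is >> symbol >> id) {
    token2id[symbol] = id;
  }
  return token2id;
}
```

The copy into a std::string costs some memory for large buffers, which is the main practical difference from std::istrstream.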

🧹 Nitpick comments (3)

sherpa-onnx/csrc/offline-tts-kitten-model.cc (2)
72-72: Make pointer const as indicated by the comment

The comment /*const*/ suggests this pointer should be const-qualified.
-    /*const*/ float *p = styles_.data() + sid * dim1;
+    const float *p = styles_.data() + sid * dim1;
184-185: Simplify calculation since style_dim_[0] is always 1

Since style_dim_[0] is validated to be 1 (lines 178-181), the multiplication is unnecessary.
     int32_t expected_num_floats =
-        style_dim_[0] * style_dim_[1] * meta_data_.num_speakers;
+        style_dim_[1] * meta_data_.num_speakers;
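The kind of size check this expression feeds into can be sketched as follows. VoicesSizeMatches is a hypothetical helper, and the style dimension used in the usage note is an assumed value; the real code compares against the loaded voices data in its own way.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical check: does a voices buffer of `num_bytes` bytes hold exactly
// one style vector of `style_dim1` floats per speaker? This assumes the
// leading style dimension is 1, as the nitpick above states.
bool VoicesSizeMatches(std::size_t num_bytes, int32_t num_speakers,
                       int32_t style_dim1) {
  if (num_speakers <= 0 || style_dim1 <= 0) {
    return false;
  }
  const std::size_t expected_num_floats =
      static_cast<std::size_t>(num_speakers) *
      static_cast<std::size_t>(style_dim1);
  return num_bytes == expected_num_floats * sizeof(float);
}
```

For example, with the 8 speakers listed in the PR description and an assumed style dimension of 256, the buffer would need to hold 8 * 256 floats.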
sherpa-onnx/csrc/offline-tts-kitten-impl.h (1)

151-327: Consider refactoring the long Generate method for better maintainability

The Generate method spans 176 lines with multiple responsibilities: validation, text normalization, tokenization, batching, and audio processing. Consider extracting helper methods for better readability and testability.

Consider extracting methods like:

ValidateSpeakerId(sid, num_speakers) for speaker validation logic (lines 157-184)

NormalizeText(text) for text normalization (lines 209-220)

ProcessBatches(x, sid, speed, callback) for batch processing logic (lines 287-324)
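A rough sketch of the first suggested helper is shown below. The name ValidateSpeakerId comes from the suggestion above, but the signature's return type and the choice to fall back to speaker 0 for an out-of-range ID are assumptions made for illustration; the actual Generate method may handle invalid IDs differently.

```cpp
#include <cstdint>

// Hypothetical helper extracted from a long Generate method.
// Returns `sid` when it lies in [0, num_speakers); otherwise returns 0.
// Falling back to 0 is an assumed policy, not confirmed by the PR.
int32_t ValidateSpeakerId(int32_t sid, int32_t num_speakers) {
  if (num_speakers <= 0) {
    return 0;
  }
  if (sid < 0 || sid >= num_speakers) {
    return 0;
  }
  return sid;
}
```

Pulling the check out this way makes it testable in isolation, which is the main benefit the nitpick is pointing at.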

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 57ddf3d and 8742c63.

📒 Files selected for processing (3)

sherpa-onnx/csrc/offline-tts-kitten-impl.h (1 hunks)
sherpa-onnx/csrc/offline-tts-kitten-model-config.cc (1 hunks)
sherpa-onnx/csrc/offline-tts-kitten-model.cc (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

sherpa-onnx/csrc/offline-tts-kitten-model-config.cc

🧰 Additional context used

🧠 Learnings (1)

📚 Learning: the sherpa-onnx jni library files are stored in hugging face repository at https://huggingface.co/cs...

Learnt from: litongjava
PR: k2-fsa/sherpa-onnx#2440
File: sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/core/Core.java:4-6
Timestamp: 2025-08-06T04:23:50.237Z
Learning: The sherpa-onnx JNI library files are stored in Hugging Face repository at https://huggingface.co/csukuangfj/sherpa-onnx-libs under versioned directories like jni/1.12.7/, and the actual Windows JNI library filename is "sherpa-onnx-jni.dll" as defined in Core.java constants.

Applied to files:

sherpa-onnx/csrc/offline-tts-kitten-model.cc

csukuangfj added 5 commits August 7, 2025 18:31

Add model config to kitten tts

78316c3

Add Python API for Kitten TTS model

20c44a0

Add meta data for kitten tts model

9ece4bf

Add kitten tts model

2e1857f

first working version

57ddf3d

csukuangfj requested a review from Copilot August 7, 2025 12:34

Copilot AI reviewed Aug 7, 2025

csukuangfj mentioned this pull request Aug 7, 2025

https://github.com/KittenML/KittenTTS TTS model support #2450

Closed

coderabbitai Bot reviewed Aug 7, 2025

csukuangfj mentioned this pull request Aug 7, 2025

CPU speed comparison among KittenTTS, Piper, MatchaTTS, and Kokoro TTS KittenML/KittenTTS#40

Open

small fixes

8742c63

coderabbitai Bot reviewed Aug 7, 2025

csukuangfj merged commit 090e4f4 into k2-fsa:master Aug 7, 2025
47 of 228 checks passed

csukuangfj deleted the cpp-kitten-tts branch August 7, 2025 13:47

This was referenced Aug 7, 2025

Add Kotlin and Java API for KittenTTS #2461

Merged

Add Android TTS Engine APK for KittenTTS #2465

Merged

coderabbitai Bot mentioned this pull request Aug 16, 2025

Export https://huggingface.co/nvidia/parakeet-tdt-0.6b-v3 to sherpa-onnx #2500

Merged

coderabbitai Bot mentioned this pull request Sep 11, 2025

Add Pipe TTS model support #2591

Closed

coderabbitai Bot mentioned this pull request Oct 9, 2025

Add missing python class definitions for builds without TTS support #2660

Merged

This was referenced Jan 26, 2026

Add C++ runtime and Python support PocketTTS for streaming voice cloning on CPU #3083

Merged

Add Java and Kotlin API for PocketTTS #3095

Merged

Add Supertonic2 TTS support #3094

Merged

coderabbitai Bot mentioned this pull request Feb 24, 2026

Fix hclust_cpp build noise: FetchContent_Populate deprecation and #pragma message #3216

Merged

4 tasks

This was referenced Mar 9, 2026

kokoro TTS Supported axcl backend #3270

Open

kokoro TTS Supported axera backend #3281

Open

coderabbitai Bot mentioned this pull request May 8, 2026

Add KittenTTS v0.8 support #3591

Open

	// (num_speakers, style_dim_[0], style_dim_[2])
	// (num_speakers, style_dim_[0], style_dim_[1])
