
Add C++ runtime for kitten-tts #2460

Merged: csukuangfj merged 6 commits into k2-fsa:master from csukuangfj:cpp-kitten-tts on Aug 7, 2025
@csukuangfj (Collaborator) commented Aug 7, 2025

Fixes #2450
CC @BarfingLemurs @Ashoka74

RTF comparison among kitten-tts, piper tts, matcha tts, and kokoro tts on my MacBook Pro

(num_threads == 1 is used)

| Model | Weight type | RTF | Model file size |
|---|---|---|---|
| kitten-nano-en-v0_1-fp16.tar.bz2 | float16 | 0.389 | 23 MB |
| vits-piper-en_US-libritts_r-medium.tar.bz2 | float32 | 0.114 | 75 MB |
| vits-piper-en_US-libritts_r-medium-int8.tar.bz2 | int8 | 0.320 | 22 MB |
| vits-piper-en_US-libritts_r-medium-fp16.tar.bz2 | float16 | 0.123 | 38 MB |
| kokoro-en-v0_19.tar.bz2 | float32 | 1.128 | 330 MB |
| kokoro-int8-en-v0_19.tar.bz2 | int8 | 1.972 | 128 MB |
| matcha-icefall-en_US-ljspeech.tar.bz2 | float32 | 0.118 | acoustic model (71 MB), vocoder (51 MB) |
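
For readers unfamiliar with the RTF column: the real-time factor is conventionally the synthesis time divided by the duration of the generated audio, so values below 1 mean faster than real time. A minimal sketch of that calculation follows; the helper name `ComputeRtf` and the 24000 Hz sample rate in the usage note are illustrative assumptions, not taken from sherpa-onnx.

```cpp
#include <cassert>
#include <cstdint>

// Real-time factor: seconds of compute per second of generated audio.
// ComputeRtf is a hypothetical helper name used only for this sketch.
double ComputeRtf(double elapsed_seconds, int64_t num_samples,
                  int32_t sample_rate) {
  assert(num_samples > 0 && sample_rate > 0);
  double audio_seconds = static_cast<double>(num_samples) / sample_rate;
  return elapsed_seconds / audio_seconds;
}
```

For example, `ComputeRtf(1.0, 96000, 24000)` describes 4 seconds of audio generated in 1 second of compute.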

Info about my Mac: [Screenshot 2025-08-07 at 20 30 04]

Usage

1. Download the model

wget https://github.com/k2-fsa/sherpa-onnx/releases/download/tts-models/kitten-nano-en-v0_1-fp16.tar.bz2
tar xvf kitten-nano-en-v0_1-fp16.tar.bz2
rm kitten-nano-en-v0_1-fp16.tar.bz2

2. Build sherpa-onnx

3. Run it

for sid in 0 1 2 3 4 5 6 7; do
  build/bin/sherpa-onnx-offline-tts \
    --kitten-model=./kitten-nano-en-v0_1-fp16/model.fp16.onnx \
    --kitten-voices=./kitten-nano-en-v0_1-fp16/voices.bin \
    --kitten-tokens=./kitten-nano-en-v0_1-fp16/tokens.txt \
    --kitten-data-dir=./kitten-nano-en-v0_1-fp16/espeak-ng-data \
    --debug=1 \
    --sid=$sid \
    --output-filename="./kitten-$sid.wav" \
    "Today as always, men fall into two groups: slaves and free men. Whoever does not have two-thirds of his day for himself, is a slave, whatever he may be, a statesman, a businessman, an official, or a scholar."
done

The mapping between speaker IDs (sid) and speaker names is

| sid | speaker name |
|---|---|
| 0 | expr-voice-2-m |
| 1 | expr-voice-2-f |
| 2 | expr-voice-3-m |
| 3 | expr-voice-3-f |
| 4 | expr-voice-4-m |
| 5 | expr-voice-4-f |
| 6 | expr-voice-5-m |
| 7 | expr-voice-5-f |
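
If the mapping above is needed in application code, it can be kept as a small lookup table. This is only a sketch: the names `kKittenSpeakers` and `SpeakerName` are hypothetical and not part of the sherpa-onnx API; the speaker names themselves are copied from the mapping.

```cpp
#include <array>
#include <cstdint>
#include <string>

// Speaker names in sid order, copied from the mapping in this PR description.
static const std::array<const char *, 8> kKittenSpeakers = {
    "expr-voice-2-m", "expr-voice-2-f", "expr-voice-3-m", "expr-voice-3-f",
    "expr-voice-4-m", "expr-voice-4-f", "expr-voice-5-m", "expr-voice-5-f"};

// Returns the speaker name for a sid, or an empty string if sid is out of
// range. Hypothetical helper for illustration only.
std::string SpeakerName(int32_t sid) {
  if (sid < 0 || sid >= static_cast<int32_t>(kKittenSpeakers.size())) {
    return "";
  }
  return kKittenSpeakers[sid];
}
```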

The generated audio files are given below.

sid 0 (expr-voice-2-m)

kitten-0.mov

sid 1 (expr-voice-2-f)

kitten-1.mov

sid 2 (expr-voice-3-m)

kitten-2.mov

sid 3 (expr-voice-3-f)

kitten-3.mov

sid 4 (expr-voice-4-m)

kitten-4.mov

sid 5 (expr-voice-4-f)

kitten-5.mov

sid 6 (expr-voice-5-m)

kitten-6.mov

sid 7 (expr-voice-5-f)

kitten-7.mov

Summary by CodeRabbit

  • New Features

    • Added support for a new "Kitten" offline text-to-speech (TTS) model, including configuration, metadata, phonemization, and synthesis pipeline.
    • Enabled selection and configuration of the Kitten model in both C++ and Python APIs with comprehensive validation and error handling.
    • Provided Python bindings for the Kitten TTS model configuration.
    • Extended phoneme-to-token conversion and lexicon handling to support the Kitten model.
  • Bug Fixes

    • Corrected error messages and documentation comments for model metadata and output shapes.
  • Documentation

    • Updated internal documentation to reflect new model support and clarify usage of model metadata scripts.

coderabbitai Bot commented Aug 7, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

This change introduces support for the Kitten TTS model in the codebase. It adds new configuration, metadata, implementation, and Python binding files for the Kitten model, updates build scripts, and integrates Kitten model handling into the TTS creation, frontend, and phonemization logic. Existing structures and methods are extended to recognize and process the Kitten model alongside other supported TTS models.

Changes

| Cohort / File(s) | Change Summary |
|---|---|
| Kitten Model Core Implementation: sherpa-onnx/csrc/offline-tts-kitten-model.h, sherpa-onnx/csrc/offline-tts-kitten-model.cc, sherpa-onnx/csrc/offline-tts-kitten-model-meta-data.h, sherpa-onnx/csrc/offline-tts-kitten-impl.h | Introduced Kitten TTS model implementation, metadata, and main interface for offline TTS synthesis, including text normalization, tokenization, model inference, and audio generation. |
| Kitten Model Configuration: sherpa-onnx/csrc/offline-tts-kitten-model-config.h, sherpa-onnx/csrc/offline-tts-kitten-model-config.cc | Added Kitten model configuration struct and implementation, including validation, registration, and string representation methods. |
| Kitten Model Python Bindings: sherpa-onnx/python/csrc/offline-tts-kitten-model-config.h, sherpa-onnx/python/csrc/offline-tts-kitten-model-config.cc | Added Python bindings for the Kitten model configuration, exposing its members and methods to Python via pybind11. |
| TTS Model Config Integration: sherpa-onnx/csrc/offline-tts-model-config.h, sherpa-onnx/csrc/offline-tts-model-config.cc, sherpa-onnx/python/csrc/offline-tts-model-config.cc | Extended core TTS model configuration structures and Python bindings to include the Kitten model as an option, updating constructors, validation, and string output. |
| TTS Model Factory and Control Flow: sherpa-onnx/csrc/offline-tts-impl.cc | Updated the TTS implementation factory to recognize and instantiate the Kitten model, handle configuration, and adjust error handling for missing models. |
| Build Integration: sherpa-onnx/csrc/CMakeLists.txt, sherpa-onnx/python/csrc/CMakeLists.txt | Added Kitten model source files to the build process for both C++ and Python components. |
| Phonemizer and Frontend Support: sherpa-onnx/csrc/piper-phonemize-lexicon.h, sherpa-onnx/csrc/piper-phonemize-lexicon.cc, sherpa-onnx/csrc/offline-tts-frontend.h, sherpa-onnx/csrc/kokoro-multi-lang-lexicon.cc | Extended phonemizer and frontend logic to support Kitten model metadata and tokenization, adding constructors, flags, and renaming functions for shared logic between Kitten and Kokoro models. |
| Documentation and Minor Updates: sherpa-onnx/csrc/offline-tts-kokoro-model-meta-data.h, sherpa-onnx/csrc/offline-tts-kokoro-model.h, sherpa-onnx/csrc/offline-tts-kokoro-model.cc, sherpa-onnx/csrc/sherpa-onnx-offline-tts.cc | Updated comments, error messages, and added minor logging for clarity and accuracy. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant OfflineTtsImpl
    participant OfflineTtsKittenImpl
    participant OfflineTtsKittenModel
    participant PiperPhonemizeLexicon

    User->>OfflineTtsImpl: Create(config)
    OfflineTtsImpl->>OfflineTtsKittenImpl: (if config.kitten.model set)
    OfflineTtsKittenImpl->>OfflineTtsKittenModel: Initialize with config
    OfflineTtsKittenImpl->>PiperPhonemizeLexicon: Initialize with Kitten metadata
    User->>OfflineTtsKittenImpl: Generate(text, sid, speed)
    OfflineTtsKittenImpl->>PiperPhonemizeLexicon: Tokenize text
    OfflineTtsKittenImpl->>OfflineTtsKittenModel: Run(token_ids, sid, speed)
    OfflineTtsKittenModel-->>OfflineTtsKittenImpl: Audio samples
    OfflineTtsKittenImpl-->>User: GeneratedAudio

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Assessment against linked issues

Objective Addressed Explanation
Support for Kitten TTS model (#2450)
Integration of Kitten model into TTS config, factory, and frontend (#2450)
Python binding for Kitten TTS config (#2450)

Assessment against linked issues: Out-of-scope changes

No out-of-scope changes found.

Poem

A Kitten now purrs in the TTS den,
With code and configs, it speaks once again.
From phonemes to samples, the pipeline runs neat,
Now English and Kokoro quality meet!
🐾 The build scripts meow, the bindings do too—
This rabbit applauds what the devs did pursue!


@csukuangfj csukuangfj requested a review from Copilot August 7, 2025 12:34

Copilot AI left a comment

Pull Request Overview

This PR adds C++ runtime support for the Kitten TTS model, a new text-to-speech model that provides efficient voice synthesis with multiple speaker support. The implementation follows the existing pattern used for other TTS models like Kokoro and Vits.

  • Implements complete Kitten TTS model support including configuration, model loading, and inference
  • Adds Python bindings for the new Kitten TTS model configuration
  • Integrates Kitten TTS into the existing TTS pipeline with shared phonemization logic

Reviewed Changes

Copilot reviewed 22 out of 22 changed files in this pull request and generated 4 comments.

| File | Description |
|---|---|
| sherpa-onnx/csrc/offline-tts-kitten-model.h | Header defining the Kitten TTS model class interface |
| sherpa-onnx/csrc/offline-tts-kitten-model.cc | Core implementation of Kitten TTS model inference |
| sherpa-onnx/csrc/offline-tts-kitten-model-config.h | Configuration structure for Kitten TTS models |
| sherpa-onnx/csrc/offline-tts-kitten-model-config.cc | Implementation of Kitten TTS configuration with validation |
| sherpa-onnx/csrc/offline-tts-kitten-impl.h | High-level Kitten TTS implementation with text processing |
| sherpa-onnx/csrc/offline-tts-model-config.h | Updated main TTS config to include Kitten support |
| sherpa-onnx/csrc/offline-tts-model-config.cc | Updated validation logic for multiple TTS models |
| sherpa-onnx/csrc/piper-phonemize-lexicon.h | Extended phonemization to support Kitten models |
| sherpa-onnx/csrc/piper-phonemize-lexicon.cc | Shared phonemization logic between Kokoro and Kitten |
| sherpa-onnx/python/csrc/offline-tts-kitten-model-config.cc | Python bindings for Kitten TTS configuration |

Comment thread sherpa-onnx/csrc/offline-tts-kitten-model.cc
OfflineTtsKittenModelMetaData meta_data_;
std::vector<int32_t> style_dim_;

// (num_speakers, style_dim_[0], style_dim_[2])
Copilot AI Aug 7, 2025

The comment refers to 'style_dim_[2]' but the code uses 'style_dim_[1]' throughout. This inconsistency could mislead developers about the data structure.

Suggested change
- // (num_speakers, style_dim_[0], style_dim_[2])
+ // (num_speakers, style_dim_[0], style_dim_[1])

Comment thread sherpa-onnx/csrc/offline-tts-model-config.cc
@@ -277,7 +277,6 @@ static std::vector<int64_t> CoquiPhonemesToIds(
void InitEspeak(const std::string &data_dir) {
static std::once_flag init_flag;
std::call_once(init_flag, [data_dir]() {
Copilot AI Aug 7, 2025

[nitpick] The empty line removal at line 280 changes formatting without adding value and may indicate an unintentional modification during development.

Suggested change
std::call_once(init_flag, [data_dir]() {
std::call_once(init_flag, [data_dir]() {


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 6

🔭 Outside diff range comments (2)
sherpa-onnx/python/csrc/offline-tts-kitten-model-config.cc (1)

1-32: Missing pybind11 include for py::module.

The file uses py::module and py::class_ but doesn't include the necessary pybind11 headers. This could lead to compilation errors.

Add the missing include at the top of the file:

 #include "sherpa-onnx/python/csrc/offline-tts-kitten-model-config.h"
 
 #include <string>
+
+#include "pybind11/pybind11.h"
 
 #include "sherpa-onnx/csrc/offline-tts-kitten-model-config.h"
sherpa-onnx/csrc/piper-phonemize-lexicon.cc (1)

526-555: Add template instantiations for Kitten model constructors.

The template instantiations are missing for the new Kitten model constructors on Android and OHOS platforms. This could lead to linking errors when using Kitten models on these platforms.

Add the missing template instantiations:

 template PiperPhonemizeLexicon::PiperPhonemizeLexicon(
     AAssetManager *mgr, const std::string &tokens, const std::string &data_dir,
     const OfflineTtsKokoroModelMetaData &kokoro_meta_data);
+
+template PiperPhonemizeLexicon::PiperPhonemizeLexicon(
+    AAssetManager *mgr, const std::string &tokens, const std::string &data_dir,
+    const OfflineTtsKittenModelMetaData &kitten_meta_data);
 #endif
 
 #if __OHOS__

And similarly for OHOS:

 template PiperPhonemizeLexicon::PiperPhonemizeLexicon(
     NativeResourceManager *mgr, const std::string &tokens,
     const std::string &data_dir,
     const OfflineTtsKokoroModelMetaData &kokoro_meta_data);
+
+template PiperPhonemizeLexicon::PiperPhonemizeLexicon(
+    NativeResourceManager *mgr, const std::string &tokens,
+    const std::string &data_dir,
+    const OfflineTtsKittenModelMetaData &kitten_meta_data);
 #endif
🧹 Nitpick comments (4)
sherpa-onnx/csrc/offline-tts-kitten-impl.h (1)

360-362: Use consistent types in tensor shape array

The array is declared as int64_t but initialized with a casted int32_t. Use int64_t directly for consistency.

-    std::array<int64_t, 2> x_shape = {1, static_cast<int32_t>(x.size())};
+    std::array<int64_t, 2> x_shape = {1, static_cast<int64_t>(x.size())};
sherpa-onnx/csrc/offline-tts-kitten-model.cc (3)

72-72: Remove misleading const comment

The comment /*const*/ is confusing. Since the pointer p is not const and the data it points to may be modified, remove the comment to avoid confusion.

-    /*const*/ float *p = styles_.data() + sid * dim1;
+    float *p = styles_.data() + sid * dim1;

94-94: Remove unnecessary std::move on return value

The std::move on the return statement is unnecessary due to NRVO/RVO optimizations.

-    return std::move(out[0]);
+    return out[0];
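
One caveat worth checking before applying this suggestion: copy elision and implicit move apply when a function returns a local variable by name, not when it returns a subexpression such as `out[0]`. If `out` holds a move-only type (the sketch below assumes it is a vector of non-copyable handles, which is how `Ort::Value` behaves in the ONNX Runtime C++ API), dropping `std::move` would request a copy and fail to compile. The types and function names here are hypothetical stand-ins, not code from this PR.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical move-only stand-in for a tensor handle.
using Tensor = std::unique_ptr<int>;

// out[0] is an lvalue expression, not the name of a local variable, so
// neither NRVO nor implicit move applies. std::move is required here;
// `return out[0];` would attempt a copy and not compile for this type.
Tensor TakeFirst() {
  std::vector<Tensor> out;
  out.push_back(std::make_unique<int>(42));
  return std::move(out[0]);
}

// Returning a local variable by name: implicit move (or NRVO) applies,
// so std::move would be redundant here.
Tensor MakeOne() {
  Tensor t = std::make_unique<int>(7);
  return t;
}
```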

226-226: Fix incorrect array index in comment

The comment mentions style_dim_[2] but the array only has 2 elements (indices 0 and 1).

-  // (num_speakers, style_dim_[0], style_dim_[2])
+  // (num_speakers, style_dim_[0], style_dim_[1])
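
To make the corrected shape comment concrete, here is a minimal sketch of row-major indexing into a flat buffer of shape (num_speakers, dim0, dim1). It illustrates the general layout only; `SpeakerOffset` and `SpeakerStyle` are hypothetical names, and the actual offset arithmetic in offline-tts-kitten-model.cc may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Row-major offset of the first element belonging to speaker `sid` in a
// flat buffer with shape (num_speakers, dim0, dim1).
int64_t SpeakerOffset(int32_t sid, int32_t dim0, int32_t dim1) {
  assert(sid >= 0 && dim0 > 0 && dim1 > 0);
  return static_cast<int64_t>(sid) * dim0 * dim1;
}

// Pointer to the start of the style block for speaker `sid`.
// The caller is responsible for ensuring sid < num_speakers.
const float *SpeakerStyle(const std::vector<float> &styles, int32_t sid,
                          int32_t dim0, int32_t dim1) {
  return styles.data() + SpeakerOffset(sid, dim0, dim1);
}
```
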
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 08aaa89 and 57ddf3d.

📒 Files selected for processing (22)
  • sherpa-onnx/csrc/CMakeLists.txt (1 hunks)
  • sherpa-onnx/csrc/kokoro-multi-lang-lexicon.cc (1 hunks)
  • sherpa-onnx/csrc/offline-tts-frontend.h (1 hunks)
  • sherpa-onnx/csrc/offline-tts-impl.cc (3 hunks)
  • sherpa-onnx/csrc/offline-tts-kitten-impl.h (1 hunks)
  • sherpa-onnx/csrc/offline-tts-kitten-model-config.cc (1 hunks)
  • sherpa-onnx/csrc/offline-tts-kitten-model-config.h (1 hunks)
  • sherpa-onnx/csrc/offline-tts-kitten-model-meta-data.h (1 hunks)
  • sherpa-onnx/csrc/offline-tts-kitten-model.cc (1 hunks)
  • sherpa-onnx/csrc/offline-tts-kitten-model.h (1 hunks)
  • sherpa-onnx/csrc/offline-tts-kokoro-model-meta-data.h (1 hunks)
  • sherpa-onnx/csrc/offline-tts-kokoro-model.cc (1 hunks)
  • sherpa-onnx/csrc/offline-tts-kokoro-model.h (1 hunks)
  • sherpa-onnx/csrc/offline-tts-model-config.cc (3 hunks)
  • sherpa-onnx/csrc/offline-tts-model-config.h (3 hunks)
  • sherpa-onnx/csrc/piper-phonemize-lexicon.cc (5 hunks)
  • sherpa-onnx/csrc/piper-phonemize-lexicon.h (4 hunks)
  • sherpa-onnx/csrc/sherpa-onnx-offline-tts.cc (1 hunks)
  • sherpa-onnx/python/csrc/CMakeLists.txt (1 hunks)
  • sherpa-onnx/python/csrc/offline-tts-kitten-model-config.cc (1 hunks)
  • sherpa-onnx/python/csrc/offline-tts-kitten-model-config.h (1 hunks)
  • sherpa-onnx/python/csrc/offline-tts-model-config.cc (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: the sherpa-onnx jni library files are stored in hugging face repository at https://huggingface.co/cs...
Learnt from: litongjava
PR: k2-fsa/sherpa-onnx#2440
File: sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/core/Core.java:4-6
Timestamp: 2025-08-06T04:23:50.237Z
Learning: The sherpa-onnx JNI library files are stored in Hugging Face repository at https://huggingface.co/csukuangfj/sherpa-onnx-libs under versioned directories like jni/1.12.7/, and the actual Windows JNI library filename is "sherpa-onnx-jni.dll" as defined in Core.java constants.

Applied to files:

  • sherpa-onnx/csrc/CMakeLists.txt
  • sherpa-onnx/python/csrc/CMakeLists.txt
  • sherpa-onnx/python/csrc/offline-tts-kitten-model-config.h
  • sherpa-onnx/csrc/offline-tts-impl.cc
  • sherpa-onnx/python/csrc/offline-tts-model-config.cc
  • sherpa-onnx/python/csrc/offline-tts-kitten-model-config.cc
  • sherpa-onnx/csrc/offline-tts-model-config.h
  • sherpa-onnx/csrc/offline-tts-kitten-model-meta-data.h
  • sherpa-onnx/csrc/offline-tts-kokoro-model-meta-data.h
  • sherpa-onnx/csrc/offline-tts-kitten-model.h
  • sherpa-onnx/csrc/offline-tts-kitten-model.cc
  • sherpa-onnx/csrc/piper-phonemize-lexicon.h
📚 Learning: in sherpa-onnx java api, the native library names in core.java (win_native_library_name = "sherpa-on...
Learnt from: litongjava
PR: k2-fsa/sherpa-onnx#2440
File: sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/core/Core.java:4-6
Timestamp: 2025-08-06T04:18:47.981Z
Learning: In sherpa-onnx Java API, the native library names in Core.java (WIN_NATIVE_LIBRARY_NAME = "sherpa-onnx-jni.dll", UNIX_NATIVE_LIBRARY_NAME = "libsherpa-onnx-jni.so", MACOS_NATIVE_LIBRARY_NAME = "libsherpa-onnx-jni.dylib") are copied directly from the compiled binary filenames and should not be changed to match other libraries' naming conventions.

Applied to files:

  • sherpa-onnx/csrc/CMakeLists.txt
  • sherpa-onnx/python/csrc/offline-tts-kitten-model-config.h
  • sherpa-onnx/csrc/offline-tts-impl.cc
  • sherpa-onnx/python/csrc/offline-tts-model-config.cc
  • sherpa-onnx/python/csrc/offline-tts-kitten-model-config.cc
🧬 Code Graph Analysis (7)
sherpa-onnx/python/csrc/offline-tts-kitten-model-config.h (1)
sherpa-onnx/python/csrc/offline-tts-kitten-model-config.cc (2)
  • PybindOfflineTtsKittenModelConfig (13-29)
  • PybindOfflineTtsKittenModelConfig (13-13)
sherpa-onnx/csrc/offline-tts-frontend.h (1)
sherpa-onnx/csrc/piper-phonemize-lexicon.cc (2)
  • ConvertTextToTokenIdsKokoroOrKitten (463-489)
  • ConvertTextToTokenIdsKokoroOrKitten (463-466)
sherpa-onnx/csrc/kokoro-multi-lang-lexicon.cc (1)
sherpa-onnx/csrc/piper-phonemize-lexicon.cc (2)
  • ConvertTextToTokenIdsKokoroOrKitten (463-489)
  • ConvertTextToTokenIdsKokoroOrKitten (463-466)
sherpa-onnx/csrc/offline-tts-kitten-model-config.cc (1)
sherpa-onnx/csrc/offline-tts-model-config.cc (6)
  • Register (11-25)
  • Register (11-11)
  • Validate (27-52)
  • Validate (27-27)
  • ToString (54-67)
  • ToString (54-54)
sherpa-onnx/python/csrc/offline-tts-kitten-model-config.cc (1)
sherpa-onnx/python/csrc/offline-tts-vits-model-config.cc (1)
  • sherpa_onnx (11-37)
sherpa-onnx/csrc/piper-phonemize-lexicon.cc (3)
sherpa-onnx/csrc/piper-phonemize-lexicon.h (1)
  • PiperPhonemizeLexicon (20-74)
sherpa-onnx/csrc/offline-tts-character-frontend.cc (5)
  • ReadTokens (29-80)
  • ReadTokens (29-29)
  • is (85-85)
  • ConvertTextToTokenIds (99-192)
  • ConvertTextToTokenIds (99-100)
sherpa-onnx/csrc/file-utils.cc (6)
  • ReadFile (27-31)
  • ReadFile (27-27)
  • ReadFile (34-49)
  • ReadFile (34-34)
  • ReadFile (53-81)
  • ReadFile (53-54)
sherpa-onnx/csrc/piper-phonemize-lexicon.h (1)
sherpa-onnx/csrc/piper-phonemize-lexicon.cc (14)
  • PiperPhonemizeLexicon (307-317)
  • PiperPhonemizeLexicon (320-334)
  • PiperPhonemizeLexicon (336-346)
  • PiperPhonemizeLexicon (348-358)
  • PiperPhonemizeLexicon (360-370)
  • PiperPhonemizeLexicon (373-387)
  • PiperPhonemizeLexicon (390-404)
  • PiperPhonemizeLexicon (407-421)
  • PiperPhonemizeLexicon (527-529)
  • PiperPhonemizeLexicon (531-533)
  • PiperPhonemizeLexicon (535-537)
  • PiperPhonemizeLexicon (541-544)
  • PiperPhonemizeLexicon (546-549)
  • PiperPhonemizeLexicon (551-554)
🔇 Additional comments (36)
sherpa-onnx/csrc/sherpa-onnx-offline-tts.cc (1)

104-104: LGTM! Good addition for performance monitoring.

The thread count information complements the existing RTF and timing metrics, which is valuable for performance analysis and debugging.

sherpa-onnx/csrc/offline-tts-kokoro-model-meta-data.h (1)

14-16: LGTM! Improved documentation accuracy.

The updated references to version-specific metadata scripts provide clearer guidance for developers working with different Kokoro model versions.

sherpa-onnx/python/csrc/CMakeLists.txt (1)

70-70: LGTM! Correct addition for Kitten model Python bindings.

The new source file is properly added to the TTS-enabled build configuration, ensuring Python access to Kitten model configuration.

sherpa-onnx/csrc/CMakeLists.txt (1)

195-196: LGTM! Proper integration of Kitten model source files.

Both the configuration and implementation files are correctly added to the TTS-enabled build, following the established pattern for other TTS models.

sherpa-onnx/csrc/offline-tts-kokoro-model.cc (1)

173-173: LGTM! Corrected misleading error message.

The error message now accurately reflects the dimension being validated (style_dim[1] instead of style_dim[0]), improving debugging clarity.

sherpa-onnx/csrc/kokoro-multi-lang-lexicon.cc (1)

263-264: LGTM! Clean function rename for dual model support.

The function call has been properly updated to use ConvertTextToTokenIdsKokoroOrKitten instead of the Kokoro-specific function. This change extends support to both Kokoro and Kitten models while maintaining the same interface and parameters.

sherpa-onnx/python/csrc/offline-tts-kitten-model-config.h (1)

1-17: LGTM! Well-structured Python binding header.

The header file follows all best practices with proper include guards, copyright notice, minimal includes, and a clean function declaration. The structure is consistent with other Python binding headers in the codebase.

sherpa-onnx/csrc/offline-tts-kokoro-model.h (1)

26-27: LGTM! Documentation update improves clarity.

The comment has been corrected to accurately reflect that the Run method returns audio samples rather than mel spectrogram data. This documentation improvement aligns with the expected TTS output interface and provides clearer guidance for users of this method.

sherpa-onnx/csrc/offline-tts-frontend.h (1)

62-62: LGTM! Function declaration updated for dual model support.

The function declaration has been properly renamed to ConvertTextToTokenIdsKokoroOrKitten to reflect support for both Kokoro and Kitten models. The signature remains unchanged, maintaining backward compatibility while clearly indicating the extended functionality.

sherpa-onnx/csrc/offline-tts-kitten-model-meta-data.h (1)

15-24: LGTM! Well-designed metadata struct for Kitten model.

The OfflineTtsKittenModelMetaData struct is well-structured with appropriate fields for TTS model configuration. The default values are sensible:

  • has_espeak = 1 enables espeak support by default
  • version = 1 provides a reasonable initial version
  • max_token_len = 256 sets a practical token sequence limit
  • Other fields defaulted to 0 will be populated from the model

The reference to the external script provides helpful context for understanding metadata generation.
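
The defaults listed above can be pictured as in-class member initializers. This is a sketch only: the struct name and the `int32_t` field types are assumptions, and the fields that default to 0 are omitted because their names are not given in this review.

```cpp
#include <cstdint>

// Hypothetical mirror of the three defaults named in the review comment.
// Field types are assumed; zero-defaulted fields are intentionally left out.
struct KittenMetaDataSketch {
  int32_t has_espeak = 1;       // espeak support enabled by default
  int32_t version = 1;          // initial version
  int32_t max_token_len = 256;  // token sequence limit
};
```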

sherpa-onnx/python/csrc/offline-tts-kitten-model-config.cc (1)

13-29: LGTM! Consistent Python binding implementation.

The binding implementation follows the established pattern from other TTS model configs, correctly exposing constructors, member variables, and methods. The parameter naming and structure are appropriate for the Kitten model configuration.

sherpa-onnx/csrc/offline-tts-model-config.cc (3)

15-15: LGTM! Proper integration of Kitten model registration.

The Kitten model is correctly integrated into the command-line option registration alongside other TTS models.


41-51: LGTM! Consistent validation logic for Kitten model.

The validation follows the same pattern as other TTS models, checking for non-empty model path and delegating to the specific model's validation method. The updated error message is appropriate for the general case.


61-61: LGTM! Proper inclusion in string representation.

The Kitten model configuration is correctly included in the ToString output, maintaining consistency with other model configurations.

sherpa-onnx/csrc/offline-tts-model-config.h (3)

10-10: LGTM! Proper header inclusion.

The include for the Kitten model config header is correctly placed in alphabetical order with other TTS model headers.


22-22: LGTM! Consistent member variable addition.

The Kitten model configuration member is properly added alongside other TTS model configurations.


33-42: LGTM! Proper constructor integration.

The constructor parameter and member initialization for the Kitten model configuration follow the established pattern and maintain consistency with other TTS models.

sherpa-onnx/csrc/offline-tts-impl.cc (3)

19-19: LGTM! Proper header inclusion.

The include for the Kitten implementation is correctly added to support the new model type.


44-52: LGTM! Improved factory method with proper error handling.

The factory method now correctly handles all TTS model types including Kitten, and properly returns null instead of defaulting to an incorrect model when no model is specified. This is a significant improvement in error handling.
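
The behaviour described here (return null when no model is configured rather than falling back to a default) can be sketched as follows. All type and function names are hypothetical, and the order in which the models are checked is an assumption; the real factory in offline-tts-impl.cc constructs model-specific implementation classes.

```cpp
#include <memory>
#include <string>

// Hypothetical, simplified config: one model path per supported TTS family.
struct TtsConfigSketch {
  std::string vits_model;
  std::string matcha_model;
  std::string kokoro_model;
  std::string kitten_model;
};

// Hypothetical stand-in for the per-model implementation classes.
struct TtsImplSketch {
  std::string kind;
};

// Picks the implementation for the first non-empty model path and returns
// nullptr when none is set, instead of silently defaulting to one model.
std::unique_ptr<TtsImplSketch> CreateSketch(const TtsConfigSketch &config) {
  if (!config.vits_model.empty()) {
    return std::make_unique<TtsImplSketch>(TtsImplSketch{"vits"});
  }
  if (!config.matcha_model.empty()) {
    return std::make_unique<TtsImplSketch>(TtsImplSketch{"matcha"});
  }
  if (!config.kokoro_model.empty()) {
    return std::make_unique<TtsImplSketch>(TtsImplSketch{"kokoro"});
  }
  if (!config.kitten_model.empty()) {
    return std::make_unique<TtsImplSketch>(TtsImplSketch{"kitten"});
  }
  return nullptr;  // no model configured: the caller must handle this
}
```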


62-69: LGTM! Consistent template method implementation.

The template factory method maintains the same improved logic as the regular factory method, ensuring consistent behavior across both overloads.

sherpa-onnx/csrc/offline-tts-kitten-model-config.cc (2)

15-25: LGTM! Well-structured command-line option registration.

The command-line options are properly named with consistent "kitten-" prefix and have clear, descriptive help text. The registration follows the established pattern.


84-95: LGTM! Standard ToString implementation.

The string representation includes all configuration fields and follows the established pattern used by other TTS model configurations.

sherpa-onnx/csrc/offline-tts-kitten-model-config.h (1)

1-44: LGTM! Well-structured configuration header.

The header follows established patterns from other model configurations in the codebase, with appropriate member variables, constructors, and method declarations. The structure is consistent and the default length_scale = 1.0 provides reasonable speed behavior.

sherpa-onnx/python/csrc/offline-tts-model-config.cc (3)

10-10: LGTM! Proper include for Kitten model bindings.

The include directive follows the established pattern for other model configuration headers.


21-21: LGTM! Consistent binding registration.

The call to PybindOfflineTtsKittenModelConfig(m) follows the same pattern as other model binding registrations.


29-30: LGTM! Complete Python API integration.

The constructor parameter addition and property exposure for the kitten model are consistent with the established patterns for other TTS models (vits, matcha, kokoro). The default value initialization ensures backward compatibility.

Also applies to: 35-35, 41-41

sherpa-onnx/csrc/offline-tts-kitten-model.h (1)

17-36: LGTM! Well-designed model class interface.

The OfflineTtsKittenModel class declaration follows established patterns:

  • Uses Pimpl idiom for implementation hiding
  • Provides both direct and template constructors for resource management flexibility
  • Run method signature aligns with TTS model expectations (tensor input, speaker ID, speed control)
  • Const reference return from GetMetaData() provides safe metadata access

The interface design is consistent with other TTS model classes in the codebase.
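
For context, the Pimpl idiom mentioned above keeps implementation details out of the public header by holding them behind a pointer to a forward-declared class. A minimal generic sketch follows; `ModelSketch` and its `NumSpeakers` accessor are hypothetical and do not reproduce the actual OfflineTtsKittenModel interface.

```cpp
#include <cstdint>
#include <memory>

// --- What would normally live in the header ---
class ModelSketch {
 public:
  ModelSketch();
  ~ModelSketch();  // defined where Impl is complete
  int32_t NumSpeakers() const;

 private:
  class Impl;  // forward declaration only; details stay out of the header
  std::unique_ptr<Impl> impl_;
};

// --- What would normally live in the .cc file ---
class ModelSketch::Impl {
 public:
  int32_t num_speakers = 8;  // illustrative value matching sid 0..7
};

ModelSketch::ModelSketch() : impl_(std::make_unique<Impl>()) {}
ModelSketch::~ModelSketch() = default;
int32_t ModelSketch::NumSpeakers() const { return impl_->num_speakers; }
```

Defining the destructor after `Impl` is complete is what lets `std::unique_ptr<Impl>` be used with an incomplete type in the header.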

sherpa-onnx/csrc/piper-phonemize-lexicon.h (3)

13-13: LGTM! Appropriate include for Kitten metadata.

The include directive follows the established pattern for other model metadata headers.


31-32: LGTM! Consistent constructor declarations.

The constructor declarations for both regular and template versions follow the established patterns for other model types (Vits, Matcha, Kokoro). The template version enables resource manager flexibility for Android/OHOS platforms.

Also applies to: 49-52


70-70: LGTM! Proper member additions.

The kitten_meta_data_ member and is_kitten_ flag follow the established naming conventions and patterns used by other model types in the class.

Also applies to: 73-73

sherpa-onnx/csrc/piper-phonemize-lexicon.cc (5)

183-183: LGTM! Appropriate function renaming for model unification.

Renaming PiperPhonemesToIdsKokoro to PiperPhonemesToIdsKokoroOrKitten makes sense since the same phoneme processing logic can be shared between Kokoro and Kitten models.


360-370: LGTM! Consistent constructor implementation.

The Kitten model constructor follows the established pattern from other model constructors:

  • Properly initializes kitten_meta_data_ and sets is_kitten_ flag
  • Uses the same ReadTokens function for token loading
  • Calls InitEspeak for phoneme processing setup

406-421: LGTM! Consistent template constructor implementation.

The template constructor for resource manager support follows the same pattern as other model types:

  • Uses ReadFile with manager to load tokens from assets/resources
  • Creates std::istrstream for token parsing
  • Maintains the same initialization flow as the file-based constructor

428-432: LGTM! Proper branching logic for Kitten model.

The addition of the is_kitten_ branch follows the established pattern and correctly uses kitten_meta_data_.max_token_len parameter with the unified conversion function.


463-463: LGTM! Consistent function renaming and usage.

The function rename from ConvertTextToTokenIdsKokoro to ConvertTextToTokenIdsKokoroOrKitten properly reflects the unified functionality, and the call site is updated appropriately to use the renamed function.

Also applies to: 480-481

sherpa-onnx/csrc/offline-tts-kitten-model.cc (1)

204-206: No alignment issue with reinterpret_cast here

The voices_data pointer always comes from std::vector<char>::data(), which uses the default allocator (calling the global operator new), and that guarantees alignment to at least alignof(std::max_align_t). Since alignof(float) ≤ alignof(std::max_align_t), the cast

reinterpret_cast<const float *>(voices_data)

is safe on all standard-conforming C++ implementations. No runtime alignment check or change to memcpy is required here.
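
As an illustration of the alignment reasoning above, here is a small sketch with two hypothetical helpers (IsFloatAligned and CopyFloats are not part of sherpa-onnx). The first could back a debug assertion. The second shows the std::memcpy alternative that the comment considers unnecessary here, but which remains valid if the float data ever starts at an arbitrary byte offset inside a buffer.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Returns true if p is suitably aligned to be read as a float.
inline bool IsFloatAligned(const void *p) {
  return reinterpret_cast<std::uintptr_t>(p) % alignof(float) == 0;
}

// Copies n floats out of a raw byte buffer. Unlike reinterpret_cast,
// this is valid regardless of how `bytes` is aligned.
inline std::vector<float> CopyFloats(const char *bytes, std::size_t n) {
  std::vector<float> out(n);
  if (n > 0) {
    std::memcpy(out.data(), bytes, n * sizeof(float));
  }
  return out;
}
```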

Comment thread sherpa-onnx/csrc/offline-tts-kitten-impl.h
Comment thread sherpa-onnx/csrc/offline-tts-kitten-impl.h
Comment thread sherpa-onnx/csrc/offline-tts-kitten-impl.h
Comment thread sherpa-onnx/csrc/offline-tts-kitten-model-config.cc
Comment on lines +64 to +67
SHERPA_ONNX_LOGE("Support only batch_size == 1. Given: %d",
static_cast<int32_t>(x_shape[0]));
exit(-1);
}

⚠️ Potential issue

Replace exit(-1) with exception for better error handling

Using exit(-1) terminates the entire program abruptly. Consider throwing an exception to allow callers to handle the error gracefully.

     if (x_shape[0] != 1) {
-      SHERPA_ONNX_LOGE("Support only batch_size == 1. Given: %d",
-                       static_cast<int32_t>(x_shape[0]));
-      exit(-1);
+      std::string error_msg = "Support only batch_size == 1. Given: " + 
+                             std::to_string(x_shape[0]);
+      SHERPA_ONNX_LOGE("%s", error_msg.c_str());
+      throw std::runtime_error(error_msg);
     }
🤖 Prompt for AI Agents
In sherpa-onnx/csrc/offline-tts-kitten-model.cc around lines 64 to 67, replace
the call to exit(-1) with throwing an appropriate exception to signal the error.
This change will allow the caller to catch and handle the error instead of
terminating the program abruptly. Use a standard exception type or define a
custom one if needed, and include a descriptive error message indicating that
only batch_size == 1 is supported.
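
To make the benefit of the suggestion concrete, the sketch below shows how a caller could recover from the error once it is thrown instead of terminating the process. CheckBatchSize and TryRun are hypothetical names introduced for this illustration; only the error message is taken from the code under review.

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical validation helper: throws instead of calling exit().
inline void CheckBatchSize(int64_t batch_size) {
  if (batch_size != 1) {
    throw std::runtime_error("Support only batch_size == 1. Given: " +
                             std::to_string(batch_size));
  }
}

// Hypothetical caller: converts the exception into a status and a message,
// so the surrounding application keeps running.
inline bool TryRun(int64_t batch_size, std::string *error) {
  try {
    CheckBatchSize(batch_size);
    return true;
  } catch (const std::runtime_error &e) {
    if (error != nullptr) {
      *error = e.what();
    }
    return false;
  }
}
```

Whether exceptions are acceptable depends on the project: code that crosses a C API or JNI boundary has to catch them before returning, which may be why an exit-style macro is used in the first place.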

Comment on lines +137 to +141
if (model_type != "kitten-tts") {
SHERPA_ONNX_LOGE(
"Please download the kitten tts model from us containing meta data");
SHERPA_ONNX_EXIT(-1);
}

🛠️ Refactor suggestion

Consider using exceptions instead of SHERPA_ONNX_EXIT(-1)

Multiple validation failures use SHERPA_ONNX_EXIT(-1) which terminates the program. Consider throwing exceptions for better error handling and recovery options.

Example refactor for lines 137-141:

     if (model_type != "kitten-tts") {
-      SHERPA_ONNX_LOGE(
-          "Please download the kitten tts model from us containing meta data");
-      SHERPA_ONNX_EXIT(-1);
+      std::string error_msg = 
+          "Please download the kitten tts model from us containing meta data";
+      SHERPA_ONNX_LOGE("%s", error_msg.c_str());
+      throw std::runtime_error(error_msg);
     }

Also applies to: 150-152, 173-181, 201-201

🤖 Prompt for AI Agents
In sherpa-onnx/csrc/offline-tts-kitten-model.cc lines 137 to 141, replace the
use of SHERPA_ONNX_EXIT(-1) with throwing a suitable exception to allow better
error handling and recovery. Change the code to throw an exception after logging
the error instead of terminating the program immediately. Apply the same
refactor pattern to the similar cases at lines 150-152, 173-181, and 201.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (5)
sherpa-onnx/csrc/offline-tts-kitten-model.cc (3)

110-110: Debug output incorrectly identifies this as 'kitten model' when it should match the class name

This could be confusing when debugging multiple model types.

-      os << "---kitten model---\n";
+      os << "---OfflineTtsKittenModel---\n";

63-67: Replace SHERPA_ONNX_EXIT(-1) with exception for better error handling

Using SHERPA_ONNX_EXIT(-1) terminates the entire program abruptly. Consider throwing an exception to allow callers to handle the error gracefully.

     if (x_shape[0] != 1) {
-      SHERPA_ONNX_LOGE("Support only batch_size == 1. Given: %d",
-                       static_cast<int32_t>(x_shape[0]));
-      SHERPA_ONNX_EXIT(-1);
+      std::string error_msg = "Support only batch_size == 1. Given: " + 
+                             std::to_string(x_shape[0]);
+      SHERPA_ONNX_LOGE("%s", error_msg.c_str());
+      throw std::runtime_error(error_msg);
     }

137-141: Consider using exceptions instead of SHERPA_ONNX_EXIT(-1)

Multiple validation failures use SHERPA_ONNX_EXIT(-1) which terminates the program. Consider throwing exceptions for better error handling and recovery options.

Example refactor for lines 137-141:

     if (model_type != "kitten-tts") {
-      SHERPA_ONNX_LOGE(
-          "Please download the kitten tts model from us containing meta data");
-      SHERPA_ONNX_EXIT(-1);
+      std::string error_msg = 
+          "Please download the kitten tts model from us containing meta data";
+      SHERPA_ONNX_LOGE("%s", error_msg.c_str());
+      throw std::runtime_error(error_msg);
     }

Apply similar changes to lines 150-152, 173-181, and 201.

Also applies to: 150-152, 173-181, 201-201

sherpa-onnx/csrc/offline-tts-kitten-impl.h (2)

11-11: Replace deprecated <strstream> with <sstream>

The header <strstream> has been deprecated since C++98. Use <sstream> instead for better portability and future compatibility.

-#include <strstream>
+#include <sstream>

105-105: Replace deprecated std::istrstream with std::istringstream

The templated constructor uses deprecated std::istrstream. Replace with std::istringstream for modern C++ compliance.

         auto buf = ReadFile(mgr, f);
-        std::istrstream is(buf.data(), buf.size());
+        std::istringstream is(std::string(buf.data(), buf.size()));
         tn_list_.push_back(std::make_unique<kaldifst::TextNormalizer>(is));

And similarly on line 127:

         std::unique_ptr<std::istream> s(
-            new std::istrstream(buf.data(), buf.size()));
+            new std::istringstream(std::string(buf.data(), buf.size())));

Also applies to: 127-127
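
A self-contained sketch of the <sstream> replacement follows. ReadLines is a hypothetical helper, and the std::vector<char> parameter stands in for whatever ReadFile returns in the real code, which is an assumption.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits an in-memory file buffer into lines using std::istringstream.
inline std::vector<std::string> ReadLines(const std::vector<char> &buf) {
  // std::istringstream owns a copy of the text, whereas std::istrstream
  // only referenced the caller's buffer.
  std::string text(buf.begin(), buf.end());
  std::istringstream is(text);

  std::vector<std::string> lines;
  std::string line;
  while (std::getline(is, line)) {
    lines.push_back(line);
  }
  return lines;
}
```

The trade-off is one extra copy of the buffer, which is usually negligible for token and rule files but worth keeping in mind for large assets.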

🧹 Nitpick comments (3)
sherpa-onnx/csrc/offline-tts-kitten-model.cc (2)

72-72: Make pointer const as indicated by the comment

The comment /*const*/ suggests this pointer should be const-qualified.

-    /*const*/ float *p = styles_.data() + sid * dim1;
+    const float *p = styles_.data() + sid * dim1;

184-185: Simplify calculation since style_dim_[0] is always 1

Since style_dim_[0] is validated to be 1 (lines 178-181), the multiplication is unnecessary.

     int32_t expected_num_floats =
-        style_dim_[0] * style_dim_[1] * meta_data_.num_speakers;
+        style_dim_[1] * meta_data_.num_speakers;
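
The two nitpicks above concern how the flat voices buffer is sized and indexed. The sketch below illustrates that arithmetic under the assumption that the buffer holds one row of style_dim[1] floats per speaker. StyleBufferSizeMatches and SpeakerStyle are hypothetical helpers, not functions from the PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Checks that a voices buffer of num_bytes holds exactly
// num_speakers rows of style_dim1 floats each.
inline bool StyleBufferSizeMatches(std::size_t num_bytes, int32_t style_dim1,
                                   int32_t num_speakers) {
  std::size_t expected = static_cast<std::size_t>(style_dim1) *
                         static_cast<std::size_t>(num_speakers) *
                         sizeof(float);
  return num_bytes == expected;
}

// Returns a read-only pointer to the style row of speaker `sid`.
// The caller is responsible for passing a valid sid.
inline const float *SpeakerStyle(const std::vector<float> &styles, int32_t sid,
                                 int32_t style_dim1) {
  return styles.data() + static_cast<std::size_t>(sid) * style_dim1;
}
```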
sherpa-onnx/csrc/offline-tts-kitten-impl.h (1)

151-327: Consider refactoring the long Generate method for better maintainability

The Generate method spans roughly 177 lines (151-327) and has multiple responsibilities: validation, text normalization, tokenization, batching, and audio processing. Consider extracting helper methods for better readability and testability.

Consider extracting methods like:

  • ValidateSpeakerId(sid, num_speakers) for speaker validation logic (lines 157-184)
  • NormalizeText(text) for text normalization (lines 209-220)
  • ProcessBatches(x, sid, speed, callback) for batch processing logic (lines 287-324)
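
As one possible shape for the first extracted helper, here is a minimal sketch. The fallback to speaker 0 for an out-of-range ID is an assumption made for illustration; the review does not state how Generate currently handles that case.

```cpp
#include <cstdint>

// Hypothetical helper: returns a usable speaker ID.
// Assumption: out-of-range IDs fall back to speaker 0 rather than failing.
inline int32_t ValidateSpeakerId(int32_t sid, int32_t num_speakers) {
  if (num_speakers <= 0) {
    return 0;  // no speaker table available; nothing meaningful to select
  }
  if (sid < 0 || sid >= num_speakers) {
    return 0;  // assumed fallback, see the note above
  }
  return sid;
}
```

A free function like this takes no model state, so it can be unit-tested without loading an ONNX model, which is the testability gain the nitpick is after.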
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 57ddf3d and 8742c63.

📒 Files selected for processing (3)
  • sherpa-onnx/csrc/offline-tts-kitten-impl.h (1 hunks)
  • sherpa-onnx/csrc/offline-tts-kitten-model-config.cc (1 hunks)
  • sherpa-onnx/csrc/offline-tts-kitten-model.cc (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • sherpa-onnx/csrc/offline-tts-kitten-model-config.cc
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: the sherpa-onnx jni library files are stored in hugging face repository at https://huggingface.co/cs...
Learnt from: litongjava
PR: k2-fsa/sherpa-onnx#2440
File: sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/core/Core.java:4-6
Timestamp: 2025-08-06T04:23:50.237Z
Learning: The sherpa-onnx JNI library files are stored in Hugging Face repository at https://huggingface.co/csukuangfj/sherpa-onnx-libs under versioned directories like jni/1.12.7/, and the actual Windows JNI library filename is "sherpa-onnx-jni.dll" as defined in Core.java constants.

Applied to files:

  • sherpa-onnx/csrc/offline-tts-kitten-model.cc

Development

Successfully merging this pull request may close these issues.

https://github.com/KittenML/KittenTTS TTS model support

2 participants