Add C++ runtime for kitten-tts #2460
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough: This change introduces support for the Kitten TTS model in the codebase. It adds new configuration, metadata, implementation, and Python binding files for the Kitten model, updates build scripts, and integrates Kitten model handling into the TTS creation, frontend, and phonemization logic. Existing structures and methods are extended to recognize and process the Kitten model alongside the other supported TTS models.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant OfflineTtsImpl
    participant OfflineTtsKittenImpl
    participant OfflineTtsKittenModel
    participant PiperPhonemizeLexicon
    User->>OfflineTtsImpl: Create(config)
    OfflineTtsImpl->>OfflineTtsKittenImpl: (if config.kitten.model set)
    OfflineTtsKittenImpl->>OfflineTtsKittenModel: Initialize with config
    OfflineTtsKittenImpl->>PiperPhonemizeLexicon: Initialize with Kitten metadata
    User->>OfflineTtsKittenImpl: Generate(text, sid, speed)
    OfflineTtsKittenImpl->>PiperPhonemizeLexicon: Tokenize text
    OfflineTtsKittenImpl->>OfflineTtsKittenModel: Run(token_ids, sid, speed)
    OfflineTtsKittenModel-->>OfflineTtsKittenImpl: Audio samples
    OfflineTtsKittenImpl-->>User: GeneratedAudio
```
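The creation step in the diagram (pick the Kitten implementation when `config.kitten.model` is set) can be sketched as follows. This is a minimal sketch, not the sherpa-onnx code: `TtsConfig`, `ModelPathConfig`, `TtsImpl`, `KokoroImpl`, `KittenImpl`, and `CreateTtsImpl` are hypothetical simplifications, and the order of the checks is an assumption. It only mirrors the rule that the implementation is chosen from whichever model path is non-empty and that nothing is created when no model is configured.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical, simplified stand-ins for the real config types.
struct ModelPathConfig {
  std::string model;  // empty means "this model type is not configured"
};

struct TtsConfig {
  ModelPathConfig kokoro;
  ModelPathConfig kitten;
  // vits, matcha, ... omitted for brevity
};

// Hypothetical base class playing the role of OfflineTtsImpl.
class TtsImpl {
 public:
  virtual ~TtsImpl() = default;
  virtual std::string Name() const = 0;
};

class KokoroImpl : public TtsImpl {
 public:
  std::string Name() const override { return "kokoro"; }
};

class KittenImpl : public TtsImpl {
 public:
  std::string Name() const override { return "kitten"; }
};

// Chooses an implementation from whichever model path is set.
// The check order is an assumption made for this sketch.
std::unique_ptr<TtsImpl> CreateTtsImpl(const TtsConfig &config) {
  if (!config.kokoro.model.empty()) {
    return std::make_unique<KokoroImpl>();
  }
  if (!config.kitten.model.empty()) {
    return std::make_unique<KittenImpl>();
  }
  // Nothing configured: return nullptr rather than guessing a model type.
  return nullptr;
}
```

Returning `nullptr` instead of defaulting to some model leaves the error handling to the caller.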
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~40 minutes

Assessment against linked issues: Out-of-scope changes: No out-of-scope changes found.
Pull Request Overview
This PR adds C++ runtime support for the Kitten TTS model, a new text-to-speech model that provides efficient voice synthesis with multiple speaker support. The implementation follows the existing pattern used for other TTS models like Kokoro and Vits.
- Implements complete Kitten TTS model support including configuration, model loading, and inference
- Adds Python bindings for the new Kitten TTS model configuration
- Integrates Kitten TTS into the existing TTS pipeline with shared phonemization logic
Reviewed Changes
Copilot reviewed 22 out of 22 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| sherpa-onnx/csrc/offline-tts-kitten-model.h | Header defining the Kitten TTS model class interface |
| sherpa-onnx/csrc/offline-tts-kitten-model.cc | Core implementation of Kitten TTS model inference |
| sherpa-onnx/csrc/offline-tts-kitten-model-config.h | Configuration structure for Kitten TTS models |
| sherpa-onnx/csrc/offline-tts-kitten-model-config.cc | Implementation of Kitten TTS configuration with validation |
| sherpa-onnx/csrc/offline-tts-kitten-impl.h | High-level Kitten TTS implementation with text processing |
| sherpa-onnx/csrc/offline-tts-model-config.h | Updated main TTS config to include Kitten support |
| sherpa-onnx/csrc/offline-tts-model-config.cc | Updated validation logic for multiple TTS models |
| sherpa-onnx/csrc/piper-phonemize-lexicon.h | Extended phonemization to support Kitten models |
| sherpa-onnx/csrc/piper-phonemize-lexicon.cc | Shared phonemization logic between Kokoro and Kitten |
| sherpa-onnx/python/csrc/offline-tts-kitten-model-config.cc | Python bindings for Kitten TTS configuration |

```cpp
OfflineTtsKittenModelMetaData meta_data_;
std::vector<int32_t> style_dim_;

// (num_speakers, style_dim_[0], style_dim_[2])
```

The comment refers to 'style_dim_[2]' but the code uses 'style_dim_[1]' throughout. This inconsistency could mislead developers about the data structure.
Suggested change:

```diff
- // (num_speakers, style_dim_[0], style_dim_[2])
+ // (num_speakers, style_dim_[0], style_dim_[1])
```
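To make the corrected layout concrete, assume the voices buffer is a row-major float array of shape `(num_speakers, style_dim_[0], style_dim_[1])`. The style vector for speaker `sid` then starts at offset `sid * style_dim_[0] * style_dim_[1]`. That would match the `sid * dim1` expression quoted later in this review if `dim1` is `style_dim_[1]` and `style_dim_[0] == 1`. `StyleForSpeaker` is a hypothetical helper for illustration, not code from the PR.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Returns a pointer to the style vector of speaker `sid` in a row-major
// buffer of shape (num_speakers, dim0, dim1). Hypothetical helper.
const float *StyleForSpeaker(const std::vector<float> &styles,
                             int32_t num_speakers, int32_t dim0,
                             int32_t dim1, int32_t sid) {
  if (static_cast<int64_t>(styles.size()) !=
      static_cast<int64_t>(num_speakers) * dim0 * dim1) {
    throw std::invalid_argument("styles buffer has unexpected size");
  }
  if (sid < 0 || sid >= num_speakers) {
    throw std::out_of_range("speaker id out of range");
  }
  // Each speaker occupies dim0 * dim1 consecutive floats.
  return styles.data() + static_cast<int64_t>(sid) * dim0 * dim1;
}
```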
```diff
@@ -277,7 +277,6 @@ static std::vector<int64_t> CoquiPhonemesToIds(
 void InitEspeak(const std::string &data_dir) {
   static std::once_flag init_flag;
   std::call_once(init_flag, [data_dir]() {
```
[nitpick] The empty line removal at line 280 changes formatting without adding value and may indicate an unintentional modification during development.
Suggested change:

```cpp
std::call_once(init_flag, [data_dir]() {
```
Actionable comments posted: 6
🔭 Outside diff range comments (2)
sherpa-onnx/python/csrc/offline-tts-kitten-model-config.cc (1)
1-32: Missing pybind11 include for `py::module`. The file uses `py::module` and `py::class_` but doesn't include the necessary pybind11 headers. This could lead to compilation errors. Add the missing include at the top of the file:

```diff
 #include "sherpa-onnx/python/csrc/offline-tts-kitten-model-config.h"
 #include <string>
+
+#include "pybind11/pybind11.h"
 #include "sherpa-onnx/csrc/offline-tts-kitten-model-config.h"
```

sherpa-onnx/csrc/piper-phonemize-lexicon.cc (1)
526-555: Add template instantiations for Kitten model constructors. The template instantiations are missing for the new Kitten model constructors on Android and OHOS platforms. This could lead to linking errors when using Kitten models on these platforms.
Add the missing template instantiations:

```diff
 template PiperPhonemizeLexicon::PiperPhonemizeLexicon(
     AAssetManager *mgr, const std::string &tokens, const std::string &data_dir,
     const OfflineTtsKokoroModelMetaData &kokoro_meta_data);
+
+template PiperPhonemizeLexicon::PiperPhonemizeLexicon(
+    AAssetManager *mgr, const std::string &tokens, const std::string &data_dir,
+    const OfflineTtsKittenModelMetaData &kitten_meta_data);
 #endif
 #if __OHOS__
```

And similarly for OHOS:

```diff
 template PiperPhonemizeLexicon::PiperPhonemizeLexicon(
     NativeResourceManager *mgr, const std::string &tokens,
     const std::string &data_dir,
     const OfflineTtsKokoroModelMetaData &kokoro_meta_data);
+
+template PiperPhonemizeLexicon::PiperPhonemizeLexicon(
+    NativeResourceManager *mgr, const std::string &tokens,
+    const std::string &data_dir,
+    const OfflineTtsKittenModelMetaData &kitten_meta_data);
 #endif
```
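A missing instantiation shows up as a link error, not a compile error. The reason is that a member template defined only in a `.cc` file is compiled solely for the argument types that file explicitly instantiates. The sketch below shows the pattern in a single file, with hypothetical names (`Lexicon`, `FakeManager`, `KokoroMeta`, `KittenMeta`) standing in for `PiperPhonemizeLexicon`, the platform resource managers, and the metadata structs. In a real project the class declaration lives in a header, and the definitions plus the `template` lines live in the `.cc` file.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for the metadata structs and a resource manager.
struct KokoroMeta { int max_token_len; };
struct KittenMeta { int max_token_len; };
struct FakeManager { std::string root; };

// --- normally in the header: declarations only ---
class Lexicon {
 public:
  template <typename Manager>
  Lexicon(Manager *mgr, const KokoroMeta &meta);

  template <typename Manager>
  Lexicon(Manager *mgr, const KittenMeta &meta);

  int max_token_len() const { return max_token_len_; }
  bool is_kitten() const { return is_kitten_; }

 private:
  int max_token_len_ = 0;
  bool is_kitten_ = false;
};

// --- normally in the .cc file: definitions ---
template <typename Manager>
Lexicon::Lexicon(Manager * /*mgr*/, const KokoroMeta &meta)
    : max_token_len_(meta.max_token_len) {}

template <typename Manager>
Lexicon::Lexicon(Manager * /*mgr*/, const KittenMeta &meta)
    : max_token_len_(meta.max_token_len), is_kitten_(true) {}

// Explicit instantiations. Without the second line, other translation
// units that construct Lexicon(FakeManager *, KittenMeta) would compile
// against the header but fail to link.
template Lexicon::Lexicon(FakeManager *mgr, const KokoroMeta &meta);
template Lexicon::Lexicon(FakeManager *mgr, const KittenMeta &meta);
```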
🧹 Nitpick comments (4)
sherpa-onnx/csrc/offline-tts-kitten-impl.h (1)
360-362: Use consistent types in tensor shape array. The array is declared as `int64_t` but initialized with a value cast to `int32_t`. Use `int64_t` directly for consistency.

```diff
- std::array<int64_t, 2> x_shape = {1, static_cast<int32_t>(x.size())};
+ std::array<int64_t, 2> x_shape = {1, static_cast<int64_t>(x.size())};
```

sherpa-onnx/csrc/offline-tts-kitten-model.cc (3)
72-72: Remove misleading const comment. The comment `/*const*/` is confusing. Since the pointer `p` is not const and the data it points to may be modified, remove the comment to avoid confusion.

```diff
- /*const*/ float *p = styles_.data() + sid * dim1;
+ float *p = styles_.data() + sid * dim1;
```
94-94: Remove unnecessary `std::move` on return value. The `std::move` on the return statement is unnecessary due to NRVO/RVO optimizations.

```diff
- return std::move(out[0]);
+ return out[0];
```
226-226: Fix incorrect array index in comment. The comment mentions `style_dim_[2]` but the array only has 2 elements (indices 0 and 1).

```diff
- // (num_speakers, style_dim_[0], style_dim_[2])
+ // (num_speakers, style_dim_[0], style_dim_[1])
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (22)
- sherpa-onnx/csrc/CMakeLists.txt (1 hunks)
- sherpa-onnx/csrc/kokoro-multi-lang-lexicon.cc (1 hunks)
- sherpa-onnx/csrc/offline-tts-frontend.h (1 hunks)
- sherpa-onnx/csrc/offline-tts-impl.cc (3 hunks)
- sherpa-onnx/csrc/offline-tts-kitten-impl.h (1 hunks)
- sherpa-onnx/csrc/offline-tts-kitten-model-config.cc (1 hunks)
- sherpa-onnx/csrc/offline-tts-kitten-model-config.h (1 hunks)
- sherpa-onnx/csrc/offline-tts-kitten-model-meta-data.h (1 hunks)
- sherpa-onnx/csrc/offline-tts-kitten-model.cc (1 hunks)
- sherpa-onnx/csrc/offline-tts-kitten-model.h (1 hunks)
- sherpa-onnx/csrc/offline-tts-kokoro-model-meta-data.h (1 hunks)
- sherpa-onnx/csrc/offline-tts-kokoro-model.cc (1 hunks)
- sherpa-onnx/csrc/offline-tts-kokoro-model.h (1 hunks)
- sherpa-onnx/csrc/offline-tts-model-config.cc (3 hunks)
- sherpa-onnx/csrc/offline-tts-model-config.h (3 hunks)
- sherpa-onnx/csrc/piper-phonemize-lexicon.cc (5 hunks)
- sherpa-onnx/csrc/piper-phonemize-lexicon.h (4 hunks)
- sherpa-onnx/csrc/sherpa-onnx-offline-tts.cc (1 hunks)
- sherpa-onnx/python/csrc/CMakeLists.txt (1 hunks)
- sherpa-onnx/python/csrc/offline-tts-kitten-model-config.cc (1 hunks)
- sherpa-onnx/python/csrc/offline-tts-kitten-model-config.h (1 hunks)
- sherpa-onnx/python/csrc/offline-tts-model-config.cc (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: the sherpa-onnx jni library files are stored in hugging face repository at https://huggingface.co/cs...
Learnt from: litongjava
PR: k2-fsa/sherpa-onnx#2440
File: sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/core/Core.java:4-6
Timestamp: 2025-08-06T04:23:50.237Z
Learning: The sherpa-onnx JNI library files are stored in Hugging Face repository at https://huggingface.co/csukuangfj/sherpa-onnx-libs under versioned directories like jni/1.12.7/, and the actual Windows JNI library filename is "sherpa-onnx-jni.dll" as defined in Core.java constants.
Applied to files:
- sherpa-onnx/csrc/CMakeLists.txt
- sherpa-onnx/python/csrc/CMakeLists.txt
- sherpa-onnx/python/csrc/offline-tts-kitten-model-config.h
- sherpa-onnx/csrc/offline-tts-impl.cc
- sherpa-onnx/python/csrc/offline-tts-model-config.cc
- sherpa-onnx/python/csrc/offline-tts-kitten-model-config.cc
- sherpa-onnx/csrc/offline-tts-model-config.h
- sherpa-onnx/csrc/offline-tts-kitten-model-meta-data.h
- sherpa-onnx/csrc/offline-tts-kokoro-model-meta-data.h
- sherpa-onnx/csrc/offline-tts-kitten-model.h
- sherpa-onnx/csrc/offline-tts-kitten-model.cc
- sherpa-onnx/csrc/piper-phonemize-lexicon.h
📚 Learning: in sherpa-onnx java api, the native library names in core.java (win_native_library_name = "sherpa-on...
Learnt from: litongjava
PR: k2-fsa/sherpa-onnx#2440
File: sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/core/Core.java:4-6
Timestamp: 2025-08-06T04:18:47.981Z
Learning: In sherpa-onnx Java API, the native library names in Core.java (WIN_NATIVE_LIBRARY_NAME = "sherpa-onnx-jni.dll", UNIX_NATIVE_LIBRARY_NAME = "libsherpa-onnx-jni.so", MACOS_NATIVE_LIBRARY_NAME = "libsherpa-onnx-jni.dylib") are copied directly from the compiled binary filenames and should not be changed to match other libraries' naming conventions.
Applied to files:
- sherpa-onnx/csrc/CMakeLists.txt
- sherpa-onnx/python/csrc/offline-tts-kitten-model-config.h
- sherpa-onnx/csrc/offline-tts-impl.cc
- sherpa-onnx/python/csrc/offline-tts-model-config.cc
- sherpa-onnx/python/csrc/offline-tts-kitten-model-config.cc
🧬 Code Graph Analysis (7)
sherpa-onnx/python/csrc/offline-tts-kitten-model-config.h (1)
sherpa-onnx/python/csrc/offline-tts-kitten-model-config.cc (2)
PybindOfflineTtsKittenModelConfig (13-29), PybindOfflineTtsKittenModelConfig (13-13)
sherpa-onnx/csrc/offline-tts-frontend.h (1)
sherpa-onnx/csrc/piper-phonemize-lexicon.cc (2)
ConvertTextToTokenIdsKokoroOrKitten (463-489), ConvertTextToTokenIdsKokoroOrKitten (463-466)
sherpa-onnx/csrc/kokoro-multi-lang-lexicon.cc (1)
sherpa-onnx/csrc/piper-phonemize-lexicon.cc (2)
ConvertTextToTokenIdsKokoroOrKitten (463-489), ConvertTextToTokenIdsKokoroOrKitten (463-466)
sherpa-onnx/csrc/offline-tts-kitten-model-config.cc (1)
sherpa-onnx/csrc/offline-tts-model-config.cc (6)
Register (11-25), Register (11-11), Validate (27-52), Validate (27-27), ToString (54-67), ToString (54-54)
sherpa-onnx/python/csrc/offline-tts-kitten-model-config.cc (1)
sherpa-onnx/python/csrc/offline-tts-vits-model-config.cc (1)
sherpa_onnx (11-37)
sherpa-onnx/csrc/piper-phonemize-lexicon.cc (3)
sherpa-onnx/csrc/piper-phonemize-lexicon.h (1)
PiperPhonemizeLexicon (20-74)
sherpa-onnx/csrc/offline-tts-character-frontend.cc (5)
ReadTokens (29-80), ReadTokens (29-29), is (85-85), ConvertTextToTokenIds (99-192), ConvertTextToTokenIds (99-100)
sherpa-onnx/csrc/file-utils.cc (6)
ReadFile (27-31), ReadFile (27-27), ReadFile (34-49), ReadFile (34-34), ReadFile (53-81), ReadFile (53-54)
sherpa-onnx/csrc/piper-phonemize-lexicon.h (1)
sherpa-onnx/csrc/piper-phonemize-lexicon.cc (14)
PiperPhonemizeLexicon (307-317), PiperPhonemizeLexicon (320-334), PiperPhonemizeLexicon (336-346), PiperPhonemizeLexicon (348-358), PiperPhonemizeLexicon (360-370), PiperPhonemizeLexicon (373-387), PiperPhonemizeLexicon (390-404), PiperPhonemizeLexicon (407-421), PiperPhonemizeLexicon (527-529), PiperPhonemizeLexicon (531-533), PiperPhonemizeLexicon (535-537), PiperPhonemizeLexicon (541-544), PiperPhonemizeLexicon (546-549), PiperPhonemizeLexicon (551-554)
🔇 Additional comments (36)
sherpa-onnx/csrc/sherpa-onnx-offline-tts.cc (1)
104-104: LGTM! Good addition for performance monitoring. The thread count information complements the existing RTF and timing metrics, which is valuable for performance analysis and debugging.
sherpa-onnx/csrc/offline-tts-kokoro-model-meta-data.h (1)
14-16: LGTM! Improved documentation accuracy. The updated references to version-specific metadata scripts provide clearer guidance for developers working with different Kokoro model versions.
sherpa-onnx/python/csrc/CMakeLists.txt (1)
70-70: LGTM! Correct addition for Kitten model Python bindings. The new source file is properly added to the TTS-enabled build configuration, ensuring Python access to Kitten model configuration.
sherpa-onnx/csrc/CMakeLists.txt (1)
195-196: LGTM! Proper integration of Kitten model source files. Both the configuration and implementation files are correctly added to the TTS-enabled build, following the established pattern for other TTS models.
sherpa-onnx/csrc/offline-tts-kokoro-model.cc (1)
173-173: LGTM! Corrected misleading error message. The error message now accurately reflects the dimension being validated (`style_dim[1]` instead of `style_dim[0]`), improving debugging clarity.
sherpa-onnx/csrc/kokoro-multi-lang-lexicon.cc (1)
263-264: LGTM! Clean function rename for dual model support. The function call has been properly updated to use `ConvertTextToTokenIdsKokoroOrKitten` instead of the Kokoro-specific function. This change extends support to both Kokoro and Kitten models while maintaining the same interface and parameters.
sherpa-onnx/python/csrc/offline-tts-kitten-model-config.h (1)
1-17: LGTM! Well-structured Python binding header. The header file follows all best practices with proper include guards, copyright notice, minimal includes, and a clean function declaration. The structure is consistent with other Python binding headers in the codebase.
sherpa-onnx/csrc/offline-tts-kokoro-model.h (1)
26-27: LGTM! Documentation update improves clarity. The comment has been corrected to accurately reflect that the `Run` method returns audio samples rather than mel spectrogram data. This documentation improvement aligns with the expected TTS output interface and provides clearer guidance for users of this method.
sherpa-onnx/csrc/offline-tts-frontend.h (1)
62-62: LGTM! Function declaration updated for dual model support. The function declaration has been properly renamed to `ConvertTextToTokenIdsKokoroOrKitten` to reflect support for both Kokoro and Kitten models. The signature remains unchanged, maintaining backward compatibility while clearly indicating the extended functionality.
sherpa-onnx/csrc/offline-tts-kitten-model-meta-data.h (1)
15-24: LGTM! Well-designed metadata struct for Kitten model. The `OfflineTtsKittenModelMetaData` struct is well-structured with appropriate fields for TTS model configuration. The default values are sensible:
- `has_espeak = 1` enables espeak support by default
- `version = 1` provides a reasonable initial version
- `max_token_len = 256` sets a practical token sequence limit
- Other fields defaulted to 0 will be populated from the model

The reference to the external script provides helpful context for understanding metadata generation.
sherpa-onnx/python/csrc/offline-tts-kitten-model-config.cc (1)
13-29: LGTM! Consistent Python binding implementation.The binding implementation follows the established pattern from other TTS model configs, correctly exposing constructors, member variables, and methods. The parameter naming and structure are appropriate for the Kitten model configuration.
sherpa-onnx/csrc/offline-tts-model-config.cc (3)
15-15: LGTM! Proper integration of Kitten model registration. The Kitten model is correctly integrated into the command-line option registration alongside other TTS models.
41-51: LGTM! Consistent validation logic for Kitten model. The validation follows the same pattern as other TTS models, checking for a non-empty model path and delegating to the specific model's validation method. The updated error message is appropriate for the general case.
61-61: LGTM! Proper inclusion in string representation. The Kitten model configuration is correctly included in the ToString output, maintaining consistency with other model configurations.
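The validation pattern described above (find the sub-config whose model path is set, then delegate to that config's own check) can be sketched as follows. The structs, the `tokens` field, and the error messages are hypothetical simplifications of `OfflineTtsModelConfig` and its per-model configs. Only the dispatch is meant to reflect the review text; the per-model check shown is a placeholder.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical per-model config; only two paths are modeled here.
struct SubModelConfig {
  std::string model;
  std::string tokens;

  // Placeholder for a per-model Validate(): here it only checks `tokens`.
  bool Validate() const {
    if (tokens.empty()) {
      std::fprintf(stderr, "Please provide a tokens file.\n");
      return false;
    }
    return true;
  }
};

struct TtsModelConfig {
  SubModelConfig vits, matcha, kokoro, kitten;

  // Delegates to the first sub-config whose model path is non-empty.
  bool Validate() const {
    if (!vits.model.empty()) return vits.Validate();
    if (!matcha.model.empty()) return matcha.Validate();
    if (!kokoro.model.empty()) return kokoro.Validate();
    if (!kitten.model.empty()) return kitten.Validate();
    std::fprintf(stderr, "Please provide a TTS model.\n");
    return false;
  }
};
```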
sherpa-onnx/csrc/offline-tts-model-config.h (3)
10-10: LGTM! Proper header inclusion. The include for the Kitten model config header is correctly placed in alphabetical order with other TTS model headers.
22-22: LGTM! Consistent member variable addition. The Kitten model configuration member is properly added alongside other TTS model configurations.
33-42: LGTM! Proper constructor integration. The constructor parameter and member initialization for the Kitten model configuration follow the established pattern and maintain consistency with other TTS models.
sherpa-onnx/csrc/offline-tts-impl.cc (3)
19-19: LGTM! Proper header inclusion. The include for the Kitten implementation is correctly added to support the new model type.
44-52: LGTM! Improved factory method with proper error handling. The factory method now correctly handles all TTS model types including Kitten, and properly returns null instead of defaulting to an incorrect model when no model is specified. This is a significant improvement in error handling.
62-69: LGTM! Consistent template method implementation. The template factory method maintains the same improved logic as the regular factory method, ensuring consistent behavior across both overloads.
sherpa-onnx/csrc/offline-tts-kitten-model-config.cc (2)
15-25: LGTM! Well-structured command-line option registration. The command-line options are properly named with a consistent "kitten-" prefix and have clear, descriptive help text. The registration follows the established pattern.
84-95: LGTM! Standard ToString implementation. The string representation includes all configuration fields and follows the established pattern used by other TTS model configurations.
sherpa-onnx/csrc/offline-tts-kitten-model-config.h (1)
1-44: LGTM! Well-structured configuration header. The header follows established patterns from other model configurations in the codebase, with appropriate member variables, constructors, and method declarations. The structure is consistent and the default `length_scale = 1.0` provides reasonable speed behavior.
sherpa-onnx/python/csrc/offline-tts-model-config.cc (3)
10-10: LGTM! Proper include for Kitten model bindings. The include directive follows the established pattern for other model configuration headers.
21-21: LGTM! Consistent binding registration. The call to `PybindOfflineTtsKittenModelConfig(m)` follows the same pattern as other model binding registrations.
29-30: LGTM! Complete Python API integration. The constructor parameter addition and property exposure for the `kitten` model are consistent with the established patterns for other TTS models (vits, matcha, kokoro). The default value initialization ensures backward compatibility. Also applies to: 35-35, 41-41
sherpa-onnx/csrc/offline-tts-kitten-model.h (1)
17-36: LGTM! Well-designed model class interface. The `OfflineTtsKittenModel` class declaration follows established patterns:
- Uses the Pimpl idiom for implementation hiding
- Provides both direct and template constructors for resource management flexibility
- `Run` method signature aligns with TTS model expectations (tensor input, speaker ID, speed control)
- Const reference return from `GetMetaData()` provides safe metadata access

The interface design is consistent with other TTS model classes in the codebase.
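For readers unfamiliar with the Pimpl idiom mentioned above, here is a minimal, self-contained sketch of the shape such an interface takes. `KittenModelSketch`, its `Run` signature (plain `std::vector` instead of `Ort::Value`), and the placeholder output are hypothetical. Only the structure reflects the idiom: a public class holding a `std::unique_ptr` to a private `Impl`, with the destructor defined where `Impl` is a complete type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// ---- header part: no implementation details leak out ----
class KittenModelSketch {
 public:
  explicit KittenModelSketch(int32_t num_speakers);
  ~KittenModelSketch();  // defined below, where Impl is complete

  // Returns audio samples for the given token IDs (placeholder logic).
  std::vector<float> Run(const std::vector<int64_t> &tokens, int32_t sid,
                         float speed) const;

 private:
  class Impl;
  std::unique_ptr<Impl> impl_;
};

// ---- source part ----
class KittenModelSketch::Impl {
 public:
  explicit Impl(int32_t num_speakers) : num_speakers_(num_speakers) {}

  std::vector<float> Run(const std::vector<int64_t> &tokens, int32_t sid,
                         float speed) const {
    // Placeholder: a real model would run ONNX inference here. This just
    // emits zero samples, fewer of them for a higher `speed`.
    if (sid < 0 || sid >= num_speakers_ || speed <= 0) return {};
    auto n = static_cast<std::size_t>(tokens.size() * 100 / speed);
    return std::vector<float>(n, 0.0f);
  }

 private:
  int32_t num_speakers_;
};

KittenModelSketch::KittenModelSketch(int32_t num_speakers)
    : impl_(std::make_unique<Impl>(num_speakers)) {}

KittenModelSketch::~KittenModelSketch() = default;

std::vector<float> KittenModelSketch::Run(const std::vector<int64_t> &tokens,
                                          int32_t sid, float speed) const {
  return impl_->Run(tokens, sid, speed);
}
```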
sherpa-onnx/csrc/piper-phonemize-lexicon.h (3)
13-13: LGTM! Appropriate include for Kitten metadata. The include directive follows the established pattern for other model metadata headers.
31-32: LGTM! Consistent constructor declarations. The constructor declarations for both regular and template versions follow the established patterns for other model types (Vits, Matcha, Kokoro). The template version enables resource manager flexibility for Android/OHOS platforms.
Also applies to: 49-52
70-70: LGTM! Proper member additions. The `kitten_meta_data_` member and `is_kitten_` flag follow the established naming conventions and patterns used by other model types in the class. Also applies to: 73-73
sherpa-onnx/csrc/piper-phonemize-lexicon.cc (5)
183-183: LGTM! Appropriate function renaming for model unification. Renaming `PiperPhonemesToIdsKokoro` to `PiperPhonemesToIdsKokoroOrKitten` makes sense since the same phoneme processing logic can be shared between Kokoro and Kitten models.
360-370: LGTM! Consistent constructor implementation. The Kitten model constructor follows the established pattern from other model constructors:
- Properly initializes `kitten_meta_data_` and sets the `is_kitten_` flag
- Uses the same `ReadTokens` function for token loading
- Calls `InitEspeak` for phoneme processing setup

406-421: LGTM! Consistent template constructor implementation. The template constructor for resource manager support follows the same pattern as other model types:
- Uses `ReadFile` with the manager to load tokens from assets/resources
- Creates `std::istrstream` for token parsing
- Maintains the same initialization flow as the file-based constructor

428-432: LGTM! Proper branching logic for Kitten model. The addition of the `is_kitten_` branch follows the established pattern and correctly uses the `kitten_meta_data_.max_token_len` parameter with the unified conversion function.
463-463: LGTM! Consistent function renaming and usage. The function rename from `ConvertTextToTokenIdsKokoro` to `ConvertTextToTokenIdsKokoroOrKitten` properly reflects the unified functionality, and the call site is updated appropriately to use the renamed function. Also applies to: 480-481
sherpa-onnx/csrc/offline-tts-kitten-model.cc (1)
204-206: No alignment issue with `reinterpret_cast` here. The `voices_data` pointer always comes from `std::vector<char>::data()`, which uses the default allocator (calling the global `operator new`), and that guarantees alignment to at least `alignof(std::max_align_t)`. Since `alignof(float) ≤ alignof(std::max_align_t)`, the cast `reinterpret_cast<const float *>(voices_data)` is safe on all standard-conforming C++ implementations. No runtime alignment check or change to `memcpy` is required here.
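The alignment argument can be checked in a few lines of standard C++. This is an illustrative sketch with hypothetical helper names (`IsFloatAligned`, `ToBytes`). It checks alignment only, at compile time via `static_assert` and at run time on a heap-allocated `std::vector<char>` buffer, and does not address strict aliasing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// float has a fundamental alignment, so storage from the global
// operator new (used by std::allocator) is suitably aligned for it.
static_assert(alignof(float) <= alignof(std::max_align_t),
              "float must have fundamental alignment");

// Returns true if `p` is suitably aligned to be read as `const float *`.
bool IsFloatAligned(const void *p) {
  return reinterpret_cast<std::uintptr_t>(p) % alignof(float) == 0;
}

// Serializes floats into a byte buffer, e.g. to mimic a voices file.
std::vector<char> ToBytes(const std::vector<float> &v) {
  std::vector<char> bytes(v.size() * sizeof(float));
  std::memcpy(bytes.data(), v.data(), bytes.size());
  return bytes;
}
```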
```cpp
SHERPA_ONNX_LOGE("Support only batch_size == 1. Given: %d",
                 static_cast<int32_t>(x_shape[0]));
exit(-1);
}
```
Replace exit(-1) with exception for better error handling
Using exit(-1) terminates the entire program abruptly. Consider throwing an exception to allow callers to handle the error gracefully.
```diff
 if (x_shape[0] != 1) {
-  SHERPA_ONNX_LOGE("Support only batch_size == 1. Given: %d",
-                   static_cast<int32_t>(x_shape[0]));
-  exit(-1);
+  std::string error_msg = "Support only batch_size == 1. Given: " +
+                          std::to_string(x_shape[0]);
+  SHERPA_ONNX_LOGE("%s", error_msg.c_str());
+  throw std::runtime_error(error_msg);
 }
```
🤖 Prompt for AI Agents
In sherpa-onnx/csrc/offline-tts-kitten-model.cc around lines 64 to 67, replace
the call to exit(-1) with throwing an appropriate exception to signal the error.
This change will allow the caller to catch and handle the error instead of
terminating the program abruptly. Use a standard exception type or define a
custom one if needed, and include a descriptive error message indicating that
only batch_size == 1 is supported.
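What the suggestion buys in practice: with an exception, the caller decides whether a bad input is fatal. A minimal sketch, with a hypothetical `CheckBatchSize` standing in for the check in the model's `Run()` and a hypothetical `TrySynthesize` caller that converts the error into a status code.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>
#include <string>

// Throws instead of terminating the process, so callers can recover.
void CheckBatchSize(int64_t batch_size) {
  if (batch_size != 1) {
    throw std::runtime_error("Support only batch_size == 1. Given: " +
                             std::to_string(batch_size));
  }
}

// Example caller that turns the error into a status code.
bool TrySynthesize(int64_t batch_size, std::string *error) {
  try {
    CheckBatchSize(batch_size);
    return true;  // a real caller would run the model here
  } catch (const std::runtime_error &e) {
    *error = e.what();
    return false;
  }
}
```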
```cpp
if (model_type != "kitten-tts") {
  SHERPA_ONNX_LOGE(
      "Please download the kitten tts model from us containing meta data");
  SHERPA_ONNX_EXIT(-1);
}
```
🛠️ Refactor suggestion
Consider using exceptions instead of SHERPA_ONNX_EXIT(-1)
Multiple validation failures use SHERPA_ONNX_EXIT(-1) which terminates the program. Consider throwing exceptions for better error handling and recovery options.
Example refactor for lines 137-141:

```diff
 if (model_type != "kitten-tts") {
-  SHERPA_ONNX_LOGE(
-      "Please download the kitten tts model from us containing meta data");
-  SHERPA_ONNX_EXIT(-1);
+  std::string error_msg =
+      "Please download the kitten tts model from us containing meta data";
+  SHERPA_ONNX_LOGE("%s", error_msg.c_str());
+  throw std::runtime_error(error_msg);
 }
```

Also applies to: 150-152, 173-181, 201-201
🤖 Prompt for AI Agents
In sherpa-onnx/csrc/offline-tts-kitten-model.cc lines 137 to 141, replace the
use of SHERPA_ONNX_EXIT(-1) with throwing a suitable exception to allow better
error handling and recovery. Change the code to throw an exception after logging
the error instead of terminating the program immediately. Apply the same
refactor pattern to the similar cases at lines 150-152, 173-181, and 201.
Actionable comments posted: 0
♻️ Duplicate comments (5)
sherpa-onnx/csrc/offline-tts-kitten-model.cc (3)
110-110: Debug output incorrectly identifies this as 'kitten model' when it should match the class name. This could be confusing when debugging multiple model types.

```diff
- os << "---kitten model---\n";
+ os << "---OfflineTtsKittenModel---\n";
```

63-67: Replace `SHERPA_ONNX_EXIT(-1)` with exception for better error handling. Using `SHERPA_ONNX_EXIT(-1)` terminates the entire program abruptly. Consider throwing an exception to allow callers to handle the error gracefully.

```diff
 if (x_shape[0] != 1) {
-  SHERPA_ONNX_LOGE("Support only batch_size == 1. Given: %d",
-                   static_cast<int32_t>(x_shape[0]));
-  SHERPA_ONNX_EXIT(-1);
+  std::string error_msg = "Support only batch_size == 1. Given: " +
+                          std::to_string(x_shape[0]);
+  SHERPA_ONNX_LOGE("%s", error_msg.c_str());
+  throw std::runtime_error(error_msg);
 }
```

137-141: Consider using exceptions instead of `SHERPA_ONNX_EXIT(-1)`. Multiple validation failures use `SHERPA_ONNX_EXIT(-1)`, which terminates the program. Consider throwing exceptions for better error handling and recovery options. Example refactor for lines 137-141:

```diff
 if (model_type != "kitten-tts") {
-  SHERPA_ONNX_LOGE(
-      "Please download the kitten tts model from us containing meta data");
-  SHERPA_ONNX_EXIT(-1);
+  std::string error_msg =
+      "Please download the kitten tts model from us containing meta data";
+  SHERPA_ONNX_LOGE("%s", error_msg.c_str());
+  throw std::runtime_error(error_msg);
 }
```

Apply similar changes to lines 150-152, 173-181, and 201.
Also applies to: 150-152, 173-181, 201-201
sherpa-onnx/csrc/offline-tts-kitten-impl.h (2)
11-11: Replace deprecated `<strstream>` with `<sstream>`. The header `<strstream>` has been deprecated since C++98. Use `<sstream>` instead for better portability and future compatibility.

```diff
-#include <strstream>
+#include <sstream>
```

105-105: Replace deprecated `std::istrstream` with `std::istringstream`. The templated constructor uses deprecated `std::istrstream`. Replace it with `std::istringstream` for modern C++ compliance.

```diff
 auto buf = ReadFile(mgr, f);
-std::istrstream is(buf.data(), buf.size());
+std::istringstream is(std::string(buf.data(), buf.size()));
 tn_list_.push_back(std::make_unique<kaldifst::TextNormalizer>(is));
```

And similarly on line 127:

```diff
 std::unique_ptr<std::istream> s(
-    new std::istrstream(buf.data(), buf.size()));
+    new std::istringstream(std::string(buf.data(), buf.size())));
```

Also applies to: 105-105, 127-127
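A self-contained sketch of the `std::istringstream` replacement: the bytes returned by a resource-manager read are copied into a `std::string` and parsed as a stream. The `ParseTokens` helper and the simple two-column `symbol id` line format are assumptions for illustration, not code from the PR; real token files may need more careful parsing.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <unordered_map>
#include <vector>

// Parses "symbol id" pairs, one per line, from an in-memory byte buffer.
std::unordered_map<std::string, int32_t> ParseTokens(
    const std::vector<char> &buf) {
  // std::istringstream owns a copy of the data, unlike the deprecated
  // std::istrstream, which reads the caller's buffer in place.
  std::istringstream is(std::string(buf.data(), buf.size()));
  std::unordered_map<std::string, int32_t> token2id;
  std::string sym;
  int32_t id = 0;
  while (is >> sym >> id) {
    token2id[sym] = id;
  }
  return token2id;
}
```

The extra copy is the price of dropping `std::istrstream`. For a token or FST file read once at startup, that cost is usually negligible.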
🧹 Nitpick comments (3)
sherpa-onnx/csrc/offline-tts-kitten-model.cc (2)
72-72: Make pointer `const` as indicated by the comment. The comment `/*const*/` suggests this pointer should be const-qualified.

```diff
- /*const*/ float *p = styles_.data() + sid * dim1;
+ const float *p = styles_.data() + sid * dim1;
```

184-185: Simplify calculation since `style_dim_[0]` is always 1. Since `style_dim_[0]` is validated to be 1 (lines 178-181), the multiplication is unnecessary.

```diff
 int32_t expected_num_floats =
-    style_dim_[0] * style_dim_[1] * meta_data_.num_speakers;
+    style_dim_[1] * meta_data_.num_speakers;
```

sherpa-onnx/csrc/offline-tts-kitten-impl.h (1)
151-327: Consider refactoring the long `Generate` method for better maintainability. The `Generate` method spans 176 lines with multiple responsibilities: validation, text normalization, tokenization, batching, and audio processing. Consider extracting helper methods for better readability and testability, for example:
- `ValidateSpeakerId(sid, num_speakers)` for speaker validation logic (lines 157-184)
- `NormalizeText(text)` for text normalization (lines 209-220)
- `ProcessBatches(x, sid, speed, callback)` for batch processing logic (lines 287-324)
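As an illustration of the first suggested helper, here is one possible shape for `ValidateSpeakerId`. The name comes from the review comment. The behavior (replace an out-of-range ID with 0 after logging a warning) is an assumed policy for this sketch, since the review does not spell out what lines 157-184 do.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical helper: returns a usable speaker ID. Out-of-range IDs are
// replaced by 0 after a warning (an assumed policy, not confirmed code).
int32_t ValidateSpeakerId(int32_t sid, int32_t num_speakers) {
  if (num_speakers <= 0) {
    return 0;  // unknown speaker count: only ID 0 is meaningful
  }
  if (sid < 0 || sid >= num_speakers) {
    std::fprintf(stderr,
                 "sid should be in the range [0, %d]. Given: %d. Use 0.\n",
                 num_speakers - 1, sid);
    return 0;
  }
  return sid;
}
```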
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- sherpa-onnx/csrc/offline-tts-kitten-impl.h (1 hunks)
- sherpa-onnx/csrc/offline-tts-kitten-model-config.cc (1 hunks)
- sherpa-onnx/csrc/offline-tts-kitten-model.cc (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- sherpa-onnx/csrc/offline-tts-kitten-model-config.cc
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: the sherpa-onnx jni library files are stored in hugging face repository at https://huggingface.co/cs...
Learnt from: litongjava
PR: k2-fsa/sherpa-onnx#2440
File: sherpa-onnx/java-api/src/main/java/com/k2fsa/sherpa/onnx/core/Core.java:4-6
Timestamp: 2025-08-06T04:23:50.237Z
Learning: The sherpa-onnx JNI library files are stored in Hugging Face repository at https://huggingface.co/csukuangfj/sherpa-onnx-libs under versioned directories like jni/1.12.7/, and the actual Windows JNI library filename is "sherpa-onnx-jni.dll" as defined in Core.java constants.
Applied to files:
sherpa-onnx/csrc/offline-tts-kitten-model.cc
Fixes #2450
CC @BarfingLemurs @Ashoka74
RTF comparison among kitten-tts, piper tts, matcha tts, and kokoro tts on my MacBook Pro
(num_threads == 1 is used)
info about my mac
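For reference, the real-time factor (RTF) used in such comparisons is the wall-clock synthesis time divided by the duration of the generated audio. Values below 1 mean synthesis runs faster than real time. A small helper with hypothetical names that computes it from a sample count and sample rate:

```cpp
#include <cassert>
#include <cstdint>

// RTF = elapsed processing time / duration of the generated audio.
double RealTimeFactor(double elapsed_seconds, int64_t num_samples,
                      int32_t sample_rate) {
  assert(num_samples > 0 && sample_rate > 0);
  double audio_seconds = static_cast<double>(num_samples) / sample_rate;
  return elapsed_seconds / audio_seconds;
}
```

For example, 0.5 s spent generating 48000 samples at a 24 kHz sample rate (2 s of audio) gives an RTF of 0.25.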

Usage
1. Download the model
2. Build sherpa-onnx
3. Run it
The mapping between speaker IDs (sid) and speaker names, together with the generated audio for each speaker, is given below:
sid 0 (expr-voice-2-m)
kitten-0.mov
sid 1 (expr-voice-2-f)
kitten-1.mov
sid 2 (expr-voice-3-m)
kitten-2.mov
sid 3 (expr-voice-3-f)
kitten-3.mov
sid 4 (expr-voice-4-m)
kitten-4.mov
sid 5 (expr-voice-4-f)
kitten-5.mov
sid 6 (expr-voice-5-m)
kitten-6.mov
sid 7 (expr-voice-5-f)
kitten-7.mov
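The speaker mapping listed above can be captured in a lookup table, for example to let a front end accept speaker names instead of numeric IDs. This is an illustrative sketch: `kKittenSpeakers` and `SpeakerIdFromName` are hypothetical names, and the speaker names are copied from the list above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Speaker names for sid 0..7, in the order listed above.
constexpr std::array<const char *, 8> kKittenSpeakers = {
    "expr-voice-2-m", "expr-voice-2-f", "expr-voice-3-m", "expr-voice-3-f",
    "expr-voice-4-m", "expr-voice-4-f", "expr-voice-5-m", "expr-voice-5-f",
};

// Returns the speaker ID for `name`, or -1 if the name is unknown.
int32_t SpeakerIdFromName(const std::string &name) {
  for (std::size_t i = 0; i < kKittenSpeakers.size(); ++i) {
    if (name == kKittenSpeakers[i]) return static_cast<int32_t>(i);
  }
  return -1;
}
```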
Summary by CodeRabbit
New Features
Bug Fixes
Documentation