Kokoro TTS: support AXCL backend (#3270)
📝 Walkthrough

The changes add AXCL accelerator support for offline text-to-speech generation with the Kokoro model. This includes new model implementations using AXCL components, an additional AxclModel API overload for uint8_t input data, conditional CMake configuration, and integration with the offline TTS factory system.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Client
    participant OfflineTtsImpl as OfflineTtsImpl<br/>(Factory)
    participant OfflineTtsKokoroAxclImpl as OfflineTtsKokoroAxclImpl<br/>(Backend)
    participant TextNormalizer
    participant OfflineTtsFrontend as OfflineTtsFrontend<br/>(Tokenizer)
    participant OfflineTtsKokoroModelAxcl as OfflineTtsKokoroModelAxcl<br/>(Model)
    participant AxclModel as AxclModel<br/>(AXCL Inference)
    participant OnnxSession as OnnxSession<br/>(ONNX Runtime)
    Client->>OfflineTtsImpl: Create(config with provider="axcl")
    OfflineTtsImpl->>OfflineTtsKokoroAxclImpl: new OfflineTtsKokoroAxclImpl(config)
    OfflineTtsKokoroAxclImpl->>OfflineTtsKokoroAxclImpl: InitFrontend()
    Client->>OfflineTtsKokoroAxclImpl: Generate(text, sid, speed)
    OfflineTtsKokoroAxclImpl->>TextNormalizer: Normalize text
    TextNormalizer-->>OfflineTtsKokoroAxclImpl: normalized_text
    OfflineTtsKokoroAxclImpl->>OfflineTtsFrontend: Convert to tokens
    OfflineTtsFrontend-->>OfflineTtsKokoroAxclImpl: token_ids
    OfflineTtsKokoroAxclImpl->>OfflineTtsKokoroAxclImpl: Batch tokens
    OfflineTtsKokoroAxclImpl->>OfflineTtsKokoroModelAxcl: Run(input_ids, sid, speed)
    OfflineTtsKokoroModelAxcl->>AxclModel: SetInputTensorData(uint8_t*)
    AxclModel->>AxclModel: Run() [Model1: Duration/Alignment]
    AxclModel->>AxclModel: Run() [Model2: Refinement]
    AxclModel->>OnnxSession: Run() [Model4: F0/Embedding]
    AxclModel->>AxclModel: Run() [Model3: Final Processing]
    AxclModel-->>OfflineTtsKokoroModelAxcl: raw_audio
    OfflineTtsKokoroModelAxcl->>OfflineTtsKokoroModelAxcl: Post-process (iSTFT, trim, adjust)
    OfflineTtsKokoroModelAxcl-->>OfflineTtsKokoroAxclImpl: audio_waveform
    OfflineTtsKokoroAxclImpl-->>Client: GeneratedAudio
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the Text-to-Speech capabilities by integrating support for the AXCL backend with Kokoro models. This change allows Kokoro TTS models to leverage AXCL hardware for potentially faster and more efficient inference, broadening the range of supported deployment environments. The underlying implementation has been made more flexible through templating, ensuring maintainability and extensibility for future backend integrations.
Code Review
This pull request adds support for the Kokoro TTS model on the AXCL backend, introducing a new OfflineTtsKokoroModelAxcl implementation and modifying OfflineTtsImpl. A critical security concern is the lack of sufficient validation for user-supplied parameters such as sid (speaker ID) and speed, which can lead to out-of-bounds memory access and Denial of Service (DoS) via Out-Of-Memory (OOM) crashes. Furthermore, the AXCL model implementation appears to be a direct port with several magic numbers and areas for performance optimization, alongside some minor style issues.
```cpp
  const float *p = styles_.data() + sid * style_dim0 * style_dim1 +
                   phoneme_len * style_dim1;
  std::copy(p, p + style_dim1, ref_s.begin());
} else {
  int32_t idx = style_dim0 / 2;
  const float *p =
      styles_.data() + sid * style_dim0 * style_dim1 + idx * style_dim1;
  std::copy(p, p + style_dim1, ref_s.begin());
```
The sid (speaker ID) parameter is used to calculate a pointer offset into the styles_ vector without any bounds checking. If an invalid sid is provided (e.g., a value greater than or equal to num_speakers), it will result in an out-of-bounds read. This can lead to a process crash (Denial of Service) or potentially leak sensitive information from the process memory into the generated audio output. Additionally, if sid is large enough, the calculation sid * style_dim0 * style_dim1 could overflow the int32_t type, leading to unpredictable memory access.
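A hedged sketch of the kind of guard this comment asks for; the helper names (`IsValidSid`, `StyleOffset`) are invented here, and the 64-bit offset math addresses the overflow concern:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical guard: reject speaker ids outside [0, num_speakers) before
// any pointer arithmetic into styles_.
bool IsValidSid(int64_t sid, int64_t num_speakers) {
  return sid >= 0 && sid < num_speakers;
}

// Hypothetical offset helper: doing the multiplication in 64-bit avoids the
// int32_t overflow mentioned above for large sid values.
int64_t StyleOffset(int64_t sid, int64_t style_dim0, int64_t style_dim1,
                    int64_t row) {
  return sid * style_dim0 * style_dim1 + row * style_dim1;
}
```

The caller would check `IsValidSid` first and bail out (or fall back to sid 0) before touching `styles_`.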
```cpp
sum /= speed;
pred_dur_original[i] = static_cast<int32_t>(std::max(1.f, std::round(sum)));
```
The speed parameter is used as a divisor in ProcessDuration without validation. If speed is zero, a division by zero occurs, resulting in an infinite value. If speed is an extremely small positive value, the resulting duration values (pred_dur_original) will be very large. This can lead to an excessively large total_frames value, which is subsequently used to allocate the en vector on line 298. An attacker providing a maliciously small speed value could trigger an Out-Of-Memory (OOM) condition, leading to a Denial of Service (DoS) crash.
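One possible sanitizer, as a sketch (the clamp bounds are illustrative assumptions, not values from the PR):

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical sanitizer: clamp the user-supplied speed into a sane range
// before it is used as a divisor, so speed == 0 or a tiny positive value
// cannot inflate the predicted durations and trigger an OOM allocation.
float SanitizeSpeed(float speed) {
  constexpr float kMinSpeed = 0.1f;   // illustrative lower bound
  constexpr float kMaxSpeed = 10.0f;  // illustrative upper bound
  return std::clamp(speed, kMinSpeed, kMaxSpeed);
}
```

Alternatively the code could reject out-of-range speeds with an error instead of clamping; either way the divisor is validated before use.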
```cpp
bool SetInputTensorData(const std::string &name, const int32_t *p,
                        int32_t n) const;
```
```cpp
std::string model1_path = model_path + "/kokoro_part1.axmodel";
std::string model2_path = model_path + "/kokoro_part2.axmodel";
std::string model3_path = model_path + "/kokoro_part3.axmodel";
std::string model4_path = model_path + "/kokoro_part4.onnx";
```

```cpp
int32_t style_dim0 = 510;  // max_token_len
int32_t style_dim1 = 256;
```
There are several magic numbers used in this file (e.g., 510, 256 here, and others like 640, 50, 20, 5, 32 elsewhere). Using named constants would improve readability and maintainability. For instance, style_dim0 is commented as max_token_len, so you could use meta_data_.max_token_len which is initialized to 510 in LoadVoices. The value 256 should also be a named constant.
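A sketch of what the named constants could look like (the constant names are invented; the values come from the snippet above):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical named constants replacing the bare 510 / 256 literals.
// kMaxTokenLen mirrors meta_data_.max_token_len; kStyleDim is the
// per-token style embedding width used throughout the model.
constexpr int32_t kMaxTokenLen = 510;
constexpr int32_t kStyleDim = 256;

// Example use: size of one speaker's style table, in floats.
constexpr int32_t kStyleFloatsPerSpeaker = kMaxTokenLen * kStyleDim;
```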
```cpp
for (int32_t j = 0; j < d_shape_2; ++j) {
  for (int32_t k = 0; k < total_frames; ++k) {
    float sum = 0.0f;
    for (int32_t i = 0; i < max_seq_len_; ++i) {
      sum += d[i * d_shape_2 + j] * pred_aln_trg[i * total_frames + k];
    }
    en[j * total_frames + k] = sum;
  }
}
```
```cpp
}  // for (const auto &f : files)
}  // if (!config.rule_fars.empty())
```
```cpp
std::unique_ptr<OfflineTtsImpl> OfflineTtsImpl::Create(
    const OfflineTtsConfig &config) {
  if (config.model.provider == "axcl") {
#if SHERPA_ONNX_ENABLE_AXCL
```
Please also update line 94.
If axcl does not support Android, please just print an error message and return nullptr.
New file (`@@ -0,0 +1,37 @@`):

```cpp
// sherpa-onnx/csrc/axcl/offline-tts-kokoro-model-axcl.h
//
// Copyright (c) 2025 Xiaomi Corporation
```
Please update it with your info.
```cpp
void Init(const OfflineTtsModelConfig &config) {
  std::string model_path = config.kokoro.model;
```

```cpp
// kokoro axcl mode: we assume config.kokoro.model is a directory containing models
```
Please update the Kokoro model config validation function to validate that the required files exist when axcl is used.
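A sketch of such a validation helper, assuming the file names from the snippet above (`ValidateKokoroAxclDir` and its signature are hypothetical):

```cpp
#include <cassert>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical validation sketch: when provider == "axcl", treat
// config.kokoro.model as a directory and verify that all expected model
// parts exist before loading. Returns false and reports the first missing
// file via *missing.
bool ValidateKokoroAxclDir(const std::string &model_dir,
                           std::string *missing = nullptr) {
  const std::vector<std::string> required = {
      "/kokoro_part1.axmodel", "/kokoro_part2.axmodel",
      "/kokoro_part3.axmodel", "/kokoro_part4.onnx"};
  for (const auto &name : required) {
    std::ifstream f(model_dir + name, std::ios::binary);
    if (!f.good()) {
      if (missing) *missing = model_dir + name;
      return false;
    }
  }
  return true;
}
```

The real fix would live in the existing Kokoro config `Validate()` path and log via `SHERPA_ONNX_LOGE` rather than returning a string.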
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
sherpa-onnx/csrc/offline-tts-impl.cc (1)
93-105: ⚠️ Potential issue | 🟠 Major: Mirror the AXCL provider branch in the manager overload.
`Create(mgr, config)` never checks `config.model.provider`, so on Line 104 it always builds `OfflineTtsKokoroModel` even though the non-manager overload routes `provider == "axcl"` to `OfflineTtsKokoroModelAxcl`. The AXCL model already has a manager constructor, so the Android/OHOS resource-manager path is currently ignoring the AXCL backend entirely.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sherpa-onnx/csrc/offline-tts-impl.cc` around lines 93 - 105, The manager-overload OfflineTtsImpl::Create (template<Manager>) ignores config.model.provider when selecting the kokoro branch and always constructs OfflineTtsKokoroImpl<OfflineTtsKokoroModel>; update that branch to mirror the non-manager overload by checking config.model.provider (e.g., provider == "axcl") and construct OfflineTtsKokoroImpl<OfflineTtsKokoroModelAxcl> when provider is "axcl", otherwise construct OfflineTtsKokoroImpl<OfflineTtsKokoroModel>; ensure you use the existing manager-capable constructors for OfflineTtsKokoroImpl so the AXCL backend is reachable via the manager path.
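One way to keep the two overloads in sync is to factor the provider check into a shared helper; this sketch is illustrative only (`UseAxclKokoro` is an invented name):

```cpp
#include <cassert>
#include <string>

// Hypothetical shared predicate consulted by both Create() overloads, so the
// manager path cannot silently skip the AXCL dispatch.
bool UseAxclKokoro(const std::string &provider) {
  return provider == "axcl";
}
```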
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 8ddfb20e-5c39-4d30-b615-ca31ce07720d
📒 Files selected for processing (7)
- sherpa-onnx/csrc/CMakeLists.txt
- sherpa-onnx/csrc/axcl/axcl-model.cc
- sherpa-onnx/csrc/axcl/axcl-model.h
- sherpa-onnx/csrc/axcl/offline-tts-kokoro-model-axcl.cc
- sherpa-onnx/csrc/axcl/offline-tts-kokoro-model-axcl.h
- sherpa-onnx/csrc/offline-tts-impl.cc
- sherpa-onnx/csrc/offline-tts-kokoro-impl.h
```cpp
void LoadVoices(const char *voices_data, size_t voices_data_length) {
  // get metadata
  meta_data_.sample_rate = 24000;
  meta_data_.num_speakers = voices_data_length / (sizeof(float) * 256 * 510);
  meta_data_.version = 1;
  meta_data_.has_espeak = 1;
  meta_data_.voice = "en-us";
  meta_data_.max_token_len = 510;

  int32_t actual_num_floats = voices_data_length / sizeof(float);
  styles_ = std::vector<float>(
      reinterpret_cast<const float *>(voices_data),
      reinterpret_cast<const float *>(voices_data) + actual_num_floats);

  auto shape1 = model1_->TensorShape("input_ids");
  max_seq_len_ = shape1[1];
}
```
Derive Kokoro metadata from the bundle instead of hardcoding it.
Lines 196-201 pin sample_rate, version, voice, and max_token_len, while Line 197 floor-divides the voices blob to infer num_speakers. OfflineTtsKokoroImpl::InitFrontend() branches on version and voice, so this AXCL path will mis-handle multilingual/v2 exports and also won't detect a truncated voices file.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sherpa-onnx/csrc/axcl/offline-tts-kokoro-model-axcl.cc` around lines 194 -
210, LoadVoices currently hardcodes meta_data_ fields and infers num_speakers by
a brittle division; instead parse the kokoro bundle header from the start of
voices_data to populate meta_data_. Specifically, in LoadVoices read the
bundle's metadata/header (sample_rate, version, voice string, max_token_len and
the per-speaker embedding length) and validate voices_data_length against the
expected float count to detect truncation, then compute num_speakers from the
validated float count and set
meta_data_.num_speakers/meta_data_.sample_rate/meta_data_.version/meta_data_.voice/meta_data_.max_token_len
accordingly before constructing styles_, and retain the existing use of
model1_->TensorShape("input_ids") to set max_seq_len_; update error handling to
bail or log if the header is malformed or the blob is too short.
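The truncation check the prompt describes could be sketched as follows (the helper `DeriveNumSpeakers` is hypothetical; a real fix would also parse a header if the bundle carries one):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical truncation check for the voices blob: given the per-speaker
// embedding size (max_token_len * style_dim floats), the blob length must be
// an exact non-zero multiple; otherwise the file is truncated or mismatched.
// Returns the speaker count, or -1 on a malformed blob.
int32_t DeriveNumSpeakers(size_t voices_data_length, int32_t max_token_len,
                          int32_t style_dim) {
  const size_t bytes_per_speaker =
      sizeof(float) * static_cast<size_t>(max_token_len) *
      static_cast<size_t>(style_dim);
  if (bytes_per_speaker == 0 || voices_data_length == 0 ||
      voices_data_length % bytes_per_speaker != 0) {
    return -1;
  }
  return static_cast<int32_t>(voices_data_length / bytes_per_speaker);
}
```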
```cpp
if (actual_len > max_seq_len_) {
  SHERPA_ONNX_LOGE("Input ids length %d exceeds max_seq_len %d, truncating.",
                   actual_len, max_seq_len_);
  input_ids.resize(max_seq_len_);
  actual_len = max_seq_len_;
```
Don’t silently drop over-length text.
Lines 236-239 resize input_ids to max_seq_len_, so the returned waveform omits the tail of the utterance. The existing ORT Kokoro path fails fast for this case; AXCL should either chunk the sentence or return an error instead of generating partial speech.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sherpa-onnx/csrc/axcl/offline-tts-kokoro-model-axcl.cc` around lines 235 -
239, The code currently truncates input_ids when actual_len > max_seq_len_,
producing partial speech; instead, do not resize input_ids—either (A) return a
clear error/status to the caller when actual_len > max_seq_len_ (log via
SHERPA_ONNX_LOGE including actual_len and max_seq_len_ and return an appropriate
failure code/object from the surrounding function), or (B) implement chunking:
split input_ids into contiguous chunks of size max_seq_len_, run the existing
TTS generation routine for each chunk and concatenate the resulting waveforms
before returning; update callers to handle the error/concatenated output.
Reference the symbols in the diff: input_ids, max_seq_len_, actual_len (and the
enclosing function in offline-tts-kokoro-model-axcl.cc) when making the change.
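Option (B), chunking, could be sketched like this (the helper `ChunkTokens` is hypothetical; the real code would run synthesis per chunk and concatenate the resulting waveforms):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical chunking sketch: split an over-length token sequence into
// contiguous chunks of at most max_seq_len tokens, so each chunk can be
// synthesized separately instead of silently truncating the tail.
std::vector<std::vector<int64_t>> ChunkTokens(
    const std::vector<int64_t> &input_ids, int32_t max_seq_len) {
  std::vector<std::vector<int64_t>> chunks;
  for (size_t start = 0; start < input_ids.size();
       start += static_cast<size_t>(max_seq_len)) {
    size_t end = std::min(input_ids.size(),
                          start + static_cast<size_t>(max_seq_len));
    chunks.emplace_back(input_ids.begin() + start, input_ids.begin() + end);
  }
  return chunks;
}
```

A naive split can cut mid-word at phoneme level, so a production version would prefer to break at sentence or pause boundaries.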
```cpp
model1_->SetInputTensorData("input_ids", input_ids_i32.data(),
                            input_ids_i32.size());
model1_->SetInputTensorData("ref_s", ref_s.data(), ref_s.size());
model1_->SetInputTensorData("text_mask", text_mask_u8.data(),
                            text_mask_u8.size());
```
Abort when an AXCL input upload fails.
Each SetInputTensorData() call here can return false on a name or size mismatch, but the result is ignored and inference still continues. That can leave device buffers partially updated from a previous utterance and produce corrupted audio.
Minimal fix sketch:

```diff
-  model1_->SetInputTensorData("input_ids", input_ids_i32.data(),
-                              input_ids_i32.size());
-  model1_->SetInputTensorData("ref_s", ref_s.data(), ref_s.size());
-  model1_->SetInputTensorData("text_mask", text_mask_u8.data(), text_mask_u8.size());
+  if (!model1_->SetInputTensorData("input_ids", input_ids_i32.data(),
+                                   input_ids_i32.size()) ||
+      !model1_->SetInputTensorData("ref_s", ref_s.data(), ref_s.size()) ||
+      !model1_->SetInputTensorData("text_mask", text_mask_u8.data(),
+                                   text_mask_u8.size())) {
+    return false;
+  }
```

Please apply the same short-circuit to the model2_ and model3_ uploads on Lines 312-316 and Lines 344-348.
Also applies to: 312-316, 344-348
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sherpa-onnx/csrc/axcl/offline-tts-kokoro-model-axcl.cc` around lines 279 -
282, The SetInputTensorData calls (e.g.,
model1_->SetInputTensorData("input_ids", ...),
model1_->SetInputTensorData("ref_s", ...),
model1_->SetInputTensorData("text_mask", ...)) can return false on name/size
mismatch and must be checked; modify these blocks to short-circuit and abort
inference (return false / propagate error) if any call returns false, and apply
the identical check-and-abort pattern to the corresponding upload calls on
model2_ and model3_ (the groups at lines handling
model2_->SetInputTensorData(...) and model3_->SetInputTensorData(...)) so a
failed upload does not leave device buffers partially updated. Ensure you return
or propagate the error from the current function immediately when any
SetInputTensorData returns false.
Actionable comments posted: 2
♻️ Duplicate comments (1)
sherpa-onnx/csrc/offline-tts-impl.cc (1)
92-110: ⚠️ Potential issue | 🟠 Major: Mirror the AXCL provider dispatch in the manager overload.

`Create(const OfflineTtsConfig &)` now special-cases `provider == "axcl"`, but `Create(Manager *, const OfflineTtsConfig &)` still ignores `config.model.provider`. On `AAssetManager`/`NativeResourceManager` builds, an AXCL Kokoro config will fall through to `OfflineTtsKokoroImpl` instead of the new AXCL path, or continue on platforms where AXCL should be rejected.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sherpa-onnx/csrc/offline-tts-impl.cc` around lines 92 - 110, The Manager overload of OfflineTtsImpl::Create currently ignores config.model.provider so AXCL configs misroute; update OfflineTtsImpl::Create(Manager*, const OfflineTtsConfig&) to mirror the provider == "axcl" branch from the other overload: check config.model.provider == "axcl" early and return the AXCL-specific implementation (e.g., std::make_unique<OfflineTtsAxclImpl>(mgr, config)) or otherwise reproduce the same accept/reject logic used by Create(const OfflineTtsConfig&), ensuring the new AXCL path is taken consistently for AAssetManager/NativeResourceManager builds.
🧹 Nitpick comments (1)
sherpa-onnx/csrc/axcl/offline-tts-kokoro-axcl-impl.h (1)
253-272: Collapse the fake batching path or implement real batching.

This method logs that `max_num_sentences` is ignored, hard-codes `batch_size = 1`, and `Process()` still flattens every token list into a single sequence. The remainder block at Lines 313-331 is dead today, and the current API shape makes it look like multi-sentence batches are supported when they are not.

Also applies to: 294-331, 389-418
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@sherpa-onnx/csrc/axcl/offline-tts-kokoro-axcl-impl.h` around lines 253 - 272, The batching code is fake: config_.max_num_sentences is logged as ignored but batch_x, batch_size and Process() still assume/flatten a single sequence; either remove the dormant batching path or implement real batching. Fix by removing the misleading log and all batch_x/batch_size dead logic if you do not support multi-sentence batching, or implement real batching: honor config_.max_num_sentences when computing batch_size, split the input token vectors into multiple batches (populate batch_x accordingly), update Process() to accept and iterate over these batches instead of flattening every token list, and ensure any downstream code expecting single-sequence input is adjusted; touch symbols config_.max_num_sentences, batch_x, batch_size and Process() (and the surrounding blocks noted) so behavior and API match (either single-sentence only or true multi-sentence batching).
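Real batching along these lines could start from a grouping helper such as this hypothetical sketch (`MakeBatches` is invented for illustration):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical batching sketch: instead of flattening every token list into
// one sequence, group the per-sentence token lists into batches of at most
// max_num_sentences, which the caller then synthesizes one batch at a time.
std::vector<std::vector<std::vector<int64_t>>> MakeBatches(
    const std::vector<std::vector<int64_t>> &sentences,
    int32_t max_num_sentences) {
  std::vector<std::vector<std::vector<int64_t>>> batches;
  if (max_num_sentences <= 0) max_num_sentences = 1;  // defensive default
  for (size_t i = 0; i < sentences.size();
       i += static_cast<size_t>(max_num_sentences)) {
    size_t end = std::min(sentences.size(),
                          i + static_cast<size_t>(max_num_sentences));
    batches.emplace_back(sentences.begin() + i, sentences.begin() + end);
  }
  return batches;
}
```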
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 29f8db44-6f5c-4191-b9aa-8624a07c4575
📒 Files selected for processing (2)
- sherpa-onnx/csrc/axcl/offline-tts-kokoro-axcl-impl.h
- sherpa-onnx/csrc/offline-tts-impl.cc
```cpp
std::unique_ptr<fst::FarReader<fst::StdArc>> reader(
    fst::FarReader<fst::StdArc>::Open(f));
for (; !reader->Done(); reader->Next()) {
  std::unique_ptr<fst::StdConstFst> r(
      fst::CastOrConvertToConstFst(reader->GetFst()->Copy()));

  tn_list_.push_back(
      std::make_unique<kaldifst::TextNormalizer>(std::move(r)));
}
```
🧩 Analysis chain

🏁 Script executed:

```shell
# First, let's locate and examine the file
find . -name "offline-tts-kokoro-axcl-impl.h" -type f
```

Repository: k2-fsa/sherpa-onnx
Length of output: 116

🏁 Script executed:

```shell
# Once we find it, let's look at the specific lines (75-83)
cat -n sherpa-onnx/csrc/axcl/offline-tts-kokoro-axcl-impl.h | sed -n '70,90p'
```

Repository: k2-fsa/sherpa-onnx
Length of output: 852
🌐 Web query:
OpenFst FarReader::Open() return value NULL error handling
💡 Result:
fst::FarReader<Arc>::Open(...) is specified to return NULL on error. In practice (per far.h), it returns 0 when the input can’t be recognized/opened as an FAR in any supported format (STTable, STList, or single Fst).[1] The intended pattern is:
```cpp
std::unique_ptr<fst::FarReader<fst::StdArc>> reader(
    fst::FarReader<fst::StdArc>::Open("in.far"));
if (!reader) {
  // Open failed: file missing/unreadable, wrong/unknown format, etc.
  // (Diagnostics may have been logged to stderr via OpenFst logging.)
  return 1;
}
// After Open succeeds, still check for read/iteration errors:
if (reader->Error()) return 1;  // or check during/after iteration
```

Key points for error handling:

- Always null-check the return; OpenFst explicitly treats "functions/methods that return NULL" as non-fatal error signaling even when `FLAGS_fst_error_fatal` is true.[2]
- If `Open()` succeeds, use the reader's `Error()` method to detect failures during reading/iteration.[3]
- Make sure FAR support is actually built/linked (`--enable-far`, `libfstfar`), otherwise your program may not be able to read FARs as expected.[4]
Sources:
[1] fst/extensions/far/far.h (FarReader::Open returns NULL when no format matches) (android.googlesource.com)
[2] OpenFst “Advanced Usage” – Error handling (NULL return as error signal; FLAGS_fst_error_fatal) (openfst.org)
[3] OpenFst doxygen – FarReader-derived classes expose Error() (openfst.org)
[4] OpenFst “Extensions” – FAR extension build/link notes (openfst.org)
Citations:
- 1: https://android.googlesource.com/platform/external/openfst/%2B/dfd8b83/src/include/fst/extensions/far/far.h
- 2: https://www.openfst.org/twiki/bin/view/FST/FstAdvancedUsage
- 3: https://openfst.org/doxygen/fst/html/classfst_1_1STListFarReader.html
- 4: https://www.openfst.org/twiki/bin/view/FST/FstExtensions?utm_source=openai
Guard FarReader::Open() before the loop.
`reader` is dereferenced immediately without null checking. OpenFst's `FarReader<Arc>::Open()` returns NULL when the archive cannot be opened (missing file, unrecognized format, etc.). Calling `reader->Done()` on line 77 will crash with a null pointer dereference instead of surfacing a config error.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sherpa-onnx/csrc/axcl/offline-tts-kokoro-axcl-impl.h` around lines 75 - 83,
The code calls fst::FarReader<fst::StdArc>::Open(f) and then dereferences reader
(reader->Done()) without checking for null; replace this by guarding the result
of FarReader::Open(f) (the local variable reader) before the for-loop: if Open
returned nullptr, emit a clear error (e.g., process logger/error stream or throw
std::runtime_error) indicating the archive could not be opened and abort
initialization, otherwise iterate with reader->Done(), reader->Next(), use
reader->GetFst() and construct kaldifst::TextNormalizer instances to push into
tn_list_ as before.
```cpp
if (num_speakers == 0 && sid != 0) {
#if __OHOS__
  SHERPA_ONNX_LOGE(
      "This is a single-speaker model and supports only sid 0. Given sid: "
      "%{public}d. sid is ignored",
      static_cast<int32_t>(sid));
#else
  SHERPA_ONNX_LOGE(
      "This is a single-speaker model and supports only sid 0. Given sid: "
      "%d. sid is ignored",
      static_cast<int32_t>(sid));
#endif
}
```
Reset sid in the single-speaker branch too.
The log says non-zero sid is ignored, but this branch never normalizes it. Since the base API allows single-speaker models to report num_speakers == 0, you can still pass an out-of-contract speaker id into Process().
Suggested fix:

```diff
 if (num_speakers == 0 && sid != 0) {
 #if __OHOS__
   SHERPA_ONNX_LOGE(
       "This is a single-speaker model and supports only sid 0. Given sid: "
       "%{public}d. sid is ignored",
       static_cast<int32_t>(sid));
 #else
   SHERPA_ONNX_LOGE(
       "This is a single-speaker model and supports only sid 0. Given sid: "
       "%d. sid is ignored",
       static_cast<int32_t>(sid));
 #endif
+  sid = 0;
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@sherpa-onnx/csrc/axcl/offline-tts-kokoro-axcl-impl.h` around lines 163 - 175,
The branch that checks if (num_speakers == 0 && sid != 0) logs that a non-zero
sid is ignored but never actually normalizes it; update that branch in
offline-tts-kokoro-axcl-impl.h to set sid = 0 after logging so callers (and
subsequent code like Process()) can't use an out-of-contract speaker id; locate
the check using the symbols num_speakers and sid and add the assignment to reset
sid to zero inside that conditional (preserve the existing platform-specific
logging).
New file (`@@ -0,0 +1,450 @@`):

```cpp
// sherpa-onnx/csrc/axcl/offline-tts-kokoro-axcl-impl.h
//
// Copyright (c) 2025 Xiaomi Corporation
```
Please update it with your info.
```cpp
Ort::MemoryInfo::CreateCpu(OrtDeviceAllocator, OrtMemTypeDefault);

std::array<int64_t, 2> x_shape = {1, static_cast<int32_t>(x.size())};
Ort::Value x_tensor = Ort::Value::CreateTensor(
```
Please remove everything related to onnxruntime.