
[WIP] add CTC prefix beam search / hotwords #1439

Open
pkufool wants to merge 8 commits into k2-fsa:master from pkufool:ctc-prefix-beam-search

Conversation

@pkufool
Contributor

@pkufool pkufool commented Oct 17, 2024

This PR implements the core part (C++/Python) of CTC prefix beam search and related decoding methods, including hotwords and RNN-LM shallow fusion.

  • offline prefix beam search
  • offline hotwords
  • online prefix beam search
  • online hotwords

By the way, we have released our recent progress on CTC models; see https://arxiv.org/pdf/2410.05101 for details.

Summary by CodeRabbit

  • New Features

    • Added CTC prefix beam-search decoding (offline and online) with configurable max_active_paths and blank handling.
    • Hotword/context support for prefix beam search (file-based and per-stream), including BPE hotword encoding and per-call hotword APIs.
  • Documentation

    • CLI, C API docs and Python factory methods updated to list prefix_beam_search and expose max_active_paths, hotwords, and related options.

@fuyanzhe

May I ask when this pull request is planned to be merged?

@dohe0342

Hello. I trained a CR-CTC model, decoded with the streaming CTC model, and got token repetition (e.g., ref: 안녕하세요 / hyp: 안녕녕하세요).

So I really need online prefix beam search. Do you have any plans to release online CTC prefix beam search? Thank you!

@pkufool pkufool force-pushed the ctc-prefix-beam-search branch from af07894 to a2b64a2 Compare April 8, 2026 11:15
@dosubot dosubot Bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Apr 8, 2026
@coderabbitai

coderabbitai Bot commented Apr 8, 2026

📝 Walkthrough

Adds offline and online CTC prefix-beam-search decoders, extends decoder interfaces to accept optional stream contexts, adds CTC-aware hypothesis scoring (blank/non-blank probs), and wires hotword / ContextGraph support into offline and online recognizers and example CLIs.

Changes

  • Build Integration — sherpa-onnx/csrc/CMakeLists.txt
    Added new source files for prefix-beam-search decoders to sherpa-onnx-core.
  • Hypothesis Scoring API — sherpa-onnx/csrc/hypothesis.h, sherpa-onnx/csrc/hypothesis.cc
    Added CTC fields log_prob_b/log_prob_nb, new LogProb(bool use_ctc)/TotalLogProb(bool), and updated Hypotheses::Add, GetMostProbable, GetTopK to accept use_ctc.
  • Decoder Interface & Overrides — sherpa-onnx/csrc/offline-ctc-decoder.h, sherpa-onnx/csrc/offline-ctc-fst-decoder.*, sherpa-onnx/csrc/offline-ctc-greedy-search-decoder.*
    Extended the pure virtual Decode to accept OfflineStream **ss = nullptr, int32_t n = 0; updated FST and greedy overrides to match.
  • Offline Prefix Beam Search — sherpa-onnx/csrc/offline-ctc-prefix-beam-search-decoder.h, sherpa-onnx/csrc/offline-ctc-prefix-beam-search-decoder.cc
    New OfflineCtcPrefixBeamSearchDecoder with per-step StepWorker, CTC blank/non-blank bookkeeping, optional ContextGraph LM scoring, batch/time handling, and result conversion.
  • Online Prefix Beam Search — sherpa-onnx/csrc/online-ctc-prefix-beam-search-decoder.h, sherpa-onnx/csrc/online-ctc-prefix-beam-search-decoder.cc
    New OnlineCtcPrefixBeamSearchDecoder implementing online batched decode, hypothesis restore/extend logic, context state advancement, and result updates.
  • Online/Offline Recognizer Integration — sherpa-onnx/csrc/offline-recognizer-ctc-impl.h, sherpa-onnx/csrc/offline-recognizer.cc, sherpa-onnx/csrc/online-recognizer-ctc-impl.h, sherpa-onnx/csrc/online-recognizer.cc
    Added prefix_beam_search support; constructed prefix decoders via max_active_paths; added a BPE encoder, hotword initialization, and CreateStream(hotwords) overloads; passed the stream array into decoder calls; updated validation rules.
  • Online CTC Result Type — sherpa-onnx/csrc/online-ctc-decoder.h
    Extended OnlineCtcDecoderResult with Hypotheses hyps to store prefix-beam hypotheses.
  • Greedy/FST Unchanged Logic — sherpa-onnx/csrc/offline-ctc-fst-decoder.*, sherpa-onnx/csrc/offline-ctc-greedy-search-decoder.*
    Signatures updated to accept optional streams; internal decoding logic unchanged in the shown diffs.
  • Python API & Examples — sherpa-onnx/python/sherpa_onnx/offline_recognizer.py, sherpa-onnx/python/sherpa_onnx/online_recognizer.py, python-api-examples/*, c-api-examples/decode-file-c-api.c
    Added max_active_paths to many factory methods, propagated decoding_method support for prefix_beam_search, forwarded hotword/BPE args, and updated CLI/help texts.
  • Other Headers / Includes — sherpa-onnx/csrc/offline-ctc-decoder.h, sherpa-onnx/csrc/online-ctc-decoder.h, ...hypothesis.h
    Added includes and small public API docs/comments adjustments to support new types and options.

Sequence Diagram

sequenceDiagram
    participant Client
    participant Recognizer as Recognizer\n(Offline/Online)
    participant Decoder as CTCPrefixBeamDecoder
    participant StepWorker
    participant ContextGraph
    participant Hypotheses

    Client->>Recognizer: decode(log_probs, log_probs_length, hotwords?)
    Recognizer->>Decoder: Decode(..., ss[], n)
    loop per time frame (t)
        Decoder->>StepWorker: top-k candidates from frame t
        loop per hypothesis
            StepWorker->>Hypotheses: evaluate blank/repeat/extend cases
            alt ContextGraph present
                StepWorker->>ContextGraph: ForwardOneStep(token, ctx_state)
                ContextGraph-->>StepWorker: lm_logprob, new_ctx_state
                StepWorker->>Hypotheses: add/update hypothesis (update lm and ctx_state)
            else
                StepWorker->>Hypotheses: add/update hypothesis (CTC probs)
            end
        end
        Decoder->>Hypotheses: prune/select active paths (max_active_paths)
    end
    Decoder->>Hypotheses: GetMostProbable(use_ctc=true)
    Hypotheses-->>Decoder: best hypothesis
    Decoder-->>Recognizer: results
    Recognizer-->>Client: return results

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

size:XL

Suggested reviewers

  • csukuangfj

Poem

🐰
I hopped through frames and counted beams,
Blank and token split in twos and dreams,
Context nudged my tiny path anew,
Hotwords hummed and scores came through—
A rabbit claps: prefix beams, hooray!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning — Docstring coverage is 25.49%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (2 passed)

  • Description Check ✅ Passed — Check skipped; CodeRabbit's high-level summary is enabled.
  • Title check ✅ Passed — The PR title '[WIP] add CTC prefix beam search / hotwords' directly describes the main changes: implementing CTC prefix beam search decoding with hotword support for both offline and online recognition.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
sherpa-onnx/csrc/offline-recognizer.cc (1)

84-89: ⚠️ Potential issue | 🟡 Minor

Update the hotwords validation error message to match the new allowed method.

Lines 87-89 still say only modified_beam_search is valid, but lines 84-85 now also allow prefix_beam_search. This will mislead users during config errors.

Proposed fix
   if (!hotwords_file.empty() && (decoding_method != "modified_beam_search" &&
                                  decoding_method != "prefix_beam_search")) {
     SHERPA_ONNX_LOGE(
-        "Please use --decoding-method=modified_beam_search if you"
+        "Please use --decoding-method=modified_beam_search or "
+        "--decoding-method=prefix_beam_search if you"
         " provide --hotwords-file. Given --decoding-method='%s'",
         decoding_method.c_str());
     return false;
   }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sherpa-onnx/csrc/offline-recognizer.cc` around lines 84 - 89, The error
message for hotwords validation is out of sync with the allowed decoding
methods: update the SHERPA_ONNX_LOGE message in offline-recognizer.cc (the block
that checks hotwords_file and decoding_method) to mention both allowed methods
("modified_beam_search" and "prefix_beam_search") and include the actual
provided decoding_method via decoding_method.c_str() as it already does; adjust
the wording so it instructs users to use either
--decoding-method=modified_beam_search or --decoding-method=prefix_beam_search
when supplying --hotwords-file.
🧹 Nitpick comments (3)
sherpa-onnx/csrc/hypothesis.h (1)

53-57: Minor inconsistency: Using float infinity for double variable.

log_prob_nb is declared as double but initialized with std::numeric_limits<float>::infinity(). While this works (float infinity converts to double infinity), using std::numeric_limits<double>::infinity() would be more consistent with the type declaration.

♻️ Suggested fix for type consistency
   // The total score of ys which ends with non blank token in log space
-  double log_prob_nb = -std::numeric_limits<float>::infinity();
+  double log_prob_nb = -std::numeric_limits<double>::infinity();
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sherpa-onnx/csrc/hypothesis.h` around lines 53 - 57, The variable log_prob_nb
in hypothesis.h is declared as double but initialized using
std::numeric_limits<float>::infinity(); change the initializer to use
std::numeric_limits<double>::infinity() to match the declared type (locate the
log_prob_nb declaration and replace the float infinity with double infinity).
sherpa-onnx/csrc/offline-ctc-prefix-beam-search-decoder.cc (1)

113-117: Consider removing or gating the commented debug code.

Debug logging code is commented out. Consider removing it before merging or wrapping it in a debug preprocessor macro if needed for future development.

🧹 Suggested cleanup
        cur[b] = StepWorker(p_log_probs, cur[b], blank_id_, vocab_size,
                            max_active_paths_, context_graphs[b].get());
-        // for (auto &x : cur[b]) {
-        //   SHERPA_ONNX_LOGE("step : %d, key : %s, ac : %f, lm : %f", t,
-        //                    x.Key().c_str(), x.LogProb(true), x.lm_log_prob);
-        // }
-        // SHERPA_ONNX_LOGE("\n");
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sherpa-onnx/csrc/offline-ctc-prefix-beam-search-decoder.cc` around lines 113
- 117, Remove or gate the commented-out debug logging in
offline-ctc-prefix-beam-search-decoder.cc: either delete the block that iterates
over cur[b] and calls SHERPA_ONNX_LOGE for Key(), LogProb(true) and lm_log_prob,
or wrap it with a compile-time macro or runtime log-level check (e.g.,
DEBUG_BEAM_SEARCH or use the existing logging verbosity) so the snippet around
cur[b], t, x.Key(), x.LogProb(true), and x.lm_log_prob is only included when
debug logging is enabled.
sherpa-onnx/csrc/offline-recognizer-ctc-impl.h (1)

399-417: Use SHERPA_ONNX_EXIT(-1) for consistency with the rest of the codebase.

Line 406 uses exit(-1) directly, while line 230 uses SHERPA_ONNX_EXIT(-1). The macro is used consistently elsewhere for controlled exit handling.

♻️ Suggested fix
    if (!is) {
      SHERPA_ONNX_LOGE("Open hotwords file failed: %s",
                       config_.hotwords_file.c_str());
-      exit(-1);
+      SHERPA_ONNX_EXIT(-1);
    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sherpa-onnx/csrc/offline-recognizer-ctc-impl.h` around lines 399 - 417, In
InitHotwords(), replace the direct call to exit(-1) with the project exit macro
for consistency: call SHERPA_ONNX_EXIT(-1) instead of exit(-1) in the error
branch that detects failure to open config_.hotwords_file; update the code
around the if (!is) block in InitHotwords to use SHERPA_ONNX_EXIT so it matches
other exits (e.g., the one referenced near line 230) and preserves controlled
shutdown behavior.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bfba1fc1-640a-4181-b394-42ab0c71a974

📥 Commits

Reviewing files that changed from the base of the PR and between 07c119a and a2b64a2.

📒 Files selected for processing (12)
  • sherpa-onnx/csrc/CMakeLists.txt
  • sherpa-onnx/csrc/hypothesis.cc
  • sherpa-onnx/csrc/hypothesis.h
  • sherpa-onnx/csrc/offline-ctc-decoder.h
  • sherpa-onnx/csrc/offline-ctc-fst-decoder.cc
  • sherpa-onnx/csrc/offline-ctc-fst-decoder.h
  • sherpa-onnx/csrc/offline-ctc-greedy-search-decoder.cc
  • sherpa-onnx/csrc/offline-ctc-greedy-search-decoder.h
  • sherpa-onnx/csrc/offline-ctc-prefix-beam-search-decoder.cc
  • sherpa-onnx/csrc/offline-ctc-prefix-beam-search-decoder.h
  • sherpa-onnx/csrc/offline-recognizer-ctc-impl.h
  • sherpa-onnx/csrc/offline-recognizer.cc

Comment on lines +419 to +442
#if __ANDROID_API__ >= 9
  void InitHotwords(AAssetManager *mgr) {
    // each line in hotwords_file contains space-separated words

    auto buf = ReadFile(mgr, config_.hotwords_file);

    std::istringstream is(std::string(buf.begin(), buf.end()));

    if (!is) {
      SHERPA_ONNX_LOGE("Open hotwords file failed: %s",
                       config_.hotwords_file.c_str());
      exit(-1);
    }

    if (!EncodeHotwords(is, config_.model_config.modeling_unit, symbol_table_,
                        bpe_encoder_.get(), &hotwords_, &boost_scores_)) {
      SHERPA_ONNX_LOGE(
          "Failed to encode some hotwords, skip them already, see logs above "
          "for details.");
    }
    hotwords_graph_ = std::make_shared<ContextGraph>(
        hotwords_, config_.hotwords_score, boost_scores_);
  }
#endif


⚠️ Potential issue | 🟡 Minor

Dead code: istringstream state check is always true after construction.

The !is check on line 427 is unreachable. An istringstream constructed from a string is always in a valid state. If ReadFile fails to read the file, you should check the return value of ReadFile instead.

Also, use SHERPA_ONNX_EXIT(-1) for consistency.

🐛 Suggested fix
 #if __ANDROID_API__ >= 9
   void InitHotwords(AAssetManager *mgr) {
     // each line in hotwords_file contains space-separated words
 
     auto buf = ReadFile(mgr, config_.hotwords_file);
-
-    std::istringstream is(std::string(buf.begin(), buf.end()));
-
-    if (!is) {
+    if (buf.empty()) {
       SHERPA_ONNX_LOGE("Open hotwords file failed: %s",
                        config_.hotwords_file.c_str());
-      exit(-1);
+      SHERPA_ONNX_EXIT(-1);
     }
+
+    std::istringstream is(std::string(buf.begin(), buf.end()));
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change (the current code is quoted above; this is the replacement):

#if __ANDROID_API__ >= 9
  void InitHotwords(AAssetManager *mgr) {
    // each line in hotwords_file contains space-separated words

    auto buf = ReadFile(mgr, config_.hotwords_file);
    if (buf.empty()) {
      SHERPA_ONNX_LOGE("Open hotwords file failed: %s",
                       config_.hotwords_file.c_str());
      SHERPA_ONNX_EXIT(-1);
    }

    std::istringstream is(std::string(buf.begin(), buf.end()));

    if (!EncodeHotwords(is, config_.model_config.modeling_unit, symbol_table_,
                        bpe_encoder_.get(), &hotwords_, &boost_scores_)) {
      SHERPA_ONNX_LOGE(
          "Failed to encode some hotwords, skip them already, see logs above "
          "for details.");
    }
    hotwords_graph_ = std::make_shared<ContextGraph>(
        hotwords_, config_.hotwords_score, boost_scores_);
  }
#endif
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sherpa-onnx/csrc/offline-recognizer-ctc-impl.h` around lines 419 - 442, The
istringstream validity check in InitHotwords is dead code; instead verify
ReadFile succeeded by checking the returned buffer (auto buf from ReadFile)
before constructing std::istringstream. If buf is empty or ReadFile indicates
failure, log the error using SHERPA_ONNX_LOGE with config_.hotwords_file and
call SHERPA_ONNX_EXIT(-1). Only after confirming buf contains data, create
std::istringstream, then call EncodeHotwords(modeling_unit via
config_.model_config.modeling_unit, symbol_table_, bpe_encoder_.get(),
&hotwords_, &boost_scores_) and proceed to create hotwords_graph_ as currently
done.

Copilot AI and others added 3 commits April 8, 2026 17:48

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
sherpa-onnx/python/sherpa_onnx/offline_recognizer.py (1)

1741-1806: Changes look good; consider fixing pre-existing docstring URL.

The max_active_paths addition follows the same correct pattern. However, note that the docstring references whisper models documentation (line 1752) instead of wenet models. This is a pre-existing issue, not introduced by this PR, but could be addressed as a quick fix while you're here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sherpa-onnx/python/sherpa_onnx/offline_recognizer.py` around lines 1741 -
1806, The docstring in offline_recognizer.py (the classmethod that constructs
model_config with OfflineWenetCtcModelConfig / OfflineRecognizerConfig)
incorrectly references the Whisper pretrained models URL and wording; update
that docstring to point to the appropriate WeNet/Sherpa ONNX wenet pretrained
models page and replace occurrences of "whisper" with "wenet" (or the correct
model family name) so the link and description match the wenet_ctc model being
constructed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3111ab47-b381-425a-85c6-dacb477c9a34

📥 Commits

Reviewing files that changed from the base of the PR and between a2b64a2 and 3408053.

📒 Files selected for processing (2)
  • python-api-examples/offline-decode-files.py
  • sherpa-onnx/python/sherpa_onnx/offline_recognizer.py
✅ Files skipped from review due to trivial changes (1)
  • python-api-examples/offline-decode-files.py

@pkufool pkufool changed the title from "[WIP] add CTC prefix beam search / hotwords / shallow fussion" to "[WIP] add CTC prefix beam search / hotwords" Apr 10, 2026
@dosubot dosubot Bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels Apr 10, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
python-api-examples/online-decode-files.py (1)

171-178: ⚠️ Potential issue | 🟡 Minor

Validate prefix_beam_search against the selected model.

This option is now advertised globally, but the transducer branch still passes args.decoding_method through unchanged, so passing --encoder/--decoder/--joiner together with --decoding-method=prefix_beam_search fails late instead of being rejected up front in the example script.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@python-api-examples/online-decode-files.py` around lines 171 - 178, The
example script currently accepts "--decoding-method=prefix_beam_search" globally
but does not validate it for transducer models; update the argument validation
after parsing (using parser/args) to detect when a transducer model is being
used (e.g., presence of args.encoder/args.decoder/args.joiner or whatever branch
handles the transducer) and raise a clear error or exit if args.decoding_method
== "prefix_beam_search" while the transducer branch will be used; ensure other
branches (CTC, non-transducer) still allow prefix_beam_search and keep the
transducer branch logic that previously passed args.decoding_method unchanged
except for this upfront validation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@sherpa-onnx/csrc/online-ctc-prefix-beam-search-decoder.cc`:
- Around line 50-59: When expanding prefixes where code pushes new_token into
new_hyp.ys (e.g., in the branches that set new_hyp.log_prob_nb and
new_hyp.log_prob_b), also push the corresponding frame index into
new_hyp.timestamps (use the current frame/time variable used in Decode, e.g., t
or frame_idx); likewise ensure other similar blocks (the ones around the other
push_back sites and the ones noted at 58-63 and 138-140) update timestamps in
lockstep with ys. Finally, when finishing Decode(), copy best_hyp.timestamps
into r.timestamps so the returned Result preserves the token-to-timestamp
correspondence (ensuring tokens.size() == timestamps.size()).
- Around line 24-78: The loop creates new Hypothesis objects and updates
log_prob_b/log_prob_nb but calls next_hyps.Add(...) which uses the default
(non-CTC) merge behavior; change those Add calls to the CTC-aware merge variant
(e.g., Hypotheses::AddCTC or the Add overload that enables CTC merging) so
identical token sequences (same hyp.ys) are merged using CTC rules: combine
log_prob_b and log_prob_nb via log-sum-exp, preserve/update num_trailing_blanks
correctly, and carry the proper lm/context_state; update every place calling
next_hyps.Add(std::move(new_hyp)) in this function (blank case, same-token case,
and update_prefix branch) to use the CTC-aware add so prefix merging is correct.

In `@sherpa-onnx/csrc/online-recognizer.cc`:
- Around line 156-163: The current validation accepts --hotwords-file with
--decoding-method=prefix_beam_search even when a CTC graph is configured,
causing hotword biasing to be ignored; update the check that currently inspects
hotwords_file and decoding_method to also reject when
ctc_fst_decoder_config.graph is set. Specifically, in the validation branch that
looks at hotwords_file and decoding_method (symbols: hotwords_file,
decoding_method, "modified_beam_search", "prefix_beam_search"), add a condition
to fail (log error and return false) if ctc_fst_decoder_config.graph is
non-empty (symbol: ctc_fst_decoder_config.graph) so that supplying hotwords with
a configured CTC FST graph is rejected up front. Ensure the error message
mentions both hotwords_file and the presence of a CTC graph to make the conflict
clear.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f929b52d-b6fe-4ee9-8ca3-99fecb1cbce5

📥 Commits

Reviewing files that changed from the base of the PR and between 3408053 and 8b75d4c.

📒 Files selected for processing (10)
  • c-api-examples/decode-file-c-api.c
  • python-api-examples/online-decode-files.py
  • sherpa-onnx/c-api/c-api.h
  • sherpa-onnx/csrc/CMakeLists.txt
  • sherpa-onnx/csrc/online-ctc-decoder.h
  • sherpa-onnx/csrc/online-ctc-prefix-beam-search-decoder.cc
  • sherpa-onnx/csrc/online-ctc-prefix-beam-search-decoder.h
  • sherpa-onnx/csrc/online-recognizer-ctc-impl.h
  • sherpa-onnx/csrc/online-recognizer.cc
  • sherpa-onnx/python/sherpa_onnx/online_recognizer.py
✅ Files skipped from review due to trivial changes (2)
  • c-api-examples/decode-file-c-api.c
  • sherpa-onnx/c-api/c-api.h
🚧 Files skipped from review as they are similar to previous changes (1)
  • sherpa-onnx/csrc/CMakeLists.txt

Comment on lines +24 to +78
  Hypotheses next_hyps;
  for (auto &hyp : hyps) {
    for (auto k : topk) {
      Hypothesis new_hyp = hyp;
      int32_t new_token = k;
      float log_prob = p_log_probs[k];
      bool update_prefix = false;
      if (new_token == blank_id) {
        // Case 0: *a + ε => *a
        //         *aε + ε => *a
        // Prefix does not change, update log_prob of blank
        new_hyp.log_prob_nb = -std::numeric_limits<float>::infinity();
        new_hyp.log_prob_b = hyp.LogProb(true) + log_prob;
        new_hyp.num_trailing_blanks = hyp.num_trailing_blanks + 1;
        next_hyps.Add(std::move(new_hyp));
      } else if (hyp.ys.size() > 0 && hyp.ys.back() == new_token) {
        // Case 1: *a + a => *a
        // Prefix does not change, update log_prob of non_blank
        new_hyp.log_prob_nb = hyp.log_prob_nb + log_prob;
        new_hyp.log_prob_b = -std::numeric_limits<float>::infinity();
        new_hyp.num_trailing_blanks = 0;
        next_hyps.Add(std::move(new_hyp));

        // Case 2: *aε + a => *aa
        // Prefix changes, update log_prob of blank
        new_hyp = hyp;
        new_hyp.ys.push_back(new_token);
        new_hyp.log_prob_nb = hyp.log_prob_b + log_prob;
        new_hyp.log_prob_b = -std::numeric_limits<float>::infinity();
        new_hyp.num_trailing_blanks = 0;
        update_prefix = true;
      } else {
        // Case 3: *a + b => *ab, *aε + b => *ab
        // Prefix changes, update log_prob of non_blank
        new_hyp.ys.push_back(new_token);
        new_hyp.log_prob_nb = hyp.LogProb(true) + log_prob;
        new_hyp.log_prob_b = -std::numeric_limits<float>::infinity();
        new_hyp.num_trailing_blanks = 0;
        update_prefix = true;
      }

      if (update_prefix) {
        float lm_log_prob = hyp.lm_log_prob;
        if (context_graph != nullptr && hyp.context_state != nullptr) {
          auto context_res =
              context_graph->ForwardOneStep(hyp.context_state, new_token);
          lm_log_prob = lm_log_prob + std::get<0>(context_res);
          new_hyp.context_state = std::get<1>(context_res);
        }
        new_hyp.lm_log_prob = lm_log_prob;
        next_hyps.Add(std::move(new_hyp));
      }
    }
  }
  return next_hyps.GetTopK(max_active_paths, false, true);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Use CTC-aware hypothesis merging here.

StepWorker() updates log_prob_b/log_prob_nb, but every next_hyps.Add(...) call still uses the default non-CTC merge path. That breaks prefix merging whenever the same token sequence is reached through different blank/non-blank transitions.

Suggested fix
-        next_hyps.Add(std::move(new_hyp));
+        next_hyps.Add(std::move(new_hyp), /*use_ctc=*/true);
...
-        next_hyps.Add(std::move(new_hyp));
+        next_hyps.Add(std::move(new_hyp), /*use_ctc=*/true);
...
-        next_hyps.Add(std::move(new_hyp));
+        next_hyps.Add(std::move(new_hyp), /*use_ctc=*/true);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sherpa-onnx/csrc/online-ctc-prefix-beam-search-decoder.cc` around lines 24-78, the loop creates new Hypothesis objects and updates log_prob_b/log_prob_nb
but calls next_hyps.Add(...) which uses the default (non-CTC) merge behavior;
change those Add calls to the CTC-aware merge variant (e.g., Hypotheses::AddCTC
or the Add overload that enables CTC merging) so identical token sequences (same
hyp.ys) are merged using CTC rules: combine log_prob_b and log_prob_nb via
log-sum-exp, preserve/update num_trailing_blanks correctly, and carry the proper
lm/context_state; update every place calling next_hyps.Add(std::move(new_hyp))
in this function (blank case, same-token case, and update_prefix branch) to use
the CTC-aware add so prefix merging is correct.

Comment on lines +50 to +59
new_hyp.ys.push_back(new_token);
new_hyp.log_prob_nb = hyp.log_prob_b + log_prob;
new_hyp.log_prob_b = -std::numeric_limits<float>::infinity();
new_hyp.num_trailing_blanks = 0;
update_prefix = true;
} else {
// Case 3: *a + b => *ab, *aε + b => *ab
// Prefix changes, update log_prob of non_blank
new_hyp.ys.push_back(new_token);
new_hyp.log_prob_nb = hyp.LogProb(true) + log_prob;
⚠️ Potential issue | 🟠 Major

Record and return token timestamps.

The prefix-expansion paths append to ys, but never append the corresponding frame index, and Decode() never writes best_hyp.timestamps back into r.timestamps. That leaves online prefix-beam results without timing metadata and breaks the tokens.size() == timestamps.size() contract.

Suggested direction
-static std::vector<Hypothesis> StepWorker(const float *p_log_probs,
-                                          std::vector<Hypothesis> &hyps,
-                                          int32_t blank_id, int32_t vocab_size,
-                                          int32_t max_active_paths,
-                                          const ContextGraph *context_graph) {
+static std::vector<Hypothesis> StepWorker(const float *p_log_probs,
+                                          std::vector<Hypothesis> &hyps,
+                                          int32_t blank_id, int32_t vocab_size,
+                                          int32_t max_active_paths,
+                                          int32_t frame_index,
+                                          const ContextGraph *context_graph) {
...
         new_hyp.ys.push_back(new_token);
+        new_hyp.timestamps.push_back(frame_index);
         new_hyp.log_prob_nb = hyp.log_prob_b + log_prob;
...
         new_hyp.ys.push_back(new_token);
+        new_hyp.timestamps.push_back(frame_index);
         new_hyp.log_prob_nb = hyp.LogProb(true) + log_prob;
...
-      cur[b] = StepWorker(p, cur[b], blank_id_, vocab_size,
-                          max_active_paths_, context_graphs[b].get());
+      cur[b] = StepWorker(p, cur[b], blank_id_, vocab_size,
+                          max_active_paths_,
+                          (*results)[b].frame_offset + t,
+                          context_graphs[b].get());
...
     r.tokens = best_hyp.ys;
+    r.timestamps = best_hyp.timestamps;
     r.frame_offset += num_frames;

Also applies to: 58-63, 138-140

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sherpa-onnx/csrc/online-ctc-prefix-beam-search-decoder.cc` around lines 50-59, when expanding prefixes where code pushes new_token into new_hyp.ys (e.g.,
in the branches that set new_hyp.log_prob_nb and new_hyp.log_prob_b), also push
the corresponding frame index into new_hyp.timestamps (use the current
frame/time variable used in Decode, e.g., t or frame_idx); likewise ensure other
similar blocks (the ones around the other push_back sites and the ones noted at
58-63 and 138-140) update timestamps in lockstep with ys. Finally, when
finishing Decode(), copy best_hyp.timestamps into r.timestamps so the returned
Result preserves the token-to-timestamp correspondence (ensuring tokens.size()
== timestamps.size()).

Comment on lines +156 to 163
if (!hotwords_file.empty() && decoding_method != "modified_beam_search" &&
    decoding_method != "prefix_beam_search") {
  SHERPA_ONNX_LOGE(
-      "Please use --decoding-method=modified_beam_search if you"
+      "Please use --decoding-method=modified_beam_search or "
+      "--decoding-method=prefix_beam_search if you"
      " provide --hotwords-file. Given --decoding-method=%s",
      decoding_method.c_str());
  return false;
⚠️ Potential issue | 🟠 Major

Reject hotwords_file when a CTC graph is configured.

This new validation now accepts hotwords_file with prefix_beam_search, but in sherpa-onnx/csrc/online-recognizer-ctc-impl.h the ctc_fst_decoder_config.graph path is chosen before prefix_beam_search, so the hotword biasing is silently ignored. Please fail fast for that combination instead of accepting a no-op config.

Suggested validation
+  if (!hotwords_file.empty() && !ctc_fst_decoder_config.graph.empty()) {
+    SHERPA_ONNX_LOGE(
+        "--hotwords-file is not supported when a CTC graph is configured.");
+    return false;
+  }
+
   if (!hotwords_file.empty() && decoding_method != "modified_beam_search" &&
       decoding_method != "prefix_beam_search") {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@sherpa-onnx/csrc/online-recognizer.cc` around lines 156-163, the current
validation accepts --hotwords-file with --decoding-method=prefix_beam_search
even when a CTC graph is configured, causing hotword biasing to be ignored;
update the check that currently inspects hotwords_file and decoding_method to
also reject when ctc_fst_decoder_config.graph is set. Specifically, in the
validation branch that looks at hotwords_file and decoding_method (symbols:
hotwords_file, decoding_method, "modified_beam_search", "prefix_beam_search"),
add a condition to fail (log error and return false) if
ctc_fst_decoder_config.graph is non-empty (symbol: ctc_fst_decoder_config.graph)
so that supplying hotwords with a configured CTC FST graph is rejected up front.
Ensure the error message mentions both hotwords_file and the presence of a CTC
graph to make the conflict clear.


Labels

size:XL This PR changes 500-999 lines, ignoring generated files.


4 participants