Summary
The Moonshine backend's _guard_repetition() fails to catch three classes of
hallucinated repetition that occur in practice, producing garbage output that
can run to hundreds of characters.
Reproduction
Run the roundtrip benchmark with difficult phrases:
    .venv312/bin/python scripts/tts_roundtrip.py \
      --asr-backend moonshine --moonshine-model-name moonshine/base \
      --phrase "The sixth sick sheik's sixth sheep's sick" \
      --phrase "We still have issues with recording cutting out on long sentences, and we need deterministic regression tests to catch regressions before they ship"
Observed output
Hyphenated repetition (base model, tongue twister):
REF: The sixth sick sheik's sixth sheep's sick
HYP: The six-six-hake-hake-hake-hake-hake-hake-hake-hake-hake-hake-hake-hake-hake-hake-hake-hake-hake-hake-hake-hake-hake-hake-hake-hake-hake-hake-hake-hake-hake-hake-hake-hake-hake-hake-...
similarity: 0.067
Clause-level repetition (base model, long sentence):
REF: We still have issues with recording cutting out on long sentences...
HYP: We still have issues with recording cutting out on long sentences. and we still have issues with recording cutting out on long sentences, and we still have issues with recording cutting out on long sentences, and we need, and we need, and we need, and we need, and we need, and we need, and
similarity: 0.342
Numeric hallucination (tiny model):
REF: Invoice 4827 totals one hundred and fifty three dollars and twelve cents
HYP: Invoice 4827 totals 153 dollars and 12700000000000000000000000000000000000000000000000000000...
similarity: 0.232
Root cause
The repetition guard in asr_moonshine.py:_guard_repetition() has three blind spots:
- Hyphenated tokens bypass word-level n-gram detection.
  "hake-hake-hake-hake" is a single word when split on whitespace, so the
  n-gram loop never sees a repeating pattern. The word-count cap
  (_MAX_WORDS_PER_SEC = 6.0) can't fire either, because the whole run
  counts as one "word".
- Clause-level repetition exceeds the n-gram window (max 4 words).
  A repeating unit like "and we still have issues with recording cutting out
  on long sentences" is 12 words. The guard only checks patterns of 1–4
  words, so the full-clause loop is invisible to it.
- Repeated digits/characters within a single token (e.g. 127000000...)
  are not words at all: they are character-level repetition inside one
  token, which no word-level check can see.
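The first two blind spots can be reproduced without the model. The sketch below is a hypothetical stand-in for a whitespace-based n-gram check, not the code in asr_moonshine.py: the function name, the loop structure, and the way the threshold is applied are assumptions. Only the 4-word window and the threshold value of 4 come from this issue.

```python
def has_word_ngram_repetition(text: str, max_n: int = 4, threshold: int = 4) -> bool:
    """Hypothetical stand-in for a whitespace-based repetition check.

    Returns True if any run of 1..max_n words occurs `threshold` or more
    times back to back. This is NOT the real _guard_repetition().
    """
    words = text.split()
    for n in range(1, max_n + 1):
        for start in range(len(words) - n + 1):
            unit = words[start:start + n]
            repeats = 1
            pos = start + n
            # Count immediately following copies of the same n-word unit.
            while words[pos:pos + n] == unit:
                repeats += 1
                pos += n
            if repeats >= threshold:
                return True
    return False


hyphenated = "The six-six-" + "hake-" * 30
clause = "and we still have issues with recording cutting out on long sentences, " * 3
short_loop = "and we need, " * 6

print(len(hyphenated.split()))                # 2: the whole run is one token
print(has_word_ngram_repetition(hyphenated))  # False
print(has_word_ngram_repetition(clause))      # False: the unit is 12 words
print(has_word_ngram_repetition(short_loop))  # True: a 3-word unit fits the window
```

Under these assumptions, only the short "and we need," loop is visible to a 1–4 word window. The hyphenated run and the 12-word clause both pass through.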
Relevant code
- shuvoice/asr_moonshine.py: _guard_repetition() (line ~266)
- _REPETITION_THRESHOLD = 4 (line 61)
- _MAX_WORDS_PER_SEC = 6.0 (line 60)
Proposed fixes
1. Character-level repetition detection
Before the word-level checks, scan for any character or short substring
repeating more than N times consecutively:
    import re

    # Catch "hake-hake-hake..." and "000000000...": a unit of 1-10 characters
    # followed by at least 5 more copies of itself.
    char_rep = re.search(r'(.{1,10}?)\1{5,}', text)
    if char_rep:
        # Truncate right after the first copy of the repeating unit
        text = text[:char_rep.start() + len(char_rep.group(1))]
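To see what this does to the hypotheses from "Observed output", the same regex can be wrapped in a helper and run on reconstructed inputs. The helper name truncate_char_repetition and the constant _CHAR_REPETITION are illustrative only, and the test strings are shortened reconstructions rather than exact model output.

```python
import re

# Same pattern as above: a unit of 1-10 characters plus 5 or more further copies.
_CHAR_REPETITION = re.compile(r"(.{1,10}?)\1{5,}")


def truncate_char_repetition(text: str) -> str:
    """Cut `text` right after the first copy of a repeating character unit."""
    match = _CHAR_REPETITION.search(text)
    if match is None:
        return text
    return text[: match.start() + len(match.group(1))]


hyphenated = "The six-six-" + "hake-" * 30
numeric = "Invoice 4827 totals 153 dollars and 127" + "0" * 50
clause = "and we need, " * 6

print(truncate_char_repetition(hyphenated))        # The six-six-hake
print(truncate_char_repetition(numeric))           # Invoice 4827 totals 153 dollars and 1270
print(truncate_char_repetition(clause) == clause)  # True: the unit is 13 characters
```

The truncation bounds the damage but does not restore the correct text: the numeric case still ends in 1270 rather than 12. The "and we need, " loop is left untouched because its unit is longer than 10 characters, so this check alone does not cover clause-level repetition.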
2. Expand n-gram window or use substring matching
Either increase the max pattern length from 4 to ~15 words, or use a
suffix-based approach that detects when the last N words appeared earlier
in the output.
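A minimal sketch of the first option (a wider window), assuming it runs alongside the existing 1–4 word check rather than replacing it. Everything here is hypothetical: the function names, the 5–15 word range, the "two back-to-back copies" rule, and the punctuation/case normalization are assumptions chosen so that "sentences." and "sentences," compare equal.

```python
import string


def _norm(word: str) -> str:
    # Ignore case and surrounding punctuation when comparing words.
    return word.strip(string.punctuation).lower()


def truncate_repeated_span(text: str, min_n: int = 5, max_n: int = 15,
                           min_repeats: int = 2) -> str:
    """Cut `text` after the first copy of a long, immediately repeated word span.

    Looks for a span of min_n..max_n words that occurs at least
    `min_repeats` times back to back. Returns `text` unchanged otherwise.
    """
    words = text.split()
    keys = [_norm(w) for w in words]
    for start in range(len(keys)):
        for n in range(min_n, max_n + 1):
            if start + n * min_repeats > len(keys):
                break  # not enough words left for the required copies
            unit = keys[start:start + n]
            repeats, pos = 1, start + n
            while keys[pos:pos + n] == unit:
                repeats += 1
                pos += n
            if repeats >= min_repeats:
                return " ".join(words[:start + n])
    return text


hyp = ("We still have issues with recording cutting out on long sentences. "
       + "and we still have issues with recording cutting out on long sentences, " * 2
       + "and we need, " * 6 + "and")
ref = ("We still have issues with recording cutting out on long sentences, "
       "and we need deterministic regression tests to catch regressions "
       "before they ship")

print(truncate_repeated_span(hyp))
print(truncate_repeated_span(ref) == ref)  # True: no long span repeats
```

On the reconstructed hypothesis this keeps the first 12 words, including a dangling trailing "and", because the repeating unit it finds starts at the first word. Requiring only two copies is deliberately aggressive for long spans. Short spans are left to the existing threshold so that legitimate doubled words are not cut.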
3. Output length cap relative to input duration
The existing _MAX_WORDS_PER_SEC cap limits word count but not character
count, so one very long token passes it. Add a parallel character-count cap:

    max_chars = max(100, int(audio_seconds * 40))  # ~40 chars/sec, a generous cap
    if len(text) > max_chars:
        # Cut at the cap, then drop the last (possibly partial) word
        text = text[:max_chars].rsplit(' ', 1)[0]
Acceptance criteria
000000...)
test_asr.py tests still pass