Merged
The _receive_messages method had its own while-True reconnect loop, duplicating the reconnection handling already provided by WebsocketService._receive_task_handler (exponential backoff, max retries, error reporting). Flatten to just the inner message loop and let the base class handle reconnection.
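Here is a minimal asyncio sketch of that split. It is not pipecat's actual WebsocketService code. The per-service loop only consumes messages, and a single outer handler owns reconnection with exponential backoff and a retry cap. `receive_messages`, `receive_task_handler`, and the `connect` callable are hypothetical names.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable

# Hypothetical sketch: the inner loop only consumes messages and lets
# connection errors propagate; one outer handler owns reconnection.


async def receive_messages(ws: AsyncIterator[str], handle: Callable[[str], None]) -> None:
    # Flattened inner loop: no while-True reconnect logic here.
    async for message in ws:
        handle(message)


async def receive_task_handler(
    connect: Callable[[], Awaitable[AsyncIterator[str]]],
    handle: Callable[[str], None],
    max_retries: int = 3,
    base_delay: float = 0.01,
) -> bool:
    """Reconnect with exponential backoff; return True if the stream ends cleanly."""
    attempt = 0
    while True:
        try:
            ws = await connect()
            await receive_messages(ws, handle)
            return True  # the server closed the stream normally
        except ConnectionError as exc:
            attempt += 1
            if attempt > max_retries:
                print(f"giving up after {max_retries} retries: {exc}")
                return False
            # Backoff doubles on each attempt: base, 2*base, 4*base, ...
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))
```

Because the inner loop lets `ConnectionError` propagate, every service built on the outer handler shares one retry policy. That is the motivation for deleting the duplicate loop.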
Replace the process_frame override with a _handle_vad_user_stopped_speaking override, which is the proper hook provided by STTService. Move start_processing_metrics() into run_stt (matching Gladia's pattern). Remove unused FrameDirection and VADUserStartedSpeakingFrame imports.
Enable the base class keepalive mechanism (10s timeout, 5s interval) and override _send_keepalive to wrap silence in Gradium's audio message format. Prevents idle connection timeouts, especially behind a ServiceSwitcher.
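The following is a minimal sketch of wrapping silence in an audio message, for illustration only. It assumes 16-bit mono PCM and a JSON envelope of the form `{"type": "audio", "audio": <base64>}`. Those field names are assumptions rather than Gradium's documented format, and `make_keepalive_message` is a hypothetical helper. A later commit in this PR removed keepalive as unsupported by Gradium.

```python
import base64
import json


def make_keepalive_message(sample_rate: int = 16000, duration_s: float = 0.1) -> str:
    """Wrap a short burst of silence in a JSON audio message (assumed envelope)."""
    # 16-bit mono PCM: 2 bytes per sample, and digital silence is all zeros.
    num_samples = int(sample_rate * duration_s)
    silence = b"\x00\x00" * num_samples
    # Hypothetical envelope; the real Gradium message format may differ.
    return json.dumps(
        {"type": "audio", "audio": base64.b64encode(silence).decode("ascii")}
    )
```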
Inline _process_response into _receive_messages, add required model_name field to the setup message per Gradium docs, and improve _handle_text docstring.
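The setup message might be assembled roughly as follows. Only `model_name` is confirmed as required by the commit above. The `type` and `input_format` keys and the `"default"` model value used in the test are assumptions for illustration, and `build_setup_message` is a hypothetical helper.

```python
import json


def build_setup_message(model_name: str, input_format: str) -> str:
    """Build the first message sent after the WebSocket opens (assumed shape)."""
    if not model_name:
        # The commit above makes model_name mandatory, so fail fast if it is missing.
        raise ValueError("model_name is required in the setup message")
    return json.dumps(
        {"type": "setup", "model_name": model_name, "input_format": input_format}
    )
```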
Replace silence-based flushing with Gradium flush/flushed protocol. Accumulate word-level text fragments as InterimTranscriptionFrames and emit a single TranscriptionFrame on flush completion. Align VAD handling with CartesiaSTTService pattern using process_frame override. Remove keepalive (not supported by Gradium) and pass language to transcription frames.
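The accumulate-then-finalize flow can be modeled with a small state holder. This is an illustrative sketch, not Gradium's or pipecat's code. `Interim` and `Final` stand in for InterimTranscriptionFrame and TranscriptionFrame, and the `"text"` and `"flushed"` message types are assumed names.

```python
from dataclasses import dataclass, field


@dataclass
class Interim:
    text: str


@dataclass
class Final:
    text: str
    language: str


@dataclass
class FlushAccumulator:
    language: str = "en"
    fragments: list[str] = field(default_factory=list)

    def on_message(self, msg: dict) -> list:
        """Return the frames to push for one server message."""
        if msg["type"] == "text":
            # Accumulate each word-level fragment and surface the text so far as an interim.
            self.fragments.append(msg["text"])
            return [Interim(" ".join(self.fragments))]
        if msg["type"] == "flushed" and self.fragments:
            # Flush completed: emit one final transcription (with language) and reset.
            final = Final(" ".join(self.fragments), self.language)
            self.fragments.clear()
            return [final]
        return []
```

Emitting a single final frame per flush, rather than one per fragment, keeps downstream aggregators from seeing a user turn split into several transcriptions.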
…kens Gradium's flushed response can arrive before all text tokens have been delivered. Instead of finalizing immediately on flushed, start a short timer (100 ms) that allows trailing tokens to accumulate before pushing the final TranscriptionFrame.
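One way to express the delayed finalization with asyncio is shown below. This is a sketch under assumptions. On `flushed`, it schedules a finalize task after `delay` seconds instead of finalizing immediately, and tokens arriving in that window are still appended. `DelayedFinalizer` and its methods are hypothetical names. The 0.1 s default mirrors the 100 ms in the commit message.

```python
import asyncio
from typing import Callable, Optional


class DelayedFinalizer:
    """Hypothetical helper: finalize a transcript a short time after 'flushed'."""

    def __init__(self, push_final: Callable[[str], None], delay: float = 0.1):
        self._push_final = push_final
        self._delay = delay
        self._fragments: list[str] = []
        self._timer: Optional[asyncio.Task] = None

    def on_text(self, text: str) -> None:
        # Trailing tokens that arrive after 'flushed' still land here.
        self._fragments.append(text)

    def on_flushed(self) -> None:
        # Restart rather than stack timers if another 'flushed' arrives early.
        if self._timer and not self._timer.done():
            self._timer.cancel()
        self._timer = asyncio.create_task(self._finalize_later())

    async def _finalize_later(self) -> None:
        await asyncio.sleep(self._delay)
        if self._fragments:
            self._push_final(" ".join(self._fragments))
            self._fragments.clear()
```

The trade-off is a fixed extra latency of `delay` on every final transcript in exchange for not dropping late tokens.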
markbackman added a commit that referenced this pull request on Mar 18, 2026.

Force-pushed from f4c3d1e to c6945c5.
Codecov Report: ❌ Patch coverage is
... and 13 files with indirect coverage changes.
Force-pushed from a7d331d to b0f77bc.
filipi87 reviewed on Mar 18, 2026:

```python
# and pushed as a TranscriptionFrame.
self._accumulated_text: list[str] = []
self._flush_counter = 0
self._transcript_aggregation_delay = 0.1  # seconds to wait after flushed
```
Contributor comment: Is this something we would like to allow users to change? Otherwise, I think this could be a constant.
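If the delay does not need to be user-tunable, a module-level constant is the simplest option. If it does, a keyword argument that defaults to that constant supports both cases. This is a hypothetical sketch. `TRANSCRIPT_AGGREGATION_DELAY_S` and `SttSketch` are illustrative names, not the PR's.

```python
# Module-level default in seconds, matching the 0.1 s in the review snippet.
TRANSCRIPT_AGGREGATION_DELAY_S = 0.1


class SttSketch:
    def __init__(self, transcript_aggregation_delay: float = TRANSCRIPT_AGGREGATION_DELAY_S):
        # Reject negative delays, which asyncio.sleep would treat as zero.
        if transcript_aggregation_delay < 0:
            raise ValueError("transcript_aggregation_delay must be non-negative")
        self._transcript_aggregation_delay = transcript_aggregation_delay
```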
filipi87 reviewed on Mar 18, 2026, commenting on lines +203 to +204:

```python
self._accumulated_text: list[str] = []
self._flush_counter = 0
```
Contributor comment: Do we need to reset these values when we disconnect the WebSocket, for example, on a reconnection?
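Here is a sketch of what resetting on reconnect could look like. It assumes the service has connect and disconnect hooks where per-utterance state can be cleared. `_reset_transcript_state`, `_connect`, `_disconnect`, and `_finalize_task` are hypothetical names. Any pending delayed-finalize task is cancelled so a stale transcript cannot be pushed after reconnecting.

```python
import asyncio
from typing import Optional


class ReconnectingSttSketch:
    def __init__(self) -> None:
        self._finalize_task: Optional[asyncio.Task] = None
        self._accumulated_text: list[str] = []
        self._flush_counter = 0

    def _reset_transcript_state(self) -> None:
        # Cancel a pending delayed finalize so it cannot fire on the new connection.
        if self._finalize_task and not self._finalize_task.done():
            self._finalize_task.cancel()
        self._finalize_task = None
        # Drop any partial utterance left over from the previous connection.
        self._accumulated_text = []
        self._flush_counter = 0

    async def _connect(self) -> None:
        self._reset_transcript_state()
        # ... open the WebSocket and send the setup message here ...

    async def _disconnect(self) -> None:
        self._reset_transcript_state()
        # ... close the WebSocket here ...
```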
filipi87 (Contributor) approved these changes on Mar 18, 2026:

Looks good. I just added a few possible improvements, but nothing major.
Force-pushed from fb0da4a to ef794ff.
The encoding parameter now takes just the base type (pcm, wav, opus) and the sample rate is derived from the pipeline audio_in_sample_rate, assembled dynamically via input_format_from_encoding(). This fixes the mismatch where SAMPLE_RATE=24000 was passed to the base class while encoding defaulted to pcm_16000.
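Roughly, the helper combines the base encoding with the pipeline's sample rate. This sketch assumes every base type follows the `<encoding>_<rate>` pattern seen in `pcm_16000`. That is an assumption, and the PR's `input_format_from_encoding()` may treat `wav` or `opus` differently. `input_format_sketch` is a hypothetical name.

```python
SUPPORTED_ENCODINGS = ("pcm", "wav", "opus")


def input_format_sketch(encoding: str, sample_rate: int) -> str:
    """Combine a base encoding with the pipeline sample rate (assumed format)."""
    if encoding not in SUPPORTED_ENCODINGS:
        raise ValueError(
            f"unsupported encoding {encoding!r}; expected one of {SUPPORTED_ENCODINGS}"
        )
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    return f"{encoding}_{sample_rate}"
```

With `audio_in_sample_rate=24000`, the format becomes `pcm_24000`. The rate sent to the server is then derived from the same value the base class receives, so the two can no longer disagree.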
Force-pushed from ef794ff to 4d9d8af.
markbackman added a commit that referenced this pull request on Mar 21, 2026:
* Remove duplicate reconnection logic from Gradium STT
  The _receive_messages method had its own while-True reconnect loop, duplicating the reconnection handling already provided by WebsocketService._receive_task_handler (exponential backoff, max retries, error reporting). Flatten to just the inner message loop and let the base class handle reconnection.
* Align Gradium STT VAD handling with base class patterns
  Replace the process_frame override with a _handle_vad_user_stopped_speaking override, which is the proper hook provided by STTService. Move start_processing_metrics() into run_stt (matching Gladia's pattern). Remove unused FrameDirection and VADUserStartedSpeakingFrame imports.
* Add transcript aggregation delay after flushed to capture trailing tokens
  Gradium's flushed response can arrive before all text tokens have been delivered. Instead of finalizing immediately on flushed, start a short timer (100 ms) that allows trailing tokens to accumulate before pushing the final TranscriptionFrame.
* Add changelog for PR #4066
* Change default encoding to pcm_16000
* Decouple encoding from sample_rate in Gradium STT
  The encoding parameter now takes just the base type (pcm, wav, opus) and the sample rate is derived from the pipeline audio_in_sample_rate, assembled dynamically via input_format_from_encoding(). This fixes the mismatch where SAMPLE_RATE=24000 was passed to the base class while encoding defaulted to pcm_16000.
Summary
- Improves `GradiumSTTService` transcription completeness by switching from silence-frame flushing to the flush API with text accumulation. Previously, trailing words could be dropped when the server's `flushed` response arrived before all text tokens were delivered.
- The `encoding` parameter now takes a base type (`"pcm"`, `"wav"`, `"opus"`), and the sample rate is derived from the pipeline's `audio_in_sample_rate`, assembled dynamically via `input_format_from_encoding()`. This fixes the mismatch where `SAMPLE_RATE=24000` was passed to the base class while `encoding` defaulted to `"pcm_16000"`.
- Adds the required `model_name` field to the WebSocket setup message.

Breaking Changes

- The `GradiumSTTService` `encoding` parameter default changed from `"pcm_16000"` to `"pcm"`. If you were passing `encoding="pcm_16000"` explicitly, change it to `encoding="pcm"` or omit it entirely.

Testing
uv run python examples/foundational/07zf-interruptible-gradium.py