GradiumSTTService improvements by markbackman · Pull Request #4066 · pipecat-ai/pipecat

markbackman · 2026-03-18T03:08:48Z

Summary

Improved GradiumSTTService transcription completeness by switching from silence-frame flushing to the flush API with text accumulation. Previously, trailing words could be dropped when the server's flushed response arrived before all text tokens were delivered.
Added a transcript aggregation delay (100ms) after flush to capture trailing tokens before finalizing the transcription.
Decoupled audio encoding from sample rate. The encoding parameter now takes a base type ("pcm", "wav", "opus") and the sample rate is derived from the pipeline's audio_in_sample_rate, assembled dynamically via input_format_from_encoding(). This fixes the mismatch where SAMPLE_RATE=24000 was passed to the base class while encoding defaulted to "pcm_16000".
Added model_name to the WebSocket setup message.
Aligned VAD handling with base class patterns and removed duplicate reconnection logic.
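
The encoding change above can be pictured with a small sketch. This is not the code from the PR. The signature of input_format_from_encoding() is assumed, the "<encoding>_<rate>" format for PCM is inferred from the old "pcm_16000" default, and passing "wav" and "opus" through unchanged is also an assumption.

```python
# Hypothetical sketch; the real helper in pipecat may look different.
SUPPORTED_ENCODINGS = ("pcm", "wav", "opus")


def input_format_from_encoding(encoding: str, sample_rate: int) -> str:
    """Build the input format string from a base encoding and a sample rate.

    Assumption: PCM carries the sample rate as a suffix (e.g. "pcm_16000"),
    while "wav" and "opus" are passed through unchanged.
    """
    if encoding not in SUPPORTED_ENCODINGS:
        raise ValueError(f"Unsupported encoding: {encoding!r}")
    if sample_rate <= 0:
        raise ValueError(f"Invalid sample rate: {sample_rate}")
    if encoding == "pcm":
        return f"pcm_{sample_rate}"
    return encoding
```

Because the rate comes from a single source (in the PR, the pipeline's audio_in_sample_rate), the format string and the audio actually sent cannot disagree the way "pcm_16000" and a 24000 Hz sample rate did.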

Breaking Changes

GradiumSTTService encoding parameter default changed from "pcm_16000" to "pcm". If you were passing encoding="pcm_16000" explicitly, change it to encoding="pcm" or omit it entirely.

Testing

Run uv run python examples/foundational/07zf-interruptible-gradium.py
Verify complete utterances appear in LLM context (no dropped trailing words)
Verify the WebSocket stays connected during pauses in speech

The _receive_messages method had its own while-True reconnect loop, duplicating the reconnection handling already provided by WebsocketService._receive_task_handler (exponential backoff, max retries, error reporting). Flatten to just the inner message loop and let the base class handle reconnection.
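
A minimal sketch of the division of responsibility described above, under stated assumptions: the method names mirror the ones in the commit message, but BaseService, FlatSTTService, ConnectionLost, and the retry parameters are hypothetical stand-ins, not pipecat's implementation.

```python
import asyncio


class ConnectionLost(Exception):
    """Stand-in for a websocket disconnect error."""


class BaseService:
    """Hypothetical base class that owns reconnection with exponential backoff."""

    def __init__(self, max_retries: int = 3, base_delay: float = 0.01):
        self._max_retries = max_retries
        self._base_delay = base_delay
        self.connect_count = 0

    async def _connect(self) -> None:
        self.connect_count += 1

    async def _receive_messages(self) -> None:
        raise NotImplementedError

    async def _receive_task_handler(self) -> None:
        retries = 0
        while True:
            try:
                await self._receive_messages()
                return  # message loop ended cleanly
            except ConnectionLost:
                retries += 1
                if retries > self._max_retries:
                    raise
                # Exponential backoff: base, 2x base, 4x base, ...
                await asyncio.sleep(self._base_delay * 2 ** (retries - 1))
                await self._connect()


class FlatSTTService(BaseService):
    """Subclass with only the inner message loop and no reconnect loop of its own."""

    def __init__(self, batches):
        super().__init__()
        self._batches = iter(batches)
        self.received = []

    async def _receive_messages(self) -> None:
        for message in next(self._batches):
            if message is None:  # None simulates a dropped connection
                raise ConnectionLost()
            self.received.append(message)
```

Running `asyncio.run(FlatSTTService([["a", None], ["b", "c"]])._receive_task_handler())` processes "a", reconnects once after the simulated drop, and then processes "b" and "c".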

Replace the process_frame override with a _handle_vad_user_stopped_speaking override, which is the proper hook provided by STTService. Move start_processing_metrics() into run_stt (matching Gladia's pattern). Remove unused FrameDirection and VADUserStartedSpeakingFrame imports.

Enable the base class keepalive mechanism (10s timeout, 5s interval) and override _send_keepalive to wrap silence in Gradium's audio message format. Prevents idle connection timeouts, especially behind a ServiceSwitcher.

Inline _process_response into _receive_messages, add required model_name field to the setup message per Gradium docs, and improve _handle_text docstring.

Replace silence-based flushing with Gradium flush/flushed protocol. Accumulate word-level text fragments as InterimTranscriptionFrames and emit a single TranscriptionFrame on flush completion. Align VAD handling with CartesiaSTTService pattern using process_frame override. Remove keepalive (not supported by Gradium) and pass language to transcription frames.
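
The accumulation step can be sketched as follows. The message shapes ({"type": "text", ...} and {"type": "flushed"}) and the two dataclasses are assumptions made for illustration; they stand in for Gradium's wire format and for pipecat's InterimTranscriptionFrame and TranscriptionFrame. Whether the interim frame carries the running text or only the latest fragment is not stated in the PR; the sketch uses the running text.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class InterimTranscript:
    """Stand-in for an interim transcription frame."""
    text: str


@dataclass
class FinalTranscript:
    """Stand-in for a final transcription frame."""
    text: str


@dataclass
class TranscriptAccumulator:
    """Collects word-level fragments until the server confirms a flush."""
    fragments: List[str] = field(default_factory=list)

    def handle_message(
        self, message: dict
    ) -> Optional[Union[InterimTranscript, FinalTranscript]]:
        msg_type = message.get("type")
        if msg_type == "text":
            # Keep the fragment and report the running transcript as interim.
            self.fragments.append(message["text"])
            return InterimTranscript(" ".join(self.fragments))
        if msg_type == "flushed":
            # Flush confirmed: emit one final transcript and start over.
            text = " ".join(self.fragments).strip()
            self.fragments.clear()
            return FinalTranscript(text) if text else None
        return None
```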

…kens Gradium flushed response can arrive before all text tokens have been delivered. Instead of finalizing immediately on flushed, start a short timer (100ms) that allows trailing tokens to accumulate before pushing the final TranscriptionFrame.
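
A rough sketch of the timer idea, using only asyncio. DelayedFinalizer and its method names are hypothetical; only the 100ms delay comes from the PR description.

```python
import asyncio
from typing import List, Optional

TRANSCRIPT_AGGREGATION_DELAY = 0.1  # seconds to wait after "flushed"


class DelayedFinalizer:
    """Finalizes a transcript a short time after the flush is confirmed."""

    def __init__(self, delay: float = TRANSCRIPT_AGGREGATION_DELAY):
        self._delay = delay
        self._fragments: List[str] = []
        self._timer: Optional[asyncio.Task] = None
        self.finals: List[str] = []

    def on_text(self, text: str) -> None:
        self._fragments.append(text)

    def on_flushed(self) -> None:
        # Restart the timer if an earlier flush is still pending.
        if self._timer and not self._timer.done():
            self._timer.cancel()
        self._timer = asyncio.create_task(self._finalize_after_delay())

    async def _finalize_after_delay(self) -> None:
        # Give trailing tokens a chance to arrive before finalizing.
        await asyncio.sleep(self._delay)
        text = " ".join(self._fragments).strip()
        self._fragments.clear()
        if text:
            self.finals.append(text)


async def demo() -> List[str]:
    finalizer = DelayedFinalizer(delay=0.05)
    finalizer.on_text("turn on the")
    finalizer.on_flushed()        # server reports the flush is complete
    await asyncio.sleep(0.01)
    finalizer.on_text("lights")   # a trailing token arrives afterwards
    await asyncio.sleep(0.1)      # wait for the timer to fire
    return finalizer.finals
```

Finalizing immediately on flushed would have produced "turn on the" and lost "lights"; with the delay, the single final transcript includes the trailing token.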

codecov · 2026-03-18T03:35:34Z

Codecov Report

❌ Patch coverage is 21.42857% with 55 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
src/pipecat/services/gradium/stt.py	21.42%	55 Missing ⚠️

Files with missing lines	Coverage Δ
src/pipecat/services/gradium/stt.py	`31.25% <21.42%> (-1.24%)`	⬇️

... and 13 files with indirect coverage changes


filipi87 · 2026-03-18T17:56:24Z

+        # and pushed as a TranscriptionFrame.
+        self._accumulated_text: list[str] = []
+        self._flush_counter = 0
+        self._transcript_aggregation_delay = 0.1  # seconds to wait after flushed


Is this something we would like to allow users to change? Otherwise, I think this could be a constant.

filipi87 · 2026-03-18T17:58:21Z

+        self._accumulated_text: list[str] = []
+        self._flush_counter = 0


Do we need to reset these values when we disconnect the WebSocket? For example, in case of a reconnection.

Yes, updating.
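
One way the reset discussed in this thread could look, as a sketch only. PerConnectionState and its methods are hypothetical; the two attribute names are taken from the diff above.

```python
from typing import List


class PerConnectionState:
    """Hypothetical holder for state that must not survive a reconnect."""

    def __init__(self) -> None:
        self.connected = False
        self._reset_state()

    def _reset_state(self) -> None:
        self._accumulated_text: List[str] = []
        self._flush_counter = 0

    def connect(self) -> None:
        self.connected = True

    def on_text(self, text: str) -> None:
        self._accumulated_text.append(text)

    def request_flush(self) -> int:
        # Each flush request gets a new id so responses can be matched up.
        self._flush_counter += 1
        return self._flush_counter

    def disconnect(self) -> None:
        self.connected = False
        # Drop partial text and flush ids that belonged to the old connection.
        self._reset_state()
```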

filipi87

Looks good. I just added a few possible improvements, but nothing major.

The encoding parameter now takes just the base type (pcm, wav, opus) and the sample rate is derived from the pipeline audio_in_sample_rate, assembled dynamically via input_format_from_encoding(). This fixes the mismatch where SAMPLE_RATE=24000 was passed to the base class while encoding defaulted to pcm_16000.

* Remove duplicate reconnection logic from Gradium STT

  The _receive_messages method had its own while-True reconnect loop, duplicating the reconnection handling already provided by WebsocketService._receive_task_handler (exponential backoff, max retries, error reporting). Flatten to just the inner message loop and let the base class handle reconnection.

* Align Gradium STT VAD handling with base class patterns

  Replace the process_frame override with a _handle_vad_user_stopped_speaking override, which is the proper hook provided by STTService. Move start_processing_metrics() into run_stt (matching Gladia's pattern). Remove unused FrameDirection and VADUserStartedSpeakingFrame imports.

* Add transcript aggregation delay after flushed to capture trailing tokens

  Gradium flushed response can arrive before all text tokens have been delivered. Instead of finalizing immediately on flushed, start a short timer (100ms) that allows trailing tokens to accumulate before pushing the final TranscriptionFrame.

* Add changelog for PR #4066

* Change default encoding to pcm_16000

* Decouple encoding from sample_rate in Gradium STT

  The encoding parameter now takes just the base type (pcm, wav, opus) and the sample rate is derived from the pipeline audio_in_sample_rate, assembled dynamically via input_format_from_encoding(). This fixes the mismatch where SAMPLE_RATE=24000 was passed to the base class while encoding defaulted to pcm_16000.

markbackman added 6 commits March 16, 2026 21:43

Add keepalive support to Gradium STT service

d12f3bd

Enable the base class keepalive mechanism (10s timeout, 5s interval) and override _send_keepalive to wrap silence in Gradium's audio message format. Prevents idle connection timeouts, especially behind a ServiceSwitcher.

Clean up Gradium STT message handling and add model_name to setup

c8c2ed4

Inline _process_response into _receive_messages, add required model_name field to the setup message per Gradium docs, and improve _handle_text docstring.

markbackman added a commit that referenced this pull request Mar 18, 2026

Add changelog for PR #4066

f4c3d1e

Add changelog for PR #4066

c6945c5

markbackman force-pushed the mb/gradium-stt-improvements branch from f4c3d1e to c6945c5 Compare March 18, 2026 03:12

markbackman requested a review from aconchillo March 18, 2026 03:25

markbackman requested review from filipi87 and kompfner March 18, 2026 13:01

Change default encoding to pcm_16000

b0f77bc

markbackman force-pushed the mb/gradium-stt-improvements branch from a7d331d to b0f77bc Compare March 18, 2026 13:02

filipi87 reviewed Mar 18, 2026


filipi87 approved these changes Mar 18, 2026


Code review feedback

4d55a8e

markbackman force-pushed the mb/gradium-stt-improvements branch from fb0da4a to ef794ff Compare March 18, 2026 19:52

markbackman force-pushed the mb/gradium-stt-improvements branch from ef794ff to 4d9d8af Compare March 18, 2026 19:53

markbackman merged commit 4b704e6 into main Mar 18, 2026
6 checks passed

markbackman deleted the mb/gradium-stt-improvements branch March 18, 2026 19:57

markbackman mentioned this pull request Mar 18, 2026

docs: update for pipecat PR #4066 pipecat-ai/docs#630

Merged


markbackman merged 10 commits into main from mb/gradium-stt-improvements

2 participants
