feat: handle server_content.interrupted for faster interruptions (#3429)
Conversation
    if message.server_content and message.server_content.model_turn:
        ...
    if message.server_content and message.server_content.interrupted:
        logger.debug("Gemini VAD: interrupted signal received")
        await self._handle_interruption()
I think we ought to use the standard interruption-triggering pattern here:
await self.push_interruption_task_frame_and_wait()
(We never push InterruptionFrames directly from services anymore, we always use this mechanism instead; this guarantees that the whole pipeline handles the interruption and makes it easier for us to maintain the interruption mechanism going forward.)
As a nice side-effect of moving to this pattern, I think we'd no longer need to call await self._handle_interruption(), as GeminiLiveLLMService would receive and process an InterruptionFrame as usual.
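The pattern described above can be sketched in isolation. The class below is a hypothetical stand-in, not the real Pipecat service; only the method name `push_interruption_task_frame_and_wait()` comes from the review comment, and its internals here are simplified:

```python
import asyncio


# Hypothetical stand-in for the relevant slice of a Pipecat service.
class ServiceSketch:
    def __init__(self):
        self.events = []

    async def push_interruption_task_frame_and_wait(self):
        # In the real framework, this hands the interruption to the
        # pipeline task and waits until the whole pipeline has handled
        # it, so the service never pushes an InterruptionFrame itself.
        self.events.append("interruption-task")

    async def on_server_interrupted(self):
        # Preferred pattern: delegate to the pipeline task rather than
        # pushing an InterruptionFrame directly from the service.
        await self.push_interruption_task_frame_and_wait()


service = ServiceSketch()
asyncio.run(service.on_server_interrupted())
print(service.events)  # ['interruption-task']
```

The point of the indirection is maintainability: services signal "an interruption should happen" and one central mechanism decides how the pipeline carries it out.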
I also think we need to precede the interruption with:
await self.broadcast_frame(UserStartedSpeakingFrame)
This is an important signal for the context-management system (i.e. LLMContextAggregatorPair) that records user and assistant messages in context.
Although...this is reminding me of a similar contribution (which appears never to have gotten merged) for leveraging AWS Nova Sonic's built-in VAD: #2431. There, we discussed the importance of also sending UserStoppedSpeakingFrame, if there's no local VAD in the pipeline, as that signal is also essential to context recording.
Can we find a way to broadcast a UserStoppedSpeakingFrame from this service, too?
cc @aconchillo, who last touched UserStartedSpeakingFrame/UserStoppedSpeakingFrame in other speech-to-speech services (OpenAI, Grok)...what was the effect of having the speech-to-speech service emit these frames in the case where the pipeline also had local VAD configured?
Ah, the recommendation if you want to have context management in your pipeline without local VAD (i.e. LLMContextAggregatorPair) is that you'd configure the user aggregator with ExternalUserTurnStrategies(). Though...it may not hurt to just turn off local VAD without updating the user aggregator in that way...
Thanks for the thorough review @kompfner! Updated to address your feedback:
- Using `push_interruption_task_frame_and_wait()` instead of pushing `InterruptionFrame` directly.
- Added `broadcast_frame(UserStartedSpeakingFrame())` before the interruption for context management.
Regarding UserStoppedSpeakingFrame - I'll defer to @aconchillo on whether that's needed here, given the open question about interaction with local VAD.
@kompfner bumping this for another review when you have a moment - thanks!
Ah sorry for not being clearer—regardless of the open question about interaction with local VAD, if we're emitting UserStartedSpeakingFrame from this service, we'll also need to emit UserStoppedSpeakingFrame.
I can help with some testing to ensure that the duplicate signal (in the case where we also have local VAD) doesn't cause issues. I suspect we'd be OK (OpenAI Realtime already always emits interruptions + user started/stopped speaking).
    if message.server_content and message.server_content.interrupted:
        logger.debug("Gemini VAD: interrupted signal received")
        await self.broadcast_frame(UserStartedSpeakingFrame())
        await self.push_interruption_task_frame_and_wait()
Ah just realized that this condition (server_content.interrupted) only occurs for a barge-in, not for a "normal" user utterance that follows the assistant response.
If this service is responsible for firing UserStartedSpeakingFrame (and UserStoppedSpeakingFrame) it should be able to do so in all circumstances, barge-in or not.
OK, let's go with the simplest thing for now: just the await self.push_interruption_task_frame_and_wait() (no UserStartedSpeakingFrame).
Maybe add this comment above that line:
# NOTE: while the service triggers interruptions in
# the specific case of barge-ins, it does *not*
# emit UserStarted/StoppedSpeakingFrames, as the
# Gemini Live API does not give us broadly reliable
# signals to base those off of. Pipelines that
# require turn tracking (like those using context
# aggregators) still need an independent way to
# track turns, such as local Silero VAD in
# combination with the context aggregator default
# turn strategies.
Done - implemented as suggested. Thanks!
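Putting the final suggestion together, the handler in the message loop would look roughly like this. This is a sketch with stand-in objects (`SimpleNamespace` messages, a fake wait method); the real `GeminiLiveLLMService` internals are not shown in this thread:

```python
import asyncio
from types import SimpleNamespace


class GeminiLiveSketch:
    """Hypothetical stand-in for the relevant slice of GeminiLiveLLMService."""

    def __init__(self):
        self.interrupted = False

    async def push_interruption_task_frame_and_wait(self):
        # The real version hands the interruption to the pipeline task
        # and waits; here we just record that it was triggered.
        self.interrupted = True

    async def _receive_task_handler(self, message):
        if message.server_content and message.server_content.interrupted:
            # NOTE: while the service triggers interruptions in the
            # specific case of barge-ins, it does *not* emit
            # UserStarted/StoppedSpeakingFrames, as the Gemini Live API
            # does not give us broadly reliable signals to base those
            # off of. Pipelines that require turn tracking (like those
            # using context aggregators) still need an independent way
            # to track turns, such as local Silero VAD in combination
            # with the context aggregator default turn strategies.
            await self.push_interruption_task_frame_and_wait()


msg = SimpleNamespace(server_content=SimpleNamespace(interrupted=True))
svc = GeminiLiveSketch()
asyncio.run(svc._receive_task_handler(msg))
print(svc.interrupted)  # True
```

Note that, per the discussion above, no `UserStartedSpeakingFrame` is broadcast here; the interruption is the only signal this service emits for a barge-in.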
Good point @kompfner - …
Force-pushed f89ae45 to c65a89c
Force-pushed c65a89c to cadced3
Trying to think through the ramifications of a service only firing events for barge-in and not for "regular" back and forth... I believe that inconsistency could be a problem, because then users wouldn't get a clear sense of whether or not they needed to also have independent VAD-based turn detection in their pipeline. So I do think it might unfortunately be an all-or-nothing thing: either a service is responsible for emitting turn-related events (user started/stopped) or it's not. Based on some brief research, it looks to me like Gemini Live sadly doesn't have timely events we can listen to to detect when it thinks user speech has actually started...

...but maybe for the purpose of context management (…)

Except, shoot, no, the user started/stopped speaking frames also drive "on_user_turn_started" events, not to mention that folks might have custom processors or observers in their pipeline that expect user started/stopped frames to actually correspond in time to when the user has actually started and stopped speaking...

OK, here's my current thinking: maybe we just say, for now, that this service does not act as a turn controller (i.e. it doesn't emit user started/stopped speaking frames), and if you do need turn tracking in your pipeline (for context recording, say) then you need to also BYO turn tracking (like enabling local VAD + using context aggregator defaults).

I can help do some testing to ensure that emitting just the interruption frame doesn't have any adverse effects in pipelines with context recording. Pardon the circling around on this question, it's proving a bit tricky!
Changelog entry (@@ -0,0 +1 @@):

    - Added handling for `server_content.interrupted` signal in Gemini Live services for faster interruption response.
Suggested change:

    - Added handling for `server_content.interrupted` signal in the Gemini Live service for faster interruption response in the case where there isn't already turn tracking in the pipeline, e.g. local VAD + context aggregators. When there is already turn tracking in the pipeline, the additional interruption does no harm.
Updated with your suggested text. Thanks!
OK, after the last few suggestions (README update, removing …

@kompfner Implemented the suggested changes - removed …
kompfner left a comment:
Thanks for the contribution! And for your patience with my back-and-forth suggestions 🙏
Summary

Handles the `server_content.interrupted` signal in the `GeminiLiveLLMService` message loop.

Problem

When using `GeminiLiveLLMService` without local VAD (Silero), interruptions were delayed because the service waited for `input_transcription`. Gemini sends `server_content.interrupted` instantly when its VAD detects speech.

Approach

Added inline handling in the message loop - no new methods or config flags.

Alternatives considered:

- A config flag (`use_native_vad_interruptions`) - rejected per YAGNI, no one has asked for it

Why this is safe:

- `_handle_interruption()` is idempotent, so duplicate signals (from both local VAD and Gemini) are harmless.

Fixes #3381
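The idempotency claim can be illustrated with a small sketch. The guard flag below is hypothetical (the real `_handle_interruption()` internals aren't shown in this thread); it just demonstrates why a duplicate signal from two VAD sources is harmless:

```python
import asyncio


class InterruptionHandler:
    """Sketch of an idempotent interruption handler: duplicate signals
    (e.g. from both local VAD and Gemini's server-side VAD) collapse
    into a single cancellation."""

    def __init__(self):
        self._bot_speaking = False
        self.cancellations = 0

    def start_bot_turn(self):
        self._bot_speaking = True

    async def _handle_interruption(self):
        if not self._bot_speaking:
            return  # already interrupted; a duplicate signal is a no-op
        self._bot_speaking = False
        self.cancellations += 1  # cancel in-flight audio exactly once


h = InterruptionHandler()
h.start_bot_turn()
asyncio.run(h._handle_interruption())  # local VAD fires
asyncio.run(h._handle_interruption())  # Gemini signal arrives second
print(h.cancellations)  # 1
```

Because the second call finds the bot already silenced, running both interruption paths in one pipeline costs nothing.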