Merged
1 change: 1 addition & 0 deletions changelog/3429.added.md
@@ -0,0 +1 @@
- Added handling for `server_content.interrupted` signal in the Gemini Live service for faster interruption response in the case where there isn't already turn tracking in the pipeline, e.g. local VAD + context aggregators. When there is already turn tracking in the pipeline, the additional interruption does no harm.
15 changes: 14 additions & 1 deletion src/pipecat/services/google/gemini_live/llm.py
@@ -1198,7 +1198,20 @@ async def _connection_task_handler(self, config: LiveConnectConfig):
# Reset failure counter if connection has been stable
self._check_and_reset_failure_counter()

-if message.server_content and message.server_content.model_turn:
+if message.server_content and message.server_content.interrupted:
+    # NOTE: while the service triggers interruptions in
+    # the specific case of barge-ins, it does *not*
+    # emit UserStarted/StoppedSpeakingFrames, as the
+    # Gemini Live API does not give us broadly reliable
+    # signals to base those off of. Pipelines that
+    # require turn tracking (like those using context
+    # aggregators) still need an independent way to
+    # track turns, such as local Silero VAD in
+    # combination with the context aggregator default
+    # turn strategies.
+    logger.debug("Gemini VAD: interrupted signal received")
+    await self.push_interruption_task_frame_and_wait()
Contributor

Ah, just realized that this condition (`server_content.interrupted`) only occurs for a barge-in, not for a "normal" user utterance that follows the assistant response.

If this service is responsible for firing `UserStartedSpeakingFrame` (and `UserStoppedSpeakingFrame`), it should be able to do so in all circumstances, barge-in or not.

Contributor

OK, let's go with the simplest thing for now: just the `await self.push_interruption_task_frame_and_wait()` (no `UserStartedSpeakingFrame`).

Maybe add this comment above that line:

# NOTE: while the service triggers interruptions in
# the specific case of barge-ins, it does *not*
# emit UserStarted/StoppedSpeakingFrames, as the
# Gemini Live API does not give us broadly reliable
# signals to base those off of. Pipelines that
# require turn tracking (like those using context
# aggregators) still need an independent way to
# track turns, such as local Silero VAD in
# combination with the context aggregator default
# turn strategies.

Contributor Author

Done - implemented as suggested. Thanks!

+elif message.server_content and message.server_content.model_turn:
     await self._handle_msg_model_turn(message)
elif (
message.server_content
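For readers who want to see the branch order in isolation, here is a minimal, self-contained sketch of the dispatch this change introduces. `FakeServerContent`, `FakeMessage`, `FakeService`, `handle`, and `demo` are hypothetical stand-ins invented for illustration. Only the attribute names `server_content`, `interrupted`, and `model_turn` and the method names `push_interruption_task_frame_and_wait` and `_handle_msg_model_turn` come from the diff, and their bodies here are placeholders rather than the real Pipecat implementations.

```python
import asyncio
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeServerContent:
    """Hypothetical stand-in for a message's `server_content` payload."""

    interrupted: bool = False
    model_turn: Optional[str] = None


@dataclass
class FakeMessage:
    """Hypothetical stand-in for a message received from the live session."""

    server_content: Optional[FakeServerContent] = None


class FakeService:
    """Toy model of the branch order only; not the real service class."""

    def __init__(self) -> None:
        self.events: List[str] = []

    async def push_interruption_task_frame_and_wait(self) -> None:
        # Placeholder: here we only record that an interruption was requested.
        self.events.append("interruption")

    async def _handle_msg_model_turn(self, message: FakeMessage) -> None:
        # Placeholder: record which model turn would have been processed.
        self.events.append(f"model_turn:{message.server_content.model_turn}")

    async def handle(self, message: FakeMessage) -> None:
        content = message.server_content
        # The interrupted check comes first, so a barge-in takes priority
        # over any model turn carried by the same message.
        if content and content.interrupted:
            await self.push_interruption_task_frame_and_wait()
        elif content and content.model_turn:
            await self._handle_msg_model_turn(message)
        # No UserStarted/StoppedSpeaking events are produced on any path.


async def demo() -> List[str]:
    service = FakeService()
    await service.handle(FakeMessage(FakeServerContent(model_turn="hello")))
    await service.handle(
        FakeMessage(FakeServerContent(interrupted=True, model_turn="late"))
    )
    await service.handle(FakeMessage())  # no server_content: ignored here
    return service.events


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this sketch a message flagged as interrupted never reaches the model-turn handler, and nothing on either path signals user speech start or stop, which is why the review thread notes that turn tracking still has to come from elsewhere in the pipeline.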