## Problem

When using `GeminiLiveLLMService` with the Gemini Live API, interruptions are noticeably delayed compared to using the LiveKit Agents SDK directly. After investigation, I found the root cause.
## Root Cause

The Gemini Live API sends two signals when a user starts speaking:

- `server_content.interrupted = True` - fires instantly when the server-side VAD detects speech
- `server_content.input_transcription` - fires only after the speech has been transcribed (500-2000 ms later)

Currently, `GeminiLiveLLMService` only handles `input_transcription` for interruption detection (via emulated VAD). The instant `interrupted` signal is ignored.
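To make the distinction concrete, here is a minimal, self-contained sketch of telling the two signals apart on an incoming `server_content` (`classify_signal` is a hypothetical helper for illustration, not pipecat code):

```python
from types import SimpleNamespace


def classify_signal(server_content) -> str:
    """Return which interruption-relevant signal a message carries.

    Hypothetical helper; the field names follow the two signals
    described above.
    """
    if getattr(server_content, "interrupted", False):
        return "interrupted"            # instant, VAD-driven
    if getattr(server_content, "input_transcription", None):
        return "input_transcription"    # late, transcription-driven
    return "other"


# Example messages shaped like the two signals:
vad_msg = SimpleNamespace(interrupted=True, input_transcription=None)
stt_msg = SimpleNamespace(interrupted=False, input_transcription="stop talking")

assert classify_signal(vad_msg) == "interrupted"
assert classify_signal(stt_msg) == "input_transcription"
```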
Current behavior in `_connection_task_handler`:

```
# Handled:
✅ message.server_content.model_turn
✅ message.server_content.turn_complete
✅ message.server_content.input_transcription

# Not handled:
❌ message.server_content.interrupted
```
## Impact

| Signal Used | Interruption Latency | User Experience |
|---|---|---|
| `interrupted` | ~50-100 ms | Natural, snappy |
| `input_transcription` | ~500-2000 ms | Bot talks over user |
The 1-2 second delay makes conversations feel unnatural. Users have to finish their phrase before the bot stops talking.
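A quick way to reproduce these numbers locally is to timestamp each signal as it arrives and compare the offsets. A minimal sketch (hypothetical instrumentation, not part of pipecat):

```python
import time


class SignalTimer:
    """Record when each signal arrives, in ms relative to the first one."""

    def __init__(self):
        self._t0 = None
        self.offsets_ms = {}

    def mark(self, name: str) -> float:
        now = time.monotonic()
        if self._t0 is None:
            self._t0 = now
        self.offsets_ms[name] = (now - self._t0) * 1000.0
        return self.offsets_ms[name]


# In the message loop you would call, e.g.:
#   timer.mark("interrupted")          when server_content.interrupted fires
#   timer.mark("input_transcription")  when the transcription arrives
# and compare the two offsets to measure the gap.
```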
## Proposed Fix

Add handling for `server_content.interrupted` in the message loop:

```python
async for message in turn:
    # Check the interrupted signal FIRST - instant VAD detection
    if message.server_content and getattr(message.server_content, "interrupted", False):
        logger.debug("Gemini VAD: interrupted signal received")
        await self.push_frame(StartInterruptionFrame())
        continue
    # ... rest of the existing handlers
```
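The proposed handler is easy to verify in isolation. The sketch below uses a stand-in `StartInterruptionFrame` and a stubbed `push_frame` (both hypothetical; in pipecat they come from the framework) to check that an `interrupted` message emits exactly one interruption frame:

```python
import asyncio
from types import SimpleNamespace


class StartInterruptionFrame:  # stand-in for pipecat's frame class
    pass


async def handle_messages(turn, push_frame):
    """Minimal sketch of the proposed loop: react to `interrupted` first."""
    async for message in turn:
        sc = getattr(message, "server_content", None)
        if sc and getattr(sc, "interrupted", False):
            await push_frame(StartInterruptionFrame())
            continue
        # ... existing handlers (model_turn, turn_complete, input_transcription)


async def demo():
    frames = []

    async def push_frame(frame):
        frames.append(frame)

    async def fake_turn():
        # one interrupting message, one ordinary message
        yield SimpleNamespace(server_content=SimpleNamespace(interrupted=True))
        yield SimpleNamespace(server_content=SimpleNamespace(interrupted=False))

    await handle_messages(fake_turn(), push_frame)
    return frames


frames = asyncio.run(demo())
assert len(frames) == 1 and isinstance(frames[0], StartInterruptionFrame)
```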
## Reference

The LiveKit Agents SDK handles this signal directly, which is why it feels snappier:

```python
if server_content.interrupted:
    self.speaker_audio_buffer.clear()
```
## Environment

- pipecat-ai version: 0.0.90
- Model: `gemini-2.5-flash-native-audio-preview-12-2025`
- Transport: LiveKit