pipecat version
pipecat-ai-small-webrtc-prebuilt>=2.2.0
pipecat-ai[cartesia,daily,deepgram,google,silero,tracing,webrtc]>=0.0.102
pipecatcloud>=0.2.20
Python version
3.13
Operating System
macOS Tahoe
Related issue
This is a follow-up to #3350 — the issue persists.
Issue description
Using the Gemini Live model without a separate STT service, only the assistant transcription is returned. User transcription is not surfaced to the client, even though the Gemini model does appear to receive and process user audio.
Apologies if this turns out to be an issue on Gemini's side.
When the transcription option is enabled, only the assistant transcription comes back from the Gemini model. Occasionally a user transcription does come through, but most of the time it is missing. I also tried to capture the user transcription with a custom frame processor, but no user transcription frame appears to reach it at all.
Is there a way to enable a "transcribe audio" option similar to the previous Gemini multimodal library version?
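For reference, the custom frame-processor attempt mentioned above followed the usual pipecat pattern of inspecting frames as they pass through the pipeline. Here is a self-contained sketch of that pattern. Note the `Frame`/`TranscriptionFrame` classes and the processor below are minimal local stand-ins that mimic the shape of pipecat's types (the real ones live in `pipecat.frames.frames` and `pipecat.processors.frame_processor`), so this runs without pipecat installed:

```python
import asyncio
from dataclasses import dataclass


# Local stand-ins mimicking pipecat's frame types, used here only so the
# sketch is runnable anywhere; they are NOT the real pipecat classes.
@dataclass
class Frame:
    pass


@dataclass
class TranscriptionFrame(Frame):
    text: str
    user_id: str
    timestamp: str


class TranscriptionLogger:
    """Minimal processor that records any TranscriptionFrame it sees and
    passes every frame along, like a pipecat FrameProcessor would."""

    def __init__(self) -> None:
        self.captured: list[str] = []

    async def process_frame(self, frame: Frame) -> Frame:
        if isinstance(frame, TranscriptionFrame):
            # In my runs, this branch never fires for user audio when the
            # transcription comes from Gemini Live rather than a separate STT.
            self.captured.append(f"{frame.user_id}: {frame.text}")
        return frame  # always forward the frame downstream


async def main() -> None:
    p = TranscriptionLogger()
    await p.process_frame(Frame())
    await p.process_frame(
        TranscriptionFrame("Yeah, sure.", "user", "2026-02-19T23:05:10")
    )
    print(p.captured)  # expect the one user line to be captured


asyncio.run(main())
```

With a real STT service in the pipeline this pattern captures user text as expected; with Gemini Live's built-in transcription it stays empty on my setup.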
Reproduction steps
Running the foundational example 26a-gemini-live-transcription.py as-is.
Expected behavior
Both user and assistant transcriptions should appear in the client UI:
assistant: Hello! Do you want to hear a joke?
user: Yeah, sure.
assistant: Why don't scientists trust atoms?
user: I'm not sure.
assistant: Because they make everything up!
Actual behavior
Only the assistant transcription is displayed in the client UI. The user messages are missing:
(Screenshot: The client UI shows only "assistant" messages with no user messages visible.)
Logs
Note: the server-side logs do show [Transcription:user] debug lines from Gemini, but these do not appear to propagate to the client.
2026-02-19 23:05:06.077 | INFO | __main__:run_bot:55 - Starting bot
2026-02-19 23:05:06.093 | DEBUG | pipecat.audio.vad.silero:__init__:147 - Loading Silero VAD model...
2026-02-19 23:05:06.127 | DEBUG | pipecat.audio.vad.silero:__init__:169 - Loaded Silero VAD
2026-02-19 23:05:06.128 | DEBUG | pipecat.audio.turn.smart_turn.local_smart_turn_v3:__init__:74 - Loading Local Smart Turn v3.x model from /Users/will/dev/phil/.venv/lib/python3.13/site-packages/pipecat/audio/turn/smart_turn/data/smart-turn-v3.2-cpu.onnx...
2026-02-19 23:05:06.151 | DEBUG | pipecat.audio.turn.smart_turn.local_smart_turn_v3:__init__:85 - Loaded Local Smart Turn v3.x
2026-02-19 23:05:06.151 | DEBUG | pipecat.processors.frame_processor:link:561 - Linking Pipeline#0::Source -> SmallWebRTCInputTransport#0
2026-02-19 23:05:06.151 | DEBUG | pipecat.processors.frame_processor:link:561 - Linking SmallWebRTCInputTransport#0 -> LLMUserAggregator#0
2026-02-19 23:05:06.151 | DEBUG | pipecat.processors.frame_processor:link:561 - Linking LLMUserAggregator#0 -> GeminiLiveLLMService#0
2026-02-19 23:05:06.151 | DEBUG | pipecat.processors.frame_processor:link:561 - Linking GeminiLiveLLMService#0 -> SmallWebRTCOutputTransport#0
2026-02-19 23:05:06.151 | DEBUG | pipecat.processors.frame_processor:link:561 - Linking SmallWebRTCOutputTransport#0 -> LLMAssistantAggregator#0
2026-02-19 23:05:06.151 | DEBUG | pipecat.processors.frame_processor:link:561 - Linking LLMAssistantAggregator#0 -> Pipeline#0::Sink
2026-02-19 23:05:06.151 | DEBUG | pipecat.processors.frame_processor:link:561 - Linking PipelineTask#0::Source -> RTVIProcessor#0
2026-02-19 23:05:06.151 | DEBUG | pipecat.processors.frame_processor:link:561 - Linking RTVIProcessor#0 -> Pipeline#0
2026-02-19 23:05:06.151 | DEBUG | pipecat.processors.frame_processor:link:561 - Linking Pipeline#0 -> PipelineTask#0::Sink
2026-02-19 23:05:06.151 | DEBUG | pipecat.pipeline.runner:run:71 - Runner PipelineRunner#0 started running PipelineTask#0
2026-02-19 23:05:06.152 | DEBUG | pipecat.pipeline.task:_wait_for_pipeline_start:718 - PipelineTask#0: Starting. Waiting for StartFrame#0 to reach the end of the pipeline...
2026-02-19 23:05:06.153 | INFO | pipecat.services.google.gemini_live.llm:_connect:1072 - Connecting to Gemini service
2026-02-19 23:05:06.228 | INFO | pipecat.services.google.gemini_live.llm:_connection_task_handler:1187 - Connected to Gemini service
2026-02-19 23:05:06.228 | DEBUG | pipecat.services.google.gemini_live.llm:_create_initial_response:1390 - Creating initial response
2026-02-19 23:05:07.088 | DEBUG | pipecat.processors.metrics.frame_processor_metrics:stop_ttfb_metrics:131 - GeminiLiveLLMService#0 TTFB: 0.8599798679351807
2026-02-19 23:05:07.636 | DEBUG | pipecat.transports.base_output:_bot_started_speaking:608 - Bot started speaking
2026-02-19 23:05:09.828 | DEBUG | pipecat.processors.metrics.frame_processor_metrics:start_llm_usage_metrics:173 - GeminiLiveLLMService#0 prompt tokens: 355, completion tokens: 60, reasoning tokens: 35
2026-02-19 23:05:09.871 | INFO | __main__:on_assistant_turn_stopped:133 - Transcript: [2026-02-19T23:05:07.089+00:00] assistant: Hello! Do you want to hear a joke?
2026-02-19 23:05:10.220 | DEBUG | pipecat.transports.base_output:_bot_stopped_speaking:630 - Bot stopped speaking
2026-02-19 23:05:10.725 | DEBUG | pipecat.processors.aggregators.llm_response_universal:_on_user_turn_started:685 - LLMUserAggregator#0: User started speaking
2026-02-19 23:05:11.556 | DEBUG | pipecat.services.google.gemini_live.llm:_handle_msg_input_transcription:1683 - [Transcription:user] [Yeah, sure.]
2026-02-19 23:05:11.576 | DEBUG | pipecat.audio.turn.smart_turn.base_smart_turn:analyze_end_of_turn:162 - End of Turn result: EndOfTurnState.INCOMPLETE
2026-02-19 23:05:14.500 | DEBUG | pipecat.processors.aggregators.llm_response_universal:_on_user_turn_stopped:703 - LLMUserAggregator#0: User stopped speaking
2026-02-19 23:05:14.501 | INFO | __main__:on_user_turn_stopped:127 - Transcript: [2026-02-19T23:05:10.725+00:00] user: Yeah, sure.
2026-02-19 23:05:15.699 | DEBUG | pipecat.processors.metrics.frame_processor_metrics:start_llm_usage_metrics:173 - GeminiLiveLLMService#0 prompt tokens: 471, completion tokens: 77, reasoning tokens: 23
2026-02-19 23:05:15.715 | INFO | __main__:on_assistant_turn_stopped:133 - Transcript: [2026-02-19T23:05:12.284+00:00] assistant: Why don't scientists trust atoms?
2026-02-19 23:05:16.063 | DEBUG | pipecat.transports.base_output:_bot_stopped_speaking:630 - Bot stopped speaking
2026-02-19 23:05:17.220 | DEBUG | pipecat.processors.aggregators.llm_response_universal:_on_user_turn_started:685 - LLMUserAggregator#0: User started speaking
2026-02-19 23:05:18.033 | DEBUG | pipecat.services.google.gemini_live.llm:_handle_msg_input_transcription:1683 - [Transcription:user] [I'm not sure.]
2026-02-19 23:05:18.149 | DEBUG | pipecat.audio.turn.smart_turn.base_smart_turn:analyze_end_of_turn:162 - End of Turn result: EndOfTurnState.COMPLETE
2026-02-19 23:05:18.149 | DEBUG | pipecat.processors.aggregators.llm_response_universal:_on_user_turn_stopped:703 - LLMUserAggregator#0: User stopped speaking
2026-02-19 23:05:18.149 | INFO | __main__:on_user_turn_stopped:127 - Transcript: [2026-02-19T23:05:17.220+00:00] user: I'm not sure.
2026-02-19 23:05:18.775 | DEBUG | pipecat.processors.metrics.frame_processor_metrics:stop_ttfb_metrics:131 - GeminiLiveLLMService#0 TTFB: 0.6261801719665527
2026-02-19 23:05:19.237 | DEBUG | pipecat.transports.base_output:_bot_started_speaking:608 - Bot started speaking
2026-02-19 23:05:21.747 | DEBUG | pipecat.processors.metrics.frame_processor_metrics:start_llm_usage_metrics:173 - GeminiLiveLLMService#0 prompt tokens: 601, completion tokens: 69, reasoning tokens: 28
2026-02-19 23:05:21.752 | INFO | __main__:on_assistant_turn_stopped:133 - Transcript: [2026-02-19T23:05:18.777+00:00] assistant: Because they make everything up!
2026-02-19 23:05:22.101 | DEBUG | pipecat.transports.base_output:_bot_stopped_speaking:630 - Bot stopped speaking
2026-02-19 23:05:22.505 | DEBUG | pipecat.transports.smallwebrtc.connection:_handle_new_connection_state:564 - Connection state changed to: closed
2026-02-19 23:05:22.506 | INFO | __main__:on_client_disconnected:120 - Client disconnected
2026-02-19 23:05:22.508 | INFO | pipecat.services.google.gemini_live.llm:_disconnect:1292 - Disconnecting from Gemini service
2026-02-19 23:05:22.523 | DEBUG | pipecat.pipeline.task:run:616 - Pipeline task PipelineTask#0 has finished
Key observation: The server logs show [Transcription:user] debug lines from Gemini (e.g., [Transcription:user] [Yeah, sure.]), and on_user_turn_stopped fires with the correct text. However, the user transcription does not appear in the client UI — only assistant messages are displayed.
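Since the server-side `on_user_turn_stopped` handler already receives the correct user text, a possible interim workaround is to relay that text to the client manually from the handler. The sketch below shows only the shape of such a relay: `send_to_client` is a hypothetical hook (a real bot would push something like an RTVI server message or an app message over the transport), not an actual pipecat API.

```python
import asyncio
import json

sent: list[dict] = []


async def send_to_client(payload: dict) -> None:
    # Hypothetical transport hook: in a real bot this would push a server
    # message to the client over the WebRTC data channel. Here it just
    # records and prints the payload so the sketch is self-contained.
    sent.append(payload)
    print("to client:", json.dumps(payload))


async def on_user_turn_stopped(text: str, timestamp: str) -> None:
    # Mirrors the __main__:on_user_turn_stopped handler from the logs,
    # which already sees the correct user text server-side.
    await send_to_client({"role": "user", "text": text, "timestamp": timestamp})


asyncio.run(
    on_user_turn_stopped("Yeah, sure.", "2026-02-19T23:05:10.725+00:00")
)
```

This obviously duplicates what the built-in transcription plumbing should do; I mention it only to show that the data needed by the client is already available on the server.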