Gemini Live: User transcription not returned (only assistant transcription) #3780

@WillPowellUk

Description

pipecat version

pipecat-ai-small-webrtc-prebuilt>=2.2.0
pipecat-ai[cartesia,daily,deepgram,google,silero,tracing,webrtc]>=0.0.102
pipecatcloud>=0.2.20

Python version

3.13

Operating System

macOS Tahoe

Related issue

This is a follow-up to #3350 — the issue persists.

Issue description

Using the Gemini Live model without a separate STT service, only the assistant transcription is returned. User transcription is not surfaced to the client, even though the Gemini model does appear to receive and process user audio.

Apologies if this turns out to be an issue on Gemini's side.

With the transcription option enabled, only the assistant transcription comes back from the Gemini model. Occasionally the user transcription comes through as well, but most of the time it is missing. I also tried to capture the user transcription with a custom frame processor, but no user transcription frames appear to be emitted downstream.

Is there a way to enable a "transcribe audio" option similar to the one in the previous Gemini multimodal library version?
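For context, the frame-processor approach I tried amounts to filtering frames by type and passing everything else through. A minimal sketch of the pattern is below; the `Frame`/`TranscriptionFrame` classes here are simplified stand-ins for pipecat's real frame types (which carry more fields), used only to illustrate what the processor does:

```python
from dataclasses import dataclass

# Simplified stand-ins for pipecat's frame types (illustration only;
# the real classes live in pipecat.frames.frames and carry more fields).
@dataclass
class Frame:
    pass

@dataclass
class TranscriptionFrame(Frame):
    text: str
    role: str  # "user" or "assistant"

class TranscriptCapture:
    """Pass-through processor that records user transcription frames."""

    def __init__(self):
        self.user_lines = []

    def process_frame(self, frame: Frame) -> Frame:
        # Record user transcriptions, then forward every frame unchanged.
        if isinstance(frame, TranscriptionFrame) and frame.role == "user":
            self.user_lines.append(frame.text)
        return frame
```

With the real pipecat `FrameProcessor` base class, `process_frame` is async and the processor must call `push_frame` to forward frames; the filtering logic is the same. In my case `user_lines` stays empty, which is why I suspect the user transcription frames never reach the rest of the pipeline.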

Reproduction steps

Running the foundational example 26a-gemini-live-transcription.py as-is.

Expected behavior

Both user and assistant transcriptions should appear in the client UI:

assistant: Hello! Do you want to hear a joke?
user: Yeah, sure.
assistant: Why don't scientists trust atoms?
user: I'm not sure.
assistant: Because they make everything up!

Actual behavior

Only the assistant transcription is displayed in the client UI. The user messages are missing:


(Screenshot: The client UI shows only "assistant" messages with no user messages visible.)

Logs

Note: the server-side logs do show [Transcription:user] debug lines from Gemini, but these do not appear to propagate to the client.

2026-02-19 23:05:06.077 | INFO     | __main__:run_bot:55 - Starting bot
2026-02-19 23:05:06.093 | DEBUG    | pipecat.audio.vad.silero:__init__:147 - Loading Silero VAD model...
2026-02-19 23:05:06.127 | DEBUG    | pipecat.audio.vad.silero:__init__:169 - Loaded Silero VAD
2026-02-19 23:05:06.128 | DEBUG    | pipecat.audio.turn.smart_turn.local_smart_turn_v3:__init__:74 - Loading Local Smart Turn v3.x model from /Users/will/dev/phil/.venv/lib/python3.13/site-packages/pipecat/audio/turn/smart_turn/data/smart-turn-v3.2-cpu.onnx...
2026-02-19 23:05:06.151 | DEBUG    | pipecat.audio.turn.smart_turn.local_smart_turn_v3:__init__:85 - Loaded Local Smart Turn v3.x
2026-02-19 23:05:06.151 | DEBUG    | pipecat.processors.frame_processor:link:561 - Linking Pipeline#0::Source -> SmallWebRTCInputTransport#0
2026-02-19 23:05:06.151 | DEBUG    | pipecat.processors.frame_processor:link:561 - Linking SmallWebRTCInputTransport#0 -> LLMUserAggregator#0
2026-02-19 23:05:06.151 | DEBUG    | pipecat.processors.frame_processor:link:561 - Linking LLMUserAggregator#0 -> GeminiLiveLLMService#0
2026-02-19 23:05:06.151 | DEBUG    | pipecat.processors.frame_processor:link:561 - Linking GeminiLiveLLMService#0 -> SmallWebRTCOutputTransport#0
2026-02-19 23:05:06.151 | DEBUG    | pipecat.processors.frame_processor:link:561 - Linking SmallWebRTCOutputTransport#0 -> LLMAssistantAggregator#0
2026-02-19 23:05:06.151 | DEBUG    | pipecat.processors.frame_processor:link:561 - Linking LLMAssistantAggregator#0 -> Pipeline#0::Sink
2026-02-19 23:05:06.151 | DEBUG    | pipecat.processors.frame_processor:link:561 - Linking PipelineTask#0::Source -> RTVIProcessor#0
2026-02-19 23:05:06.151 | DEBUG    | pipecat.processors.frame_processor:link:561 - Linking RTVIProcessor#0 -> Pipeline#0
2026-02-19 23:05:06.151 | DEBUG    | pipecat.processors.frame_processor:link:561 - Linking Pipeline#0 -> PipelineTask#0::Sink
2026-02-19 23:05:06.151 | DEBUG    | pipecat.pipeline.runner:run:71 - Runner PipelineRunner#0 started running PipelineTask#0
2026-02-19 23:05:06.152 | DEBUG    | pipecat.pipeline.task:_wait_for_pipeline_start:718 - PipelineTask#0: Starting. Waiting for StartFrame#0 to reach the end of the pipeline...
2026-02-19 23:05:06.153 | INFO     | pipecat.services.google.gemini_live.llm:_connect:1072 - Connecting to Gemini service
2026-02-19 23:05:06.228 | INFO     | pipecat.services.google.gemini_live.llm:_connection_task_handler:1187 - Connected to Gemini service
2026-02-19 23:05:06.228 | DEBUG    | pipecat.services.google.gemini_live.llm:_create_initial_response:1390 - Creating initial response
2026-02-19 23:05:07.088 | DEBUG    | pipecat.processors.metrics.frame_processor_metrics:stop_ttfb_metrics:131 - GeminiLiveLLMService#0 TTFB: 0.8599798679351807
2026-02-19 23:05:07.636 | DEBUG    | pipecat.transports.base_output:_bot_started_speaking:608 - Bot started speaking
2026-02-19 23:05:09.828 | DEBUG    | pipecat.processors.metrics.frame_processor_metrics:start_llm_usage_metrics:173 - GeminiLiveLLMService#0 prompt tokens: 355, completion tokens: 60, reasoning tokens: 35
2026-02-19 23:05:09.871 | INFO     | __main__:on_assistant_turn_stopped:133 - Transcript: [2026-02-19T23:05:07.089+00:00] assistant: Hello! Do you want to hear a joke?
2026-02-19 23:05:10.220 | DEBUG    | pipecat.transports.base_output:_bot_stopped_speaking:630 - Bot stopped speaking
2026-02-19 23:05:10.725 | DEBUG    | pipecat.processors.aggregators.llm_response_universal:_on_user_turn_started:685 - LLMUserAggregator#0: User started speaking
2026-02-19 23:05:11.556 | DEBUG    | pipecat.services.google.gemini_live.llm:_handle_msg_input_transcription:1683 - [Transcription:user] [Yeah, sure.]
2026-02-19 23:05:11.576 | DEBUG    | pipecat.audio.turn.smart_turn.base_smart_turn:analyze_end_of_turn:162 - End of Turn result: EndOfTurnState.INCOMPLETE
2026-02-19 23:05:14.500 | DEBUG    | pipecat.processors.aggregators.llm_response_universal:_on_user_turn_stopped:703 - LLMUserAggregator#0: User stopped speaking
2026-02-19 23:05:14.501 | INFO     | __main__:on_user_turn_stopped:127 - Transcript: [2026-02-19T23:05:10.725+00:00] user: Yeah, sure.
2026-02-19 23:05:15.699 | DEBUG    | pipecat.processors.metrics.frame_processor_metrics:start_llm_usage_metrics:173 - GeminiLiveLLMService#0 prompt tokens: 471, completion tokens: 77, reasoning tokens: 23
2026-02-19 23:05:15.715 | INFO     | __main__:on_assistant_turn_stopped:133 - Transcript: [2026-02-19T23:05:12.284+00:00] assistant: Why don't scientists trust atoms?
2026-02-19 23:05:16.063 | DEBUG    | pipecat.transports.base_output:_bot_stopped_speaking:630 - Bot stopped speaking
2026-02-19 23:05:17.220 | DEBUG    | pipecat.processors.aggregators.llm_response_universal:_on_user_turn_started:685 - LLMUserAggregator#0: User started speaking
2026-02-19 23:05:18.033 | DEBUG    | pipecat.services.google.gemini_live.llm:_handle_msg_input_transcription:1683 - [Transcription:user] [I'm not sure.]
2026-02-19 23:05:18.149 | DEBUG    | pipecat.audio.turn.smart_turn.base_smart_turn:analyze_end_of_turn:162 - End of Turn result: EndOfTurnState.COMPLETE
2026-02-19 23:05:18.149 | DEBUG    | pipecat.processors.aggregators.llm_response_universal:_on_user_turn_stopped:703 - LLMUserAggregator#0: User stopped speaking
2026-02-19 23:05:18.149 | INFO     | __main__:on_user_turn_stopped:127 - Transcript: [2026-02-19T23:05:17.220+00:00] user: I'm not sure.
2026-02-19 23:05:18.775 | DEBUG    | pipecat.processors.metrics.frame_processor_metrics:stop_ttfb_metrics:131 - GeminiLiveLLMService#0 TTFB: 0.6261801719665527
2026-02-19 23:05:19.237 | DEBUG    | pipecat.transports.base_output:_bot_started_speaking:608 - Bot started speaking
2026-02-19 23:05:21.747 | DEBUG    | pipecat.processors.metrics.frame_processor_metrics:start_llm_usage_metrics:173 - GeminiLiveLLMService#0 prompt tokens: 601, completion tokens: 69, reasoning tokens: 28
2026-02-19 23:05:21.752 | INFO     | __main__:on_assistant_turn_stopped:133 - Transcript: [2026-02-19T23:05:18.777+00:00] assistant: Because they make everything up!
2026-02-19 23:05:22.101 | DEBUG    | pipecat.transports.base_output:_bot_stopped_speaking:630 - Bot stopped speaking
2026-02-19 23:05:22.505 | DEBUG    | pipecat.transports.smallwebrtc.connection:_handle_new_connection_state:564 - Connection state changed to: closed
2026-02-19 23:05:22.506 | INFO     | __main__:on_client_disconnected:120 - Client disconnected
2026-02-19 23:05:22.508 | INFO     | pipecat.services.google.gemini_live.llm:_disconnect:1292 - Disconnecting from Gemini service
2026-02-19 23:05:22.523 | DEBUG    | pipecat.pipeline.task:run:616 - Pipeline task PipelineTask#0 has finished

Key observation: The server logs show [Transcription:user] debug lines from Gemini (e.g., [Transcription:user] [Yeah, sure.]), and on_user_turn_stopped fires with the correct text. However, the user transcription does not appear in the client UI — only assistant messages are displayed.
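Since `on_user_turn_stopped` does fire server-side with the correct text, a possible interim workaround is to forward the transcript to the client manually from that handler. A minimal sketch, where `send_to_client` is a hypothetical stand-in for whatever server-to-client channel the app uses (e.g. an RTVI server message or a data channel):

```python
import json
from datetime import datetime, timezone

def build_transcript_message(role: str, text: str) -> str:
    """Package one transcript line as JSON for a server-to-client channel."""
    return json.dumps({
        "type": "transcript",
        "role": role,
        "text": text,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })

# Sketch of wiring it into the event handler from the example
# (`send_to_client` is hypothetical, not a pipecat API):
#
# @transcript.event_handler("on_user_turn_stopped")
# async def on_user_turn_stopped(processor, text):
#     await send_to_client(build_transcript_message("user", text))
```

This is a workaround, not a fix: the underlying issue is that the user transcription frames Gemini Live emits don't reach the client path on their own.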
