Assemblyai u3 rt pro #3856

Merged
markbackman merged 23 commits into pipecat-ai:main from zkleb-aai:assemblyai-u3-rt-pro
Mar 3, 2026
Conversation

@zkleb-aai
Contributor

Summary

Add support for AssemblyAI's u3-rt-pro streaming model with enhanced features including two-mode turn detection, dynamic parameter updates, speaker diarization,
and comprehensive debugging capabilities.

Key Features

🎯 u3-rt-pro Model Support

  • New u3-rt-pro speech model option (set as default)
  • Optimized for low-latency voice agent applications
  • Support for custom prompting and keyterms boosting

🔄 Two-Mode Turn Detection

Pipecat Mode (vad_force_turn_endpoint=True, default):

  • VAD + Smart Turn analyzer controls turn endings
  • ForceEndpoint message sent on VAD stop
  • max_turn_silence synchronized with min_end_of_turn_silence_when_confident to avoid double turn detection
  • Best for voice agents requiring precise interruption control

STT Mode (vad_force_turn_endpoint=False, u3-rt-pro only):

  • AssemblyAI model controls turn detection
  • Respects all timing parameters as configured
  • Emits UserStartedSpeakingFrame/UserStoppedSpeakingFrame from STT
  • Uses SpeechStarted events for fast barge-in

🔄 Dynamic Parameter Updates

Update configuration mid-stream without reconnection via STTUpdateSettingsFrame:

  • keyterms_prompt - Boost specific words/names (when API supports it)
  • prompt - Custom transcription prompts
  • max_turn_silence - Maximum silence before forcing turn end
  • min_end_of_turn_silence_when_confident - Silence threshold for confident turn endings

🎤 Speaker Diarization

  • Enable with speaker_labels=True in connection params
  • Optional speaker_format parameter for custom formatting (e.g., "<{speaker}>{text}</{speaker}>")
  • Speaker labels included in final transcripts

🌍 Language Detection

  • Automatic language detection support (when enabled with multilingual models)
  • Language confidence scoring
  • Automatic fallback to English for low-confidence detections
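
The fallback behavior above can be sketched as a small helper. This is a standalone illustration, not pipecat's actual implementation; the confidence threshold and the function name are assumptions.

```python
# Illustrative sketch of the low-confidence language fallback described
# above. The 0.6 floor and helper name are assumptions, not the real code.
CONFIDENCE_FLOOR = 0.6

def choose_language(detected: str, confidence: float) -> str:
    """Return the detected language, falling back to English when the
    detection confidence is below the floor."""
    if confidence < CONFIDENCE_FLOOR:
        return "en"
    return detected
```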

Bug Fixes

  • Speaker diarization: Add field alias mapping for speaker_label → speaker in TurnMessage model
  • STT mode imports: Add missing UserStartedSpeakingFrame and UserStoppedSpeakingFrame imports
  • Dynamic updates: Fix _update_settings to properly send UpdateConfiguration messages to AssemblyAI

Improvements

Enhanced Warnings

  • Warn when min_end_of_turn_silence_when_confident is not set to optimal 100ms
  • Warn when custom prompts are used (recommend testing with defaults first)
  • Warn when max_turn_silence is overridden in Pipecat mode
  • Better error messages for prompt + keyterms conflicts

Comprehensive Logging

  • Log final connection parameters after modifications
  • Log WebSocket URL and parsed parameters
  • Log all transcript details (text, timing, confidence, speaker)
  • Log text sent to LLM with speaker formatting
  • Debug logging for turn detection mode behavior

Documentation

  • Detailed docstrings for all new parameters
  • Clear explanation of two-mode turn detection
  • Examples for dynamic updates and speaker formatting

Models Support

| Model | Pipecat Mode | STT Mode | Notes |
| --- | --- | --- | --- |
| u3-rt-pro | ✅ | ✅ | Recommended, supports all features |
| universal-streaming-english | ✅ | ❌ | No SpeechStarted events |
| universal-streaming-multilingual | ✅ | ❌ | No SpeechStarted events |

Breaking Changes

None - all changes are backward compatible. Default behavior unchanged for existing users.

Testing

Extensively tested with a comprehensive 23-test suite covering:

  • Basic configuration variations
  • Custom prompting and keyterms with difficult names
  • Speaker diarization with and without formatting
  • Dynamic single and multiple parameter updates
  • Mode comparisons (Pipecat vs STT)
  • STT mode timing experiments
  • Edge cases (very short/long silence thresholds)

Example Usage

Basic u3-rt-pro:

stt = AssemblyAISTTService(                                                                                                                                        
    api_key="your-key",                                                                                                                                            
    connection_params=AssemblyAIConnectionParams(                                                                                                                  
        speech_model="u3-rt-pro",                                                                                                                                  
        min_end_of_turn_silence_when_confident=100,                                                                                                                
    )                                                                                                                                                              
)                                                                                                                                                                  
                                                                                                                                                                   
With speaker diarization:                                                                                                                                          
stt = AssemblyAISTTService(                                                                                                                                        
    api_key="your-key",                                                                                                                                            
    connection_params=AssemblyAIConnectionParams(                                                                                                                  
        speech_model="u3-rt-pro",                                                                                                                                  
        speaker_labels=True,                                                                                                                                       
    ),                                                                                                                                                             
    speaker_format="[{speaker}] {text}"                                                                                                                            
)                                                                                                                                                                  
                                                                                                                                                                   
STT mode (model controls turns):                                                                                                                                   
stt = AssemblyAISTTService(                                                                                                                                        
    api_key="your-key",                                                                                                                                            
    connection_params=AssemblyAIConnectionParams(                                                                                                                  
        speech_model="u3-rt-pro",                                                                                                                                  
        min_end_of_turn_silence_when_confident=100,                                                                                                                
        max_turn_silence=5000,                                                                                                                                     
    ),                                                                                                                                                             
    vad_force_turn_endpoint=False  # Use STT mode                                                                                                                  
)                                                                                                                                                                  
                                                                                                                                                                   
Dynamic updates:                                                                                                                                                   
from pipecat.services.assemblyai.stt import AssemblyAISTTSettings                                                                                                  
                                                                                                                                                                   
# Update keyterms mid-conversation                                                                                                                                 
await task.queue_frame(                                                                                                                                            
    STTUpdateSettingsFrame(                                                                                                                                        
        delta=AssemblyAISTTSettings(                                                                                                                               
            connection_params=AssemblyAIConnectionParams(                                                                                                          
                keyterms_prompt=["Xiomara", "Saoirse", "Pipecat"]                                                                                                  
            )                                                                                                                                                      
        )                                                                                                                                                          
    )                                                                                                                                                              
)                                                                                                                                                                  
                                                                                                                                                                   
Related Issues                                                                                                                                                     
                                                                                                                                                                   
Closes #[issue-number-if-applicable]                                                                                                                               
                                                                                                                                                                   
---                                                                                                                                                                
🤖 Generated with https://claude.com/claude-code  

- Fix speaker diarization: Add field alias for speaker_label → speaker
  mapping in TurnMessage model
- Add warning for non-optimal min_end_of_turn_silence_when_confident
  values (recommends 100ms for best latency)
- Improve max_turn_silence override warning message clarity
- Update custom prompt warning (remove 88% accuracy claim)
- Add comprehensive logging for debugging:
  - Log final connection params after modifications
  - Log WebSocket URL and parsed parameters
  - Log speaker field in transcripts
  - Log text sent to LLM with speaker formatting
- Support dynamic configuration updates via STTUpdateSettingsFrame:
  - keyterms_prompt (when AssemblyAI API supports it)
  - prompt
  - max_turn_silence
  - min_end_of_turn_silence_when_confident
@markbackman markbackman self-requested a review February 27, 2026 18:51
Contributor

@markbackman markbackman left a comment

This generally looks good.

You might want to add another foundational example (07 series) showing how to set up AssemblyAISTTService using the u3-rt-pro model, where it's acting as user turn controller.

The key is that you set up the LLMContextAggregatorPair as:

```python
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(user_turn_strategies=ExternalUserTurnStrategies()),
)
```

Using ExternalUserTurnStrategies tells the aggregator to defer turn control to the STT (or external processor), which in this case is AssemblyAISTTService.

Reach out on Slack if you have any questions.


Also, don't forget two additional steps:

  1. Submit a changelog: https://github.com/pipecat-ai/pipecat/blob/main/CONTRIBUTING.md#changelog-entries
  2. Lint the code (uv run scripts/fix-ruff.sh or install the pre-commit hook: uv run pre-commit install)

Comment thread src/pipecat/services/assemblyai/stt.py Outdated
import json
from dataclasses import dataclass, field
from typing import Any, AsyncGenerator, Dict, Optional
from typing import Any, AsyncGenerator, Dict, Mapping, Optional
Contributor

Nit: unused, remove.

Suggested change:

```diff
- from typing import Any, AsyncGenerator, Dict, Mapping, Optional
+ from typing import Any, AsyncGenerator, Dict, Optional
```

Comment thread src/pipecat/services/assemblyai/stt.py Outdated
self._vad_speaking = False

# Log final connection params after any modifications
logger.info(f"{self} Final connection params being sent to AssemblyAI:")
Contributor

Out of curiosity, why are lines L215-217 info logs?

Contributor Author

Hey Mark! I'll remove these, I was using these for debugging. Sorry about that!

Comment thread src/pipecat/services/assemblyai/stt.py Outdated
logger.info(f" max_turn_silence: {self._settings.connection_params.max_turn_silence}")

# Warn if min_end_of_turn_silence_when_confident is not 100ms
if self._settings.connection_params.min_end_of_turn_silence_when_confident != 100:
Contributor

Should this ever be set to anything other than 100ms? If so, do you have docs you can link to in order to educate the user? (Maybe in docstrings?)

If it should never be set to anything other than 100ms, maybe remove?

Contributor Author

Hi Mark! Yes, setting to something higher than 100ms has the potential to improve accuracy for people who take larger gaps in speech. From our testing, 100ms is the optimal value, but we want to leave the parameter configurable in case anyone would like to change it for their use case.

Comment thread src/pipecat/services/assemblyai/stt.py Outdated
- else → InterimTranscriptionFrame
"""
# Log transcript details
logger.info(f"{self} ===== TRANSCRIPT RECEIVED =====")
Contributor

Remove the info logs. If users want this info, they can use an observer to access the TranscriptionFrame.

Contributor Author

100% will remove!

if is_final_turn:
    finalize_confirmed = bool(message.turn_is_formatted)
    if finalize_confirmed:
        self.confirm_finalize()
Contributor

To use confirm_finalize(), you need to also finalize the audio being sent using the request_finalize() method.

This pattern is appropriate when you have a message you send to Assembly to tell the service to finalize, which is when you call request_finalize(). Then, you have metadata in the transcript data returned from Assembly that indicate that the audio from the finalize request is received, which is when you call confirm_finalize(). Closing the loop in this way gives Pipecat confidence that the user's audio has been fully transcribed and it can proceed to the next processing step. This process is particularly important for services that emit multiple finals from user audio.

Not all services work like this though; others guarantee that an audio input equals an audio output (e.g. ElevenLabs with their commit process). Or, others stream tokens including an end token (e.g. Soniox, Speechmatics). Those services just finalize without the method calls (e.g. set TranscriptionFrame.finalized=True).

You know best how your service works, so please follow one of these patterns. Or, ask questions if you're still unclear.
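
The request/confirm handshake described here can be sketched as a tiny state tracker. This is a standalone illustration under assumed semantics; the real methods live on pipecat's STTService base class.

```python
class FinalizeTracker:
    """Toy model of the two-step finalization handshake."""

    def __init__(self) -> None:
        self._pending = False

    def request_finalize(self) -> None:
        # Called when we ask the STT service to finalize, e.g. alongside
        # sending a ForceEndpoint message.
        self._pending = True

    def confirm_finalize(self) -> None:
        # Called when the service's response (e.g. a formatted final turn)
        # confirms the finalize request was honored.
        self._pending = False

    @property
    def awaiting_confirmation(self) -> bool:
        return self._pending
```

Closing this loop is what gives the pipeline confidence that all of the user's audio has been transcribed before moving on.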

Contributor Author

Hey Mark, understood, basically confirm_finalize is used on transcript received but request_finalize is used when we force end of transcription. I'll make that update.


logger.debug(f"{self} Processing SpeechStarted in STT mode")
await self.start_processing_metrics()
await self.broadcast_frame(UserStartedSpeakingFrame)
Contributor

In which scenarios is the SpeechStartedMessage message received?

Contributor

Also, make sure there's only one broadcast of UserStartedSpeakingFrame and the paired UserStoppedSpeakingFrame for each case where Assembly is handling the role of "user turn controller" (e.g. emitting the User Speaking Frames).

Contributor Author

SpeechStarted is only received in STT turn detection mode (vad_force_turn_endpoint=False) with u3-rt-pro only. It arrives before any transcripts. The transcript-based fallback was for older streaming models, but since those models aren't supported in STT mode (validated in init), I removed it to ensure clean pairing of UserStarted/StoppedSpeakingFrame.

Comment thread src/pipecat/services/assemblyai/stt.py Outdated
logger.debug(f"{self} Transcript received in STT mode (_user_speaking={self._user_speaking})")
if not self._user_speaking:
    logger.warning(f"{self} Transcript arrived before SpeechStarted, broadcasting fallback UserStartedSpeakingFrame")
    await self.broadcast_frame(UserStartedSpeakingFrame)
Contributor

I see UserStartedSpeakingFrame broadcasted again here. Is there any risk of double broadcasting UserStartedSpeakingFrame?

Contributor

This should also come with:

```python
if self._should_interrupt:
    await self.push_interruption_task_frame_and_wait()
```

The role of the User Turn controller is to:

  • On speech started:
    • Broadcast UserStartedSpeakingFrame
    • Call push_interruption_task_frame_and_wait()
  • On speech stopped:
    • Broadcast UserStoppedSpeakingFrame
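
The contract above can be sketched as a standalone controller. The frame classes and method names here are stand-ins for pipecat's real API, kept only so the event ordering is visible.

```python
import asyncio

class UserStartedSpeakingFrame: ...
class UserStoppedSpeakingFrame: ...

class ToyTurnController:
    """Minimal illustration of the user-turn-controller contract."""

    def __init__(self, should_interrupt: bool = True) -> None:
        self._should_interrupt = should_interrupt
        self.events: list[str] = []

    async def broadcast_frame(self, frame_cls) -> None:
        self.events.append(frame_cls.__name__)

    async def push_interruption_task_frame_and_wait(self) -> None:
        self.events.append("interruption")

    async def on_speech_started(self) -> None:
        # Broadcast first, then interrupt any bot output in progress.
        await self.broadcast_frame(UserStartedSpeakingFrame)
        if self._should_interrupt:
            await self.push_interruption_task_frame_and_wait()

    async def on_speech_stopped(self) -> None:
        await self.broadcast_frame(UserStoppedSpeakingFrame)

async def demo() -> list[str]:
    c = ToyTurnController()
    await c.on_speech_started()
    await c.on_speech_stopped()
    return c.events
```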

Contributor Author

Since u3-rt-pro guarantees SpeechStarted arrives before transcripts, I removed the fallback entirely. Now the pairing is clean.

@codecov

codecov Bot commented Feb 27, 2026

Codecov Report

❌ Patch coverage is 0% with 129 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/pipecat/services/assemblyai/stt.py | 0.00% | 105 Missing ⚠️ |
| src/pipecat/services/assemblyai/models.py | 0.00% | 24 Missing ⚠️ |

| Files with missing lines | Coverage Δ |
| --- | --- |
| src/pipecat/services/assemblyai/models.py | 0.00% <0.00%> (ø) |
| src/pipecat/services/assemblyai/stt.py | 0.00% <0.00%> (ø) |

... and 46 files with indirect coverage changes


…zed flag in STT mode

- Add request_finalize() before sending ForceEndpoint in Pipecat mode
- Keep confirm_finalize() when receiving formatted finals in Pipecat mode
- Remove confirm_finalize() from STT mode (use finalized=True instead)

This follows Pipecat's two-step finalization pattern where request_finalize()
is called when sending a finalize request to the STT service, and
confirm_finalize() is called when receiving confirmation back.
u3-rt-pro guarantees SpeechStarted is always sent before transcripts,
so the fallback UserStartedSpeakingFrame broadcast is never needed.

This ensures clean pairing of UserStarted/StoppedSpeakingFrame:
- Start: Always from _handle_speech_started
- Stop: Always from _handle_transcription on final turn
@markbackman markbackman requested a review from kompfner February 27, 2026 20:15
@markbackman
Contributor

@kompfner it could be worth you looking at this from the perspective of the STTSettings.

- Remove unused Mapping import
- Remove info logs at initialization (connection params)
- Remove info logs in _handle_transcription (transcript details, text sent to LLM)
- Remove info logs in _build_ws_url (WebSocket URL and params)
- Keep debug logs (less verbose, appropriate for development)
The request_finalize() method in STTService is synchronous (sets a flag),
but was being called with await in the VAD turn endpoint handling code.
This caused "object NoneType can't be used in 'await' expression" errors.

Also includes automatic formatting improvements from ruff.
- 07o-interruptible-assemblyai.py: Basic example using Pipecat VAD mode
- 07o-interruptible-assemblyai-stt.py: Advanced example using STT-controlled
  turn detection with comprehensive documentation on u3-rt-pro features
  (turn detection tuning, prompt-based enhancement, speaker diarization)
…nt to min_turn_silence

- Add "beta feature" note to custom prompt warning
- Rename min_end_of_turn_silence_when_confident parameter to min_turn_silence across all AssemblyAI code
- Update documentation, examples, and test files to use new parameter name
- Update 13d-assemblyai-transcription.py to explicitly use u3-rt-pro model
- Update 55d-update-settings-assemblyai-stt.py to demonstrate keyterms updates instead of language updates
- Add helpful logging to show before/after keyterms boosting effect
- Use difficult names (Xiomara, Saoirse, Krzystof) to demonstrate boosting effectiveness
… parameter

- Keep old parameter name for backward compatibility
- Add deprecation warning when old parameter is used
- Automatically migrate old parameter value to new min_turn_silence parameter
- Exclude deprecated parameter from WebSocket URL to avoid sending it to API
- New parameter takes precedence if both are set
- Makes deprecation warning visible in logs without needing Python warning flags
- Users will see the warning during normal operation
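
The migration described in this commit can be sketched as follows. The helper function is hypothetical; the parameter names and precedence rule (new parameter wins) come from the PR.

```python
import warnings

def resolve_turn_silence(min_turn_silence=None,
                         min_end_of_turn_silence_when_confident=None):
    """New parameter wins; the deprecated one migrates with a warning."""
    if min_end_of_turn_silence_when_confident is not None:
        warnings.warn(
            "min_end_of_turn_silence_when_confident is deprecated; "
            "use min_turn_silence instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # Migrate the deprecated value only when the new one is unset.
        if min_turn_silence is None:
            min_turn_silence = min_end_of_turn_silence_when_confident
    return min_turn_silence
```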
Contributor

@markbackman markbackman left a comment

Thanks for the update!

Two big things:

  1. I'm finding the code a bit hard to follow, and I think this is because the mapping of what each model supports is divided across different places in the codebase.

It would be helpful if this definition were reinforced in code, or if the model and param combinations were documented in one place.

  2. The concept of calling one mode "STT mode" and the other not is confusing. It's an STT service, so having an STT mode doesn't really make much sense, at least to me. Perhaps we're talking about turn detection, right? Maybe that's the terminology we need to be clear about.

I'll continue to review more but wanted to get this early feedback in.

Comment thread src/pipecat/services/assemblyai/stt.py Outdated
Only applies to Mode 2 (STT turn detection). In Mode 1, VAD +
smart turn analyzer handle interruptions via the aggregator.
"""
logger.debug(
Contributor

Remove this log.

Comment thread src/pipecat/services/assemblyai/stt.py Outdated
logger.debug(f"{self} SpeechStarted ignored in Pipecat mode")
return # Mode 1: handled by aggregator

logger.debug(f"{self} Processing SpeechStarted in STT mode")
Contributor

Remove log

Comment thread src/pipecat/services/assemblyai/stt.py Outdated
if self._should_interrupt:
    await self.push_interruption_task_frame_and_wait()
self._user_speaking = True
logger.debug(f"{self} _user_speaking set to True")
Contributor

Remove log

Comment thread src/pipecat/services/assemblyai/stt.py Outdated
    await self._trace_transcription(transcript_text, True, language)
    await self.stop_processing_metrics()
else:
    logger.debug(f'{self} Interim transcript: "{transcript_text}"')
Contributor

Remove or set to trace.

Comment thread src/pipecat/services/assemblyai/stt.py Outdated
context = LLMContext(messages)
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
context,
user_params=LLMUserAggregatorParams(user_turn_strategies=ExternalUserTurnStrategies()),
Contributor

Let's include the VADAnalyzer so we get TTFB measurements.

```python
user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        user_turn_strategies=ExternalUserTurnStrategies(),
        vad_analyzer=SileroVADAnalyzer(),
    ),
)
```

Comment thread src/pipecat/services/assemblyai/stt.py Outdated
def _configure_manual_turn_mode(
self, connection_params: AssemblyAIConnectionParams
self._user_speaking = False
self._vad_speaking = False
Contributor

Unused. Remove.

Comment thread src/pipecat/services/assemblyai/stt.py Outdated
old_conn_params = changed.get("connection_params")

# Check each potentially changed parameter
if hasattr(conn_params, "keyterms_prompt"):
Contributor

These attributes always exist, so this will evaluate to True.

Instead, I think you want a simple check:

```python
if (
    old_conn_params is None
    or conn_params.keyterms_prompt != old_conn_params.keyterms_prompt
):
    if conn_params.keyterms_prompt is not None:
        ...
```

Comment thread changelog/3856.changed.md Outdated
@markbackman
Contributor

One more: I see end_of_turn_confidence_threshold flipped from 1.0 to 0.0 for universal-streaming. This was required to get a fast response. Do you still get an equally fast response from the universal-streaming models with this configuration?

zkleb-aai and others added 6 commits March 2, 2026 17:04
Co-authored-by: Mark Backman <m.backman@gmail.com>
Co-authored-by: Mark Backman <m.backman@gmail.com>
Co-authored-by: Mark Backman <m.backman@gmail.com>
…yAI turn detection'

- Rename 07o-interruptible-assemblyai-stt.py -> 07o-interruptible-assemblyai-turn-detection.py
- Replace 'STT mode' with 'AssemblyAI turn detection mode' throughout codebase
- Replace 'Mode 1'/'Mode 2' with descriptive 'Pipecat turn detection'/'AssemblyAI turn detection'
- Update changelog to use 'built-in turn detection' terminology
- Addresses PR feedback about confusing terminology
…sal-streaming

- u3-rt-pro: Does not set parameter (not used)
- universal-streaming models: Set to 1.0 to maintain fast response
- This ensures fast response time matches previous implementation
@zkleb-aai
Contributor Author

Changes Made

Code Improvements

  • ✅ Removed all debug logs as requested
  • ✅ Fixed hasattr() logic - now properly checks if values changed instead of always evaluating to True
  • ✅ Removed unused _vad_speaking variable
  • ✅ Added SileroVADAnalyzer to the 07o example for TTFB measurements

Terminology Improvements

  • ✅ Renamed 07o-interruptible-assemblyai-stt.py → 07o-interruptible-assemblyai-turn-detection.py
  • ✅ Replaced all instances of "STT mode" with "AssemblyAI turn detection mode" for clarity
  • ✅ Replaced "Mode 1"/"Mode 2" with descriptive "Pipecat turn detection"/"AssemblyAI turn detection"
  • ✅ Updated all docstrings, comments, and examples to use the clearer terminology

Bug Fix

  • ✅ Fixed end_of_turn_confidence_threshold: Changed from 0.0 to 1.0 for universal-streaming models (caught a bug
    in my initial implementation)

Answers to Your Questions

Re: Terminology

The concept of calling one mode "STT mode" and the other not is confusing. It's an STT service, so having an STT mode
doesn't really make much sense, at least to me. Perhaps we're talking about turn detection, right?

Absolutely right! I've updated all references to use "turn detection mode" terminology:

  • Pipecat turn detection mode - VAD + Smart Turn controls when user is done speaking
  • AssemblyAI turn detection mode - AssemblyAI's model controls turn detection using built-in logic

Re: end_of_turn_confidence_threshold

One more: I see end_of_turn_confidence_threshold flipped from 1.0 to 0.0 for universal-streaming. This was required
to get a fast response. Do you still get an equally fast response from the universal-streaming models with this
configuration?

Good catch - this was actually a bug! I've corrected it. The parameter behavior is now:

  • u3-rt-pro: Not used/not set (parameter doesn't affect this model)
  • universal-streaming models: Set to 1.0 (not 0.0) to maintain fast response

So yes, with the corrected value of 1.0, universal-streaming models maintain the same fast response. This disables
semantic turn detection for these legacy models, ensuring Pipecat's VAD-based turn detection controls the flow.

Re: Model/Parameter Mappings

I'm finding the code a bit hard to follow and I think this is because the mapping of what each model supports is
divided in different places in the codebase.

Great point. I've created comprehensive documentation with a model comparison table that shows exactly what
features each model supports in one place. I'm happy to add this to the PR or Pipecat docs if helpful - just let me
know where it would be most useful!

Contributor

@markbackman markbackman left a comment

LGTM! Thanks for the fixes.

Can you please lint the code?
uv run scripts/fix-ruff.sh

@markbackman markbackman merged commit c79a739 into pipecat-ai:main Mar 3, 2026
5 checks passed