@@ -24,39 +24,40 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
2424 A list of strategies can be specified for both turn start and stop; strategies are
2525 evaluated in order until one evaluates to true.
2626
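The in-order, short-circuit evaluation described above can be sketched in plain Python; the function and predicate names below are illustrative, not Pipecat's API:

```python
from typing import Callable, Sequence

def evaluate_strategies(strategies: Sequence[Callable[[], bool]]) -> bool:
    """Return True as soon as the first strategy fires; later ones are skipped."""
    for strategy in strategies:
        if strategy():
            return True
    return False

calls = []

def vad_strategy() -> bool:
    calls.append("vad")
    return True  # e.g. VAD detected the user speaking

def transcription_strategy() -> bool:
    calls.append("transcription")
    return True

# The first strategy fires, so the second is never consulted.
result = evaluate_strategies([vad_strategy, transcription_strategy])
```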
27- Available user turn start strategies:
28- - VADUserTurnStartStrategy
29- - TranscriptionUserTurnStartStrategy
30- - MinWordsUserTurnStartStrategy
31- - ExternalUserTurnStartStrategy
27+ Available user turn start strategies:
3228
33- Available user turn stop strategies:
34- - TranscriptionUserTurnStopStrategy
35- - TurnAnalyzerUserTurnStopStrategy
36- - ExternalUserTurnStopStrategy
29+ - VADUserTurnStartStrategy
30+ - TranscriptionUserTurnStartStrategy
31+ - MinWordsUserTurnStartStrategy
32+ - ExternalUserTurnStartStrategy
3733
38- The default strategies are :
34+ Available user turn stop strategies:
3935
40- - start: [VADUserTurnStartStrategy, TranscriptionUserTurnStartStrategy]
41- - stop: [TranscriptionUserTurnStopStrategy]
36+ - TranscriptionUserTurnStopStrategy
37+ - TurnAnalyzerUserTurnStopStrategy
38+ - ExternalUserTurnStopStrategy
4239
43- urn strategies are configured when setting up `LLMContextAggregatorPair`.
40+ The default strategies are:
41+
42+ - start: [VADUserTurnStartStrategy, TranscriptionUserTurnStartStrategy]
43+ - stop: [TranscriptionUserTurnStopStrategy]
44+
45+ Turn strategies are configured when setting up `LLMContextAggregatorPair`.
4446 For example:
4547
46- ```python
47- context_aggregator = LLMContextAggregatorPair(
48- context,
49- user_params=LLMUserAggregatorParams(
50- user_turn_strategies=UserTurnStrategies(
51- stop=[
52- TurnAnalyzerUserTurnStopStrategy(
53- turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams())
54- )
55- ],
56- )
57- ),
58- )
59- ```
48+ ```python
49+ context_aggregator = LLMContextAggregatorPair(
50+     context,
51+     user_params=LLMUserAggregatorParams(
52+         user_turn_strategies=UserTurnStrategies(
53+             stop=[
54+                 TurnAnalyzerUserTurnStopStrategy(
55+                     turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams())
56+                 )
57+             ],
58+         )
59+     ),
60+ )
61+ ```
6061
6162 In order to use the user turn strategies you must update to the new
6263 universal `LLMContext` and `LLMContextAggregatorPair`.
@@ -69,13 +70,13 @@ turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams())
6970- Added `GrokRealtimeLLMService` for xAI's Grok Voice Agent API with real-time
7071 voice conversations:
7172
72- - Support for real-time audio streaming with WebSocket connection
73- - Built-in server-side VAD (Voice Activity Detection)
74- - Multiple voice options: Ara, Rex, Sal, Eve, Leo
75- - Built-in tools support: web_search, x_search, file_search
76- - Custom function calling with standard Pipecat tools schema
77- - Configurable audio formats (PCM at 8kHz-48kHz)
78- (PR [#3267](https://github.com/pipecat-ai/pipecat/pull/3267))
73+ - Support for real-time audio streaming with WebSocket connection
74+ - Built-in server-side VAD (Voice Activity Detection)
75+ - Multiple voice options: Ara, Rex, Sal, Eve, Leo
76+ - Built-in tools support: web_search, x_search, file_search
77+ - Custom function calling with standard Pipecat tools schema
78+ - Configurable audio formats (PCM at 8kHz-48kHz)
79+ (PR [#3267](https://github.com/pipecat-ai/pipecat/pull/3267))
7980
8081- Added an approximation of TTFB for Ultravox.
8182 (PR [#3268](https://github.com/pipecat-ai/pipecat/pull/3268))
@@ -86,11 +87,12 @@ turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams())
8687 (PR [#3289](https://github.com/pipecat-ai/pipecat/pull/3289))
8788
8889- `LLMUserAggregator` now exposes the following events:
89- - `on_user_turn_started`: triggered when a user turn starts
90- - `on_user_turn_stopped`: triggered when a user turn ends
91- - `on_user_turn_stop_timeout`: triggered when a user turn does not stop
92- and times out
93- (PR [#3291](https://github.com/pipecat-ai/pipecat/pull/3291))
90+
91+ - `on_user_turn_started`: triggered when a user turn starts
92+ - `on_user_turn_stopped`: triggered when a user turn ends
93+ - `on_user_turn_stop_timeout`: triggered when a user turn does not stop
94+   and times out
95+ (PR [#3291](https://github.com/pipecat-ai/pipecat/pull/3291))
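As an illustration of how such named events are typically consumed, here is a minimal decorator-based emitter; the `UserTurnEvents` class and its wiring are hypothetical, only the event names come from the changelog entry above:

```python
import asyncio

class UserTurnEvents:
    """Hypothetical emitter: register handlers per event name, await them on emit."""

    def __init__(self):
        self._handlers = {
            "on_user_turn_started": [],
            "on_user_turn_stopped": [],
            "on_user_turn_stop_timeout": [],
        }

    def event_handler(self, name):
        def register(func):
            self._handlers[name].append(func)
            return func
        return register

    async def emit(self, name, *args):
        for handler in self._handlers[name]:
            await handler(*args)

events = UserTurnEvents()
seen = []

@events.event_handler("on_user_turn_started")
async def on_started(source):
    seen.append(("started", source))

asyncio.run(events.emit("on_user_turn_started", "user-aggregator"))
```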
9496
9597- Introducing user mute strategies. User mute strategies indicate when user
9698 input should be muted based on the current system state.
@@ -104,29 +106,29 @@ turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams())
104106 frame is muted if any of the configured strategies indicates it should be
105107 muted.
106108
107- Available user mute strategies:
109+ Available user mute strategies:
108110
109- * `FirstSpeechUserMuteStrategy`
110- * `MuteUntilFirstBotCompleteUserMuteStrategy`
111- * `AlwaysUserMuteStrategy`
112- * `FunctionCallUserMuteStrategy`
111+ - `FirstSpeechUserMuteStrategy`
112+ - `MuteUntilFirstBotCompleteUserMuteStrategy`
113+ - `AlwaysUserMuteStrategy`
114+ - `FunctionCallUserMuteStrategy`
113115
114116 User mute strategies replace the legacy `STTMuteFilter` and provide a more
115117 flexible and composable approach to muting user input.
116118
117119 User mute strategies are configured when setting up the
118120 `LLMContextAggregatorPair`. For example:
119121
120- ```python
121- context_aggregator = LLMContextAggregatorPair(
122- context,
123- user_params=LLMUserAggregatorParams(
124- user_mute_strategies=[
125- FirstSpeechUserMuteStrategy(),
126- ]
127- ),
128- )
129- ```
122+ ```python
123+ context_aggregator = LLMContextAggregatorPair(
124+     context,
125+     user_params=LLMUserAggregatorParams(
126+         user_mute_strategies=[
127+             FirstSpeechUserMuteStrategy(),
128+         ]
129+     ),
130+ )
131+ ```
130132
131133 In order to use user mute strategies you should update to the new universal
132134 `LLMContext` and `LLMContextAggregatorPair`.
@@ -159,16 +161,17 @@ turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams())
159161 (PR [#3357](https://github.com/pipecat-ai/pipecat/pull/3357))
160162
161163- Added image support to `OpenAIRealtimeLLMService` via `InputImageRawFrame`:
162- - New `start_video_paused` parameter to control initial video input state
163- - New `video_frame_detail` parameter to set image processing quality
164- ("auto",
165- "low", or "high"). This corresponds to OpenAI Realtime's `image_detail`
166- parameter.
167- - `set_video_input_paused()` method to pause/resume video input at runtime
168- - `set_video_frame_detail()` method to adjust video frame quality
169- dynamically
170- - Automatic rate limiting (1 frame per second) to prevent API overload
171- (PR [#3360](https://github.com/pipecat-ai/pipecat/pull/3360))
164+
165+ - New `start_video_paused` parameter to control initial video input state
166+ - New `video_frame_detail` parameter to set image processing quality
167+   ("auto", "low", or "high"). This corresponds to OpenAI Realtime's
168+   `image_detail` parameter.
169+ - `set_video_input_paused()` method to pause/resume video input at runtime
170+ - `set_video_frame_detail()` method to adjust video frame quality
171+   dynamically
172+ - Automatic rate limiting (1 frame per second) to prevent API overload
173+ (PR [#3360](https://github.com/pipecat-ai/pipecat/pull/3360))
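The automatic rate limiting can be pictured as a simple interval gate; this `FrameRateLimiter` is a hypothetical sketch, not the service's actual implementation:

```python
class FrameRateLimiter:
    """Hypothetical sketch: allow at most one frame per interval, drop the rest."""

    def __init__(self, max_fps: float = 1.0):
        self._min_interval = 1.0 / max_fps
        self._last_sent = None  # timestamp of the last frame that was let through

    def should_send(self, now: float) -> bool:
        if self._last_sent is None or now - self._last_sent >= self._min_interval:
            self._last_sent = now
            return True
        return False

limiter = FrameRateLimiter(max_fps=1.0)
# Timestamps in seconds; only frames at least one second apart pass the gate.
decisions = [limiter.should_send(t) for t in (0.0, 0.4, 0.9, 1.0, 1.5, 2.0)]
```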
172175
173176- Added `UserTurnProcessor`, a frame processor built on `UserTurnController`
174177 that pushes `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` frames
@@ -188,11 +191,12 @@ turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams())
188191 (PR [#3374](https://github.com/pipecat-ai/pipecat/pull/3374))
189192
190193- `LLMAssistantAggregator` now exposes the following events:
191- - `on_assistant_turn_started`: triggered when the assistant turn starts
192- - `on_assistant_turn_stopped`: triggered when the assistant turn ends
193- - `on_assistant_thought`: triggered when there's an assistant thought
194- available
195- (PR [#3385](https://github.com/pipecat-ai/pipecat/pull/3385))
194+
195+ - `on_assistant_turn_started`: triggered when the assistant turn starts
196+ - `on_assistant_turn_stopped`: triggered when the assistant turn ends
197+ - `on_assistant_thought`: triggered when there's an assistant thought
198+   available
199+ (PR [#3385](https://github.com/pipecat-ai/pipecat/pull/3385))
196200
197201- Added `KrispVivaTurn` analyzer for end of turn detection using the Krisp VIVA
198202 SDK (requires `krisp_audio`).
@@ -202,13 +206,14 @@ turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams())
202206 register custom pipeline task setup files by setting the
203207 `PIPECAT_SETUP_FILES` environment variable. This variable should contain a
204208 colon-separated list of Python files (e.g. `export
205- PIPECAT_SETUP_FILES="setup1.py:setup.py:..."`). Each file must define a
209+ PIPECAT_SETUP_FILES="setup1.py:setup.py:..."`). Each file must define a
206210 function with the following signature:
207211
208- ```python
209- async def setup_pipeline_task(task: PipelineTask):
210- ...
211- ```
212+ ```python
213+ async def setup_pipeline_task(task: PipelineTask):
214+     ...
215+ ```
216+
212217 (PR [#3397](https://github.com/pipecat-ai/pipecat/pull/3397))
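How the runner imports these files is internal to Pipecat; as a rough sketch, a colon-separated list of setup files could be discovered and loaded like this (the helper names are hypothetical):

```python
import importlib.util

def discover_setup_files(env_value: str) -> list[str]:
    """Split a PIPECAT_SETUP_FILES-style colon-separated value into paths."""
    return [path for path in env_value.split(":") if path]

def load_setup_function(path: str):
    """Import a setup file and return its setup_pipeline_task function."""
    spec = importlib.util.spec_from_file_location("pipecat_setup_module", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module.setup_pipeline_task

paths = discover_setup_files("setup1.py:setup2.py")
```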
213218
214219- Added a keepalive task for `InworldTTSService` to keep the service connected
@@ -238,29 +243,33 @@ turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams())
238243
239244- Updated `ElevenLabsRealtimeSTTService` to accept the
240245 `include_language_detection` parameter to detect language.
241- ```python
242- stt = ElevenLabsRealtimeSTTService(
243- api_key=os.getenv("ELEVENLABS_API_KEY"),
244- include_language_detection=True
245- )
246- ```
246+
247+ ```python
248+ stt = ElevenLabsRealtimeSTTService(
249+     api_key=os.getenv("ELEVENLABS_API_KEY"),
250+     include_language_detection=True
251+ )
252+ ```
253+
247254 (PR [#3216](https://github.com/pipecat-ai/pipecat/pull/3216))
248255
249256- Updated `SpeechmaticsSTTService` to use the new Python Voice SDK, which
250257 improves VAD and Smart Turn capabilities and brings dramatic latency
251258 improvements without any impact on accuracy. Use the `turn_detection_mode`
252259 parameter to control speech endpointing, with `TurnDetectionMode.EXTERNAL`
253260 (default), `TurnDetectionMode.ADAPTIVE`, or `TurnDetectionMode.SMART_TURN`.
254- ```python
261+
262+ ```python
255263 stt = SpeechmaticsSTTService(
256264     api_key=os.getenv("SPEECHMATICS_API_KEY"),
257265     params=SpeechmaticsSTTService.InputParams(
258266         language=Language.EN,
259- turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.ADAPTIVE,
267+         turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.ADAPTIVE,
260268         speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
261269 ),
262270 )
263- ```
271+ ```
272+
264273 (PR [#3225](https://github.com/pipecat-ai/pipecat/pull/3225))
265274
266275- `daily-python` updated to 0.23.0.
@@ -273,10 +282,15 @@ turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.ADAPTIVE,
273282
274283- Updates to Inworld TTS services:
275284
276- - Improved `InworldTTSService`'s websocket implementation to better flush
277- and close context to better handle long inputs.
278- - Improved docstrings for `InworldTTSService` and `InworldHttpTTSService`.
279- (PR [#3288](https://github.com/pipecat-ai/pipecat/pull/3288))
285+ - Improved `InworldTTSService`'s websocket implementation to better flush
286+   and close context when handling long inputs.
287+ - Improved docstrings for `InworldTTSService` and `InworldHttpTTSService`.
288+   (PR [#3288](https://github.com/pipecat-ai/pipecat/pull/3288))
289+
290+ - Improved the error handling and reconnection logic for `WebsocketServer` by
291+   distinguishing between disconnection errors and websocket communication
292+   errors.
293+   (PR [#3392](https://github.com/pipecat-ai/pipecat/pull/3392))
280294
281295- Updated `DeepgramSTTService` to push user started/stopped speaking and
282296 interruption frames when `vad_enabled` is set to true. This centralizes the
@@ -308,7 +322,8 @@ turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.ADAPTIVE,
308322- Smart Turn now takes into account `vad_start_seconds` when buffering audio,
309323 meaning that the start of the turn audio is not cut off. This improves
310324 accuracy for short utterances.
311- - The default value of `pre_speech_ms` is now set to 500ms for Smart Turn.
325+
326+ - The default value of `pre_speech_ms` is now set to 500ms for Smart Turn.
312327 (PR [#3377](https://github.com/pipecat-ai/pipecat/pull/3377))
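Buffering `pre_speech_ms` of audio amounts to keeping a bounded window of the most recent chunks so the start of a turn is not cut off; the `PreSpeechBuffer` below is an illustrative sketch, not Pipecat's implementation:

```python
from collections import deque

class PreSpeechBuffer:
    """Retain roughly the last pre_speech_ms of audio (illustrative sketch)."""

    def __init__(self, pre_speech_ms: int = 500, sample_rate: int = 16000,
                 bytes_per_sample: int = 2):
        self._max_bytes = sample_rate * bytes_per_sample * pre_speech_ms // 1000
        self._chunks = deque()
        self._size = 0

    def append(self, chunk: bytes) -> None:
        self._chunks.append(chunk)
        self._size += len(chunk)
        # Drop the oldest chunks once the buffer exceeds the window.
        while self._size > self._max_bytes:
            self._size -= len(self._chunks.popleft())

    def audio(self) -> bytes:
        return b"".join(self._chunks)

# 500 ms at 16 kHz, 16-bit mono = 16000 bytes retained at most.
buf = PreSpeechBuffer(pre_speech_ms=500, sample_rate=16000, bytes_per_sample=2)
for _ in range(20):
    buf.append(b"\x00" * 1000)  # 20000 bytes pushed in total
```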
313328
314329- Improved Krisp SDK management to allow `KrispVivaTurn` and `KrispVivaFilter`
@@ -376,17 +391,18 @@ turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.ADAPTIVE,
376391 From the developer's point of view, switching to using `LLMContext`
377392 machinery will usually be a matter of going from this:
378393
379- ```python
380- context = OpenAILLMContext(messages, tools)
381- context_aggregator = llm.create_context_aggregator(context)
382- ```
394+ ```python
395+ context = OpenAILLMContext(messages, tools)
396+ context_aggregator = llm.create_context_aggregator(context)
397+ ```
383398
384- To this:
399+ To this:
400+
401+ ```python
402+ context = LLMContext(messages, tools)
403+ context_aggregator = LLMContextAggregatorPair(context)
404+ ```
385405
386- ```
387- context = LLMContext(messages, tools)
388- context_aggregator = LLMContextAggregatorPair(context)
389- ```
390406 (PR [#3263](https://github.com/pipecat-ai/pipecat/pull/3263))
391407
392408- `STTMuteFilter` is deprecated and will be removed in a future version. Use
@@ -401,16 +417,17 @@ turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.ADAPTIVE,
401417 `LLMUserAggregator`'s new parameter `user_turn_strategies` instead. For
402418 example, to disable interruptions but still get user turns you can do:
403419
404- ```python
405- context_aggregator = LLMContextAggregatorPair(
406- context,
407- user_params=LLMUserAggregatorParams(
408- user_turn_strategies=UserTurnStrategies(
409- start=[TranscriptionUserTurnStartStrategy(enable_interruptions=False)],
410- ),
411- ),
412- )
413- ```
420+ ```python
421+ context_aggregator = LLMContextAggregatorPair(
422+     context,
423+     user_params=LLMUserAggregatorParams(
424+         user_turn_strategies=UserTurnStrategies(
425+             start=[TranscriptionUserTurnStartStrategy(enable_interruptions=False)],
426+         ),
427+     ),
428+ )
429+ ```
430+
414431 (PR [#3297](https://github.com/pipecat-ai/pipecat/pull/3297))
415432
416433- `TranscriptProcessor` and related data classes and frames
@@ -433,7 +450,8 @@ start=[TranscriptionUserTurnStartStrategy(enable_interruptions=False)],
433450### Fixed
434451
435452- Improved error handling in `ElevenLabsRealtimeSTTService`
436- - Fixed an issue in `ElevenLabsRealtimeSTTService` causing an infinite loop
453+
454+ - Fixed an issue in `ElevenLabsRealtimeSTTService` causing an infinite loop
437455 that blocks the process if the websocket disconnects due to an error
438456 (PR [#3233](https://github.com/pipecat-ai/pipecat/pull/3233))
439457
@@ -446,13 +464,14 @@ start=[TranscriptionUserTurnStartStrategy(enable_interruptions=False)],
446464 (PR [#3322](https://github.com/pipecat-ai/pipecat/pull/3322))
447465
448466- Updated `SpeechmaticsSTTService` for version `0.0.99+`:
449- - Fixed `SpeechmaticsSTTService` to listen for `VADUserStoppedSpeakingFrame`
450- in order to finalize transcription.
451- - Default to `TurnDetectionMode.FIXED` for Pipecat-controlled end of turn
452- detection.
453- - Only emit VAD + interruption frames if VAD is enabled within the plugin
454- (modes other than `TurnDetectionMode.FIXED` or `TurnDetectionMode.EXTERNAL`).
455- (PR [#3328](https://github.com/pipecat-ai/pipecat/pull/3328))
467+
468+ - Fixed `SpeechmaticsSTTService` to listen for `VADUserStoppedSpeakingFrame`
469+   in order to finalize transcription.
470+ - Default to `TurnDetectionMode.FIXED` for Pipecat-controlled end of turn
471+   detection.
472+ - Only emit VAD + interruption frames if VAD is enabled within the plugin
473+   (modes other than `TurnDetectionMode.FIXED` or `TurnDetectionMode.EXTERNAL`).
474+   (PR [#3328](https://github.com/pipecat-ai/pipecat/pull/3328))
456475
457476- Fixed an issue with function calling where a handler failing to invoke its
458477 result callback could leave the context stuck in IN_PROGRESS, causing LLM