Commit eab059c

Merge pull request pipecat-ai#3446 from pipecat-ai/mb/add-3392-changelog

Add PR 3392 to changelog, linting cleanup

2 parents a9bfb09 + 4aaff04, commit eab059c

1 file changed: CHANGELOG.md (134 additions & 115 deletions)
@@ -24,39 +24,40 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

A list of strategies can be specified for both start and stop; strategies are
evaluated in order until one evaluates to true.

Available user turn start strategies:

- VADUserTurnStartStrategy
- TranscriptionUserTurnStartStrategy
- MinWordsUserTurnStartStrategy
- ExternalUserTurnStartStrategy

Available user turn stop strategies:

- TranscriptionUserTurnStopStrategy
- TurnAnalyzerUserTurnStopStrategy
- ExternalUserTurnStopStrategy

The default strategies are:

- start: [VADUserTurnStartStrategy, TranscriptionUserTurnStartStrategy]
- stop: [TranscriptionUserTurnStopStrategy]

Turn strategies are configured when setting up `LLMContextAggregatorPair`.
For example:

```python
context_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        user_turn_strategies=UserTurnStrategies(
            stop=[
                TurnAnalyzerUserTurnStopStrategy(
                    turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams())
                )
            ],
        )
    ),
)
```

In order to use the user turn strategies you must update to the new
universal `LLMContext` and `LLMContextAggregatorPair`.
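The "evaluated in order until one evaluates to true" behavior can be sketched without Pipecat at all. The classes below are illustrative stand-ins, not the library's actual strategy interfaces:

```python
# Illustrative sketch of ordered turn-strategy evaluation; the classes
# here are stand-ins, not Pipecat's actual strategy implementations.
from dataclasses import dataclass


@dataclass
class TurnEvent:
    vad_active: bool = False
    transcription: str = ""


class VADStart:
    def should_start(self, event: TurnEvent) -> bool:
        return event.vad_active


class TranscriptionStart:
    def should_start(self, event: TurnEvent) -> bool:
        return bool(event.transcription)


def user_turn_started(strategies, event: TurnEvent) -> bool:
    # any() evaluates the strategies in order and stops at the first True.
    return any(s.should_start(event) for s in strategies)


# Mirrors the default start strategies: VAD first, then transcription.
strategies = [VADStart(), TranscriptionStart()]
assert not user_turn_started(strategies, TurnEvent())
assert user_turn_started(strategies, TurnEvent(transcription="hello"))
```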
@@ -69,13 +70,13 @@ turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams())

- Added `GrokRealtimeLLMService` for xAI's Grok Voice Agent API with real-time
  voice conversations:

  - Support for real-time audio streaming with WebSocket connection
  - Built-in server-side VAD (Voice Activity Detection)
  - Multiple voice options: Ara, Rex, Sal, Eve, Leo
  - Built-in tools support: web_search, x_search, file_search
  - Custom function calling with standard Pipecat tools schema
  - Configurable audio formats (PCM at 8kHz-48kHz)

  (PR [#3267](https://github.com/pipecat-ai/pipecat/pull/3267))

- Added an approximation of TTFB for Ultravox.
  (PR [#3268](https://github.com/pipecat-ai/pipecat/pull/3268))
@@ -86,11 +87,12 @@ turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams())

  (PR [#3289](https://github.com/pipecat-ai/pipecat/pull/3289))

- `LLMUserAggregator` now exposes the following events:

  - `on_user_turn_started`: triggered when a user turn starts
  - `on_user_turn_stopped`: triggered when a user turn ends
  - `on_user_turn_stop_timeout`: triggered when a user turn does not stop
    and times out

  (PR [#3291](https://github.com/pipecat-ai/pipecat/pull/3291))
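The events above follow Pipecat's usual decorator-based handler registration. As a rough, library-free sketch of that pattern (this `EventEmitter` class is illustrative, not `LLMUserAggregator`'s actual implementation):

```python
# Library-free sketch of decorator-based event registration;
# illustrative only, not Pipecat's actual aggregator code.
import asyncio


class EventEmitter:
    def __init__(self, *event_names):
        self._handlers = {name: [] for name in event_names}

    def event_handler(self, name):
        # Decorator that registers a coroutine for the named event.
        def register(func):
            self._handlers[name].append(func)
            return func
        return register

    async def emit(self, name, *args):
        for handler in self._handlers[name]:
            await handler(self, *args)


aggregator = EventEmitter(
    "on_user_turn_started", "on_user_turn_stopped", "on_user_turn_stop_timeout"
)

log = []


@aggregator.event_handler("on_user_turn_started")
async def on_started(agg):
    log.append("started")


@aggregator.event_handler("on_user_turn_stopped")
async def on_stopped(agg):
    log.append("stopped")


asyncio.run(aggregator.emit("on_user_turn_started"))
asyncio.run(aggregator.emit("on_user_turn_stopped"))
```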

- Introducing user mute strategies. User mute strategies indicate when user
  input should be muted based on the current system state.
@@ -104,29 +106,29 @@ turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams())

frame is muted if any of the configured strategies indicates it should be
muted.

Available user mute strategies:

- `FirstSpeechUserMuteStrategy`
- `MuteUntilFirstBotCompleteUserMuteStrategy`
- `AlwaysUserMuteStrategy`
- `FunctionCallUserMuteStrategy`

User mute strategies replace the legacy `STTMuteFilter` and provide a more
flexible and composable approach to muting user input.

User mute strategies are configured when setting up the
`LLMContextAggregatorPair`. For example:

```python
context_aggregator = LLMContextAggregatorPair(
    context,
    user_params=LLMUserAggregatorParams(
        user_mute_strategies=[
            FirstSpeechUserMuteStrategy(),
        ]
    ),
)
```

In order to use user mute strategies you should update to the new universal
`LLMContext` and `LLMContextAggregatorPair`.
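The "muted if any configured strategy indicates it should be" rule amounts to an `any()` over the strategies. A minimal sketch, with class names that are illustrative stand-ins rather than Pipecat's actual interfaces:

```python
# Minimal stand-in for the "mute if any strategy applies" rule;
# illustrative only, not Pipecat's actual strategy interface.
from dataclasses import dataclass


@dataclass
class SystemState:
    bot_has_spoken: bool = False
    function_call_in_progress: bool = False


class FunctionCallMute:
    # Mute while a function call is running.
    def should_mute(self, state: SystemState) -> bool:
        return state.function_call_in_progress


class MuteUntilFirstBotComplete:
    # Mute until the bot has finished its first turn.
    def should_mute(self, state: SystemState) -> bool:
        return not state.bot_has_spoken


def is_user_muted(strategies, state: SystemState) -> bool:
    # The incoming user frame is muted if any strategy says so.
    return any(s.should_mute(state) for s in strategies)


strategies = [FunctionCallMute(), MuteUntilFirstBotComplete()]
assert is_user_muted(strategies, SystemState())  # bot hasn't spoken yet
assert not is_user_muted(strategies, SystemState(bot_has_spoken=True))
```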
@@ -159,16 +161,17 @@ turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams())

  (PR [#3357](https://github.com/pipecat-ai/pipecat/pull/3357))

- Added image support to `OpenAIRealtimeLLMService` via `InputImageRawFrame`:

  - New `start_video_paused` parameter to control initial video input state
  - New `video_frame_detail` parameter to set image processing quality
    ("auto", "low", or "high"). This corresponds to OpenAI Realtime's
    `image_detail` parameter.
  - `set_video_input_paused()` method to pause/resume video input at runtime
  - `set_video_frame_detail()` method to adjust video frame quality
    dynamically
  - Automatic rate limiting (1 frame per second) to prevent API overload

  (PR [#3360](https://github.com/pipecat-ai/pipecat/pull/3360))

- Added `UserTurnProcessor`, a frame processor built on `UserTurnController`
  that pushes `UserStartedSpeakingFrame` and `UserStoppedSpeakingFrame` frames
@@ -188,11 +191,12 @@ turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams())

  (PR [#3374](https://github.com/pipecat-ai/pipecat/pull/3374))

- `LLMAssistantAggregator` now exposes the following events:

  - `on_assistant_turn_started`: triggered when the assistant turn starts
  - `on_assistant_turn_stopped`: triggered when the assistant turn ends
  - `on_assistant_thought`: triggered when there's an assistant thought
    available

  (PR [#3385](https://github.com/pipecat-ai/pipecat/pull/3385))

- Added `KrispVivaTurn` analyzer for end of turn detection using the Krisp VIVA
  SDK (requires `krisp_audio`).
@@ -202,13 +206,14 @@ turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams())

  register custom pipeline task setup files by setting the
  `PIPECAT_SETUP_FILES` environment variable. This variable should contain a
  colon-separated list of Python files (e.g.
  `export PIPECAT_SETUP_FILES="setup1.py:setup.py:..."`). Each file must define
  a function with the following signature:

  ```python
  async def setup_pipeline_task(task: PipelineTask):
      ...
  ```

  (PR [#3397](https://github.com/pipecat-ai/pipecat/pull/3397))
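The colon-separated convention above mirrors PATH-style environment variables, so consuming it is a one-liner. A small sketch (the parsing shown is an assumption about the format, not Pipecat's actual loader code):

```python
# Sketch of splitting a PATH-style, colon-separated setup-file list.
# Illustrative only; not Pipecat's actual loader.
import os

os.environ["PIPECAT_SETUP_FILES"] = "setup1.py:setup2.py"

setup_files = [
    path
    for path in os.environ.get("PIPECAT_SETUP_FILES", "").split(":")
    if path  # ignore empty entries from stray colons
]

print(setup_files)  # ['setup1.py', 'setup2.py']
```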

- Added a keepalive task for `InworldTTSService` to keep the service connected
@@ -238,29 +243,33 @@ turn_analyzer=LocalSmartTurnAnalyzerV3(params=SmartTurnParams())

- Updated `ElevenLabsRealtimeSTTService` to accept the
  `include_language_detection` parameter to detect language.

  ```python
  stt = ElevenLabsRealtimeSTTService(
      api_key=os.getenv("ELEVENLABS_API_KEY"),
      include_language_detection=True,
  )
  ```

  (PR [#3216](https://github.com/pipecat-ai/pipecat/pull/3216))

- Updated `SpeechmaticsSTTService` to use the new Python Voice SDK, which
  improves VAD and Smart Turn capabilities and brings dramatic improvements to
  latency without any impact on accuracy. Use the `turn_detection_mode`
  parameter to control the endpointing of speech, with
  `TurnDetectionMode.EXTERNAL` (default), `TurnDetectionMode.ADAPTIVE`, or
  `TurnDetectionMode.SMART_TURN`.

  ```python
  stt = SpeechmaticsSTTService(
      api_key=os.getenv("SPEECHMATICS_API_KEY"),
      params=SpeechmaticsSTTService.InputParams(
          language=Language.EN,
          turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.ADAPTIVE,
          speaker_active_format="<{speaker_id}>{text}</{speaker_id}>",
      ),
  )
  ```

  (PR [#3225](https://github.com/pipecat-ai/pipecat/pull/3225))

- `daily-python` updated to 0.23.0.
@@ -273,10 +282,15 @@ turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.ADAPTIVE,

- Updates to Inworld TTS services:

  - Improved `InworldTTSService`'s websocket implementation to flush and close
    contexts more reliably, better handling long inputs.
  - Improved docstrings for `InworldTTSService` and `InworldHttpTTSService`.

  (PR [#3288](https://github.com/pipecat-ai/pipecat/pull/3288))

- Improved the error handling and reconnection logic for `WebsocketServer` by
  distinguishing between errors raised while disconnecting and websocket
  communication errors.
  (PR [#3392](https://github.com/pipecat-ai/pipecat/pull/3392))

- Updated `DeepgramSTTService` to push user started/stopped speaking and
  interruption frames when `vad_enabled` is set to true. This centralizes the
@@ -308,7 +322,8 @@ turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.ADAPTIVE,

- Smart Turn now takes into account `vad_start_seconds` when buffering audio,
  meaning that the start of the turn audio is not cut off. This improves
  accuracy for short utterances.

- The default value of `pre_speech_ms` is now set to 500ms for Smart Turn.
  (PR [#3377](https://github.com/pipecat-ai/pipecat/pull/3377))

- Improved Krisp SDK management to allow `KrispVivaTurn` and `KrispVivaFilter`
@@ -376,17 +391,18 @@ turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.ADAPTIVE,

  From the developer's point of view, switching to using `LLMContext`
  machinery will usually be a matter of going from this:

  ```python
  context = OpenAILLMContext(messages, tools)
  context_aggregator = llm.create_context_aggregator(context)
  ```

  To this:

  ```python
  context = LLMContext(messages, tools)
  context_aggregator = LLMContextAggregatorPair(context)
  ```

  (PR [#3263](https://github.com/pipecat-ai/pipecat/pull/3263))

- `STTMuteFilter` is deprecated and will be removed in a future version. Use
@@ -401,16 +417,17 @@ turn_detection_mode=SpeechmaticsSTTService.TurnDetectionMode.ADAPTIVE,

  `LLMUserAggregator`'s new parameter `user_turn_strategies` instead. For
  example, to disable interruptions but still get user turns you can do:

  ```python
  context_aggregator = LLMContextAggregatorPair(
      context,
      user_params=LLMUserAggregatorParams(
          user_turn_strategies=UserTurnStrategies(
              start=[TranscriptionUserTurnStartStrategy(enable_interruptions=False)],
          ),
      ),
  )
  ```

  (PR [#3297](https://github.com/pipecat-ai/pipecat/pull/3297))

- `TranscriptProcessor` and related data classes and frames
@@ -433,7 +450,8 @@ start=[TranscriptionUserTurnStartStrategy(enable_interruptions=False)],

### Fixed

- Improved error handling in `ElevenLabsRealtimeSTTService`

- Fixed an issue in `ElevenLabsRealtimeSTTService` causing an infinite loop
  that blocks the process if the websocket disconnects due to an error
  (PR [#3233](https://github.com/pipecat-ai/pipecat/pull/3233))

@@ -446,13 +464,14 @@ start=[TranscriptionUserTurnStartStrategy(enable_interruptions=False)],
446464
(PR [#3322](https://github.com/pipecat-ai/pipecat/pull/3322))
447465

448466
- Updated `SpeechmaticsSTTService` for version `0.0.99+`:
449-
- Fixed `SpeechmaticsSTTService` to listen for `VADUserStoppedSpeakingFrame`
450-
in order to finalize transcription.
451-
- Default to `TurnDetectionMode.FIXED` for Pipecat-controlled end of turn
452-
detection.
453-
- Only emit VAD + interruption frames if VAD is enabled within the plugin
454-
(modes other than `TurnDetectionMode.FIXED` or `TurnDetectionMode.EXTERNAL`).
455-
(PR [#3328](https://github.com/pipecat-ai/pipecat/pull/3328))
467+
468+
- Fixed `SpeechmaticsSTTService` to listen for `VADUserStoppedSpeakingFrame`
469+
in order to finalize transcription.
470+
- Default to `TurnDetectionMode.FIXED` for Pipecat-controlled end of turn
471+
detection.
472+
- Only emit VAD + interruption frames if VAD is enabled within the plugin
473+
(modes other than `TurnDetectionMode.FIXED` or `TurnDetectionMode.EXTERNAL`).
474+
(PR [#3328](https://github.com/pipecat-ai/pipecat/pull/3328))
456475

457476
- Fixed an issue with function calling where a handler failing to invoke its
458477
result callback could leave the context stuck in IN_PROGRESS, causing LLM
