Add latency breakdown to UserBotLatencyObserver #3885
Merged

markbackman merged 8 commits into main on Mar 3, 2026
Conversation
The ServiceSettings refactor (PR #3714) changed `self._settings` from dicts to dataclass subclasses, but the tracing code still used `.items()`, `in` containment checks, and subscript access, causing an `AttributeError` on every traced call. Use `given_fields()` for iteration and attribute access for named fields.
Add per-service latency breakdown metrics alongside the existing user-to-bot latency measurement. When `enable_metrics=True`, the observer now emits an `on_latency_breakdown` event with TTFB, text aggregation, and user turn duration metrics collected between `VADUserStoppedSpeakingFrame` and `BotStartedSpeakingFrame`.

- Add `LatencyBreakdown` dataclass with `ttfb`, `text_aggregation`, and `user_turn_secs` fields
- Accumulate `MetricsFrame` data during user→bot cycles
- Reset accumulators on `InterruptionFrame` to discard stale metrics
- Measure `user_turn_secs` from actual user silence (VAD timestamp - `stop_secs`) to turn release (`UserStoppedSpeakingFrame`)
- Filter zero-value TTFB entries from startup metric resets
- Add frame deduplication using a bounded deque + set pattern
- Update example 29 with latency breakdown display
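The bounded deque + set deduplication pattern mentioned above can be sketched as follows. The class and method names are illustrative, not the observer's actual internals: the set gives O(1) membership checks, and the bounded deque evicts the oldest id so memory stays constant.

```python
from collections import deque


# Minimal sketch of the bounded deque + set deduplication pattern;
# names are illustrative, not the actual observer internals.
class FrameDeduper:
    def __init__(self, maxlen: int = 128):
        self._order = deque(maxlen=maxlen)  # bounded FIFO of frame ids
        self._seen = set()                  # O(1) membership checks

    def is_new(self, frame_id: int) -> bool:
        if frame_id in self._seen:
            return False
        if len(self._order) == self._order.maxlen:
            # Drop the oldest id from the set; the deque evicts it on append.
            self._seen.discard(self._order[0])
        self._order.append(frame_id)
        self._seen.add(frame_id)
        return True


dedupe = FrameDeduper(maxlen=3)
print([dedupe.is_new(i) for i in (1, 2, 1, 3, 4, 1)])
```

Note that once an id is evicted from the bounded window it is treated as new again; the window only needs to be large enough to cover plausible duplicate spacing.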
Measure the time from `ClientConnectedFrame` to the first `BotStartedSpeakingFrame`, emitting a one-time `on_first_bot_speech_latency` event with a breakdown.
Enables `.model_dump()` serialization for Pipecat Cloud collection. All metrics now include `start_time` (Unix timestamp) for timeline plotting alongside `duration_secs`.
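A metric entry with these fields might look like the following sketch, assuming pydantic v2 (the `start_time` and `duration_secs` field names come from this PR; the `TTFBMetric` class name and `processor` field are illustrative):

```python
import time

from pydantic import BaseModel


# Sketch of a serializable metric entry; field names start_time and
# duration_secs are from the PR description, the rest is illustrative.
class TTFBMetric(BaseModel):
    processor: str         # which service produced this measurement
    start_time: float      # Unix timestamp, for timeline plotting
    duration_secs: float   # how long this stage took

metric = TTFBMetric(
    processor="OpenAILLMService#0",
    start_time=time.time(),
    duration_secs=0.412,
)

# .model_dump() yields a plain dict, ready for JSON export / cloud collection.
payload = metric.model_dump()
print(sorted(payload))
```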
aconchillo reviewed on Mar 2, 2026
```python
if breakdown.text_aggregation:
    ta = breakdown.text_aggregation
    logger.info(f" {ta.processor}: text aggregation {ta.duration_secs:.3f}s")
```
Contributor
Since this might be common, I'm wondering if we could have a function we could call that returns us a chronological list of strings. Like:

```python
events = observer.chronological_events()
```

So people could just iterate and print.
Contributor
Author
Ok, pushing a commit with this function.
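For illustration, such a helper might look roughly like the sketch below. This is not the actual commit: the class name and the internal `(timestamp, description)` event storage are assumptions.

```python
# Sketch only -- the real observer's event storage and method names may differ.
class LatencyObserverSketch:
    def __init__(self):
        self._events: list[tuple[float, str]] = []

    def _record(self, ts: float, description: str):
        """Store one timed event (called internally as metrics arrive)."""
        self._events.append((ts, description))

    def chronological_events(self) -> list[str]:
        """Return human-readable event strings sorted by timestamp."""
        return [f"{ts:.3f}s {desc}" for ts, desc in sorted(self._events)]


obs = LatencyObserverSketch()
obs._record(0.412, "OpenAILLMService#0: TTFB")
obs._record(0.101, "user turn released")
for line in obs.chronological_events():
    print(line)
```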
aconchillo
approved these changes
Mar 2, 2026
Contributor
Approving! This is great. Just added a comment to simplify examples/apps a bit.
Context
The goal is to measure the user-to-bot latency and to display the contributing factors to that latency number. The possible contributors are:
The sequence of events looks something like this:

We are breaking the measurement down into the following categories:

- `FunctionCallInProgressFrame` to `FunctionCallResultFrame`, making previously invisible API call time visible in the waterfall.

Note: we do not provide a measurement for the turn result because it does not have a direct impact on response time. The result itself is used by the `LLMUserAggregator` logic to determine how to respond, when the user stop strategy includes a turn analyzer. The turn result latency is available in a `MetricsFrame` for those who want access, perhaps for troubleshooting or debugging.
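The in-progress→result timing can be sketched as a small stopwatch keyed by call id. The class and method names here are hypothetical stand-ins, not Pipecat APIs; in the real observer the timestamps would come from the `FunctionCallInProgressFrame` / `FunctionCallResultFrame` events.

```python
# Illustrative sketch of timing a function call between an "in progress"
# event and its matching "result" event, keyed by call id.
class FunctionCallTimer:
    def __init__(self):
        self._started: dict[str, float] = {}   # call id -> start timestamp
        self.durations: dict[str, float] = {}  # call id -> elapsed seconds

    def on_in_progress(self, call_id: str, now: float):
        self._started[call_id] = now

    def on_result(self, call_id: str, now: float):
        # Ignore results with no matching start (e.g. after an interruption).
        start = self._started.pop(call_id, None)
        if start is not None:
            self.durations[call_id] = now - start


timer = FunctionCallTimer()
timer.on_in_progress("call-1", now=10.0)
timer.on_result("call-1", now=10.75)
print(timer.durations)
```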
Summary
- Add `on_latency_breakdown` event to `UserBotLatencyObserver` with a `LatencyBreakdown` model providing per-service TTFB, text aggregation, user turn duration, and function call latency metrics
- Add `on_first_bot_speech_latency` event measuring time from `ClientConnectedFrame` to first `BotStartedSpeakingFrame`, with a latency breakdown including per-service metrics. Skipped when the user speaks first (only meaningful for greetings).
- Add `FunctionCallMetrics` tracking to `LatencyBreakdown`, measuring the execution time of each function call via `FunctionCallInProgressFrame` to `FunctionCallResultFrame`
- Measure `user_turn_secs` from actual user silence (VAD timestamp minus `stop_secs`) to turn release, capturing VAD silence detection, STT finalization, and turn analyzer wait time
- Reset accumulators on `InterruptionFrame` to discard stale metrics from cancelled LLM/TTS cycles
- All metrics extend `BaseModel` with `start_time` (Unix timestamp) and `duration_secs` fields, enabling `.model_dump()` serialization and timeline plotting

Testing
🤖 Generated with Claude Code