Add latency breakdown to UserBotLatencyObserver#3885

Merged
markbackman merged 8 commits into main from mb/latency-breakdown
Mar 3, 2026
Conversation

@markbackman
Contributor

@markbackman markbackman commented Mar 1, 2026

Context

The goal is to measure the user to bot latency and to display the contributing factors to that latency number. The possible contributors are:

  • VAD stop_secs
  • STT latency
  • Turn detection
  • LLM latency
  • Pre-TTS text aggregation
  • TTS latency
  • Function call execution time

The sequence of events looks something like this:
(sequence diagram: user-bot-events)

We are breaking the measurement down into the following categories:

  • User turn secs: This measures from when the user actually stops speaking (VADStoppedSpeakingFrame - VADParams stop_secs) to when the UserStoppedSpeakingFrame occurs. There are two main contributors to this phase: STT latency and turn detection. When an STT latency is available, we display the corresponding TTFB value. We do not explicitly display the turn detection latency because it is difficult to characterize, but it is generally the main contributor to long user_turn_secs values.
  • TTFB latencies: STT, LLM, TTS. The LLM and TTS portions directly impact the response time. The STT portion is a function of more complex logic, as described above.
  • Text aggregation latency: This is the time to build a sentence from the LLM's streamed tokens. If sentence aggregation is in place, we measure it to help the developer understand its impact.
  • Function call latency: Time spent executing tool/function calls between LLM roundtrips. Measured from FunctionCallInProgressFrame to FunctionCallResultFrame, making previously invisible API call time visible in the waterfall.
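As a rough illustration of the user_turn_secs arithmetic in the first bullet, here is a sketch with hypothetical timestamps. The constant and function names are illustrative, not the observer's internals; only the stop_secs subtraction comes from the description above.

```python
# Sketch of the user_turn_secs measurement: the VAD event fires stop_secs
# after real silence began, so we subtract stop_secs to recover the true
# end of speech. Names and values here are illustrative.
VAD_STOP_SECS = 0.8  # hypothetical VADParams stop_secs


def user_turn_secs(vad_stopped_ts: float, user_stopped_ts: float,
                   stop_secs: float = VAD_STOP_SECS) -> float:
    """Seconds from when the user actually stopped speaking to turn release."""
    actual_silence_start = vad_stopped_ts - stop_secs
    return user_stopped_ts - actual_silence_start


# VAD fired at t=10.8 (real silence began at t=10.0); the
# UserStoppedSpeakingFrame arrived at t=11.3, so the user turn took ~1.3s.
turn = user_turn_secs(10.8, 11.3)
```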

Note: we do not provide a measurement for the turn result because it does not have a direct impact on response time. The result itself is used by the LLMUserAggregator logic to determine how to respond when the user stop strategy includes a turn analyzer. The turn result latency is available in a MetricsFrame for those who want access, perhaps for troubleshooting or debugging.

Summary

  • Added on_latency_breakdown event to UserBotLatencyObserver with a LatencyBreakdown model providing per-service TTFB, text aggregation, user turn duration, and function call latency metrics
  • Added on_first_bot_speech_latency event measuring time from ClientConnectedFrame to first BotStartedSpeakingFrame, with a latency breakdown including per-service metrics. Skipped when user speaks first (only meaningful for greetings).
  • Added FunctionCallMetrics tracking to LatencyBreakdown, measuring execution time of each function call via FunctionCallInProgressFrame to FunctionCallResultFrame
  • Measures user_turn_secs from actual user silence (VAD timestamp minus stop_secs) to turn release, capturing VAD silence detection, STT finalization, and turn analyzer wait time
  • Resets metric accumulators on InterruptionFrame to discard stale metrics from cancelled LLM/TTS cycles
  • All breakdown data models are Pydantic BaseModel with start_time (Unix timestamp) and duration_secs fields, enabling .model_dump() serialization and timeline plotting
  • Updated example 29 with waterfall-style latency breakdown display including function calls
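To make the "waterfall-style" display concrete, here is a minimal sketch of rendering breakdown entries on a shared timeline. The real entries are Pydantic BaseModels with start_time and duration_secs fields as described above; this sketch uses stdlib dataclasses as stand-ins, and the processor names and helper are illustrative, not example 29's actual code.

```python
from dataclasses import dataclass


# Stand-in for a breakdown entry; the real models are Pydantic BaseModels
# with the same start_time (Unix timestamp) and duration_secs fields.
@dataclass
class BreakdownEntry:
    processor: str
    start_time: float
    duration_secs: float


def waterfall_lines(entries: list[BreakdownEntry]) -> list[str]:
    """Render a simple text waterfall: offset from the earliest entry,
    sorted chronologically by start_time."""
    t0 = min(e.start_time for e in entries)
    return [
        f"{e.processor:<16} +{e.start_time - t0:6.3f}s  {e.duration_secs:.3f}s"
        for e in sorted(entries, key=lambda e: e.start_time)
    ]


lines = waterfall_lines([
    BreakdownEntry("TTSService", 100.9, 0.30),
    BreakdownEntry("STTService", 100.0, 0.25),
    BreakdownEntry("LLMService", 100.4, 0.45),
])
```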

Testing

```shell
uv run pytest tests/test_user_bot_latency_observer.py -v
```

🤖 Generated with Claude Code

markbackman added a commit that referenced this pull request Mar 1, 2026
@codecov

codecov Bot commented Mar 1, 2026

Codecov Report

❌ Patch coverage is 98.14815% with 2 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| src/pipecat/observers/user_bot_latency_observer.py | 98.14% | 2 Missing ⚠️ |

| Files with missing lines | Coverage Δ |
| --- | --- |
| src/pipecat/observers/user_bot_latency_observer.py | 97.67% <98.14%> (+5.67%) ⬆️ |

... and 15 files with indirect coverage changes


@markbackman markbackman force-pushed the mb/latency-breakdown branch 2 times, most recently from cf75eb3 to 776ca5c Compare March 1, 2026 12:55
markbackman added a commit that referenced this pull request Mar 1, 2026
@markbackman markbackman force-pushed the mb/latency-breakdown branch from 776ca5c to 858404e Compare March 1, 2026 14:14
@markbackman markbackman changed the base branch from main to mb/startup-observer March 1, 2026 14:14
@markbackman markbackman force-pushed the mb/latency-breakdown branch from 858404e to f6ed923 Compare March 1, 2026 15:01
@markbackman markbackman force-pushed the mb/startup-observer branch from c2e0e01 to 58d0bfc Compare March 2, 2026 15:53
@markbackman markbackman force-pushed the mb/latency-breakdown branch from de0ac63 to 8b41398 Compare March 2, 2026 16:03
@markbackman markbackman changed the base branch from mb/startup-observer to main March 2, 2026 16:22
@markbackman markbackman changed the base branch from main to mb/startup-observer March 2, 2026 16:22
@markbackman markbackman force-pushed the mb/startup-observer branch from 4270f12 to 75669b1 Compare March 2, 2026 18:10
Base automatically changed from mb/startup-observer to main March 2, 2026 21:07
The ServiceSettings refactor (PR #3714) changed self._settings from
dicts to dataclass subclasses, but tracing code still used .items(),
`in` containment checks, and subscript access, causing AttributeError
on every traced call. Use given_fields() for iteration and attribute
access for named fields.
Add per-service latency breakdown metrics alongside existing user-to-bot
latency measurement. When enable_metrics=True, the observer now emits an
on_latency_breakdown event with TTFB, text aggregation, and user turn
duration metrics collected between VADUserStoppedSpeakingFrame and
BotStartedSpeakingFrame.

- Add LatencyBreakdown dataclass with ttfb, text_aggregation,
  user_turn_secs fields
- Accumulate MetricsFrame data during user→bot cycles
- Reset accumulators on InterruptionFrame to discard stale metrics
- Measure user_turn_secs from actual user silence (VAD timestamp -
  stop_secs) to turn release (UserStoppedSpeakingFrame)
- Filter zero-value TTFB entries from startup metric resets
- Add frame deduplication using bounded deque + set pattern
- Update example 29 with latency breakdown display
Measure time from ClientConnectedFrame to first BotStartedSpeakingFrame,
emitting a one-time on_first_bot_speech_latency event with breakdown.

Enables .model_dump() serialization for Pipecat Cloud collection.

All metrics now include start_time (Unix timestamp) for timeline
plotting alongside duration_secs.
@markbackman markbackman force-pushed the mb/latency-breakdown branch from 8b41398 to ff5b985 Compare March 2, 2026 21:11

```python
if breakdown.text_aggregation:
    ta = breakdown.text_aggregation
    logger.info(f"  {ta.processor}: text aggregation {ta.duration_secs:.3f}s")
```
Contributor


Since this might be common, I'm wondering if we could have a function we could call that returns a chronological list of strings. Like:

```python
events = observer.chronological_events()
```

So people could just iterate and print.
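A minimal sketch of what such a helper could produce. Only the name chronological_events comes from this comment; the event tuples, formatting, and signature are assumed for illustration.

```python
# Sketch of the suggested helper: gather (timestamp, description) pairs
# from the collected metrics and return them as chronological strings
# that callers can simply iterate and print. Everything here is assumed.
def chronological_events(events: list[tuple[float, str]]) -> list[str]:
    return [f"{ts:10.3f}s  {desc}" for ts, desc in sorted(events)]


for line in chronological_events([
    (100.9, "TTSService TTFB 0.300s"),
    (100.0, "STTService TTFB 0.250s"),
]):
    print(line)
```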

Contributor Author


Good idea!

Contributor Author


Ok, pushing a commit with this function.

@aconchillo
Contributor

Approving! This is great. Just added a comment to simplify examples/apps a bit.

@markbackman markbackman merged commit aad1211 into main Mar 3, 2026
5 checks passed
