
feat: handle exceptions for BaseOpenAILLMService #3529

Merged
markbackman merged 2 commits into pipecat-ai:main from lukepayyapilli:fix/llm-timeout-without-retry on Jan 29, 2026

Conversation

@lukepayyapilli lukepayyapilli commented Jan 22, 2026

Summary

Allows retry_timeout_secs to cause a failure (not just a retry) when retry_on_timeout=False, enabling integration with LLMSwitcher to handle slow responses.

Closes #3481

Changes

  • When retry_timeout_secs is explicitly set, the timeout now applies regardless of retry_on_timeout.
  • When timeout occurs and retry_on_timeout=False, the TimeoutError propagates (enabling LLMSwitcher).
  • Backwards compatible: default behavior unchanged.

Behavior Matrix

retry_timeout_secs      retry_on_timeout    Behavior
None (default)          False (default)     No timeout
None                    True                5.0 s timeout + retry
Explicit (e.g., 5.0)    False               Timeout + fail (NEW)
Explicit (e.g., 5.0)    True                Timeout + retry

Usage

# For LLMSwitcher integration - fail on slow responses
llm = OpenAILLMService(
    model="gpt-4",
    retry_timeout_secs=2.0,  # Fail after 2 seconds
    retry_on_timeout=False,   # Don't retry, let LLMSwitcher handle it
)
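The behavior matrix can be illustrated with a small standalone helper. This is a hypothetical sketch of the described semantics (run_with_timeout, default_timeout, and max_retries are illustrative names, not pipecat's actual implementation):

```python
import asyncio


async def run_with_timeout(completion, retry_timeout_secs=None,
                           retry_on_timeout=False, default_timeout=5.0,
                           max_retries=1):
    """Sketch of the behavior matrix: apply a timeout whenever one is
    requested, then either retry on expiry or propagate the failure."""
    timeout = retry_timeout_secs
    if timeout is None and retry_on_timeout:
        timeout = default_timeout  # retry_on_timeout alone implies the 5.0 s default
    if timeout is None:
        return await completion()  # row 1: no timeout applied at all

    for attempt in range(max_retries + 1):
        try:
            return await asyncio.wait_for(completion(), timeout)
        except asyncio.TimeoutError:
            if not retry_on_timeout or attempt == max_retries:
                # Propagate the timeout so a wrapper such as LLMSwitcher
                # can fail over to another service.
                raise
```

With retry_on_timeout=False, a slow completion raises TimeoutError to the caller instead of being silently retried, which is the new failure path this PR enables.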

Design Considerations

The parameter name retry_timeout_secs is admittedly misleading since it now controls non-retry behavior too. I considered renaming to timeout_secs but opted for backwards compatibility. A future cleanup could:

  • Deprecate retry_timeout_secs in favor of clearer timeout_secs.
  • Simplify the parameter interaction.


codecov Bot commented Jan 22, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

Files with missing lines Coverage Δ
src/pipecat/services/openai/base_llm.py 47.33% <100.00%> (+9.38%) ⬆️

... and 11 files with indirect coverage changes



@markbackman markbackman left a comment


Good catch!

It sounds like you want an exception raised (or in Pipecat terms, an ErrorFrame pushed) when an exception is caught as part of a standard completion. That makes sense. I don't think I'd integrate it with the retry system though.

We could just update the process_frame() method to handle the exception:

    async def process_frame(self, frame: Frame, direction: FrameDirection):
        """Process frames for LLM completion requests.

        Handles OpenAILLMContextFrame, LLMContextFrame, LLMMessagesFrame,
        and LLMUpdateSettingsFrame to trigger LLM completions and manage
        settings.

        Args:
            frame: The frame to process.
            direction: The direction of frame processing.
        """
        await super().process_frame(frame, direction)

        context = None
        if isinstance(frame, OpenAILLMContextFrame):
            # Handle OpenAI-specific context frames
            context = frame.context
        elif isinstance(frame, LLMContextFrame):
            # Handle universal (LLM-agnostic) LLM context frames
            context = frame.context
        elif isinstance(frame, LLMMessagesFrame):
            # NOTE: LLMMessagesFrame is deprecated, so we don't support the newer universal
            # LLMContext with it
            context = OpenAILLMContext.from_messages(frame.messages)
        elif isinstance(frame, LLMUpdateSettingsFrame):
            await self._update_settings(frame.settings)
        else:
            await self.push_frame(frame, direction)

        if context:
            try:
                await self.push_frame(LLMFullResponseStartFrame())
                await self.start_processing_metrics()
                await self._process_context(context)
            except httpx.TimeoutException:
                await self._call_event_handler("on_completion_timeout")
            except Exception as e:
                await self.push_error(error_msg=f"Error during completion: {e}", exception=e)
            finally:
                await self.stop_processing_metrics()
                await self.push_frame(LLMFullResponseEndFrame())

The new code being:

            except Exception as e:
                await self.push_error(error_msg=f"Error during completion: {e}", exception=e)

That is, an exception would be raised as a result of the completion failure. That would be caught in process_frame, emitting an ErrorFrame which will allow application code to catch the error. GoogleLLMService follows a similar pattern.

WDYT?

@lukepayyapilli force-pushed the fix/llm-timeout-without-retry branch from 7038c0d to 61ba2d1 on January 28, 2026 14:42
@lukepayyapilli force-pushed the fix/llm-timeout-without-retry branch from 61ba2d1 to ff0eb6d on January 28, 2026 14:44

@lukepayyapilli lukepayyapilli left a comment


@markbackman Thanks for the review and feedback! Your suggested approach of catching exceptions in process_frame() and pushing an ErrorFrame is a better fix than my original proposal.

I've intentionally kept the scope narrow to address the original issue (#3481) which specifically requested timeout handling for LLMSwitcher failover. The change only catches httpx.TimeoutException and emits an error for that case. Happy to broaden it to handle other exceptions (e.g., API errors, rate limits) if you'd like.

Also added tests to verify:

  • ErrorFrame is pushed with correct message on timeout.
  • LLMFullResponseEndFrame is still pushed in the finally block.
  • on_completion_timeout handler is still called.
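The behavior those tests verify can be sketched with stand-in classes (hypothetical stubs, not pipecat's real frame types, httpx's exception, or the PR's actual test code): a timeout during context processing should invoke the timeout handler and push an error frame, while the end-of-response frame is still pushed from the finally block.

```python
# Stand-ins for pipecat's frames and for httpx.TimeoutException.
class ErrorFrame:
    def __init__(self, error):
        self.error = error

class LLMFullResponseEndFrame:
    pass

class FakeTimeout(Exception):
    pass


class StubLLMService:
    """Minimal mock of the process_frame error handling shown above."""

    def __init__(self, process_context):
        self._process_context = process_context
        self.pushed = []           # frames "pushed" downstream
        self.timeouts_handled = 0  # on_completion_timeout invocations

    async def run(self, context):
        try:
            await self._process_context(context)
        except FakeTimeout as e:
            # The real code would call the on_completion_timeout event
            # handler and push an ErrorFrame via push_error().
            self.timeouts_handled += 1
            self.pushed.append(ErrorFrame(f"Error during completion: {e}"))
        finally:
            # End frame is emitted even when the completion failed.
            self.pushed.append(LLMFullResponseEndFrame())
```

Driving the stub with a context processor that raises FakeTimeout lets a test assert on all three bullets above without a live LLM.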


@markbackman markbackman left a comment


Very nice! I like your idea about TimeoutException. Just one suggestion about broadening the handling for Exception.

Also, thanks for tests 🙏

Comment thread: src/pipecat/services/openai/base_llm.py

@lukepayyapilli lukepayyapilli left a comment


@markbackman - done! Added the catch-all Exception handler as suggested. Also added a test to verify the general exception handling works correctly. Please let me know if you'd like me to change anything else and thank you for the suggestions!


@markbackman markbackman left a comment


Perfect! 🙌

@markbackman changed the title from "feat: enable timeout-based failure for LLM services" to "feat: handle exceptions for BaseOpenAILLMService" on Jan 29, 2026
@markbackman markbackman merged commit b77a50d into pipecat-ai:main Jan 29, 2026
5 checks passed
@lukepayyapilli lukepayyapilli deleted the fix/llm-timeout-without-retry branch January 29, 2026 14:18


Development

Successfully merging this pull request may close these issues.

Allow passing timeout to LLM services (for FAILING, not for retry)
