Add Inworld Realtime Service#4140
Conversation
c1911c0 to
3ab7e18
Compare
Codecov Report❌ Patch coverage is
... and 5 files with indirect coverage changes 🚀 New features to boost your workflow:
|
03d1054 to
3960c79
Compare
|
Hi @cshape, when trying to run the example Could you double check whether everything is working as expected ? |
filipi87
left a comment
There was a problem hiding this comment.
Thanks for pushing this PR, @cshape.
Overall, it looks good. I’ve left a few comments that we should address.
I noticed that we’re reusing a lot of code from OpenAIRealtime. But the same pattern exists with XAI (Grok), so once this lands, I’ll work on a broader refactor to introduce a shared base class that all three can extend.
For now, it’s probably best to handle that in a follow-up so we don’t block this PR.
Let me know once you’ve addressed the comments so I can review it again 👍.
Adds a WebSocket-based realtime service for Inworld's cascade STT/LLM/TTS API with semantic VAD, function calling, and streaming transcription support. New files: - src/pipecat/services/inworld/realtime/ (service, events) - src/pipecat/adapters/services/inworld_realtime_adapter.py - examples/foundational/19zb-inworld-realtime.py Also includes: - websockets dependency for inworld extra in pyproject.toml - Adapter and settings tests matching OpenAI/Grok realtime patterns - Fix for double-response when server-side VAD is enabled
Adopt _resolve_system_instruction() from BaseLLMAdapter, matching the pattern applied to OpenAI Realtime, Grok Realtime, Gemini Live, and Nova Sonic in the pk/realtime-services-init-v-context-system-instructions-cleanup branch.
- Change default model from gpt-4.1-nano to gpt-4.1-mini - Add function calling demo to example - Remove demo-testing artifact from system instruction - Mention Router support in changelog
- Move example to examples/realtime/realtime-inworld.py - Change initial context role from "user" to "developer" - Remove explicit sample rates from example; sync them in _ensure_audio_config so Inworld gets the transport's actual rates - Add audio race condition guard in _handle_evt_audio_delta (matches OpenAI realtime pattern) - Convert remaining "system"/"developer" messages to "user" in adapter - Add clarifying comment for local-VAD vs server-VAD metrics paths
- Remove function calling from example, switch model to xai/grok-4-1-fast-non-reasoning - Add pipecat-realtime session key prefix and provider_data metadata for Inworld traffic attribution - Remove local VAD code path (Inworld only supports server-side VAD) - Use typed InputAudioBufferAppendEvent for audio sends
3960c79 to
3f47a1e
Compare
- Remove non-functional AdapterType.SHIM custom tools code from adapter - Default STT model to assemblyai/u3-rt-pro - Default VAD eagerness to low
Add Inworld's Realtime Speech to Speech API as a Pipecat service.