Commit 829c5f4

Merge pull request pipecat-ai#3169 from Incanta/hathora

Add Hathora STT and TTS services

2 parents: e69ccd8 + dc8ea61

9 files changed: 503 additions & 2 deletions

README.md

Lines changed: 2 additions & 2 deletions
@@ -73,9 +73,9 @@ Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.yout

  | Category | Services |
  | ------------------- | -------- |
- | Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Gradium](https://docs.pipecat.ai/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [NVIDIA Riva](https://docs.pipecat.ai/server/services/stt/riva), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [SambaNova (Whisper)](https://docs.pipecat.ai/server/services/stt/sambanova), [Sarvam](https://docs.pipecat.ai/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) |
+ | Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Gradium](https://docs.pipecat.ai/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [Hathora](https://docs.pipecat.ai/server/services/stt/hathora), [NVIDIA Riva](https://docs.pipecat.ai/server/services/stt/riva), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [SambaNova (Whisper)](https://docs.pipecat.ai/server/services/stt/sambanova), [Sarvam](https://docs.pipecat.ai/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) |
  | LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/server/services/llm/aws), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/server/services/llm/mistral), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nim), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/server/services/llm/sambanova), [Together AI](https://docs.pipecat.ai/server/services/llm/together) |
- | Text-to-Speech | [Async](https://docs.pipecat.ai/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Camb AI](https://docs.pipecat.ai/server/services/tts/camb), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [Gradium](https://docs.pipecat.ai/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/server/services/tts/groq), [Hume](https://docs.pipecat.ai/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/server/services/tts/inworld), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [NVIDIA Riva](https://docs.pipecat.ai/server/services/tts/riva), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [Speechmatics](https://docs.pipecat.ai/server/services/tts/speechmatics), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
+ | Text-to-Speech | [Async](https://docs.pipecat.ai/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Camb AI](https://docs.pipecat.ai/server/services/tts/camb), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [Gradium](https://docs.pipecat.ai/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/server/services/tts/groq), [Hathora](https://docs.pipecat.ai/server/services/tts/hathora), [Hume](https://docs.pipecat.ai/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/server/services/tts/inworld), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [NVIDIA Riva](https://docs.pipecat.ai/server/services/tts/riva), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [Speechmatics](https://docs.pipecat.ai/server/services/tts/speechmatics), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
  | Speech-to-Speech | [AWS Nova Sonic](https://docs.pipecat.ai/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/server/services/s2s/ultravox) |
  | Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local |
  | Serializers | [Exotel](https://docs.pipecat.ai/server/utilities/serializers/exotel), [Plivo](https://docs.pipecat.ai/server/utilities/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/utilities/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/utilities/serializers/telnyx), [Vonage](https://docs.pipecat.ai/server/utilities/serializers/vonage) |

changelog/3169.added.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+ - Added Hathora service to support Hathora-hosted TTS and STT models (non-streaming only)

env.example

Lines changed: 3 additions & 0 deletions
@@ -85,6 +85,9 @@ GROK_API_KEY=...

  # Groq
  GROQ_API_KEY=...

+ # Hathora
+ HATHORA_API_KEY=...
+
  # Heygen
  HEYGEN_API_KEY=...
  HEYGEN_LIVE_AVATAR_API_KEY=...
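The lines added above are plain `KEY=value` entries; the example bot below loads them with python-dotenv's `load_dotenv`. As an illustration of what that loading amounts to, here is a minimal stdlib-only sketch of parsing such content into the environment (a simplified stand-in for python-dotenv, not the library's actual implementation; the key value is a placeholder):

```python
import os

def load_env_text(text: str, override: bool = False) -> None:
    """Parse simple KEY=value lines into os.environ, skipping blanks and # comments."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if override or key not in os.environ:
            os.environ[key] = value

# The snippet this commit adds to env.example, with a placeholder value.
load_env_text("# Hathora\nHATHORA_API_KEY=sk-placeholder")
print(os.getenv("HATHORA_API_KEY"))  # → sk-placeholder
```

Real python-dotenv additionally handles quoting, interpolation, and `.env` file discovery; `load_dotenv(override=True)` in the bot corresponds to `override=True` here.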
Lines changed: 141 additions & 0 deletions

@@ -0,0 +1,141 @@
#
# Copyright (c) 2024–2025, Daily
#
# SPDX-License-Identifier: BSD 2-Clause License
#

import os

from dotenv import load_dotenv
from loguru import logger

from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
from pipecat.audio.vad.silero import SileroVADAnalyzer
from pipecat.audio.vad.vad_analyzer import VADParams
from pipecat.frames.frames import LLMRunFrame
from pipecat.pipeline.pipeline import Pipeline
from pipecat.pipeline.runner import PipelineRunner
from pipecat.pipeline.task import PipelineParams, PipelineTask
from pipecat.processors.aggregators.llm_context import LLMContext
from pipecat.processors.aggregators.llm_response_universal import (
    LLMContextAggregatorPair,
    LLMUserAggregatorParams,
)
from pipecat.runner.types import RunnerArguments
from pipecat.runner.utils import create_transport
from pipecat.services.hathora.stt import HathoraSTTService
from pipecat.services.hathora.tts import HathoraTTSService
from pipecat.services.openai.llm import OpenAILLMService
from pipecat.transports.base_transport import BaseTransport, TransportParams
from pipecat.transports.daily.transport import DailyParams
from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
from pipecat.turns.user_turn_strategies import UserTurnStrategies

load_dotenv(override=True)

# We store functions so objects (e.g. SileroVADAnalyzer) don't get
# instantiated. The function will be called when the desired transport gets
# selected.
transport_params = {
    "daily": lambda: DailyParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
    ),
    "twilio": lambda: FastAPIWebsocketParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
    ),
    "webrtc": lambda: TransportParams(
        audio_in_enabled=True,
        audio_out_enabled=True,
        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
    ),
}


async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
    logger.info("Starting bot")

    stt = HathoraSTTService(
        model="nvidia-parakeet-tdt-0.6b-v3",
    )

    tts = HathoraTTSService(
        model="hexgrad-kokoro-82m",
    )

    # See https://models.hathora.dev/model/qwen3-30b-a3b
    llm = OpenAILLMService(
        base_url="https://app-362f7ca1-6975-4e18-a605-ab202bf2c315.app.hathora.dev/v1",
        api_key=os.getenv("HATHORA_API_KEY"),
        model=None,
    )

    messages = [
        {
            "role": "system",
            "content": "You are a helpful LLM in a WebRTC call. Your goal is to demonstrate your capabilities in a succinct way. Your output will be spoken aloud, so avoid special characters that can't easily be spoken, such as emojis or bullet points. Respond to what the user said in a creative and helpful way.",
        },
    ]

    context = LLMContext(messages)
    context_aggregator = LLMContextAggregatorPair(
        context,
        user_params=LLMUserAggregatorParams(
            user_turn_strategies=UserTurnStrategies(
                stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
            ),
        ),
    )

    pipeline = Pipeline(
        [
            transport.input(),  # Transport user input
            stt,
            context_aggregator.user(),  # User responses
            llm,  # LLM
            tts,  # TTS
            transport.output(),  # Transport bot output
            context_aggregator.assistant(),  # Assistant spoken responses
        ]
    )

    task = PipelineTask(
        pipeline,
        params=PipelineParams(
            enable_metrics=True,
            enable_usage_metrics=True,
        ),
        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
    )

    @transport.event_handler("on_client_connected")
    async def on_client_connected(transport, client):
        logger.info("Client connected")
        # Kick off the conversation.
        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
        await task.queue_frames([LLMRunFrame()])

    @transport.event_handler("on_client_disconnected")
    async def on_client_disconnected(transport, client):
        logger.info("Client disconnected")
        await task.cancel()

    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)

    await runner.run(task)


async def bot(runner_args: RunnerArguments):
    """Main bot entry point compatible with Pipecat Cloud."""
    transport = await create_transport(runner_args, transport_params)
    await run_bot(transport, runner_args)


if __name__ == "__main__":
    from pipecat.runner.run import main

    main()
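In the example above, the Hathora-hosted LLM is reached through `OpenAILLMService` pointed at a Hathora `base_url`, which works because the endpoint speaks the OpenAI chat-completions wire format. As an illustration, here is a stdlib sketch of the request such a client assembles (the URL and key below are placeholders, not the real app endpoint; only the request is built, no network call is made, and extra fields like `model` or `stream` are omitted for brevity):

```python
import json

def build_chat_request(base_url: str, api_key: str, messages: list) -> tuple:
    """Build (url, headers, body) for an OpenAI-compatible chat-completions call."""
    url = f"{base_url.rstrip('/')}/chat/completions"
    headers = {
        "Authorization": f"Bearer {api_key}",  # standard OpenAI-style auth header
        "Content-Type": "application/json",
    }
    body = json.dumps({"messages": messages}).encode("utf-8")
    return url, headers, body

url, headers, body = build_chat_request(
    "https://example-hathora-app.app.hathora.dev/v1",  # placeholder base_url
    "hathora-test-key",  # placeholder for HATHORA_API_KEY
    [{"role": "system", "content": "Please introduce yourself to the user."}],
)
print(url)  # → https://example-hathora-app.app.hathora.dev/v1/chat/completions
```

This is why swapping providers is just a matter of changing `base_url` and `api_key`: the path suffix, bearer auth, and JSON body shape stay the same across OpenAI-compatible backends.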
