Commit 30a3f42

Merge pull request pipecat-ai#3349 from eRuaro/feat/camb-tts-integration
Add Camb.ai TTS integration with MARS models
2 parents 24082b8 + 26ddb2d commit 30a3f42

8 files changed

Lines changed: 516 additions & 2 deletions

README.md

Lines changed: 1 addition & 1 deletion
@@ -75,7 +75,7 @@ Catch new features, interviews, and how-tos on our [Pipecat TV](https://www.yout
| ------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Speech-to-Text | [AssemblyAI](https://docs.pipecat.ai/server/services/stt/assemblyai), [AWS](https://docs.pipecat.ai/server/services/stt/aws), [Azure](https://docs.pipecat.ai/server/services/stt/azure), [Cartesia](https://docs.pipecat.ai/server/services/stt/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/stt/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/stt/elevenlabs), [Fal Wizper](https://docs.pipecat.ai/server/services/stt/fal), [Gladia](https://docs.pipecat.ai/server/services/stt/gladia), [Google](https://docs.pipecat.ai/server/services/stt/google), [Gradium](https://docs.pipecat.ai/server/services/stt/gradium), [Groq (Whisper)](https://docs.pipecat.ai/server/services/stt/groq), [NVIDIA Riva](https://docs.pipecat.ai/server/services/stt/riva), [OpenAI (Whisper)](https://docs.pipecat.ai/server/services/stt/openai), [SambaNova (Whisper)](https://docs.pipecat.ai/server/services/stt/sambanova), [Sarvam](https://docs.pipecat.ai/server/services/stt/sarvam), [Soniox](https://docs.pipecat.ai/server/services/stt/soniox), [Speechmatics](https://docs.pipecat.ai/server/services/stt/speechmatics), [Whisper](https://docs.pipecat.ai/server/services/stt/whisper) |
| LLMs | [Anthropic](https://docs.pipecat.ai/server/services/llm/anthropic), [AWS](https://docs.pipecat.ai/server/services/llm/aws), [Azure](https://docs.pipecat.ai/server/services/llm/azure), [Cerebras](https://docs.pipecat.ai/server/services/llm/cerebras), [DeepSeek](https://docs.pipecat.ai/server/services/llm/deepseek), [Fireworks AI](https://docs.pipecat.ai/server/services/llm/fireworks), [Gemini](https://docs.pipecat.ai/server/services/llm/gemini), [Grok](https://docs.pipecat.ai/server/services/llm/grok), [Groq](https://docs.pipecat.ai/server/services/llm/groq), [Mistral](https://docs.pipecat.ai/server/services/llm/mistral), [NVIDIA NIM](https://docs.pipecat.ai/server/services/llm/nim), [Ollama](https://docs.pipecat.ai/server/services/llm/ollama), [OpenAI](https://docs.pipecat.ai/server/services/llm/openai), [OpenRouter](https://docs.pipecat.ai/server/services/llm/openrouter), [Perplexity](https://docs.pipecat.ai/server/services/llm/perplexity), [Qwen](https://docs.pipecat.ai/server/services/llm/qwen), [SambaNova](https://docs.pipecat.ai/server/services/llm/sambanova) [Together AI](https://docs.pipecat.ai/server/services/llm/together) |
-| Text-to-Speech | [Async](https://docs.pipecat.ai/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [Gradium](https://docs.pipecat.ai/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/server/services/tts/groq), [Hume](https://docs.pipecat.ai/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/server/services/tts/inworld), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [NVIDIA Riva](https://docs.pipecat.ai/server/services/tts/riva), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [Speechmatics](https://docs.pipecat.ai/server/services/tts/speechmatics), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
+| Text-to-Speech | [Async](https://docs.pipecat.ai/server/services/tts/asyncai), [AWS](https://docs.pipecat.ai/server/services/tts/aws), [Azure](https://docs.pipecat.ai/server/services/tts/azure), [Camb AI](https://docs.pipecat.ai/server/services/tts/camb), [Cartesia](https://docs.pipecat.ai/server/services/tts/cartesia), [Deepgram](https://docs.pipecat.ai/server/services/tts/deepgram), [ElevenLabs](https://docs.pipecat.ai/server/services/tts/elevenlabs), [Fish](https://docs.pipecat.ai/server/services/tts/fish), [Google](https://docs.pipecat.ai/server/services/tts/google), [Gradium](https://docs.pipecat.ai/server/services/tts/gradium), [Groq](https://docs.pipecat.ai/server/services/tts/groq), [Hume](https://docs.pipecat.ai/server/services/tts/hume), [Inworld](https://docs.pipecat.ai/server/services/tts/inworld), [LMNT](https://docs.pipecat.ai/server/services/tts/lmnt), [MiniMax](https://docs.pipecat.ai/server/services/tts/minimax), [Neuphonic](https://docs.pipecat.ai/server/services/tts/neuphonic), [NVIDIA Riva](https://docs.pipecat.ai/server/services/tts/riva), [OpenAI](https://docs.pipecat.ai/server/services/tts/openai), [Piper](https://docs.pipecat.ai/server/services/tts/piper), [PlayHT](https://docs.pipecat.ai/server/services/tts/playht), [Rime](https://docs.pipecat.ai/server/services/tts/rime), [Sarvam](https://docs.pipecat.ai/server/services/tts/sarvam), [Speechmatics](https://docs.pipecat.ai/server/services/tts/speechmatics), [XTTS](https://docs.pipecat.ai/server/services/tts/xtts) |
| Speech-to-Speech | [AWS Nova Sonic](https://docs.pipecat.ai/server/services/s2s/aws), [Gemini Multimodal Live](https://docs.pipecat.ai/server/services/s2s/gemini), [Grok Voice Agent](https://docs.pipecat.ai/server/services/s2s/grok), [OpenAI Realtime](https://docs.pipecat.ai/server/services/s2s/openai), [Ultravox](https://docs.pipecat.ai/server/services/s2s/ultravox), |
| Transport | [Daily (WebRTC)](https://docs.pipecat.ai/server/services/transport/daily), [FastAPI Websocket](https://docs.pipecat.ai/server/services/transport/fastapi-websocket), [SmallWebRTCTransport](https://docs.pipecat.ai/server/services/transport/small-webrtc), [WebSocket Server](https://docs.pipecat.ai/server/services/transport/websocket-server), Local |
| Serializers | [Exotel](https://docs.pipecat.ai/server/utilities/serializers/exotel), [Plivo](https://docs.pipecat.ai/server/utilities/serializers/plivo), [Twilio](https://docs.pipecat.ai/server/utilities/serializers/twilio), [Telnyx](https://docs.pipecat.ai/server/utilities/serializers/telnyx), [Vonage](https://docs.pipecat.ai/server/utilities/serializers/vonage) |

changelog/3349.added.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+- Added `CambTTSService`, using Camb.ai's TTS integration with MARS models (mars-flash, mars-pro, mars-instruct) for high-quality text-to-speech synthesis.

env.example

Lines changed: 3 additions & 0 deletions
@@ -31,6 +31,9 @@ AZURE_DALLE_API_KEY=...
 AZURE_DALLE_ENDPOINT=https://...
 AZURE_DALLE_MODEL=...
 
+# Camb.ai
+CAMB_API_KEY=...
+
 # Cartesia
 CARTESIA_API_KEY=...
 CARTESIA_VOICE_ID=...

Lines changed: 138 additions & 0 deletions
@@ -0,0 +1,138 @@
+#
+# Copyright (c) 2024–2025, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
+
+import os
+
+from dotenv import load_dotenv
+from loguru import logger
+
+from pipecat.audio.turn.smart_turn.local_smart_turn_v3 import LocalSmartTurnAnalyzerV3
+from pipecat.audio.vad.silero import SileroVADAnalyzer
+from pipecat.audio.vad.vad_analyzer import VADParams
+from pipecat.frames.frames import LLMRunFrame
+from pipecat.pipeline.pipeline import Pipeline
+from pipecat.pipeline.runner import PipelineRunner
+from pipecat.pipeline.task import PipelineParams, PipelineTask
+from pipecat.processors.aggregators.llm_context import LLMContext
+from pipecat.processors.aggregators.llm_response_universal import (
+    LLMContextAggregatorPair,
+    LLMUserAggregatorParams,
+)
+from pipecat.runner.types import RunnerArguments
+from pipecat.runner.utils import create_transport
+from pipecat.services.camb.tts import CambTTSService
+from pipecat.services.deepgram.stt import DeepgramSTTService
+from pipecat.services.openai.llm import OpenAILLMService
+from pipecat.transports.base_transport import BaseTransport, TransportParams
+from pipecat.transports.daily.transport import DailyParams
+from pipecat.transports.websocket.fastapi import FastAPIWebsocketParams
+from pipecat.turns.user_stop import TurnAnalyzerUserTurnStopStrategy
+from pipecat.turns.user_turn_strategies import UserTurnStrategies
+
+load_dotenv(override=True)
+
+
+# We store functions so objects (e.g. SileroVADAnalyzer) don't get
+# instantiated. The function will be called when the desired transport gets
+# selected.
+transport_params = {
+    "daily": lambda: DailyParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
+    ),
+    "twilio": lambda: FastAPIWebsocketParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
+    ),
+    "webrtc": lambda: TransportParams(
+        audio_in_enabled=True,
+        audio_out_enabled=True,
+        vad_analyzer=SileroVADAnalyzer(params=VADParams(stop_secs=0.2)),
+    ),
+}
+
+
+async def run_bot(transport: BaseTransport, runner_args: RunnerArguments):
+    logger.info("Starting Camb AI TTS bot")
+
+    stt = DeepgramSTTService(api_key=os.getenv("DEEPGRAM_API_KEY"))
+
+    tts = CambTTSService(
+        api_key=os.getenv("CAMB_API_KEY"),
+        model="mars-flash",
+    )
+
+    llm = OpenAILLMService(api_key=os.getenv("OPENAI_API_KEY"))
+
+    messages = [
+        {
+            "role": "system",
+            "content": "You are a helpful voice assistant powered by Camb AI text-to-speech. "
+            "Keep your responses concise and conversational since they will be spoken aloud. "
+            "Avoid special characters, emojis, or bullet points.",
+        },
+    ]
+
+    context = LLMContext(messages)
+    user_aggregator, assistant_aggregator = LLMContextAggregatorPair(
+        context,
+        user_params=LLMUserAggregatorParams(
+            user_turn_strategies=UserTurnStrategies(
+                stop=[TurnAnalyzerUserTurnStopStrategy(turn_analyzer=LocalSmartTurnAnalyzerV3())]
+            ),
+        ),
+    )
+
+    pipeline = Pipeline(
+        [
+            transport.input(),
+            stt,
+            user_aggregator,
+            llm,
+            tts,
+            transport.output(),
+            assistant_aggregator,
+        ]
+    )
+
+    task = PipelineTask(
+        pipeline,
+        params=PipelineParams(
+            enable_metrics=True,
+            enable_usage_metrics=True,
+            audio_out_sample_rate=22050,
+        ),
+        idle_timeout_secs=runner_args.pipeline_idle_timeout_secs,
+    )
+
+    @transport.event_handler("on_client_connected")
+    async def on_client_connected(transport, client):
+        logger.info("Client connected")
+        messages.append({"role": "system", "content": "Please introduce yourself to the user."})
+        await task.queue_frames([LLMRunFrame()])
+
+    @transport.event_handler("on_client_disconnected")
+    async def on_client_disconnected(transport, client):
+        logger.info("Client disconnected")
+        await task.cancel()
+
+    runner = PipelineRunner(handle_sigint=runner_args.handle_sigint)
+
+    await runner.run(task)
+
+
+async def bot(runner_args: RunnerArguments):
+    """Main bot entry point compatible with Pipecat Cloud."""
+    transport = await create_transport(runner_args, transport_params)
+    await run_bot(transport, runner_args)
+
+
+if __name__ == "__main__":
+    from pipecat.runner.run import main
+
+    main()
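
Note on the example above: it reads `CAMB_API_KEY`, `DEEPGRAM_API_KEY`, and `OPENAI_API_KEY` with `os.getenv`, which returns `None` when a variable is unset, so a missing key would only show up once a service tries to authenticate. The following is a minimal sketch of failing fast before the pipeline is built. The helpers `require_env` and `missing_keys` are hypothetical names introduced here for illustration; they are not part of this commit or of Pipecat.

```python
import os

# Variable names taken from the example file; the helpers below are
# illustrative assumptions, not Pipecat APIs.
REQUIRED_KEYS = ("CAMB_API_KEY", "DEEPGRAM_API_KEY", "OPENAI_API_KEY")


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if unset/empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


def missing_keys(names=REQUIRED_KEYS) -> list[str]:
    """List the required variables that are currently unset or empty."""
    return [name for name in names if not os.getenv(name)]
```

One possible use is to call `missing_keys()` at the top of `run_bot` and log or raise if the returned list is non-empty, so configuration problems surface before any client connects.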

pyproject.toml

Lines changed: 1 addition & 0 deletions
@@ -53,6 +53,7 @@ aws = [ "aioboto3~=15.5.0", "pipecat-ai[websockets-base]" ]
 aws-nova-sonic = [ "aws_sdk_bedrock_runtime~=0.2.0; python_version>='3.12'" ]
 azure = [ "azure-cognitiveservices-speech~=1.44.0"]
 cartesia = [ "cartesia~=2.0.3", "pipecat-ai[websockets-base]" ]
+camb = [ "camb-sdk>=1.5.4" ]
 cerebras = []
 daily = [ "daily-python~=0.23.0" ]
 deepgram = [ "deepgram-sdk~=4.7.0", "pipecat-ai[websockets-base]" ]

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+#
+# Copyright (c) 2024–2026, Daily
+#
+# SPDX-License-Identifier: BSD 2-Clause License
+#
