feat:Add Google TTS provider to Ultravox OpenAPI with GoogleVoice schema #128

HavenDV · 2025-09-05T21:10:26Z

Summary by CodeRabbit

New Features
- Added Google as a Text-to-Speech provider with bidirectional streaming support.
- Enabled configuration of Google voices, including voice selection and speaking rate.
- Added support for Google credentials: provide a service account key in TTS keys and patch the Google API key via settings.
- Google voices are available across both standard and patched voice configurations.

coderabbitai · 2025-09-05T21:10:34Z

Walkthrough

Adds Google as a Text-to-Speech provider in the Ultravox OpenAPI by introducing a GoogleVoice schema and wiring it into ExternalVoice and PatchedVoice. Extends API key schemas to accept Google service account credentials. All changes are confined to the OpenAPI specification.

Changes

| Cohort / File(s) | Summary of changes |
|---|---|
| OpenAPI: Google TTS integration `src/libs/Ultravox/openapi.yaml` | - Added schema `ultravox.v1.GoogleVoice` (voiceId, speakingRate).<br>- Added `google` field to `ultravox.v1.ExternalVoice` and `ultravox.v1.PatchedVoice` referencing `GoogleVoice` (bidirectional streaming noted).<br>- Added `google` credentials to `AccountTtsKeys` (KeyPrefix) and `PatchedSetTtsApiKeysRequest` (string, nullable). |

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant Ultravox API
  participant Google TTS

  rect rgba(220,235,255,0.4)
  note over Client,Ultravox API: Configure Google credentials
  Client->>Ultravox API: PATCH /tts/keys { google: "<service-account-json>" }
  Ultravox API-->>Client: 200 OK
  end

  rect rgba(220,255,220,0.4)
  note over Client,Ultravox API: Select Google voice (bidirectional streaming)
  Client->>Ultravox API: PATCH /voices/{id} { google: { voiceId, speakingRate } }
  Ultravox API-->>Client: 200 OK
  end

  rect rgba(255,245,220,0.4)
  note over Client,Google TTS: Streaming synthesis
  Client->>Ultravox API: Start TTS stream (voice=google)
  Ultravox API->>Google TTS: Open bidi stream (voiceId, speakingRate, creds)
  Google TTS-->>Ultravox API: Audio chunks
  Ultravox API-->>Client: Audio chunks
  end
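
The two configuration steps in the diagram can be sketched as plain request descriptions. This is a minimal Python sketch, not the generated client: `build_setup_requests` and `voice_resource_id` are hypothetical names, and the paths `/tts/keys` and `/voices/{id}` are copied from the diagram and may not match the real Ultravox routes. It only builds the JSON bodies and sends nothing.

```python
import json
from typing import Any, Dict, List, Tuple

# (HTTP method, path, JSON body); paths are taken from the diagram, not verified.
Request = Tuple[str, str, str]


def build_setup_requests(
    service_account: Dict[str, Any],
    voice_id: str,
    voice_resource_id: str,
    speaking_rate: float = 1.0,
) -> List[Request]:
    """Describe the two PATCH calls shown in the diagram (hypothetical helper)."""
    # The spec types the Google key as a string, so the key JSON is stringified.
    keys_body = json.dumps({"google": json.dumps(service_account)})
    voice_body = json.dumps(
        {"google": {"voiceId": voice_id, "speakingRate": speaking_rate}}
    )
    return [
        ("PATCH", "/tts/keys", keys_body),
        ("PATCH", f"/voices/{voice_resource_id}", voice_body),
    ]
```

Keeping the request description separate from the transport makes the payload shape easy to check against the schema before any network call is made.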

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

I twitch my ears at clouds that speak,
A Google gust, now bidirection’s chic—
Voices hop through streams so bright,
Carrots click in pure delight.
Schema burrows freshly spun,
Keys in paws—deploy and run!
Ribbit—er, rabbit—work well done.

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (2)

src/libs/Ultravox/openapi.yaml (2)

2614-2617: Clarify description to indicate JSON service account key.

Small wording tweak for accuracy and parity with the patch request field.

         google:
           allOf:
             - $ref: '#/components/schemas/KeyPrefix'
-          description: The Google service account key.
+          description: The Google Cloud service account key (JSON). Returned here as a key prefix only.

3703-3706: Allow either a JSON string or object for Google credentials (more ergonomic).

Many clients will prefer passing the JSON object directly instead of a stringified blob. If the backend accepts both, advertise it.

If the backend only accepts a string today, feel free to skip. Otherwise:

       google:
-        type: string
-        description: "A service account JSON key for your Google Cloud project with the Text-to-Speech API enabled.
+        oneOf:
+          - type: string
+          - type: object
+            additionalProperties: {}
+        description: "A service account JSON key (string or object) for your Google Cloud project with the Text-to-Speech API enabled.
 https://cloud.google.com/text-to-speech/docs/quickstart-client-libraries#before-you-begin
 https://cloud.google.com/iam/docs/keys-create-delete#creating"
         nullable: true
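
Whether or not the backend accepts an object, a client can offer the same ergonomics by normalizing on its side. A minimal Python sketch, assuming a hypothetical helper `normalize_google_key` (not part of the generated SDK) that always produces the string form the current spec declares:

```python
import json
from typing import Any, Mapping, Optional, Union


def normalize_google_key(
    key: Union[str, Mapping[str, Any], None],
) -> Optional[str]:
    """Return the service account key as a JSON string, or None to clear it."""
    if key is None:
        # The field is nullable in the spec, so None passes through unchanged.
        return None
    if isinstance(key, str):
        parsed = json.loads(key)  # raises ValueError on malformed JSON
        if not isinstance(parsed, dict):
            raise ValueError("service account key must be a JSON object")
        return key
    # Mapping input: serialize compactly to the string form.
    return json.dumps(dict(key), separators=(",", ":"))
```

With this in place, callers can pass either the parsed key file or its raw text, and the request body stays a string either way.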

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3f0bf2c and afbc68f.

⛔ Files ignored due to path filters (8)

src/libs/Ultravox/Generated/Ultravox.AccountsClient.AccountsMeTtsApiKeysPartialUpdate.g.cs is excluded by !**/generated/**
src/libs/Ultravox/Generated/Ultravox.IAccountsClient.AccountsMeTtsApiKeysPartialUpdate.g.cs is excluded by !**/generated/**
src/libs/Ultravox/Generated/Ultravox.JsonSerializerContextTypes.g.cs is excluded by !**/generated/**
src/libs/Ultravox/Generated/Ultravox.Models.AccountTtsKeys.g.cs is excluded by !**/generated/**
src/libs/Ultravox/Generated/Ultravox.Models.PatchedSetTtsApiKeysRequest.g.cs is excluded by !**/generated/**
src/libs/Ultravox/Generated/Ultravox.Models.UltravoxV1ExternalVoice.g.cs is excluded by !**/generated/**
src/libs/Ultravox/Generated/Ultravox.Models.UltravoxV1GoogleVoice.Json.g.cs is excluded by !**/generated/**
src/libs/Ultravox/Generated/Ultravox.Models.UltravoxV1GoogleVoice.g.cs is excluded by !**/generated/**

📒 Files selected for processing (1)

src/libs/Ultravox/openapi.yaml (4 hunks)

coderabbitai · 2025-09-05T21:15:54Z

src/libs/Ultravox/openapi.yaml

+        google:
+          allOf:
+            - $ref: '#/components/schemas/ultravox.v1.GoogleVoice'
+          description: "A voice served by Google, using bidirectional streaming.\n (For non-streaming or output-only streaming, use generic.)"
        generic:


💡 Verification agent

🧩 Analysis chain

Confirm runtime truly requires “bidirectional streaming” (Chirp3+) for Google.

The description is specific and time-sensitive. Please confirm the server implementation currently only supports bidi streaming voices and that non-Chirp voices fail predictably; otherwise soften the claim.

🌐 Web query:

Does Google Cloud Text-to-Speech streaming (StreamingAudioConfig) require “Chirp3” voices, and what are the supported speakingRate bounds for streaming?

💡 Result:

Short answer:

No — StreamingAudioConfig (streaming TTS) does not require Chirp3 voices; it works with the supported voices listed in the Voices page (Wavenet, Neural2, Chirp/Chirp3, Studio, Standard, etc.). [1][2]

The supported speakingRate bounds for streaming are 0.25 to 2.0 (values outside that range return an error; default is 1.0). [1]

Sources:
[1] Google Cloud Text‑to‑Speech StreamingAudioConfig doc (speakingRate range).
[2] Google Cloud Text‑to‑Speech supported voices list (shows available voice families including Chirp/Chirp3).

Clarify Google voice description in OpenAPI schema
Update the description to note that StreamingAudioConfig supports all streaming-compatible voice families (Wavenet, Neural2, Chirp3, Studio, Standard) rather than implying Chirp3-only.

File: src/libs/Ultravox/openapi.yaml Lines 4765-4769

-          description: "A voice served by Google, using bidirectional streaming.\n (For non-streaming or output-only streaming, use generic.)"
+          description: "A voice served by Google using bidirectional streaming; supports all streaming-compatible voice families (Wavenet, Neural2, Chirp3, Studio, Standard).\n (For non-streaming or output-only streaming, use generic.)"

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-        google:
-          allOf:
-            - $ref: '#/components/schemas/ultravox.v1.GoogleVoice'
-          description: "A voice served by Google, using bidirectional streaming.\n (For non-streaming or output-only streaming, use generic.)"
-        generic:
+        google:
+          allOf:
+            - $ref: '#/components/schemas/ultravox.v1.GoogleVoice'
+          description: "A voice served by Google using bidirectional streaming; supports all streaming-compatible voice families (Wavenet, Neural2, Chirp3, Studio, Standard).\n (For non-streaming or output-only streaming, use generic.)"
+        generic:

🤖 Prompt for AI Agents

In src/libs/Ultravox/openapi.yaml around lines 4765-4769, the description for the "google" voice currently implies Chirp3-only streaming support; update that description string to state that StreamingAudioConfig supports all streaming-compatible Google voice families (Wavenet, Neural2, Chirp3, Studio, Standard) for bidirectional streaming, and reword to avoid implying exclusivity (keep existing note about non-streaming/output-only using generic).

coderabbitai · 2025-09-05T21:15:54Z

src/libs/Ultravox/openapi.yaml

+    ultravox.v1.GoogleVoice:
+      type: object
+      properties:
+        voiceId:
+          type: string
+          description: 'The ID (name) of the voice in Google, e.g. "en-US-Chirp3-HD-Charon".'
+        speakingRate:
+          type: number
+          description: "The speaking rate. Must be between 0.25 and 2. Defaults to 1.\n See https://cloud.google.com/python/docs/reference/texttospeech/latest/google.cloud.texttospeech_v1.types.StreamingAudioConfig"
+          format: float
+      description: "Specification for a voice served by Google.\n This implementation uses bidirectional streaming, so voices prior to Chirp3 are not supported."
    ultravox.v1.HeaderApiKeyRequirement:


💡 Verification agent

🧩 Analysis chain

Add validation bounds and a default for speakingRate; include an example voiceId.

Without min/max/default, clients can submit out-of-range values that will fail at runtime. This mirrors other provider schemas (e.g., LMNT speed 0.25–2, ElevenLabs speed defaults).

Proposed schema tweaks:

     ultravox.v1.GoogleVoice:
       type: object
       properties:
         voiceId:
           type: string
-          description: 'The ID (name) of the voice in Google, e.g. "en-US-Chirp3-HD-Charon".'
+          description: 'The ID (name) of the voice in Google.'
+          example: en-US-Chirp3-HD-Charon
         speakingRate:
           type: number
-          description: "The speaking rate. Must be between 0.25 and 2. Defaults to 1.
-  See https://cloud.google.com/python/docs/reference/texttospeech/latest/google.cloud.texttospeech_v1.types.StreamingAudioConfig"
           format: float
+          minimum: 0.25
+          maximum: 2
+          default: 1
+          description: "The speaking rate.
+  See https://cloud.google.com/python/docs/reference/texttospeech/latest/google.cloud.texttospeech_v1.types.StreamingAudioConfig"
-      description: "Specification for a voice served by Google.
-  This implementation uses bidirectional streaming, so voices prior to Chirp3 are not supported."
+      description: "Specification for a voice served by Google. Uses bidirectional streaming."

Also, please verify that the referenced Python type path is correct for the current client library and version.

🌐 Web query:

Is `google.cloud.texttospeech_v1.types.StreamingAudioConfig` the correct Python type path for Google Cloud TTS streaming configuration?

💡 Result:

Yes. The Python type is google.cloud.texttospeech_v1.types.StreamingAudioConfig. [1][2]

References:

Library reference for StreamingAudioConfig. [1]

Streaming quickstart / examples showing streaming config usage. [2]

Add validation bounds, default, and example to GoogleVoice schema

     ultravox.v1.GoogleVoice:
       type: object
       properties:
         voiceId:
-          description: 'The ID (name) of the voice in Google, e.g. "en-US-Chirp3-HD-Charon".'
+          description: 'The ID (name) of the voice in Google.'
+          example: en-US-Chirp3-HD-Charon
         speakingRate:
           type: number
           format: float
+          minimum: 0.25
+          maximum: 2
+          default: 1
           description: "The speaking rate. See https://cloud.google.com/python/docs/reference/texttospeech/latest/google.cloud.texttospeech_v1.types.StreamingAudioConfig"
-      description: "Specification for a voice served by Google.
-  This implementation uses bidirectional streaming, so voices prior to Chirp3 are not supported."
+      description: "Specification for a voice served by Google. Uses bidirectional streaming."

Verified that google.cloud.texttospeech_v1.types.StreamingAudioConfig is the correct Python type path.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-    ultravox.v1.GoogleVoice:
-      type: object
-      properties:
-        voiceId:
-          type: string
-          description: 'The ID (name) of the voice in Google, e.g. "en-US-Chirp3-HD-Charon".'
-        speakingRate:
-          type: number
-          description: "The speaking rate. Must be between 0.25 and 2. Defaults to 1.\n See https://cloud.google.com/python/docs/reference/texttospeech/latest/google.cloud.texttospeech_v1.types.StreamingAudioConfig"
-          format: float
-      description: "Specification for a voice served by Google.\n This implementation uses bidirectional streaming, so voices prior to Chirp3 are not supported."
-    ultravox.v1.HeaderApiKeyRequirement:
+    ultravox.v1.GoogleVoice:
+      type: object
+      properties:
+        voiceId:
+          type: string
+          description: 'The ID (name) of the voice in Google.'
+          example: en-US-Chirp3-HD-Charon
+        speakingRate:
+          type: number
+          format: float
+          minimum: 0.25
+          maximum: 2
+          default: 1
+          description: "The speaking rate.
+  See https://cloud.google.com/python/docs/reference/texttospeech/latest/google.cloud.texttospeech_v1.types.StreamingAudioConfig"
+      description: "Specification for a voice served by Google. Uses bidirectional streaming."
+    ultravox.v1.HeaderApiKeyRequirement:

🤖 Prompt for AI Agents

In src/libs/Ultravox/openapi.yaml around lines 4854-4865, the ultravox.v1.GoogleVoice schema needs explicit validation and examples: for speakingRate add minimum: 0.25, maximum: 2, default: 1 and an example (e.g. 1.0); ensure format: float stays; for voiceId add an example (e.g. "en-US-Chirp3-HD-Charon") and, if desired, a pattern or minLength to validate non-empty strings; optionally add an overall example for the GoogleVoice object to show both fields together.
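
Until the bounds are declared in the spec, a client can apply the same checks before sending a request. A minimal Python sketch, assuming a hypothetical client-side `GoogleVoice` class (not the generated model) that mirrors the documented range of 0.25 to 2 and the default of 1:

```python
from dataclasses import dataclass

# Bounds and default as stated in the schema description.
MIN_SPEAKING_RATE = 0.25
MAX_SPEAKING_RATE = 2.0


@dataclass(frozen=True)
class GoogleVoice:
    """Client-side mirror of ultravox.v1.GoogleVoice (illustrative only)."""

    voice_id: str
    speaking_rate: float = 1.0

    def __post_init__(self) -> None:
        if not self.voice_id:
            raise ValueError("voiceId must be a non-empty string")
        if not MIN_SPEAKING_RATE <= self.speaking_rate <= MAX_SPEAKING_RATE:
            raise ValueError(
                f"speakingRate must be between {MIN_SPEAKING_RATE} "
                f"and {MAX_SPEAKING_RATE}, got {self.speaking_rate}"
            )

    def to_payload(self) -> dict:
        """Return the JSON-ready object using the schema's field names."""
        return {"voiceId": self.voice_id, "speakingRate": self.speaking_rate}
```

For example, `GoogleVoice("en-US-Chirp3-HD-Charon").to_payload()` yields the object with `speakingRate` set to the default, while a rate of 2.5 raises `ValueError` locally instead of failing at runtime.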

feat: Updated OpenAPI spec

afbc68f

github-actions bot approved these changes Sep 5, 2025

github-actions bot merged commit 0447f6c into main Sep 5, 2025
3 of 4 checks passed

github-actions bot deleted the bot/update-openapi_202509052109 branch September 5, 2025 21:10

coderabbitai bot changed the title ~~feat:@coderabbitai~~ feat:Add Google TTS provider to Ultravox OpenAPI with GoogleVoice schema Sep 5, 2025

coderabbitai bot reviewed Sep 5, 2025
