Conversation

Contributor

@HavenDV HavenDV commented Sep 5, 2025

Summary by CodeRabbit

  • New Features
    • Added Google as a Text-to-Speech provider with bidirectional streaming support.
    • Enabled configuration of Google voices, including voice selection and speaking rate.
    • Added support for Google credentials: a service account key can be set through the TTS API keys patch request, and the account's TTS keys report it as a key prefix.
    • Google voices are available across both standard and patched voice configurations.


coderabbitai bot commented Sep 5, 2025

Walkthrough

Adds Google as a Text-to-Speech provider in the Ultravox OpenAPI by introducing a GoogleVoice schema and wiring it into ExternalVoice and PatchedVoice. Extends API key schemas to accept Google service account credentials. All changes are confined to the OpenAPI specification.

Changes

Cohort: OpenAPI: Google TTS integration
File(s): src/libs/Ultravox/openapi.yaml
Summary of changes:
- Added schema ultravox.v1.GoogleVoice (voiceId, speakingRate).
- Added google field to ultravox.v1.ExternalVoice and ultravox.v1.PatchedVoice referencing GoogleVoice (bidirectional streaming noted).
- Added google credentials to AccountTtsKeys (KeyPrefix) and PatchedSetTtsApiKeysRequest (string, nullable).
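
As a rough illustration of the payload shape these changes describe, the sketch below models the two GoogleVoice fields and nests them under the google field of a voice body. The class and helper names are hypothetical and are not part of the generated SDK; only the JSON keys (google, voiceId, speakingRate) come from the summary above.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class GoogleVoice:
    """Hypothetical client-side mirror of the ultravox.v1.GoogleVoice fields."""

    voice_id: str
    # Left unset by default so the server-side default can apply.
    speaking_rate: Optional[float] = None

    def to_json(self) -> Dict[str, Any]:
        body: Dict[str, Any] = {"voiceId": self.voice_id}
        if self.speaking_rate is not None:
            body["speakingRate"] = self.speaking_rate
        return body


def external_voice_body(voice: GoogleVoice) -> Dict[str, Any]:
    # The summary names the new field on ExternalVoice/PatchedVoice `google`.
    return {"google": voice.to_json()}
```

Omitting speakingRate when it is unset keeps the request minimal and avoids the client hard-coding a default of its own.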

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant Ultravox API
  participant Google TTS

  rect rgba(220,235,255,0.4)
  note over Client,Ultravox API: Configure Google credentials
  Client->>Ultravox API: PATCH /tts/keys { google: "<service-account-json>" }
  Ultravox API-->>Client: 200 OK
  end

  rect rgba(220,255,220,0.4)
  note over Client,Ultravox API: Select Google voice (bidirectional streaming)
  Client->>Ultravox API: PATCH /voices/{id} { google: { voiceId, speakingRate } }
  Ultravox API-->>Client: 200 OK
  end

  rect rgba(255,245,220,0.4)
  note over Client,Google TTS: Streaming synthesis
  Client->>Ultravox API: Start TTS stream (voice=google)
  Ultravox API->>Google TTS: Open bidi stream (voiceId, speakingRate, creds)
  Google TTS-->>Ultravox API: Audio chunks
  Ultravox API-->>Client: Audio chunks
  end
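
The third block of the diagram can be pictured with a toy relay, shown here only as a sketch. The function fake_tts_stream is a stand-in written for this example and does not call Google's API; it yields placeholder bytes instead of audio. The point is the shape of the flow: each audio chunk is forwarded as soon as the upstream produces it, without waiting for the full input.

```python
from typing import Iterable, Iterator


def fake_tts_stream(text_chunks: Iterable[str]) -> Iterator[bytes]:
    """Stand-in for the upstream provider: one 'audio' chunk per text chunk."""
    for text in text_chunks:
        # Placeholder bytes, not real synthesized audio.
        yield text.encode("utf-8")


def relay(text_chunks: Iterable[str]) -> Iterator[bytes]:
    """Forward each audio chunk to the caller as soon as it is available."""
    for audio in fake_tts_stream(text_chunks):
        yield audio
```

Because both functions are generators, text is pulled from the caller only when the next audio chunk is requested, which is the property the streaming note in the diagram relies on.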

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

I twitch my ears at clouds that speak,
A Google gust, now bidirection’s chic—
Voices hop through streams so bright,
Carrots click in pure delight.
Schema burrows freshly spun,
Keys in paws—deploy and run!
Ribbit—er, rabbit—work well done.


@github-actions github-actions bot merged commit 0447f6c into main Sep 5, 2025
3 of 4 checks passed
@github-actions github-actions bot deleted the bot/update-openapi_202509052109 branch September 5, 2025 21:10
@coderabbitai coderabbitai bot changed the title from "feat:@coderabbitai" to "feat:Add Google TTS provider to Ultravox OpenAPI with GoogleVoice schema" Sep 5, 2025

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🧹 Nitpick comments (2)
src/libs/Ultravox/openapi.yaml (2)

2614-2617: Clarify description to indicate JSON service account key.

Small wording tweak for accuracy and parity with the patch request field.

         google:
           allOf:
             - $ref: '#/components/schemas/KeyPrefix'
-          description: The Google service account key.
+          description: The Google Cloud service account key (JSON). Returned here as a key prefix only.

3703-3706: Allow either a JSON string or object for Google credentials (more ergonomic).

Many clients will prefer passing the JSON object directly instead of a stringified blob. If backend accepts both, advertise it.

If the backend only accepts a string today, feel free to skip. Otherwise:

       google:
-        type: string
-        description: "A service account JSON key for your Google Cloud project with the Text-to-Speech API enabled.
+        oneOf:
+          - type: string
+          - type: object
+            additionalProperties: {}
+        description: "A service account JSON key (string or object) for your Google Cloud project with the Text-to-Speech API enabled.
 https://cloud.google.com/text-to-speech/docs/quickstart-client-libraries#before-you-begin
 https://cloud.google.com/iam/docs/keys-create-delete#creating"
         nullable: true
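
Whichever form the backend ends up accepting, a client can normalize on its side. The sketch below assumes the current schema (a string field) and converts a dict to a JSON string before building the patch body. The helper names are hypothetical, and the sample key fields are placeholders rather than a complete service account key.

```python
import json
from typing import Any, Dict, Union


def google_credentials_as_string(key: Union[str, Dict[str, Any]]) -> str:
    """Return the service account key as a JSON string."""
    if isinstance(key, dict):
        return json.dumps(key)
    # json.loads raises json.JSONDecodeError (a ValueError) on malformed input.
    parsed = json.loads(key)
    if not isinstance(parsed, dict):
        raise ValueError("service account key must be a JSON object")
    return key


def tts_keys_patch_body(key: Union[str, Dict[str, Any]]) -> Dict[str, str]:
    # `google` is the field name used in PatchedSetTtsApiKeysRequest.
    return {"google": google_credentials_as_string(key)}
```

Validating that a string input parses as a JSON object catches the common mistake of passing a file path or a bare API key instead of the key file contents.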
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 3f0bf2c and afbc68f.

⛔ Files ignored due to path filters (8)
  • src/libs/Ultravox/Generated/Ultravox.AccountsClient.AccountsMeTtsApiKeysPartialUpdate.g.cs is excluded by !**/generated/**
  • src/libs/Ultravox/Generated/Ultravox.IAccountsClient.AccountsMeTtsApiKeysPartialUpdate.g.cs is excluded by !**/generated/**
  • src/libs/Ultravox/Generated/Ultravox.JsonSerializerContextTypes.g.cs is excluded by !**/generated/**
  • src/libs/Ultravox/Generated/Ultravox.Models.AccountTtsKeys.g.cs is excluded by !**/generated/**
  • src/libs/Ultravox/Generated/Ultravox.Models.PatchedSetTtsApiKeysRequest.g.cs is excluded by !**/generated/**
  • src/libs/Ultravox/Generated/Ultravox.Models.UltravoxV1ExternalVoice.g.cs is excluded by !**/generated/**
  • src/libs/Ultravox/Generated/Ultravox.Models.UltravoxV1GoogleVoice.Json.g.cs is excluded by !**/generated/**
  • src/libs/Ultravox/Generated/Ultravox.Models.UltravoxV1GoogleVoice.g.cs is excluded by !**/generated/**
📒 Files selected for processing (1)
  • src/libs/Ultravox/openapi.yaml (4 hunks)

Comment on lines +4765 to 4769
google:
  allOf:
    - $ref: '#/components/schemas/ultravox.v1.GoogleVoice'
  description: "A voice served by Google, using bidirectional streaming.\n (For non-streaming or output-only streaming, use generic.)"
generic:

💡 Verification agent

🧩 Analysis chain

Confirm runtime truly requires “bidirectional streaming” (Chirp3+) for Google.

The description is specific and time-sensitive. Please confirm the server implementation currently only supports bidi streaming voices and that non-Chirp voices fail predictably; otherwise soften the claim.


🌐 Web query:

Does Google Cloud Text-to-Speech streaming (StreamingAudioConfig) require “Chirp3” voices, and what are the supported speakingRate bounds for streaming?

💡 Result:

Short answer:

  • No — StreamingAudioConfig (streaming TTS) does not require Chirp3 voices; it works with the supported voices listed in the Voices page (Wavenet, Neural2, Chirp/Chirp3, Studio, Standard, etc.). [1][2]
  • The supported speakingRate bounds for streaming are 0.25 to 2.0 (values outside that range return an error; default is 1.0). [1]

Sources:
[1] Google Cloud Text‑to‑Speech StreamingAudioConfig doc (speakingRate range).
[2] Google Cloud Text‑to‑Speech supported voices list (shows available voice families including Chirp/Chirp3).


Clarify Google voice description in OpenAPI schema
Update the description to note that StreamingAudioConfig supports all streaming-compatible voice families (Wavenet, Neural2, Chirp3, Studio, Standard) rather than implying Chirp3-only.

File: src/libs/Ultravox/openapi.yaml Lines 4765-4769

-          description: "A voice served by Google, using bidirectional streaming.\n (For non-streaming or output-only streaming, use generic.)"
+          description: "A voice served by Google using bidirectional streaming; supports all streaming-compatible voice families (Wavenet, Neural2, Chirp3, Studio, Standard).\n (For non-streaming or output-only streaming, use generic.)"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 google:
   allOf:
     - $ref: '#/components/schemas/ultravox.v1.GoogleVoice'
-  description: "A voice served by Google, using bidirectional streaming.\n (For non-streaming or output-only streaming, use generic.)"
+  description: "A voice served by Google using bidirectional streaming; supports all streaming-compatible voice families (Wavenet, Neural2, Chirp3, Studio, Standard).\n (For non-streaming or output-only streaming, use generic.)"
 generic:
🤖 Prompt for AI Agents
In src/libs/Ultravox/openapi.yaml around lines 4765-4769, the description for
the "google" voice currently implies Chirp3-only streaming support; update that
description string to state that StreamingAudioConfig supports all
streaming-compatible Google voice families (Wavenet, Neural2, Chirp3, Studio,
Standard) for bidirectional streaming, and reword to avoid implying exclusivity
(keep existing note about non-streaming/output-only using generic).

Comment on lines +4854 to 4865
ultravox.v1.GoogleVoice:
  type: object
  properties:
    voiceId:
      type: string
      description: 'The ID (name) of the voice in Google, e.g. "en-US-Chirp3-HD-Charon".'
    speakingRate:
      type: number
      description: "The speaking rate. Must be between 0.25 and 2. Defaults to 1.\n See https://cloud.google.com/python/docs/reference/texttospeech/latest/google.cloud.texttospeech_v1.types.StreamingAudioConfig"
      format: float
  description: "Specification for a voice served by Google.\n This implementation uses bidirectional streaming, so voices prior to Chirp3 are not supported."
ultravox.v1.HeaderApiKeyRequirement:

💡 Verification agent

🧩 Analysis chain

Add validation bounds and a default for speakingRate; include an example voiceId.

Without min/max/default, clients can submit out-of-range values that will fail at runtime. This mirrors other provider schemas (e.g., LMNT speed 0.25–2, ElevenLabs speed defaults).

Proposed schema tweaks:

     ultravox.v1.GoogleVoice:
       type: object
       properties:
         voiceId:
           type: string
-          description: 'The ID (name) of the voice in Google, e.g. "en-US-Chirp3-HD-Charon".'
+          description: 'The ID (name) of the voice in Google.'
+          example: en-US-Chirp3-HD-Charon
         speakingRate:
           type: number
-          description: "The speaking rate. Must be between 0.25 and 2. Defaults to 1.
- See https://cloud.google.com/python/docs/reference/texttospeech/latest/google.cloud.texttospeech_v1.types.StreamingAudioConfig"
           format: float
+          minimum: 0.25
+          maximum: 2
+          default: 1
+          description: "The speaking rate.
+ See https://cloud.google.com/python/docs/reference/texttospeech/latest/google.cloud.texttospeech_v1.types.StreamingAudioConfig"
-      description: "Specification for a voice served by Google.
- This implementation uses bidirectional streaming, so voices prior to Chirp3 are not supported."
+      description: "Specification for a voice served by Google. Uses bidirectional streaming."

Also, please verify that the referenced Python type path is correct for the current client library and version.


🌐 Web query:

Is `google.cloud.texttospeech_v1.types.StreamingAudioConfig` the correct Python type path for Google Cloud TTS streaming configuration?

💡 Result:

Yes. The Python type is google.cloud.texttospeech_v1.types.StreamingAudioConfig. [1][2]

References:

  • Library reference for StreamingAudioConfig. [1]
  • Streaming quickstart / examples showing streaming config usage. [2]

Add validation bounds, default, and example to GoogleVoice schema

     ultravox.v1.GoogleVoice:
       type: object
       properties:
         voiceId:
-          description: 'The ID (name) of the voice in Google, e.g. "en-US-Chirp3-HD-Charon".'
+          description: 'The ID (name) of the voice in Google.'
+          example: en-US-Chirp3-HD-Charon
         speakingRate:
           type: number
           format: float
+          minimum: 0.25
+          maximum: 2
+          default: 1
           description: "The speaking rate.
 See https://cloud.google.com/python/docs/reference/texttospeech/latest/google.cloud.texttospeech_v1.types.StreamingAudioConfig"
-      description: "Specification for a voice served by Google.
- This implementation uses bidirectional streaming, so voices prior to Chirp3 are not supported."
+      description: "Specification for a voice served by Google. Uses bidirectional streaming."

Verified that google.cloud.texttospeech_v1.types.StreamingAudioConfig is the correct Python type path.
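
Until the schema carries these constraints, the same bounds can be checked client-side. The following is a minimal sketch under the limits stated in the schema description (0.25 to 2, default 1); the function and constant names are hypothetical, and the server remains the authority on what it accepts.

```python
from typing import Optional

# Bounds and default as stated in the GoogleVoice.speakingRate description.
MIN_RATE = 0.25
MAX_RATE = 2.0
DEFAULT_RATE = 1.0


def resolve_speaking_rate(rate: Optional[float]) -> float:
    """Return a usable speaking rate, or raise if it is out of range."""
    if rate is None:
        return DEFAULT_RATE
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(rate, bool) or not isinstance(rate, (int, float)):
        raise TypeError("speakingRate must be a number")
    if not (MIN_RATE <= rate <= MAX_RATE):
        raise ValueError(
            f"speakingRate must be between {MIN_RATE} and {MAX_RATE}, got {rate}"
        )
    return float(rate)
```

Rejecting out-of-range values, rather than clamping them, matches the behavior the review describes, where such requests would otherwise fail at runtime.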

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
 ultravox.v1.GoogleVoice:
   type: object
   properties:
     voiceId:
       type: string
-      description: 'The ID (name) of the voice in Google, e.g. "en-US-Chirp3-HD-Charon".'
+      description: 'The ID (name) of the voice in Google.'
+      example: en-US-Chirp3-HD-Charon
     speakingRate:
       type: number
-      description: "The speaking rate. Must be between 0.25 and 2. Defaults to 1.\n See https://cloud.google.com/python/docs/reference/texttospeech/latest/google.cloud.texttospeech_v1.types.StreamingAudioConfig"
       format: float
-  description: "Specification for a voice served by Google.\n This implementation uses bidirectional streaming, so voices prior to Chirp3 are not supported."
+      minimum: 0.25
+      maximum: 2
+      default: 1
+      description: "The speaking rate.
+ See https://cloud.google.com/python/docs/reference/texttospeech/latest/google.cloud.texttospeech_v1.types.StreamingAudioConfig"
+  description: "Specification for a voice served by Google. Uses bidirectional streaming."
 ultravox.v1.HeaderApiKeyRequirement:
🤖 Prompt for AI Agents
In src/libs/Ultravox/openapi.yaml around lines 4854-4865, the
ultravox.v1.GoogleVoice schema needs explicit validation and examples: for
speakingRate add minimum: 0.25, maximum: 2, default: 1 and an example (e.g.
1.0); ensure format: float stays; for voiceId add an example (e.g.
"en-US-Chirp3-HD-Charon") and, if desired, a pattern or minLength to validate
non-empty strings; optionally add an overall example for the GoogleVoice object
to show both fields together.
