feat:Add Google TTS provider to Ultravox OpenAPI with GoogleVoice schema #128
Walkthrough
Adds Google as a Text-to-Speech provider in the Ultravox OpenAPI by introducing a GoogleVoice schema and wiring it into ExternalVoice and PatchedVoice. Extends API key schemas to accept Google service account credentials. All changes are confined to the OpenAPI specification.
Sequence Diagram(s)
sequenceDiagram
participant Client
participant Ultravox API
participant Google TTS
rect rgba(220,235,255,0.4)
note over Client,Ultravox API: Configure Google credentials
Client->>Ultravox API: PATCH /tts/keys { google: "<service-account-json>" }
Ultravox API-->>Client: 200 OK
end
rect rgba(220,255,220,0.4)
note over Client,Ultravox API: Select Google voice (bidirectional streaming)
Client->>Ultravox API: PATCH /voices/{id} { google: { voiceId, speakingRate } }
Ultravox API-->>Client: 200 OK
end
rect rgba(255,245,220,0.4)
note over Client,Google TTS: Streaming synthesis
Client->>Ultravox API: Start TTS stream (voice=google)
Ultravox API->>Google TTS: Open bidi stream (voiceId, speakingRate, creds)
Google TTS-->>Ultravox API: Audio chunks
Ultravox API-->>Client: Audio chunks
end
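The first two steps of the diagram come down to two PATCH request bodies. The following is a minimal sketch of how a client might build them, assuming the field names shown in the diagram (`google`, `voiceId`, `speakingRate`). The helper names are hypothetical, the exact nesting in the real API may differ, and no HTTP request is made.

```python
import json


def build_tts_keys_patch(service_account: dict) -> dict:
    """Body for the key-configuration step (hypothetical helper).

    The diagram passes the service account key as a JSON string,
    so the dict is serialized before being placed under "google".
    """
    return {"google": json.dumps(service_account)}


def build_voice_patch(voice_id: str, speaking_rate: float = 1.0) -> dict:
    """Body for the voice-selection step (hypothetical helper).

    Mirrors the shape shown in the diagram:
    { google: { voiceId, speakingRate } }.
    """
    return {"google": {"voiceId": voice_id, "speakingRate": speaking_rate}}


if __name__ == "__main__":
    # Placeholder credential content, for illustration only.
    print(build_tts_keys_patch({"type": "service_account"}))
    print(build_voice_patch("en-US-Chirp3-HD-Charon"))
```

Keeping the bodies as plain dictionaries leaves the choice of HTTP client and authentication header to the caller.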
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Actionable comments posted: 2
🧹 Nitpick comments (2)
src/libs/Ultravox/openapi.yaml (2)
2614-2617: Clarify description to indicate JSON service account key.
Small wording tweak for accuracy and parity with the patch request field.
 google:
   allOf:
     - $ref: '#/components/schemas/KeyPrefix'
-  description: The Google service account key.
+  description: The Google Cloud service account key (JSON). Returned here as a key prefix only.
3703-3706: Allow either a JSON string or object for Google credentials (more ergonomic).
Many clients will prefer passing the JSON object directly instead of a stringified blob. If the backend accepts both, advertise it.
If the backend only accepts a string today, feel free to skip. Otherwise:
 google:
-  type: string
-  description: "A service account JSON key for your Google Cloud project with the Text-to-Speech API enabled.
+  oneOf:
+    - type: string
+    - type: object
+      additionalProperties: {}
+  description: "A service account JSON key (string or object) for your Google Cloud project with the Text-to-Speech API enabled.
     https://cloud.google.com/text-to-speech/docs/quickstart-client-libraries#before-you-begin
     https://cloud.google.com/iam/docs/keys-create-delete#creating"
   nullable: true
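If the schema stays string-only, a client can still accept either form and normalize before sending. This is a rough sketch of that idea, not part of the PR; `normalize_google_key` is a hypothetical helper name.

```python
import json
from typing import Any, Mapping, Union


def normalize_google_key(value: Union[str, Mapping[str, Any]]) -> str:
    """Return the service account key as a JSON string (hypothetical helper).

    Accepts either an already-serialized JSON string or a mapping.
    Strings are parsed once to confirm they hold a JSON object.
    """
    if isinstance(value, str):
        parsed = json.loads(value)  # raises ValueError on invalid JSON
        if not isinstance(parsed, dict):
            raise ValueError("service account key must be a JSON object")
        return value
    if isinstance(value, Mapping):
        return json.dumps(dict(value))
    raise TypeError("expected a JSON string or a mapping")


if __name__ == "__main__":
    # Placeholder credential content, for illustration only.
    print(normalize_google_key({"type": "service_account"}))
```

Normalizing on the client keeps the request body a plain string, which matches the field type the spec declares today.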
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (8)
- src/libs/Ultravox/Generated/Ultravox.AccountsClient.AccountsMeTtsApiKeysPartialUpdate.g.cs is excluded by !**/generated/**
- src/libs/Ultravox/Generated/Ultravox.IAccountsClient.AccountsMeTtsApiKeysPartialUpdate.g.cs is excluded by !**/generated/**
- src/libs/Ultravox/Generated/Ultravox.JsonSerializerContextTypes.g.cs is excluded by !**/generated/**
- src/libs/Ultravox/Generated/Ultravox.Models.AccountTtsKeys.g.cs is excluded by !**/generated/**
- src/libs/Ultravox/Generated/Ultravox.Models.PatchedSetTtsApiKeysRequest.g.cs is excluded by !**/generated/**
- src/libs/Ultravox/Generated/Ultravox.Models.UltravoxV1ExternalVoice.g.cs is excluded by !**/generated/**
- src/libs/Ultravox/Generated/Ultravox.Models.UltravoxV1GoogleVoice.Json.g.cs is excluded by !**/generated/**
- src/libs/Ultravox/Generated/Ultravox.Models.UltravoxV1GoogleVoice.g.cs is excluded by !**/generated/**
📒 Files selected for processing (1)
- src/libs/Ultravox/openapi.yaml (4 hunks)
google:
  allOf:
    - $ref: '#/components/schemas/ultravox.v1.GoogleVoice'
  description: "A voice served by Google, using bidirectional streaming.\n (For non-streaming or output-only streaming, use generic.)"
generic:
💡 Verification agent
🧩 Analysis chain
Confirm runtime truly requires “bidirectional streaming” (Chirp3+) for Google.
The description is specific and time-sensitive. Please confirm the server implementation currently only supports bidi streaming voices and that non-Chirp voices fail predictably; otherwise soften the claim.
🌐 Web query:
Does Google Cloud Text-to-Speech streaming (StreamingAudioConfig) require “Chirp3” voices, and what are the supported speakingRate bounds for streaming?
💡 Result:
Short answer:
- No — StreamingAudioConfig (streaming TTS) does not require Chirp3 voices; it works with the supported voices listed in the Voices page (Wavenet, Neural2, Chirp/Chirp3, Studio, Standard, etc.). [1][2]
- The supported speakingRate bounds for streaming are 0.25 to 2.0 (values outside that range return an error; default is 1.0). [1]
Sources:
[1] Google Cloud Text‑to‑Speech StreamingAudioConfig doc (speakingRate range).
[2] Google Cloud Text‑to‑Speech supported voices list (shows available voice families including Chirp/Chirp3).
Clarify Google voice description in OpenAPI schema
Update the description to note that StreamingAudioConfig supports all streaming-compatible voice families (Wavenet, Neural2, Chirp3, Studio, Standard) rather than implying Chirp3-only.
File: src/libs/Ultravox/openapi.yaml Lines 4765-4769
- description: "A voice served by Google, using bidirectional streaming.\n (For non-streaming or output-only streaming, use generic.)"
+ description: "A voice served by Google using bidirectional streaming; supports all streaming-compatible voice families (Wavenet, Neural2, Chirp3, Studio, Standard).\n (For non-streaming or output-only streaming, use generic.)"
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
 google:
   allOf:
     - $ref: '#/components/schemas/ultravox.v1.GoogleVoice'
-  description: "A voice served by Google, using bidirectional streaming.\n (For non-streaming or output-only streaming, use generic.)"
+  description: "A voice served by Google using bidirectional streaming; supports all streaming-compatible voice families (Wavenet, Neural2, Chirp3, Studio, Standard).\n (For non-streaming or output-only streaming, use generic.)"
 generic:
🤖 Prompt for AI Agents
In src/libs/Ultravox/openapi.yaml around lines 4765-4769, the description for
the "google" voice currently implies Chirp3-only streaming support; update that
description string to state that StreamingAudioConfig supports all
streaming-compatible Google voice families (Wavenet, Neural2, Chirp3, Studio,
Standard) for bidirectional streaming, and reword to avoid implying exclusivity
(keep existing note about non-streaming/output-only using generic).
ultravox.v1.GoogleVoice:
  type: object
  properties:
    voiceId:
      type: string
      description: 'The ID (name) of the voice in Google, e.g. "en-US-Chirp3-HD-Charon".'
    speakingRate:
      type: number
      description: "The speaking rate. Must be between 0.25 and 2. Defaults to 1.\n See https://cloud.google.com/python/docs/reference/texttospeech/latest/google.cloud.texttospeech_v1.types.StreamingAudioConfig"
      format: float
  description: "Specification for a voice served by Google.\n This implementation uses bidirectional streaming, so voices prior to Chirp3 are not supported."
ultravox.v1.HeaderApiKeyRequirement:
💡 Verification agent
🧩 Analysis chain
Add validation bounds and a default for speakingRate; include an example voiceId.
Without min/max/default, clients can submit out-of-range values that will fail at runtime. This mirrors other provider schemas (e.g., LMNT speed 0.25–2, ElevenLabs speed defaults).
Proposed schema tweaks:
ultravox.v1.GoogleVoice:
type: object
properties:
voiceId:
type: string
- description: 'The ID (name) of the voice in Google, e.g. "en-US-Chirp3-HD-Charon".'
+ description: 'The ID (name) of the voice in Google.'
+ example: en-US-Chirp3-HD-Charon
speakingRate:
type: number
- description: "The speaking rate. Must be between 0.25 and 2. Defaults to 1.
- See https://cloud.google.com/python/docs/reference/texttospeech/latest/google.cloud.texttospeech_v1.types.StreamingAudioConfig"
format: float
+ minimum: 0.25
+ maximum: 2
+ default: 1
+ description: "The speaking rate.
+ See https://cloud.google.com/python/docs/reference/texttospeech/latest/google.cloud.texttospeech_v1.types.StreamingAudioConfig"
- description: "Specification for a voice served by Google.
- This implementation uses bidirectional streaming, so voices prior to Chirp3 are not supported."
+ description: "Specification for a voice served by Google. Uses bidirectional streaming."
Also, please verify that the referenced Python type path is correct for the current client library and version.
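Until the bounds are declared in the schema, a client can apply the same constraints before sending a request. This is a minimal sketch using the range and default quoted in the field description (0.25 to 2, default 1); `resolve_speaking_rate` and the constant names are hypothetical, and server-side behavior for out-of-range values is not confirmed here.

```python
from typing import Optional, Union

# Bounds and default as stated in the speakingRate description.
MIN_SPEAKING_RATE = 0.25
MAX_SPEAKING_RATE = 2.0
DEFAULT_SPEAKING_RATE = 1.0


def resolve_speaking_rate(rate: Optional[Union[int, float]] = None) -> float:
    """Apply the default and reject out-of-range rates (hypothetical helper)."""
    if rate is None:
        return DEFAULT_SPEAKING_RATE
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(rate, bool) or not isinstance(rate, (int, float)):
        raise TypeError("speakingRate must be a number")
    # NaN fails both comparisons and is rejected here as well.
    if not (MIN_SPEAKING_RATE <= rate <= MAX_SPEAKING_RATE):
        raise ValueError(
            f"speakingRate must be between {MIN_SPEAKING_RATE} and {MAX_SPEAKING_RATE}"
        )
    return float(rate)


if __name__ == "__main__":
    print(resolve_speaking_rate())      # default applied
    print(resolve_speaking_rate(1.25))  # in range, returned as float
```

Raising instead of clamping makes a bad value visible to the caller rather than silently changing the requested rate.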
🌐 Web query:
Is `google.cloud.texttospeech_v1.types.StreamingAudioConfig` the correct Python type path for Google Cloud TTS streaming configuration?
💡 Result:
Yes. The Python type is google.cloud.texttospeech_v1.types.StreamingAudioConfig. [1][2]
References:
- Library reference for StreamingAudioConfig. [1]
- Streaming quickstart / examples showing streaming config usage. [2]
Add validation bounds, default, and example to GoogleVoice schema
ultravox.v1.GoogleVoice:
type: object
properties:
voiceId:
- description: 'The ID (name) of the voice in Google, e.g. "en-US-Chirp3-HD-Charon".'
+ description: 'The ID (name) of the voice in Google.'
+ example: en-US-Chirp3-HD-Charon
speakingRate:
type: number
format: float
+ minimum: 0.25
+ maximum: 2
+ default: 1
description: "The speaking rate.
See https://cloud.google.com/python/docs/reference/texttospeech/latest/google.cloud.texttospeech_v1.types.StreamingAudioConfig"
- description: "Specification for a voice served by Google.
- This implementation uses bidirectional streaming, so voices prior to Chirp3 are not supported."
+ description: "Specification for a voice served by Google. Uses bidirectional streaming."
Verified that google.cloud.texttospeech_v1.types.StreamingAudioConfig is the correct Python type path.
📝 Committable suggestion
 ultravox.v1.GoogleVoice:
   type: object
   properties:
     voiceId:
       type: string
-      description: 'The ID (name) of the voice in Google, e.g. "en-US-Chirp3-HD-Charon".'
+      description: 'The ID (name) of the voice in Google.'
+      example: en-US-Chirp3-HD-Charon
     speakingRate:
       type: number
-      description: "The speaking rate. Must be between 0.25 and 2. Defaults to 1.\n See https://cloud.google.com/python/docs/reference/texttospeech/latest/google.cloud.texttospeech_v1.types.StreamingAudioConfig"
       format: float
+      minimum: 0.25
+      maximum: 2
+      default: 1
+      description: "The speaking rate.
+        See https://cloud.google.com/python/docs/reference/texttospeech/latest/google.cloud.texttospeech_v1.types.StreamingAudioConfig"
-  description: "Specification for a voice served by Google.\n This implementation uses bidirectional streaming, so voices prior to Chirp3 are not supported."
+  description: "Specification for a voice served by Google. Uses bidirectional streaming."
 ultravox.v1.HeaderApiKeyRequirement:
🤖 Prompt for AI Agents
In src/libs/Ultravox/openapi.yaml around lines 4854-4865, the
ultravox.v1.GoogleVoice schema needs explicit validation and examples: for
speakingRate add minimum: 0.25, maximum: 2, default: 1 and an example (e.g.
1.0); ensure format: float stays; for voiceId add an example (e.g.
"en-US-Chirp3-HD-Charon") and, if desired, a pattern or minLength to validate
non-empty strings; optionally add an overall example for the GoogleVoice object
to show both fields together.