Voice-to-text dictation example using the useVoiceInput hook from @cloudflare/voice.
Captures microphone audio, streams it to an Agent Durable Object for real-time speech-to-text using Workers AI, and displays the transcript in a text area.
npm install && npm start

No API keys needed; it uses Workers AI (bound via wrangler.jsonc).
Uses withVoiceInput — a lightweight mixin that only does STT. No TTS provider, no onTurn handler needed:
import { Agent } from "agents";
import { withVoiceInput, WorkersAINova3STT } from "@cloudflare/voice";
const InputAgent = withVoiceInput(Agent);
export class VoiceInputAgent extends InputAgent<Env> {
  transcriber = new WorkersAINova3STT(this.env.AI);

  onTranscript(text, connection) {
    console.log("User said:", text);
  }
}

Uses useVoiceInput, a lightweight React hook that accumulates transcripts into a single string:
import { useVoiceInput } from "@cloudflare/voice/react";
const { transcript, interimTranscript, isListening, start, stop, clear } =
  useVoiceInput({ agent: "VoiceInputAgent" });

Returns:
- transcript: accumulated final text from all utterances
- interimTranscript: real-time partial transcript (updates as you speak)
- isListening: whether the mic is active
- audioLevel: current audio level for visual feedback
- start() / stop(): control listening
- toggleMute(): mute without stopping
- clear(): reset the transcript
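
The returned values usually need a little glue before they reach the UI. The sketch below shows one possible way to merge transcript and interimTranscript into the text shown in the text area, and to turn audioLevel into a meter width. Both helper names (composeDisplayText, levelToPercent) are hypothetical and not part of @cloudflare/voice. The sketch also assumes that interimTranscript may be empty or null and that audioLevel falls in the 0 to 1 range; neither is stated above, so check them against the package.

```typescript
// Hypothetical helpers for illustration; not part of @cloudflare/voice.

/** Join the accumulated final transcript with the in-progress partial one. */
function composeDisplayText(
  transcript: string,
  interimTranscript: string | null | undefined
): string {
  const finalText = transcript.trim();
  const partial = (interimTranscript ?? "").trim();
  if (!partial) return finalText; // nothing in progress: show final text only
  if (!finalText) return partial; // first utterance still in progress
  return `${finalText} ${partial}`;
}

/** Map an audio level to a 0-100 meter width. Assumes a 0..1 input range. */
function levelToPercent(audioLevel: number): number {
  if (!Number.isFinite(audioLevel)) return 0;
  const clamped = Math.min(1, Math.max(0, audioLevel));
  return Math.round(clamped * 100);
}
```

In a component, the text area value could then be composeDisplayText(transcript, interimTranscript), so partial words appear while the user speaks and are replaced once the utterance is finalized.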
- examples/playground: full voice agent with conversation
- @cloudflare/voice: the voice package