Intelligent AI-powered voice assistant for automating phone calls
✨ Features • 🚀 Quick Start • 🎭 Personas • 🏗️ Architecture • 📖 Documentation
AI Phone Agent is a voice-based AI application that conducts real-time phone conversations using Google Gemini's audio streaming capabilities. It offers speech-to-speech interaction with customizable AI personas for use cases such as booking reservations, handling customer calls, and providing tech support.
| 🎙️ Real-time Voice | 🤖 Multiple Personas | 📝 Live Transcription | 🔊 Natural Speech |
|---|---|---|---|
| Bidirectional audio streaming | 5 built-in presets + custom | See conversations in real-time | Multiple voice options |
- 🗣️ Real-time Voice Conversations - Bidirectional audio streaming with Google Gemini
- 🎭 Customizable Personas - Switch between different AI personalities or create your own
- 📝 Live Transcription - See both user and agent speech transcribed in real-time
- 🔊 Multiple Voices - Choose from 5 different voice options (Puck, Charon, Kore, Fenrir, Zephyr)
- ⚡ Low Latency - Optimized audio pipeline for natural conversation flow
- 🎨 Modern UI - Clean, phone-like interface built with React and Tailwind CSS
- 📱 Responsive Design - Works seamlessly across devices
- 📦 Node.js (v18 or higher recommended)
- 🔑 Google Gemini API Key - Get one at Google AI Studio
# Clone the repository
git clone https://github.com/yourusername/ai-phone-agent.git
cd ai-phone-agent
# Install dependencies
npm install
# Configure environment
cp .env.example .env.local
Create a .env.local file in the root directory:
GEMINI_API_KEY=your_gemini_api_key_here
# Start development server
npm run dev
🎉 Open http://localhost:3000 in your browser!
AI Phone Agent comes with 5 pre-configured personas for common use cases:
| Persona | Description | Voice | Use Case |
|---|---|---|---|
| 🧑💼 Personal Assistant | Helpful assistant for general tasks | Kore | General inquiries & tasks |
| 🍽️ Restaurant Booker | Makes dinner reservations | Zephyr | Outbound booking calls |
| 🏢 Business Receptionist | Answers calls for TechSolutions Inc | Puck | Inbound business calls |
| 🔧 Tech Support | Troubleshoots internet issues | Fenrir | Customer support |
| 📋 Call Screener | Screens incoming calls | Charon | Call filtering |
Create your own persona by configuring:
- Name - Display name for the persona
- Voice - Choose from available voices
- System Instructions - Define the AI's behavior and role
- Greeting - Initial message spoken when call starts
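The four fields above map naturally onto a small typed object. The following is a minimal sketch, not the project's actual definitions (those live in types.ts and constants.ts and may differ): the `Persona` interface, the `validatePersona` helper, and the example persona are all hypothetical names introduced for illustration.

```typescript
// Hypothetical shape for a persona; the real type in types.ts may differ.
type VoiceName = "Puck" | "Charon" | "Kore" | "Fenrir" | "Zephyr";

interface Persona {
  name: string;              // Display name for the persona
  voice: VoiceName;          // One of the available voices
  systemInstruction: string; // Defines the AI's behavior and role
  greeting: string;          // Spoken when the call starts
}

const VOICES: readonly VoiceName[] = ["Puck", "Charon", "Kore", "Fenrir", "Zephyr"];

// Returns a list of problems; an empty list means the persona looks usable.
function validatePersona(p: Persona): string[] {
  const problems: string[] = [];
  if (p.name.trim() === "") problems.push("name is empty");
  if (VOICES.indexOf(p.voice) === -1) problems.push(`unknown voice: ${p.voice}`);
  if (p.systemInstruction.trim() === "") problems.push("systemInstruction is empty");
  if (p.greeting.trim() === "") problems.push("greeting is empty");
  return problems;
}

// Illustrative custom persona (made up for this example).
const pharmacyPersona: Persona = {
  name: "Pharmacy Refill Line",
  voice: "Kore",
  systemInstruction:
    "You handle prescription refill requests. Confirm the caller's name before anything else.",
  greeting: "Hello, you've reached the refill line. How can I help?",
};
```

Validating before a call starts keeps an empty greeting or instruction from reaching the model, where the failure would be harder to diagnose.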
| Category | Technology |
|---|---|
| ⚛️ Frontend | React 19 |
| 📘 Language | TypeScript 5.8 |
| ⚡ Build Tool | Vite 6 |
| 🤖 AI/ML | Google Gemini SDK |
| 🎨 Styling | Tailwind CSS |
| 🔊 Audio | Web Audio API |
ai-phone-agent/
├── 📁 components/ # React UI components
│ ├── CallScreen.tsx # Main call interface & audio handling
│ ├── WelcomeScreen.tsx # Persona selection screen
│ ├── StatusIndicator.tsx # Call status display
│ └── Icons.tsx # SVG icon components
├── 📁 services/
│ └── geminiService.ts # Gemini API integration
├── 📁 utils/
│ └── audioUtils.ts # Audio encoding utilities
├── 📄 App.tsx # Root component
├── 📄 types.ts # TypeScript definitions
├── 📄 constants.ts # Config & persona presets
└── 📄 vite.config.ts # Build configuration
┌─────────────┐ ┌──────────────┐ ┌─────────────┐
│ Microphone │────▶│ 16kHz PCM │────▶│ Gemini │
│ Input │ │ Base64 Encode│ │ Live API │
└─────────────┘ └──────────────┘ └──────┬──────┘
│
┌─────────────┐ ┌──────────────┐ │
│ Speaker │◀────│ 24kHz Decode │◀───────────┘
│ Output │ │ AudioBuffer │
└─────────────┘ └──────────────┘
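The two conversion steps in the diagram can be sketched as follows. This is an illustration under stated assumptions, not the code in utils/audioUtils.ts: it assumes 16-bit little-endian PCM in both directions, and the function names are hypothetical. Resampling to 16 kHz and scheduling playback at 24 kHz are left out.

```typescript
// Convert Float32 samples in [-1, 1] (Web Audio format) to 16-bit little-endian PCM bytes.
function floatTo16BitPCM(samples: Float32Array): Uint8Array {
  const bytes = new Uint8Array(samples.length * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to avoid wrap-around
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return bytes;
}

// Base64-encode raw bytes for transport in a JSON message.
function bytesToBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

// Decode a Base64 payload back into raw bytes.
function base64ToBytes(b64: string): Uint8Array {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return bytes;
}

// Convert 16-bit little-endian PCM bytes to Float32 samples for an AudioBuffer.
function pcm16ToFloat32(bytes: Uint8Array): Float32Array {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const out = new Float32Array(bytes.byteLength >> 1);
  for (let i = 0; i < out.length; i++) out[i] = view.getInt16(i * 2, true) / 0x8000;
  return out;
}
```

On the output side, the decoded `Float32Array` would typically be copied into an `AudioBuffer` created with a 24000 Hz sample rate before playback.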
| Command | Description |
|---|---|
| `npm run dev` | 🚀 Start development server |
| `npm run build` | 📦 Build for production |
| `npm run preview` | 👁️ Preview production build |
| Variable | Required | Description |
|---|---|---|
| `GEMINI_API_KEY` | ✅ Yes | Your Google Gemini API key |
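Because the key is required, it can help to fail fast with a clear message when it is missing. The helper below is a hypothetical sketch, not part of the project. How the variable actually reaches the code (for example through vite.config.ts) depends on the project's build setup, so the environment is passed in as a plain object.

```typescript
// Hypothetical helper: returns the value of a required variable or throws.
function requireEnv(env: Record<string, string | undefined>, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Example call in a Node context (assumption: process.env holds the loaded values):
//   const apiKey = requireEnv(process.env, "GEMINI_API_KEY");
```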
- Live Conversations: gemini-2.5-flash-native-audio-preview-09-2025
- Text-to-Speech: gemini-2.5-flash-preview-tts
- CLAUDE.MD - AI assistant context and codebase guide
- Google Gemini API - Gemini API documentation
- React Documentation - React framework docs
- Vite Guide - Vite build tool docs
# Create optimized build
npm run build
# Preview locally
npm run preview
The build output will be in the dist/ directory, ready for deployment to any static hosting service.
- ▲ Vercel - Zero-config deployment
- 🔷 Netlify - Simple drag & drop
- ☁️ Google Cloud Run - Containerized deployment
- 🅰️ AWS Amplify - Full-stack hosting
Note: HTTPS is required for microphone access in production environments.
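Browsers expose microphone capture only in secure contexts, which is why the development server on http://localhost works while a plain HTTP deployment does not. The sketch below is a rough heuristic for checking a deployment URL ahead of time. The function name is hypothetical. In the browser itself, `window.isSecureContext` is the authoritative check.

```typescript
// Heuristic: does this page URL look like an origin where the browser allows microphone access?
// Covers HTTPS plus the usual local development hosts; it is not a full secure-context check.
function isLikelySecureOrigin(pageUrl: string): boolean {
  const { protocol, hostname } = new URL(pageUrl);
  if (protocol === "https:") return true;
  const isLocalHost =
    hostname === "localhost" || hostname === "127.0.0.1" || hostname.endsWith(".localhost");
  return protocol === "http:" && isLocalHost;
}
```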
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Google Gemini - Powering the AI conversations
- React - UI framework
- Vite - Lightning fast build tool
- Tailwind CSS - Utility-first CSS framework
Built with Google Gemini by Anthony M