📞 AI Phone Agent

AI-powered voice assistant for automating phone calls

✨ Features • 🚀 Quick Start • 🎭 Personas • 🏗️ Architecture • 📖 Documentation

🌟 Overview

AI Phone Agent is a voice-based AI application that conducts real-time phone conversations using Google Gemini's audio streaming capabilities. It supports speech-to-speech interaction with customizable AI personas for use cases such as booking reservations, handling customer calls, and providing tech support.

🎙️ Real-time Voice	🤖 Multiple Personas	📝 Live Transcription	🔊 Natural Speech
Bidirectional audio streaming	5 built-in presets + custom	See conversations in real-time	Multiple voice options

✨ Features

🗣️ Real-time Voice Conversations - Bidirectional audio streaming with Google Gemini
🎭 Customizable Personas - Switch between different AI personalities or create your own
📝 Live Transcription - See both user and agent speech transcribed in real-time
🔊 Multiple Voices - Choose from 5 different voice options (Puck, Charon, Kore, Fenrir, Zephyr)
⚡ Low Latency - Optimized audio pipeline for natural conversation flow
🎨 Modern UI - Clean, phone-like interface built with React and Tailwind CSS
📱 Responsive Design - Works seamlessly across devices

🚀 Quick Start

Prerequisites

📦 Node.js (v18 or higher recommended)
🔑 Google Gemini API Key - Get one at Google AI Studio

Installation

# Clone the repository
git clone https://github.com/yourusername/ai-phone-agent.git
cd ai-phone-agent

# Install dependencies
npm install

# Configure environment
cp .env.example .env.local

Configuration

Open the .env.local file copied during installation (or create it in the root directory if it does not exist) and set your API key:

GEMINI_API_KEY=your_gemini_api_key_here

Running the App

# Start development server
npm run dev

🎉 Open http://localhost:3000 in your browser!

🎭 Personas

AI Phone Agent comes with 5 pre-configured personas for common use cases:

Persona	Description	Voice	Use Case
🧑‍💼 Personal Assistant	Helpful assistant for general tasks	Kore	General inquiries & tasks
🍽️ Restaurant Booker	Makes dinner reservations	Zephyr	Outbound booking calls
🏢 Business Receptionist	Answers calls for TechSolutions Inc	Puck	Inbound business calls
🔧 Tech Support	Troubleshoots internet issues	Fenrir	Customer support
📋 Call Screener	Screens incoming calls	Charon	Call filtering

Custom Personas

Create your own persona by configuring:

Name - Display name for the persona
Voice - Choose from available voices
System Instructions - Define the AI's behavior and role
Greeting - Initial message spoken when call starts
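
As a rough illustration of the four fields above, a custom persona could be modeled as follows. This is only a sketch: the names `PersonaConfig`, `VoiceName`, `validatePersona`, and the example persona are hypothetical, and the project's real definitions in types.ts and constants.ts may be shaped differently.

```typescript
// Voice names as listed in the Features section.
const VOICES = ["Puck", "Charon", "Kore", "Fenrir", "Zephyr"] as const;
type VoiceName = (typeof VOICES)[number];

// Hypothetical shape for a persona; field names are assumptions.
interface PersonaConfig {
  name: string;              // Display name for the persona
  voice: VoiceName;          // One of the available voices
  systemInstruction: string; // Defines the AI's behavior and role
  greeting: string;          // Spoken when the call starts
}

// Returns a list of problems; an empty list means the persona looks usable.
function validatePersona(persona: PersonaConfig): string[] {
  const problems: string[] = [];
  if (persona.name.trim() === "") problems.push("name is empty");
  if (!(VOICES as readonly string[]).includes(persona.voice)) {
    problems.push("unknown voice");
  }
  if (persona.systemInstruction.trim() === "") {
    problems.push("system instruction is empty");
  }
  if (persona.greeting.trim() === "") problems.push("greeting is empty");
  return problems;
}

// Illustrative persona, not one of the built-in presets.
const petGroomerPersona: PersonaConfig = {
  name: "Pet Groomer Scheduler",
  voice: "Kore",
  systemInstruction:
    "You are a friendly scheduler for a pet grooming salon. " +
    "Collect the pet's name, breed, and a preferred appointment time.",
  greeting: "Hi, thanks for calling! How can I help you and your pet today?",
};
```

Validating before starting a call keeps an empty greeting or instruction from reaching the model; whether the app performs a check like this is not stated here.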

🛠️ Tech Stack

Category	Technology
⚛️ Frontend	React 19
📘 Language	TypeScript 5.8
⚡ Build Tool	Vite 6
🤖 AI/ML	Google Gemini SDK
🎨 Styling	Tailwind CSS
🔊 Audio	Web Audio API

🏗️ Architecture

ai-phone-agent/
├── 📁 components/           # React UI components
│   ├── CallScreen.tsx       # Main call interface & audio handling
│   ├── WelcomeScreen.tsx    # Persona selection screen
│   ├── StatusIndicator.tsx  # Call status display
│   └── Icons.tsx            # SVG icon components
├── 📁 services/
│   └── geminiService.ts     # Gemini API integration
├── 📁 utils/
│   └── audioUtils.ts        # Audio encoding utilities
├── 📄 App.tsx               # Root component
├── 📄 types.ts              # TypeScript definitions
├── 📄 constants.ts          # Config & persona presets
└── 📄 vite.config.ts        # Build configuration

Audio Pipeline

┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│  Microphone │────▶│ 16kHz PCM    │────▶│   Gemini    │
│   Input     │     │ Base64 Encode│     │   Live API  │
└─────────────┘     └──────────────┘     └──────┬──────┘
                                                │
┌─────────────┐     ┌──────────────┐            │
│   Speaker   │◀────│ 24kHz Decode │◀───────────┘
│   Output    │     │ AudioBuffer  │
└─────────────┘     └──────────────┘
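
The encode and decode stages in the diagram can be sketched roughly as below. This is an illustration under assumptions, not the contents of audioUtils.ts: it assumes 16-bit little-endian PCM samples, and the function names are hypothetical. Resampling the microphone signal to 16 kHz and scheduling playback at 24 kHz are left out.

```typescript
// Sample rates taken from the diagram above.
const INPUT_SAMPLE_RATE = 16000;  // microphone audio sent to Gemini
const OUTPUT_SAMPLE_RATE = 24000; // audio received from Gemini

// Convert Web Audio style Float32 samples in [-1, 1] to 16-bit PCM bytes.
function floatTo16BitPcm(samples: Float32Array): Uint8Array {
  const bytes = new Uint8Array(samples.length * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    // int16 is asymmetric: -32768..32767
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(i * 2, Math.round(scaled), true); // true = little-endian
  }
  return bytes;
}

// Base64-encode raw bytes for transport in a JSON message.
function bytesToBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Decode a Base64 payload back into raw bytes.
function base64ToBytes(b64: string): Uint8Array {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Convert 16-bit PCM bytes to Float32 samples, ready to copy into an AudioBuffer.
function pcm16ToFloat(bytes: Uint8Array): Float32Array {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const out = new Float32Array(bytes.byteLength >> 1);
  for (let i = 0; i < out.length; i++) {
    out[i] = view.getInt16(i * 2, true) / 0x8000;
  }
  return out;
}
```

In a browser, the decoded samples would typically be copied into an AudioBuffer created with `audioContext.createBuffer(1, samples.length, OUTPUT_SAMPLE_RATE)` before playback.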

📝 Scripts

Command	Description
`npm run dev`	🚀 Start development server
`npm run build`	📦 Build for production
`npm run preview`	👁️ Preview production build

🔧 Configuration

Environment Variables

Variable	Required	Description
`GEMINI_API_KEY`	✅ Yes	Your Google Gemini API key
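
Because the key is required, it can help to fail early with a clear message when it is missing. The helper below is a hypothetical sketch, not part of the project: `requireApiKey` is an invented name, and how the variable actually reaches the app (for example through vite.config.ts) is not described here, so the environment is passed in as a plain object.

```typescript
// Placeholder value shown in the Quick Start configuration step.
const PLACEHOLDER_KEY = "your_gemini_api_key_here";

// Returns the trimmed key, or throws if it is absent or still the placeholder.
function requireApiKey(env: Record<string, string | undefined>): string {
  const key = (env["GEMINI_API_KEY"] ?? "").trim();
  if (key === "" || key === PLACEHOLDER_KEY) {
    throw new Error(
      "GEMINI_API_KEY is missing or still set to the placeholder; check .env.local"
    );
  }
  return key;
}
```

A caller would pass whatever object holds the loaded environment values, for instance `requireApiKey({ GEMINI_API_KEY: valueLoadedFromEnvFile })`.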

Gemini Models Used

Live Conversations: gemini-2.5-flash-native-audio-preview-09-2025
Text-to-Speech: gemini-2.5-flash-preview-tts

📖 Documentation

CLAUDE.MD - AI assistant context and codebase guide
Google Gemini API - Gemini API documentation
React Documentation - React framework docs
Vite Guide - Vite build tool docs

🌐 Deployment

Production Build

# Create optimized build
npm run build

# Preview locally
npm run preview

The build output will be in the dist/ directory, ready for deployment to any static hosting service.

Hosting Options

▲ Vercel - Zero-config deployment
🔷 Netlify - Simple drag & drop
☁️ Google Cloud Run - Containerized deployment
🅰️ AWS Amplify - Full-stack hosting

Note: HTTPS is required for microphone access in production environments.
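
The reason is that browsers expose microphone capture only in secure contexts, which in practice means HTTPS pages plus loopback addresses such as localhost during development. In the browser the authoritative check is `window.isSecureContext`; the function below is a simplified, hypothetical approximation of that rule that can be used to show a helpful message before requesting the microphone.

```typescript
// Rough approximation of the browser's secure-context rule.
// `protocol` and `hostname` follow the format of window.location,
// e.g. protocol "https:" and hostname "example.com".
function isLikelySecureOrigin(protocol: string, hostname: string): boolean {
  if (protocol === "https:") return true;
  // Loopback hosts are treated as trustworthy even over plain HTTP.
  const loopbackHosts = ["localhost", "127.0.0.1", "[::1]"];
  return protocol === "http:" && loopbackHosts.includes(hostname);
}
```

In the app this could be called as `isLikelySecureOrigin(window.location.protocol, window.location.hostname)` before starting a call, with a warning shown when it returns false.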

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Fork the repository
Create your feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Google Gemini - Powering the AI conversations
React - UI framework
Vite - Lightning fast build tool
Tailwind CSS - Utility-first CSS framework

Built with Google Gemini by Anthony M

