
📞 AI Phone Agent

Intelligent AI-powered voice assistant for automating phone calls

✨ Features · 🚀 Quick Start · 🎭 Personas · 🏗️ Architecture · 📖 Documentation


🌟 Overview

AI Phone Agent is a voice-based AI application that conducts real-time phone conversations using Google Gemini's audio streaming capabilities. It offers speech-to-speech interaction with customizable AI personas for use cases such as booking reservations, handling customer calls, and providing tech support.

| 🎙️ Real-time Voice | 🤖 Multiple Personas | 📝 Live Transcription | 🔊 Natural Speech |
|---|---|---|---|
| Bidirectional audio streaming | 5 built-in presets + custom | See conversations in real-time | Multiple voice options |

✨ Features

  • 🗣️ Real-time Voice Conversations - Bidirectional audio streaming with Google Gemini
  • 🎭 Customizable Personas - Switch between different AI personalities or create your own
  • 📝 Live Transcription - See both user and agent speech transcribed in real time
  • 🔊 Multiple Voices - Choose from 5 different voice options (Puck, Charon, Kore, Fenrir, Zephyr)
  • Low Latency - Optimized audio pipeline for natural conversation flow
  • 🎨 Modern UI - Clean, phone-like interface built with React and Tailwind CSS
  • 📱 Responsive Design - Works seamlessly across devices

🚀 Quick Start

Prerequisites

  • 📦 Node.js (v18 or higher recommended)
  • 🔑 Google Gemini API Key - Get one at Google AI Studio

Installation

# Clone the repository
git clone https://github.com/yourusername/ai-phone-agent.git
cd ai-phone-agent

# Install dependencies
npm install

# Configure environment
cp .env.example .env.local

Configuration

Open the .env.local file in the root directory (create it if the copy step above was skipped) and set your API key:

GEMINI_API_KEY=your_gemini_api_key_here

Running the App

# Start development server
npm run dev

🎉 Open http://localhost:3000 in your browser!


🎭 Personas

AI Phone Agent comes with 5 pre-configured personas for common use cases:

| Persona | Description | Voice | Use Case |
|---|---|---|---|
| 🧑‍💼 Personal Assistant | Helpful assistant for general tasks | Kore | General inquiries & tasks |
| 🍽️ Restaurant Booker | Makes dinner reservations | Zephyr | Outbound booking calls |
| 🏢 Business Receptionist | Answers calls for TechSolutions Inc | Puck | Inbound business calls |
| 🔧 Tech Support | Troubleshoots internet issues | Fenrir | Customer support |
| 📋 Call Screener | Screens incoming calls | Charon | Call filtering |

Custom Personas

Create your own persona by configuring:

  • Name - Display name for the persona
  • Voice - Choose from available voices
  • System Instructions - Define the AI's behavior and role
  • Greeting - Initial message spoken when call starts
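
The four fields above map naturally onto a small typed object. The following is a minimal sketch only: the `Persona` interface, its field names, and the `validatePersona` helper are hypothetical illustrations, and the project's real definitions in types.ts and constants.ts may differ. The voice names are the five listed under Features.

```typescript
// Hypothetical shape for a custom persona; not taken from the repository.
type VoiceName = "Puck" | "Charon" | "Kore" | "Fenrir" | "Zephyr";

interface Persona {
  name: string;              // Display name for the persona
  voice: VoiceName;          // One of the available voices
  systemInstruction: string; // Defines the AI's behavior and role
  greeting: string;          // Spoken when the call starts
}

const VOICES: readonly VoiceName[] = ["Puck", "Charon", "Kore", "Fenrir", "Zephyr"];

// Returns a list of human-readable problems; an empty list means the persona is usable.
function validatePersona(p: Persona): string[] {
  const problems: string[] = [];
  if (p.name.trim() === "") problems.push("name is empty");
  if (!VOICES.includes(p.voice)) problems.push(`unknown voice: ${p.voice}`);
  if (p.systemInstruction.trim() === "") problems.push("systemInstruction is empty");
  if (p.greeting.trim() === "") problems.push("greeting is empty");
  return problems;
}

// Example custom persona (illustrative content only).
const petSitter: Persona = {
  name: "Pet Sitter Scheduler",
  voice: "Kore",
  systemInstruction: "You schedule pet-sitting visits. Confirm dates, times, and the pet's name.",
  greeting: "Hi, I'm calling to arrange a pet-sitting visit.",
};
```

Calling `validatePersona(petSitter)` returns an empty array, while a persona with a blank name would yield `["name is empty"]`. Returning a list rather than throwing lets a form show every problem at once.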

🛠️ Tech Stack

| Category | Technology |
|---|---|
| ⚛️ Frontend | React 19 |
| 📘 Language | TypeScript 5.8 |
| Build Tool | Vite 6 |
| 🤖 AI/ML | Google Gemini SDK |
| 🎨 Styling | Tailwind CSS |
| 🔊 Audio | Web Audio API |

🏗️ Architecture

ai-phone-agent/
├── 📁 components/           # React UI components
│   ├── CallScreen.tsx       # Main call interface & audio handling
│   ├── WelcomeScreen.tsx    # Persona selection screen
│   ├── StatusIndicator.tsx  # Call status display
│   └── Icons.tsx            # SVG icon components
├── 📁 services/
│   └── geminiService.ts     # Gemini API integration
├── 📁 utils/
│   └── audioUtils.ts        # Audio encoding utilities
├── 📄 App.tsx               # Root component
├── 📄 types.ts              # TypeScript definitions
├── 📄 constants.ts          # Config & persona presets
└── 📄 vite.config.ts        # Build configuration

Audio Pipeline

┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│  Microphone │────▶│ 16kHz PCM    │────▶│   Gemini    │
│   Input     │     │ Base64 Encode│     │   Live API  │
└─────────────┘     └──────────────┘     └──────┬──────┘
                                                │
┌─────────────┐     ┌──────────────┐            │
│   Speaker   │◀────│ 24kHz Decode │◀───────────┘
│   Output    │     │ AudioBuffer  │
└─────────────┘     └──────────────┘
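
The two conversion boxes in the diagram can be sketched as plain functions. This is an illustration under stated assumptions, not the code in utils/audioUtils.ts: the function names are hypothetical, the audio is assumed to be mono 16-bit little-endian PCM, and resampling the microphone input to 16 kHz is assumed to happen before these functions are called.

```typescript
// Hypothetical helpers; names and structure are not taken from the repository.

// Convert Web Audio float samples in [-1, 1] to signed 16-bit PCM.
function floatTo16BitPCM(samples: Float32Array): Int16Array {
  const pcm = new Int16Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp to avoid wrap-around
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return pcm;
}

function bytesToBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

function base64ToBytes(b64: string): Uint8Array {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return bytes;
}

// Microphone side: float samples -> 16-bit PCM -> Base64 string.
// Assumes a little-endian host, which is what typed arrays use on common platforms.
function encodeMicChunk(samples: Float32Array): string {
  const pcm = floatTo16BitPCM(samples);
  return bytesToBase64(new Uint8Array(pcm.buffer));
}

// Speaker side: Base64 string -> 16-bit PCM -> float samples in [-1, 1).
function decodeAgentChunk(b64: string): Float32Array {
  const bytes = base64ToBytes(b64);
  const pcm = new Int16Array(bytes.buffer, 0, Math.floor(bytes.length / 2));
  const out = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) out[i] = pcm[i] / 0x8000;
  return out;
}
```

In the browser, the decoded samples could then be copied into a buffer created with `audioContext.createBuffer(1, samples.length, 24000)` using `copyToChannel`, and played through an `AudioBufferSourceNode`. Keeping the conversions free of Web Audio objects makes them easy to test outside a browser.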

📝 Scripts

| Command | Description |
|---|---|
| npm run dev | 🚀 Start development server |
| npm run build | 📦 Build for production |
| npm run preview | 👁️ Preview production build |

🔧 Configuration

Environment Variables

| Variable | Required | Description |
|---|---|---|
| GEMINI_API_KEY | ✅ Yes | Your Google Gemini API key |
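
Because the key is required, it can help to fail early with a clear message when it is missing. The sketch below is a hypothetical helper, not part of the repository; how the environment object reaches application code (for example through whatever vite.config.ts exposes) is an assumption left to the caller.

```typescript
// Hypothetical helper; not taken from the repository.
// The placeholder matches the sample value shown in the Quick Start configuration.
const PLACEHOLDER = "your_gemini_api_key_here";

// Returns the trimmed key, or throws if it is absent or still the placeholder.
function readApiKey(env: Record<string, string | undefined>): string {
  const key = (env["GEMINI_API_KEY"] ?? "").trim();
  if (key === "" || key === PLACEHOLDER) {
    throw new Error(
      "GEMINI_API_KEY is missing or still set to the placeholder; add it to .env.local"
    );
  }
  return key;
}
```

A caller would pass in whichever environment object the build makes available, for instance `readApiKey({ GEMINI_API_KEY: valueFromConfig })`, where `valueFromConfig` is a stand-in name.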

Gemini Models Used

  • Live Conversations: gemini-2.5-flash-native-audio-preview-09-2025
  • Text-to-Speech: gemini-2.5-flash-preview-tts

📖 Documentation


🌐 Deployment

Production Build

# Create optimized build
npm run build

# Preview locally
npm run preview

The build output will be in the dist/ directory, ready for deployment to any static hosting service.

Hosting Options

  • Vercel - Zero-config deployment
  • 🔷 Netlify - Simple drag & drop
  • ☁️ Google Cloud Run - Containerized deployment
  • 🅰️ AWS Amplify - Full-stack hosting

Note: HTTPS is required for microphone access in production environments.


🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🙏 Acknowledgments


Built with Google Gemini by Anthony M
