📄 DocChat - AI-Powered Document Assistant

Transform your documents into intelligent conversations 🚀

DocChat is an AI-powered document assistant that lets you instantly chat with your files. Upload PDFs, Word documents, or text files — ask questions, get precise answers, and extract insights in seconds using advanced embeddings and natural language processing. Built for professionals, researchers, and teams that need fast, context-aware understanding of their documents.

✨ Features

🔍 Advanced Document Processing

  • Multi-format Support: Upload PDFs, DOCX files, and text documents
  • Intelligent Text Extraction: Advanced parsing with pdf2json and mammoth libraries
  • Smart Chunking: Intelligent text segmentation for optimal context retrieval
  • Vector Embeddings: Transform documents into searchable vector representations

🤖 AI-Powered Conversations

  • RAG (Retrieval-Augmented Generation): Combines document retrieval with AI generation
  • Context-Aware Responses: AI understands document context and provides relevant answers
  • Conversation Memory: Maintains chat history and session continuity
  • Natural Language Processing: Understands complex queries and provides human-like responses

🔐 Secure & Scalable

  • User Authentication: Supabase-powered authentication system
  • File Storage: Secure cloud storage with Supabase Storage
  • Session Management: Persistent chat sessions across browser sessions
  • Real-time Updates: Live chat interface with instant responses

📱 Modern UI/UX

  • Responsive Design: Works seamlessly on desktop and mobile devices
  • Dark Theme: Modern, eye-friendly dark interface
  • Real-time Feedback: Toast notifications and loading states
  • Intuitive Navigation: Sidebar file management and chat interface

🛠️ Technology Stack

Frontend

  • Next.js 15 - React framework with App Router
  • TypeScript - Type-safe development
  • Tailwind CSS - Utility-first styling
  • React Hot Toast - User notifications
  • Lucide React - Beautiful icons

Backend & AI

  • Supabase - Database, authentication, and storage
  • OpenRouter API - GPT-3.5-turbo integration
  • Xenova Transformers - Local embedding generation
  • Vector Search - Semantic document retrieval

Document Processing

  • pdf2json - PDF text extraction
  • mammoth - DOCX document parsing
  • Custom Chunking - Intelligent text segmentation
  • Embedding Pipeline - All-MiniLM-L6-v2 model
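
As a rough illustration of the extraction step for DOCX files, here is a minimal sketch using mammoth; the module path, function name, and whitespace cleanup are illustrative, not the repository's actual code:

// lib/extract-docx.ts -- hypothetical module name
import mammoth from "mammoth";

// Extract plain text from a DOCX buffer. mammoth.extractRawText ignores
// styling and returns the document body as one string in result.value.
export async function extractDocxText(buffer: Buffer): Promise<string> {
  const result = await mammoth.extractRawText({ buffer });
  // result.messages holds parser warnings (unsupported elements, etc.)
  return result.value.replace(/\s+/g, " ").trim();
}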

Database & Storage

  • PostgreSQL - Primary database (via Supabase)
  • pgvector - Vector similarity search
  • Supabase Storage - File upload and management

🚀 Getting Started

Prerequisites

  • Node.js 18+
  • npm, yarn, pnpm, or bun
  • Supabase account
  • OpenRouter API key

Installation

  1. Clone the repository
git clone <your-repo-url>
cd supabase-foundation
  2. Install dependencies
npm install
# or
yarn install
# or
pnpm install
  3. Environment Setup: Create a .env.local file with your configuration:
NEXT_PUBLIC_SUPABASE_URL=your_supabase_url
NEXT_PUBLIC_SUPABASE_ANON_KEY=your_supabase_anon_key
SUPABASE_SERVICE_ROLE_KEY=your_supabase_service_role_key
OPENROUTER_API_KEY=your_openrouter_api_key
  4. Database Setup: Set up your Supabase database with the required tables:
  • documents - Document metadata and text content
  • vectors - Vector embeddings for semantic search
  • sessions - Chat session management
  • messages - Chat message history
  5. Run the development server
npm run dev
# or
yarn dev
# or
pnpm dev

Open http://localhost:3000 to see your application!

🏗️ Architecture

Document Processing Pipeline

Upload → Parse → Chunk → Embed → Store → Query → RAG → Response
  1. Upload: User uploads PDF/DOCX file
  2. Parse: Extract text using specialized parsers
  3. Chunk: Split text into semantic chunks (600 chars max)
  4. Embed: Generate vector embeddings using All-MiniLM-L6-v2
  5. Store: Save chunks and embeddings to PostgreSQL with pgvector
  6. Query: Semantic search for relevant chunks
  7. RAG: Combine chunks with AI for context-aware responses
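
Steps 3 (Chunk) can be pictured as a sliding window over the extracted text. A minimal sketch, assuming a 600-character limit with a small overlap between chunks (the overlap size and function name are illustrative):

// Split extracted text into chunks of at most 600 characters.
// A small overlap between consecutive chunks preserves context across boundaries.
export function chunkText(text: string, maxChars = 600, overlap = 100): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    const end = Math.min(start + maxChars, text.length);
    chunks.push(text.slice(start, end).trim());
    if (end === text.length) break;
    start = end - overlap;
  }
  return chunks.filter((chunk) => chunk.length > 0);
}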

Vector Search Implementation

  • Embedding Model: All-MiniLM-L6-v2 (384 dimensions)
  • Similarity: Cosine similarity with 0.15 threshold
  • Retrieval: Top 5 most relevant chunks per query
  • Context: Document-specific search with session management
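
A sketch of the retrieval step using supabase-js. The match_vectors Postgres function and its parameter names are assumptions; the repository may expose the pgvector search differently, but the threshold (0.15) and result count (5) follow the settings above:

import { createClient } from "@supabase/supabase-js";

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
);

// Retrieve the top 5 chunks above the 0.15 cosine-similarity threshold
// for one document. match_vectors is a hypothetical Postgres function
// wrapping pgvector's cosine distance operator.
export async function retrieveChunks(queryEmbedding: number[], documentId: string) {
  const { data, error } = await supabase.rpc("match_vectors", {
    query_embedding: queryEmbedding,
    match_threshold: 0.15,
    match_count: 5,
    doc_id: documentId,
  });
  if (error) throw error;
  return data as { content: string; similarity: number }[];
}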

AI Integration

  • Model: GPT-3.5-turbo via OpenRouter
  • Context Window: 512 tokens max
  • Temperature: 0.1 for consistent responses
  • System Prompt: Optimized for document assistance
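
A hedged sketch of the generation call using OpenRouter's OpenAI-compatible chat completions endpoint; the system prompt and context formatting here are illustrative, while the model, temperature, and token cap follow the settings above:

// Call GPT-3.5-turbo through OpenRouter with retrieved chunks as context.
export async function askModel(question: string, contextChunks: string[]) {
  const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "openai/gpt-3.5-turbo",
      temperature: 0.1, // low temperature for consistent answers
      max_tokens: 512,  // cap on the generated response
      messages: [
        { role: "system", content: "Answer using only the provided document context." },
        { role: "user", content: `Context:\n${contextChunks.join("\n---\n")}\n\nQuestion: ${question}` },
      ],
    }),
  });
  const json = await res.json();
  return json.choices[0].message.content as string;
}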

📊 Key Features Explained

RAG (Retrieval-Augmented Generation)

DocChat implements a sophisticated RAG system:

  • Semantic Search: Uses vector embeddings to find relevant document chunks
  • Context Assembly: Combines retrieved chunks with user query
  • AI Generation: GPT-3.5-turbo generates responses based on document context
  • Conversation Memory: Maintains chat history for contextual understanding
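
A sketch of how retrieved chunks, prior turns, and the new question might be assembled into a single model input; the message shapes and system prompts are illustrative:

type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Assemble the model input: system prompt, retrieved chunks, chat history, new question.
export function buildMessages(
  chunks: string[],
  history: ChatMessage[],
  question: string
): ChatMessage[] {
  return [
    { role: "system", content: "You are a document assistant. Ground every answer in the context below." },
    { role: "system", content: `Document context:\n${chunks.join("\n---\n")}` },
    ...history, // earlier user/assistant turns provide conversation memory
    { role: "user", content: question },
  ];
}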

Vector Search

  • Local Embeddings: Uses Xenova Transformers for client-side embedding generation
  • pgvector Integration: PostgreSQL extension for efficient vector similarity search
  • Threshold-based Retrieval: Only returns chunks above similarity threshold
  • Document Isolation: Search within specific documents or across all documents
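
A minimal sketch of local embedding generation with @xenova/transformers; the cached-pipeline pattern is illustrative, but the model and 384-dimensional output match the configuration above:

import { pipeline } from "@xenova/transformers";

// Lazily create and cache the feature-extraction pipeline so the model
// is only loaded once per process.
let extractor: any = null;

export async function embed(text: string): Promise<number[]> {
  if (!extractor) {
    extractor = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2");
  }
  // Mean-pool and normalize to get a single 384-dimensional vector.
  const output = await extractor(text, { pooling: "mean", normalize: true });
  return Array.from(output.data as Float32Array);
}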

Session Management

  • Persistent Sessions: Chat sessions persist across browser sessions
  • Document Context: Each session is tied to a specific document
  • Message History: Maintains conversation context for better AI responses
  • Session Reuse: Automatically resumes existing sessions for documents
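
A sketch of the session-reuse pattern with supabase-js; table and column names follow the schema in the Configuration section, while the helper itself is illustrative:

import { createClient } from "@supabase/supabase-js";

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
);

// Reuse the latest chat session for a document, or create one if none exists.
export async function getOrCreateSession(userId: string, documentId: string) {
  const { data: existing } = await supabase
    .from("sessions")
    .select("id")
    .eq("user_id", userId)
    .eq("document_id", documentId)
    .order("created_at", { ascending: false })
    .limit(1)
    .maybeSingle();

  if (existing) return existing.id;

  const { data: created, error } = await supabase
    .from("sessions")
    .insert({ user_id: userId, document_id: documentId, name: "New chat" })
    .select("id")
    .single();

  if (error) throw error;
  return created.id;
}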

🔧 Configuration

Environment Variables

# Supabase Configuration
NEXT_PUBLIC_SUPABASE_URL=your_supabase_url
NEXT_PUBLIC_SUPABASE_ANON_KEY=your_supabase_anon_key
SUPABASE_SERVICE_ROLE_KEY=your_supabase_service_role_key

# AI Configuration
OPENROUTER_API_KEY=your_openrouter_api_key

# Optional: Customize AI model
AI_MODEL=openai/gpt-3.5-turbo

Database Schema

-- Documents table
CREATE TABLE documents (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID REFERENCES auth.users(id),
  name TEXT NOT NULL,
  path TEXT NOT NULL,
  ext TEXT NOT NULL,
  text TEXT,
  created_at TIMESTAMP DEFAULT NOW()
);

-- Vectors table for embeddings
CREATE TABLE vectors (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  document_id UUID REFERENCES documents(id),
  content TEXT NOT NULL,
  embedding vector(384),
  created_at TIMESTAMP DEFAULT NOW()
);

-- Sessions table
CREATE TABLE sessions (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id UUID REFERENCES auth.users(id),
  document_id UUID REFERENCES documents(id),
  name TEXT,
  created_at TIMESTAMP DEFAULT NOW()
);

-- Messages table
CREATE TABLE messages (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  session_id UUID REFERENCES sessions(id),
  role TEXT NOT NULL,
  content TEXT NOT NULL,
  created_at TIMESTAMP DEFAULT NOW()
);

🚀 Deployment

Vercel Deployment

  1. Connect your GitHub repository to Vercel
  2. Add environment variables in Vercel dashboard
  3. Deploy with automatic builds on push

Supabase Setup

  1. Create a new Supabase project
  2. Enable pgvector extension
  3. Run database migrations
  4. Configure storage buckets for file uploads
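
For step 4, uploads can go into a per-user folder of a storage bucket. A short sketch with supabase-js; the bucket name "documents" and path layout are assumptions:

import { createClient } from "@supabase/supabase-js";

const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY!
);

// Upload a user file into a per-user folder of the "documents" bucket.
export async function uploadDocument(userId: string, file: File) {
  const path = `${userId}/${Date.now()}-${file.name}`;
  const { data, error } = await supabase.storage
    .from("documents")
    .upload(path, file, { upsert: false });
  if (error) throw error;
  return data.path;
}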

🤝 Contributing

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

🙏 Acknowledgments

  • Supabase for the amazing backend-as-a-service
  • OpenRouter for seamless AI model access
  • Xenova for client-side transformer models
  • Next.js team for the incredible React framework
  • Vercel for seamless deployment

Built with ❤️ for the developer community

AI Powered by: https://openrouter.ai/openai/gpt-3.5-turbo
