Detects AI-generated (deepfake) voices using MFCC features with CNN, Random Forest, and KNN models. Achieves ~98% accuracy using engineered audio features and the SceneFake dataset.

Srujanrana07/DeepFake-Voice-Detection

🔊 DeepFake Voice Detection

A robust machine learning system for identifying AI-generated (deepfake) voices using MFCC features, spectral audio statistics, and a combination of Deep Learning (CNN) and Classical ML (Random Forest, KNN) models.


⭐ Key Features

  • Classifies real vs. fake speech with 98%+ accuracy using Random Forest and KNN (the MFCC-only CNN reaches ~88%)
  • Uses MFCCs, Mel-spectrogram statistics, and spectral features
  • Implements 1D CNN, Random Forest, and KNN
  • Includes full evaluation metrics with placeholders for all plots
  • Built to support future deployment for fraud and security applications

📂 Dataset

  • Audio format: 16 kHz, mono WAV
  • Classes: Real, Fake
  • Includes various SNR levels and noise-reduction methods
  • Dataset contains class imbalance, handled using SMOTE
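The oversampling idea behind SMOTE can be sketched in a few lines of NumPy. This is a simplified illustration, not the project's code: the function name `smote_like_oversample` and its parameters are hypothetical, and a real pipeline would more likely call `SMOTE` from the imbalanced-learn package. Each synthetic sample is placed on the line segment between a minority-class sample and one of its nearest minority-class neighbours.

```python
import numpy as np


def smote_like_oversample(X_min, n_new, k=5, seed=0):
    """Return n_new synthetic minority samples (simplified SMOTE-style sketch).

    X_min : (n_samples, n_features) array holding only minority-class rows.
    """
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    k = min(k, n - 1)  # cannot have more neighbours than other samples

    # Pairwise squared distances inside the minority class
    # (O(n^2) memory, fine for a small illustration).
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]

    base = rng.integers(0, n, size=n_new)  # anchor sample for each new point
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    return X_min[base] + gap * (X_min[picked] - X_min[base])


# Toy data standing in for minority-class feature vectors.
rng = np.random.default_rng(1)
X_minority = rng.normal(size=(20, 5))
X_synthetic = smote_like_oversample(X_minority, n_new=30)
print(X_synthetic.shape)
```

Oversampling of this kind is normally applied to the training split only, so that synthetic points never leak into the dev or eval data.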

🎧 Feature Extraction

MFCC Features (40-D)

  • Extracted using Librosa
  • Mean-pooled across time
  • Used as input to the 1D CNN
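A minimal sketch of this step, assuming default Librosa framing parameters (the README does not state window or hop sizes). The helper names `mean_pool` and `mfcc_vector` are hypothetical; `librosa.load` and `librosa.feature.mfcc` are the Librosa calls used.

```python
import numpy as np

SAMPLE_RATE = 16_000  # matches the 16 kHz mono WAV format described above
N_MFCC = 40           # matches the 40-D MFCC vector


def mean_pool(frames):
    """Average a (n_coefficients, n_frames) matrix over its time axis."""
    return np.asarray(frames, dtype=np.float32).mean(axis=1)


def mfcc_vector(path):
    """Load one WAV file and return a (40, 1) array for a 1D CNN."""
    # Third-party dependency, imported here so mean_pool works without it.
    import librosa

    y, sr = librosa.load(path, sr=SAMPLE_RATE, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=N_MFCC)  # (40, n_frames)
    return mean_pool(mfcc)[:, np.newaxis]                   # (40, 1)
```

Calling `mfcc_vector` on a path of your own yields one CNN input; stacking the results over a dataset gives an array of shape `(n_clips, 40, 1)`.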

Engineered Audio Features (318-D)

  • MFCC means
  • Mel-Spectrogram means
  • Log-Spectrogram (STFT) means
  • Used for Random Forest and KNN models
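The engineered vector is a concatenation of per-band time means, which can be sketched as below. The README gives only the total of 318 dimensions, not how it splits across the three blocks, so the 40 + 128 + 150 split used here is a hypothetical chosen only so that the sizes add up. The function name `engineered_vector` is also an assumption, and random matrices stand in for real features.

```python
import numpy as np


def engineered_vector(mfcc, mel_spec, log_stft):
    """Concatenate the time means of three (n_bands, n_frames) matrices."""
    blocks = [np.asarray(m).mean(axis=1) for m in (mfcc, mel_spec, log_stft)]
    return np.concatenate(blocks)


# Random stand-ins for real feature matrices; block sizes are hypothetical.
rng = np.random.default_rng(0)
n_frames = 50
vec = engineered_vector(
    rng.normal(size=(40, n_frames)),   # MFCCs
    rng.normal(size=(128, n_frames)),  # Mel-spectrogram bands
    rng.normal(size=(150, n_frames)),  # log-STFT frequency bins
)
print(vec.shape)  # (318,)
```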

🏗️ System Architecture

           ┌──────────────── Preprocessing ────────────────┐
           │                                               │
Audio → Load → Normalize → MFCC / Spectrogram Extraction → Features
           │                                               │
           └──────────────────────┬────────────────────────┘
                                  │
       ┌──────────────────────────┴──────────────────────────┐
       │                                                     │
  318-D Engineered Features                          MFCC Map (40×1)
       │                                                     │
Random Forest / KNN                                     1D CNN Model
       │                                                     │
       └──────────────────────────┬──────────────────────────┘
                                  ↓
                           **Real / Fake**

🧠 Models Implemented

1️⃣ 1D CNN (MFCC-Based)

✔ Architecture

  • Conv1D → Dropout
  • MaxPooling
  • Conv1D → Dropout
  • Dense + Softmax
  • Trained for 40 epochs
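The layer order above could be written in Keras roughly as follows. Only the layer sequence, the 40×1 input, and the softmax output come from this README. Filter counts, kernel sizes, dropout rates, the `Flatten` layer before the dense head, the optimizer, and the loss are all assumptions made to get a runnable sketch, and `build_cnn` is a hypothetical name.

```python
from tensorflow.keras import layers, models


def build_cnn(n_mfcc=40, n_classes=2):
    """Conv1D -> Dropout -> MaxPooling -> Conv1D -> Dropout -> Dense + Softmax."""
    model = models.Sequential([
        layers.Input(shape=(n_mfcc, 1)),               # one mean-pooled MFCC vector
        layers.Conv1D(32, 3, padding="same", activation="relu"),  # assumed sizes
        layers.Dropout(0.3),                           # assumed rate
        layers.MaxPooling1D(pool_size=2),
        layers.Conv1D(64, 3, padding="same", activation="relu"),  # assumed sizes
        layers.Dropout(0.3),                           # assumed rate
        layers.Flatten(),                              # assumed; bridges conv and dense
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(
        optimizer="adam",                              # assumed optimizer
        loss="sparse_categorical_crossentropy",        # integer labels 0/1
        metrics=["accuracy"],
    )
    return model


model = build_cnn()
model.summary()
```

Training for the stated 40 epochs would then be a call such as `model.fit(X_train, y_train, epochs=40, validation_data=(X_dev, y_dev))`, where the array names are placeholders for your own splits.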

✔ Features Used

CNN Features

✔ Performance

  • Dev Accuracy: ~86%
  • Eval Accuracy: ~88%
  • Strong on detecting fake audio
  • Mild overfitting

📊 CNN Evaluation (Placeholders)

🟦 Training Accuracy Curve

CNN Accuracy Curve

🟥 Training Loss Curve

CNN Loss Curve

🟩 Confusion Matrix (CNN)

CNN Confusion Matrix

🟪 ROC Curve (CNN)

CNN ROC Curve

🟨 Precision–Recall Curve (CNN)

CNN Precision–Recall Curve
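The numbers behind the confusion matrix, ROC curve, and precision–recall curve can be computed with scikit-learn's metrics module. The sketch below uses four made-up labels and scores rather than this project's predictions, and assumes the label convention 0 = real, 1 = fake.

```python
import numpy as np
from sklearn.metrics import (
    average_precision_score,
    confusion_matrix,
    precision_recall_curve,
    roc_auc_score,
    roc_curve,
)

# Toy labels and scores (assumed convention: 0 = real, 1 = fake).
y_true = np.array([0, 0, 1, 1])
p_fake = np.array([0.1, 0.8, 0.7, 0.9])  # e.g. a model's probability for "fake"
y_pred = (p_fake >= 0.5).astype(int)     # hard decisions at a 0.5 threshold

cm = confusion_matrix(y_true, y_pred)    # rows: true class, columns: predicted
fpr, tpr, _ = roc_curve(y_true, p_fake)  # points of the ROC curve
auc = roc_auc_score(y_true, p_fake)
precision, recall, _ = precision_recall_curve(y_true, p_fake)
ap = average_precision_score(y_true, p_fake)

print(cm)
print(f"ROC AUC = {auc:.2f}, average precision = {ap:.2f}")
```

The `fpr`/`tpr` and `precision`/`recall` arrays are what a plotting library would draw to produce the curves shown in this section.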

✔ Features Used for the 318-Dimensional Feature Set

RF & KNN Features

2️⃣ Random Forest Classifier

✔ Performance

  • Accuracy: 98.82%
  • High precision & recall on both classes
  • Extremely robust to noise & dataset variance
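For orientation, a Random Forest on 318-D vectors could be trained as in the sketch below. The data here is synthetic (`make_classification`), so its scores say nothing about the 98.82% reported above. The hyperparameters, the 75/25 split, and the label convention 0 = real, 1 = fake are all assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the 318-D engineered features.
X, y = make_classification(
    n_samples=600, n_features=318, n_informative=20,
    weights=[0.7, 0.3], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

rf = RandomForestClassifier(n_estimators=200, random_state=0)  # assumed settings
rf.fit(X_train, y_train)
y_pred = rf.predict(X_test)

print(f"accuracy: {accuracy_score(y_test, y_pred):.4f}")
print(classification_report(y_test, y_pred, target_names=["real", "fake"]))
```

`classification_report` prints per-class precision and recall, which is the breakdown the bullets above summarize.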

🟩 Confusion Matrix (RF)

RF Confusion Matrix


3️⃣ K-Nearest Neighbours (KNN)

✔ Performance

  • Accuracy: 98.29%
  • Very stable across different samples
  • k = 7 chosen for optimal performance
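One common way to choose k is a cross-validated grid search, sketched here on synthetic data. The README does not say how k = 7 was selected, so the candidate list, the 5-fold setup, and the `StandardScaler` step are assumptions. Scaling is included because KNN distances are sensitive to feature scale.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the 318-D engineered features.
X, y = make_classification(
    n_samples=400, n_features=318, n_informative=20, random_state=0
)

pipe = make_pipeline(StandardScaler(), KNeighborsClassifier())
candidate_k = [3, 5, 7, 9, 11]  # odd values avoid ties in binary voting
search = GridSearchCV(
    pipe,
    param_grid={"kneighborsclassifier__n_neighbors": candidate_k},
    cv=5,
    scoring="accuracy",
)
search.fit(X, y)

best_k = search.best_params_["kneighborsclassifier__n_neighbors"]
print("best k:", best_k, "cv accuracy:", round(search.best_score_, 4))
```

On real data the search would be run on the training split only, with the dev or eval split held back for the final accuracy figure.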

🟩 Confusion Matrix (KNN)

KNN Confusion Matrix


📊 Overall Model Comparison

| Model | Accuracy | Strengths | Weaknesses |
|---|---|---|---|
| Random Forest ⭐ | 98.82% | Best overall, robust to noise | Slow to train on huge datasets |
| KNN (k=7) | 98.29% | Simple & competitive | Slow inference on large data |
| CNN (MFCCs) | ~88% | Learns temporal patterns | Overfitting risk |

🚧 Limitations

  • Dataset imbalance required oversampling
  • CNN performance limited by MFCC-only representation
  • Needs evaluation on unseen deepfake generators
  • Real-world recordings with background noise not fully tested

🚀 Future Enhancements

  • Use 2D CNNs on spectrogram images
  • Add transformer-based encoders (wav2vec 2.0, HuBERT, Whisper)
  • Deploy as a web or mobile app for live detection
  • Add adversarial robustness
  • Add explainable AI for forensic usage

🛠️ Tech Stack

  • Python
  • Librosa – audio processing
  • TensorFlow / Keras – CNN model
  • scikit-learn – RF, KNN
  • imbalanced-learn – SMOTE (SMOTE is not part of scikit-learn itself)
  • NumPy / pandas – preprocessing

🙌 Contributors

Srujan Rana

