Phishing detection pipeline combining domain, content, and visual screenshot comparison for real vs suspect websites.

Hitarth-S/PhishCompare

🛡️ Phishing URL Detection Pipeline

This project compares a legitimate (real) URL against a suspect URL to flag possible phishing. It runs three checks and writes a report:

  • 🔗 Domain & URL similarity (lookalike domains, subdomains, punycode)
  • 📄 Page text similarity
  • 📸 Visual similarity using full-page screenshots
  • 📝 JSON report with results & screenshot paths
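
The domain and URL check can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's actual code: the function names, the "Brand name in suspect host" and "Punycode label" finding labels, and the two-label domain extraction are all assumptions made for the example. The project lists tldextract as a dependency, which handles multi-part suffixes such as `co.uk` correctly where this naive version does not.

```python
from difflib import SequenceMatcher
from urllib.parse import urlparse


def hostname(url: str) -> str:
    """Return the lowercase hostname of a URL ('' if it has none)."""
    return (urlparse(url).hostname or "").lower()


def naive_registered_domain(host: str) -> str:
    """Return the last two labels of the hostname.

    Assumption for brevity: this is wrong for suffixes such as 'co.uk';
    a suffix-list library like tldextract handles those properly.
    """
    return ".".join(host.split(".")[-2:])


def url_findings(real_url: str, suspect_url: str) -> dict:
    """Compare two URLs and collect simple lookalike indicators (hypothetical helper)."""
    real_host, suspect_host = hostname(real_url), hostname(suspect_url)
    real_dom = naive_registered_domain(real_host)
    suspect_dom = naive_registered_domain(suspect_host)
    brand = real_dom.split(".")[0]

    findings = []
    if real_dom != suspect_dom:
        findings.append("Domain mismatch")
        # The real brand reused as a subdomain or inside the suspect domain.
        if brand and brand in suspect_host:
            findings.append("Brand name in suspect host")
    # Punycode labels start with 'xn--' and can hide lookalike Unicode characters.
    if any(label.startswith("xn--") for label in suspect_host.split(".")):
        findings.append("Punycode label")

    similarity = SequenceMatcher(None, real_dom, suspect_dom).ratio()
    return {
        "real_domain": real_dom,
        "suspect_domain": suspect_dom,
        "domain_similarity": round(similarity, 2),
        "url_findings": findings,
    }


if __name__ == "__main__":
    print(url_findings("https://www.paypal.com/login",
                       "https://paypal.secure-login-support.com"))
```

The similarity score here is a plain character-level ratio from `difflib`, so it may differ from whatever metric the pipeline itself uses.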

--

📦 Installation

  1. Clone this repo / download the code
git clone https://github.com/Hitarth-S/PhishCompare.git
cd PhishCompare
  2. Install dependencies
pip install -r requirements.txt

requirements.txt should contain:

beautifulsoup4
tldextract
requests
opencv-python
scikit-image
undetected-chromedriver
selenium

🚀 Usage

1. Run with CLI script

python phishing_checker_cli.py <real_url> <suspect_url>

Example:

python phishing_checker_cli.py https://www.paypal.com/login https://paypal.secure-login-support.com

2. Output

  • A JSON report will be saved in the current folder:

    result_2025-08-16_18-05-40.json
    
  • Screenshots will be saved in a timestamped folder:

    screenshots/2025-08-16_18-05-40/
        ├── real.png        # Legitimate page
        ├── suspect.png     # Suspect page
        └── diff.png        # Highlighted visual differences
    
  • Example JSON output:

{
    "real_domain": "paypal.com",
    "suspect_domain": "secure-login-support.com",
    "domain_similarity": 0.61,
    "url_findings": [
        "Domain mismatch",
        "Lookalike domain"
    ],
    "text_similarity": 0.42,
    "visual_similarity": 0.75,
    "real_screenshot": "screenshots/2025-08-16_18-05-40/real.png",
    "suspect_screenshot": "screenshots/2025-08-16_18-05-40/suspect.png",
    "diff_image": "screenshots/2025-08-16_18-05-40/diff.png",
    "phishing_likely": true
}
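
For scripting around the tool, a report like the one above can be picked up and summarized with a few lines of Python. This is a sketch only: `latest_report` and `summarize` are hypothetical helper names, and the field names are taken from the example output rather than from the pipeline's code.

```python
import json
from pathlib import Path


def latest_report(folder: str = "."):
    """Return the newest result_*.json in `folder`, or None if there is none."""
    # The YYYY-MM-DD_HH-MM-SS timestamp in the file name sorts
    # lexicographically in chronological order.
    reports = sorted(Path(folder).glob("result_*.json"))
    return reports[-1] if reports else None


def summarize(report_path) -> str:
    """Build a one-line summary from a report file (fields as in the example output)."""
    data = json.loads(Path(report_path).read_text(encoding="utf-8"))
    verdict = "LIKELY PHISHING" if data.get("phishing_likely") else "no strong signal"
    return (
        f"{data.get('suspect_domain')} vs {data.get('real_domain')}: {verdict} "
        f"(domain={data.get('domain_similarity')}, "
        f"text={data.get('text_similarity')}, "
        f"visual={data.get('visual_similarity')})"
    )


if __name__ == "__main__":
    report = latest_report()
    print(summarize(report) if report else "No result_*.json found")
```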

⚙️ Files

  • phishing_checker.py → main pipeline (logic + screenshot + comparison)
  • phishing_checker_cli.py → command line wrapper for easy usage
  • screenshots/ → auto-generated timestamped screenshot folders
  • result_*.json → JSON reports per run

🛠 Notes

  • Runs in headless mode using undetected-chromedriver (Chrome required).
  • Works on Linux / macOS / Windows.
  • Needs an internet connection to fetch pages.
  • For slow-loading pages or heavy use, increase the timeouts passed to requests and Selenium.
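
One way to keep those timeouts in a single place is sketched below. The `Timeouts` class, its default values, and the two helper functions are assumptions for illustration and do not come from the project; the underlying calls (`requests.get(..., timeout=(connect, read))`, `driver.set_page_load_timeout`, `driver.set_script_timeout`) are the standard requests and Selenium APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timeouts:
    """Illustrative defaults, in seconds (not the project's values)."""
    connect: float = 5.0     # time allowed to establish the HTTP connection
    read: float = 20.0       # time allowed waiting for the HTTP response
    page_load: float = 45.0  # time Selenium waits for a page to finish loading

    def scaled(self, factor: float) -> "Timeouts":
        """Return a copy with every timeout multiplied by `factor`."""
        return Timeouts(self.connect * factor,
                        self.read * factor,
                        self.page_load * factor)


def fetch_html(url: str, t: Timeouts) -> str:
    """Fetch page HTML with explicit connect/read timeouts."""
    import requests  # imported here so this module loads without requests installed
    response = requests.get(url, timeout=(t.connect, t.read))
    response.raise_for_status()
    return response.text


def apply_to_driver(driver, t: Timeouts) -> None:
    """Apply page-load and script timeouts to a Selenium WebDriver instance."""
    driver.set_page_load_timeout(t.page_load)
    driver.set_script_timeout(t.page_load)
```

For a slow target, something like `apply_to_driver(driver, Timeouts().scaled(2))` doubles every limit without touching the call sites.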
