
corticalstack/Clustering


🔍 Network Intrusion Detection Clustering

A Python implementation of clustering techniques applied to the KDD Cup 1999 Intrusion Detection System (IDS) dataset, demonstrating both 2D and 3D visualizations with and without Principal Component Analysis (PCA).

📚 Description

This repository contains code for analyzing and visualizing network intrusion detection data using K-means clustering. It demonstrates how K-means, applied with and without PCA, can help reveal patterns in network traffic that may indicate different types of attack. The implementation showcases:

  • Data preprocessing and cleaning techniques
  • Attack categorization and classification
  • Dimensionality reduction using PCA
  • 2D and 3D visualization of clustering results
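The 2D and 3D plots can be sketched with matplotlib roughly as follows. This is a minimal illustration, not the repository's plotting code: the function name `plot_clusters` and its parameters are hypothetical, and it assumes the coordinates to plot (raw features or PCA components) and their cluster labels have already been computed.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_clusters(points, labels, title=""):
    """Scatter 2-D or 3-D points coloured by cluster label (hypothetical helper).

    Returns the Figure so the caller can show or save it.
    """
    points = np.asarray(points)
    dims = points.shape[1]
    if dims not in (2, 3):
        raise ValueError("points must have exactly 2 or 3 columns")

    fig = plt.figure()
    if dims == 3:
        # The "3d" projection works without extra imports in matplotlib >= 3.2.
        ax = fig.add_subplot(projection="3d")
        ax.scatter(points[:, 0], points[:, 1], points[:, 2], c=labels, s=5)
    else:
        ax = fig.add_subplot()
        ax.scatter(points[:, 0], points[:, 1], c=labels, s=5)
    ax.set_title(title)
    return fig
```

For example, `fig = plot_clusters(points, labels, "3D clustering with PCA")` followed by `plt.show()` or `fig.savefig(...)`.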

🧮 Dataset

The project uses the KDD Cup 1999 Intrusion Detection System dataset, which contains a wide variety of simulated intrusions in a military network environment. The dataset includes:

  • Normal connections
  • Four main categories of attacks:
    • Denial of Service (DoS)
    • User to Root (U2R)
    • Remote to Local (R2L)
    • Probing
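The grouping of individual attack labels into these four categories follows the KDD Cup 1999 task description. A minimal sketch of that mapping is shown below, assuming the 22 attack names that occur in the training data; `ATTACK_CATEGORIES`, `LABEL_TO_CATEGORY`, and `categorize` are hypothetical names, and the repository's own code may differ.

```python
# Commonly used mapping of the 22 training-set attack types to the four
# attack categories (assumed here; not copied from this repository).
ATTACK_CATEGORIES = {
    "dos": ["back", "land", "neptune", "pod", "smurf", "teardrop"],
    "u2r": ["buffer_overflow", "loadmodule", "perl", "rootkit"],
    "r2l": ["ftp_write", "guess_passwd", "imap", "multihop",
            "phf", "spy", "warezclient", "warezmaster"],
    "probe": ["ipsweep", "nmap", "portsweep", "satan"],
}

# Invert to attack name -> category for constant-time lookups.
LABEL_TO_CATEGORY = {
    attack: category
    for category, attacks in ATTACK_CATEGORIES.items()
    for attack in attacks
}


def categorize(label: str) -> str:
    """Map a raw label such as 'smurf.' to normal/dos/u2r/r2l/probe."""
    name = label.strip().rstrip(".")  # raw labels end with a trailing period
    if name == "normal":
        return "normal"
    try:
        return LABEL_TO_CATEGORY[name]
    except KeyError:
        raise ValueError(f"unknown attack label: {label!r}") from None
```

The corrected test set contains attack types that do not appear in the training data (for example `mailbomb`), so this sketch raises `ValueError` for unknown names rather than silently mislabeling them.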

Each connection in the dataset is represented by 41 features and labeled as either normal or a specific type of attack.
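The data files have no header row, so column names must be supplied when loading. A minimal sketch with pandas follows, assuming the 41 feature names from the KDD Cup 1999 task description followed by a `label` column; `load_kdd`, `FEATURE_NAMES`, and `COLUMN_NAMES` are hypothetical names, not taken from this repository.

```python
import io

import pandas as pd

# The 41 feature names from the KDD Cup 1999 task description, in file order.
FEATURE_NAMES = [
    "duration", "protocol_type", "service", "flag", "src_bytes", "dst_bytes",
    "land", "wrong_fragment", "urgent", "hot", "num_failed_logins", "logged_in",
    "num_compromised", "root_shell", "su_attempted", "num_root",
    "num_file_creations", "num_shells", "num_access_files", "num_outbound_cmds",
    "is_host_login", "is_guest_login", "count", "srv_count", "serror_rate",
    "srv_serror_rate", "rerror_rate", "srv_rerror_rate", "same_srv_rate",
    "diff_srv_rate", "srv_diff_host_rate", "dst_host_count",
    "dst_host_srv_count", "dst_host_same_srv_rate", "dst_host_diff_srv_rate",
    "dst_host_same_src_port_rate", "dst_host_srv_diff_host_rate",
    "dst_host_serror_rate", "dst_host_srv_serror_rate",
    "dst_host_rerror_rate", "dst_host_srv_rerror_rate",
]
COLUMN_NAMES = FEATURE_NAMES + ["label"]


def load_kdd(source):
    """Read a headerless KDD Cup 1999 file (path or file-like object)."""
    df = pd.read_csv(source, header=None, names=COLUMN_NAMES)
    # Raw labels end with a period ("smurf."); strip it for cleaner values.
    df["label"] = df["label"].str.rstrip(".")
    return df


# Example: one synthetic row (duration, three symbolic fields,
# 37 further numeric fields, then the label).
sample_row = ",".join(["0", "tcp", "http", "SF"] + ["0"] * 37 + ["smurf."])
sample = load_kdd(io.StringIO(sample_row + "\n"))
```

With the real data in place, the same call would be `load_kdd("kddcup.data_10_percent")`.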

🔧 Prerequisites

To run this code, you'll need:

  • Python 3.x
  • The following Python libraries:
    • pandas
    • numpy
    • scikit-learn
    • matplotlib

🚀 Setup

  1. Clone this repository:
     git clone https://github.com/corticalstack/Clustering.git
     cd Clustering
  2. Install the required dependencies:
     pip install pandas numpy scikit-learn matplotlib
  3. Ensure the KDD Cup dataset file (kddcup.data_10_percent) is in the root directory of the project.

💻 Usage

Run the main script to perform clustering and visualization:

python main.py

This will:

  1. Load and preprocess the KDD Cup dataset
  2. Apply K-means clustering with 5 clusters
  3. Generate four visualization plots:
    • 2D clustering without PCA
    • 2D clustering with PCA
    • 3D clustering without PCA
    • 3D clustering with PCA
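Steps 2 and 3 can be sketched with scikit-learn as below. This is an illustration under stated assumptions, not the contents of `main.py`: the features are assumed to be numeric already (categorical columns encoded), they are standardised before clustering, K-means is fit on all features, and PCA is used only to produce the 2 or 3 coordinates that get plotted. For the "without PCA" variant the sketch simply keeps the first 2 or 3 scaled features, which is a guess at what the original plots show. `cluster_and_project` is a hypothetical name.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def cluster_and_project(X, n_clusters=5, n_components=2, use_pca=True, random_state=0):
    """Cluster numeric features with K-means and return (labels, points to plot).

    Assumptions (not taken from main.py): features are standardised, K-means is
    fit on all of them, and PCA only supplies 2-D/3-D plotting coordinates.
    Without PCA, the first n_components scaled features are plotted instead.
    """
    X_scaled = StandardScaler().fit_transform(np.asarray(X, dtype=float))
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = kmeans.fit_predict(X_scaled)
    if use_pca:
        points = PCA(n_components=n_components).fit_transform(X_scaled)
    else:
        points = X_scaled[:, :n_components]
    return labels, points


# Example on synthetic data standing in for the encoded KDD features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 10))
labels_3d, points_3d = cluster_and_project(X_demo, n_components=3)
labels_2d, points_2d = cluster_and_project(X_demo, use_pca=False)
```

Fitting K-means before projecting keeps the cluster assignments identical across the four plots, so only the viewing coordinates change; clustering on the PCA output instead would be an equally plausible design with different results.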

✨ Features

  • Data Preprocessing: Removes duplicate records, handles outliers, and encodes categorical features
  • Attack Categorization: Maps each connection label to one of five categories (normal, DoS, U2R, R2L, probe)
  • Flexible Sampling: Supports adjustable dataset sampling for testing and development
  • Multiple Visualization Options: Provides both 2D and 3D visualizations with and without dimensionality reduction
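The preprocessing and sampling features might look like the sketch below. The repository's exact outlier strategy is not documented here, so this version assumes per-column clipping at an upper quantile. The function name `preprocess`, its `sample_frac` and `clip_quantile` parameters, and the choice of `protocol_type`, `service`, and `flag` as the columns to encode (the three symbolic fields in the KDD feature list) are all assumptions.

```python
import pandas as pd

# The three symbolic fields in the KDD feature list (assumed to be the only ones to encode).
CATEGORICAL_COLUMNS = ["protocol_type", "service", "flag"]


def preprocess(df, label_col="label", sample_frac=None, clip_quantile=0.99, random_state=0):
    """Hypothetical preprocessing: dedupe, optional sampling, outlier clipping, one-hot.

    Returns (features, labels). Clipping each numeric column at an upper quantile
    is an assumed outlier strategy, not necessarily the repository's.
    """
    # The KDD files contain many exact duplicate connection records.
    df = df.drop_duplicates().reset_index(drop=True)

    # Optional random subsample for quicker experiments.
    if sample_frac is not None:
        df = df.sample(frac=sample_frac, random_state=random_state).reset_index(drop=True)

    features = df.drop(columns=[label_col])

    # Cap heavy-tailed numeric columns (e.g. src_bytes) at the chosen quantile.
    for col in features.select_dtypes(include="number").columns:
        features[col] = features[col].clip(upper=features[col].quantile(clip_quantile))

    # One-hot encode whichever symbolic columns are present.
    to_encode = [c for c in CATEGORICAL_COLUMNS if c in features.columns]
    features = pd.get_dummies(features, columns=to_encode, dtype=float)
    return features, df[label_col]
```

Deduplicating before sampling keeps repeated records from dominating a small sample; if the sampling in the original code is meant to preserve the raw class balance, the order would need to be swapped.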

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🔗 Resources
