🔍 Network Intrusion Detection Clustering

A Python implementation of K-means clustering applied to the KDD Cup 1999 Intrusion Detection System (IDS) dataset, with 2D and 3D visualizations of the results both with and without Principal Component Analysis (PCA).

📚 Description

This repository contains code for analyzing and visualizing network intrusion detection data using K-means clustering. It shows how clustering, with and without dimensionality reduction, can reveal patterns in network traffic that may indicate different types of attack. The implementation covers:

- Data preprocessing and cleaning techniques
- Attack categorization and classification
- Dimensionality reduction using PCA
- 2D and 3D visualization of clustering results

🧮 Dataset

The project uses the KDD Cup 1999 Intrusion Detection System dataset, which contains a wide variety of simulated intrusions in a military network environment. The dataset includes:

- Normal connections
- Four main categories of attacks:
  - Denial of Service (DoS)
  - User to Root (U2R)
  - Remote to Local (R2L)
  - Probing

Each connection in the dataset is represented by 41 features and labeled as either normal or a specific type of attack.
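The raw file labels each connection with a specific attack name (for example `smurf.` or `guess_passwd.`), not with one of the four categories above. The sketch below shows one way to map labels to categories, assuming the standard grouping of the 22 attack types found in the 10% training file. The names `ATTACK_CATEGORIES` and `categorize` are hypothetical and may not match what `main.py` uses.

```python
# Hypothetical mapping from raw KDD Cup 1999 labels to the five categories
# (normal plus the four attack groups). Assumes the standard grouping of the
# 22 attack types that appear in the 10% training file.
ATTACK_CATEGORIES = {
    "normal": "normal",
    # Denial of Service
    "back": "dos", "land": "dos", "neptune": "dos",
    "pod": "dos", "smurf": "dos", "teardrop": "dos",
    # User to Root
    "buffer_overflow": "u2r", "loadmodule": "u2r", "perl": "u2r", "rootkit": "u2r",
    # Remote to Local
    "ftp_write": "r2l", "guess_passwd": "r2l", "imap": "r2l", "multihop": "r2l",
    "phf": "r2l", "spy": "r2l", "warezclient": "r2l", "warezmaster": "r2l",
    # Probing
    "ipsweep": "probe", "nmap": "probe", "portsweep": "probe", "satan": "probe",
}


def categorize(label: str) -> str:
    """Map a raw label such as 'smurf.' to its category.

    Labels in the raw file end with a '.', which is stripped first.
    Unknown labels raise KeyError rather than being silently mislabeled.
    """
    return ATTACK_CATEGORIES[label.strip().rstrip(".")]
```

Raising on unknown labels is a deliberate choice: the full KDD test set contains attack types absent from the training file, and failing loudly makes that visible.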

🔧 Prerequisites

To run this code, you'll need:

- Python 3.x
- The following Python libraries:
  - pandas
  - numpy
  - scikit-learn
  - matplotlib

🚀 Setup

Clone this repository:

git clone https://github.com/corticalstack/Clustering.git
cd Clustering

Install the required dependencies:

pip install pandas numpy scikit-learn matplotlib

Ensure the KDD Cup dataset file (kddcup.data_10_percent) is in the root directory of the project. The repository already includes a copy of this file there.
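The data file is plain comma-separated text with no header row: 41 feature columns followed by the label. A minimal loading sketch, assuming pandas and the standard KDD Cup 1999 feature names, is shown below. `load_kdd`, `KDD_COLUMNS`, and the `sample_frac` argument (one possible way to subsample for quicker runs) are hypothetical and not taken from `main.py`.

```python
import pandas as pd

# Standard KDD Cup 1999 feature names (41 features) plus the label column.
# The raw file has no header row, so the names are supplied explicitly.
KDD_COLUMNS = """
duration protocol_type service flag src_bytes dst_bytes land wrong_fragment urgent
hot num_failed_logins logged_in num_compromised root_shell su_attempted num_root
num_file_creations num_shells num_access_files num_outbound_cmds is_host_login
is_guest_login count srv_count serror_rate srv_serror_rate rerror_rate srv_rerror_rate
same_srv_rate diff_srv_rate srv_diff_host_rate dst_host_count dst_host_srv_count
dst_host_same_srv_rate dst_host_diff_srv_rate dst_host_same_src_port_rate
dst_host_srv_diff_host_rate dst_host_serror_rate dst_host_srv_serror_rate
dst_host_rerror_rate dst_host_srv_rerror_rate label
""".split()


def load_kdd(source="kddcup.data_10_percent", sample_frac=None, random_state=0):
    """Load the KDD CSV file (path or file-like object).

    If sample_frac is given, keep only that random fraction of rows,
    which is handy for fast experiments during development.
    """
    df = pd.read_csv(source, header=None, names=KDD_COLUMNS)
    if sample_frac is not None:
        df = df.sample(frac=sample_frac, random_state=random_state)
    return df
```

For example, `load_kdd(sample_frac=0.1)` would load roughly a tenth of the file.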

💻 Usage

Run the main script to perform clustering and visualization:

python main.py

This will:

- Load and preprocess the KDD Cup dataset
- Apply K-means clustering with 5 clusters
- Generate four visualization plots:
  - 2D clustering without PCA
  - 2D clustering with PCA
  - 3D clustering without PCA
  - 3D clustering with PCA
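The four plots differ only in how points are projected for display. The following is a minimal sketch, assuming K-means runs on the standardized full feature matrix and PCA is used only to project points for plotting. The actual `main.py` may instead cluster in PCA space, or pick different raw features for the non-PCA plots. `cluster_and_project` and `plot_clusters` are hypothetical names.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line to get interactive windows
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def cluster_and_project(X, n_clusters=5, n_dims=2, use_pca=True, random_state=0):
    """Cluster X with K-means, then return (labels, coords) for plotting.

    Clustering runs on all standardized features. With use_pca=True the
    coordinates are the first n_dims principal components; otherwise they are
    the first n_dims standardized columns (a hypothetical choice of raw features).
    """
    X_std = StandardScaler().fit_transform(np.asarray(X, dtype=float))
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = kmeans.fit_predict(X_std)
    if use_pca:
        coords = PCA(n_components=n_dims).fit_transform(X_std)
    else:
        coords = X_std[:, :n_dims]
    return labels, coords


def plot_clusters(coords, labels, title):
    """Scatter-plot 2D or 3D coordinates, coloured by cluster label."""
    fig = plt.figure()
    if coords.shape[1] == 3:
        ax = fig.add_subplot(projection="3d")
        ax.scatter(coords[:, 0], coords[:, 1], coords[:, 2], c=labels, s=5)
    else:
        ax = fig.add_subplot()
        ax.scatter(coords[:, 0], coords[:, 1], c=labels, s=5)
    ax.set_title(title)
    return fig
```

Calling `cluster_and_project` with `n_dims` set to 2 or 3 and `use_pca` set to `False` or `True`, then passing each result to `plot_clusters`, produces the four variants listed above. Standardizing first matters because KDD features such as `src_bytes` span far larger ranges than the rate features, and both K-means and PCA are scale-sensitive.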

✨ Features

- Data Preprocessing: Handles duplicates and outliers, and encodes categorical features
- Attack Categorization: Maps each connection's label to one of five categories (normal, DoS, U2R, R2L, probe)
- Flexible Sampling: Supports adjustable dataset sampling for testing and development
- Multiple Visualization Options: Provides 2D and 3D visualizations, each with and without dimensionality reduction
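The preprocessing step can be sketched as follows, assuming a DataFrame with the named KDD columns and a `label` column. De-duplication matters for this dataset because the 10% file contains a large number of exact duplicate records. The Tukey-fence (IQR) outlier rule and the function name `preprocess` are assumptions for illustration, not the repository's documented method.

```python
import pandas as pd

# Categorical columns in the KDD Cup 1999 feature set.
CATEGORICAL = ["protocol_type", "service", "flag"]


def preprocess(df, label_col="label", iqr_factor=1.5):
    """Hypothetical preprocessing: de-duplicate, clip outliers, one-hot encode.

    Returns (features, labels). Outliers in each numeric column are clipped to
    the Tukey fences [Q1 - k*IQR, Q3 + k*IQR] with k = iqr_factor (an assumed rule).
    """
    df = df.drop_duplicates().reset_index(drop=True)
    labels = df[label_col]
    features = df.drop(columns=[label_col])

    numeric = features.select_dtypes(include="number").columns
    features[numeric] = features[numeric].astype(float)
    for col in numeric:
        q1, q3 = features[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        # Skip zero-IQR columns: many KDD features are almost always 0,
        # and clipping them would erase their rare non-zero values.
        if iqr > 0:
            features[col] = features[col].clip(q1 - iqr_factor * iqr, q3 + iqr_factor * iqr)

    # One-hot encode whichever categorical columns are present.
    cats = [c for c in CATEGORICAL if c in features.columns]
    features = pd.get_dummies(features, columns=cats, dtype=float)
    return features, labels
```

Clipping rather than dropping outliers keeps every connection in the dataset, which matters here because rare attack classes such as U2R can themselves look like outliers.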

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
