Unsupervised anomaly detection on CIC-IDS 2017 (Monday BENIGN traffic) using Isolation Forest and PCA visualization — complete preprocessing and EDA included.

Br7eleven/cicids2017-monday-anomaly-detection

🛡️ CICIDS2017 - Monday Anomaly Detection

This project analyzes the Monday capture of the CICIDS 2017 dataset and looks for anomalies in it using unsupervised machine learning. CICIDS 2017 simulates both normal and malicious network traffic in a controlled environment; the Monday working-hours file used here contains BENIGN traffic only.


📁 Dataset Overview

  • Source: CICIDS2017 Dataset
  • File Used: Monday-WorkingHours.pcap_ISCX.csv
  • Rows: ~490K
  • Columns: 71
  • Label: Only BENIGN traffic is present in this file.
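As a rough sketch of how the file might be loaded and sanity-checked, the snippet below reads a tiny in-memory stand-in for the CSV. The sample rows, the column names, and the `load_flows` helper are illustrative assumptions, not taken from the notebook; the header stripping is there because CICIDS2017 CSV headers commonly carry leading spaces.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for Monday-WorkingHours.pcap_ISCX.csv.
# The rows and column names are made up for illustration.
SAMPLE_CSV = """ Destination Port, Flow Duration, Total Fwd Packets, Label
80,1200,3,BENIGN
443,560,2,BENIGN
53,48,1,BENIGN
"""


def load_flows(source):
    """Read a CICIDS2017-style CSV and strip whitespace from header names."""
    df = pd.read_csv(source)
    df.columns = df.columns.str.strip()
    return df


df = load_flows(io.StringIO(SAMPLE_CSV))
print(df.shape)                    # (rows, columns)
print(df["Label"].value_counts())  # expect a single BENIGN class
```

With the real file, `load_flows("Monday-WorkingHours.pcap_ISCX.csv")` would take the place of the `io.StringIO` call.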

📊 Objective

The objective is to:

  • Clean & preprocess the dataset,
  • Visualize it using PCA & statistical plots,
  • Apply unsupervised anomaly detection via Isolation Forest,
  • Evaluate how the model behaves on traffic that is labeled entirely benign.

🔧 Steps Performed

✅ Data Cleaning

  • Removed duplicate rows (~66k).
  • Handled missing values.
  • Replaced infinite values with NaN, then filled the NaN values with the column mean.
  • Reset and renamed messy column headers (F0, F1, ..., F69, Label).
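The cleaning steps above can be sketched roughly as follows. This is a minimal illustration on a toy frame, not the notebook's actual code: the `clean_flows` helper, the toy column names, and the exact order of operations are assumptions.

```python
import numpy as np
import pandas as pd


def clean_flows(df, label_col="Label"):
    """Drop duplicates, turn +/-inf into NaN, mean-fill, and rename features."""
    df = df.drop_duplicates().reset_index(drop=True)
    features = df.drop(columns=[label_col])
    # Infinite values become NaN so they are filled together with real gaps.
    features = features.replace([np.inf, -np.inf], np.nan)
    features = features.fillna(features.mean())
    # Generic feature names F0, F1, ... replace the original headers.
    features.columns = [f"F{i}" for i in range(features.shape[1])]
    features[label_col] = df[label_col].to_numpy()
    return features


# Toy data: one duplicate row, one infinite value, one missing value.
raw = pd.DataFrame({
    "Flow Bytes/s": [10.0, 10.0, np.inf, np.nan, 30.0],
    "Flow Duration": [1.0, 1.0, 2.0, 3.0, 4.0],
    "Label": ["BENIGN"] * 5,
})
clean = clean_flows(raw)
print(clean)
```

Filling with the mean after removing duplicates means the fill value is not skewed by repeated rows; whether the notebook uses the same order is not stated here.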

📉 Preprocessing

  • Converted Label to numerical format (BENIGN = 0).
  • Verified feature distributions before and after cleaning.
  • Saved the cleaned dataset for reuse.
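A possible shape for the label encoding and save step is sketched below. The `encode_labels` helper and the `monday_clean.csv` filename are hypothetical; only the mapping BENIGN = 0 comes from the list above.

```python
import os
import tempfile

import pandas as pd

LABEL_MAP = {"BENIGN": 0}  # the Monday file contains a single class


def encode_labels(df, label_col="Label"):
    """Map string labels to integers, failing loudly on unknown values."""
    out = df.copy()
    encoded = out[label_col].map(LABEL_MAP)
    if encoded.isna().any():
        raise ValueError("unexpected label value in column " + label_col)
    out[label_col] = encoded.astype("int64")
    return out


cleaned = pd.DataFrame({"F0": [0.1, 0.2, 0.3], "Label": ["BENIGN"] * 3})
encoded = encode_labels(cleaned)

# Hypothetical output filename; a temporary directory keeps the sketch tidy.
out_path = os.path.join(tempfile.mkdtemp(), "monday_clean.csv")
encoded.to_csv(out_path, index=False)
reloaded = pd.read_csv(out_path)
print(reloaded.dtypes)
```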

📌 Visualization

  • PCA plots generated:
    • Before cleaning
    • After cleaning
  • Distributions of features using seaborn and matplotlib.
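For the PCA plots, a two-component projection could be computed along these lines. The synthetic data, the `project_2d` helper, and the standardization step are assumptions made for this sketch; the notebook may scale the features differently or not at all.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned feature matrix.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
X[:, 1] = 3 * X[:, 0] + rng.normal(scale=0.1, size=500)  # correlated pair


def project_2d(X):
    """Standardize features, then project onto the first two components."""
    X_scaled = StandardScaler().fit_transform(X)
    pca = PCA(n_components=2, random_state=0)
    coords = pca.fit_transform(X_scaled)
    return coords, pca.explained_variance_ratio_


coords, ratio = project_2d(X)
print(coords.shape)
print(ratio)  # share of variance captured by each component
```

The two columns of `coords` can then be passed to a matplotlib or seaborn scatter plot, once for the raw data and once for the cleaned data.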

🧠 Anomaly Detection

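  • Used Isolation Forest (unsupervised).
  • Identified outliers in what should be "clean" benign traffic: the report below compares predictions against the all-BENIGN labels, so the BENIGN recall of 0.98 means roughly 2% of flows were flagged as anomalous. The Anomaly row has zero support because the file contains no labeled attacks.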

              precision    recall  f1-score   support

      BENIGN       1.00      0.98      0.99    423758
     Anomaly       0.00      0.00      0.00         0

    accuracy                           0.98    423758
   macro avg       0.50      0.49      0.49    423758
weighted avg       1.00      0.98      0.99    423758
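A report of this shape can be produced roughly as shown below. The synthetic data and the `contamination=0.02` setting are assumptions chosen to mirror the roughly 2% flag rate; the notebook's actual hyperparameters are not stated in this README.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import classification_report

# Synthetic stand-in for the cleaned, all-BENIGN feature matrix.
rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 8))
y_true = np.zeros(len(X), dtype=int)  # every flow is labeled BENIGN (0)

# contamination=0.02 is an assumed value, not taken from the notebook.
model = IsolationForest(n_estimators=100, contamination=0.02, random_state=42)
raw_pred = model.fit_predict(X)        # +1 = inlier, -1 = outlier
y_pred = (raw_pred == -1).astype(int)  # 0 = BENIGN, 1 = Anomaly

report = classification_report(
    y_true,
    y_pred,
    labels=[0, 1],
    target_names=["BENIGN", "Anomaly"],
    zero_division=0,  # the Anomaly class has no true samples
)
print(report)

flagged = y_pred.mean()  # fraction of flows flagged as anomalous
print(f"flagged fraction: {flagged:.3f}")
```

Because every true label is BENIGN, each flagged flow counts as a false positive in the report. That makes the flag rate the quantity of interest here rather than precision or recall for the Anomaly class.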


🚀 How to Run (Google Colab)

  1. Open monday_to_sunday.ipynb in Google Colab
  2. Upload the dataset: Monday-WorkingHours.pcap_ISCX.csv from CICIDS2017
  3. Run the notebook cells step-by-step:
    • Cleaning & preprocessing
    • PCA visualizations
    • Model training & anomaly detection

💡 Future Improvements

  • Combine Monday–Friday data for multi-class classification
  • Apply supervised models like RandomForest, XGBoost
  • Test other unsupervised models: One-Class SVM, AutoEncoders, LOF
  • Build a complete intrusion detection pipeline using all CICIDS2017 days

🧠 Technologies Used

  • Python 3.x
  • pandas, numpy
  • seaborn, matplotlib
  • scikit-learn
  • Google Colab

📂 Project Structure

📁 cicids2017-monday-anomaly-detection
├── monday_to_sunday.ipynb   ← Jupyter Notebook with full workflow
└── README.md                ← Project documentation


🔖 GitHub Topics (Tags)

cybersecurity, anomaly-detection, intrusion-detection, cicids2017, isolation-forest, unsupervised-learning, pca, data-cleaning, network-traffic


🧑‍💻 Author

Br7eleven

📄 License

This project is licensed under the MIT License.
You are free to use, modify, and distribute it. See the LICENSE file for more details.


☕ Support

If you found this project helpful or interesting, consider supporting me :)
