This project analyzes and detects anomalies in the CICIDS 2017 Monday dataset using unsupervised machine learning methods. CICIDS2017 simulates normal and malicious network traffic generated in a controlled environment; the Monday working-hours capture used here contains BENIGN traffic only.
- Source: CICIDS2017 Dataset
- File Used: `Monday-WorkingHours.pcap_ISCX.csv`
- Rows: ~490K
- Columns: 71
- Label: Only `BENIGN` traffic is present in this file.
The objective is to:
- Clean & preprocess the dataset,
- Visualize it using PCA & statistical plots,
- Apply unsupervised anomaly detection via Isolation Forest,
- Evaluate model performance on clean network data.
- Removed duplicate rows (~66k).
- Handled missing values.
- Replaced infinite values with NaN and filled with mean.
- Reset and renamed messy column headers (`F0`, `F1`, ..., `F69`, `Label`).
- Converted `Label` to numerical format (BENIGN = 0).
- Verified feature distributions before and after cleaning.
- Saved the cleaned dataset for reuse.
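The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper name `clean_flows`, the tiny synthetic frame, the assumption that the label is the last column, and the choice to map any non-BENIGN label to 1 are all assumptions made for the example.

```python
import numpy as np
import pandas as pd


def clean_flows(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the listed cleaning steps to a CICIDS2017-style frame (hypothetical helper)."""
    df = df.copy()
    # Rename headers to F0..F{n-1} plus a final Label column
    # (assumes the label is the last column).
    n_features = df.shape[1] - 1
    df.columns = [f"F{i}" for i in range(n_features)] + ["Label"]
    # Drop exact duplicate rows and rebuild the index.
    df = df.drop_duplicates().reset_index(drop=True)
    # Treat +/-inf as missing, then fill missing values with the column mean.
    features = df.columns[:-1]
    df[features] = df[features].replace([np.inf, -np.inf], np.nan)
    df[features] = df[features].fillna(df[features].mean())
    # Encode the label: BENIGN -> 0 (anything else -> 1, an assumption here).
    df["Label"] = (df["Label"].str.strip() != "BENIGN").astype(int)
    return df


# Tiny synthetic frame standing in for the real CSV.
raw = pd.DataFrame({
    " Flow Duration": [10.0, 10.0, 30.0, np.nan],
    " Flow Bytes/s": [1.0, 1.0, np.inf, 5.0],
    " Label": ["BENIGN", "BENIGN", "BENIGN", "BENIGN"],
})
cleaned = clean_flows(raw)
print(cleaned)
```

On the real file, the same function would be applied to the frame returned by `pd.read_csv(...)`, and the result could be written back out with `cleaned.to_csv(...)` for reuse.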
- PCA plots generated:
  - Before cleaning
  - After cleaning
- Distributions of features using `seaborn` and `matplotlib`.
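A 2-D PCA plot like the ones described above could be produced along these lines. This is a sketch under assumptions: the data is synthetic, the helper name `pca_2d` is made up for the example, and standardizing the features before PCA is a common choice that the notebook may or may not make.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def pca_2d(X: np.ndarray) -> np.ndarray:
    """Standardize the features and project them onto two principal components."""
    X_scaled = StandardScaler().fit_transform(X)
    return PCA(n_components=2, random_state=0).fit_transform(X_scaled)


# Synthetic stand-in for the cleaned feature matrix (500 flows, 10 features).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 10))
coords = pca_2d(X_demo)

# Scatter plot of the first two components; Colab renders the figure inline.
fig, ax = plt.subplots(figsize=(6, 5))
ax.scatter(coords[:, 0], coords[:, 1], s=5, alpha=0.5)
ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
ax.set_title("PCA projection (synthetic data)")
```

Running the same projection on the raw and the cleaned feature matrices gives the "before" and "after" views for comparison.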
- Used Isolation Forest (unsupervised).
- Identified outliers in what should be "clean" benign traffic.
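As a rough illustration of the detection step, the sketch below fits scikit-learn's `IsolationForest` on synthetic data. The hyperparameters shown (`n_estimators=100`, `contamination=0.02`) are assumed values for the example and are not taken from the notebook.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic stand-in for the cleaned, all-benign feature matrix.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(1000, 5))

# contamination is the expected share of outliers; 0.02 is an assumed value.
model = IsolationForest(n_estimators=100, contamination=0.02, random_state=42)
pred = model.fit_predict(X_demo)          # +1 = inlier, -1 = outlier
scores = model.decision_function(X_demo)  # negative values are flagged as outliers

n_flagged = int((pred == -1).sum())
print(f"Flagged {n_flagged} of {len(X_demo)} samples as outliers")
```

Because the model is unsupervised, it flags roughly the `contamination` share of the training data even when every sample is benign, so the flagged rows are better read as "unusual benign flows" than as confirmed attacks.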
| | precision | recall | f1-score | support |
|---|---|---|---|---|
| BENIGN | 1.00 | 0.98 | 0.99 | 423758 |
| Anomaly | 0.00 | 0.00 | 0.00 | 0 |
| accuracy | | | 0.98 | 423758 |
| macro avg | 0.50 | 0.49 | 0.49 | 423758 |
| weighted avg | 1.00 | 0.98 | 0.99 | 423758 |
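The shape of this report follows from the data containing no true anomalies: every flagged flow counts as a false positive, so BENIGN recall drops below 1.00 while the Anomaly row has zero support. A small sketch with made-up numbers (1000 benign flows, 20 of them flagged, both assumed for illustration) reproduces the pattern:

```python
import numpy as np
from sklearn.metrics import classification_report

# Assumed setup: 1000 benign flows, of which the detector flags 20 (2%).
y_true = np.zeros(1000, dtype=int)  # ground truth: all BENIGN (0)
y_pred = np.zeros(1000, dtype=int)
y_pred[:20] = 1                     # flagged as Anomaly (1)

report = classification_report(
    y_true,
    y_pred,
    labels=[0, 1],
    target_names=["BENIGN", "Anomaly"],
    zero_division=0,   # report 0 instead of warning when a class has no true samples
    output_dict=True,
)
print(report["BENIGN"])
print(report["Anomaly"])
```

With Isolation Forest output, predictions in the `+1`/`-1` convention would first be mapped to `0`/`1`, for example with `np.where(pred == -1, 1, 0)`.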
- Open `monday_to_sunday.ipynb` in Google Colab
- Upload the dataset: `Monday-WorkingHours.pcap_ISCX.csv` from CICIDS2017
- Run the notebook cells step-by-step:
  - Cleaning & preprocessing
  - PCA visualizations
  - Model training & anomaly detection
- Combine Monday–Friday data for multi-class classification
- Apply supervised models like `RandomForest`, `XGBoost`
- Test other unsupervised models: `One-Class SVM`, `AutoEncoders`, `LOF`
- Build complete intrusion detection pipeline using all CICIDS2017 days
- Python 3.x
- pandas, numpy
- seaborn, matplotlib
- scikit-learn
- Google Colab
📁 cicids2017-monday-anomaly-detection
├── monday_to_sunday.ipynb ← Jupyter Notebook with full workflow
└── README.md ← Project documentation
cybersecurity, anomaly-detection, intrusion-detection, cicids2017, isolation-forest, unsupervised-learning, pca, data-cleaning, network-traffic
Br7eleven
This project is licensed under the MIT License.
You are free to use, modify, and distribute it. See the LICENSE file for more details.
If you found this project helpful or interesting, consider supporting me :)