tshark [sudo apt-get install wireshark]
pandas
os
numpy
sklearn
pickle
sys
time
- `feature_extraction` [python script]
  a. We use the tool `tshark` to extract the required features from the .pcap files.
  b. TCP and UDP protocol packets are extracted into separate CSV files and combined at a later stage in the pipeline.
- `feature_engineering` [python script]
  a. The TCP and UDP packets of the same session (each pcap file) are combined back into their original structure using the `frame.number` attribute to restore packet order. The combined data is stored in a `pandas` data frame.
  b. This data is then split into batches (batch size = 100) to analyze the packets in the form of clusters.
  c. To ensure proper variability of data in each cluster, bootstrap sampling is applied to each batch.
  d. Summary statistics (such as mean and median) are calculated on the features of each cluster to summarise that cluster's network profile.
- `ml` [python script]
  a. All the CSV files containing batches of data are combined into one dataframe.
  b. Some features are dropped before running the dataframe through the machine learning model.
  c. A `RandomForestClassifier` is trained on the data.
  d. The trained model is saved using `pickle` in `batch_network_traffic_classifier.sav`.
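The feature engineering steps (restoring packet order, batching, bootstrap sampling) can be sketched roughly as follows. This is a minimal illustration and not the project's actual script: the helper names `combine_session` and `bootstrap_batches`, the column names, and the synthetic data are all assumptions made for the example.

```python
import pandas as pd

BATCH_SIZE = 100  # batch size used by the pipeline


def combine_session(tcp_df, udp_df):
    """Merge TCP and UDP rows of one session and restore capture order."""
    combined = pd.concat([tcp_df, udp_df], ignore_index=True, sort=False)
    return combined.sort_values("frame.number").reset_index(drop=True)


def bootstrap_batches(session_df, batch_size=BATCH_SIZE, seed=0):
    """Yield one bootstrap resample (same size, with replacement) per batch."""
    for i, start in enumerate(range(0, len(session_df), batch_size)):
        batch = session_df.iloc[start:start + batch_size]
        # Sampling with replacement is what makes this a bootstrap sample.
        yield batch.sample(n=len(batch), replace=True, random_state=seed + i)


# Synthetic session (hypothetical data): TCP on odd frames, UDP on even frames.
tcp = pd.DataFrame({"frame.number": range(1, 300, 2), "tcp.srcport": 443})
udp = pd.DataFrame({"frame.number": range(2, 201, 2), "udp.srcport": 53})

session = combine_session(tcp, udp)
batches = list(bootstrap_batches(session))
```

Because each batch is resampled only from its own packets, a batch never mixes in packets from a different part of the session.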
- Features are extracted.
- Feature engineering is done.
- Batches are formed.
- Classification of clusters is done.
- The `benign` and `ddos_attack` traffic is written to `FINAL_OUTPUT.txt`.
- `ddosdetect.py` creates the folder `TEMPORARY_FILES` to store temporary .csv files, text files which contain the paths to pcap files, etc.
- `feature_extraction.py` creates the folders `CSV_FILES/BENIGN` and `CSV_FILES/MALWARE` to store the data extracted from the `pcap` files using `tshark`.
- `feature_engineering.py` creates the folders `CSV_FILES_BATCH`, `CSV_FILES_PATH_LIST`, `PCAP_CSV_FILES` to store temporary files as needed.
- `ml.py` stores the trained Machine Learning model in `batch_network_traffic_classifier.sav`.
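As an illustration of how a model can be trained and stored in `batch_network_traffic_classifier.sav`, here is a minimal sketch using `RandomForestClassifier` and `pickle`. The synthetic features, the hyperparameters, and the temporary directory are assumptions for the example and do not reflect the settings used in `ml.py`.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Written to a temporary directory here; the project stores it alongside ml.py.
MODEL_PATH = os.path.join(tempfile.mkdtemp(), "batch_network_traffic_classifier.sav")

# Synthetic stand-in for batch-level features (hypothetical data):
# two well-separated classes with four features each.
rng = np.random.default_rng(0)
X_benign = rng.normal(loc=0.0, scale=1.0, size=(100, 4))
X_attack = rng.normal(loc=5.0, scale=1.0, size=(100, 4))
X = np.vstack([X_benign, X_attack])
y = np.array(["benign"] * 100 + ["ddos_attack"] * 100)

# n_estimators and random_state are illustrative choices.
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)

# Persist the trained model, then load it back as the detector would.
with open(MODEL_PATH, "wb") as fh:
    pickle.dump(clf, fh)

with open(MODEL_PATH, "rb") as fh:
    restored = pickle.load(fh)
```

Note that a pickled model should only be loaded from a trusted source, and ideally with the same scikit-learn version that produced it.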
- In this work, the goal is to identify characteristics of network traffic that distinguish normal network behavior from DoS attacks.
- For feature extraction we used the command-line tool tshark, which parses pcap files and writes the required fields to CSV. We extracted the following fields:
- Frame number - Identifies the specific packets that have been marked as malicious in the network traffic.
- Source IP address - Helps monitor the packets that have been sent to and from the host. This is essential in the output file.
- Destination IP address
- Protocol used - A large number of DDoS attacks are carried over protocols such as ICMP and UDP, so the protocol is a useful feature for the classifier.
- TCP_SourcePort
- TCP_DestPort
- TCP_Flags
- UDP_SourcePort
- UDP_DestPort
- UDP_Flags
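A field extraction of this kind can be sketched as a small wrapper around `tshark`. The Wireshark field names below (`frame.number`, `ip.src`, `tcp.srcport`, and so on) are an assumed mapping of the fields listed above, and the helper names are hypothetical; the project's script may use different fields and options.

```python
import subprocess

# Assumed mapping of the listed fields to Wireshark display-filter field names.
COMMON_FIELDS = ["frame.number", "ip.src", "ip.dst", "ip.proto"]
TCP_FIELDS = COMMON_FIELDS + ["tcp.srcport", "tcp.dstport", "tcp.flags"]
UDP_FIELDS = COMMON_FIELDS + ["udp.srcport", "udp.dstport"]


def build_tshark_command(pcap_path, display_filter, fields):
    """Build a tshark command that prints the chosen fields as CSV rows."""
    cmd = [
        "tshark",
        "-r", pcap_path,        # read packets from a capture file
        "-Y", display_filter,   # keep only matching packets, e.g. "tcp"
        "-T", "fields",         # print selected fields instead of summaries
        "-E", "header=y",       # first row holds the field names
        "-E", "separator=,",    # comma-separated output
    ]
    for field in fields:
        cmd += ["-e", field]
    return cmd


def extract_to_csv(pcap_path, csv_path, display_filter, fields):
    """Run tshark and write its output to csv_path (requires tshark installed)."""
    with open(csv_path, "w") as out:
        subprocess.run(
            build_tshark_command(pcap_path, display_filter, fields),
            stdout=out,
            check=True,
        )


tcp_cmd = build_tshark_command("capture.pcap", "tcp", TCP_FIELDS)
udp_cmd = build_tshark_command("capture.pcap", "udp", UDP_FIELDS)
```

Calling `extract_to_csv` once with the `tcp` filter and once with the `udp` filter yields the two separate CSV files per capture.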
Summary statistics were calculated on some features so that feature importance is measured correctly:
- IP Protocol
- IP source - Mean, Median, Var, Standard Deviation, Cross Variance, Rate of change
- IP Destination - Mean, Median, Var, Standard Deviation, Cross Variance, Rate of change
- Source Port - Mean, Median, Var, Standard Deviation, Cross Variance, Rate of change
- Destination Port - Mean, Median, Var, Standard Deviation, Cross Variance, Rate of change
- TCP Flags - Mean, Median, Var, Standard Deviation, Cross Variance, Rate of change
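As a rough sketch, the per-batch statistics could be computed as below. Several choices here are assumptions, not the project's definitions: IP addresses are converted to integers so that numeric statistics apply, "rate of change" is taken as the mean difference between consecutive values, and cross variance is left out because its exact definition is not given. The helper names `ip_to_int` and `summarise` are hypothetical.

```python
import ipaddress

import pandas as pd


def ip_to_int(addr):
    """Convert a dotted IP address to an integer so statistics can be taken."""
    return int(ipaddress.ip_address(addr))


def summarise(series):
    """Summary statistics for one feature of one batch."""
    values = pd.to_numeric(series, errors="coerce").dropna()
    return {
        "mean": values.mean(),
        "median": values.median(),
        "var": values.var(),    # sample variance (ddof=1)
        "std": values.std(),    # sample standard deviation (ddof=1)
        # Assumed definition: average step between consecutive values.
        "rate_of_change": values.diff().mean(),
    }


# Hypothetical batch of four packets.
batch = pd.DataFrame({
    "ip.src": ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"],
    "srcport": [10, 20, 30, 40],
})

port_stats = summarise(batch["srcport"])
ip_stats = summarise(batch["ip.src"].map(ip_to_int))
```

Each batch then contributes a single row of such statistics to the dataset that the classifier is trained on.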
![Confusion Matrix](/Confusion%20Matrix.png)

![Feature Importance](/Feature_Importance.png)