
📶 Predicting the type of cyber attack based on network packets (intrusion) using machine learning models

Primary Language: Jupyter Notebook

INTRUSION DETECTION PREDICTION



🌟 TRY IT HERE 🌟

https://huggingface.co/spaces/raghavtwenty/ids-prediction


PROTOTYPE VIDEO

video.mov



HOW TO EXECUTE

Terminal

git clone https://github.com/raghavtwenty/ids-prediction.git

cd ids-prediction/

pip install -r requirements.txt

cd gradio/

gradio ids_ml_gradio.py

Web Browser

http://127.0.0.1:7860/

PROBLEM

A firewall alone doesn't provide adequate protection against modern cyber threats. Malware and other malicious content are often delivered through legitimate types of traffic, such as email or web traffic. To address this, we need to go further and examine the network traffic itself. This is where an Intrusion Detection System plays a major role.


WHAT IS IDS?

An Intrusion Detection System (IDS) is a network security technology originally built to detect vulnerability exploits against a target application or computer. An IDS is a listen-only device: it monitors traffic and reports its findings to an administrator rather than blocking traffic itself.


WORKING OF IDS


Typical intrusion detection systems look for known attacks or abnormal behavior. A signature-based IDS monitors inbound network traffic for specific patterns and sequences that match known attack signatures. An anomaly-based IDS instead looks for deviations from an established baseline of normal traffic. Suspicious traffic is then passed up the stack for further investigation at the protocol and application layers of the OSI (Open Systems Interconnection) model.
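The two detection styles described above can be contrasted in a toy Python sketch. The signature patterns, the packet-rate baseline and the 3-standard-deviation threshold are all hypothetical values chosen for illustration. They are not part of this project, which uses trained classifiers instead.

```python
import statistics

# Hypothetical byte patterns standing in for a real signature database.
SIGNATURES = {
    b"/etc/passwd": "path traversal attempt",
    b"' OR '1'='1": "SQL injection attempt",
}


def match_signatures(payload):
    """Signature-based check: return the names of all known patterns found in the payload bytes."""
    return [name for pattern, name in SIGNATURES.items() if pattern in payload]


def is_anomalous(value, baseline, k=3.0):
    """Anomaly-based check: flag a value more than k sample standard deviations
    away from the mean of a baseline of normal observations (assumed threshold)."""
    mean = statistics.mean(baseline)
    stdev = statistics.stdev(baseline)
    return abs(value - mean) > k * stdev


if __name__ == "__main__":
    print(match_signatures(b"GET /../../etc/passwd HTTP/1.1"))
    normal_rates = [100, 110, 95, 105, 98, 102]  # hypothetical packets/s during normal operation
    print(is_anomalous(104, normal_rates), is_anomalous(900, normal_rates))
```

A signature check can only catch patterns it already knows. An anomaly check can flag new behavior, but how many false positives it raises depends on how representative the baseline is.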

An IDS is placed out of band within your network infrastructure. That is, it sits outside the real-time communication path between the information sender and receiver. Instead of inspecting traffic inline, it uses a SPAN or TAP port for network monitoring and analyzes a copy of the network packets (obtained through port mirroring) to check that the traffic is not malicious or spoofed. This lets it detect elements that can harm the network, such as malformed packets, DNS poisoning and port scans.
An IDS can be installed on the network (network-based IDS) or on an individual client system (host-based IDS).
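Port-scan detection is one example of the checks an IDS can run on mirrored traffic. The sketch below flags any source that contacts many distinct destination ports within a short sliding time window, a common heuristic. Several details are assumptions made for the example:

- the `(timestamp, src_ip, dst_port)` record format;
- the 10-second window;
- the 20-port threshold.

This is not how the project's model detects the PortScan class.

```python
from collections import defaultdict, deque


def detect_port_scans(records, window_s=10.0, max_ports=20):
    """Return the set of source IPs that touch more than `max_ports` distinct
    destination ports within any sliding window of `window_s` seconds.

    `records` is an iterable of (timestamp, src_ip, dst_port) tuples sorted by
    timestamp, e.g. parsed from a mirrored (SPAN/TAP) copy of the traffic.
    The record format and both thresholds are illustrative assumptions.
    """
    recent = defaultdict(deque)  # src_ip -> deque of (timestamp, dst_port) inside the window
    flagged = set()
    for ts, src, port in records:
        window = recent[src]
        window.append((ts, port))
        # Drop observations that have fallen out of the sliding window.
        while window and ts - window[0][0] > window_s:
            window.popleft()
        if len({p for _, p in window}) > max_ports:
            flagged.add(src)
    return flagged


if __name__ == "__main__":
    scan = [(i * 0.1, "10.0.0.66", 1000 + i) for i in range(50)]  # 50 ports in 5 seconds
    normal = [(float(i), "10.0.0.5", 443) for i in range(50)]      # repeated HTTPS traffic
    print(detect_port_scans(sorted(scan + normal)))
```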


OBJECTIVE

To predict the type of cyber attack that may have occurred in a network. Using past network logs from a server, several machine learning models are trained and compared, and the best-suited model is chosen for prediction. For a new input, the chosen model classifies the type of cyber attack most likely to have occurred.


END USERS

  1. Security operations center (SOC) analysts.
  2. Incident responders.
  3. Cybersecurity analysts.
  4. Anyone with adequate networking knowledge who wants to experiment with the model.


OVERVIEW OF INITIAL DATASET


This dataset contains 5000 records of features extracted from network port statistics, intended to help protect modern computer networks from cyber attacks. Each record is classified into one of 5 classes.

Switch ID - The switch through which the network flow passed.
Port Number - The switch port through which the flow passed.
Received Packets - Number of packets received by the port.
Received Bytes - Number of bytes received by the port.
Sent Bytes - Number of bytes sent by the port.
Sent Packets - Number of packets sent by the port.
Port alive Duration (S) - Time the port has been alive, in seconds.
Packets Rx Dropped - Number of packets dropped by the receiver.
Packets Tx Dropped - Number of packets dropped by the sender.
Packets Rx Errors - Number of receive errors.
Delta Received Packets - Change in the number of packets received by the port since the previous observation.
Delta Received Bytes - Change in the number of bytes received by the port since the previous observation.
Delta Sent Bytes - Change in the number of bytes sent by the port since the previous observation.
Delta Sent Packets - Change in the number of packets sent by the port since the previous observation.
Delta Port alive Duration (S) - Change in the time the port has been alive, in seconds, since the previous observation.
Delta Packets Rx Dropped - Change in the number of packets dropped by the receiver.
Delta Packets Tx Dropped - Change in the number of packets dropped by the sender.
Delta Packets Rx Errors - Change in the number of receive errors.
Delta Packets Tx Errors - Change in the number of transmit errors.
Connection Point - Network connection point expressed as a pair of the network element identifier and port number.
Total Load/Rate - Current observed total load/rate on a link, in bytes/s.
Total Load/Latest - Latest total load bytes counter observed on that link.
Unknown Load/Rate - Current observed unknown-sized load/rate on a link, in bytes/s.
Unknown Load/Latest - Latest unknown-sized load bytes counter observed on that link.
Latest bytes counter - Latest bytes counted in the switch port.
is_valid - Indicates whether this load was built on valid values.
Table ID - Identifier of the flow table.
Active Flow Entries - Number of active flow entries in this table.
Packets Looked Up - Number of packets looked up in the table.
Packets Matched - Number of packets that successfully matched an entry in the table.
Max Size - Maximum size of this table.

TARGET: Label - Intrusion type: Normal = 0, Blackhole = 1, TCP-SYN = 2, PortScan = 3, Diversion = 4
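Classifiers are trained on the integer class IDs, but the app shows class names, so labels need to be mapped in both directions. The sketch below does this using the IDs listed in the TARGET line. The helper names `encode` and `decode` are hypothetical and do not come from the project's code.

```python
# Class IDs of the target "Label" column, as listed in this README.
LABELS = {0: "Normal", 1: "Blackhole", 2: "TCP-SYN", 3: "PortScan", 4: "Diversion"}
LABEL_IDS = {name: idx for idx, name in LABELS.items()}


def encode(names):
    """Map class names to the integer IDs used for training (hypothetical helper)."""
    return [LABEL_IDS[name] for name in names]


def decode(pred_ids):
    """Map integer predictions from a classifier back to class names (hypothetical helper)."""
    return [LABELS[int(i)] for i in pred_ids]


if __name__ == "__main__":
    print(decode([0, 3, 1]))
```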


PREPROCESSING (Techniques)

  • Exploratory Data Analysis (EDA)
  • Cleaning
  • Sampling
  • Scaling
  • Visualization
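The preprocessing code itself is not shown here. The following plain-Python sketch illustrates three of the listed steps, under assumed choices:

- **Cleaning** drops rows with missing values and exact duplicates.
- **Sampling** is a stratified train/test split, so every attack class keeps its share in both splits.
- **Scaling** is z-score standardization whose statistics come from the training rows only.

In practice pandas (`dropna`, `drop_duplicates`) and scikit-learn (`train_test_split(..., stratify=...)`, `StandardScaler`) would normally do this work.

```python
import random
import statistics
from collections import defaultdict


def clean(records):
    """Drop records with missing feature values (None) and exact duplicates.

    Each record is assumed to be a (features, label) pair, features being a list of numbers.
    """
    seen, out = set(), []
    for features, label in records:
        key = (tuple(features), label)
        if any(v is None for v in features) or key in seen:
            continue
        seen.add(key)
        out.append((features, label))
    return out


def stratified_split(records, test_frac=0.2, seed=42):
    """Split (features, label) records so each class keeps roughly the same share in train and test."""
    by_class = defaultdict(list)
    for features, label in records:
        by_class[label].append(features)
    rng = random.Random(seed)
    train, test = [], []
    for label, members in by_class.items():
        rng.shuffle(members)
        n_test = round(len(members) * test_frac)
        test += [(f, label) for f in members[:n_test]]
        train += [(f, label) for f in members[n_test:]]
    return train, test


def fit_scaler(rows):
    """Compute per-column mean and population standard deviation on the training rows."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # constant columns: avoid dividing by zero
    return means, stds


def transform(rows, means, stds):
    """Apply z-score scaling, (x - mean) / std, to every column."""
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]
```

Fitting the scaler on the training split only keeps test-set statistics from leaking into training.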


PREPROCESSING (Visualization)

  • Heatmap before scaling the columns

    without_scaling

  • Heatmap after scaling the columns

    after_scaling

  • Heatmap after cleaning

    cleaned



MACHINE LEARNING MODELS USED

  • Naive Bayes
  • Random Forest
  • XG Boost
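Of the three models, Naive Bayes is the simplest to show from scratch. This sketch assumes the Gaussian variant, although which variant was used is not stated; scikit-learn's `GaussianNB` is the usual choice for continuous features like port statistics. Under that assumption, training stores a per-class log prior plus a per-feature mean and variance. Prediction then picks the class with the highest log-posterior. This is an illustrative sketch, not the project's implementation.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes: features are assumed independent and
    normally distributed within each class."""

    def __init__(self, var_smoothing=1e-9):
        # Absolute value added to every variance to avoid division by zero
        # (scikit-learn instead scales it by the largest feature variance).
        self.var_smoothing = var_smoothing

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        n = len(X)
        self.stats = {}
        for label, rows in groups.items():
            cols = list(zip(*rows))
            means = [sum(c) / len(c) for c in cols]
            variances = [
                sum((v - m) ** 2 for v in c) / len(c) + self.var_smoothing
                for c, m in zip(cols, means)
            ]
            self.stats[label] = (math.log(len(rows) / n), means, variances)
        return self

    def _log_posterior(self, row, label):
        """Log prior plus the sum of per-feature Gaussian log-likelihoods."""
        log_prior, means, variances = self.stats[label]
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances)
        )
        return log_prior + log_likelihood

    def predict(self, X):
        return [max(self.stats, key=lambda c: self._log_posterior(row, c)) for row in X]
```

Working in log space avoids the underflow that multiplying many small probabilities would cause with 30+ features.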


MODEL BUILDING TECHNIQUES USED

  • Cross Validation
  • Hyper Parameter Tuning
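Cross-validation and hyperparameter tuning are usually combined. Each candidate setting is scored by its mean validation score over k folds, and the best-scoring setting is kept. The sketch below shows that loop in plain Python; in practice scikit-learn's `GridSearchCV` does the same job. Two parts are assumptions:

- The `train_and_score` callback is a hypothetical stand-in for fitting a real model on the training folds and scoring it on the held-out fold.
- The grid values are illustrative, although the keys are real XGBoost parameter names.

```python
import itertools
import random
import statistics


def k_fold_indices(n, k=5, seed=0):
    """Yield (train_idx, val_idx) pairs for k roughly equal, shuffled folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


def grid_search(param_grid, n_samples, train_and_score, k=5):
    """Return (best_params, best_mean_score) over every combination in param_grid.

    train_and_score(params, train_idx, val_idx) is a hypothetical callback that
    should fit a model on train_idx and return its score on val_idx.
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = [train_and_score(params, tr, va) for tr, va in k_fold_indices(n_samples, k)]
        mean_score = statistics.fmean(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


# Illustrative search space using XGBoost parameter names.
PARAM_GRID = {
    "max_depth": [3, 5, 7],
    "learning_rate": [0.01, 0.1],
    "subsample": [0.8, 0.9],
}
```

Even this small grid has 3 × 2 × 2 = 12 combinations. With k = 5 that means 60 model fits, which is why grids are usually kept coarse.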


EVALUATION METRICS USED

  • Accuracy
  • Confusion Matrix
  • Precision
  • Recall
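Accuracy, precision and recall can all be read off the confusion matrix. With 5 classes, precision and recall are computed per class (one-vs-rest). The plain-Python sketch below assumes integer class IDs 0 to n-1, as in the TARGET column. It uses the same convention as scikit-learn's `confusion_matrix`: rows are true classes and columns are predicted classes.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i][j] counts samples whose true class is i and predicted class is j."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def accuracy(cm):
    """Fraction of samples on the diagonal (correctly classified)."""
    total = sum(map(sum, cm))
    return sum(cm[i][i] for i in range(len(cm))) / total


def precision_recall(cm):
    """Per-class precision (column-wise) and recall (row-wise)."""
    n = len(cm)
    precision, recall = [], []
    for c in range(n):
        tp = cm[c][c]
        predicted = sum(cm[r][c] for r in range(n))  # everything predicted as class c
        actual = sum(cm[c])  # everything that truly belongs to class c
        precision.append(tp / predicted if predicted else 0.0)
        recall.append(tp / actual if actual else 0.0)
    return precision, recall
```

For an IDS, per-class recall shows how many attacks of each type are missed. A single overall accuracy figure can hide this when one class, such as Normal, dominates.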


RESULTS (Confusion Matrix)

  • Naive Bayes

    confusion_matrix

  • Random Forest

    confusion_matrix

  • XG Boost

    confusion_matrix



PERFORMANCE

  • Naive Bayes

    nb

  • Random Forest

    rf

  • XG Boost

    xbg



INFERENCE

Best hyperparameters for XG Boost
gamma: 0
learning_rate: 0.1
max_depth: 7
min_child_weight: 1
subsample: 0.9
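To reuse these tuned values when retraining, they can be passed directly to `xgboost.XGBClassifier`, XGBoost's scikit-learn wrapper, whose constructor accepts these parameter names. The minimal sketch below assumes the `xgboost` package is installed. The helper name `build_xgb_classifier` is hypothetical and does not come from the project's code.

```python
# Best hyperparameters for XG Boost, as listed in this README.
BEST_XGB_PARAMS = {
    "gamma": 0,
    "learning_rate": 0.1,
    "max_depth": 7,
    "min_child_weight": 1,
    "subsample": 0.9,
}


def build_xgb_classifier(**overrides):
    """Create an XGBoost classifier with the tuned parameters (hypothetical helper).

    Assumes the `xgboost` package is installed; it is imported lazily so this
    module loads without it. `overrides` can change or add parameters,
    e.g. random_state=42.
    """
    from xgboost import XGBClassifier

    params = {**BEST_XGB_PARAMS, **overrides}
    return XGBClassifier(**params)
```

Typical use would be `model = build_xgb_classifier(random_state=42)` followed by `model.fit(X_train, y_train)`, where `y_train` holds the integer class IDs 0 to 4.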

After preprocessing the dataset, the Naive Bayes, Random Forest and XG Boost algorithms were used to classify the test dataset. Over multiple trials, XG Boost achieved an average accuracy of 94% on the test dataset, while the other algorithms achieved lower accuracy. Because XG Boost performed better than the other models, and because of its high scalability, robustness and stable performance, it was chosen for deployment.


OUTPUTS

  • Home Screen

    1

  • Predefined Examples

    2

  • Prediction Label: NORMAL

    3

  • Prediction Label: BLACKHOLE Attack

    4

  • Prediction Label: TCP-SYN Attack

    5

  • Prediction Label: PORTSCAN Attack

    6

  • Prediction Label: DIVERSION Attack

    7



FUTURE SCOPE

Companies recognize the limitations of a standard IDS, and some are responding by building bigger and better products for their customers. Newer IDS solutions may come with a lower administrative burden. They may rely on machine learning to reduce false positives, so staff have fewer alerts to examine every day. Vendors may also push updates to them continuously, so the system always has access to up-to-date threat information in real time.

END OF README