Network Traffic Machine Learning

This project scans and analyzes network traffic in real time. It consists of several Python scripts, a launcher shell script, and supporting data files that work together to achieve this goal:

  • traffic_scan.py: This script trains a model to predict a label (website and usage) from the traffic data. It classifies the traffic with the logistic regression algorithm from the sklearn library.

  • unsecured_scan.py: This script scans for unsecured HTTP connections and tries to collect login and password data from them.

  • unencrypted_scan.py: This script scans the traffic for unencrypted data.

  • main.py: This is the main script that ties everything together. It loads the traffic data, trains the model, scans for unsecured and unencrypted connections, and then clusters the data using unsupervised learning techniques.

  • launch.sh: This script gives the main script execute permissions and launches it in the background.

  • get_model_info.py: This script retrieves information about the trained model and generates a SQLite database to store it.

  • report.txt: This file is generated by the main script and contains the report of the traffic analysis.

  • clustering_traffic.py: This script clusters the traffic data using an unsupervised learning technique such as the KMeans algorithm.

  • traffic_data.csv: The sample data set used in the project.
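
The classification step described for traffic_scan.py can be illustrated with a minimal sketch. It is not the project's actual code: the column layout of traffic_data.csv is not documented here, so the example generates synthetic data, and the two features (mean packet size, mean inter-arrival time), the labels "video" and "browsing", and the function names are all hypothetical. Only the use of logistic regression from sklearn comes from the description above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_synthetic_traffic(n_per_class=200, seed=0):
    """Build a toy data set with two hypothetical features per flow:
    mean packet size (bytes) and mean inter-arrival time (ms)."""
    rng = np.random.default_rng(seed)
    # Hypothetical "video" flows: large packets, short gaps.
    video = rng.normal([1200.0, 5.0], [100.0, 1.0], size=(n_per_class, 2))
    # Hypothetical "browsing" flows: smaller packets, longer gaps.
    browsing = rng.normal([400.0, 50.0], [100.0, 10.0], size=(n_per_class, 2))
    X = np.vstack([video, browsing])
    y = np.array(["video"] * n_per_class + ["browsing"] * n_per_class)
    return X, y


def train_traffic_classifier(X, y, seed=0):
    """Fit a scaled logistic regression and return it with its held-out accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=seed, stratify=y
    )
    # Scaling first keeps features with large ranges from dominating the fit.
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)


if __name__ == "__main__":
    X, y = make_synthetic_traffic()
    model, accuracy = train_traffic_classifier(X, y)
    print(f"held-out accuracy: {accuracy:.2f}")
    print(model.predict([[1150.0, 6.0]]))
```

Holding out a test split gives a rough check that the model generalizes; with real traffic the classes will overlap far more than in this toy data.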

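The clustering step described for clustering_traffic.py might look like the following sketch. It assumes numeric per-flow features and a fixed number of clusters; the feature meanings, the helper name cluster_flows, and the choice of two clusters are illustrative assumptions, not details taken from the project.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def cluster_flows(features, n_clusters=2, seed=0):
    """Group flows with KMeans after standardizing each feature."""
    # KMeans uses Euclidean distance, so features are put on a common scale.
    scaled = StandardScaler().fit_transform(features)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    labels = kmeans.fit_predict(scaled)
    return labels, kmeans


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two hypothetical groups of flows: (mean packet size, mean gap in ms).
    small = rng.normal([300.0, 60.0], [50.0, 10.0], size=(50, 2))
    large = rng.normal([1300.0, 5.0], [50.0, 1.0], size=(50, 2))
    labels, _ = cluster_flows(np.vstack([small, large]))
    print(np.bincount(labels))  # number of flows assigned to each cluster
```

The cluster indices themselves are arbitrary; only the grouping is meaningful, and the number of clusters has to be chosen or tuned for real data.
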
Usage

  1. Run the launch.sh script in the terminal to start the main script in the background.
  2. The script analyzes the traffic and, when the analysis finishes, writes the results to the report.txt file.
  3. The get_model_info.py script can be run to store the model's information in a SQLite database.
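
As a rough illustration of step 3, the sketch below writes a few facts about a fitted sklearn LogisticRegression to a SQLite database using the standard sqlite3 module. The table name model_info, its name/value layout, the default file name model_info.db, and the function save_model_info are assumptions made for this example; the actual schema used by get_model_info.py is not described here.

```python
import json
import sqlite3

import numpy as np
from sklearn.linear_model import LogisticRegression


def save_model_info(model, db_path="model_info.db"):
    """Store basic facts about a fitted LogisticRegression as name/value rows."""
    rows = {
        "model_type": type(model).__name__,
        "classes": json.dumps(model.classes_.tolist()),
        "coefficients": json.dumps(model.coef_.tolist()),
        "intercept": json.dumps(model.intercept_.tolist()),
        # default=str guards against any parameter that is not JSON-serializable.
        "params": json.dumps(model.get_params(), default=str),
    }
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "CREATE TABLE IF NOT EXISTS model_info ("
                "name TEXT PRIMARY KEY, value TEXT NOT NULL)"
            )
            conn.executemany(
                "INSERT OR REPLACE INTO model_info (name, value) VALUES (?, ?)",
                list(rows.items()),
            )
    finally:
        conn.close()


if __name__ == "__main__":
    # Tiny stand-in model so the example runs on its own.
    X = np.array([[0.0], [1.0], [2.0], [3.0]])
    y = np.array([0, 0, 1, 1])
    save_model_info(LogisticRegression().fit(X, y))
```

Storing values as JSON text keeps the schema independent of the number of features or classes, at the cost of not being able to query individual coefficients in SQL.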

Disclaimer

Please note that this project is for educational purposes only and should not be used in a production environment without proper security measures in place. The project relies on supervised and unsupervised machine learning techniques, which should be used with caution and with a proper understanding of the data and algorithms.