ML Bridge - User Interface

This repository contains the user interface for the MLBridge Project. The interface can be customised to suit other projects, for example genomics or computer vision. Currently, it is used for domain name cybersecurity.

Installation

Clone the repository:

git clone https://github.com/mlbridge/mlbridge-ui.git

Go to the mlbridge-ui directory and install the dependencies:

cd mlbridge-ui
pip install -r requirements.txt

Install Elasticsearch by following the instructions from this link. Start the Elasticsearch server and then run the ui.py app:

cd mlbridge-ui/mlbridge_ui/src
python ui.py

Features

This repository contains code for a user interface for the purposes of domain name cybersecurity. The three main features of the application are:

  • Historical Analysis: The application allows the user to analyse how frequently domains have been queried over time, along with the IP addresses of the users querying those domains.
  • Manual Vetting: The application allows a user to manually vet domain names (classify them as benign or malicious) that the machine learning model is not confident about.
  • Retraining the Model: The application allows the user to retrain the machine learning model that is used to classify domain names as malicious or benign and obtain the analytics for the same.

Historical Analysis

There are three main use cases for having a historical analysis:

  • Domain Name Analysis: The application allows the user to search for a particular domain name within a request time range. The application looks up that domain name in the Elasticsearch database. If the domain name is found, the app displays the number of requests made to it in that time range, its nature (benign or malicious), and the IP addresses that have queried it. This allows for a domain-specific analysis.

  • Analysis of Malicious Domain Names: The application allows the user to visualize the top 20 malicious domains queried as a bar graph. It also displays a list of all the malicious domains queried, which can be viewed via a toggle switch in the same window. This gives the user a general picture of all the malicious domain names queried and also helps in identifying model misclassifications.

  • Analysis of Benign Domain Names: The application allows the user to visualize the top 20 benign domains queried as a bar graph. It also displays a list of all the benign domains queried, which can be viewed via a toggle switch in the same window. This gives the user a general picture of all the benign domain names queried and also helps in identifying model misclassifications.
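The lookups described above could be expressed as Elasticsearch query bodies along the following lines. This is only a minimal sketch: the field names `domain`, `timestamp`, `client_ip` and `label`, as well as the helper names, are assumptions made for illustration and may not match the index mapping or the code used in this repository.

```python
from datetime import datetime


def domain_query(domain, start, end):
    """Build a query body matching one domain within a request time range."""
    return {
        "query": {
            "bool": {
                "filter": [
                    # Exact match on the domain name (hypothetical field name).
                    {"term": {"domain": domain}},
                    # Restrict to the requested time range (hypothetical field name).
                    {"range": {"timestamp": {"gte": start.isoformat(),
                                             "lte": end.isoformat()}}},
                ]
            }
        },
        # Group matching requests by querying IP address (hypothetical field name).
        "aggs": {"by_ip": {"terms": {"field": "client_ip", "size": 100}}},
    }


def top_domains_query(label, size=20):
    """Build a query body for the most frequently queried domains of one class."""
    return {
        "size": 0,  # only the aggregation buckets are needed, not the hits
        "query": {"term": {"label": label}},
        "aggs": {"top_domains": {"terms": {"field": "domain", "size": size}}},
    }


if __name__ == "__main__":
    print(domain_query("example.com", datetime(2020, 1, 1), datetime(2020, 1, 2)))
    print(top_domains_query("malicious"))
```

Each dictionary would be sent as the body of a search request against whichever index holds the DNS request records. The bucket counts returned by the `top_domains` aggregation are what a top-20 bar graph would plot.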

Manual Vetting

Manual Vetting allows the user to manually vet domain names for which the model's confidence is low, thereby creating a new dataset of malicious and benign domains. This dataset can be used for blocking or allowing domains, and also for updating the data used to retrain the model.
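One way to picture this workflow is a confidence band around the decision boundary: predictions inside the band are sent for vetting, and the user's decisions accumulate into a labelled set. The sketch below assumes the model outputs a probability of a domain being malicious. The thresholds 0.4 and 0.6 and both function names are hypothetical, chosen only for illustration.

```python
def needs_vetting(predictions, low=0.4, high=0.6):
    """Return the domains whose malicious probability lies in the uncertain band.

    `predictions` maps a domain name to the model's probability that the
    domain is malicious. The band limits are illustrative assumptions.
    """
    return [domain for domain, prob in predictions.items() if low <= prob <= high]


def record_vetting(vetted, domain, is_malicious):
    """Store the user's decision so it can be reused for blocking or retraining."""
    vetted[domain] = "malicious" if is_malicious else "benign"
    return vetted


if __name__ == "__main__":
    preds = {"a.com": 0.05, "b.net": 0.50, "c.org": 0.95, "d.io": 0.58}
    queue = needs_vetting(preds)
    print(queue)
    print(record_vetting({}, queue[0], is_malicious=True))
```

A narrower band sends fewer domains to the user but lets more uncertain predictions through unreviewed, so the limits are a trade-off between workload and safety.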

Retraining the Model

The machine learning model needs to be retrained when new data becomes available or when the model is underperforming. The model can be retrained via the training tab.

The accuracy and loss graphs are updated in real time while the model is being trained.

Once the training is complete, the model efficacy can be judged by looking at the confusion matrices as well as the confusion metrics.
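For reference, the sketch below shows how a binary confusion matrix and the usual metrics derived from it can be computed. It assumes that "confusion metrics" refers to accuracy, precision, recall and F1, and that labels are encoded as 1 for malicious and 0 for benign. Both points are assumptions, and the function names are hypothetical rather than taken from this repository.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels (1 = malicious)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:  # truth == 1 and pred == 0
            fn += 1
    return tp, fp, tn, fn


def confusion_metrics(tp, fp, tn, fn):
    """Derive accuracy, precision, recall and F1, guarding against empty denominators."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    counts = confusion_counts([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
    print(counts)
    print(confusion_metrics(*counts))
```

For this use case, recall on the malicious class indicates how many malicious domains the model catches, while precision indicates how often a domain flagged as malicious really is one.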