Code for my Master's thesis on detecting malware using machine learning

Android Malware App Detector

This project contains the code for the Android Malware App Detector, which detects malware based on the permissions used by apps. It was developed as part of the experiments for a Master's thesis titled "Machine Learning Methods for Improvement of the Security of Android Apps."

Technologies

  • Python 3.11
  • Pandas 2.2.2
  • Scikit-learn 1.4.2
  • Matplotlib 3.8.4
  • Seaborn 0.13.2

About the Project

Datasets

The datasets used in the experiments are available in the datasets directory. The sources of the datasets are:

Data Preprocessing

The data preprocessing code is available in the data_preprocessing directory. Two strategies are used for data preprocessing:

  • One Permission Per Feature - Each permission becomes one feature that records whether the app uses that permission or not.
  • Group Permissions Per Feature - Permissions are grouped into categories, and each category becomes one feature that records how many permissions from that category the app uses.

The results of the data preprocessing are available in the processed_data directory.
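
The two strategies can be illustrated with a minimal Pandas sketch. This is not the code from the data_preprocessing directory: the sample apps, the permission-to-category mapping, and the function names below are all hypothetical and exist only to show the shape of the resulting features.

```python
import pandas as pd

# Hypothetical input: one row per app with the permissions it uses.
apps = pd.DataFrame(
    {
        "app": ["a", "b", "c"],
        "permissions": [
            ["INTERNET", "SEND_SMS", "READ_SMS"],
            ["INTERNET", "ACCESS_NETWORK_STATE", "CAMERA"],
            [],
        ],
    }
)

# Hypothetical mapping of permissions to categories (an assumption,
# not the grouping used in the thesis).
PERMISSION_GROUPS = {
    "INTERNET": "network",
    "ACCESS_NETWORK_STATE": "network",
    "SEND_SMS": "messaging",
    "READ_SMS": "messaging",
    "CAMERA": "hardware",
}


def one_permission_per_feature(df, all_permissions):
    """One binary column per permission: 1 if the app uses it, else 0."""
    rows = [
        {perm: int(perm in used) for perm in all_permissions}
        for used in df["permissions"]
    ]
    return pd.DataFrame(rows, index=df["app"])


def group_permissions_per_feature(df, groups):
    """One column per category: number of permissions used from it."""
    categories = sorted(set(groups.values()))
    rows = []
    for used in df["permissions"]:
        counts = dict.fromkeys(categories, 0)
        for perm in used:
            if perm in groups:  # permissions without a category are skipped
                counts[groups[perm]] += 1
        rows.append(counts)
    return pd.DataFrame(rows, index=df["app"])


binary = one_permission_per_feature(apps, sorted(PERMISSION_GROUPS))
grouped = group_permissions_per_feature(apps, PERMISSION_GROUPS)
print(binary)
print(grouped)
```

The first table has one column per permission and only 0/1 values, while the second has one column per category with counts, so it is much narrower when the permission vocabulary is large.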

Training

The training code is available in the training_models directory. It trains the models on the datasets and evaluates them using cross-validation.

Models

The following models are used in the experiments:

  • Naive Bayes
  • Decision Tree
  • K-Nearest Neighbors
  • Logistic Regression
  • Random Forest
  • SVM
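
As a rough illustration, the six model families listed above can be cross-validated with Scikit-learn as follows. This is a sketch under stated assumptions, not the code from the training_models directory: the data is synthetic, the hyperparameters are defaults, the number of folds and the F1 scoring are arbitrary, and the choice of BernoulliNB as the Naive Bayes variant is a guess that suits binary permission features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import BernoulliNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a processed dataset: binary features, binary label.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)
X = (X > 0).astype(int)  # mimic "permission used / not used" features

# Default hyperparameters throughout (an assumption for illustration).
MODELS = {
    "Naive Bayes": BernoulliNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=0),
    "SVM": SVC(),
}

# Stratified folds keep the class ratio similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="f1").mean()
    for name, model in MODELS.items()
}

for name, score in scores.items():
    print(f"{name}: mean F1 = {score:.3f}")
```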

Results

The results of the experiments are available in the results directory. The results are presented in the form of plots.

Metrics

The following metrics are used to evaluate the models:

  • Accuracy
  • Precision
  • Recall
  • F1 Score
  • Support
  • Prediction Time (10%, 50%, 100% of test data)
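
The sketch below shows one way these metrics could be computed with Scikit-learn. It is only an illustration: the data is synthetic, the classifier is an arbitrary choice, and the helper `prediction_time` is a hypothetical name. Timing is measured with `time.perf_counter` on the first 10%, 50%, and 100% of the test rows, which is an assumption about how the prediction-time metric is defined.

```python
import time

from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a processed dataset.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
y_pred = model.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)
# Each returned array has one entry per class; support is the number
# of true samples of that class in the test set.
precision, recall, f1, support = precision_recall_fscore_support(y_test, y_pred)


def prediction_time(fitted_model, data, fraction):
    """Seconds needed to predict the first `fraction` of the rows."""
    n_rows = max(1, int(len(data) * fraction))
    start = time.perf_counter()
    fitted_model.predict(data[:n_rows])
    return time.perf_counter() - start


timings = {f: prediction_time(model, X_test, f) for f in (0.1, 0.5, 1.0)}

print(f"accuracy={accuracy:.3f}")
print("precision:", precision, "recall:", recall, "f1:", f1, "support:", support)
for fraction, seconds in timings.items():
    print(f"{int(fraction * 100)}% of test data: {seconds:.6f} s")
```

A single timing run is noisy; averaging several repetitions gives a more stable figure.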

Usage

The Makefile in the root directory contains commands to run the code. The following commands are available:

  • make venv - Create a virtual environment and install requirements.
  • make run-data-<strategy> - Run the data preprocessing code for a specific strategy (OnePermissionPerFeature or GroupPermission).
  • make train_models-<strategy> - Run the training code for a specific strategy (OnePermissionPerFeature or GroupPermission).
  • make run-all - Run the data preprocessing and training code for both strategies.
  • make clean - Clean the processed data and results.
  • make help - Display the help message.

How to Run

Install all the requirements by running the following command:

make venv

Run the data preprocessing and training code for both strategies by running the following command:

make run-all