This project contains the code for the Android Malware App Detector, which detects malicious Android apps based on the permissions they use. It was developed as part of the experiments for a Master's thesis titled "Machine Learning Methods for Improvement of the Security of Android Apps."
Requirements:
- Python 3.11
- Pandas 2.2.2
- Scikit-learn 1.4.2
- Matplotlib 3.8.4
- Seaborn 0.13.2
The datasets used in the experiments are available in the datasets directory. The sources of the datasets are:
The data preprocessing code is available in the data_preprocessing directory. Two preprocessing strategies are used:
- One Permission Per Feature - Each permission becomes a separate feature that records whether the app uses it.
- Group Permissions Per Feature - Permissions are grouped into categories, and each feature records how many permissions from one category the app uses.
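The two strategies can be sketched with pandas as follows. This is a minimal illustration, not the code in data_preprocessing: it assumes the raw data has one row per app, one 0/1 column per permission and a label column named "class", and both the toy data and the PERMISSION_GROUPS mapping are hypothetical.

```python
import pandas as pd

# Hypothetical toy data: one row per app, one 0/1 column per permission,
# and a "class" label column (1 = malware, 0 = benign).
apps = pd.DataFrame({
    "android.permission.INTERNET": [1, 1, 0],
    "android.permission.SEND_SMS": [1, 0, 0],
    "android.permission.READ_SMS": [1, 0, 0],
    "android.permission.CAMERA": [0, 1, 1],
    "class": [1, 0, 0],
})

# Hypothetical permission-to-category mapping (not the thesis's actual grouping).
PERMISSION_GROUPS = {
    "android.permission.INTERNET": "network",
    "android.permission.SEND_SMS": "sms",
    "android.permission.READ_SMS": "sms",
    "android.permission.CAMERA": "hardware",
}


def one_permission_per_feature(df: pd.DataFrame, label: str = "class") -> pd.DataFrame:
    """Keep one binary feature per permission: 1 if the app uses it, else 0."""
    features = df.drop(columns=[label]).astype(bool).astype(int)
    features[label] = df[label]
    return features


def group_permissions_per_feature(df: pd.DataFrame, groups: dict, label: str = "class") -> pd.DataFrame:
    """Count how many permissions from each category the app uses."""
    used = df.drop(columns=[label]).astype(bool).astype(int)
    # Transpose so permissions are rows, sum the rows of each category, transpose back.
    # Permissions missing from `groups` map to NaN and are dropped by groupby.
    counts = used.T.groupby(groups).sum().T
    counts[label] = df[label]
    return counts


print(group_permissions_per_feature(apps, PERMISSION_GROUPS))
```

The grouped variant produces far fewer columns than the per-permission variant, which is the trade-off between the two strategies: a compact feature space versus keeping the identity of every individual permission.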
The results of the data preprocessing are available in the processed_data directory.
The training code is available in the training_models directory. It trains the models on the datasets and evaluates them using cross-validation.
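Cross-validation of a single model can be sketched with scikit-learn as below. The number of folds used in the thesis is not stated here, so the 5 stratified, shuffled folds are an assumption, and the data is a synthetic stand-in with an illustrative labeling rule, not one of the project's datasets.

```python
import numpy as np
from sklearn.base import clone
from sklearn.metrics import accuracy_score
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier


def cross_validate_model(model, X, y, n_splits=5, seed=0):
    """Fit a fresh copy of `model` on each training split and score it on the held-out fold."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in skf.split(X, y):
        fitted = clone(model).fit(X[train_idx], y[train_idx])
        scores.append(accuracy_score(y[test_idx], fitted.predict(X[test_idx])))
    return scores


# Synthetic stand-in for a processed dataset: 200 apps x 20 binary permission features.
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 20))
# Illustrative labeling rule only: an app is "malware" if it uses permissions 0 and 1.
y = (X[:, 0] & X[:, 1]).astype(int)

fold_scores = cross_validate_model(DecisionTreeClassifier(random_state=0), X, y)
print(f"accuracy per fold: {np.round(fold_scores, 3)}")
print(f"mean accuracy: {np.mean(fold_scores):.3f}")
```

Stratified folds keep the malware/benign ratio roughly equal in every fold, which matters when one class is much rarer than the other.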
The following models are used in the experiments:
- Naive Bayes
- Decision Tree
- K-Nearest Neighbors
- Logistic Regression
- Random Forest
- SVM
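One way to set up and compare the six listed model families with scikit-learn is sketched below. The hyperparameters are library defaults and the choice of BernoulliNB as the Naive Bayes variant (suited to binary features) is an assumption; the thesis may use different variants or settings. The data is synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import BernoulliNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Default hyperparameters throughout; the thesis's actual settings are not given here.
# BernoulliNB is one possible Naive Bayes variant for binary features (an assumption).
MODELS = {
    "Naive Bayes": BernoulliNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "K-Nearest Neighbors": KNeighborsClassifier(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=0),
    "SVM": SVC(),
}


def compare_models(X, y, cv=5):
    """Return the mean cross-validated F1 score of every model in MODELS."""
    return {
        name: cross_val_score(model, X, y, cv=cv, scoring="f1").mean()
        for name, model in MODELS.items()
    }


# Synthetic binary data with an illustrative labeling rule.
rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(300, 15))
y = (X[:, 0] | X[:, 3]).astype(int)

scores = compare_models(X, y)
for name, score in scores.items():
    print(f"{name:<20} F1 = {score:.3f}")
```

cross_val_score clones each estimator internally, so the instances in MODELS stay unfitted and can be reused across datasets.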
The results of the experiments are available in the results directory, presented as plots.
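A per-model bar chart for one metric can be sketched with Matplotlib alone (the project also lists Seaborn, which this sketch does not use). The function name plot_metric and the output path are hypothetical, and the values passed in the usage line are placeholders, not results from the thesis.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # render to files only, no display needed
import matplotlib.pyplot as plt


def plot_metric(scores, metric_name, path):
    """Save a bar chart with one bar per model for a single metric; return `path`."""
    names = list(scores)
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(names, [scores[name] for name in names])
    ax.set_ylabel(metric_name)
    ax.set_ylim(0, 1)
    ax.set_title(f"{metric_name} per model")
    ax.tick_params(axis="x", labelrotation=30)
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)  # free the figure when generating many plots in a loop
    return path


# Placeholder numbers purely to exercise the function; they are NOT thesis results.
placeholder = {"Model A": 0.5, "Model B": 0.7, "Model C": 0.6}
out = plot_metric(placeholder, "Accuracy", os.path.join(tempfile.gettempdir(), "accuracy_example.png"))
print(f"saved {out}")
```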
The following metrics are used to evaluate the models:
- Accuracy
- Precision
- Recall
- F1 Score
- Support
- Prediction Time (on 10%, 50%, and 100% of the test data)
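These metrics map onto scikit-learn as sketched below: precision_recall_fscore_support returns per-class precision, recall, F1 and support (the number of true samples of each class), and prediction time is measured on growing slices of the test set. Taking the first 10%, 50% and 100% of the test rows and timing a single predict call are assumptions; the thesis may sample or repeat differently. The data is synthetic.

```python
import time

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split


def evaluate(model, X_test, y_test):
    """Accuracy plus per-class precision, recall, F1 and support."""
    y_pred = model.predict(X_test)
    precision, recall, f1, support = precision_recall_fscore_support(
        y_test, y_pred, zero_division=0
    )
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "support": support,
    }


def prediction_times(model, X_test, fractions=(0.1, 0.5, 1.0)):
    """Wall-clock seconds to predict the first 10%, 50% and 100% of the test rows."""
    times = {}
    for frac in fractions:
        n = max(1, round(len(X_test) * frac))
        start = time.perf_counter()
        model.predict(X_test[:n])
        times[frac] = time.perf_counter() - start
    return times


rng = np.random.default_rng(2)
X = rng.integers(0, 2, size=(300, 10))
y = (X[:, 0] & X[:, 1]).astype(int)  # illustrative labeling rule
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

metrics = evaluate(model, X_test, y_test)
times = prediction_times(model, X_test)
print(metrics)
print(times)
```

A single timed call is noisy for small inputs; averaging several repetitions gives more stable prediction-time figures.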
There is a Makefile in the root directory with commands to run the code. The following commands are available:
- make venv - Create a virtual environment and install the requirements.
- make run-data-<strategy> - Run the data preprocessing code for a specific strategy (OnePermissionPerFeature or GroupPermission).
- make train_models-<strategy> - Run the training code for a specific strategy (OnePermissionPerFeature or GroupPermission).
- make run-all - Run the data preprocessing and training code for both strategies.
- make clean - Remove the processed data and results.
- make help - Display the help message.
Install all the requirements by running the following command:
make venv
Run the data preprocessing and training code for both strategies by running the following command:
make run-all