AndroidMalware

The objective of this repository is to detect malware in Android apps.

TL;DR: Bad guys abuse permissions and outdated software.

Accessing the Data

The data can be found on Kaggle: https://www.kaggle.com/saurabhshahane/android-malware-dataset

For more background, refer to the research paper.

System Requirements

The entire repository is written in Python 3.x. You will need standard data science libraries such as pandas and scikit-learn.

How to Navigate

The notebooks are ordered in the sequence I followed to complete the project. I have added notes in the notebooks to explain my reasoning.

Results

I obtained an AUC of 0.94 with logistic regression. The plot below illustrates the ROC and precision-recall (PRC) curves as I varied the size of the vocabulary.
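
A minimal sketch of this kind of experiment, not the exact notebook code: it fits a logistic regression on bag-of-words text plus numeric features and reports ROC AUC and average precision for several vocabulary sizes. The helper names (`make_toy_data`, `evaluate_vocab_sizes`), the synthetic data, and the use of `CountVectorizer(max_features=...)` to control vocabulary size are all assumptions for illustration.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split


def make_toy_data(n=200, seed=0):
    """Synthetic stand-in for the real dataset (hypothetical, for illustration)."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n)
    words = ["camera", "sms", "internet", "contacts", "location", "storage"]
    texts = [" ".join(rng.choice(words, size=4)) for _ in range(n)]
    numeric = np.column_stack([
        labels * 3 + rng.normal(0, 1, n),  # numeric column correlated with the label
        rng.normal(0, 1, n),               # pure-noise numeric column
    ])
    return texts, numeric, labels


def evaluate_vocab_sizes(texts, numeric, labels, vocab_sizes, seed=0):
    """Return {vocab_size: (roc_auc, average_precision)} on a held-out split."""
    idx = np.arange(len(labels))
    train_idx, test_idx = train_test_split(
        idx, test_size=0.3, stratify=labels, random_state=seed
    )
    results = {}
    for size in vocab_sizes:
        # Limit the vocabulary to the `size` most frequent tokens.
        vec = CountVectorizer(max_features=size)
        text_train = vec.fit_transform([texts[i] for i in train_idx])
        text_test = vec.transform([texts[i] for i in test_idx])
        # Concatenate text counts with the numeric features.
        X_train = hstack([text_train, csr_matrix(numeric[train_idx])]).tocsr()
        X_test = hstack([text_test, csr_matrix(numeric[test_idx])]).tocsr()
        clf = LogisticRegression(max_iter=1000).fit(X_train, labels[train_idx])
        scores = clf.predict_proba(X_test)[:, 1]
        results[size] = (
            roc_auc_score(labels[test_idx], scores),
            average_precision_score(labels[test_idx], scores),
        )
    return results


if __name__ == "__main__":
    texts, numeric, labels = make_toy_data()
    for size, (auc, ap) in evaluate_vocab_sizes(texts, numeric, labels, [2, 4, 6]).items():
        print(f"vocab={size}: ROC AUC={auc:.3f}, average precision={ap:.3f}")
```

Average precision summarizes the precision-recall curve in one number, which makes it easy to compare vocabulary sizes side by side.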

Spoiler Alert: The features that carried the most weight in the model were the numerical features in the Android Manifest, not the text data.
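
One way to check a claim like this is to rank the fitted coefficients by absolute value. The sketch below is an illustration under stated assumptions, not the notebook's code: `rank_features` is a hypothetical helper, the column names `n_permissions` and `target_sdk` are made up, and the numeric features are standardized first because coefficient magnitudes are only roughly comparable when features share a similar scale.

```python
import numpy as np
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler


def rank_features(texts, numeric, numeric_names, labels, top_k=5):
    """Return the top_k (feature_name, coefficient) pairs by absolute weight."""
    vec = CountVectorizer(binary=True)  # token presence/absence, values in {0, 1}
    X_text = vec.fit_transform(texts)
    X_num = StandardScaler().fit_transform(numeric)  # zero mean, unit variance
    X = hstack([X_text, csr_matrix(X_num)]).tocsr()
    clf = LogisticRegression(max_iter=1000).fit(X, labels)

    # Recover token names in column order from the fitted vocabulary.
    text_names = sorted(vec.vocabulary_, key=vec.vocabulary_.get)
    names = ["text:" + t for t in text_names] + ["num:" + n for n in numeric_names]

    coefs = clf.coef_[0]
    order = np.argsort(-np.abs(coefs))[:top_k]
    return [(names[i], float(coefs[i])) for i in order]


if __name__ == "__main__":
    # Tiny hand-made example (hypothetical values).
    texts = ["sms camera", "internet", "sms internet", "camera",
             "sms contacts", "internet camera", "sms", "contacts"]
    numeric = np.array([[9, 15], [2, 30], [8, 16], [3, 29],
                        [10, 14], [1, 28], [9, 17], [2, 31]], dtype=float)
    labels = np.array([1, 0, 1, 0, 1, 0, 1, 0])
    for name, weight in rank_features(texts, numeric, ["n_permissions", "target_sdk"], labels):
        print(f"{name}: {weight:+.3f}")
```

The `text:` and `num:` prefixes make it easy to see at a glance which group of features dominates the top of the ranking.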
