A Malware Classifier with PyTorch

Malware Classifier with PyTorch

Malware Classifier is a REST API built with Django that classifies .apk files as legitimate or malware. After finishing the Udacity course "Deep Learning with PyTorch", we were told to build a side project to practice what we had learned =)

Motivation and how I worked

At first, my motivation was to test myself on the knowledge acquired in the course, apply it to real examples, and see whether I could improve on some ideas to produce a complete, well-developed, and useful application. One of the side projects I made was this malware classifier. I had always wondered how antivirus software is made, and I'm always surprised by the number of characteristics that have to be checked to detect whether a program is malicious, so I thought I could use deep learning to classify a program as legitimate or malicious.

After searching a bit, I found some articles about classifying .apk files as malware or legitimate. Without reading any further, I started thinking about features from the manifest file that I could use to detect whether an app is malware (I used to develop Android applications, so it was a fun experience to try to detect whether an application respects the permissions).

I found a cool XML parser, AXMLPrinter2 (https://github.com/flyfei/ApkDecompile/tree/master/Tools), that extracts the manifest from an apk and makes it readable. I extracted all the permissions that the app requests, and those are my features (in the future, the application may also extract the features the app requires from the manifest).
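To make the permission-extraction idea concrete, here is a minimal sketch. It assumes the manifest has already been decoded to plain XML text (for example by AXMLPrinter2), and it is not necessarily how this project does it. The names `extract_permissions`, `to_feature_vector`, `PERMISSION_VOCAB`, and `SAMPLE_MANIFEST` are hypothetical, and the four-entry vocabulary is only for illustration; a real one would be built from the permissions seen in the training set.

```python
import xml.etree.ElementTree as ET

# Namespace used by the android:* attributes in AndroidManifest.xml
ANDROID_NS = "http://schemas.android.com/apk/res/android"

# Hypothetical, deliberately tiny vocabulary (assumption, not from the project)
PERMISSION_VOCAB = [
    "android.permission.INTERNET",
    "android.permission.READ_SMS",
    "android.permission.SEND_SMS",
    "android.permission.READ_CONTACTS",
]


def extract_permissions(manifest_xml):
    """Return the set of permission names declared via <uses-permission>."""
    root = ET.fromstring(manifest_xml)
    name_attr = "{%s}name" % ANDROID_NS
    return {
        element.get(name_attr)
        for element in root.iter("uses-permission")
        if element.get(name_attr)
    }


def to_feature_vector(permissions, vocab=PERMISSION_VOCAB):
    """Encode permissions as a fixed-length 0/1 vector, one slot per vocab entry."""
    return [1.0 if name in permissions else 0.0 for name in vocab]


# Made-up manifest used only to exercise the functions above
SAMPLE_MANIFEST = """<manifest xmlns:android="http://schemas.android.com/apk/res/android" package="com.example.app">
    <uses-permission android:name="android.permission.INTERNET" />
    <uses-permission android:name="android.permission.SEND_SMS" />
    <application android:label="Example" />
</manifest>"""

print(to_feature_vector(extract_permissions(SAMPLE_MANIFEST)))
# prints [1.0, 0.0, 1.0, 0.0]
```

A fixed vocabulary keeps every apk at the same vector length, which is what a fully connected network needs as input.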

Once I knew how to extract the features from an apk, I searched for a dataset and found this one: https://www.unb.ca/cic/datasets/android-adware.html

Data ready... Jump to the fun part: the training! I chose to work with a fully connected neural network, experimenting with 1, 2, and 3 hidden layers, and played with the hyperparameters and the optimizer to obtain a training accuracy of 98.3% and a test accuracy of 95% on 1500 .apk files in total.
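For readers who want to see what such a network might look like, here is a rough PyTorch sketch with a configurable number of hidden layers. Everything specific in it is an assumption rather than the project's actual setup: the class name `PermissionClassifier`, the layer sizes, the dropout rate, the Adam optimizer, the learning rate, and the random tensors standing in for real permission vectors and labels.

```python
import torch
from torch import nn


class PermissionClassifier(nn.Module):
    """Fully connected net: n_features -> hidden layers -> 2 logits."""

    def __init__(self, n_features, hidden_sizes=(64, 32), dropout=0.2):
        super().__init__()
        layers = []
        in_size = n_features
        # One Linear + ReLU + Dropout group per hidden layer (1, 2, 3, ...)
        for hidden_size in hidden_sizes:
            layers += [nn.Linear(in_size, hidden_size), nn.ReLU(), nn.Dropout(dropout)]
            in_size = hidden_size
        # Output logits for the two classes: legitimate and malware
        layers.append(nn.Linear(in_size, 2))
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)


def train_step(model, optimizer, criterion, features, labels):
    """Run one optimization step on a batch and return the loss value."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(features), labels)
    loss.backward()
    optimizer.step()
    return loss.item()


# Placeholder data: 8 fake apks with 16 binary permission features each
torch.manual_seed(0)
features = torch.randint(0, 2, (8, 16)).float()
labels = torch.randint(0, 2, (8,))

model = PermissionClassifier(n_features=16, hidden_sizes=(64, 32))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # assumed optimizer
criterion = nn.CrossEntropyLoss()  # expects raw logits and integer class labels

loss = train_step(model, optimizer, criterion, features, labels)
print(f"loss after one step: {loss:.4f}")
```

Changing `hidden_sizes` to `(64,)` or `(128, 64, 32)` gives the 1- and 3-hidden-layer variants, which makes it easy to compare depths with the same training loop.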

Training Accuracy and Loss

I then deployed it as a web app and got it working; the next step is to deploy it as an Android app.

Requirements

You'll find all the packages needed to run the application in requirements.txt.

Test the application

Clone the repository and run the following commands:

  1. Install all the requirements of the app (listed in requirements.txt):
       pip install -r requirements.txt

  2. Launch the Django web app:
       python manage.py runserver

  3. Open http://127.0.0.1:8000

  4. In the test folder there are two .apk files, one malware and one legitimate. Choose one of them and upload it to the app =)

  5. Test your own real .apk files!