my-other-computer-is-your-computer

This project was created to fulfill a requirement of the Machine Learning course (EE 769) I took in Spring 2018 at IIT Bombay.

A detailed report on the project is available on my blog here.

Microsoft Malware Classification Challenge

Directory Structure

The git repository has the following directories -

src - this directory contains all the code files
feature-dump - this directory contains separate pickle files for each type of feature (viz), holding each malware instance's extracted features
all-features - this directory contains the pickle files with all the extracted features of each malware instance
all-feature-train - folder with features of train instances
all-feature-test - folder with features of test instances
new-files - folder containing the files that need to be classified. To predict the class for a particular pair of .asm and .bytes files, place those files in this folder
new-files-feature-dump - this directory stores the pickle files of the features extracted from the new files
new-files-all-feature-dump - this directory contains the pickle file with all the features of the new files
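
Since classification of a new sample needs both the .asm and the .bytes file to be present in new-files/, a quick check before running the prediction can save a failed run. The following is a minimal sketch, not code from the repository; the function name `find_input_pair` and its arguments are hypothetical.

```python
from pathlib import Path


def find_input_pair(new_files_dir, file_name):
    """Return the (.asm, .bytes) paths for file_name, or raise if either is missing.

    Hypothetical helper: the name and signature are assumptions for illustration.
    """
    base = Path(new_files_dir)
    asm_path = base / f"{file_name}.asm"
    bytes_path = base / f"{file_name}.bytes"
    # Collect the names of whichever files are not present on disk.
    missing = [p.name for p in (asm_path, bytes_path) if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"missing in {base}: {', '.join(missing)}")
    return asm_path, bytes_path
```

For example, `find_input_pair("new-files", "fileName")` would return the two paths only when both files exist in that folder.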

Training

Training is done by running the command python3 preprocessing.py in the src/ directory from the terminal. After training, the trained models are stored as pickle files in the src/ folder under their respective names. The finalModel, an object of class SupervisedModels, is stored in the file 'finalModels.pkl', which contains information about the scalers, features and underlying trained classifiers.
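
The way a trained model bundle is written to and read back from a pickle file can be sketched as follows. This is an illustration under assumptions, not the repository's code: a plain dict stands in for the SupervisedModels object, and the helper names `save_bundle` and `load_bundle` are hypothetical.

```python
import pickle


def save_bundle(bundle, path):
    """Serialize the model bundle to a pickle file (hypothetical helper)."""
    with open(path, "wb") as fh:
        pickle.dump(bundle, fh)


def load_bundle(path):
    """Load a previously saved bundle. Only unpickle files you trust."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Stand-in for a SupervisedModels object: the real class holds fitted
# scalers, the feature list and the trained classifiers. The keys and
# placeholder values below are assumptions for illustration only.
example_bundle = {
    "scalers": {"example-scaler": "fitted scaler goes here"},
    "features": ["example-feature"],
    "classifiers": {"example-classifier": "trained model goes here"},
}
```

Note that pickle stores a reference to an object's class rather than the class code itself, so loading a pickled SupervisedModels object requires the module that defines the class to be importable at load time.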

Testing

Testing can be done in 2 ways -

Predicting the labels of the test dataset - run the command python3 test.py 0 in the src/ folder from the terminal. This prints the accuracy of the model on the test data
Predicting the labels of a new file - run the command python3 test.py 1 fileName in the src/ folder from the terminal. The files fileName.asm and fileName.bytes are assumed to be in the folder new-files/. This prints the label predicted by each of the underlying classifiers

Both testing procedures load the finalModel from the file 'finalModels.pkl' and predict the labels of the corresponding data instances.
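
The two invocation forms described above (0 for the test dataset, 1 followed by a file name for a new file) could be dispatched along the following lines. This is only a sketch of the argument handling; the function name `parse_test_args` and the mode labels are hypothetical, and the actual test.py may be organized differently.

```python
def parse_test_args(argv):
    """Map the arguments after the script name to a (mode, file_name) pair.

    Hypothetical helper: "test-set" and "new-file" are labels chosen for
    this sketch, not names used by the repository.
    """
    if len(argv) == 1 and argv[0] == "0":
        # python3 test.py 0 -> evaluate on the test dataset
        return "test-set", None
    if len(argv) == 2 and argv[0] == "1":
        # python3 test.py 1 fileName -> classify a file from new-files/
        return "new-file", argv[1]
    raise ValueError("usage: test.py 0 | test.py 1 fileName")
```

In a script, this would typically be called as `parse_test_args(sys.argv[1:])`, with the returned mode deciding whether to report accuracy on the test data or to print each classifier's prediction for the new file.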
