
Malware classification based on a dataset of malware instruction sets

Primary Language: Jupyter Notebook

Malware Classification

Author: Zeyuan Xu (github.heraclixus.com). For a detailed description, refer to the IPython notebook.

Motivation

Use machine learning algorithms to classify malware samples, with particular attention to polymorphic and metamorphic malware. The most desirable dataset is Microsoft Big 2015. Because of its size (on the order of terabytes), it is best tackled on a cloud computing platform. The local ML project instead uses open-source malware sample sets, parsed into JSON files.
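As a rough illustration of how a JSON-parsed sample might be turned into a feature vector, here is a minimal sketch using only the Python standard library. The record layout (the `label` and `instructions` fields), the sample contents, and the helper `opcode_features` are assumptions made for this example and may not match the project's actual JSON schema; see the notebook for the real one.

```python
import json
from collections import Counter

# Hypothetical sample record: the field names "label" and "instructions"
# are assumptions for illustration, not the project's actual schema.
SAMPLE_JSON = """
{"label": "trojan",
 "instructions": ["mov", "push", "call", "mov", "xor", "mov", "jmp"]}
"""


def opcode_features(record, vocabulary):
    """Return relative opcode frequencies, ordered as in `vocabulary`."""
    counts = Counter(record["instructions"])
    total = sum(counts.values()) or 1  # avoid division by zero on empty samples
    return [counts[op] / total for op in vocabulary]


record = json.loads(SAMPLE_JSON)
vocabulary = ["call", "jmp", "mov", "push", "xor"]  # fixed opcode order
features = opcode_features(record, vocabulary)
print(record["label"], features)
```

Using relative frequencies rather than raw counts keeps samples of different lengths comparable, which is one plausible choice among several.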

Limitations and Future Work

More samples of different malware types are required. In addition, more advanced feature selection techniques can be used, and other classification algorithms can be tested against the benchmark Random Forest classifier. If the image representation of malware samples is used, a CNN can also be tested (with TensorFlow).
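Testing another classifier against the Random Forest benchmark could look roughly like the sketch below. It assumes scikit-learn is installed and uses synthetic data from `make_classification` as a stand-in for the real malware features. The choice of logistic regression as the challenger, the dataset sizes, and the 5-fold cross-validation are all illustrative assumptions, not the project's actual setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data (assumption): NOT the project's malware features.
X, y = make_classification(
    n_samples=300,
    n_features=20,
    n_informative=8,
    n_classes=3,
    random_state=0,
)

# Benchmark model plus one hypothetical challenger.
candidates = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
}

# Mean accuracy over 5 stratified folds for each candidate.
scores = {}
for name, model in candidates.items():
    scores[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: mean accuracy = {scores[name]:.3f}")
```

Evaluating every candidate with the same folds and the same metric keeps the comparison against the benchmark fair. Additional classifiers can be added to `candidates` without changing the loop.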