NUS hackathon

DLS-Singapore

This project was developed for the CyberSecurity Camp & Hackathon @Singapore 2017.

We provide an open-source implementation of the paper "Deep Neural Network Based Malware Detection Using Two Dimensional Binary Program Features", along with some variations we explored:

  • compression of the original vector using auto-encoders
  • binary code representation based on LSTM auto-encoders
  • malware analysis by transposing binary code information to the image domain
  • multiple architectures used to evaluate the generated features
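
As a rough illustration of the kind of two-dimensional feature the paper's title refers to, here is a minimal sketch of a byte/entropy histogram in plain Python. It is not the code from this repository. The function names, the 1024-byte window, the 256-byte step, and the 16x16 binning are assumptions chosen for illustration and should be checked against the paper before use.

```python
import math
from collections import Counter


def window_entropy(window: bytes) -> float:
    """Shannon entropy (base 2) of the byte values in a window, in [0, 8]."""
    counts = Counter(window)
    total = len(window)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def byte_entropy_histogram(data: bytes, window=1024, step=256, bins=16):
    """Return a bins x bins histogram of (window entropy, byte value) pairs.

    window, step and bins are illustrative assumptions, not values taken
    from this repository.
    """
    hist = [[0] * bins for _ in range(bins)]
    if not data:
        return hist
    # Inputs shorter than one window are treated as a single short window.
    last_start = max(len(data) - window, 0)
    for start in range(0, last_start + 1, step):
        chunk = data[start:start + window]
        entropy = window_entropy(chunk)
        # Entropy lies in [0, 8]; clamp the top edge into the last bin.
        row = min(int(entropy / 8 * bins), bins - 1)
        # Every byte in the window is paired with that window's entropy.
        for value, count in Counter(chunk).items():
            col = value * bins // 256
            hist[row][col] += count
    return hist


def flatten(hist):
    """Flatten the 2-D histogram into a single feature vector."""
    return [cell for row in hist for cell in row]
```

Calling `flatten(byte_entropy_histogram(open(path, "rb").read()))` on a file (with `path` being any binary you supply) gives a fixed-length vector of 256 counts regardless of file size, which is the property that makes such a histogram usable as input to a dense network or an auto-encoder.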

We used Keras to generate the encoders and architectures.
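
For the image-domain variation listed above, the raw bytes of a binary have to be arranged as a 2-D array before any Keras model can consume them. The following is a minimal sketch of one way to do that in plain Python. The function names, the default width of 64 pixels, and the zero padding are assumptions for illustration, not the preprocessing used in this repository.

```python
import math


def bytes_to_image(data: bytes, width: int = 64):
    """Arrange raw bytes as rows of `width` grayscale pixels (0-255).

    The final row is zero-padded so every row has the same length.
    The default width is an arbitrary illustrative choice.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    height = math.ceil(len(data) / width)
    padded = data.ljust(height * width, b"\x00")
    return [list(padded[r * width:(r + 1) * width]) for r in range(height)]


def normalize(image):
    """Scale pixel values to floats in [0, 1], a common network input range."""
    return [[pixel / 255 for pixel in row] for row in image]
```

Because binaries differ in size, the resulting images differ in height. A real pipeline would still need to resize, crop, or pad them to a common shape before batching; that step is left out here.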

Team members:

  • Palash Chauhan (Indian Institute of Technology, Kanpur)
  • Vivek Verma (Indian Institute of Technology, Kanpur)
  • Joze Rozanec (Universidad de Buenos Aires, Argentina)

Things to read

  1. https://www.blackhat.com/docs/us-16/materials/us-16-Berlin-An-AI-Approach-To-Malware-Similarity-Analysis-Mapping-The-Malware-Genome-With-A-Deep-Neural-Network.pdf
  2. https://arxiv.org/pdf/1508.03096v2.pdf
  3. Any tutorial on auto-encoders, e.g. https://keras.io/getting-started/sequential-model-guide/

Dataset

  1. Malware Files: https://drive.google.com/open?id=0B9fG_FJRJKS_d1BOUjJRZ2pWZEE
  2. Reports on Malware Files: https://drive.google.com/file/d/0B9fG_FJRJKS_ZkxxT1BlbWh6MEU/view?usp=sharing
  3. Benign Files: https://drive.google.com/open?id=0B9fG_FJRJKS_RnF5Ynhmb0R3WTQ
  4. https://zeltser.com/malware-sample-sources/
  5. https://www.kaggle.com/c/malware-classification/data

Useful papers & info

Malware representation

Execution Graph as vector

Binary sequence to vector

Data