/dynamic-malware-analysis

Re-implementation of the dynamic malware analysis model proposed by Zhang et al. (2019): https://arxiv.org/abs/1907.07352.

Primary language: Python

Python version

  • python3

Python packages

  • tensorflow
  • keras
  • numpy
  • jupyter
  • pandas

Data detail

  1. The task of this project is to predict, for each sample in the test dataset, whether it is malware, based on the sample's features.
  2. We ran system API calls on a server and collected their runtime logs. We then applied the proposed feature engineering method to these logs to obtain the published dataset. Each sample has shape (LENGTH, 102). LENGTH is not fixed and is at most 1000, because the collected data covers different time lengths for different API calls. 102 is the feature dimension of each API call.
  3. Each sample is stored in NumPy format. You can load it with numpy.load('./test/0.npy'). Opening the file directly is not meaningful, because it is a binary file.
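
Because LENGTH varies between samples, batching them usually requires bringing every sample to a common shape. The following is a minimal sketch of one way to do that, not necessarily what this repository's scripts do: it assumes zero-padding along the time axis up to 1000 rows. The helper name `load_padded` is hypothetical, and the demo writes a synthetic stand-in file rather than reading real dataset content.

```python
import os
import tempfile

import numpy as np

MAX_LENGTH = 1000   # upper bound on LENGTH stated above
FEATURE_DIM = 102   # feature dimension of each API call


def load_padded(path, max_length=MAX_LENGTH):
    """Load one sample and zero-pad it to (max_length, FEATURE_DIM).

    Zero-padding is an assumption made for illustration only.
    Returns the padded array and the original LENGTH.
    """
    sample = np.load(path)
    if sample.ndim != 2 or sample.shape[1] != FEATURE_DIM:
        raise ValueError(f"unexpected sample shape {sample.shape}")
    sample = sample[:max_length]  # guard against over-long inputs
    padded = np.zeros((max_length, FEATURE_DIM), dtype=sample.dtype)
    padded[: sample.shape[0]] = sample
    return padded, sample.shape[0]


# Demo on a synthetic stand-in sample (all ones, LENGTH = 37).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "0.npy")
    np.save(demo_path, np.ones((37, FEATURE_DIM), dtype=np.float32))
    padded, length = load_padded(demo_path)

print(padded.shape, length)
```

Returning the original LENGTH alongside the padded array lets downstream code mask out the padded rows if the model needs it.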

Model detail

  • The architecture used in this project is a re-implementation of the proposed model by Zhang, Z., Qi, P., & Wang, W. (2019). Dynamic Malware Analysis with Feature Engineering and Feature Learning. arXiv preprint arXiv:1907.07352.

Implementation detail

  • For model training: $ python model_training.py --dataset train_dataset --csv src/train.csv --model model/classifier.h5
  • For model testing: $ python model_testing.py --dataset test_dataset --csv src/test.csv --model model/classifier.h5 --pred prediction/pred.csv
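
The internals of the two scripts are not shown here. As a rough illustration of the command lines above, this sketch parses the same four flags with `argparse`. The function name `build_parser`, the help strings, and the choice of which flags are required are all assumptions, not taken from the repository.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags in the commands above."""
    parser = argparse.ArgumentParser(description="CLI sketch (assumed layout)")
    # The help texts below are guesses at each flag's meaning.
    parser.add_argument("--dataset", required=True, help="directory of .npy samples")
    parser.add_argument("--csv", required=True, help="CSV file describing the samples")
    parser.add_argument("--model", required=True, help="path to the saved model (.h5)")
    parser.add_argument("--pred", default=None, help="output CSV for predictions (testing only)")
    return parser


# Parse the testing command line from above without touching sys.argv.
args = build_parser().parse_args([
    "--dataset", "test_dataset",
    "--csv", "src/test.csv",
    "--model", "model/classifier.h5",
    "--pred", "prediction/pred.csv",
])
print(args.dataset, args.csv, args.model, args.pred)
```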