- python3
- tensorflow
- keras
- numpy
- jupyter
- pandas
- The task of this project is to predict, for each sample in the test dataset, whether it is malware, based on the sample's features.
- We ran system API calls on the server and collected their runtime logs, then applied the proposed feature engineering method to these logs to obtain the published dataset. Each sample has shape (LENGTH, 102). LENGTH is not fixed and is at most 1000, because we collected data over different time lengths for different API calls; 102 is the feature dimension of each API call.
- Each sample is stored in NumPy `.npy` format and can be loaded with `numpy.load('./test/0.npy')`. The file is binary, so opening it directly is not meaningful.
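
Because LENGTH varies between samples, they cannot be stacked into one array as-is. A minimal sketch of one common approach, assuming zero-padding at the end of each sample up to the stated maximum of 1000 (the repository's scripts may pad, truncate, or mask differently; `pad_sample` and `load_batch` are hypothetical helpers):

```python
import numpy as np

MAX_LEN = 1000      # stated upper bound on LENGTH
FEATURE_DIM = 102   # stated feature dimension of each API call


def pad_sample(sample: np.ndarray, max_len: int = MAX_LEN) -> np.ndarray:
    """Zero-pad (or truncate) a (LENGTH, 102) sample to (max_len, 102).

    Assumption: padding rows of zeros are appended after the real API calls.
    """
    if sample.ndim != 2 or sample.shape[1] != FEATURE_DIM:
        raise ValueError(f"expected shape (LENGTH, {FEATURE_DIM}), got {sample.shape}")
    padded = np.zeros((max_len, FEATURE_DIM), dtype=np.float32)
    length = min(sample.shape[0], max_len)
    padded[:length] = sample[:length]  # copy the real API-call rows
    return padded


def load_batch(paths):
    """Load several .npy samples and stack them into one (N, max_len, 102) array."""
    return np.stack([pad_sample(np.load(p)) for p in paths])
```

For example, `load_batch(['./test/0.npy', './test/1.npy'])` would return an array of shape (2, 1000, 102).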
- The architecture used in this project is a re-implementation of the model proposed by Zhang, Z., Qi, P., & Wang, W. (2019). Dynamic Malware Analysis with Feature Engineering and Feature Learning. arXiv preprint arXiv:1907.07352.
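
As we read the cited paper, each API call's feature vector is transformed by gated convolutional (gated-CNN) blocks before a bidirectional LSTM models the sequence. The NumPy sketch below illustrates only the gating idea for a single block, `conv_a(x) * sigmoid(conv_g(x))` over the time axis. The kernel size, "same" zero padding, and weight shapes are our own assumptions; this is not the Keras code in this repository, and the paper should be consulted for the actual layer configuration.

```python
import numpy as np


def conv1d_same(x: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """1-D convolution over the time axis with 'same' zero padding.

    x: (T, C_in) sequence of API-call features
    w: (K, C_in, C_out) kernel, b: (C_out,) bias -> returns (T, C_out)
    """
    k = w.shape[0]
    left = (k - 1) // 2
    right = k - 1 - left
    xp = np.pad(x, ((left, right), (0, 0)))  # zero-pad along time only
    out = np.empty((x.shape[0], w.shape[2]))
    for i in range(x.shape[0]):
        # contract a window of k consecutive API calls against the kernel
        out[i] = np.tensordot(xp[i:i + k], w, axes=([0, 1], [0, 1])) + b
    return out


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def gated_conv1d(x, w_a, b_a, w_g, b_g):
    """Gated convolution: the sigmoid gate scales each output of conv_a."""
    return conv1d_same(x, w_a, b_a) * sigmoid(conv1d_same(x, w_g, b_g))
```

The gate lets the block learn, per time step and channel, how much of the convolved API-call features to pass on to later layers.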
- For model training:
$ python model_training.py --dataset train_dataset --csv src/train.csv --model model/classifier.h5
- For model testing:
$ python model_testing.py --dataset test_dataset --csv src/test.csv --model model/classifier.h5 --pred prediction/pred.csv
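
Both scripts take `--dataset`, `--csv`, and `--model`, and the testing script adds `--pred`. A hypothetical `argparse` sketch of that interface follows; the flag names come from the commands above, while `build_parser` and the help texts are our own illustration, and the real scripts may parse their arguments differently.

```python
import argparse


def build_parser(include_pred: bool) -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags used by the two commands."""
    parser = argparse.ArgumentParser(description="Malware classifier (illustrative CLI)")
    parser.add_argument("--dataset", required=True,
                        help="directory of .npy samples, e.g. train_dataset")
    parser.add_argument("--csv", required=True,
                        help="CSV file accompanying the dataset, e.g. src/train.csv")
    parser.add_argument("--model", required=True,
                        help="model file path, e.g. model/classifier.h5")
    if include_pred:
        # only the testing command writes predictions
        parser.add_argument("--pred", required=True,
                            help="output CSV for predictions, e.g. prediction/pred.csv")
    return parser


if __name__ == "__main__":
    args = build_parser(include_pred=True).parse_args()
    print(args)
```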