dataset

The data directory contains all the samples, and label.csv provides their labels.

The original data in this dataset was collected over two months, April 2017 and May 2017. We ran these samples on a Cuckoo server and collected their runtime logs. We then applied the proposed feature engineering method to these logs to produce this published dataset.

The dataset is summarized as follows:

Month	Benign	Malicious	Total
April	10160	15609	25769
May	20552	11465	32017
Total	30712	27074	57786
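The per-month totals in the table can be cross-checked against the files in the data directory. The sketch below is an illustration only: it assumes every sample is named <YYYYMM>_<index>.npy, a convention inferred from the single example file name 201704_0.npy in this README, and the function name count_samples_per_month is hypothetical.

```python
from collections import Counter
from pathlib import PurePath


def count_samples_per_month(filenames):
    """Count .npy samples per month prefix.

    Assumes (hypothetically) that files are named <YYYYMM>_<index>.npy,
    inferred from the example 201704_0.npy; other names are skipped.
    """
    counts = Counter()
    for name in filenames:
        path = PurePath(name)
        if path.suffix != ".npy":
            continue  # e.g. label.csv
        month, sep, _ = path.stem.partition("_")
        if sep and len(month) == 6 and month.isdigit():
            counts[month] += 1
    return dict(counts)
```

For example, count_samples_per_month(os.listdir("./data")) returns a dict such as {"201704": ..., "201705": ...}; if the naming assumption holds, these counts can be compared with the Total column above.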

Each sample is stored in NumPy (.npy) format and can be loaded with numpy.load('./data/201704_0.npy'). Each sample has shape (LENGTH, 102), where LENGTH is the number of API calls (at most 1000) and 102 is the dimension of the feature vector of each API call. Please refer to our paper for more details.
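Because LENGTH varies between samples, they usually need a common length before they can be stacked into a batch. The following is a minimal sketch, assuming zero-padding at the end up to 1000 rows; the helper names check_shape, load_sample and pad_sample are hypothetical, and model.py may prepare its inputs differently.

```python
import numpy as np

MAX_LENGTH = 1000  # "LENGTH is at most 1000" (from this README)
FEATURE_DIM = 102  # per-API-call feature dimension (from this README)


def check_shape(sample):
    """Raise ValueError unless sample has the documented (LENGTH, 102) shape."""
    if sample.ndim != 2 or sample.shape[1] != FEATURE_DIM:
        raise ValueError(f"unexpected shape {sample.shape}")
    if sample.shape[0] > MAX_LENGTH:
        raise ValueError(f"LENGTH {sample.shape[0]} exceeds {MAX_LENGTH}")
    return sample


def load_sample(path):
    """Load one .npy sample and validate its shape."""
    return check_shape(np.load(path))


def pad_sample(sample, max_length=MAX_LENGTH):
    """Zero-pad a (LENGTH, 102) sample at the end to (max_length, 102).

    Post-padding with zeros is an assumption made for batching only.
    """
    padded = np.zeros((max_length, sample.shape[1]), dtype=sample.dtype)
    padded[: sample.shape[0]] = sample
    return padded
```

With a list of sample paths, np.stack([pad_sample(load_sample(p)) for p in paths]) yields an array of shape (len(paths), 1000, 102).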

feature

This directory contains the code of the proposed feature engineering method.

There are two Python scripts. DMDS.py contains the core code of the feature engineering method, and Cuckoo2DMDS.py implements a multi-process function that calls DMDS.py.
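The general pattern of dispatching many Cuckoo logs to worker processes can be sketched with Python's multiprocessing.Pool. This is not the code of Cuckoo2DMDS.py or DMDS.py: the worker below only pulls the raw API-name sequence out of a report, assuming the Cuckoo 2.x report.json layout report["behavior"]["processes"][i]["calls"][j]["api"], and extract_api_sequence, process_report and process_all are hypothetical names.

```python
import json
from multiprocessing import Pool

MAX_CALLS = 1000  # mirrors the "LENGTH is at most 1000" bound in this README


def extract_api_sequence(report, max_calls=MAX_CALLS):
    """Return API names from a parsed Cuckoo report, in report order.

    Assumes the Cuckoo 2.x layout behavior -> processes -> calls -> api.
    """
    names = []
    for process in report.get("behavior", {}).get("processes", []):
        for call in process.get("calls", []):
            names.append(call["api"])
            if len(names) >= max_calls:
                return names
    return names


def process_report(path):
    """Hypothetical per-log worker: load one report.json and extract its API sequence."""
    with open(path, encoding="utf-8") as f:
        return path, extract_api_sequence(json.load(f))


def process_all(paths, workers=4):
    """Process many logs, one log per task; workers=1 runs in-process for debugging."""
    if workers == 1:
        return [process_report(p) for p in paths]
    with Pool(processes=workers) as pool:
        return pool.map(process_report, paths)
```

One task per log keeps workers independent, so a slow report does not block the others. When workers > 1, call process_all from inside an if __name__ == "__main__": block, since platforms that spawn new processes re-import the main module.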

Please refer to the main function of each Python script for how to run the code.

model

This directory contains the deep learning model for our proposed approach, built with Keras.

To run the model, first unzip dataset/dataset.zip, then run model.py with Python.
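The two steps can also be scripted with the standard library. This sketch assumes the archive should be extracted into the dataset directory next to it; that destination is a guess, so check which paths model.py actually reads. The helper names unzip_dataset and run_model are hypothetical.

```python
import subprocess
import sys
import zipfile


def unzip_dataset(zip_path="dataset/dataset.zip", dest="dataset"):
    """Extract the archive into dest (assumed location) and return member names."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        return archive.namelist()


def run_model(script="model.py"):
    """Run model.py with the current Python interpreter; raise if it fails."""
    subprocess.run([sys.executable, script], check=True)
```

Calling unzip_dataset() followed by run_model() from the repository root is equivalent to unzipping by hand and running python model.py.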
