
What should I do from labeling step 3?


To compute the membership on each malware family, a transformer was developed (see the observation above) to "vote" for each available family. For example, if an antivirus engine tag was Trj, then one vote for the trojan family was offered. All tags were consumed in this way and the votes for all families were normalized.

(see the observation above)

I followed this link, but I still don't understand what to do from labeling step 3.

What should I do from labeling step 3?





Hi, @recsater,
The script only deals with dumping the raw data from Google Cloud Storage into a CSV file. After completing the scanning step, you need to create your own labeling strategy or adapt dike's.
You can check dike's implementation in the update_malware_labels function from the dataset module. There, the votes and tags are processed to obtain the malice and the family memberships.
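
As a rough illustration of the voting idea quoted above (each tag gives one vote to a family, then the votes are normalized), a minimal sketch could look like the one below. It is not dike's actual code: the tag-to-family mapping, the family list, the function name family_membership, and the choice to normalize by the total number of votes are all assumptions made for the example. See update_malware_labels for the real logic.

```python
from collections import Counter

# Hypothetical mapping from antivirus tags to families (assumption, not dike's real table).
TAG_TO_FAMILY = {
    "trj": "trojan",
    "trojan": "trojan",
    "worm": "worm",
    "ransom": "ransomware",
}

# Hypothetical list of families to report on (assumption).
FAMILIES = ["trojan", "worm", "ransomware"]


def family_membership(tags):
    """Return normalized per-family votes for a list of antivirus engine tags."""
    votes = Counter()
    for tag in tags:
        family = TAG_TO_FAMILY.get(tag.lower())
        if family is not None:
            # Each recognized tag offers one vote to its family.
            votes[family] += 1

    total = sum(votes.values())
    if total == 0:
        # No recognized tag: every family gets a membership of 0.
        return {family: 0.0 for family in FAMILIES}

    # Assumed normalization: divide by the total number of votes so memberships sum to 1.
    return {family: votes[family] / total for family in FAMILIES}


membership = family_membership(["Trj", "Trojan", "Worm", "Unknown"])
print(membership)
```

Here two of the three recognized tags vote for trojan and one for worm, while the unrecognized tag is ignored. To reproduce the DikeDataset labels exactly, the mapping and normalization would have to match what dike uses.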


First of all, thank you for your reply.

As an additional question, I would like to get exactly the same constants that were used to create the DikeDataset labels, because I'm working on a project to classify malicious code using the labels (malware.csv, benign.csv) from DikeDataset.

To do that, can I know the following values?

In Class DataFolderScanner,

These are defined like

I am sorry for my bad English. Thank you.

dike used a YAML configuration file that contains all the configurable aspects of its functioning. You can find out the values you mentioned by checking the dataset section in the configuration.yaml file.

And I'm glad to hear that these repositories are useful! Please let me know if you have any other questions, I'm happy to help.