marcoramilli/MalwareTrainingSets

The MIST encodings in the JSON file are all length-8 hex strings

heraclixus opened this issue · 1 comments

In the training set, all the hex strings appear to be length-8 substrings separated by spaces, but the original MIST paper also uses hex substrings of length 2, e.g.

02 02 00006b2c

where the first two "words" are of length 2. I don't see this in the training set. Is there a reason for it? Thanks.
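
A minimal sketch of how the observation above could be checked, assuming each encoded instruction is available as a space-separated string. The function name `token_length_histogram` and the `dataset_style` sample are hypothetical and made up for illustration; only the `02 02 00006b2c` sample comes from the question, and the dataset's actual JSON layout is not assumed here.

```python
import re
from collections import Counter

# Matches a token made up only of hexadecimal digits.
HEX_TOKEN = re.compile(r"[0-9a-fA-F]+")


def token_length_histogram(lines):
    """Count how many hex tokens of each length appear across the given lines."""
    counts = Counter()
    for line in lines:
        for token in line.split():
            # Skip anything that is not a pure hex string.
            if HEX_TOKEN.fullmatch(token):
                counts[len(token)] += 1
    return dict(counts)


# Layout quoted from the original MIST paper in the question above.
paper_style = ["02 02 00006b2c"]
# Hypothetical line mimicking the all-length-8 layout (not real dataset content).
dataset_style = ["0000006b 00002c4f 0a1b2c3d"]

print(token_length_histogram(paper_style))    # {2: 2, 8: 1}
print(token_length_histogram(dataset_style))  # {8: 3}
```

If every line of the training set yields only the key `8`, the short length-2 fields from the paper are indeed absent from this variant of the encoding.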

Hi heraclixus, thank you for opening this issue. I modified the original MIST format in order to (a) improve performance and (b) make the encoding self-describing. I should have described the new structure in the original post here: https://marcoramilli.com/2016/12/16/malware-training-sets-a-machine-learning-dataset-for-everyone/