iosifache/DikeDataset

Malware classification issues

Closed this issue · 4 comments

I have looked at the malware.csv file in the labels directory and have seen that you classify the malware into 9 categories and give the associated probabilities. We can assume that a file belongs to the category with the highest probability. But if we go by this idea, it is obvious that the next six categories are hardly ever selected, because their probabilities are too small. Is there something wrong with my understanding of the classification probabilities?
Uploading pic1.png…

Hi @fjycomes,

Could you please re-upload the image? I'll wait for it before responding to your observations.

pic1

May I ask whether these values represent the probability of each category? For example, does 0.4285714 in the second row, fifth column represent the probability that the file is a trojan? If so, I derived a label for every file from these probabilities and found that almost no files end up labelled as backdoor, worm, spyware, rootkit, encrypter, downloader, and so on, so the files seem almost useless for those categories. I don't know if my understanding is wrong.

@fjycomes, yes, that is the meaning of the probabilities. Also consider that they are normalised for each entry in the dataset. Why do you think the files are irrelevant?
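For readers following along, the labelling approach discussed above (take the category with the highest probability, then count how often each category wins) could be sketched roughly as follows. This is only an illustration under assumptions: the column names in `FAMILIES` are hypothetical and should be checked against the header of malware.csv, the rows in `SAMPLE_CSV` are made up and not taken from the dataset, and the helper names `top_label`, `is_normalised`, and `label_counts` are invented for this example.

```python
import csv
import io
from collections import Counter

# Hypothetical column names; check them against the header of labels/malware.csv.
FAMILIES = ["generic", "trojan", "ransomware", "worm", "backdoor",
            "spyware", "rootkit", "encrypter", "downloader"]

# Made-up rows for illustration only, not taken from the dataset.
SAMPLE_CSV = """hash,generic,trojan,ransomware,worm,backdoor,spyware,rootkit,encrypter,downloader
aaa,0.50,0.30,0.10,0.05,0.05,0.00,0.00,0.00,0.00
bbb,0.20,0.60,0.10,0.00,0.10,0.00,0.00,0.00,0.00
ccc,0.45,0.40,0.00,0.00,0.00,0.05,0.00,0.00,0.10
"""


def top_label(row, families=FAMILIES):
    """Return the family with the highest probability in one CSV row."""
    return max(families, key=lambda name: float(row[name]))


def is_normalised(row, families=FAMILIES, tol=1e-6):
    """Check that the family probabilities of one row sum to roughly 1."""
    return abs(sum(float(row[name]) for name in families) - 1.0) < tol


def label_counts(csv_file, families=FAMILIES):
    """Count how many rows are assigned to each family by highest probability."""
    reader = csv.DictReader(csv_file)
    return Counter(top_label(row, families) for row in reader)


counts = label_counts(io.StringIO(SAMPLE_CSV))
print(counts)
```

With real data, `io.StringIO(SAMPLE_CSV)` would be replaced by an opened malware.csv file. Families that rarely hold the largest share in any row end up with a count of zero or close to it, which matches the observation raised in this thread.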

I will close this issue due to inactivity. Please feel free to reopen it if the information is still relevant for you. Thanks!