honeynet/cuckooml

Global "normalized" field does not correspond to the same field per VT vendor

So-Cool opened this issue · 4 comments

The global "normalized" field has to be updated with the corresponding per-vendor VT fields, which have been updated to provide better labelling.

What do you mean exactly here? Can you give an example?

The structure of the JSON at the moment has one normalized field per VT vendor:
virustotal -> scans -> *vendor name* -> normalized;
additionally there is a global normalized field here:
virustotal -> normalized
which pulls together all of the *vendor name* -> normalized fields.
The latter one is not getting all the new normalized tokens that I've just implemented.
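
To make the mismatch concrete, here is a minimal sketch of rebuilding the global field from the per-vendor ones. It assumes that each `normalized` value is a list of string tokens (or a single string) and that the report is already loaded as a Python dict. The function name `rebuild_global_normalized` and the vendor names are hypothetical and not taken from the cuckooml code.

```python
def rebuild_global_normalized(report):
    """Recompute virustotal -> normalized from every
    virustotal -> scans -> <vendor> -> normalized field.

    Assumption: each per-vendor "normalized" value is a list of string
    tokens (a bare string is treated as a one-element list).
    """
    virustotal = report.setdefault("virustotal", {})
    scans = virustotal.get("scans") or {}

    merged = []
    seen = set()
    # Iterate vendors in sorted order so the result is deterministic.
    for vendor in sorted(scans):
        tokens = scans[vendor].get("normalized") or []
        if isinstance(tokens, str):
            tokens = [tokens]
        for token in tokens:
            # Keep the first occurrence of each token, drop repeats.
            if token not in seen:
                seen.add(token)
                merged.append(token)

    virustotal["normalized"] = merged
    return report


# Hypothetical report: the global field is stale and lacks the new tokens.
report = {
    "virustotal": {
        "scans": {
            "VendorA": {"normalized": ["trojan", "zbot"]},
            "VendorB": {"normalized": ["zbot", "banker"]},
        },
        "normalized": ["trojan"],
    }
}
rebuild_global_normalized(report)
# report["virustotal"]["normalized"] is now ["trojan", "zbot", "banker"]
```

Rebuilding the global list from scratch, rather than appending to it, keeps it from retaining tokens that no vendor field produces any more.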

I remember you and @jbremer discussing the normalized field. Is the field
virustotal -> normalized actually used in cuckoo? Or are you the one storing all the normalized vendor names there? Couldn't you use this field to store the final label of the sample?

Currently it is not used because there was too much noise in there. Hopefully with @So-Cool's changes it will be usable and we can indeed start using it for labelling.