This Python 3 script extracts all hashes (SHA-256, SHA-1, and MD5) from the PDF reports in APT_CyberCriminal_Campagin_Collections and writes them to an output_YYYY-MM-DD.json file.
For details on the file structure, see the latest run.
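As a rough illustration of the hash-extraction step, the sketch below pulls MD5, SHA-1, and SHA-256 candidates out of text that has already been extracted from a PDF. It is not the script's actual code: the names `HASH_LENGTHS`, `extract_hashes`, and `output_filename` are hypothetical, and the returned dictionary layout is an assumption, not the real structure of the output file.

```python
import re
from datetime import date

# Hex digest lengths for the three hash types mentioned above.
HASH_LENGTHS = {"md5": 32, "sha1": 40, "sha256": 64}


def extract_hashes(text):
    """Return candidate hashes found in `text`, grouped by type (hypothetical layout)."""
    found = {}
    for name, length in HASH_LENGTHS.items():
        # The lookarounds keep a 32-character match from being cut
        # out of the middle of a longer hex string such as a SHA-256.
        pattern = re.compile(
            r"(?<![0-9a-fA-F])[0-9a-fA-F]{%d}(?![0-9a-fA-F])" % length
        )
        found[name] = sorted({match.lower() for match in pattern.findall(text)})
    return found


def output_filename(day=None):
    """Build a name following the output_YYYY-MM-DD.json pattern."""
    day = day or date.today()
    return f"output_{day.isoformat()}.json"
```

A purely length-based match cannot tell a real digest from any other hex string of the same length, so results like these would normally need filtering before use.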
While running, the script also extracts every archive it finds, trying a list of well-known passwords.
The hashes can be used to download missing samples from VirusTotal, while the extracted files can be organized as desired (I use my PyPEfilter).
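The archive step can be sketched as a loop that tries each candidate password with the `7z` binary from p7zip-full. This is only an illustration under stated assumptions: the password list and the function names `build_7z_command` and `extract_with_known_passwords` are hypothetical, and the real script may handle archives differently.

```python
import subprocess

# Assumed examples of passwords commonly used for malware archives;
# the list the script actually uses may differ.
CANDIDATE_PASSWORDS = ["infected", "malware", "virus"]


def build_7z_command(archive, out_dir, password):
    """Build a 7z call: x = extract with paths, -p = password,
    -o = output directory, -y = answer yes to every prompt."""
    return ["7z", "x", f"-p{password}", f"-o{out_dir}", "-y", str(archive)]


def extract_with_known_passwords(archive, out_dir, passwords=CANDIDATE_PASSWORDS):
    """Try each password in turn; return the one that worked, or None."""
    for password in passwords:
        result = subprocess.run(
            build_7z_command(archive, out_dir, password),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        # 7z exits with 0 only when extraction completed without errors.
        if result.returncode == 0:
            return password
    return None
```

For example, `extract_with_known_passwords("samples.zip", "extracted")` would return the first password that opens the archive, or `None` if none of them does. A failed attempt can leave empty or partial files in the output directory, so a real run may want to clean up between attempts.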
sudo apt install p7zip-full
git clone <PROTO>packmad/APT_Dataset_Creator
cd APT_Dataset_Creator/
git submodule update --init --recursive
Get yourself some coffee because it's going to take a long time...
git pull
git submodule foreach git pull origin master
Using this script, we created datasets for the following papers:
@inproceedings{mantovani2020prevalence,
title={Prevalence and Impact of Low-Entropy Packing Schemes in the Malware Ecosystem},
author={Mantovani, Alessandro and Aonzo, Simone and Ugarte-Pedrero, Xabier and Merlo, Alessio and Balzarotti, Davide},
booktitle={Network and Distributed System Security (NDSS) Symposium},
volume={20},
year={2020}
}
- Under submission #1
- Under submission #2