Curated-Malware-Database

A curated malware database with more than 73,000 samples.

License: GNU General Public License v3.0 (GPL-3.0)

Disclaimer

This database was used to test an undisclosed antivirus at an undisclosed company against 300k samples. It is intended only for research and testing purposes; I hold no responsibility for any damage resulting from improper handling. Every archive has been encrypted and is protected by a password (which I'm not disclosing); if you are a researcher, you should already know what that password is.

What's so special about this Repo?

  • It has been manually polished: I verified the samples and removed false positives (as much as possible).
  • It includes only the virii, so besides the archive's folder you don't need to deal with uncommon archive formats, renaming files to remove spaces or non-ASCII characters, removing related but non-virii files, and so on.
  • It's geared with testing in mind, so duplicates were removed with https://github.com/sahib/rmlint (a lot of public repos on GitHub have simply copied each other).
  • It has been divided into categories by file type or platform.
  • The samples were collected from multiple underground and public sources (extensive research), then processed and combined.
  • It includes an APT and Trending folder containing samples from most of the recent APT campaigns plus some "exotic" stuff.
  • It has a lot of IoT malware (mostly Mirai and Gafgyt samples).
  • It also includes samples caught by honeypots.
  • The DB has around 75k+ samples in total.

Downsides

  • The tests were done in August 2020, so for the APTandTrending folder keep in mind that most of the samples should by now already be on VirusTotal or online sandboxes.
  • I'm not maintaining this; if there are contributors, feel free to help.
  • I couldn't upload the PE32/64-Windows archive (100+ GB), which holds most of the viruses; maybe a torrent in the future.

Some of the commands used to create the DB (samples):

find . -depth -name "*" -execdir perl-rename 's/[ )]/_/g; s/\(//g; s/\,//g;' "{}" \;  # rename dirs: remove commas and '(' while replacing spaces and ')' with _
for f in . ; do ls -aclh "$f" | awk '{print $9}' | xargs file | egrep 'Zip|gzip|RAR|7-zip|tar archive' | awk '{print $1}' | sed 's/.$//' | xargs -I '{}' mv --backup=t '{}' ARCHIVES ; done  # move anything `file` identifies as an archive into ARCHIVES
for file in $(cat doslist.log); do mv "$file" ../MSDOS/ ; done  # I removed the MS-DOS stuff because it's unneeded, but the old-timers will find some interesting stuff in "others.zip"
md5sum * | sed -e 's/\([^ ]*\) \(.*\(\..*\)\)$/mv -v \2 \1\3/e'  # rename each file to its MD5 hash, keeping the extension (GNU sed's 'e' flag executes the generated mv)
find . -name '*.zip' -exec sh -c 'unzip -P nonono -d "${1%.*}" "$1"' _ {} \;  # extract each zip into a directory named after it
rmlint -a md5 -g .  # deduplicate, hashing with MD5 (-g shows progress)
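The md5sum-plus-sed rename above relies on GNU sed's non-standard `e` flag. A plainer loop doing the same thing (a hypothetical helper, not part of the original workflow) could look like:

```shell
#!/bin/sh
# md5_rename: rename every regular file in a directory to its MD5 hash,
# keeping the original extension. Portable sketch equivalent to the
# GNU-sed one-liner above.
md5_rename() {
    dir=$1
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        base=${f##*/}
        hash=$(md5sum "$f" | awk '{print $1}')
        ext=${base##*.}
        if [ "$ext" != "$base" ]; then
            mv -v -- "$f" "$dir/$hash.$ext"   # name had an extension, keep it
        else
            mv -v -- "$f" "$dir/$hash"        # no extension to keep
        fi
    done
}
```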

Cloud

I decided to put all archives on the cloud (even the small ones); after all, GitHub was made for code, not for storage.

How is this supposed to be used?

For safety purposes, the archives need to be extracted in a virtual environment and served over an HTTP server. You can then run wget2 (which supports multi-threading) and download the samples through an antivirus / proxy interface.

time wget2 -np -r -m -U "Mozilla/5.0 (Windows NT 5.1; rv:11.0) Gecko Firefox/11.0" -e robots=off -e http_proxy=http://192.168.0.15:8080 -l 2 http://172.16.11.99/OTHERS/ -o others-windows.txt  # mirror the sample tree through the proxy, logging to others-windows.txt

Download

Each archive has a matching .txt file containing its cloud link; copy the link for the archive you want.
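Assuming each .txt file holds one direct URL per line (an assumption — check the files first), a small helper can feed every link to the downloader of your choice:

```shell
#!/bin/sh
# fetch_links: read URLs from a list file (one per line, blanks skipped)
# and hand each one to the downloader command given in the remaining args.
# The one-URL-per-line layout of the .txt files is an assumption here.
fetch_links() {
    list=$1; shift
    while IFS= read -r url; do
        [ -n "$url" ] || continue    # skip blank lines
        "$@" "$url"
    done < "$list"
}
```

For example, `fetch_links OTHERS.txt wget -c` would download every listed link with resume support.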