
Domain and TLD statistics on "The Collection (1-5), Antipublic, AP MYR & Zabugor" breach exposure. Contains the number of matches per file, broken down by provider.

THE COLLECTION (1-5), ANTIPUBLIC, AP MYR & ZABUGOR

DATA ANALYSIS & DOMAIN STATISTICS

Here are the raw statistics for "The Collection" mega archive. Files are organized by domain/service and by TLD. Services were searched in order of popularity, and each .txt file contains the number of matches per file inside The Collection.
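The exact search used to produce the per-file counts isn't documented here. As a minimal sketch, assume each line of a combo file is one entry containing an email address, and that a "match" is a line whose address belongs to the provider's domain. The names PROVIDER_PATTERNS, count_matching_lines and count_matches_per_file are hypothetical, and the regexes are illustrative assumptions rather than the queries behind the published numbers.

```python
import re
from pathlib import Path
from typing import Iterable

# Hypothetical provider patterns (assumptions, not the original queries).
# The negative lookahead (?![.\w-]) requires the domain to end there, so
# "@gmail.community" or "@uni.edu.au" are not counted.
PROVIDER_PATTERNS = {
    # yahoo.com, yahoo.ca, yahoo.co.uk, ... but not yahoogroups.com
    "yahoo": re.compile(r"@yahoo(?:\.[a-z]{2,})+(?![\w-])", re.IGNORECASE),
    "gmail": re.compile(r"@gmail\.com(?![.\w-])", re.IGNORECASE),
    # any host under the .edu TLD, e.g. mit.edu or cs.mit.edu
    "edu": re.compile(r"@[a-z0-9.-]+\.edu(?![.\w-])", re.IGNORECASE),
}


def count_matching_lines(lines: Iterable[str], pattern: re.Pattern) -> int:
    """Count lines containing at least one match of `pattern`."""
    return sum(1 for line in lines if pattern.search(line))


def count_matches_per_file(root: Path, pattern: re.Pattern) -> dict[str, int]:
    """Return {relative path: matching line count} for every .txt under root."""
    counts: dict[str, int] = {}
    for path in sorted(root.rglob("*.txt")):
        # errors="replace" keeps mixed or broken encodings from aborting the scan
        with path.open("r", encoding="utf-8", errors="replace") as fh:
            counts[str(path.relative_to(root))] = count_matching_lines(fh, pattern)
    return counts
```

Streaming each file line by line keeps memory flat, which matters for an archive of this size; a real run would more likely use parallel grep-style tools, but the counting logic is the same.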

This data is meant for researchers who can't compute these numbers themselves because of the size of the Collection (it spans 1 TB). If you use this data, please cite this repo! It took a very long time to get these numbers.

To give an idea of how big the complete Collection is, it contains a total of 29,083,053,678 entries (lines counted with the GNU wc utility). Keep in mind that this number also includes any blank lines that appear in the combo lists. Here is the breakdown by email service provider:
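Because the total comes from wc, it counts newline characters, so blank or whitespace-only lines are included. A minimal sketch of how one might reproduce a wc -l style total and, alongside it, a count of non-blank lines, assuming the archive is a directory tree of .txt files. The function names are hypothetical.

```python
from pathlib import Path


def wc_style_line_count(path: Path, chunk_size: int = 1 << 20) -> int:
    """Count newline bytes, which is what `wc -l` reports."""
    total = 0
    with path.open("rb") as fh:
        # Read fixed-size chunks so huge files never load fully into memory.
        while chunk := fh.read(chunk_size):
            total += chunk.count(b"\n")
    return total


def non_blank_line_count(path: Path) -> int:
    """Count lines containing at least one non-whitespace byte.

    Unlike wc, this also counts a final line that lacks a trailing newline.
    """
    with path.open("rb") as fh:
        return sum(1 for line in fh if line.strip())


def collection_totals(root: Path) -> tuple[int, int]:
    """Return (wc-style total, non-blank total) over all .txt files under root."""
    files = sorted(root.rglob("*.txt"))
    return (
        sum(wc_style_line_count(p) for p in files),
        sum(non_blank_line_count(p) for p in files),
    )
```

The gap between the two numbers gives a rough idea of how much of the headline total is blank lines rather than real entries.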

Yahoo: 6,413,950,221 (yahoo.com, .ca, .fr, .co.uk, etc.)
Hotmail: 4,130,645,551 (hotmail.com, .ca, .co.uk, .fr, etc.)
Gmail: 2,980,903,393 (gmail.com)
AIM / AOL: 1,021,510,538 (aim.com | aol.com)
Yandex: 868,467,900 (yandex.com | yandex.ru)
Live: 590,784,654 (live.com, .fr, .co.uk, etc.)
.edu: 117,493,160
Mail: 72,530,713 (mail.com | email.com)
Outlook: 37,770,997 (outlook.com)
Apple: 36,941,682 (icloud.com | mac.com)
.gov: 28,263,752
Protonmail: 66,848 (protonmail.com | protonmail.ch)
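To put these counts in proportion, each one can be divided by the 29,083,053,678 total. The snippet below only does that arithmetic on the figures listed above; TOTAL_ENTRIES, PROVIDER_MATCHES and share_of_total are hypothetical names. Since the total also counts blank lines, these percentages may understate each provider's share of real entries.

```python
# Figures copied from the breakdown above. TOTAL_ENTRIES is the wc-based
# total, which includes blank lines, so shares may be understated.
TOTAL_ENTRIES = 29_083_053_678
PROVIDER_MATCHES = {
    "Yahoo": 6_413_950_221,
    "Hotmail": 4_130_645_551,
    "Gmail": 2_980_903_393,
    "AIM / AOL": 1_021_510_538,
    "Yandex": 868_467_900,
    "Live": 590_784_654,
    ".edu": 117_493_160,
    "Mail": 72_530_713,
    "Outlook": 37_770_997,
    "Apple": 36_941_682,
    ".gov": 28_263_752,
    "Protonmail": 66_848,
}


def share_of_total(matches: int, total: int = TOTAL_ENTRIES) -> float:
    """Percentage of all counted entries attributed to one provider."""
    return 100.0 * matches / total


if __name__ == "__main__":
    for provider, matches in PROVIDER_MATCHES.items():
        print(f"{provider:<12} {share_of_total(matches):6.2f}%")
```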

If you'd like to learn more about The Collection archive, read the write-up I did on it: "Quick Dissections: Collections 2 - 5"