berzerk0/Probable-Wordlists

Suggestion: Statistics about popularity.

Ho52198 opened this issue · 1 comments

Hello,
Maybe I am wrong, but I have a feeling that a large share of all passwords are "seen" only once across the different sources (if the sources are not just copies or upgraded versions of each other). It would be useful to have some general guidance like:

first million - words seen between 200 and 20 times
from 1000k to 10000k - words seen between 19 and 4 times
from 10000k to 100000k - words seen between 3 and 2 times
from 100000k to the end - words seen 1 time only

This would give a better understanding of where the probability ordering stops and random/alphabetical order starts.

For example - even in the 120m wordlist I saw many passwords that are obviously from a random generator, and the chance that they are used by many people or in many places is close to zero.

This is already in place, and it is how the list sizes are determined. I'll make this information more prominent.

From the ReadMe at https://github.com/berzerk0/Probable-Wordlists/tree/master/Real-Passwords

- I generated files by the number of times each line appeared in my analysis. Files are available for 75, 50, 25, 10, and 5 appearances.
- Top 196 - appeared at least 75 times - these are the MOST common passwords
- Top 3575 - appeared at least 50 times
- Top 95 Thousand - appeared at least 25 times 
- Top 32 Million - appeared at least 10 times
- Top 258 Million - appeared at least 5 times
- Top 2 Billion - appeared at least 2 times
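
As a rough illustration of how threshold lists like these could be produced, here is a minimal Python sketch. It is not the script used to build the repository's files; the function names `count_appearances` and `lists_by_threshold` and the tiny sample input are hypothetical, and it assumes each source line holds exactly one password.

```python
from collections import Counter

# Thresholds taken from the ReadMe excerpt above.
THRESHOLDS = (75, 50, 25, 10, 5, 2)


def count_appearances(lines):
    """Count how many times each non-empty line appears across all sources."""
    return Counter(line.rstrip("\n") for line in lines if line.strip())


def lists_by_threshold(counts, thresholds=THRESHOLDS):
    """Map each threshold to the passwords seen at least that many times,
    ordered from most to least common."""
    ranked = [pw for pw, _ in counts.most_common()]
    return {t: [pw for pw in ranked if counts[pw] >= t] for t in thresholds}


# Hypothetical sample standing in for the concatenated source files.
sample_lines = ["a\n", "a\n", "b\n", "a\n", "c\n", "b\n", "\n"]
sample_counts = count_appearances(sample_lines)

# Small thresholds so the toy data produces non-empty lists.
sample_lists = lists_by_threshold(sample_counts, thresholds=(3, 2))
```

With real data, the source files would not fit in memory this way; a sort-and-count pass on disk would be a more realistic choice, but the bucketing logic stays the same.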

From the source files used to make Rev 1, only 1/3 of the passwords appeared more than once. Lines that appeared only once don't make it onto this list. If a line only shows up once, I can hardly call it "Probable".

It might be that some passwords appear random and seem very unlikely to be used. However, if a line appeared in the source files more than once, it ended up in the lists. It's quite difficult, if not impossible, to reverse engineer the giant encyclopedic wordlists that form some of the source material. Odds are the random-looking lines near the bottom of the 2 billion list only appeared in one leak, but there isn't any way for me to know that.

I erred on the side of inclusivity - this time. For Rev 3, I may raise the minimum number of appearances needed for inclusion to three or five.
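
For anyone who wants the rank-versus-popularity breakdown asked for in the issue, something like the following sketch could compute it from a table of appearance counts. The helper names `rank_boundaries` and `size_at_minimum` and the sample counts are made up for illustration; none of this comes from the repository.

```python
from collections import Counter


def rank_boundaries(counts):
    """Return {appearance_count: last_rank} for a frequency-sorted list.

    last_rank is the 1-based position of the final password that was
    seen exactly appearance_count times.
    """
    boundaries = {}
    for rank, (_, seen) in enumerate(counts.most_common(), start=1):
        boundaries[seen] = rank  # later ranks overwrite earlier ones
    return boundaries


def size_at_minimum(counts, minimum):
    """Number of passwords that would survive a minimum-appearance cutoff."""
    return sum(1 for seen in counts.values() if seen >= minimum)


# Hypothetical appearance counts, not real data.
example_counts = Counter({"123456": 5, "password": 5, "qwerty": 3, "x9#Lq": 1})

example_boundaries = rank_boundaries(example_counts)
size_min_2 = size_at_minimum(example_counts, 2)
size_min_5 = size_at_minimum(example_counts, 5)
```

Here `example_boundaries` says where in the sorted list each appearance count ends, and the two `size_at_minimum` calls show how much the list would shrink if the cutoff moved from two appearances to five.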