Author: Joseph Wilson
At some point in all of our lives we've received one of the following - A notification that flashes across your phone or pops into your inbox. . .
Dear Customer,
We’re letting you know that we have detected unusual activity on your account and are taking action to. . .
“Was my information stolen from my bank, my employer, or could it be from that one website I love to read at night”? 1
How the bad-guys managed to steal or purchase your information, you’ll never know for sure. Out of all the ways you can better protect yourself and your family against cyber criminals, there is one piece of advice you’ve heard the most. It’s also the one you’ve actually agreed with, told yourself you’d commit to, but then promptly decided to put it off again. . .
Until the news of another data breach slides across your phone. 2
Create a strong password!
When you create a password for an account, the majority of service companies are now immediately use a hashing algorithm to turn your plaintext password into a "hash". This hash is unique to your password and cannot be undone to identify your password if it is stolen or leaked from a database. This sounds like it should be a secure solution, but when a hacker receives stolen password hashes from a company, they have already determined which hashing algorithm that company used. The hacker then runs billions of password guesses through the same hashing algorithm to see if any hashes are a match - effectively identifying what the original passwords were.
So what does it take to make a strong password anyway?
- Should you include that specific date that you haven’t told anyone?
- Or the amount of fancy characters you can find on your keyboard?
- Or perhaps the cat ladies of the world have the true secret to fighting cyber criminals after all - just make a password with all of the names of your cats 😺 😺.3
I have analyzed a dataset of ten million unique passwords in an attempt to answer the questions above.
The dataset used for this first project comes from the 10-million passwords dataset generously published by Mark Burnett 6
This dataset had only the password plaintext available as a feature, so all analysis had to occur from creating new features, the code for which has been provided.
Features Created
- length of password
- count of character types used
- lowercase, numbers, uppercase, symbols
- guesses_log10
- Measure of Password Strength
- Estimated number of max guesses to identify a password
- score
- Simplified version of guesses
- Useful as feedback during new-password creation
- Promotes stronger passwords
These times were calculated with the following assumptions:
- The attacker is commencing an offline bruteforce attack
- He already has your password's hash from the company server
- The only thing slowing his time to crack it is:
- Your password's strength!
- His computer's capability
- The company's use of hashing algorithm (fast/slow)
- The company opted for the slower (more secure) hashing algorithm
- Resulting in 10K Guesses/Second
- A fast hash would result in about 16B Guesses/Second
Guesses_log | Time to Crack | Score |
---|---|---|
14 | <300 years | |
13 | 30 years | |
12 | 3 years | |
11 | 4 months | 4 |
10 | 2 weeks | 3 |
8 | 3 hours | 2 |
7 | 10 minutes | 1 |
6 | <1 minute | 0 |
Using the features created, I sought to accomplish the following:
- Provide you examples of strong and weak passwords.
- Help you understand how to make a strong password
- Motivate you to never again have
low-hanging fruitweak passwords for a hacker to profit from.
- Create a novel data pipeline to prepare the datasets to a common standard.
- Practice methods of data manipulation that scale well with datasets in excess of 3-25GB (e.g. Dask, PySpark)
- Create code using Object-Oriented Principles that can be further developed in future capstones and open-source projects.
- Note: minimum password lengths were 4
- Significant clusters at 6 & 8
- Possible factors could be password policies, on account of the bimodal shape
- Or natural user preference for these sizes
- Strength (log-guesses)
- The estimated max number of guesses needed to crack the password, presented in a log_10 form due to exponential scales involved
- Strength distributions will always be lower than lengths
- The max strength is dicatated by the password length
- Actual strength is a factor of the password complexity
- Common words and sequences of numbers/keys result in greatly reduced strength
This graphic9 was made to show the different characters used as password strength varies. Observations
- Length is the driving factor of password strength
- Lowercase letters are the most common
- Symbols are seldom used at all
- Weaker passwords tend to have:
- Common words or names
- Only numbers
- Keyboard sequences (qwerty, zxcvbn)
- There is a significant use of Uppercase letters at strength=13
- Possible factors could be common password policies requiring 12+ characters and the use of an uppercase.
- And users settling for the bare minimum?
- Length is the most effective way to increase Password Strength
- Followed by the complexity
- Use numbers/symbols throughout
- random letters vs. Use of common words|names
- "L33T" character replacement is not effective
Try out the Password Strength Indicator!
While a few companies are using effective password strength indicators such as Dropbox's zxcvbn, most of them are unreliable!
- For less important, but commonly used passwords
- Length > 20
- Combine 4 or more uncommon words
- for a memorable phrase/image that you can remember
- Add symbols and numbers between each word
- Important accounts you can't afford to have stolen
- Length 18-30
- Use random generator with all possible characters
- Use a password manager to store your passwords lastpass
- The length is only limited by the company requirements because the password manager will remember it for you
Note: some of the following passwords are not part of the dataset, as they were provided by audience members.
Password | Length | Score | Guesses_log |
---|---|---|---|
DogVito123 | 11 | 3 | 8.41 |
Auntlaurie1! | 12 | 3 | 8.40 |
011235813 | 9 | 1 | 4.54 |
13741374Zz | 10 | 2 | 6.47 |
andrew2222 | 10 | 1 | 4.18 |
republican | 10 | 0 | 2.99 |
bigdaddy66 | 10 | 1 | 4.42 |
bobpremium | 10 | 2 | 6.25 |
London1765 | 10 | 1 | 5.98 |
lalakers24 | 10 | 1 | 5.99 |
orwell1984 | 10 | 1 | 5.88 |
boomboomk0 | 10 | 1 | 5.72 |
123456789m | 10 | 1 | 3.71 |
Password | Length | Score | Guesses_log |
---|---|---|---|
Tr0ub4dor&3 | 11 | 4 | 11.00 |
A@%^b27C$d1 | 11 | 4 | 11.00 |
iyswtric04 | 10 | 3 | 10.00 |
balamelnur | 10 | 3 | 10.00 |
varfalamei | 10 | 3 | 10.00 |
0jNsyTDAhn | 10 | 3 | 10.00 |
ifEbevodEH | 10 | 3 | 10.00 |
mortecouil | 10 | 3 | 9.56 |
greWEGWegw | 10 | 3 | 9.60 |
ni5mlnuken | 10 | 3 | 9.68 |
wildhack43 | 10 | 3 | 9.05 |
Password | Length | Score | Guesses_log |
---|---|---|---|
1234qwer!@#$QWER | 17 | 3 | 9.55 |
qwertyuiopqwertyuiop | 20 | 0 | 1.67 |
qwertyuioppoiuytrewq | 20 | 1 | 4.18 |
1705secret1705secret | 20 | 2 | 6.38 |
1234567890qwertyuiop | 20 | 1 | 4.18 |
drPASSWORDdrPASSWORD | 20 | 1 | 4.60 |
98765432100123456789 | 20 | 1 | 4.26 |
bla_bla_84bla_bla_84 | 20 | 2 | 6.90 |
11111111111111111111 | 20 | 0 | 2.38 |
wizard.123wizard.123 | 20 | 2 | 6.85 |
qqqqqqqqqqqqqqqqqqqq | 20 | 0 | 2.38 |
Password | Length | Score | Guesses_log |
---|---|---|---|
correcthorsebatterystaple | 24 | 4 | 14.4 |
correct_horse_battery_staple | 27 | 4 | 20.3 |
jagjagarmed13hundar | 21 | 4 | 15.18 |
A9FADBE93A7C326FA97B | 20 | 4 | 20.0 |
ldtnsczxbjlbyyflwfnm | 20 | 4 | 20.0 |
ZeKqusrQ2BhicQbc4Y7I | 20 | 4 | 20.0 |
y3fk86egsJ66Kz1MMu8s | 20 | 4 | 20.0 |
vitulya.poddubnaya83 | 20 | 4 | 20.0 |
1516producchuotlaanh | 20 | 4 | 20.0 |
t43ty34tt2t43t34t324 | 20 | 4 | 20.0 |
FPuw0ReC^CachyZe34q5 | 20 | 4 | 20.0 |
Try out the Password Strength Indicator!
- Implement Password Strength Indicator (.js) on Project Website
- Finish capability for data processing pipeline to use multiple processors
- Dask, Rapids, Spark as options
- Standardize larger datasets for use in future research.
- Explore use of Machine Learning models such as Long Short-Term Memory (LSTM) Recurrent Neural Networks (RNN) and Markov-based models.
- How Hackers Crack Passwords
- A Machine Learning Approach to Predicting Passwords -Christoffer Olsen
- PassGAN: A Deep Learning Approach for Password Guessing
- hashcat
- hashcat setup guide for Ubuntu
- Comparison of Dictionary Attack Rulesets