Recurrence by email address / username

Question

dennisbmoore opened this issue 4 years ago · 6 comments

Did you capture usernames / email addresses in your data set? Can you determine uniqueness or lack thereof by email addresses? For example, what fraction of the passwords associated with a specific username (email address if relevant) are unique, and how does that vary with the number of duplicates of the username (i.e., reuse of passwords vs # of times the username is matched in the data set). Thanks!

Answer 1 · 2020-07-08T16:21:58.000Z

Hello!

Yes, I did capture username/email tuples in my data.

It is a great idea. However, a large-scale analysis on both username and password is extremely time-consuming, because it requires a join operation on 1 billion rows.

But it is not as impactful as you might think.

The average number of times each email was found is 1.889.
196,250,369 emails were found only once.
A few email addresses are responsible for raising the average. mail.ru@hotmail.com was the most common email address, found 90549 times, followed by gmail.com@hotmail.com (85k times), password@gmail.com (38k times), info@yahoo.com (31k times) and so on.
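
For illustration, counts like these need only a single counting pass over the records. This is a minimal sketch, not the code used for the analysis: the function name `email_stats` and the sample records are hypothetical, and at 1 billion rows an in-memory counter would likely have to be replaced by something disk-backed.

```python
from collections import Counter

def email_stats(records):
    """Summarize how often each email occurs in (email, password) records."""
    counts = Counter(email for email, _ in records)
    if not counts:
        return 0.0, 0, []
    # Average number of occurrences per distinct email.
    mean = sum(counts.values()) / len(counts)
    # Number of emails that were found exactly once.
    singletons = sum(1 for n in counts.values() if n == 1)
    return mean, singletons, counts.most_common(2)

# Hypothetical sample data, not taken from the real data set.
sample = [
    ("alice@example.com", "hunter2"),
    ("junk@example.com", "qwerty"),
    ("junk@example.com", "letmein"),
    ("junk@example.com", "qwerty"),
    ("bob@example.com", "passw0rd"),
]
mean, singletons, top = email_stats(sample)
print(mean, singletons, top)
```

A handful of heavily repeated addresses pulls the mean up even when most emails appear once, which is the effect described above.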

So I've decided not to compute that metric, because it would be too computationally heavy for minimal impact.

If you disagree, please feel free to say so!

Cheers!

Answer 2 · 2020-07-08T17:37:25.000Z

Interesting. For the emails used many thousands of times, I wonder if those should be blacklisted (along with any accounts created using those as secondary accounts) - probably fraud related.

What if you limited it to accounts that appeared within a smaller range of occurrences, say 10 to 500 times? This could substantially reduce the computational cost and would still seem to provide important information about password reuse.
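
A rough sketch of what that restricted metric could look like: group passwords by email, keep only emails whose occurrence count falls inside the range, and report the fraction of distinct passwords for each. The function name `unique_fraction_by_email`, its `lo`/`hi` parameters, and the sample records are assumptions made for illustration, not part of the existing analysis.

```python
from collections import defaultdict

def unique_fraction_by_email(records, lo=10, hi=500):
    """Map each email seen lo..hi times to its fraction of distinct passwords."""
    passwords = defaultdict(list)
    for email, password in records:
        passwords[email].append(password)
    return {
        email: len(set(pws)) / len(pws)
        for email, pws in passwords.items()
        if lo <= len(pws) <= hi  # skip rare emails and the 90k-style outliers
    }

# Hypothetical sample; the small bounds only keep the demo short.
sample = (
    [("reuser@example.com", "hunter2")] * 3
    + [("reuser@example.com", "letmein")]
    + [("once@example.com", "qwerty")]
)
fractions = unique_fraction_by_email(sample, lo=2, hi=10)
print(fractions)
```

A value of 1.0 means every occurrence of the email used a different password; lower values indicate more reuse.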

Thanks for doing the important work you do!

Answer 3 · 2020-07-08T19:18:46.000Z

I've filtered accounts which appeared more than once in a dump (just because I don't think a regular user can register with the same email more than once on a website).

If there were 25 (username, password) tuples with the same username and password in a single dump, they were only counted as 1.
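
The counting rule for identical tuples can be sketched as follows. This is only an assumed reconstruction of that one rule (collapse identical pairs within a dump, then count passwords across dumps), not the actual processing code; `count_passwords` and the sample dumps are hypothetical.

```python
from collections import Counter

def count_passwords(dumps):
    """Count passwords across dumps, collapsing identical pairs within a dump."""
    totals = Counter()
    for dump in dumps:
        # Identical (username, password) tuples in one dump count once.
        unique_pairs = set(dump)
        totals.update(password for _, password in unique_pairs)
    return totals

# Hypothetical dumps: 25 identical tuples in the first one count as 1.
dump_a = [("junk@example.com", "hunter2")] * 25 + [("bob@example.com", "hunter2")]
dump_b = [("junk@example.com", "hunter2")]
totals = count_passwords([dump_a, dump_b])
print(totals["hunter2"])
```

Here the password is counted once per distinct account per dump, so a tuple duplicated 25 times contributes no more than a single copy would.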

This had two possible outcomes: either the accounts repeating 90k times also shared a password and were not processed 90k times, or they had random passwords and did not influence the most common passwords list.

Interesting point, though: these spam accounts appear in all kinds of lists and have very natural-looking passwords, so I don't think they skewed the statistics other than the most common passwords either.

Answer 4 · 2020-07-08T19:31:48.000Z

Hmm, interesting breakthrough: I checked some of the more unique-looking passwords used by the mail.ru@hotmail.com account.

I'm pretty certain the people trying to sell these leaks inflated the number of credentials inside by duplicating accounts and replacing their usernames with these junk addresses.

Answer 5 · 2020-07-08T19:42:16.000Z

I've been frantically checking passwords from the mystery lists. I was really excited that there might be something to explain that, but it looks like only a fraction of these passwords come from these spam accounts.

Answer 6 · 2022-02-14T17:13:35.000Z

I need the commands for this. How do I search for passwords?