/doppelganger

a tool that creates permutations of domain names using homographic unicode characters

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

doppelgänger - A tool to search for IDN lookalike/fake domains

doppelgänger is a tool that creates permutations of a domain name using lookalike Unicode characters and identifies which of them are registered by issuing DNS queries. It can be used to find phishing domains as well as typosquatting domains.

Example

  • original: example.com
  • eẋample.com (xn--eample-i77b.com)
  • exampĺe.com (xn--exampe-mcb.com)
  • exạmple.com (xn--exmple-xc8b.com)
  • exampǀe.com (xn--exampe-f3b.com)
  • exaṃple.com (xn--exaple-5s7b.com)
  • examƿle.com (xn--examle-62b.com)
  • examᴘle.com (xn--examle-e35b.com)
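The ASCII form in parentheses is the punycode encoding of the IDN. As a minimal sketch (not taken from the tool's source), Python's built-in `idna` codec can convert between the two forms. The helper names `to_ascii` and `to_unicode` are made up for this illustration, and the built-in codec follows the older IDNA 2003 rules, so it may reject or normalize some names differently from a registry.

```python
def to_ascii(domain: str) -> str:
    """Encode a Unicode domain name to its ASCII (punycode) form."""
    return domain.encode("idna").decode("ascii")


def to_unicode(domain: str) -> str:
    """Decode an ASCII (punycode) domain name back to Unicode."""
    return domain.encode("ascii").decode("idna")


# "\u1ea1" is LATIN SMALL LETTER A WITH DOT BELOW, as in the third list entry.
print(to_ascii("ex\u1ea1mple.com"))
print(to_unicode("xn--exmple-xc8b.com"))
```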

Usage

./doppelganger.py example.com

Without any flags, doppelgänger will perform a search for homographic IDNs.
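To illustrate what a homograph permutation is, here is a rough sketch, assuming a tiny hand-written lookalike table. The table `LOOKALIKES` and the function `homograph_variants` are hypothetical and only cover the characters from the example above; the tool's real tables are TLD-specific and much larger.

```python
from itertools import product

# Hypothetical lookalike table, built from the characters in the example list.
LOOKALIKES = {
    "a": ["\u1ea1"],            # a with dot below
    "l": ["\u013a", "\u01c0"],  # l with acute, dental click
    "m": ["\u1e43"],            # m with dot below
    "p": ["\u01bf", "\u1d18"],  # wynn, small capital p
    "x": ["\u1e8b"],            # x with dot above
}


def homograph_variants(label, table=LOOKALIKES):
    """Lazily yield every variant of label with at least one character replaced."""
    choices = [[ch] + table.get(ch, []) for ch in label]
    for combo in product(*choices):
        candidate = "".join(combo)
        if candidate != label:
            yield candidate


for variant in homograph_variants("al"):
    print(variant + ".com")
```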

Typosquatting

To search for typosquatting domains instead of IDNs, use the --typo-only or -y flag.
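This README does not say which kinds of typos the tool generates. As an illustration only, the sketch below covers three common classes (omitted character, doubled character, swapped neighbours). The function name `typo_variants` and the choice of typo classes are assumptions, not the tool's actual behaviour.

```python
def typo_variants(label):
    """Return sorted typo variants of label: omission, duplication, transposition."""
    variants = set()
    for i in range(len(label)):
        variants.add(label[:i] + label[i + 1:])        # omit character i
        variants.add(label[:i] + label[i] + label[i:])  # double character i
    for i in range(len(label) - 1):
        # swap characters i and i + 1
        variants.add(label[:i] + label[i + 1] + label[i] + label[i + 2:])
    variants.discard(label)  # swapping two equal characters changes nothing
    variants.discard("")     # omitting the only character leaves nothing
    return sorted(variants)


for variant in typo_variants("example"):
    print(variant + ".com")
```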

Dry-run mode

If you only want to output the domains without performing a DNS lookup, use the --dry-run flag or its short form -d.

File-output

You can use doppelgänger to output the found permutations to a file:

./doppelganger.py -o ./doppelgangers.list example.com

As with --dry-run, no DNS lookups are performed.

TLD support

TLD Support
gTLDs
.com ⚠️ partial - only latin and lisu script
.org ⚠️ partial - Korean and Chinese missing
.net ⚠️ partial - only latin and lisu script
ccTLDs
.ag ❌ no
.ar ❌ no
.at ✅ complete
.au 🔵 IDN not supported
.be ❌ no
.br ❌ no
.ca 🔵 not applicable
.ch ✅ complete
.cn ❌ no
.co ❌ no
.cz ❌ no
.de ✅ complete
.dk ✅ complete
.es ❌ no
.eu ❌ no
.fi ❌ no
.fm ❌ no
.fr ✅ complete
.gr ❌ no
.hu ❌ no
.hr ❌ no
.ie ❌ no
.in ❌ no
.io ❌ no
.ir 🔵 IDN not supported
.is ❌ no
.it ❌ no
.jp ❌ no
.me ❌ no
.nl 🔵 IDN not supported
.no ✅ complete
.pl ❌ no
.pm ✅ complete
.рф (rf) ❌ no
.rs ❌ no
.ru 🔵 IDN not supported
.tf ✅ complete
.tr ❌ no
.tv ❌ no
.tw ❌ no
.uk 🔵 IDN not supported
.us 🔵 IDN not supported
.wf ✅ complete
.yt ✅ complete
sTLDs
.biz ❌ no
.info ❌ no
.name ❌ no

Limitations

Lack of support for all TLDs

As each TLD allows a different subset of Unicode characters, routines have to be implemented for each TLD separately. As this is very time consuming, the tool is currently limited to a handful of TLDs. See the list above.

Large data sets

This tool creates a large number of permutations. If your domain name is long enough, there are millions of possible doppelganger domains. At the moment, the tool works in RAM only. If you check a large number of domains and your system runs out of memory, the tool falls back to checking only domains where exactly one character has been replaced. This is not a big limitation though, as most malicious actors will try to change as few characters as possible when creating phishing domains.

If I have time I'll add support for big data sets in the future.
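To give a feel for the numbers, the following sketch compares the size of the full permutation set with the single-replacement fallback. It assumes a lookalike table passed in as a plain dict; `count_all_variants` and `single_replacements` are hypothetical names, not functions from the tool.

```python
from math import prod


def count_all_variants(label, table):
    """Number of variants with at least one replaced character."""
    return prod(1 + len(table.get(ch, [])) for ch in label) - 1


def single_replacements(label, table):
    """Lazily yield variants where exactly one character is replaced."""
    for i, ch in enumerate(label):
        for alt in table.get(ch, []):
            yield label[:i] + alt + label[i + 1:]


# Hypothetical two-entry table, for illustration only.
table = {"a": ["\u1ea1"], "l": ["\u013a", "\u01c0"]}
print(count_all_variants("all", table))              # full set: product of choices
print(len(list(single_replacements("all", table))))  # fallback: sum of choices
```

The full set grows with the product of the per-character choices, while the fallback grows only with their sum, which is why it stays small even for long names.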

DNS query speed

Performing a large number of unique DNS queries for uncached domains in a short amount of time can get you blocked or rate-limited by your DNS provider.

Furthermore, this tool performs the DNS lookups sequentially, which results in poor performance.
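One possible way around the sequential lookups, sketched here under assumptions and not implemented in the tool, is a small thread pool. The names `resolves` and `find_resolving` are made up for this example. Note that more parallel queries also make it more likely to hit the rate limits described above, and that a name which resolves is only a hint that the domain is registered.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def resolves(domain):
    """Return True if the system resolver finds an address for domain."""
    try:
        socket.getaddrinfo(domain, None)
        return True
    except (socket.gaierror, UnicodeError):
        # gaierror: lookup failed; UnicodeError: name cannot be IDNA-encoded
        return False


def find_resolving(domains, check=resolves, workers=8):
    """Check domains concurrently and return those that resolve, in input order."""
    domains = list(domains)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(check, domains)
        return [d for d, ok in zip(domains, results) if ok]
```

The `check` parameter lets you swap in a different lookup, for example one that queries a specific DNS server, without touching the pool logic.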

TODOs

  • Add support to perform round-robin queries to a set of user selectable dns servers
  • Check existence of domains by whois entries instead of DNS
  • Export function for registered domains to a list / CSV
  • Add support for working with (really) large sets of permutations that don't fit into memory
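The round-robin item from the list above could be scheduled roughly as follows. This is only a sketch of the assignment step: `assign_servers` is a hypothetical helper, the server addresses are documentation placeholders, and sending each query to its assigned server would need a DNS library, which is left out here.

```python
from itertools import cycle


def assign_servers(domains, servers):
    """Pair each domain with a DNS server, rotating through servers in order."""
    if not servers:
        raise ValueError("at least one DNS server is required")
    return list(zip(domains, cycle(servers)))


# Placeholder addresses from the documentation range 192.0.2.0/24.
for domain, server in assign_servers(
    ["a.example", "b.example", "c.example"], ["192.0.2.1", "192.0.2.2"]
):
    print(domain, "->", server)
```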