Discovery is a fully automated OSINT tool that will gather informations from a lot of differents sources and cross the results in order to fingerprint as best as possible a domain name.
This tool (for the moment ;) ) relies on 4 FREE API's :
- Shodan : https://www.shodan.io/
- WhatCMS : https://whatcms.org/API
- Hunter.io : https://hunter.io/
- RocketReach : https://rocketreach.co/api
You can chose whether or not you want to use them. If you don't provide the API keys then the hunter, shodan, rocketreach and whatcms API won't be used.
The tool configuration can be set in the configuration file :
How to use :
python3 discovery.py -d domain.tld --dns
This module actually does a lot of things :
- Query Whois databases to gather the name of the registrant, who's responsible of it and some DNS servers name
- Query Google DNS to retrieve records
- Try to perform DNS transfert zone on the previously found DNS servers
- Gather IPv4 ranges beloging to the domain.tld
From this IP range Discovery will list all existant IP's :
The second module is the implementaiton of the sublist3r python3 module written by aboulela : https://github.com/aboul3la/Sublist3r You can call it using two differents options :
python3 discovery.py -d domain.tld --sublist
or
python3 discovery.py -d domain.tld --subrute
The difference is that when using --subrute, sublist3r will perform a DNS bruteforce which will take much more time but will also find more subdomains.
Discovery will also use the IP2host and reverse DNS lookup in order to gather new Virtual Hosts.
In the configuration file you can choose whether or not the www.domain.tld and domain.tld should be merged. You must be aware that almost all the time, www.domain.tld is the same thing as domain.tld. But sometimes it is not which means we might loose some Virtual Hosts by merging www.domain.tld and domain.tld
This configuration can be done in the configuration file :
This module is composed of two functions and can be called that way :
python3 discovery.py -d domain.tld --scan ["full" or "fast"]
The first function is an implementaiton of the python nmap librairy. It will scan the discovered IP's either the "full" way (which means it will check for the all 65535 ports) or the "fast" way (-F nmap option).
Depending of the services discovered it will perform a few actions :
- Check SSL certificate
- Check for the CMS used (if there is one)
- Check for comon important files (.git, /status, trace.axd, robots.txt. If those files are found, they will be downloaded).
- Look for WAF using Wafw00f (https://github.com/EnableSecurity/wafw00f)
- Look For reverse proxy using HTTP-Traceroute.py made by Nicolas Gregoire and Julien Cayssol
You can add as much files as you want in the configuration file :
The tool will output an XML files that you will be able to add in the NMAP plugin (rapport type)
The second function will use the Shodan API to gather informations about the domain name : found servers, services, CVE's related to the services and a quick decription
This module is basically my python version of FOCA :
python3 discovery.py -d domain.tld --gather
Using Google dorks it will gather publicly exposed documents and parse their metadatas in order to find sensitive informations (credentials for exemple) This module is inspired by the pyfoca script written by altjx : https://github.com/altjx/ipwn/tree/master/pyfoca
To parse the metadatas I used exiftool.
You can add as much extensions as you want in the configuration file :
Note that in order to be parsed, the gathered documents must be downloaded. Sometimes it might take a lot of space. If you don't want to keep the downloaded files you can set the option in the configuration so that they will be deleted once parsed :
This module will also check for sensitive files on Pastebin and Github. For each document found,it will check for the words filled in the configuration file :
The last module will use differents API's to gather names of employee working for the given domain name. Especially the RocketReach API using some parts of N3tsky's PeopleScrap tool : https://github.com/n3tsky/PeopleScrap
It will then create a few lists of emails :
python3 discovery.py -d domain.tld --harvest
You can set the pattern to use for the mail creation in the configuration file :
If you specify a non handled pattern or don't specify any then Discovery will create all possible lists using all handled patterns :
So basically if you want to run all modules you can use this command :
python3 discovery.py -d domain.tld --dns (--sublist or --subrute) --scan (full or fast) --gather --harvest
All results will be written in a file in this tree :
-
Code refactoring
- Use of class
- each functions/modules in separate files
- Review of the already existing code (adding some performance)
-
Document gathering :
- Search for sensitive files on Github
- Archive.org : error messages
- Google dorks (index of, error message)
-
Scanning function
- Implement https://github.com/Tuhinshubhra/CMSeeK for the CMS discovery
- Add Http screenshot
- Add UDP scans
- Dirbuster ? http/s vhost/ipt (MAY BE)
-
DNS enumeration
- Add FDNS databases lookup
- Googid : search for Google Analytics ID to find other websites connected
- Domainview : get the websites that have an outlink from the target
-
Final :
- Search on the deep web
- Threads files downloads and modules so that they can work in parrallel
- Get ride of the API's (especially the whatcms api)
- Add the possibility to use some modules with a list of domains (at least --sublist/--subrute and --harvest)