TorBot
Open-source intelligence offers value to information security decision making through knowledge of threats and malicious activities that could potentially impact the business. Gathering open-source intelligence from the surface internet is common; using the darknet, however, is far less common for the typical cybersecurity analyst. The main challenge of using the darknet for open-source intelligence is the need for specialized collection, processing, and analysis tools. TorBot is an open-source intelligence tool developed in Python. The main objective of this project is to collect open data from the deep web (aka the dark web), use data mining algorithms to gather as much information as possible, and produce an interactive tree graph. The interactive tree graph module will be able to display the relations between the collected intelligence data.
The idea of developing an open-source intelligence tool like TorBot emerged from the deep web itself. Crawling a collection of webpages that offers high anonymity and complex data encryption, and has no index, is a tedious task. The crawler in TorBot has to be designed so that links are identified from any given webpage, further links are identified and crawled recursively, and all of these links are then combined to form an index. Each link is then crawled for more links and for e-mail addresses that may hold intelligence value. Unlike a surface web discovery tool, a deep web discovery tool is limited for both general and domain-specific search. Extensive use of the dark web for communication of terrorism-related information makes it a challenge for law enforcement agencies. TorBot should be able to monitor such illegal activities happening in this encrypted network. This tool will therefore ease the task of finding such activities for an intelligence group or researchers, which is the main objective of TorBot.
The basic procedure executed by the web crawling algorithm takes a list of seed URLs as its input and repeatedly executes the following steps (a minimal Python sketch follows the list):
- Remove a URL from the URL list.
- Check existence of the page.
- Download the corresponding page.
- Check the relevancy of the page.
- Extract any links contained in it.
- Check the cache to see whether any of the extracted links have already been crawled.
- Add the unique links back to the URL list.
- After all URLs are processed, return the most relevant page.
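The loop below is a minimal, illustrative sketch of this procedure, not TorBot's actual crawler module; the `is_relevant` helper and the seed list are hypothetical placeholders. Note that fetching .onion addresses would additionally require routing traffic through the Tor SOCKS proxy described in the setup section below.

```python
from collections import deque
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def is_relevant(html):
    """Hypothetical relevancy check; TorBot's real criteria may differ."""
    return True


def crawl(seed_urls):
    queue = deque(seed_urls)   # URL list of pages still to visit
    cache = set(seed_urls)     # links that have already been seen
    pages = {}                 # url -> parsed page

    while queue:
        url = queue.popleft()                      # remove a URL from the list
        try:
            resp = requests.get(url, timeout=30)   # check existence / download the page
        except requests.RequestException:
            continue
        if resp.status_code != 200 or not is_relevant(resp.text):
            continue                               # skip dead or irrelevant pages
        soup = BeautifulSoup(resp.text, "html.parser")
        pages[url] = soup
        for anchor in soup.find_all("a", href=True):   # extract any links contained in it
            link = urljoin(url, anchor["href"])
            if link not in cache:                      # keep only links not in the cache
                cache.add(link)
                queue.append(link)
    return pages
```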
Features and their current status:
- Onion Crawler (.onion). (Completed)
- Returns page title and address with a short description about the site. (Partially completed)
- Save links to database. (PR to be reviewed)
- Get e-mails from site. (Completed; see the sketch below this list)
- Save crawl info to JSON file. (Completed)
- Crawl custom domains. (Completed)
- Check if the link is live. (Completed)
- Built-in updater. (Completed)
- Visualizer module. (Not started)
- Social media integration. (Not started)
...(will be updated)
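The snippet below is a minimal sketch of how the page title, a short description, and e-mail addresses could be pulled out of an already downloaded page with Beautiful Soup. It is illustrative only and does not mirror TorBot's actual module layout; `summarize_page` and `EMAIL_RE` are hypothetical names.

```python
import re

from bs4 import BeautifulSoup

# Rough pattern for plain-text e-mail addresses found in page bodies.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def summarize_page(html):
    """Return the title, a short description, and e-mail addresses found in the page."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.title.string.strip() if soup.title and soup.title.string else "No title"
    meta = soup.find("meta", attrs={"name": "description"})
    description = meta["content"].strip() if meta and meta.get("content") else ""
    # Collect mailto: links as well as plain-text addresses in the page body.
    emails = {a["href"][len("mailto:"):] for a in soup.find_all("a", href=True)
              if a["href"].startswith("mailto:")}
    emails.update(EMAIL_RE.findall(soup.get_text()))
    return title, description, sorted(emails)
```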
Contributions to this project are always welcome. To add a new feature, fork the dev branch and submit a pull request once your new feature is tested and complete. If it is a new module, it should be placed inside the modules directory and imported into the main file. The branch name should be your new feature name in the format <Feature_featurename_version(optional)>, for example Feature_FasterCrawl_1.0. Your name will be added to the contributor list below. :D
Dependencies:
- Tor
- Python 3.x (Make sure pip3 is installed)
- requests
- Beautiful Soup 4
- Socket
- Sock
- Argparse
- Git
- termcolor
- tldextract
- Golang
Before you run TorBot, make sure the following things are done properly:
- Run the tor service: `sudo service tor start`
- Make sure that your torrc is configured to SOCKS_PORT localhost:9050
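As a sanity check, traffic can be routed through the Tor SOCKS proxy from Python. The snippet below is a minimal sketch assuming PySocks (`requests[socks]`) is installed and tor is listening on localhost:9050; the onion address and the `fetch_over_tor` helper are placeholders for illustration only.

```python
import requests

# Route both DNS resolution and traffic through the local Tor SOCKS proxy
# (socks5h resolves hostnames, including .onion addresses, via Tor).
TOR_PROXIES = {
    "http": "socks5h://localhost:9050",
    "https": "socks5h://localhost:9050",
}


def fetch_over_tor(url):
    """Fetch a page through Tor; raises for HTTP errors."""
    resp = requests.get(url, proxies=TOR_PROXIES, timeout=60)
    resp.raise_for_status()
    return resp.text


if __name__ == "__main__":
    # Placeholder onion address, not a real site.
    print(fetch_over_tor("http://exampleonionaddress.onion")[:200])
```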
On Linux platforms, you can make an executable for TorBot by using the install.sh script. You will need to give the script the correct permissions using `chmod +x install.sh` and then run it. You should now have an executable file named torBot; run `./torBot` to execute the program.
An alternative way of running torBot is shown below, along with help instructions.
Run `python3 torBot.py`, or use the `-h`/`--help` argument:
usage: torBot.py [-h] [-v] [--update] [-q] [-u URL] [-s] [-m] [-e EXTENSION] [-l] [-i]

optional arguments:
  -h, --help            Show this help message and exit
  -v, --version         Show current version of TorBot.
  --update              Update TorBot to the latest stable version
  -q, --quiet           Prevent header from displaying
  -u URL, --url URL     Specify a website link to crawl, currently returns links on that page
  -s, --save            Save results to a file in JSON format
  -m, --mail            Get e-mail addresses from the crawled sites
  -e EXTENSION, --extension EXTENSION
                        Specify additional website extensions to the list (.com, .org, etc.)
  -l, --live            Check if websites are live or not (slow)
  -i, --info            Display basic info of the scanned site (very slow)
- NOTE: All flags listed after -u URL, --url URL must also be accompanied by the -u flag.
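For example, `python3 torBot.py -u http://exampleonionaddress.onion -m -s` (with a placeholder onion address) would crawl the given site, collect e-mail addresses from the crawled pages, and save the results in JSON format.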
Read more about torrc here: Torrc
Planned features and improvements:
- Visualization Module
- Implement A* Search for webcrawler
- Multithreading
- Optimization
- Randomize Tor Connection (Random Header and Identity)
- Keyword/Phrase search
- Social Media Integration
- Increase anonymity and efficiency
If you have new ideas that are worth implementing, mention them by opening a new issue with the title [FEATURE_REQUEST]. If the idea is worth implementing, congratulations, you are now a contributor.
GNU Public License
Contributors:
- P5N4PPZ - Owner
- agrepravin - Contributor, Reviewer
- KingAkeem - Experienced Contributor, Reviewer
- y-mehta - Contributor
- Manfredi Martorana - Contributor
- Evan Sia Wai Suan - New Contributor
- Lean - New Contributor
- shivankar-madaan - New Contributor
- Gus - New Contributor
- SubaruSama - New Contributor
- robly78746 - New Contributor