
Simple version of the Tor exit node list DB. Part of the Internet Inventory project.

Primary language: JavaScript. License: GNU Affero General Public License v3.0 (AGPL-3.0).

Tracking Tor Exit Nodes

This set of scripts fetches data on Tor exit nodes from the only source currently known to be valid and authoritative: https://check.torproject.org/exit-addresses

After download, the data is inserted into a PostgreSQL DB (see the directory 'db/'). A web interface written in Python + Flask lives in 'www/'.
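The exit-addresses list is plain text with one block per exit relay. As far as we know, its lines start with the keywords ExitNode, Published, LastStatus and ExitAddress, and a relay can have more than one ExitAddress line. Below is a minimal Python sketch of how such a list could be split into records, assuming that format; it is not the parser used by fetch-tor-list.sh. The field names node_id, ip and exit_address_ts are borrowed from the DB error message shown further down, while ExitRecord, parse_exit_addresses and the sample data (a documentation-range IP address and made-up timestamps) are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class ExitRecord:
    # Hypothetical record type: one instance per ExitAddress line.
    node_id: str          # relay fingerprint from the "ExitNode" line
    published: str        # value of the "Published" line
    last_status: str      # value of the "LastStatus" line
    ip: str               # exit IP address from the "ExitAddress" line
    exit_address_ts: str  # timestamp that follows the IP on that line


def parse_exit_addresses(text: str) -> Iterator[ExitRecord]:
    """Yield one ExitRecord per ExitAddress line (assumed list format)."""
    node_id = published = last_status = ""
    for line in text.splitlines():
        keyword, _, rest = line.strip().partition(" ")
        if keyword == "ExitNode":
            # A new relay block starts: drop the previous relay's metadata.
            node_id, published, last_status = rest, "", ""
        elif keyword == "Published":
            published = rest
        elif keyword == "LastStatus":
            last_status = rest
        elif keyword == "ExitAddress":
            ip, _, ts = rest.partition(" ")
            yield ExitRecord(node_id, published, last_status, ip, ts)
        # Blank or unknown lines are ignored.


# Illustrative sample in the assumed format; IP and timestamps are made up.
SAMPLE = """\
ExitNode 0011BD2485AD45D984EC4159C88FC066E5E3300E
Published 2019-08-08 03:58:48
LastStatus 2019-08-08 07:00:00
ExitAddress 192.0.2.10 2019-08-08 07:12:18
"""

if __name__ == "__main__":
    for record in parse_exit_addresses(SAMPLE):
        print(record.node_id, record.ip, record.exit_address_ts)
```

Yielding one record per ExitAddress line (rather than one per relay) mirrors a table keyed on the fingerprint plus the address and its timestamp.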

Directory Contents

www/		web interface (Python + Flask)
db/			DB structure and initial data import mechanisms
data/		data directory


Prerequisites

  • wget
  • PostgreSQL 9.3 or higher
  • Python
  • Flask
  • a web server such as nginx

See the requirements.txt file in www/tor for the required Python packages.

Initial setup

$ cd db
$ sudo -u postgres createuser -s username
$ sudo -u postgres psql template1 < db.sql

Testing if it works

  1. Activate your virtualenv or conda environment if you used one to install the prerequisites.
  2. Test whether fetching the data works:

First, make sure that the newly created DB user tordb is allowed to access the tordb_simple database and its tables, and add an entry for it to PostgreSQL's pg_hba.conf file.

$ ./fetch-tor-list.sh
$ psql -U tordb tordb_simple
tordb_simple=> select count(*) from node;

You should see a non-zero count.

If it works, you can set it up to run automatically.

Running fetch-tor-list.sh is expected to print many error messages such as this one:

ERROR:  duplicate key value violates unique constraint "idx_node_combined"
DETAIL:  Key (node_id, ip, exit_address_ts, id_nodetype)=(0011BD2485AD45D984EC4159C88FC066E5E3300E,, 2019-08-08 09:12:18+02, 1) already exists.

They can safely be ignored: each one only means that a row with the same key is already stored in the DB, so nothing is lost.
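Why these errors are harmless can be reproduced in a few lines. The sketch below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, together with a hypothetical, reduced node table that models only the four columns named in the idx_node_combined message; the real schema in db/ may well differ. Inserting the same row twice makes the unique index reject the second copy, which is what the error above reports, and the table still holds a single row. The helper insert_rows is hypothetical and not part of this project.

```python
import sqlite3
from typing import Iterable, Tuple

# Hypothetical, reduced stand-in for the "node" table: only the columns that
# appear in the idx_node_combined error message are modelled here.
SCHEMA = """
CREATE TABLE node (
    node_id         TEXT    NOT NULL,
    ip              TEXT    NOT NULL,
    exit_address_ts TEXT    NOT NULL,
    id_nodetype     INTEGER NOT NULL
);
CREATE UNIQUE INDEX idx_node_combined
    ON node (node_id, ip, exit_address_ts, id_nodetype);
"""

Row = Tuple[str, str, str, int]


def insert_rows(conn: sqlite3.Connection, rows: Iterable[Row]) -> Tuple[int, int]:
    """Insert rows one at a time; return (inserted, skipped_as_duplicate)."""
    inserted = skipped = 0
    for row in rows:
        try:
            with conn:  # commits on success, rolls back if the INSERT fails
                conn.execute("INSERT INTO node VALUES (?, ?, ?, ?)", row)
            inserted += 1
        except sqlite3.IntegrityError:
            # SQLite's counterpart of PostgreSQL's "duplicate key value
            # violates unique constraint": the row is already stored.
            skipped += 1
    return inserted, skipped


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    row = ("0011BD2485AD45D984EC4159C88FC066E5E3300E", "192.0.2.10",
           "2019-08-08 07:12:18", 1)
    print(insert_rows(conn, [row]))  # → (1, 0): the row is new
    print(insert_rows(conn, [row]))  # → (0, 1): the same row is rejected
    print(conn.execute("SELECT count(*) FROM node").fetchone()[0])  # → 1
```

On PostgreSQL 9.5 or later, INSERT ... ON CONFLICT DO NOTHING would skip such rows without printing an error; that clause is not available on 9.3 and 9.4, which the prerequisites still allow.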

How to get this to run automatically?

$ crontab -l
# fetch the list once a day at 1:05 A.M.
# m h  dom mon dow   command
5 01   *   *   *     ( cd /home/your_user/torexitnodes_simple; . venv/bin/activate; ./fetch-tor-list.sh >/dev/null 2>&1 )

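(Note that this assumes you installed the prerequisites in a virtualenv located in the venv/ subdirectory of the repository. The portable "." is used instead of "source" because cron runs commands with /bin/sh, which may not support "source".)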

Deploying the web interface properly

Flask's built-in web server is not meant for production environments. Instead, follow the Flask documentation on production deployment; the instructions vary depending on which web server you use.