discv4-crawl

⚠️ Update: New contributions should go to the active ethereum/discv4-crawl fork.

Background

Geth now ships with an implementation of EIP-1459. This EIP defines a way to put devp2p node lists behind a DNS name. There are a couple of things worth knowing about this system:

EIP-1459 is intended to be a replacement for hard-coded bootstrap node lists that we maintain in Ethereum clients.
This is a centralized system: all nodes configured with a certain name resolve subdomains of that name to find bootstrap nodes.
The node list is signed with a key which will be hard-coded into the client (e.g. geth) and which we should keep in a secure place.
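The TXT records behind such a DNS name follow EIP-1459's "enrtree" syntax, with four entry types: a signed root, hash branches, links to other lists, and ENR leaves. As an illustration, here is a minimal sketch of classifying those entries in Python; it assumes well-formed input and does not verify signatures or hashes (all hash, key, and domain values in the comments are placeholders):

```python
# Sketch of an EIP-1459 TXT entry classifier. Not a full client:
# no base32/base64 decoding, no signature verification.

def parse_entry(txt: str) -> dict:
    """Classify a DNS TXT entry from an EIP-1459 node-list tree."""
    if txt.startswith("enrtree-root:v1 "):
        # Root entry: e= points at the ENR subtree, l= at the link subtree,
        # seq= is an update counter, sig= the signature over the record.
        fields = dict(part.split("=", 1)
                      for part in txt[len("enrtree-root:v1 "):].split())
        return {"type": "root", **fields}
    if txt.startswith("enrtree-branch:"):
        # Branch entry: comma-separated hashes of child subdomains.
        return {"type": "branch",
                "children": txt[len("enrtree-branch:"):].split(",")}
    if txt.startswith("enrtree://"):
        # Link entry: reference to another node list signed by the given key.
        key, _, domain = txt[len("enrtree://"):].partition("@")
        return {"type": "link", "pubkey": key, "domain": domain}
    if txt.startswith("enr:"):
        # Leaf entry: a base64-encoded node record.
        return {"type": "enr", "record": txt}
    raise ValueError("not an enrtree entry")
```

A resolver starts at the root, follows branch hashes as subdomain lookups, and collects the `enr:` leaves.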

To create suitable bootstrap node lists for all common networks, we have devised a scheme where software crawls the discovery DHT, then creates a list of all found nodes in JSON format. The crawler software can filter this list and has a built-in deployer that can install the DNS records.
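The crawler in question is geth's devp2p tool. The overall pipeline looks roughly like the following; file and domain names are placeholders, and the exact flags may differ between geth versions:

```shell
# Crawl the discv4 DHT and collect every node found into a JSON set.
devp2p discv4 crawl -timeout 30m all-nodes.json

# Filter the raw set, e.g. down to mainnet nodes, into a tree directory
# (directory layout shown here is an assumption for illustration).
devp2p nodeset filter all-nodes.json -eth-network mainnet \
    > mainnet.nodes.example.local/nodes.json

# Sign the generated tree and publish the TXT records
# (a Cloudflare backend exists as an alternative to Route53).
devp2p dns sign mainnet.nodes.example.local keyfile.json
devp2p dns to-route53 mainnet.nodes.example.local
```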

You can read the DNS Discovery Setup Guide for more information about the discovery DHT crawler.

Description

This repository contains the scripts used to automatically generate the lists of nodes that are published to multiple DNS zones. The node lists are also pushed to this repository automatically.

Running with docker

Environment variables

| Name | Default | Description |
|------|---------|-------------|
| CRAWL_GIT_REPO | https://github.com/skylenet/discv4-dns-lists.git | Git repository used to clone and push the node list |
| CRAWL_GIT_BRANCH | master | Git branch used for the fetch and push |
| CRAWL_GIT_PUSH | false | When set to true, pushes the node lists to the git repository |
| CRAWL_GIT_USER | crawler | Git username. Will appear in the commit messages. |
| CRAWL_GIT_EMAIL | crawler@localhost | Git email address. Will appear in the commit messages. |
| CRAWL_DNS_DOMAIN | nodes.example.local | DNS domain suffix used for the directory structure |
| CRAWL_TIMEOUT | 30m | The time spent crawling the discovery DHT |
| CRAWL_INTERVAL | 300 | Interval, in seconds, between executions |
| CRAWL_RUN_ONCE | false | Set to true to run a single execution and exit |
| CRAWL_DNS_SIGNING_KEY | /secrets/key.json | Path to the signing key. Lists are not signed if the file doesn't exist. |
| CRAWL_DNS_PUBLISH_ROUTE53 | false | Publish the TXT records to a DNS zone on AWS Route53 |
| ROUTE53_ZONE_ID | `` | Route53 DNS zone identifier. This is the zone the records will be published to. |
| AWS_ACCESS_KEY_ID | `` | AWS access key |
| AWS_SECRET_ACCESS_KEY | `` | AWS secret access key |
| CRAWL_DNS_PUBLISH_CLOUDFLARE | false | Publish the TXT records to a DNS zone on Cloudflare |
| CLOUDFLARE_API_TOKEN | `` | API token used for the Cloudflare API |
| CLOUDFLARE_ZONE_ID | `` | Cloudflare DNS zone identifier. This is the zone the records will be published to. |
| CRAWL_PUBLISH_METRICS | false | Set to true to send metrics to InfluxDB |
| INFLUXDB_URL | http://localhost:8086 | Address of the InfluxDB API |
| INFLUXDB_DB | metrics | Database name |
| INFLUXDB_USER | user | Username for InfluxDB |
| INFLUXDB_PASSWORD | password | Password for InfluxDB |
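For a quick local test, a one-shot run that crawls once and exits can rely on the defaults: CRAWL_GIT_PUSH, the DNS publishing switches, and CRAWL_PUBLISH_METRICS are all false, so nothing leaves the container. For example:

```shell
# Single short crawl, no push, no DNS publishing, no metrics.
docker run -it \
    -e CRAWL_RUN_ONCE=true \
    -e CRAWL_TIMEOUT=5m \
    skylenet/discv4-crawl
```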

Building the image

$ docker build -t discv4-crawl .

Run examples

Run the list generation and push the results to git via SSH. The ~/.ssh mount is only needed when using git over SSH, and the signing-key mount only when the node lists should be signed; CRAWL_GIT_REPO points at the SSH URL instead of HTTPS, and CRAWL_GIT_PUSH=true enables the push:

$ docker run -it \
    -v "$HOME/.ssh/crawler:/root/.ssh" \
    -v "$HOME/secrets/secret-signing-key.json:/secrets/key.json" \
    -e CRAWL_TIMEOUT=10m \
    -e CRAWL_GIT_REPO=git@github.com:skylenet/discv4-dns-lists.git \
    -e CRAWL_GIT_PUSH=true \
    skylenet/discv4-crawl
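
Once records have been published, the deployment can be spot-checked with dig. The domain below is a placeholder matching the default CRAWL_DNS_DOMAIN and an assumed `all` list; a healthy zone answers with an `enrtree-root:v1 ...` TXT record at the list's apex:

```shell
# Expect a TXT answer of the form "enrtree-root:v1 e=... l=... seq=... sig=..."
dig +short TXT all.nodes.example.local
```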