Generate malicious URL blocklists for DNSBL applications like pfBlockerNG or Pi-hole by scanning various public URL sources using the Safe Browsing API from Google and/or Yandex.
You may download the blocklists here
Name | URL Count | Source | Description |
---|---|---|---|
Tranco TOP1M | 1M | https://tranco-list.eu | A Research-Oriented Top Sites Ranking Hardened Against Manipulation |
DomCop TOP10M | 10M | https://www.domcop.com/top-10-million-domains | Top 10 million domains Based on Open PageRank data |
Registrar R01 | 6M | https://r01.ru | Zone files for .ru .su .rf domains |
CubDomain.com | 196M | https://cubdomain.com | Aggregator that tracks newly registered domains daily |
ICANN CZDS (Centralized Zone Data Service) | 247M | https://czds.icann.org | ICANN's centralized point for interested parties to request access to Zone Files provided by participating Top Level Domain Registries |
Domains Project | 2.1B | https://domainsproject.org | World’s single largest Internet domains dataset |
Amazon Web Services EC2 | 57M | https://docs.aws.amazon.com/vpc/latest/userguide/vpc-dns.html#vpc-dns-hostnames | Amazon Elastic Compute Cloud hostnames |
Google Compute Engine | 11M | https://www.gstatic.com/ipranges/cloud.json | Google Compute Engine |
OpenINTEL.nl | 6M | https://openintel.nl | Zone files for .se .nu .ee domains |
Switch.ch | 3.3M | https://switch.ch/open-data | Zone files for .ch .li domains |
AFNIC.fr | 7M | https://www.afnic.fr/en/products-and-services/fr-and-associated-services/shared-data-reuse-fr-data | Daily newly registered .fr .re .pm .tf .wf .yt domains |
Internet.ee | 153K | https://www.internet.ee/domains/ee-zone-file | Estonian Internet Foundation (.ee) |
Internetstiftelsen | 1.7M | https://zonedata.iis.se | Swedish Internet Foundation |
SK-NIC.sk | 400K | https://sk-nic.sk/subory/domains.txt | Domain Registry of the Slovak Republic (.sk) |
Google TAG IOCs | 200 | https://blog.google/threat-analysis-group | Google Threat Analysis Group Indicators of Compromise |
IPv4 Addresses | 4.2B | 0.0.0.0 - 255.255.255.255 | Exhaustive list of all IPv4 addresses |
Google | Yandex |
---|---|
Terms-of-Service | Terms-of-Service |
- Linux or macOS
- Python 3.10+
- Multi-core x86-64 CPU (needed for Python Ray support)
- RAM: At least 32GB
- SSD Storage Space: At least 700GB required to process all URL sources
Choose at least one
- Google: Obtain a Google Developer API key and set it up for the Safe Browsing API
- Yandex: Obtain a Yandex Developer API key
- ICANN Zone Files: Sign up for an ICANN CZDS account
  - Once registered, turn off email notifications in the user settings (otherwise you will receive hundreds of acknowledgement emails), then select Create New Request on the Dashboard to request zone file access.
- ICANN CZDS (Centralized Zone Data Service): Once every 24 hours per zone file
- Switch.ch: Once every 24 hours per zone file
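The once-per-24-hours limit can be enforced on the client side before each download attempt. The following is a minimal sketch, not taken from this project: the function name `may_download` and the idea of storing the last download time per zone file are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Minimum wait between two downloads of the same zone file (from the limits above).
MIN_INTERVAL = timedelta(hours=24)


def may_download(last_download: Optional[datetime], now: datetime) -> bool:
    """Return True if the zone file may be downloaded again.

    `last_download` is None when the zone file has never been downloaded.
    """
    if last_download is None:
        return True
    return now - last_download >= MIN_INTERVAL


# Hypothetical timestamps, for illustration only.
last = datetime(2023, 8, 4, 12, 0, tzinfo=timezone.utc)
print(may_download(last, last + timedelta(hours=23)))  # False
print(may_download(last, last + timedelta(hours=24)))  # True
```

Tracking one timestamp per zone file (rather than one global timestamp) matches the "per zone file" wording of the limits.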
`git clone` and `cd` into the project directory
cp .env-dev .env
In `.env`, fill in the following variables:
# Mandatory: At least one of the following Safe Browsing API keys
GOOGLE_API_KEY=
YANDEX_API_KEY=
# Optional: ICANN zone file access
ICANN_ACCOUNT_USERNAME=
ICANN_ACCOUNT_PASSWORD=
# Some registrars will not accept your request reason unless you include your Name, Email, IP Address, Physical Address (Building, Street, Postcode etc.), and Phone Number
ICANN_REQUEST_REASON='Detection of potentially malicious domains for cybersecurity research. Name: _ Email: _ IP Address: _ Physical Address: _ Phone Number: _'
# Optional: Upload generated blocklists to your GitHub repository
GITHUB_ACCESS_TOKEN=
BLOCKLIST_REPOSITORY_NAME=
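The "at least one API key" rule above can be checked before starting a long run. This is a hedged sketch only: it assumes the variables have already been loaded into the process environment, and the helper names `configured_vendors` and `check_env` are hypothetical. The vendor label `yandex` is likewise an assumption, mirroring the `google` label used with `--vendors`.

```python
import os
from typing import Mapping, Optional


def configured_vendors(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the vendors whose API key is set to a non-empty value."""
    env = os.environ if env is None else env
    vendors = []
    if env.get("GOOGLE_API_KEY", "").strip():
        vendors.append("google")
    if env.get("YANDEX_API_KEY", "").strip():
        vendors.append("yandex")  # assumed label, by analogy with "google"
    return vendors


def check_env(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Raise ValueError unless at least one Safe Browsing API key is present."""
    vendors = configured_vendors(env)
    if not vendors:
        raise ValueError("Set GOOGLE_API_KEY and/or YANDEX_API_KEY in .env")
    return vendors


# Placeholder value, not a real key.
print(check_env({"GOOGLE_API_KEY": "placeholder-key"}))  # ['google']
```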
According to PEP 668, use of a virtual environment is strongly recommended as of 2023.
python3 -m venv venv
venv/bin/python3 -m pip install --upgrade pip
venv/bin/python3 -m pip install -r requirements.txt
# Dataset size ~49GB
cd ../
git clone https://github.com/tb0hdan/domains.git
cd domains
git lfs install # you will need to install Git LFS first (https://git-lfs.github.com)
Edit `unpack.sh` and remove `combine` from the last line, then run:
./unpack.sh
⚠️ As of 4 August 2023, the following command makes around 9,000 calls to the Google Safe Browsing API (the exact number depends on the number of hashes in Google's dataset). As the daily limit is 10,000 calls, `--update-hashes` should be run no more than once every 24 hours.
venv/bin/python3 main.py --update-hashes --vendors google
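The once-per-day advice follows directly from the two figures quoted above. A small worked calculation, using the approximate 9,000-call figure (the real count varies with the size of Google's dataset):

```python
DAILY_CALL_LIMIT = 10_000      # daily limit quoted above
CALLS_PER_HASH_UPDATE = 9_000  # approximate figure as of 4 August 2023


def full_updates_per_day(daily_limit: int, calls_per_update: int) -> int:
    """Number of complete hash updates that fit inside the daily quota."""
    return daily_limit // calls_per_update


def calls_left_after_update(daily_limit: int, calls_per_update: int) -> int:
    """Quota remaining after a single hash update (never negative)."""
    return max(daily_limit - calls_per_update, 0)


print(full_updates_per_day(DAILY_CALL_LIMIT, CALLS_PER_HASH_UPDATE))     # 1
print(calls_left_after_update(DAILY_CALL_LIMIT, CALLS_PER_HASH_UPDATE))  # 1000
```

Only one full update fits in the quota, and roughly 1,000 calls remain for anything else that day.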
- ✔️ Add Tranco TOP1M URLs to database
- ✔️ Identify malicious URLs from database using Safe Browsing API hashes, and generate a blocklist
- ✔️ Update database with latest malicious URL statuses
- 📝 Sources: Tranco TOP1M
- 🛡️ Vendors: Google
venv/bin/python3 main.py --fetch-urls --identify-malicious-urls --sources top1m --vendors google
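To illustrate what matching URLs against Safe Browsing hashes involves, here is a simplified sketch and not this project's implementation. In the Safe Browsing v4 Update API, URL expressions are hashed with SHA-256 and compared against hash prefixes (commonly 4 bytes). The sketch skips URL canonicalization and the generation of multiple host-suffix/path-prefix expressions, and it treats `hostname/` as the only expression. The function names and example hostnames are hypothetical.

```python
import hashlib


def url_expression_hash(expression: str) -> bytes:
    """SHA-256 digest of a URL expression such as 'example.com/'."""
    return hashlib.sha256(expression.encode("utf-8")).digest()


def matches_prefix(expression: str, prefixes: set[bytes], prefix_len: int = 4) -> bool:
    """Check whether the expression's hash prefix is in a local prefix set."""
    return url_expression_hash(expression)[:prefix_len] in prefixes


# Build a toy local database containing one known-bad expression (hypothetical).
known_bad = "malicious.example/"
local_prefixes = {url_expression_hash(known_bad)[:4]}

candidates = ["malicious.example/", "benign.example/"]
flagged = [c for c in candidates if matches_prefix(c, local_prefixes)]
print(flagged)
```

A prefix match is only a candidate hit, since different expressions can share a short prefix. A real client would confirm it against full hashes before treating the URL as malicious.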
- ✔️ Add DomCop TOP10M URLs to database (no blocklist will be generated)
- 📝 Sources: DomCop TOP10M
- 🛡️ Vendors: Not Applicable
venv/bin/python3 main.py --fetch-urls --sources top10m
⚠️ Requires at least 700GB of free space.

ℹ️ If you have not downloaded any Safe Browsing API hashes yet, add the `--update-hashes` flag to the following command.
- ✔️ Add URLs from all sources to database
- ✔️ Identify malicious URLs from database using Safe Browsing API hashes, and generate a blocklist
- ✔️ Update database with latest malicious URL statuses
- 📝 Sources: Everything
- 🛡️ Vendors: Google
venv/bin/python3 main.py --fetch-urls --identify-malicious-urls --vendors google
- ✔️ Retrieve URLs with malicious statuses (attained from past scans) from database, and generate a blocklist
- 📝 Sources: DomCop TOP10M, Domains Project
- 🛡️ Vendors: Google
venv/bin/python3 main.py --retrieve-known-malicious-urls --sources top10m domainsproject --vendors google
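DNSBL tools block by hostname, so a list of malicious URLs has to be reduced to unique hostnames at some point. The sketch below shows one way to do that with the standard library. It is an illustration under assumptions: the helper names `to_hostname` and `build_blocklist` are hypothetical, and the actual output format of this project's blocklists is not specified here.

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit


def to_hostname(url: str) -> Optional[str]:
    """Extract the lowercased hostname from a URL, with or without a scheme."""
    # urlsplit only recognises the host part when a scheme or leading '//' is present.
    parts = urlsplit(url if "://" in url else "//" + url)
    return parts.hostname


def build_blocklist(urls: Iterable[str]) -> list[str]:
    """Return a sorted list of unique hostnames, skipping blank entries."""
    hosts = {h for u in urls if (h := to_hostname(u.strip()))}
    return sorted(hosts)


# Hypothetical input, for illustration only.
urls = ["http://b.example/login", "A.example", "a.example/path", ""]
print(build_blocklist(urls))  # ['a.example', 'b.example']
```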
venv/bin/python3 main.py --help
- The Yandex Safe Browsing Update API appears to be unserviceable. Yandex technical support has been notified.
- This project is not sponsored, endorsed, or otherwise affiliated with Google and/or Yandex.
- Google works to provide the most accurate and up-to-date information about unsafe web resources. However, Google cannot guarantee that its information is comprehensive and error-free: some risky sites may not be identified, and some safe sites may be identified in error.
- A malicious verdict from the Safe Browsing API is usually valid for only about 5 minutes. Because the blocklists are updated only once every 24 hours, they must not be used to display user warnings.
More information on Google Safe Browsing API usage limits: https://developers.google.com/safe-browsing/v4/usage-limits