unbound-adblock

Generate ad-serving and malware list for unbound

Primary language: Go. License: GNU General Public License v2.0 (GPL-2.0).

Script to generate Ad-block domains for unbound

Takes a list of known malware and ad-serving domains and generates an amalgamated configuration file fragment for unbound. When included in the main body of unbound.conf, this fragment blocks the hosts and domains that serve malware and/or intrusive ads.

Usage

You will need GNU Make (any recent version) and a recent Go toolchain (newer than 1.11). Assuming GNU Make is available as gmake, type:

gmake

This will generate two config file fragments for unbound:

  • bad-hosts.conf: Config file fragment with a few trackers; the list of blocklist items is in myfeed.txt
  • big.conf: Very large list of blocklist domains and hosts (~30MB, ~700k entries). The blocklist feed comes from bigfeed.txt (auto-generated).

Include one of the config files (bad-hosts.conf or big.conf) in your unbound.conf as follows:

# include auto-generated ad-block/malware list
include: /path/to/bad-hosts.conf

Then reload the unbound configuration to use the new blocklist.
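
To give a sense of what such a fragment might contain, here is a minimal Go sketch that renders a list of domains as unbound local-zone directives. It is an illustration only: the function name renderUnbound is hypothetical, and the choice of the always_nxdomain zone type is an assumption, since this README does not state which directive blgen emits.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// renderUnbound returns an unbound config fragment with one local-zone
// line per unique domain. Hypothetical helper, not blgen's actual code.
// Assumption: the "always_nxdomain" zone type is used for blocking here;
// blgen may emit a different directive.
func renderUnbound(domains []string) string {
	seen := make(map[string]bool)
	var uniq []string
	for _, d := range domains {
		// Normalize: trim whitespace, drop a trailing dot, lowercase.
		d = strings.ToLower(strings.TrimSuffix(strings.TrimSpace(d), "."))
		if d == "" || seen[d] {
			continue
		}
		seen[d] = true
		uniq = append(uniq, d)
	}
	sort.Strings(uniq)

	var b strings.Builder
	for _, d := range uniq {
		fmt.Fprintf(&b, "local-zone: %q always_nxdomain\n", d)
	}
	return b.String()
}

func main() {
	fmt.Print(renderUnbound([]string{
		"ads.example.com",
		"Tracker.Example.net.",
		"ads.example.com", // duplicate, emitted only once
	}))
}
```

Sorting and de-duplicating keeps the generated file stable between runs, which makes diffs of the fragment easier to read.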

Details

The blocklist is generated by a Go program in the blgen directory. It is built using the shell script build. The output binary is placed in a platform-specific directory (bin/$os-$arch/blgen). Usage:

blgen [options] [blocklist ...]

Read one or more blocklist files and generate a composite file containing
blocked hosts and domains. The final output is written to STDOUT or to
an output file.

blgen can optionally read a feed (txt file) of well known 3rd party malware and
tracker URLs. The feed.txt is a simple file:
- Each line starts with either a 'txt' or 'json' followed by a URL.
- The keyword 'txt' or 'json' identifies the type of output returned by the URL

Example:

    txt http://pgl.yoyo.org/files/adhosts/plaintext
    txt http://mirror2.malwaredomains.com/files/justdomains

Options:
  -c, --cache-dir D      Use 'D' as the cache directory ["."]
  -F, --feed F           Read blocklists from feed file 'F' [""]
  --no-cache             Ignore the cache and re-fetch every blocklist [false]
  -o, --output-file F    Write output to file 'F' [""]
  -f, --output-format T  Set output format to 'T' (text or unbound) [""]
  -v, --verbose          Show verbose output [false]
  -W, --allowlist F      Add allowlist entries from file 'F' [[]]

The -W flag can be used multiple times to add multiple allow list sources.

Caching

blgen caches the downloaded blocklists and refreshes them at most once a day. In the default invocation of blgen in GNUmakefile, the cache directory is the current directory. Each cache file name uses the URL as the prefix and a truncated SHA256 sum of the URL as the suffix. Pass the --no-cache option to ignore the cache and re-fetch every blocklist.

Guide to source code

The Go program is organized as follows:

  • internal/blgen: contains the implementation of the blocklist DB, fetching host-lists etc.
  • blgen/: contains the driver program ("main()") along with a few helper routines to generate the output.