Redlib Instances

This repository maintains a list of Redlib instances in JSON format, providing the URL, location, and Redlib version for each instance. A helper script exists in this repository to generate the list in JSON form.

Status page

You can access the status page for the instances here.

Contents

This repo contains the following files:

  1. instances.json: This is the list of Redlib instances.
  2. instances-schema.json: JSON Schema governing instances.json.
  3. instances.txt: This is a CSV of Redlib instances. While this is also machine-readable, it is recommended to use instances.json instead. instances.txt is meant for contributors to add and remove instances, and generate-instances-json.sh will validate those instances and generate instances.json.
  4. instances.md: This is a table in Markdown format of the Redlib instances in instances.json. It is generated by the script generate-instances-markdown.py.
  5. generate-instances-json.sh: This script takes in a CSV file as input, typically instances.txt, and outputs a JSON object with a list of Redlib instances. This is the script that generates instances.json.

Adding or removing an instance

To add or remove an instance and regenerate instances.json and instances.md, perform the following:

  1. Modify instances.txt to add or remove instances. See Expected CSV format for the expected format of each CSV row.
  2. Run generate-instances-json.sh -i ./instances.txt -o ./instances.json to generate instances.json. The existing instances.json will be replaced.
  3. Run generate-instances-markdown.py --output=./instances.md ./instances.json to generate instances.md. The existing instances.md file will be replaced.
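
Steps 2 and 3 can also be scripted. The following is a minimal sketch in Python, assuming both scripts sit in the current directory and are executable. The function names `regeneration_commands` and `regenerate` are hypothetical; only the flags shown in the steps above are used.

```python
import subprocess


def regeneration_commands(csv_path="./instances.txt",
                          json_path="./instances.json",
                          md_path="./instances.md"):
    """Return the commands from steps 2 and 3 as argument lists."""
    return [
        # Step 2: CSV in, JSON out (the existing JSON is replaced).
        ["./generate-instances-json.sh", "-i", csv_path, "-o", json_path],
        # Step 3: JSON in, Markdown out (the existing table is replaced).
        ["./generate-instances-markdown.py", "--output=" + md_path, json_path],
    ]


def regenerate():
    """Run both commands in order, stopping at the first failure."""
    for command in regeneration_commands():
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(command, check=True)
```

Because generate-instances-json.sh exits with a non-zero code when an instance cannot be reached, `regenerate()` as written stops before step 3 in that case; whether that is desirable depends on your workflow.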

Pull requests to add or remove instances are always welcome.

generate-instances-json.sh

generate-instances-json.sh is the script that produces a JSON of Redlib instances, given a CSV input of Redlib instances.

Unless -i and -o are specified (see Usage below), the input and output are assumed to be stdin and stdout, respectively.
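
The stdin/stdout default, together with the `-` convention described in the usage text, is a common command-line pattern. The following Python sketch shows one way to express it; `open_input` and `open_output` are hypothetical helpers and are not taken from the script itself.

```python
import sys


def open_input(path=None):
    """Return a readable text stream; None or "-" selects stdin."""
    if path is None or path == "-":
        return sys.stdin
    return open(path, "r", encoding="utf-8")


def open_output(path=None):
    """Return a writable text stream; None or "-" selects stdout."""
    if path is None or path == "-":
        return sys.stdout
    # Mode "w" truncates an existing file without asking for confirmation.
    return open(path, "w", encoding="utf-8")
```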

Usage

USAGE
    ./generate-instances-json.sh [-I INPUT_JSON] [-T] [-e | -f] [-i INPUT_CSV] [-o OUTPUT_JSON]
    ./generate-instances-json.sh -h

DESCRIPTION
    Generate a JSON of Redlib instances, given a CSV file at INPUT_CSV
    listing those instances. If INPUT_CSV is not given, this script will
    read the CSV file from stdin.

    The INPUT_CSV file must be a file in CSV syntax of the form

        [url],[country code],[cloudflare enabled],[description]

    where all four parameters are required (though the description may be
    blank). Except for onion and I2P sites, all URLs MUST be HTTPS.

    OUTPUT_JSON will be overwritten if it exists. No confirmation will be
    requested from the user.

    By default:

    * This script will not attempt to connect to I2P instances. If you want
      this script to consider instances on the I2P network, you will need to
      provide an HTTP proxy in the environment variable I2P_HTTP_PROXY.
      This proxy typically listens at 127.0.0.1:4444.

    * This script will attempt to connect to instances in the CSV that are on
      Tor, provided that it can (it will check to see if Tor is running). 
      If you want to disable connections to these onion sites, provide the 
      -T option.

    * This script will return a non-zero status code when at least one instance
      could not be reached. If you want this script always to return 0 even
      when not all instances could be reached, provide the -e option (this
      script will still return a non-zero code if there was a problem
      constructing the final JSON object or if the file supplied to the -I
      option could not be read).

OPTIONS
    -I INPUT_JSON
        Import the list of Redlib onion and I2P instances from the file
        INPUT_JSON. To use stdin, provide `-I -`. Implies -T, and further
        causes the script to ignore the value in I2P_HTTP_PROXY. Note that the
        argument provided to this option CANNOT be the same as the argument
        provided to -i. If the JSON could not be read, the script will exit with
        status code 1, even if -e is provided.

    -T
        Do not connect to Tor. Onion sites in INPUT_CSV will not be processed.
        Assuming no other failure, the script will still exit with status code
        0.

    -e
        Always exit with status code 0, even when at least one instance cannot
        be reached, except in the situations where (1) the file in INPUT_JSON
        (see `-I`) could not be processed; or (2) the JSON object could not
        be constructed. Cannot be used together with -f.

    -f
        Force the script to exit, with status code 1, upon the first failure to
        connect to an instance. Normally, the script will continue to build and
        output the JSON even when one or more of the instances could not be
        reached, though the exit code will be non-zero. Cannot be used together
        with -e.

    -i INPUT_CSV
        Use INPUT_CSV as the input file. To read from stdin (the default
        behavior), either omit this option or provide `-i -`. Note that the
        argument provided to this option CANNOT be the same as the argument
        provided to -I.

    -o OUTPUT_JSON
        Write the results to OUTPUT_JSON. Any existing file will be
        overwritten. To write to stdout (the default behavior), either omit
        this option or provide `-o -`.

ENVIRONMENT

    USER_AGENT
        Sets the User-Agent that curl will use when making the GET to each
        website. By default, this script will tell curl to set its User-Agent
        string to "redlib-instance-updater/0.1".

    I2P_HTTP_PROXY
        HTTP proxy for connecting to the I2P network. This is required in
        order to connect to instances on I2P. If -I is provided, the value in
        this variable is ignored.

Prerequisites

generate-instances-json.sh requires curl in order to make HTTP(S) requests and jq to process and format JSON.
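
Before running the script from an automated job, it can be useful to confirm that both tools are on the PATH. This is a small sketch in Python; `missing_tools` is a hypothetical helper, not part of this repository.

```python
import shutil


def missing_tools(tools=("curl", "jq")):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("missing prerequisites: " + ", ".join(missing))
```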

tor is required for processing onion sites, but the script will skip instances on Tor if tor is not running. The -I option imports onion and I2P instances from an existing JSON file should you wish not to use tor.

Expected CSV format

The CSV must take on the form:

[url],[country code],[cloudflare enabled],[description]

Each field described:

  • url (REQUIRED): The URL of the Redlib instance. This must be HTTPS, unless the instance is an onion or I2P site.
  • country code (REQUIRED): The two-letter code for the country in which the instance is hosted, in caps.
  • cloudflare enabled (REQUIRED): A boolean; true if the instance sits behind Cloudflare.
  • description (REQUIRED): A description of the instance; a description can be blank, but one must be provided for the script to parse the CSV correctly. As this description string becomes a JSON value without any transformation, any special characters, including and especially newlines, must be escaped.
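
To check a row before committing it, the field rules above can be sketched in Python as follows. This is not the script's own parser. It assumes that the boolean is spelled `true` or `false`, that onion and I2P sites are recognised by a `.onion` or `.i2p` hostname suffix, and that everything after the third comma belongs to the description; `parse_row` is a hypothetical name.

```python
import string
from urllib.parse import urlsplit


def parse_row(line):
    """Split one CSV row into its four fields and validate each one."""
    parts = line.rstrip("\n").split(",", 3)
    if len(parts) != 4:
        raise ValueError("expected four comma-separated fields")
    url, country, cloudflare, description = parts

    parsed = urlsplit(url)
    host = parsed.hostname or ""
    # Assumption: hidden services are identified by hostname suffix.
    hidden = host.endswith(".onion") or host.endswith(".i2p")
    if parsed.scheme != "https" and not hidden:
        raise ValueError("URL must be HTTPS unless it is an onion or I2P site")

    if len(country) != 2 or any(c not in string.ascii_uppercase for c in country):
        raise ValueError("country code must be two capital letters")

    # Assumption: the boolean is written as the literal true or false.
    if cloudflare not in ("true", "false"):
        raise ValueError("cloudflare enabled must be true or false")

    return {
        "url": url,
        "country": country,
        "cloudflare": cloudflare == "true",
        "description": description,  # may be the empty string
    }
```

For example, `parse_row("https://redlib.example,US,false,")` is accepted with an empty description, while the same row without the trailing comma is rejected because the fourth field is missing. The hostname `redlib.example` is a placeholder.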

Processing the CSV

The script will process the CSV and for each row connect to the URL and get the version string of the running instance. For each row, if the connection is successful and the script can determine the version, it will yield a JSON object (an "entry") of the form:

{
    "url": "<url>",
    "version": "<version>",
    "cloudflare": <true if cloudflare is enabled; null otherwise>,
    "description": "<description if non-empty; null otherwise>"
}
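
The null handling in that form can be illustrated with a short Python sketch. It follows the entry exactly as shown above; `make_entry` is a hypothetical helper, and the URL and version values are placeholders.

```python
import json


def make_entry(url, version, cloudflare, description):
    """Build one entry, mapping "not enabled" and "empty" to JSON null."""
    return {
        "url": url,
        "version": version,
        # true if Cloudflare is enabled; null (None) otherwise
        "cloudflare": True if cloudflare else None,
        # the description if non-empty; null (None) otherwise
        "description": description if description else None,
    }


if __name__ == "__main__":
    entry = make_entry("https://redlib.example", "v0.0.0", False, "")
    print(json.dumps(entry, indent=4))
```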

At the end, the script will assemble the entries into a JSON array and place them in a new JSON object:

{
    "updated": "<today's date (at the Greenwich meridian) in ISO 8601 format>",
    "instances": ["<entries>"]
}
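
As an illustration of this final step, and of how a consumer might read the result, here is a minimal Python sketch. The helper names `assemble` and `instance_urls` are hypothetical; the date is computed in UTC to match "at the Greenwich meridian".

```python
from datetime import datetime, timezone


def assemble(entries, now=None):
    """Wrap the entries in the top-level object with today's UTC date."""
    # `now` should be a timezone-aware datetime; it defaults to the current time.
    now = now or datetime.now(timezone.utc)
    return {
        "updated": now.astimezone(timezone.utc).date().isoformat(),
        "instances": list(entries),
    }


def instance_urls(document):
    """Return the URL of every instance in an assembled document."""
    return [entry["url"] for entry in document["instances"]]
```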

If all instances could be processed, the script exits with an exit code of 0. If the script was unable to process an instance, it will continue processing other instances, but the exit code will be 1. If there was an error processing the CSV, the exit code is 2.
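
The exit-code rules can be summarised in a few lines of Python. This is a sketch of the documented behaviour, not the script's own logic; the `always_zero` parameter models the -e option from the usage text, and giving a CSV error precedence over unreachable instances is an assumption.

```python
EXIT_OK = 0               # every instance was processed
EXIT_INSTANCE_FAILED = 1  # at least one instance could not be processed
EXIT_CSV_ERROR = 2        # the CSV itself could not be processed


def exit_code(csv_error, failed_instances, always_zero=False):
    """Map the outcome of a run to the documented exit code."""
    if csv_error:
        # Assumption: a CSV error takes precedence over everything else.
        return EXIT_CSV_ERROR
    if failed_instances > 0 and not always_zero:
        return EXIT_INSTANCE_FAILED
    return EXIT_OK
```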

Instances on Tor or I2P

This script can connect to instances that are onion or I2P sites, subject to the conditions described below.

Tor

To make sure it can connect to onion sites, the script will see if Tor is running. If it is not, the script will not attempt to connect to Redlib onion sites and will skip them. The exit code will still be 0, assuming that the WWW Redlib sites were processed without error.
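
The skipping behaviour can be sketched in Python as below. How the script detects a running Tor is not described here, so the sketch simply takes that result as a boolean; `network_of` and `urls_to_process` are hypothetical names, and classifying by hostname suffix is an assumption.

```python
from urllib.parse import urlsplit


def network_of(url):
    """Classify a URL as "tor", "i2p", or "www" by its hostname suffix."""
    host = urlsplit(url).hostname or ""
    if host.endswith(".onion"):
        return "tor"
    if host.endswith(".i2p"):
        return "i2p"
    return "www"


def urls_to_process(urls, tor_running):
    """Drop onion URLs when Tor is not running; keep everything else."""
    return [u for u in urls if network_of(u) != "tor" or tor_running]
```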

I2P

To allow the script to connect to I2P, you must specify a proxy host and port in the environment variable I2P_HTTP_PROXY. This is typically 127.0.0.1:4444, unless your proxy listens on a different address or port. If this environment variable is not defined or is empty, the script will not attempt to connect to Redlib I2P sites and will skip them. The exit code will still be 0, assuming that the WWW Redlib sites were processed without error.
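
The "not defined or empty" rule can be expressed as a short Python sketch. The helper names are hypothetical, and passing the proxy to curl through its `--proxy` option is an assumption about how such a proxy would typically be used, not a statement about the script's internals.

```python
import os


def i2p_proxy(environ=None):
    """Return the I2P proxy address, or None if it is unset or empty."""
    environ = os.environ if environ is None else environ
    return environ.get("I2P_HTTP_PROXY", "") or None


def curl_proxy_args(environ=None):
    """Return extra curl arguments for I2P, or None to skip I2P sites."""
    proxy = i2p_proxy(environ)
    # Assumption: the proxy is handed to curl via its --proxy option.
    return ["--proxy", proxy] if proxy else None
```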

generate-instances-markdown.py

generate-instances-markdown.py will generate a table in Markdown format of the instances. This requires the JSON file that is generated by generate-instances-json.sh.
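
To give a sense of the transformation, the following Python sketch turns the JSON document into a Markdown table. It is not the code in generate-instances-markdown.py: the choice of columns and the layout are assumptions, and the real script's output may differ. `markdown_table` is a hypothetical name.

```python
def markdown_table(document):
    """Render the "instances" array as a three-column Markdown table."""
    lines = [
        "| URL | Version | Description |",
        "| --- | --- | --- |",
    ]
    for entry in document["instances"]:
        # A null description becomes an empty cell.
        description = entry.get("description") or ""
        # Escape pipes so they do not break the table layout.
        description = description.replace("|", "\\|")
        lines.append("| {} | {} | {} |".format(
            entry["url"], entry["version"], description))
    return "\n".join(lines) + "\n"
```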

Usage

usage: generate-instances-markdown.py [-h] [-o OUTPUT_FILE] [INPUT_FILE]

Generate a markdown table of the Redlib instances in the instances.JSON file. By default, this will read the file 'instances.json' in the
current working directory, and will write to 'instances.md' in that same directory. WARNING: This script will overwrite the output file if it
exists.

positional arguments:
  INPUT_FILE            location of instances JSON

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT_FILE, --output OUTPUT_FILE
                        where to write the markdown table; if a file exists at this path, it will be overwritten

Prerequisites

generate-instances-markdown.py requires Python 3.5 or later.

License

The script generate-instances-json.sh and the schema file instances-schema.json are licensed under the GNU General Public License v3.0. Almost all of generate-instances-markdown.py is licensed under GPL v3.0, with the exception of a portion of MIT-licensed code adapted from Django Countries which generates a regional indicator symbol for a given ISO 3166-1 alpha-2 country code; view the source for generate-instances-markdown.py for the applicable code along with a copy of the MIT License as it appeared in the Django Countries license at the time the code was adapted.

instances.json, instances.md, and instances.txt are released to the public domain.