
Dockerized IRC bot to post titles of URLs posted in channels

Primary language: Python. License: GNU Affero General Public License v3.0 (AGPL-3.0).

irc-url-title-bot

irc-url-title-bot is a Dockerized IRC bot, based on Python 3.9, that posts the page titles of URLs posted in the configured channels on an IRC server. As a disclaimer, note that SSL verification is disabled, and that the posted titles are not guaranteed to be accurate due to a number of factors.

Links

Caption    Link
Code       https://github.com/impredicative/irc-url-title-bot
Changelog  https://github.com/impredicative/irc-url-title-bot/releases
Image      https://hub.docker.com/r/ascensive/irc-url-title-bot

Examples

<Adam> For the mathematics of deep learning, see https://arxiv.org/abs/2105.04026 and https://arxiv.org/pdf/2104.14033
<TitleBot> ⤷ [2105.04026] The Modern Mathematics of Deep Learning | PDF: https://arxiv.org/pdf/2105.04026
<TitleBot> ⤷ [2104.14033] A Study of the Mathematics of Deep Learning | Abstract: https://arxiv.org/abs/2104.14033
<Eve> Is github.com/visinf/n3net a good project? I've been studying bugs.python.org/file47781/Tutorial_EDIT.pdf
<TitleBot> ⤷ GitHub - visinf/n3net: Neural Nearest Neighbors Networks (NIPS*2018)
<TitleBot> ⤷ Python Tutorial

For more examples, see urltitle.

Usage

The bot can work in multiple channels but on only one server. To use with multiple servers, use an instance per server.

Configuration

Prepare a private secrets.env environment file using the sample below.

IRC_PASSWORD=YourActualPassword
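Variables in secrets.env reach the container's environment through the env_file setting shown under Deployment. As a minimal sketch, assuming the password is read from the IRC_PASSWORD environment variable, a Python process could fetch and check it as below. The function name get_irc_password is hypothetical and is not taken from the bot's code.

```python
import os


def get_irc_password(environ=os.environ):
    """Return the IRC password from the environment (hypothetical helper).

    Assumes the variable is named IRC_PASSWORD, as in the sample secrets.env.
    """
    password = environ.get("IRC_PASSWORD", "")
    if not password:
        # Fail early with a hint instead of attempting to authenticate with nothing.
        raise RuntimeError("IRC_PASSWORD is not set; check secrets.env and env_file")
    return password
```

Passing a plain dict in place of os.environ makes the check easy to exercise without touching the real environment.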

Prepare a version-controlled config.yaml file using the sample below. A full-fledged real-world example is also available.

# Mandatory:
host: irc.libera.chat
ssl_port: 6697
nick: MyTitleBot
channels:
  - '#some_chan1'
  - '##some_chan2'

# Optional:
alerts_channel: '#mybot-alerts'
blacklist:
  title:
    - Invalid host
    - Untitled
  url:
    - model.fit
    - tf.app
ignores:
  - some_user1
  - some_user2
mode:

# Site-specific (optional):
sites:
  arxiv.org:
    format:
      - re:
          url: /pdf/(?P<url_id>.+?)(?:\.pdf)*$
        str:
          title: '{title} | https://arxiv.org/abs/{url_id}'
      - re:
          url: /abs/(?P<url_id>.+?)$
        str:
          title: '{title} | https://arxiv.org/pdf/{url_id}'
  bpaste.net:
    blacklist:
      title: show at bpaste
  imgur.com:
    blacklist:
      title: 'Imgur: The magic of the Internet'
  paste.ee:
    blacklist:
      title_re: ^Paste\.ee\ \-\ View\ paste\b
  youtube.com:
    blacklist:
      channels:
        - '##some_chan2'

Global settings

Mandatory
  • host
  • ssl_port
  • nick
  • channels
Optional
  • alerts_channel: Some but not all warning and error alerts are sent to this channel. Its default value is ##{nick}-alerts. The key {nick}, if present in the value, is formatted with the actual nick. For example, if the nick is MyTitleBot, alerts will by default be sent to ##MyTitleBot-alerts. Since a channel name starts with #, the name if provided must be quoted. It is recommended that the alerts channel be registered and monitored.
  • blacklist.title: This is a list of strings. If a title is one of these strings, it is not posted. The comparison is case insensitive.
  • blacklist.url: This is a list of strings. If a URL is one of these strings, its title is not posted. The comparison is case insensitive.
  • ignores: This is a list of nicks to ignore.
  • mode: This can for example be +igR for Libera. Setting it is recommended.
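The alerts_channel default and the global blacklist rules described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the bot's actual code: the function names resolve_alerts_channel and is_globally_blacklisted are hypothetical, and the configuration is assumed to be already loaded into a plain dict.

```python
# Default documented above; {nick} is replaced with the configured nick.
DEFAULT_ALERTS_CHANNEL = "##{nick}-alerts"


def resolve_alerts_channel(config):
    """Return the alerts channel, formatting {nick} if it appears in the value."""
    template = config.get("alerts_channel") or DEFAULT_ALERTS_CHANNEL
    return template.format(nick=config["nick"])


def is_globally_blacklisted(title, url, config):
    """Return True if the title or the URL equals a blacklisted string.

    Both comparisons ignore case, as documented for the global blacklist.
    """
    blacklist = config.get("blacklist") or {}
    titles = {s.casefold() for s in blacklist.get("title", [])}
    urls = {s.casefold() for s in blacklist.get("url", [])}
    return title.casefold() in titles or url.casefold() in urls


config = {
    "nick": "MyTitleBot",
    "blacklist": {"title": ["Invalid host", "Untitled"], "url": ["model.fit", "tf.app"]},
}
print(resolve_alerts_channel(config))                             # ##MyTitleBot-alerts
print(is_globally_blacklisted("untitled", "example.com", config))  # True
```

Note that the sketch compares whole strings rather than substrings, matching the wording "is one of these strings".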

Site-specific settings

The site of a URL is as defined and returned by the urltitle package. Refer to the examples contained in the Customizations section of its readme.

Site-specific settings are specified under the top-level sites key. The order of execution of the interacting operations is: blacklist, format. Refer to the sample configuration for usage examples.

  • alert.read: If false, a read failure is not alerted. The default is true.
  • blacklist.channels: This is a list of channels for which a title is not posted if the URL matches the site. The channel comparison is case insensitive.
  • blacklist.title: This is a single string or a list of strings. If the title for a URL matching the site is a blacklisted string, the title is not posted. The comparison is case sensitive.
  • blacklist.title_re: This is a single regular expression pattern that is searched for in the title. If the title for a URL matching the site is matched against this blacklisted pattern, the title is not posted.
  • format: This contains a list of entries, each of which has keys re.title and/or re.url along with str.title.
  • format.re.title: This is a single regular expression pattern that is searched for in the title. It is used to collect named key-value pairs from the match. If there isn't a match, the next entry in the parent list, if any, is attempted.
  • format.re.url: This is similar to format.re.title. If both this and format.re.title are specified, both patterns must then match their respective strings, failing which the next entry in the parent list, if any, is attempted.
  • format.str.title: The key-value pairs collected using format.re.title and/or format.re.url are combined along with the default additions of both title and url as keys. The key-value pairs are used to format the provided quoted title string. The default value is {title}. If the title is thereby altered, any remaining entries in the parent list are skipped.
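The site-specific blacklist and format rules above can be sketched in Python as follows, applied in the documented order: blacklist first, then format. This is a minimal illustration, not the bot's actual implementation. The function names are hypothetical, each site's settings are assumed to be a plain dict, and letting named groups override the default title and url keys is an assumption that the text above does not settle.

```python
import re


def is_site_blacklisted(title, channel, site_config):
    """Return True if the title must not be posted for this site and channel."""
    blacklist = site_config.get("blacklist") or {}
    # Channel comparison is case insensitive.
    channels = {c.casefold() for c in blacklist.get("channels", [])}
    if channel.casefold() in channels:
        return True
    # blacklist.title may be a single string or a list; comparison is case sensitive.
    titles = blacklist.get("title", [])
    if isinstance(titles, str):
        titles = [titles]
    if title in titles:
        return True
    # blacklist.title_re is searched for anywhere in the title.
    pattern = blacklist.get("title_re")
    return bool(pattern and re.search(pattern, title))


def format_title(title, url, site_config):
    """Apply the first format entry that matches and alters the title."""
    for entry in site_config.get("format", []):
        patterns = entry.get("re", {})
        params = {"title": title, "url": url}  # default keys
        matched = True
        for key, text in (("title", title), ("url", url)):
            if key in patterns:
                match = re.search(patterns[key], text)
                if match is None:
                    matched = False  # every specified pattern must match
                    break
                params.update(match.groupdict())  # assumption: groups may override defaults
        if not matched:
            continue  # try the next entry in the list
        new_title = entry.get("str", {}).get("title", "{title}").format(**params)
        if new_title != title:
            return new_title  # title altered: skip remaining entries
    return title


arxiv = {"format": [
    {"re": {"url": r"/pdf/(?P<url_id>.+?)(?:\.pdf)*$"},
     "str": {"title": "{title} | https://arxiv.org/abs/{url_id}"}},
    {"re": {"url": r"/abs/(?P<url_id>.+?)$"},
     "str": {"title": "{title} | https://arxiv.org/pdf/{url_id}"}},
]}
print(format_title("[2105.04026] The Modern Mathematics of Deep Learning",
                   "https://arxiv.org/abs/2105.04026", arxiv))
```

With the arxiv.org entries from the sample configuration, an /abs/ URL fails the first pattern, matches the second, and gets the corresponding /pdf/ link appended to its title.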

Deployment

  • As a reminder, it is recommended that the alerts channel be registered and monitored.

  • It is recommended that the bot be auto-voiced (+V) in each channel.

  • It is recommended that the bot be run as a Docker container using Docker ≥18.09.2, possibly with Docker Compose ≥1.24.0. To run the bot using Docker Compose, create or add to a version-controlled docker-compose.yml file:

version: '3.7'
services:
  irc-url-title-bot:
    container_name: irc-url-title-bot
    image: ascensive/irc-url-title-bot:latest
    restart: always
    logging:
      options:
        max-size: 10m
        max-file: "3"
    volumes:
      - ./irc-url-title-bot:/config:ro
    env_file:
      - ./secrets.env

  • In the above service definition in docker-compose.yml:

    • image: For better reproducibility, use a specific versioned tag, e.g. 0.2.2 instead of latest.
    • volumes: Customize the relative path to the directory containing the previously created config.yaml file, e.g. ./irc-url-title-bot.
    • env_file: Customize the relative path to secrets.env.
  • From the directory containing the above YAML file, run docker-compose up -d irc-url-title-bot. Use docker logs -f irc-url-title-bot to see and follow informational logs.

Maintenance

  • If config.yaml is updated, the container must be restarted to use the updated file.
  • If secrets.env or the service definition in docker-compose.yml is updated, the container must be recreated (and not merely restarted) to use the updated file.