Notion Docs Tracker

Generated by Gemini CLI ;-P

This project automatically tracks changes in Notion's official help documentation. It synchronizes the content with two Notion databases (a front-end and a back-end) and sends notifications about new and updated articles to a Telegram channel.

Features

  • Web Scraping: Fetches the latest categories and articles from the Notion help website.
  • Notion Integration:
    • Creates and updates pages in a "front-end" Notion database, which serves as a clean, user-facing view of the documentation.
    • Creates and updates pages in a "back-end" Notion database, which stores detailed metadata, including content hashes for change detection.
  • Content Archiving: Saves the content of each help article as a Markdown file in the content/ directory.
  • Change Detection: Compares content hashes to identify and process only new or updated articles.
  • Telegram Notifications: Sends a formatted message to a specified Telegram chat with lists of new and updated articles, including direct links.
  • Concurrent Processing: Fetches and processes multiple articles concurrently to speed up the synchronization process.
  • Configurable: Use command-line flags and a .env file to customize behavior, such as enabling/disabling Telegram messages or local file saving.
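
The change-detection step can be illustrated with a minimal sketch. It assumes SHA-256 over the Markdown body and an in-memory map of stored hashes keyed by article URL. The `Article` shape, `hashContent`, and `classify` are hypothetical names chosen for illustration; the project's actual hash function and storage layout may differ.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of one scraped article; field names are assumptions.
interface Article {
  url: string;
  title: string;
  markdown: string;
}

type ChangeKind = "new" | "updated" | "unchanged";

// SHA-256 hex digest of the article body (assumed algorithm, not confirmed by the project).
function hashContent(markdown: string): string {
  return createHash("sha256").update(markdown, "utf8").digest("hex");
}

// Compare a freshly scraped article against the hash previously stored for its URL.
function classify(article: Article, storedHashes: Map<string, string>): ChangeKind {
  const previous = storedHashes.get(article.url);
  if (previous === undefined) return "new";
  return previous === hashContent(article.markdown) ? "unchanged" : "updated";
}
```

In this sketch, only articles classified as "new" or "updated" would go on to be written to Notion and listed in the Telegram message.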

Tech Stack

  • Runtime: Bun
  • Language: TypeScript
  • Notion API: @notionhq/client
  • Web Scraping: linkedom for DOM parsing.
  • Markdown Conversion: hast-util-to-mdast and mdast-util-to-markdown to convert HTML to Markdown.
  • Configuration: yargs for command-line arguments.

Setup and Installation

  1. Clone the repository:

    git clone <repository-url>
    cd notion-docs-tracker-final

  2. Install dependencies:

    bun install

Configuration

  1. Create a .env file by copying the example file:

    cp .env.example .env

  2. Fill in the environment variables in the .env file:

    • NOTION_TOKEN: Your Notion integration token.
    • NOTION_FRONTEND_DS_ID: The ID of the Notion database to use as the front-end.
    • NOTION_BACKEND_DS_ID: The ID of the Notion database to use as the back-end/metadata store.
    • TELEGRAM_BOT_TOKEN: The token for your Telegram bot.
    • TELEGRAM_CHAT_ID: The ID of the Telegram chat or channel where notifications will be sent.
    • TELEGRAM_TOPIC_ID (Optional): The ID of a specific topic/thread within the chat to send messages to.
    • HELP_DOCS_URL: The base URL for the help documentation being tracked.
    • TRANSLATION_URL: A secondary URL for a translated version of the docs, linked in the Telegram message.
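
As a rough illustration of how these variables might be read and validated at startup, here is a minimal sketch. The variable names come from the list above; the `Config` shape and the `loadConfig` helper are hypothetical. Treating every variable except TELEGRAM_TOPIC_ID as required is also an assumption, since the project may relax some of them (for example when Telegram messages are disabled).

```typescript
// Variables assumed to be mandatory; only TELEGRAM_TOPIC_ID is documented as optional.
const REQUIRED_VARS = [
  "NOTION_TOKEN",
  "NOTION_FRONTEND_DS_ID",
  "NOTION_BACKEND_DS_ID",
  "TELEGRAM_BOT_TOKEN",
  "TELEGRAM_CHAT_ID",
  "HELP_DOCS_URL",
  "TRANSLATION_URL",
] as const;

type RequiredVar = (typeof REQUIRED_VARS)[number];

// Hypothetical config object; the real project may structure this differently.
interface Config {
  notionToken: string;
  frontendDsId: string;
  backendDsId: string;
  telegramBotToken: string;
  telegramChatId: string;
  telegramTopicId: number | undefined;
  helpDocsUrl: string;
  translationUrl: string;
}

function loadConfig(env: Record<string, string | undefined>): Config {
  // Report every missing variable at once rather than failing on the first.
  const missing = REQUIRED_VARS.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const get = (name: RequiredVar): string => env[name] as string;

  const rawTopic = env["TELEGRAM_TOPIC_ID"];
  const topicId = rawTopic ? Number(rawTopic) : undefined;
  if (topicId !== undefined && !Number.isInteger(topicId)) {
    throw new Error("TELEGRAM_TOPIC_ID must be an integer");
  }

  return {
    notionToken: get("NOTION_TOKEN"),
    frontendDsId: get("NOTION_FRONTEND_DS_ID"),
    backendDsId: get("NOTION_BACKEND_DS_ID"),
    telegramBotToken: get("TELEGRAM_BOT_TOKEN"),
    telegramChatId: get("TELEGRAM_CHAT_ID"),
    telegramTopicId: topicId,
    helpDocsUrl: get("HELP_DOCS_URL"),
    translationUrl: get("TRANSLATION_URL"),
  };
}
```

Since Bun loads .env files into process.env automatically, a call such as loadConfig(process.env) would be enough in this sketch.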

Usage

Run the tracker with the bun start command:

bun start

Command-Line Flags

You can modify the script's behavior using the following flags:

  • --send-telegram=<true|false>: Enable or disable sending Telegram notifications. (Default: true)
  • --save-content=<true|false>: Enable or disable saving content to local Markdown files. (Default: true)
  • --wait=<milliseconds>: The delay, in milliseconds, between processing consecutive items. (Default: 200)
  • --concurrency=<number>: The number of items to process concurrently. (Default: 3)

Example: Run the script without sending Telegram messages.

bun start --send-telegram=false
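
To show how --concurrency and --wait could interact, here is a minimal sketch of a bounded worker pool. The `processAll` and `sleep` helpers are hypothetical and not taken from the project. The sketch assumes the wait is applied per worker lane, between one item finishing and the next one starting; the tracker's actual scheduling may differ.

```typescript
// Resolve after the given number of milliseconds.
const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Run `worker` over `items` with at most `concurrency` calls in flight.
// Defaults mirror the documented flag defaults (3 lanes, 200 ms wait).
async function processAll<T, R>(
  items: readonly T[],
  worker: (item: T) => Promise<R>,
  concurrency = 3,
  waitMs = 200,
): Promise<R[]> {
  const results = new Array<R>(items.length);
  let next = 0; // index of the next unclaimed item, shared by all lanes

  async function lane(): Promise<void> {
    while (next < items.length) {
      const index = next++; // claim an item; safe because JS runs one callback at a time
      results[index] = await worker(items[index]);
      if (next < items.length) await sleep(waitMs); // pause before this lane's next item
    }
  }

  const laneCount = Math.max(1, Math.min(concurrency, items.length));
  await Promise.all(Array.from({ length: laneCount }, () => lane()));
  return results; // same order as `items`
}
```

With this design, a higher --concurrency value raises the number of simultaneous requests, while a higher --wait value spaces out the requests within each lane, which may help stay under API rate limits.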