Uncover broken links in your content.

Primary language: JavaScript. License: Apache License 2.0 (Apache-2.0).

Linkspector

Linkspector is a CLI app that checks for dead hyperlinks in files. It supports multiple markup languages: Markdown, AsciiDoc (limited to hyperlinks only), and reStructuredText (coming soon).

With Linkspector, you can easily check all hyperlinks in your files, ensuring that they are not broken and that your readers can access all the relevant content. The app allows you to quickly and easily identify any broken links, so you can fix them before publishing your content.

Linkspector is a powerful tool for anyone who creates content using markup languages.

How is this different from existing tools?

  1. Enhanced Link Checking with Puppeteer: It uses Puppeteer to check links in Chrome's headless mode, reducing the number of false positives.
  2. Addresses limitations and adds user-requested features: It is built to address the shortcomings of GitHub Action - Markdown link check and adds many user-requested features.
  3. Single repository for seamless collaboration: All the code it needs to run is in a single repository, making it easier for the community to collaborate.
  4. Focused on CI/CD use: Linkspector is purposefully tailored to run in your CI/CD pipelines, so that link checking becomes an integral part of your development workflow.

Installation

Before you can use Linkspector, you need to install it. You can do this using the following command:

npm install -g @umbrelladocs/linkspector

This command installs Linkspector globally, so you can run it from anywhere in your terminal. If you don't want to install it with npm, you can download the binary from GitHub releases.

GitHub action

For more details, see action-linkspector

Checking Hyperlinks

To check hyperlinks in your markup language files, follow these steps:

  1. Open your terminal.

  2. Navigate to the directory containing the files you want to check.

  3. (Optional) Create a configuration file called .linkspector.yml. By default, Linkspector looks for a configuration file named .linkspector.yml in the current directory. If you have a custom configuration file or want to specify its path, you can use the -c or --config option.

  4. Use the linkspector check command to initiate the hyperlink check. For example:

    linkspector check
    • To specify a custom configuration file path:

      linkspector check -c /path/to/custom-config.yml
    • To output the results in JSON format:

      linkspector check -j

      The JSON output follows rdjson format.

  5. Linkspector starts checking the hyperlinks in your files based on the configuration provided in the configuration file or using the default configuration. It then displays the results in your terminal.

  6. After the check is complete, Linkspector provides a summary of the results. If any dead links are found, they are listed in the terminal, along with their status codes and error messages.

  7. If no dead links are found, Linkspector displays a success message, indicating that all links are working.
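
The JSON produced by linkspector check -j can be post-processed in a script. The following is a minimal sketch, assuming the output follows the rdjson shape used by reviewdog (a top-level diagnostics array whose entries carry a message and a location.path). The function name summarizeDiagnostics and the sample report are hypothetical and are not real Linkspector output, so check the field names against an actual run before relying on them.

```javascript
// Sketch: group rdjson diagnostics by file path.
// Assumed shape: { diagnostics: [{ message, location: { path, range } }] }.
function summarizeDiagnostics(report) {
  const byFile = new Map();
  for (const diagnostic of report.diagnostics ?? []) {
    const path = diagnostic.location?.path ?? '(unknown file)';
    const line = diagnostic.location?.range?.start?.line ?? null;
    if (!byFile.has(path)) byFile.set(path, []);
    byFile.get(path).push({ line, message: diagnostic.message });
  }
  return byFile;
}

// Hypothetical sample data for illustration only.
const sampleReport = {
  source: { name: 'linkspector' },
  diagnostics: [
    {
      message: 'Cannot reach https://example.com/missing',
      location: { path: 'README.md', range: { start: { line: 12, column: 1 } } },
    },
    {
      message: 'Cannot reach https://example.com/gone',
      location: { path: 'README.md', range: { start: { line: 40, column: 1 } } },
    },
    {
      message: 'Cannot reach https://example.com/old',
      location: { path: 'docs/guide.md', range: { start: { line: 3, column: 1 } } },
    },
  ],
};

for (const [path, issues] of summarizeDiagnostics(sampleReport)) {
  console.log(`${path}: ${issues.length} reported link(s)`);
}
```

To try this on real output, redirect linkspector check -j to a file, parse it with JSON.parse, and pass the result to the function.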

Configuration

Linkspector uses a configuration file named .linkspector.yml to customize its behavior. If this file is not found in the current directory when the program is run, Linkspector displays a message saying "Configuration file not found. Using default configuration." and uses a default configuration.

Default Configuration

The default configuration is as follows:

dirs:
  - .
useGitIgnore: true

If you are defining a custom configuration, you must include the dirs or files section in the configuration file.

Following are the available configuration options:

Option | Description | Required
--- | --- | ---
files | The list of Markdown files to check for broken links. | Yes, if dirs is not specified.
dirs | The list of directories to search for Markdown files. | Yes, if files is not specified.
excludedFiles | The list of Markdown files to exclude from the link checking process. | No
excludedDirs | The list of directories to exclude from the link checking process. | No
baseUrl | The base URL to use when checking relative links in Markdown files. | No
ignorePatterns | The list of regular expressions that match URLs to be ignored during link checking. | No
replacementPatterns | The list of regular expressions and replacement strings to modify URLs during link checking. | No
aliveStatusCodes | The list of HTTP status codes that are considered as "alive" links. | No
useGitIgnore | Indicates whether to use the rules defined in the .gitignore file to exclude files and directories. | No
modifiedFilesOnly | Indicates whether to check only the files that have been modified in the last git commit. | No
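
The requirement that a custom configuration include dirs or files can be checked before Linkspector runs, for example in a pre-commit script. The sketch below is a hypothetical pre-flight check on an already parsed configuration object; it is not Linkspector's own validation. The function name validateConfig, the error wording, and the rule that the list must be non-empty are assumptions made for illustration.

```javascript
// Sketch: pre-flight check for a parsed .linkspector.yml object.
// Assumption: "must include dirs or files" is read as "a non-empty list".
function validateConfig(config) {
  const errors = [];
  const hasList = (key) => Array.isArray(config[key]) && config[key].length > 0;

  if (!hasList('dirs') && !hasList('files')) {
    errors.push('Configuration must include a non-empty "dirs" or "files" list.');
  }

  // The two flag options are documented as true/false settings.
  for (const key of ['useGitIgnore', 'modifiedFilesOnly']) {
    if (key in config && typeof config[key] !== 'boolean') {
      errors.push(`"${key}" must be true or false.`);
    }
  }
  return errors;
}

console.log(validateConfig({ dirs: ['.'], useGitIgnore: true }));
console.log(validateConfig({ baseUrl: 'https://example.com' }));
```

The first call passes the default configuration and returns no errors; the second omits both dirs and files and returns one error.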

Files to Check

The files section specifies the Markdown files that Linkspector should check for broken links. You can add the file paths you want to include in this list. For example:

files:
  - README.md
  - file2.md
  - file3.md

Directories to Search

The dirs section lists the directories where Linkspector should search for Markdown files. You can specify directories relative to the current working directory. For example:

dirs:
  - ./
  - folder2

Excluded Files

The excludedFiles section allows you to specify Markdown files that should be excluded from the link checking process. Add the paths of the files you want to exclude. For example:

excludedFiles:
  - ./check.md
  - excluded-file2.md

Excluded Directories

The excludedDirs section lets you specify directories that should be excluded from the link checking process. Provide the paths of the directories you want to exclude. For example:

excludedDirs:
  - ./lib
  - excluded-folder2

Base URL

The baseUrl option sets the base URL used when checking relative links in Markdown files. For example:

baseUrl: https://example.com

The base URL is set to https://example.com.
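
To illustrate what checking a relative link against a base URL means, the sketch below resolves links with the standard URL constructor available in Node.js. Whether Linkspector resolves links in exactly this way is an assumption; the helper name resolveLink and the sample paths are hypothetical.

```javascript
// Sketch: resolve a link against a base URL with the WHATWG URL API.
// Absolute links are returned unchanged; relative ones are joined to the base.
function resolveLink(link, baseUrl) {
  return new URL(link, baseUrl).href;
}

const baseUrl = 'https://example.com';

console.log(resolveLink('/docs/setup', baseUrl));
// https://example.com/docs/setup
console.log(resolveLink('guide/intro.md', baseUrl));
// https://example.com/guide/intro.md
console.log(resolveLink('https://other.example/page', baseUrl));
// https://other.example/page
```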

Ignore Patterns

The ignorePatterns section allows you to define regular expressions that match URLs to be ignored during the link checking process. For example:

ignorePatterns:
  - pattern: '^https://example.com/skip/.*$'
  - pattern: "^(ftp)://[^\\s/$?#]*\\.[^\\s]*$"

In this example, URLs matching the specified patterns will be skipped during link checking.
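
The effect of these patterns can be tried out in isolation. The following is a minimal sketch, assuming each pattern is compiled as a JavaScript regular expression and tested against the full URL; the helper name isIgnored is hypothetical. Note that the double backslashes from the double-quoted YAML string are written the same way in a JavaScript string literal.

```javascript
// Sketch: decide whether a URL matches any configured ignore pattern.
// Assumption: patterns are plain JavaScript regular expressions.
const ignorePatterns = [
  { pattern: '^https://example.com/skip/.*$' },
  { pattern: '^(ftp)://[^\\s/$?#]*\\.[^\\s]*$' },
];

function isIgnored(url, patterns) {
  return patterns.some(({ pattern }) => new RegExp(pattern).test(url));
}

for (const url of [
  'https://example.com/skip/page',
  'ftp://files.example.com/pub/a.txt',
  'https://example.com/docs',
]) {
  console.log(url, isIgnored(url, ignorePatterns) ? 'ignored' : 'checked');
}
```

With the two sample patterns, the first two URLs match and the third does not.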

Replacement Patterns

The replacementPatterns section lets you define regular expressions and replacement strings to modify URLs during link checking. For example:

replacementPatterns:
  - pattern: "(https?://example.com)/(\\w+)/(\\d+)"
    replacement: '$1/id/$3'
  - pattern: "\\[([^\\]]+)\\]\\((https?://example.com)/file\\)"
    replacement: '<a href="$2/file">$1</a>'

These patterns and replacements will be applied to URLs found in the Markdown files.
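
The first pattern above can be exercised with a short script. This is a sketch under the assumption that each replacement is applied in order with JavaScript's String.prototype.replace, where $1 and $3 refer to capture groups; the helper name applyReplacements and the sample URLs are hypothetical.

```javascript
// Sketch: apply replacement patterns to a URL, one after another.
// Assumption: JavaScript regular expressions and $n group references.
const replacementPatterns = [
  { pattern: '(https?://example.com)/(\\w+)/(\\d+)', replacement: '$1/id/$3' },
];

function applyReplacements(url, patterns) {
  return patterns.reduce(
    (current, { pattern, replacement }) =>
      current.replace(new RegExp(pattern), replacement),
    url,
  );
}

console.log(applyReplacements('https://example.com/user/123', replacementPatterns));
// https://example.com/id/123
console.log(applyReplacements('https://example.org/user/123', replacementPatterns));
// https://example.org/user/123 (no match, left unchanged)
```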

Alive Status Codes

The aliveStatusCodes section allows you to specify a list of HTTP status codes that are considered "alive". For example:

aliveStatusCodes:
  - 200
  - 201
  - 204

Links returning any of these status codes will be considered valid.
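
As a small illustration, the check against this list amounts to a membership test. The sketch below uses a hypothetical helper named isAlive; how Linkspector treats status codes that are not in the list is not stated here, so the example only reports whether a code is listed.

```javascript
// Sketch: test whether a status code is in the configured "alive" list.
const aliveStatusCodes = [200, 201, 204];

function isAlive(statusCode, codes = aliveStatusCodes) {
  return codes.includes(statusCode);
}

for (const status of [200, 204, 301, 404]) {
  console.log(status, isAlive(status) ? 'listed as alive' : 'not listed');
}
```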

Use .gitignore

The useGitIgnore option, when set to true, indicates that Linkspector should use the rules defined in the .gitignore file to exclude files and directories. For example:

useGitIgnore: true

When enabled, the app will respect the .gitignore rules during link checking.

Check Modified Files Only

The modifiedFilesOnly option, when set to true, indicates that Linkspector should only check the files that have been modified in the last git commit. For example:

modifiedFilesOnly: true

When enabled, Linkspector will use git to find the list of modified files and only check those files. Please note that this option requires git to be installed and available on your system path. If git is not installed or not found in the system path, Linkspector will throw an error.

Also, if no modified files are found in the list of files to check, Linkspector will skip link checking and exit with a message indicating that modified files are not specified in the configuration.
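
The selection step can be pictured as intersecting the configured files with the files git reports as changed. The following is a rough sketch, not Linkspector's implementation: it assumes "modified in the last git commit" can be approximated with git diff --name-only HEAD~1 HEAD, and the function names getModifiedFiles and selectModified are hypothetical.

```javascript
const { execFileSync } = require('node:child_process');
const path = require('node:path');

// Assumption: the last commit's changes are listed by this git command.
// Throws if git is not installed or the directory is not a repository.
function getModifiedFiles(cwd = process.cwd()) {
  const output = execFileSync(
    'git',
    ['diff', '--name-only', 'HEAD~1', 'HEAD'],
    { cwd, encoding: 'utf8' },
  );
  return output.split('\n').filter(Boolean);
}

// Keep only the configured files that also appear in the modified list.
// Paths are normalized so that "./docs/a.md" and "docs/a.md" compare equal.
function selectModified(filesToCheck, modifiedFiles) {
  const modified = new Set(modifiedFiles.map((file) => path.normalize(file)));
  return filesToCheck.filter((file) => modified.has(path.normalize(file)));
}

console.log(
  selectModified(['README.md', './docs/a.md', 'b.md'], ['docs/a.md', 'README.md']),
);
```

In a repository, selectModified(files, getModifiedFiles()) would give the subset to check; an empty result corresponds to the case where link checking is skipped.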

Sample configuration

files:
  - README.md
  - file2.md
  - file3.md
dirs:
  - ./
  - folder2
excludedFiles:
  - ./check.md
  - excluded-file2.md
excludedDirs:
  - ./lib
  - excluded-folder2
baseUrl: https://example.com
ignorePatterns:
  - pattern: '^https://example.com/skip/.*$'
  - pattern: "^(ftp)://[^\\s/$?#]*\\.[^\\s]*$"
replacementPatterns:
  - pattern: "(https?://example.com)/(\\w+)/(\\d+)"
    replacement: '$1/id/$3'
  - pattern: "\\[([^\\]]+)\\]\\((https?://example.com)/file\\)"
    replacement: '<a href="$2/file">$1</a>'
aliveStatusCodes:
  - 200
  - 201
  - 204
useGitIgnore: true

Sample output

If there are failed links, Linkspector prints each one as comma-separated values and exits with an error. The fields are: file, URL, HTTP status code, line number, error message.

REDISTRIBUTED.md, https://unlicense.org/, null, 186, net::ERR_SSL_VERSION_OR_CIPHER_MISMATCH at https://unlicense.org/]
💥 Error: Some hyperlinks in the specified files are invalid.

If there are no errors, Linkspector shows the following message:

✨ Success: All hyperlinks in the specified files are valid.
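
If you need the comma-separated failure lines in structured form, they can be split in a script. The sketch below is based only on the single sample line shown above and assumes the field order file, URL, status code, line number, message, separated by a comma and a space; the function name parseResultLine is hypothetical. File names or URLs that themselves contain ", " would break this simple split, and the JSON output (-j) is likely the more robust choice for automation.

```javascript
// Sketch: parse one failure line of the comma-separated output.
// Assumed field order: file, URL, status code, line number, message.
function parseResultLine(line) {
  const [file, url, status, lineNumber, ...rest] = line.split(', ');
  return {
    file,
    url,
    // The sample output prints "null" when no HTTP status is available.
    statusCode: status === 'null' ? null : Number(status),
    line: Number(lineNumber),
    // The message may itself contain ", ", so rejoin the remainder.
    message: rest.join(', '),
  };
}

const sampleLine =
  'REDISTRIBUTED.md, https://unlicense.org/, null, 186, ' +
  'net::ERR_SSL_VERSION_OR_CIPHER_MISMATCH at https://unlicense.org/]';

console.log(parseResultLine(sampleLine));
```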

Using Linkspector with Docker

To use Linkspector with Docker, follow these steps:

  1. Clone the Linkspector repository to your local machine and switch to the cloned directory:

    git clone git@github.com:UmbrellaDocs/linkspector.git
    cd linkspector
  2. Build the Docker image locally from the root (.) of this project:

    docker build --no-cache --pull --build-arg LINKSPECTOR_PACKAGE= -t umbrelladocs/linkspector .
  3. To run a check with the default configuration, run the following from the root ($PWD) of the project to be checked:

    docker run --rm -it -v $PWD:/app \
           --name linkspector umbrelladocs/linkspector \
           bash -c 'linkspector check'

    To specify a custom configuration file path:

    docker run --rm -it -v $PWD:/app -v $PWD/custom-config.yml:/path/to/custom-config.yml \
           --name linkspector umbrelladocs/linkspector \
           bash -c 'linkspector check -c /path/to/custom-config.yml'

What's planned

  • Spinner for local runs.
  • Create a GitHub action. See action-linkspector
  • Modified files only check.
  • [!] AsciiDoc support. (Limited to hyperlinks only)
  • reStructuredText support.
  • Disable binary file downloads.
  • JSON output for failed-only or all links.
  • [ ] CSV output for all links. (dropped for now)
  • [ ] Experimental mode to gather all links and check them in batches to study performance gains. (dropped for now)
  • [ ] Proxy support to connect puppeteer to a remote service. (dropped for now)
  • [ ] Puppeteer config support. (dropped for now)

Contributing

If you would like to contribute to Linkspector, please read the contributing guidelines.