spellcheck-github-actions

A GitHub Action that spell checks Python, Markdown, and Text files.

This action uses PySpelling to check spelling in source files in the designated repository.

Features

  • Customizable configuration and spell checking using PySpelling
  • Support for the following formats (via PySpelling):
    • Markdown
    • Python
    • C++
    • HTML
    • JavaScript
    • ODF
    • OOXML
    • CSS
    • XML
    • Plain text
  • Support for aspell; do see the section on Language Support for details
  • Support for the following languages:
    • English
    • French
    • German
    • Spanish
    • Do see the section on Language Support for details
  • Per repository and format custom word list to avoid errors based on words not known to default dictionary, see: PySpelling for more options
  • Flexible repository layout integration via file name matching using Wildcard Match
  • Support for Python's Markdown extensions, namely the pymdown-extensions via PySpelling

Configuration

  1. First you have to add a configuration for the spelling checker
  2. Create a file named .spellcheck.yml or .spellcheck.yaml. Do note that if both files exist, the former takes precedence. Hidden files are recommended, since these configuration files are not first-class citizens of your repository. You can also provide your own configuration file; see the Spellcheck Configuration File section below.
  3. Paste the contents of the outlined example, which is a configuration for Markdown, useful for your README file

Do note that this action requires the contents of the repository, so it is recommended to use it together with the Checkout action.

You have to define this part in your workflow, since it is not part of the action itself.

Example:

name: Spellcheck Action
on: push

jobs:
  build:
    name: Spellcheck
    runs-on: ubuntu-latest
    steps:
    # The checkout step
    - uses: actions/checkout@v3
    - uses: rojopolis/spellcheck-github-actions@0.44.0
      name: Spellcheck

This workflow file must be created in the .github/workflows/ directory.

For example, it could be named .github/workflows/spelling_action.yml for easy identification, if other actions are present.

Using a Canonical Version

In the above example, the configuration points to the exact version 0.44.0. This repository also offers the canonical version v0, so there is less hassle keeping the action up to date.

name: Spellcheck Action
on: push

jobs:
  build:
    name: Spellcheck
    runs-on: ubuntu-latest
    steps:
    # The checkout step
    - uses: actions/checkout@v3
    - uses: rojopolis/spellcheck-github-actions@v0
      name: Spellcheck

Specifying Source Files To Check

By default, this action will use the sources: list under each task in your config file to identify which files to scan. You can override this behaviour by setting source_files to the list of files or file patterns you want scanned.

When this option is used, you must also specify the task_name to override the sources: list for.

Do note that file paths containing spaces need to be quoted using either ' (single quotes) or " (double quotes). The quoting has to be uniform and the two quoting styles can not be intermixed.

Examples

Parts are lifted from issue #84

No spaces, quotes not required

source_files: README.md CHANGELOG.md notes/Notes.md

No spaces, quotes not required, double quotes used for complete parameter

source_files: "README.md CHANGELOG.md notes/Notes.md"

This might actually work, but it is not recommended and it might break. Use proper quoting instead.

No spaces, quotes not required, double quotes used for single parameters

source_files: "README.md" "CHANGELOG.md" "notes/Notes.md"

This would also work using single quotes

Spaces, quotes required, single quotes used

source_files: 'Managed Services/Security Monitor/README.md' 'Terraform/Development Guide/README.md'

Spaces, quotes required, double quotes used

source_files: "Managed Services/Security Monitor/README.md" "Terraform/Development Guide/README.md"

Spaces, quotes required, intermixed quotes, will not work

source_files: "Managed Services/Security Monitor/README.md" 'Terraform/Development Guide/README.md'

Specify a Specific Task To Run

By default, all tasks in your config file will be run. By setting task_name you can override this and run only the task you require.

A configuration for designated source files could look as follows.

Example:

name: Spellcheck Action
on: push

jobs:
  build:
    name: Spellcheck
    runs-on: ubuntu-latest
    steps:
    # The checkout step
    - uses: actions/checkout@v3
    - uses: rojopolis/spellcheck-github-actions@0.44.0
      name: Spellcheck
      with:
        source_files: README.md CHANGELOG.md notes/Notes.md
        task_name: Markdown
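
Setting task_name matters most when the configuration file defines several tasks. The following is a minimal sketch of such a file, assuming a repository that also holds Python sources. The second task, its name Python and its source pattern are illustrative assumptions and not part of the examples in this document. With task_name: Markdown, only the first task would be run.

```yaml
matrix:
# Task selected by `task_name: Markdown` in the workflow
- name: Markdown
  aspell:
    lang: en
  dictionary:
    encoding: utf-8
  pipeline:
  - pyspelling.filters.markdown:
  - pyspelling.filters.html:
      comments: false
      ignores:
      - code
      - pre
  sources:
  - '**/*.md'
  default_encoding: utf-8
# Hypothetical second task; not run when task_name is set to Markdown
- name: Python
  aspell:
    lang: en
  dictionary:
    encoding: utf-8
  pipeline:
  - pyspelling.filters.python:
  sources:
  - '**/*.py'
  default_encoding: utf-8
```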

Specify a PySpelling Output Artifact

To make it easier to process larger amounts of output, the action lets the user enable the generation of an artifact.

The optional output_file input parameter, if specified, defines the name of the generated file containing the spellcheck output. This file can then be stored as a workflow artifact using the actions/upload-artifact step.

A configuration for emitting an output artifact could look as follows.

Example:

name: Spellcheck Action
on: push

jobs:
  build:
    name: Spellcheck
    runs-on: ubuntu-latest
    steps:
    # The checkout step
    - uses: actions/checkout@v3
    - uses: rojopolis/spellcheck-github-actions@0.44.0
      name: Spellcheck
      with:
        source_files: README.md CHANGELOG.md notes/Notes.md
        task_name: Markdown
        output_file: spellcheck-output.txt
    - uses: actions/upload-artifact@v3
      if: '!cancelled()' # Do not upload artifact if job was cancelled
      with:
        name: Spellcheck Output
        path: spellcheck-output.txt

The artifact can be downloaded via the GitHub UI or via the GitHub API. Based on the names specified in the above example, the artifact is named Spellcheck Output and the file it contains is named spellcheck-output.txt. The artifact comes zipped.

Do see the official documentation for handling artifacts via the API.

The reason for using if: '!cancelled()' is that, by default, GitHub Actions skips the remaining steps once a step fails. The job would then show the proper state, but the artifact with the output would not be available, which defeats the purpose.

Artifacts are by default available for 90 days.
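
If the default retention is longer than you need, the upload step can be given a shorter one. The step below is a sketch, assuming the retention-days input of actions/upload-artifact; the value of 7 days is an arbitrary example.

```yaml
    - uses: actions/upload-artifact@v3
      if: '!cancelled()'
      with:
        name: Spellcheck Output
        path: spellcheck-output.txt
        # Assumption: keep the spellcheck output for one week only
        retention-days: 7
```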

Extra Configuration

Extra Configuration for PySpelling

Do check the PySpelling documentation for elaborate details on configuration of PySpelling.

Extra Configuration for Markdown

PySpelling uses the Python Markdown project. PySpelling allows for configuration of the Markdown handling using the pymdown-extensions authored by the author of PySpelling.

If, for example, you wanted to use the superfences extension, you could configure it as follows:

  - pyspelling.filters.markdown:
      markdown_extensions:
      - pymdownx.superfences:

Currently, the Spellcheck Action supports the following extensions (in alphabetical order):

  • Arithmatex
  • B64
  • BetterEm
  • Caret
  • Critic
  • Details
  • Emoji
  • EscapeAll
  • Extra
  • Highlight
  • InlineHilite
  • Keys
  • MagicLink
  • Mark
  • PathConverter
  • ProgressBar
  • SaneHeaders
  • SmartSymbols
  • Snippets
  • StripHTML
  • SuperFences
  • Tabbed
  • Tasklist
  • Tilde

Please consult the documentation for the extensions for more details.

Currently only the superfences use case has been requested, as outlined in the above example.

Do also see the Diagnostics section below, which demonstrates diagnostics emitted from Python Markdown that might require the use of an extension.

Spellcheck Configuration File

You can either provide a path to the configuration file or save a file in the root of your repository with a predefined name (listed below). If config_path is provided, it will be used and the predefined names will be ignored. If config_path is not provided, the repository is searched for the first match among the predefined names.

Example:

name: Spellcheck Action
on: push
jobs:
  build:
    name: Spellcheck
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    - uses: rojopolis/spellcheck-github-actions@0.44.0
      name: Spellcheck
      with:
        config_path: config/.spellcheck.yml # put path to configuration file here
        source_files: source/scanning.md source/triggers.md
        task_name: Markdown

Predefined Name

  1. .spellcheck.yml
  2. .spellcheck.yaml
  3. spellcheck.yml
  4. spellcheck.yaml (the old default)
  5. .pyspelling.yaml (PySpelling default)
  6. .pyspelling.yml (PySpelling default)

The files are tried in that order, and the first match is used. This means that you can use files prefixed with a . to have a less intrusive Spellcheck configuration in your repository.

matrix:
- name: Markdown
  aspell:
    lang: en
  dictionary:
    encoding: utf-8
  pipeline:
  - pyspelling.filters.markdown:
  - pyspelling.filters.html:
      comments: false
      ignores:
      - code
      - pre
  sources:
  - '**/*.md'
  default_encoding: utf-8

The above configuration will check the spelling of your repository's README.md and other Markdown files against an English dictionary. If your Markdown files are named differently, correct the pattern or add additional patterns under sources. Markdown files are sometimes given the extension .mkdn.
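
As a sketch of that adjustment, the sources list could carry more than one pattern. The second pattern is an assumption about how such files might be named in your repository.

```yaml
  sources:
  - '**/*.md'
  # Assumption: some Markdown files in the repository use the .mkdn extension
  - '**/*.mkdn'
```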

When and if the run locates spelling errors, you have two options:

  1. Correct the spelling errors in the relevant files
  2. Add the relevant words to a custom word list, to be ignored

If you do the latter, you have to add the following to the Spellcheck configuration, under dictionary:

    wordlists:
    - .wordlist.txt

This supplies a custom list of words to supplement the default dictionary for the specified language, in this case set to English (en) under aspell.

The complete configuration should resemble this:

matrix:
- name: Markdown
  aspell:
    lang: en
  dictionary:
    wordlists:
    - .wordlist.txt
    encoding: utf-8
  pipeline:
  - pyspelling.filters.markdown:
  - pyspelling.filters.html:
      comments: false
      ignores:
      - code
      - pre
  sources:
  - '**/*.md'
  default_encoding: utf-8

Change the configuration to suit your repository and needs, please see the examples/ directory for more example configurations.

Specifying Number of Jobs for Parallel Processing

This action supports parallel processing of the configured tasks using the jobs parameter, which was introduced in version 2.10 of PySpelling.

The default value is 1, which means that the action will run in a single job.

jobs: 4

Full example:

jobs: 4
matrix:
- name: Markdown
  aspell:
    lang: en
  dictionary:
    wordlists:
    - .wordlist.txt
    encoding: utf-8
  pipeline:
  - pyspelling.filters.markdown:
  - pyspelling.filters.html:
      comments: false
      ignores:
      - code
      - pre
  sources:
  - '**/*.md'
  default_encoding: utf-8

Specifying Language

This action currently only supports aspell, please see the section on Language Support below.

In the section for aspell you can specify the main language, for example en, via the lang parameter.

You can further specify dialect, using the d parameter.

See the documentation for PySpelling for more details.
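
As an illustration, a task selecting a language and a dialect could look as below. This is a sketch which assumes that the en_US dialect is available in the dictionaries shipped with the action; adjust lang and d to your needs.

```yaml
matrix:
- name: Markdown
  aspell:
    lang: en   # main language
    d: en_US   # dialect (assumed to be available in the image)
  dictionary:
    encoding: utf-8
  pipeline:
  - pyspelling.filters.markdown:
  sources:
  - '**/*.md'
  default_encoding: utf-8
```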

Checking For Bad Spelling

The GitHub Action helps you make sure most spelling errors do not make it into your repository. You can however check your spelling prior to committing and pushing to your repository.

This simply uses the contents of our spelling toolchain:

$ pyspelling -c .spellcheck.yml
Misspelled words:

...

!!!Spelling check failed!!!

We can correct the error(s) pointed out by PySpelling as we go by adding new words to our local file: .wordlist.txt

And at some point we get:

$ pyspelling -c .spellcheck.yml
Spelling check passed :)

Now we should be good to go.

Do note you could also use the entrypoint.sh, which is the script used in the Docker image.

$ sh entrypoint.sh

Using pyspelling on repository files outlined in .spellcheck.yml
----------------------------------------------------------------
Spelling check passed :)

Language Support

Only a limited set of languages is currently supported via GNU Aspell; the Features section above lists English, French, German and Spanish.

Additional languages can be added by request, please open an issue.

Hunspell is supported by PySpelling, but is not currently supported by this action.

Please open an issue or PR, if Hunspell should be evaluated for possible inclusion.

Tips

How to declutter your root directory from Spellcheck configuration files

If you think that the GitHub Spellcheck Action is cluttering the root directory of your project, you can move the configuration files to a subdirectory.

  1. In the action configuration (.github/workflows/<your action configuration of the spellcheck action>) you add the config_path parameter and specify where you have put your spellcheck configuration file
  2. In the spellcheck configuration file (mentioned above), you can specify the wordlists parameter to point to a designated path

Moving both files to .github could look as follows:

name: Spellcheck Action
on: push
jobs:
  build:
    name: Spellcheck
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    - uses: rojopolis/spellcheck-github-actions@0.44.0
      name: Spellcheck
      with:
        config_path: .github/spellcheck.yml # <--- put path to configuration file here

The spellcheck configuration file (.github/spellcheck.yml in this example):

matrix:
- name: Markdown
  aspell:
    lang: en
  dictionary:
    wordlists:
    - .github/wordlist.txt # <-- put path to custom dictionary file here
    encoding: utf-8
  pipeline:
  - pyspelling.filters.markdown:
  - pyspelling.filters.html:
      comments: false
      ignores:
      - code
      - pre
  sources:
  - '**/*.md'
  default_encoding: utf-8

Specify Code Not To Have Spelling Checked

Since this action checks all available text, you might run into problems with sections of code, examples, etc.

This can be circumvented by the following configuration:

      ignores:
      - code
      - pre

This works on the intermediate HTML form of the data.

A complete configuration could look as follows:

matrix:
- name: Markdown
  aspell:
    lang: en
  dictionary:
    wordlists:
    - .wordlist.txt
    encoding: utf-8
  pipeline:
  - pyspelling.filters.markdown:
  - pyspelling.filters.html:
      comments: false
      ignores:
      - code
      - pre
  sources:
  - '**/*.md'
  default_encoding: utf-8

With this configuration, code and pre sections are ignored by the spelling check.

Code fences in Markdown require additional configuration using the Markdown extension: pymdownx.superfences:

  - pyspelling.filters.markdown:
      markdown_extensions:
      - pymdownx.superfences:

A complete example could look as follows:

matrix:
- name: Markdown
  aspell:
    lang: en
  dictionary:
    wordlists:
    - .wordlist.txt
    encoding: utf-8
  pipeline:
  - pyspelling.filters.markdown:
      markdown_extensions:
      - pymdownx.superfences
  - pyspelling.filters.html:
      comments: false
      ignores:
      - code
      - pre
  sources:
  - '**/*.md'
  default_encoding: utf-8

Getting Your Action Updated Automatically

The awesome tool dependabot lets you scan your used GitHub Marketplace Actions and lets you know if they are in need of an update.

The update is proposed via a pull request, which can be accepted or declined. Dependabot will itself take care of deleting pull requests if these become irrelevant.

You specify the configuration in the file: .github/dependabot.yml in your repository using this action - actually it scans all your actions.

# Basic dependabot.yml file
# REF: https://docs.github.com/en/code-security/supply-chain-security/keeping-your-actions-up-to-date-with-dependabot

version: 2
updates:
  # Enable version updates for Actions
  - package-ecosystem: "github-actions"
    # Look for `.github/workflows` in the `root` directory
    directory: "/"
    # Check for updates once a week
    schedule:
      interval: "weekly"

Slimming Your Wordlist By Ignoring Case

This tip works for aspell.

You can slim down your .wordlist.txt file if you have case variations of entries of words.

aspell:
    ignore-case: true

To convert your existing .wordlist.txt you could do something along the lines of this using Bash version 4.

$ tr '[:upper:]' '[:lower:]' < .wordlist.txt > temp-wordlist.txt
$ cat temp-wordlist.txt | sort -u > .wordlist.txt
$ rm temp-wordlist.txt

And you should be good to go.

Check only the changed files

The marvellous GitHub Action: tj-actions/changed-files can be used to check only the files changed in a pull request.

Your workflow could look something like this:

    - name: Get all changed markdown files
      uses: tj-actions/changed-files@v45
      id: changed_files
      with:
        files: |
           **.md

    - name: Run Spellcheck
      id: spellcheck
      uses: rojopolis/spellcheck-github-actions@v0
      with:
        task_name: Markdown
        source_files: ${{ steps.changed_files.outputs.all_changed_files }}

Lifted from: jonasbn/TIL.

This can be very useful for very large repositories, where you only want to check the files changed in a pull request, so you can focus on the changes and do not have to wait so long for the action to complete.
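
Put into a complete workflow, the two steps could look like the sketch below. The workflow name, the pull_request trigger and the if: condition on the spellcheck step are assumptions added for illustration; the condition relies on the any_changed output of tj-actions/changed-files to skip the check when no Markdown files were changed.

```yaml
name: Spellcheck Changed Files
on: pull_request

jobs:
  build:
    name: Spellcheck
    runs-on: ubuntu-latest
    steps:
    # The checkout step
    - uses: actions/checkout@v3

    - name: Get all changed markdown files
      uses: tj-actions/changed-files@v45
      id: changed_files
      with:
        files: |
           **.md

    - name: Run Spellcheck
      id: spellcheck
      # Assumption: skip the check when no Markdown files changed
      if: steps.changed_files.outputs.any_changed == 'true'
      uses: rojopolis/spellcheck-github-actions@v0
      with:
        task_name: Markdown
        source_files: ${{ steps.changed_files.outputs.all_changed_files }}
```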

Diagnostics

This is a list of common diagnostics, which can be emitted by the action and its tools.

Diagnostic text: !!!Spelling check failed!!!

This indicates that a spelling check has been completed, but spelling errors were located and should be corrected.

  1. Either correct pinpointed spelling errors
  2. Or add pinpointed words to custom dictionary

Please see the section: "Checking For Bad Spelling" above.

Diagnostic text: RuntimeError: None of the source targets from the configuration match any files:

This diagnostic indicates that the source wildcard patterns in the configuration did not match any files.

  1. Either adjust the pattern
  2. Or remove the configuration part since it does not match the repository contents
  3. Or set expect_match to false
matrix:
- name: markdown
  pipeline:
  - pyspelling.filters.text
  sources:
  - '**/*.md'
  expect_match: false
  default_encoding: utf-8

Please see the documentation for Wildcard Match (1 and 2) or Expect Match.

Diagnostic text: FileNotFoundError: [Errno 2] No such file or directory: '.wordlist.txt'

This diagnostic indicates that a custom word list has been specified in the used configuration, .spellcheck.yml, but the file does not exist.

  1. Create the empty file
$ touch .wordlist.txt

Please see the section: "Configuration" above.

Diagnostic text: ValueError: Unable to find or load pyspelling configuration from

This diagnostic indicates that the configuration file pointed to with the --config (-c) parameter cannot be located.

  1. Check that a file with the indicated name exists.

ValueError: Unable to find or load pyspelling configuration from spellcheck.yaml

This indicates that spellcheck.yaml was expected, so this file should exist in the repository.

If the file is available in the repository, please check that your workflow is configured correctly, with the following line, which enables the action: checkout.

uses: actions/checkout@v3

In full context:

name: Spellcheck Action
on: push

jobs:
  build:
    name: Spellcheck
    runs-on: ubuntu-latest
    steps:
    - uses: actions/checkout@v3
    - uses: rojopolis/spellcheck-github-actions@0.44.0
      name: Spellcheck

This step adds an action, which checks out the repository for inspection by linters and other actions like this one.

Diagnostic text: ERROR: *.md -- 'NoneType' object has no attribute 'end'

This indicates issues with the Markdown and is reported by Markdown (See: PyPi site).

PySpelling does however support extension of the standard Markdown parser, and you can specify the use of extensions if these are supported.

This action supports the extensions included in: pymdown-extensions (See: PyPi site)

And you can then put these to use in your configuration. The example below outlines the superfences extension.

  - pyspelling.filters.markdown:
      markdown_extensions:
      - pymdownx.superfences:

Please see the repository's requirements.txt for a list of all included Python modules and their exact versions.

Diagnostic text: ValueError: Pipline step in unexpected format:

This error emitted from PySpelling indicates issues with the configuration file.

Please see the section: "Configuration" above.

This error appeared with the update of PySpelling from 2.6.1 to 2.7.3 in release 0.16.0 (0.15.0, do see the change log). After that update, it is emitted from the GitHub Action if the configuration is not properly formatted. Earlier revisions of the Markdown example in this repository demonstrate the error.

To address the error, the options below the filter should be properly indented.

  - pyspelling.filters.html:
    comments: false
    ignores:
    - code
    - pre

Should be indented to:

  - pyspelling.filters.html:
      comments: false
      ignores:
      - code
      - pre

The complete error emitted would look something along the lines of:

Traceback (most recent call last):
  File "/usr/local/bin/pyspelling", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.9/site-packages/pyspelling/__main__.py", line 30, in main
    return run(
  File "/usr/local/lib/python3.9/site-packages/pyspelling/__main__.py", line 55, in run
    for results in spellcheck(
  File "/usr/local/lib/python3.9/site-packages/pyspelling/__init__.py", line 673, in spellcheck
    for result in spellchecker.run_task(task, source_patterns=sources):
  File "/usr/local/lib/python3.9/site-packages/pyspelling/__init__.py", line 311, in run_task
    self._build_pipeline(task)
  File "/usr/local/lib/python3.9/site-packages/pyspelling/__init__.py", line 255, in _build_pipeline
    raise ValueError(STEP_ERROR.format(str(step)))
ValueError: Pipline step in unexpected format: {'pyspelling.filters.html': None, 'comments': False, 'ignores': ['code', 'pre']}

Example lifted from issue #60

Diagnostic text: re.error: global flags not at the start of the expression at position 1

This error is emitted from PySpelling and indicates an issue with interpreting the configuration file.

From version 0.29.0 the action uses Python 3.11, and since Python 3.11 global inline flags such as (?i) can only be used at the start of a regular expression, not elsewhere.

If you specify delimiters in the configuration file and place the (?i) flag anywhere but at the start of the expression, you will get this error.

Thanks to @Lasica for reporting and resolving the issue #189.
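
A sketch of a delimiter definition with the flag in an accepted position, assuming the pyspelling.filters.context filter is used. The spellcheck-off and spellcheck-on markers are hypothetical and only serve as an illustration.

```yaml
  - pyspelling.filters.context:
      context_visible_first: true
      delimiters:
      # Accepted: the global flag is the very first part of the open expression
      - open: '(?i)<!-- spellcheck-off -->'
        close: '<!-- spellcheck-on -->'
      # Rejected on Python 3.11 and later: the flag sits inside the expression
      # - open: '<!-- (?i)spellcheck-off -->'
      #   close: '<!-- spellcheck-on -->'
```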

DockerHub

This action is based on a Docker image available on DockerHub.

This means that if you are developing your own spell checking action you can use this image.

Alternatively you can build your own Docker image based on the Dockerfile in this repository.

A note on DockerHub

The images are built from the GitHub repository master branch.

The recommended use is to use the latest release with a version tag. See the release history for details.

The tag latest, in contrast, just reflects the latest build based on the master branch.

The master branch might contain changes not tagged as released yet and can be regarded as unstable or experimental. As a general rule, changes such as corrections to documentation will not be tagged separately unless they are significant, but the aim is to keep the documentation relevant and up to date.

Development

The GitHub Action is based on a Docker implementation.

The Dockerfile describes how the image is built, and entrypoint.sh, which acts as the ENTRYPOINT for the Docker image, describes the execution part.

You can test the action locally by building the Docker image and running it against your project/repository.

First you have to build it.

Download or fork the spellcheck action repository.

Unpack or clone the source code and build the Docker image.

$ docker build -t github-action-spellcheck .

Run the newly built Docker image.

Do note the project/repository has to contain a configuration, please see the section on configuration above:

$ cd <your project/repository directory>
$ docker run -it -v "$PWD":/tmp github-action-spellcheck

Author

The original author of this GitHub Action is Robert Jordan (@rojopolis)

Acknowledgements

Here follows a list of contributors in alphabetical order:

  • @aSemy
  • Albert Volkman, @albertvolkman
  • Byron Miller, @supernovae
  • Isaac Muse, @facelessuser
  • Jonas Brømsø, @jonasbn
  • José Eduardo Montenegro Cavalcanti de Oliveira, @edumco
  • @Lasica
  • Matt Calvert, @miff2000
  • Matthew Macdonald-Wallace, @proffalken
  • Michael Flaxman, @mflaxman
  • Mike Starov, @xsaero00
  • Nicolas Lhomme, @nlhomme
  • Pavel Skipenes, @pavelskipenes
  • Peter Petrik, @PeterPetrik
  • Riccardo Porreca, @riccardoporreca
  • Stephen Bates, @sbates130272

If you want to be left out, feel left out of this list, or want a different representation of your name, please submit a pull request or raise an issue.

Copyright and License

This repository is licensed under the MIT license.