TeXtidote: a correction tool for LaTeX documents and other formats

Have you ever thought of using a grammar checker on LaTeX files?

If so, you probably know that the process is far from simple. Since LaTeX documents contain special commands and keywords (the so-called "markup") that are not part of the "real" text, you cannot run a grammar checker directly on these files: it cannot tell the difference between markup and text. The other option is to remove all this markup, leaving only the "clear" text; however, when a grammar tool points to a problem at a specific line in this clear text, it becomes hard to retrace that location in the original LaTeX file.

TeXtidote solves this problem; it can read your original LaTeX file and perform various sanity checks on it: for example, making sure that every figure is referenced in the text, enforcing the correct capitalization of titles, etc. In addition, TeXtidote can remove markup from the file and send it to the Language Tool library, which performs a verification of both spelling and grammar in a dozen languages. What is unique to TeXtidote is that it keeps track of the relative position of words between the original and the "clean" text. This means that it can translate the messages from Language Tool back to their proper location directly in your source file.

You can see the list of all the rules checked by TeXtidote at the end of this file.

TeXtidote also supports spelling and grammar checking of files in the Markdown format.

Getting TeXtidote

You can install TeXtidote either by downloading it manually or by using a package.

Under Debian systems: install package

Under Debian systems (Ubuntu and derivatives), you can install TeXtidote from a .deb package. Download the latest .deb file from the Releases page; suppose it is called textidote_X.Y.Z_all.deb. You can install TeXtidote by typing:

$ sudo apt-get install ./textidote_X.Y.Z_all.deb

The ./ is mandatory; otherwise the command won't work.

Manual download

You can also download the TeXtidote executable manually: this works on all operating systems. Simply make sure you have Java version 8 or later installed on your system. Then, download the latest release of TeXtidote; put the JAR in the folder of your choice.

Using TeXtidote

TeXtidote is run from the command line. The TeXtidote repository contains a sample LaTeX file called example.tex. Download this file and save it to the folder where TeXtidote resides. You then have the choice of producing three types of "reports" on the contents of your file: an "HTML" report (viewable in a web browser), a "console" report (viewable in a terminal) and a "single-line" report (typically destined to be consumed by other scripts, see below).

HTML report

To run TeXtidote and perform a basic verification of the file, run:

java -jar textidote.jar --output html example.tex > report.html

In Linux, if you installed TeXtidote using apt-get, you can also call it directly by typing:

textidote --output html example.tex > report.html

Here, the --output html option tells TeXtidote to produce a report in HTML format; the > symbol indicates that the output should be saved to a file, whose name is report.html. TeXtidote will run for some time, and print:

TeXtidote v0.8 - A linter for LaTeX documents
(C) 2018-2019 Sylvain Hallé - All rights reserved

Found 23 warnings(s)
Total analysis time: 2 second(s)

Once the process is over, switch to your favorite web browser, and open the file report.html (using the File/Open menu).

The page shows your original LaTeX source file, where some portions have been highlighted in various colors. These correspond to regions in the file where an issue was found. You can hover your mouse over these colored regions; a tooltip will show a message that describes the problem.

If you don't write any filename (or write -- as the filename), TeXtidote will attempt to read the document from the standard input.

Plain report

To run TeXtidote and display the results directly in the console, simply omit the --output html option (you can also use --output plain), and do not redirect the output to a file:

java -jar textidote.jar example.tex

TeXtidote will analyze the file like before, but produce a report that looks like this:

* L25C1-L25C25 A section title should start with a capital letter. [sh:001]
  \section{a first section}
  ^^^^^^^^^^^^^^^^^^^^^^^^^
* L38C1-L38C29 A section title should not end with a punctuation symbol.
  [sh:002]
  \subsection{ My subsection. }
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* L15C94-L15C99 Add a space before citation or reference. [sh:c:001]
   things, like a citation\cite{my:paper} .The text

Each element of the list corresponds to a "warning", indicating that something in the text requires your attention. For each warning, the position in the original source file is given: LxxCyy indicates line xx, column yy. The warning is followed by a short comment describing the issue, and an excerpt from the line in question is displayed. The range of characters where the problem occurs is marked by the "^^^^" symbols below the text. Each of these warnings results from the evaluation of some "rule" on the text; an identifier of the rule in question is also shown between brackets.

Single line report

Another option to display the results directly in the console is the single line report:

java -jar textidote.jar --output singleline example.tex

TeXtidote will analyze the file like before, but this time the report looks like this:

example.tex(L25C1-L25C25): A section title should start with a capital letter. "\section{a first section}"
example.tex(L38C1-L38C29): A section title should not end with a punctuation symbol. "\subsection{ My subsection. }"
example.tex(L15C94-L15C99): Add a space before citation or reference. "things, like a citation\cite{my:paper} .The text"

Each line corresponds to a warning and is easy to parse with regular expressions, e.g., for further processing in another tool. The file is given at the beginning of the line, followed by the position in parentheses. Then, the warning message is given, and the excerpt causing the warning is printed in double quotes (""). Note that sometimes a position cannot be determined. In this case, ? is printed instead of LxxCyy.
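As an illustration of such further processing, here is a minimal Python sketch that parses one line of the single-line report. The names ReportEntry, parse_point and parse_entry are hypothetical and not part of TeXtidote. The sketch assumes that the message itself contains no double quote, and that an undetermined position shows up as a ? in place of either endpoint (or of the whole range); adjust the pattern if your output differs.

```python
import re
from typing import NamedTuple, Optional, Tuple

# (line, column), or None when the position was printed as "?"
Position = Optional[Tuple[int, int]]

# Expected shape: file(position): message "excerpt"
ENTRY_RE = re.compile(
    r'^(?P<file>.+?)\((?P<pos>[^()]*)\): (?P<message>[^"]*?) "(?P<excerpt>.*)"$'
)
POINT_RE = re.compile(r"^L(\d+)C(\d+)$")


class ReportEntry(NamedTuple):
    file: str
    start: Position
    end: Position
    message: str
    excerpt: str


def parse_point(text: str) -> Position:
    """Turn 'L25C1' into (25, 1); anything else (such as '?') becomes None."""
    match = POINT_RE.match(text)
    return (int(match.group(1)), int(match.group(2))) if match else None


def parse_entry(line: str) -> Optional[ReportEntry]:
    """Parse one line of a single-line report; return None if it does not match."""
    match = ENTRY_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    start_text, _, end_text = match.group("pos").partition("-")
    return ReportEntry(
        file=match.group("file"),
        start=parse_point(start_text),
        end=parse_point(end_text),
        message=match.group("message"),
        excerpt=match.group("excerpt"),
    )


sample = r'example.tex(L25C1-L25C25): A section title should start with a capital letter. "\section{a first section}"'
entry = parse_entry(sample)
print(entry.file, entry.start, entry.end, entry.message)
```

Returning None for unmatched lines lets a calling script skip the banner or any other text that is not a warning.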

You can disable the use of color in any form of command-line output using the --no-color switch.

Spelling, grammar and style

You can perform further checks on spelling and grammar, by passing the --check option at the command line. For example, to check text in English, you run:

java -jar textidote.jar --check en example.tex

The --check parameter must be accompanied by a two-letter code indicating the language to be used. Language Tool is a powerful library that can verify spelling, grammar, and even provide suggestions regarding style. TeXtidote simply passes a cleaned-up version of the LaTeX file to Language Tool, retrieves the messages it generates, and converts the line and column numbers associated with each message back into line/column numbers of the original source file. For more information about the kind of verifications made by Language Tool, please refer to its website.

Additionally, the --firstlang lang option can be used to make Language Tool check for false friends in your first language. For example, to check a text in English when your first language is German, you may run:

java -jar textidote.jar --check en --firstlang de example.tex

The language codes you can use are:

de: (Germany) German, and the variants de_AT (Austrian) and de_CH (Swiss)
en: (US) English, and the variants en_CA (Canadian) and en_UK (British)
es: Spanish
fr: French
nl: Dutch
pt: Portuguese
pl: Polish

Using a dictionary

If you have a list of words that you want TeXtidote to ignore when checking spelling, you can use the --dict parameter to specify the location of a text file:

java -jar textidote.jar --check en --dict dico.txt example.tex

The file dico.txt must be a plain text file containing a list of words to be ignored, with each word on a separate line. (The list is case sensitive.)

If you already spell checked your file using Aspell and saved a local dictionary (as is done for example by the PaperShell environment), TeXtidote can automatically load this dictionary when invoked. More specifically, it will look for a file called .aspell.XX.pws in the folder where TeXtidote is started (this is the filename Aspell gives to local dictionaries). The characters XX are to be replaced with the two-letter language code. If such a file exists, TeXtidote will load it and mention it at the console:

Found local Aspell dictionary
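If you want to know in advance whether such a dictionary would be picked up, a small script can look for the file following the naming convention described above. This is only a sketch: the function name find_local_aspell_dictionary is hypothetical, and the check merely mirrors the description in this section rather than TeXtidote's own code.

```python
from pathlib import Path
from typing import Optional


def find_local_aspell_dictionary(folder: str, lang: str) -> Optional[Path]:
    """Return the path of .aspell.XX.pws in `folder`, or None if it is absent.

    `lang` is the two-letter language code (the XX in the filename).
    """
    candidate = Path(folder) / f".aspell.{lang}.pws"
    return candidate if candidate.is_file() else None


# Look in the current folder, i.e. where TeXtidote would be started
print(find_local_aspell_dictionary(".", "en"))
```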

Analyzing a file from the clipboard

If you omit a filename, TeXtidote analyzes the standard input. You can use this to copy a piece of code to the clipboard and analyze it without the need to save it as a file first. The procedure depends on your operating system.

In Linux, you can copy text and pipe it to TeXtidote using xclip: xclip -o | textidote [options...]
In Windows PowerShell, the same can be done with Get-Clipboard: Get-Clipboard | textidote [options...]
In macOS, the same can be done with pbpaste: pbpaste | textidote [options...]

Ignoring rules

You may want to ignore some of TeXtidote's advice. You can do so by specifying rule IDs to ignore with the --ignore command line parameter. For example, the ID of the rule "A section title should start with a capital letter" is sh:001 (rule IDs are shown between brackets in the reports given by TeXtidote); to ignore warnings triggered by this rule, you call TeXtidote as follows:

java -jar textidote.jar --ignore sh:001 myfile.tex

If you want to ignore multiple rules, separate their IDs with a comma (but no space).
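When TeXtidote is called from a script, the options seen so far can be assembled programmatically. The following Python sketch builds an argument list that could be handed to subprocess.run(); the function build_textidote_command and its parameter names are hypothetical, and only the command-line options documented above are used.

```python
from typing import List, Optional, Sequence


def build_textidote_command(
    tex_file: str,
    jar: str = "textidote.jar",
    check: Optional[str] = None,
    dictionary: Optional[str] = None,
    ignore: Sequence[str] = (),
) -> List[str]:
    """Build an argument list suitable for subprocess.run()."""
    command = ["java", "-jar", jar]
    if check:
        command += ["--check", check]
    if dictionary:
        command += ["--dict", dictionary]
    if ignore:
        # Rule IDs are joined with commas and no spaces
        command += ["--ignore", ",".join(ignore)]
    command.append(tex_file)
    return command


print(" ".join(build_textidote_command("myfile.tex", ignore=["sh:001", "sh:002"])))
# prints: java -jar textidote.jar --ignore sh:001,sh:002 myfile.tex
```

Passing the arguments as a list (rather than one string) avoids shell quoting problems with filenames.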

Ignoring environments

TeXtidote can be instructed to remove user-specified environments using the --remove command line parameter. For example:

$ java -jar textidote.jar --remove itemize myfile.tex

This command will remove all text lines between \begin{itemize} and \end{itemize} before further processing the file.

Ignoring macros

The same can be done with macros:

$ java -jar textidote.jar --remove-macros foo myfile.tex

This command will remove all occurrences of the user-defined command \foo in the text. Alternate syntaxes like \foo{bar} and \foo[x=y]{bar} are also recognized and deleted.

Replacing strings

Before TeXtidote analyses a file, you can ask it to apply a set of find/replace operations (for example, to replace a macro by some predefined character string). You can write these patterns into a text file and pass them to the program at the command line:

$ java -jar textidote.jar --replace replacements.txt myfile.tex

Here, replacements.txt is the file that contains the find/replace patterns, formatted as follows:

# Empty lines and lines beginning with a pound sign are ignored
# Search and replace patterns are separated by a tab
find	replace
foo		bar
# Patterns can also be regular expressions
abc\d+[^x]	123

Replacement rules are applied line-wise, so multiline replacements are not possible.
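To preview what a set of patterns would do, the file format can be imitated in a few lines of Python. This is a rough sketch under several assumptions, not TeXtidote's implementation: a run of tabs is treated as a single separator, the replacement is inserted literally, and Python's regular expression syntax stands in for the one used by TeXtidote (the two may differ in details). The function names load_rules and apply_rules are hypothetical.

```python
import re
from typing import List, Tuple

Rules = List[Tuple[re.Pattern, str]]


def load_rules(text: str) -> Rules:
    """Read tab-separated find/replace patterns, skipping blanks and # comments."""
    rules: Rules = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        parts = re.split(r"\t+", line, maxsplit=1)  # assumption: tabs may be repeated
        if len(parts) != 2:
            continue  # no tab on the line: not a rule in this sketch
        rules.append((re.compile(parts[0]), parts[1]))
    return rules


def apply_rules(document: str, rules: Rules) -> str:
    """Apply every rule to each line separately, so matches never span lines."""
    cleaned = []
    for line in document.splitlines():
        for pattern, replacement in rules:
            # A function is used so that the replacement is taken literally
            line = pattern.sub(lambda _match, r=replacement: r, line)
        cleaned.append(line)
    return "\n".join(cleaned)


rules = load_rules("# a comment\nfoo\t\tbar\nabc\\d+[^x]\t123\n")
print(apply_rules("foo and abc42y", rules))
# prints: bar and 123
```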

Reading a sub-file

By default, TeXtidote ignores everything before the \begin{document} command. If you have a large document that consists of multiple included LaTeX "sub-files", and you want to check one such file that does not contain a \begin{document}, you must tell TeXtidote to read the whole file using the --read-all command line option. Otherwise, TeXtidote will ignore the whole file and give you no advice.

TeXtidote also automatically follows sub-files that are embedded from a main document using \input{filename} and \include{filename} (braces are mandatory). Any such non-commented instruction will add the corresponding filename to the running queue. If you want to exclude an \input from being processed, you must surround the line with ignore begin/end comments (see below, Helping TeXtidote).

Removing markup

You can also use TeXtidote just to remove the markup from your original LaTeX file. This is done with the option --clean:

java -jar textidote.jar --clean example.tex

By default, the resulting "clean" file is printed directly at the console. To save it to a file, use a redirection:

java -jar textidote.jar --clean example.tex > clean.txt

You will see that TeXtidote performs a very aggressive deletion of LaTeX markup:

All figure, table and tabular environments are removed
All equations are removed
All inline math expressions ( $...$ ) are replaced by "X"
All \cite commands are replaced by "0"
All \ref commands are replaced by "[0]"
Commands that alter text (\textbf, \emph, \uline, \footnote) are removed (but the text is kept)
Virtually all other commands are simply deleted

Surprisingly, the result of applying these modifications is a text that is clean and legible enough for a spelling or grammar checker to provide sensible advice.
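To get a feel for these substitutions, here is a deliberately simplified Python imitation of four of the rules listed above. It is a toy, not TeXtidote's cleaner: it works on a single line, ignores nested braces and optional arguments, and the function name toy_clean is hypothetical. Use the --clean option to see what TeXtidote really produces.

```python
import re


def toy_clean(line: str) -> str:
    """Apply a rough approximation of some of the markup removal rules."""
    line = re.sub(r"\$[^$]*\$", "X", line)          # inline math becomes X
    line = re.sub(r"\\cite\{[^}]*\}", "0", line)    # \cite{...} becomes 0
    line = re.sub(r"\\ref\{[^}]*\}", "[0]", line)   # \ref{...} becomes [0]
    # Text-altering commands are dropped, but their argument is kept
    line = re.sub(r"\\(?:textbf|emph|uline|footnote)\{([^{}]*)\}", r"\1", line)
    return line


print(toy_clean(r"See \cite{a} and Figure \ref{b}: $x^2$ is \textbf{big}."))
# prints: See 0 and Figure [0]: X is big.
```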

As was mentioned earlier, TeXtidote keeps a mapping between character ranges in the "cleaned" file, and the same character ranges in the original LaTeX document. You can get this mapping by using the --map option:

java -jar textidote.jar --clean --map map.txt example.tex > clean.txt

The --map parameter is given the name of a file. TeXtidote will put in this file the list of correspondences between character ranges. This file is made of lines that look like this:

L1C1-L1C24=L1C5-L1C28
L1C26-L1C28=L1C29-L1C31
L2C1-L2C10=L3C1-L3C10
...

The first entry indicates that characters 1 to 24 in the first line of the clean file correspond to characters 5 to 28 in the first line of the original LaTeX file, and so on. This mapping can have "holes": for example, character 25 of line 1 does not correspond to anything in the original file (this happens when the "cleaner" inserts new characters, or replaces characters from the original file by something else). Conversely, it is also possible that characters in the original file do not correspond to anything in the clean file (this happens when the cleaner deletes characters from the original).
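A script can use such a file to translate a position in the clean text back to the original. The Python sketch below shows one way to do it, under two assumptions that are not stated in this document: each range stays on a single line, and characters inside a range correspond one-to-one by column offset. The names load_map and to_original are hypothetical.

```python
import re
from typing import List, Optional, Tuple

RANGE_RE = re.compile(
    r"^L(\d+)C(\d+)-L(\d+)C(\d+)=L(\d+)C(\d+)-L(\d+)C(\d+)$"
)

# (clean_line, clean_col_start, clean_col_end, orig_line, orig_col_start)
Entry = Tuple[int, int, int, int, int]


def load_map(text: str) -> List[Entry]:
    """Read the lines of a map file, keeping single-line ranges only."""
    entries: List[Entry] = []
    for line in text.splitlines():
        match = RANGE_RE.match(line.strip())
        if match is None:
            continue  # skip anything that is not a pair of ranges
        cl1, cc1, cl2, cc2, ol1, oc1, ol2, _oc2 = map(int, match.groups())
        if cl1 == cl2 and ol1 == ol2:
            entries.append((cl1, cc1, cc2, ol1, oc1))
    return entries


def to_original(entries: List[Entry], line: int, col: int) -> Optional[Tuple[int, int]]:
    """Map a (line, column) of the clean text to the original, or None for a hole."""
    for clean_line, col_start, col_end, orig_line, orig_start in entries:
        if clean_line == line and col_start <= col <= col_end:
            return (orig_line, orig_start + (col - col_start))
    return None


mapping = load_map("L1C1-L1C24=L1C5-L1C28\nL1C26-L1C28=L1C29-L1C31\nL2C1-L2C10=L3C1-L3C10\n")
print(to_original(mapping, 1, 10))  # prints: (1, 14)
print(to_original(mapping, 1, 25))  # prints: None (a hole in the mapping)
```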

Character encodings

TeXtidote uses the OS default encoding when reading files (e.g. utf-8 in Linux, cp1252 in Windows). You can override this setting using the --encoding command line option:

java -jar textidote.jar --encoding cp1252 example.tex

Using a configuration file

If you need to run TeXtidote with many command line arguments (for example: you load a local dictionary, ignore a few rules, apply replacements, etc.), it may become tedious to invoke the program with a long list of arguments every time. TeXtidote can be "configured" by putting those arguments in a text file called .textidote in the directory from which it is called. Here is an example of what such a file could contain:

--output html --read-all
--replace replacements.txt
--dict mydict.txt
--ignore sh:001,sh:d:001
--check en mytext.tex

As you can see, arguments can be split across multiple lines. You can then call TeXtidote without any arguments like this:

textidote > report.html

If you call TeXtidote with command line arguments, they will be merged with whatever was found in .textidote. You can also tell TeXtidote to explicitly ignore that file and only take into account the command line arguments using the --no-config argument.
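To check what a .textidote file amounts to, you can list the arguments it contains. The Python sketch below simply splits the file on whitespace, which is an assumption: this document does not say how TeXtidote tokenizes the file (for instance, whether quoted paths with spaces are supported) or how exactly the merge with command-line arguments is done. The function name read_config_arguments is hypothetical.

```python
from pathlib import Path
from typing import List


def read_config_arguments(folder: str = ".") -> List[str]:
    """Return the arguments found in `folder`/.textidote, or [] if there is none."""
    config = Path(folder) / ".textidote"
    if not config.is_file():
        return []
    # Line breaks count as ordinary whitespace, so arguments may span lines
    return config.read_text(encoding="utf-8").split()


print(read_config_arguments("."))
```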

Markdown input

TeXtidote also supports files in the Markdown format. The only difference is that rules specific to LaTeX (references to figures, citations) are not evaluated.

Simply call TeXtidote with a Markdown input file instead of a LaTeX file. The format is auto-detected by looking at the file extension. However, if you pass a file through the standard input, you must tell TeXtidote that the input file is Markdown by using the command line parameter --type md. Otherwise, TeXtidote assumes by default that the input file is LaTeX.

Helping TeXtidote

In order to get the best results when using TeXtidote, it is advisable that you follow a few formatting conventions when writing your LaTeX file:

Avoid putting multiple \begin{environment} and/or \end{environment} on the same line
Keep the arguments of a command on a single line. Commands (such as \title{}) that have their opening and closing braces on different lines are not recognized by TeXtidote and will result in garbled output and nonsensical warnings.
Do not hard-wrap your paragraphs. It is easier for TeXtidote to detect paragraphs if they have no hard carriage returns inside. (If you need word wrapping, it is preferable to enable it in your text editor.)
Put headings like \section or \paragraph alone on their line and separate them from the text below by a blank line.

As a rule, it is advisable to first see what your text looks like using the --clean option, to make sure that TeXtidote is performing checks on something that makes sense.

If you realize that a portion of LaTeX markup is not handled properly and messes up the rest of the file, you can tell TeXtidote to ignore a region using a special LaTeX comment:

% textidote: ignore begin
Some weird LaTeX markup that TeXtidote does not
understand...
% textidote: ignore end

The lines between textidote: ignore begin and textidote: ignore end will be handled by TeXtidote as if they were comment lines.
When you are using Markdown, you can also selectively ignore parts of the document:

<!-- textidote: ignore begin -->
This should be ignored
<!-- textidote: ignore end -->

Linux shortcuts

To make using TeXtidote easier, you can create shortcuts on your system. Here are a few recommended tips.

First, we recommend you create a folder called /opt/textidote and put the big textidote.jar file there (this requires root privileges). This step is already taken care of if you installed the TeXtidote package using apt-get.

Command line shortcut

(This step is not necessary if TeXtidote has been installed with apt-get.) In /usr/local/bin, create a file called textidote with the following contents:

#! /bin/bash
java -jar /opt/textidote/textidote.jar "$@"

Make this file executable by typing at the command line:

sudo chmod +x /usr/local/bin/textidote

(These two operations also require root privileges.) From then on, you can invoke TeXtidote on the command line from any folder by simply typing textidote, e.g.:

textidote somefile.tex

Desktop shortcut

If you use a desktop environment such as Gnome or Xfce, you can automate this even further by creating a TeXtidote icon on your desktop. First, create a file called /opt/textidote/textidote-desktop.sh with the following contents, and make this file executable:

#!/bin/bash
if [ -x /usr/bin/notify-send ]; then
  err() { notify-send -a TeXtidote -i /opt/textidote/textidote-icon.svg "$*"; }
else
  err() { printf "%s\n" "$*" >&2; }
fi

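if [ $# -lt 1 ]; then
  err "At least one file should be provided as input"
  exit 1
fi
dir=$(dirname "$1")

# Group the error branch in braces: "a || b && exit" would also exit when pushd succeeds
pushd "$dir" || { err "$dir does not exist"; exit 1; }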
java -jar /opt/textidote/textidote.jar --check en --output html "$@" > /tmp/textidote.html
popd || exit

xdg-open /tmp/textidote.html &

This script enters into the directory of the file passed as an argument, calls TeXtidote, sends the HTML report to a temporary file, and opens the default web browser to show that report.

Then, on your desktop (typically in your ~/Desktop folder), create another file called TeXtidote.desktop with the following contents:

[Desktop Entry]
Version=1.0
Type=Application
Name=TeXtidote
Comment=Check text with TeXtidote
Exec=/opt/textidote/textidote-desktop.sh %F
Icon=/opt/textidote/textidote-icon.svg
Path=
Terminal=false
StartupNotify=false

This will create a new desktop shortcut; make this file executable. From then on, you can drag LaTeX files from your file manager with your mouse and drop them on the TeXtidote icon. After the analysis, the report will automatically pop up in your web browser. Voilà!

Tab completions

You can auto-complete the commands you type at the command-line using the TAB key (as you are probably used to). If you installed TeXtidote using apt-get, auto-completion for Bash comes built-in. You can also enable auto-completion for other shells as follows.

Zsh

Users of Zsh can also enable auto-completion; in your ~/.zshrc file, add the line

source /opt/textidote/textidote.zsh

(Create the file if it does not exist.) You must then restart your Zsh shell for the changes to take effect.

Visual Studio Code integration

Users of Visual Studio Code can integrate TeXtidote by calling it with the --output singleline and --no-color options and parsing its results. Moreover, user cphyc also wrote a nice build task.

Emacs integration

Emacs users can benefit from TeXtidote through flycheck.
A dedicated flycheck-checker can be defined as in the following init.el/.emacs snippet. Replace ~/PATH/TO/textidote.jar with the correct path for your system.

(flycheck-define-checker tex-textidote
  "A LaTeX grammar/spelling checker using textidote.

  See https://github.com/sylvainhalle/textidote"
  :modes (latex-mode plain-tex-mode)
  :command ("java" "-jar" (eval (expand-file-name "~/PATH/TO/textidote.jar"))
            "--read-all"
            "--output" "singleline"
            "--no-color"
            "--check"   (eval (if ispell-current-dictionary (substring ispell-current-dictionary 0 2) "en"))
            ;; Try to honor local aspell dictionary and replacements if they exist
            "--dict"    (eval (expand-file-name "~/.aspell.en.pws"))
            "--replace" (eval (expand-file-name "~/.aspell.en.prepl"))
            ;; Using source ensures that a single temporary file in a different dir is created
            ;; such that textidote won't process other files. This serves as a hacky workaround for
            ;; https://github.com/sylvainhalle/textidote/issues/200.
            source)
  :error-patterns ((warning line-start (file-name)
                            "(L" line "C" column "-" (or (seq "L" end-line "C" end-column) "?") "): "
                            (message (one-or-more (not "\""))) (one-or-more not-newline) line-end)))
(add-to-list 'flycheck-checkers 'tex-textidote)

Rules checked by TeXtidote

Here is a list of the rules that are checked on your LaTeX file by TeXtidote. Each rule has a unique identifier, written between square brackets.

Language Tool

In addition to all the rules below, the --check xx option activates all the rules verified by Language Tool (more than 2,000 grammar and spelling errors). Note that the verification time is considerably longer when using that option.

If the --check option is used, you can add the --languagemodel xx option to find errors using n-gram data. In order to do so, xx must be a path pointing to an n-gram-index directory. Please refer to the LanguageTool page (link above) on how to use n-grams and what this directory should contain.

Style

A section title should start with a capital letter. [sh:001]
A section title should not end with a punctuation symbol. [sh:002]
A section title should not be written in all caps. The LaTeX stylesheet takes care of rendering titles in caps if needed. [sh:003]
Use a capital letter when referring to a specific section, chapter or table: 'Section X'. [sh:secmag, sh:chamag, sh:tabmag]
A (figure, table) caption should end with a period. [sh:capperiod]

Citations and references

There should be one space before a \cite or \ref command [sh:c:001], and no space after [sh:c:002].
Do not use 'in [X]' or 'from [X]': the syntax of a sentence should not be changed by the removal of a citation. [sh:c:noin]
Do not mix \cite and \citep or \citet in the same document. [sh:c:mix]
When citing more than one reference, do not use multiple \cite commands; put all references in the same \cite. [sh:c:mul, sh:c:mulp]

Figures

Every figure should have a label, and every figure should be referenced at least once in the text. [sh:figref]
Use a capital letter when referring to a specific figure: 'Figure X'. [sh:figmag]

Structure

A section should not contain a single sub-section. More generally, a division of level n should not contain a single division of level n+1. [sh:nsubdiv]
The first heading of a document should be the one with the highest level. For example, if a document contains sections, the first section cannot be preceded by a sub-section. [sh:secorder]
There should not be a jump down between two non-successive section levels (e.g. a \section followed by a \subsubsection without a \subsection in between). [sh:secskip]
You should avoid stacked headings, i.e. consecutive headings without text in between. [sh:stacked]

Hard-coding

Figures should not refer to hard-coded local paths. [sh:relpath]
Do not refer to sections, figures and tables using a hard-coded number. Use \ref instead. [sh:hcfig, sh:hctab, sh:hcsec, sh:hccha]
You should not break lines manually in a paragraph with \\. Either start a new paragraph or stay in the current one. [sh:nobreak]
If you are writing a research paper, do not hard-code page breaks with \newpage. [sh:nonp]

LaTeX subtleties

Use a backslash or a comma after the last period in "i.e.", "e.g." and "et al."; otherwise LaTeX will think it is a full stop ending a sentence. [sh:010, sh:011]
There should not be a space before a semicolon or a colon. If in your language, typographic rules require a space here, LaTeX takes care of inserting it without your intervention. [sh:d:005, sh:d:006]

Potentially suspicious

There should be at least N words between two section headings (currently N=100). [sh:seclen]

Hints for using CI (Continuous Integration)

TeXtidote returns the number of warnings as its exit code. GitHub and GitLab expect an exit code of 0; otherwise, the pipeline fails. To work around this, you have to tell the CI system to tolerate a non-zero exit code for the TeXtidote step.

GitHub

continue-on-error: true

See the GitHub documentation for more details.

GitLab

allow_failure: true

See the GitLab documentation for more details.
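Since the exit code is the number of warnings, another possibility is to wrap the call in a small script that fails the pipeline only above a chosen threshold. The following Python sketch illustrates the idea; the function run_with_warning_budget and its max_warnings parameter are hypothetical, and a stand-in process is used in place of TeXtidote so that the example runs anywhere. Keep in mind that exit codes are limited to the range 0-255 on POSIX systems, so very large counts may not be reported faithfully.

```python
import subprocess
import sys
from typing import Sequence


def run_with_warning_budget(command: Sequence[str], max_warnings: int = 0) -> bool:
    """Run `command` and return True if its exit code is within the budget.

    The exit code is interpreted as a number of warnings; a negative code
    (process killed by a signal) is always treated as a failure.
    """
    result = subprocess.run(command, check=False)
    return 0 <= result.returncode <= max_warnings


# Stand-in for TeXtidote: a process that exits with code 3 ("3 warnings")
fake_linter = [sys.executable, "-c", "import sys; sys.exit(3)"]
print(run_with_warning_budget(fake_linter, max_warnings=5))  # prints: True
print(run_with_warning_budget(fake_linter, max_warnings=2))  # prints: False
```

In a real pipeline, the command would be something like ["java", "-jar", "textidote.jar", "example.tex"], and the script would call sys.exit(1) when the function returns False.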

Building TeXtidote

First make sure you have the following installed:

The Java Development Kit (JDK) to compile. TeXtidote requires version 8 of the JDK (and probably works with later versions).
Ant to automate the compilation and build process

Download the sources for TeXtidote from GitHub or clone the repository using Git:

git clone git@github.com:sylvainhalle/textidote.git

Compiling

First, download the dependencies by typing:

ant download-deps

Then, compile the sources by simply typing:

ant

This will produce a file called textidote.jar in the folder. This file is runnable and stand-alone, or can be used as a library, so it can be moved around to the location of your choice.

In addition, the script generates the Javadoc documentation for TeXtidote in the docs/doc folder.

Testing

TeXtidote can test itself by running:

ant test

Unit tests are run with JUnit; a detailed report of these tests in HTML format is available in the folder tests/junit, which is automatically created. Code coverage is also computed with JaCoCo; a detailed report is available in the folder tests/coverage.

About the author

TeXtidote was written by Sylvain Hallé, Full Professor in the Department of Computer Science and Mathematics at Université du Québec à Chicoutimi, Canada.

Like TeXtidote?

TeXtidote is free software licensed under the GNU General Public License 3. It is released as postcardware: if you use and like the software, please tell the author by sending a postcard of your town to the following address:

Sylvain Hallé
Department of Computer Science and Mathematics
Université du Québec à Chicoutimi
555, boulevard de l'Université
Chicoutimi, QC
G7H 2B1 Canada

If you like TeXtidote, you might also want to look at PaperShell, a template environment for writing scientific papers in LaTeX.

Why is it called TeXtidote?

TeXtidote is a play on Antidote, which is a spelling/grammar checker well known to French-speaking users and works with word processors. So TeXtidote is like a version of Antidote for TeX.
