oslatex is a specialised Word-to-LaTeX converter. Instead of trying to translate a Word document into a faithful LaTeX representation, oslatex preserves the text, maps Word styles to customisable LaTeX commands and environments, and intentionally ignores most other formatting. If the Word document uses styles consistently and sensibly, this process produces a relatively clean LaTeX document.
oslatex was originally developed as an ad hoc converter for the Oslo Studies in Language (OSLa), which uses LaTeX for most typesetting but also accepts Word submissions. If your use case is similar you may find oslatex useful, but you will have to adapt the configuration to your needs as the default configuration included with oslatex was made to fit the OSLa journal's LaTeX classes and Word templates.
oslatex input.docx output.tex
oslatex can only read .docx
files. It does not support the older, binary
Word format.
All configuration is currently found in lib/oslatex/oslatex.json
.
Text is always transferred verbatim (except for abbreviations, which receive
special handling --- see below). Word styles apply either to paragraphs or to
running text. Their mapping to LaTeX is configured in two separate sections of
the configuration file (paragraph_styles
and run_styles
).
A small set of formatting instructions that make sense in running text (e.g. italics, bold face, underlining, overstrike, superscripts and subscripts) can optionally be mapped to LaTeX in the same way.
To map a paragraph style called Heading1
to the LaTeX command \section
, add
the following to paragraph_styles
:
"paragraph_styles": {
"Heading1": [null, "\\section"]
}
Note that style names in Word are case sensitive.
You can also map certain font features such as italics
, bold
, underline
,
strikethrough
, smallcaps
, superscipt
and subscript
:
"paragraph_styles": {
"italics": [null, "\\emph"]
}
Colours and highlighting includes the colour value after a hyphen, e.g.
color-00000
, highlight-red
.
To map to an environment instead, use the following:
"paragraph_styles": {
"Quote": [null, "blockquote"]
}
Sometimes it is necessary to map a style to a LaTeX command without arguments. To do so, use
"paragraph_styles": {
"Example": [null, "!ex."]
}
To do the same but enclose the LaTeX command and the text inside an environment, use
"paragraph_styles": {
"ListParagraph": [null, "enumerate!item"]
}
The final LaTeX file is generated on the basis of a template
(lib/oslatex/oslatex.tex.erb
). It is possible to map paragraph styles to
variables, which can then be expanded in the template. The default
configuration and template show several examples of this, e.g. Title
:
"paragraph_styles": {
"Title": ["title", ""]
}
To ignore a style, map it to [null, ""]
:
"paragraph_styles": {
"FootnoteText": [null, ""]
}
Mappings for run styles use a simpler syntax:
"run_styles": {
"italics": "\\emph",
"FootnoteReference": "",
"FootnoteNumbering": null
}
Mapping to ""
here ignores the styles. Mapping to null
ignores style and
text contents.
-
Multiple embedded levels of font adjustments like italics or bold face sometimes confuses the parser. You will see this easily in the LaTeX code as, for example, a whole paragraph will be italicised with a single non-italicised word when the opposite is the correct.
-
oslatex will try to merge adjacent elements with identical styles to create a less noisy LaTeX output. This strategy does not always succeed leaving multiple identical LaTeX commands adjacent to each other.
-
Some abbreviations that are common in English academic writing such as e.g. are automatically converted to
e.g.\@
to get the right spacing in LaTeX. This ought to be configurable but for now you will have to change the source code if you dislike this or need support for other abbreviations.