An extremely lightweight pandoc wrapper for Python 3.8+.
Features:
- Supports conversion between all formats that pandoc supports: markdown, HTML, LaTeX, Word, epub, PDF (output only), and more.
- Output to raw bytes (binary formats, e.g. PDF), to str objects (text formats, e.g. markdown), or to a file (any format).
- pandoc errors are raised as (informative) exceptions.
- Full flexibility of the pandoc command-line tool, with the same syntax. (See the pandoc manual for more information.)
First, ensure pandoc is on your PATH. (In other words, install pandoc and add it to your PATH.)
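If you are unsure whether pandoc is reachable, a quick check from Python is possible with the standard library's `shutil.which`, which searches the directories in `PATH`. This is a minimal sketch; the helper name `pandoc_on_path` is hypothetical and is not part of pandadoc.

```python
import shutil
from typing import Optional


def pandoc_on_path(executable: str = "pandoc") -> Optional[str]:
    """Return the full path to the executable, or None if it is not on PATH.

    Hypothetical helper (not part of pandadoc): shutil.which looks the
    name up in the directories listed in the PATH environment variable.
    """
    return shutil.which(executable)
```

For example, `pandoc_on_path()` returns something like the path to your pandoc binary if it is installed correctly, and `None` otherwise.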
Then install pandadoc from PyPI:
$ python -m pip install pandadoc
That's it.
Convert a webpage (or file) to markdown, and store it as a Python str:
>>> import pandadoc
>>> input_file = "https://example.com/"
>>> # Or: input_file = "path/to/my/file.html"
>>> example_md = pandadoc.call_pandoc(
... options=["-t", "markdown"], files=[input_file]
... )
>>> print(example_md)
<div>
# Example Domain
This domain is for use in illustrative examples in documents.
...
Now convert the markdown to RTF, and write it to a file:
>>> rtf_output_file = "example.rtf"
>>> pandadoc.call_pandoc(
... options=["-f", "markdown", "-t", "rtf", "-o", rtf_output_file],
... input_text=example_md,
... )
''
Notice that call_pandoc returns an empty string ('') when the output is written to a file.
Looking at the output file:
{\pard \ql \f0 \sa180 \li0 \fi0 \outlinelevel0 \b \fs36 Example Domain\par} {\pard \ql \f0 \sa180 \li0 \fi0 This domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.\par} {\pard \ql \f0 \sa180 \li0 \fi0 {\field{\*\fldinst{HYPERLINK "https://www.iana.org/domains/example"}}{\fldrslt{\ul More information... }}} \par}
Convert this RTF document to PDF, using xelatex with a custom font, and store the result as raw bytes:
>>> raw_pdf = pandadoc.call_pandoc(
... options=["-f", "rtf", "-t", "pdf", "--pdf-engine", "xelatex", "-V", "mainfont=Palatino"],
... files=[rtf_output_file],
... decode=False,
... )
(Note that PDF conversion requires a "PDF engine" to be installed, e.g. pdflatex, xelatex or latexmk; this example uses xelatex.)
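If you don't know in advance which PDF engine a machine has, one option is to probe `PATH` for a few candidates and pass the first one found to `--pdf-engine`. The sketch below assumes a short, non-exhaustive candidate list of LaTeX engines; the names `CANDIDATE_ENGINES` and `first_available_engine` are hypothetical and not part of pandadoc. xelatex and lualatex are listed first because font variables such as `mainfont` only take effect with those engines, not with pdflatex.

```python
import shutil
from typing import Iterable, Optional

# Illustrative, non-exhaustive subset of engines accepted by pandoc's
# --pdf-engine option; order expresses preference.
CANDIDATE_ENGINES = ("xelatex", "lualatex", "pdflatex", "latexmk")


def first_available_engine(
    candidates: Iterable[str] = CANDIDATE_ENGINES,
) -> Optional[str]:
    """Return the first candidate engine found on PATH, or None."""
    for name in candidates:
        if shutil.which(name) is not None:
            return name
    return None
```

The result can then be spliced into the options list, e.g. `["-f", "rtf", "-t", "pdf", "--pdf-engine", engine]`, after checking that `engine` is not `None`.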
Now you can send those raw bytes over a network, or write them to a file:
>>> with open("example.pdf", "wb") as f:
... f.write(raw_pdf)
...
>>> # Finished
You can find more pandoc examples here.
If pandoc exits with an error, an appropriate exception is raised (based on the exit code):
>>> pandadoc.call_pandoc(
... options=["-f", "markdown", "-t", "zzz"], # non-existent format
... input_text=example_md,
... )
Traceback (most recent call last):
...
pandadoc.exceptions.PandocUnknownWriterError: Unknown output format zzz
>>> isinstance(pandadoc.exceptions.PandocUnknownWriterError(), pandadoc.PandocError)
True
You can find a full list of exceptions in the pandadoc.exceptions module.
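Dispatching on the exit code can be done with a lookup table from exit codes to exception classes. The sketch below defines its own stand-in classes rather than importing pandadoc's, and its constructor signature is our own choice. Only two codes from the "Exit codes" section of the pandoc manual are mapped (21 for an unknown reader, 22 for an unknown writer); any other non-zero code falls back to the base class. The names `EXIT_CODE_TO_ERROR` and `raise_for_exit_code` are hypothetical.

```python
from typing import Dict, Type


class PandocError(Exception):
    """Stand-in base class for pandoc failures (not pandadoc's class)."""

    def __init__(self, exit_code: int, message: str = "") -> None:
        super().__init__(message)
        self.exit_code = exit_code


class PandocUnknownReaderError(PandocError):
    """Unknown input format (pandoc exit code 21)."""


class PandocUnknownWriterError(PandocError):
    """Unknown output format (pandoc exit code 22)."""


# Small subset of pandoc's documented exit codes.
EXIT_CODE_TO_ERROR: Dict[int, Type[PandocError]] = {
    21: PandocUnknownReaderError,
    22: PandocUnknownWriterError,
}


def raise_for_exit_code(exit_code: int, stderr: str) -> None:
    """Do nothing on success; otherwise raise the matching exception."""
    if exit_code == 0:
        return
    error_class = EXIT_CODE_TO_ERROR.get(exit_code, PandocError)
    raise error_class(exit_code, stderr.strip())
```

Because every specific class inherits from the base, callers can catch the base class to handle all pandoc failures at once, mirroring the `isinstance` check in the example above.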
The pandoc command-line tool works like this:
pandoc [OPTIONS] [FILES]
In addition to the OPTIONS (documented here), you can provide either some FILES, or some input text (via stdin).
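When debugging, it can help to see the exact command line that corresponds to a list of options and files. The standard library's `shlex.join` (Python 3.8+) produces a shell-quoted string for this. The helper `equivalent_command` is hypothetical and not part of pandadoc: it only builds a string and never runs pandoc, and it uses POSIX shell quoting, so the result may not be valid for Windows `cmd.exe`. Input text would be supplied on stdin rather than appearing on the command line.

```python
import shlex
from typing import Sequence


def equivalent_command(options: Sequence[str], files: Sequence[str] = ()) -> str:
    """Render `pandoc [OPTIONS] [FILES]` as a shell-quoted string.

    Hypothetical debugging helper: arguments that contain spaces or
    other shell-special characters are quoted for a POSIX shell.
    """
    return shlex.join(["pandoc", *options, *files])
```

For example, `equivalent_command(["-f", "markdown", "-t", "html"], ["path/to/file.md"])` returns `'pandoc -f markdown -t html path/to/file.md'`.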
The call_pandoc function of pandadoc works in a similar way:
- The options argument contains a list of pandoc options, e.g. ["-f", "markdown", "-t", "html"].
- The files argument is a list of file paths (or absolute URIs), e.g. ["path/to/file.md", "https://www.fsf.org"].
- The input_text argument is used as text input to pandoc, e.g. "# Simple Doc\n\nA simple markdown document\n".
- The timeout argument can be used to stop pandoc if it takes too long.
- The decode argument determines whether the result is decoded to a str (True by default) or left as raw bytes.
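These arguments map naturally onto Python's `subprocess.run`. The following is a minimal sketch of a call_pandoc-style wrapper under our own assumptions, not pandadoc's actual implementation: the name `run_pandoc`, the extra `executable` parameter, UTF-8 encoding, and raising a plain `RuntimeError` on failure are all choices made here for illustration. In this sketch `timeout` is passed straight to `subprocess.run`, so it is measured in seconds and raises `subprocess.TimeoutExpired` when exceeded.

```python
import subprocess
from typing import Optional, Sequence, Union


def run_pandoc(
    options: Sequence[str] = (),
    files: Sequence[str] = (),
    input_text: Optional[str] = None,
    timeout: Optional[float] = None,
    decode: bool = True,
    executable: str = "pandoc",
) -> Union[str, bytes]:
    """Run `executable [OPTIONS] [FILES]`, feeding input_text on stdin.

    Hypothetical sketch, not pandadoc's code.
    """
    # Always send something on stdin (possibly empty) so pandoc never
    # waits on the terminal when neither files nor input_text are given.
    stdin_bytes = input_text.encode("utf-8") if input_text is not None else b""
    result = subprocess.run(
        [executable, *options, *files],
        input=stdin_bytes,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,  # seconds; raises subprocess.TimeoutExpired
    )
    if result.returncode != 0:
        # A fuller wrapper would map the exit code to a specific exception.
        message = result.stderr.decode("utf-8", errors="replace").strip()
        raise RuntimeError(f"pandoc exited with code {result.returncode}: {message}")
    return result.stdout.decode("utf-8") if decode else result.stdout
```

With pandoc installed, `run_pandoc(options=["-f", "markdown", "-t", "html"], input_text="# Simple Doc\n")` would return the HTML as a str; passing `decode=False` returns the raw bytes instead.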
Please use the GitHub issue tracker to submit bugs or request features.
Feedback is always appreciated.
Distributed under the MIT license.