boisgera/pandoc

Please extend Readme

The description of this project is quite terse. What exactly does it do?

  • Reimplement the pandoc data-model for documents?
  • Implement converters?
    What can this be used for, and which other projects are using it?

(I'm asking these questions since I would prefer to use pandoc converters written in Python over ones written in Haskell.)

Hi,

Agreed, much more documentation is needed. I could add a FAQ to begin with; that wouldn't take too much time, and I could start with the answers to your specific questions.

To answer your questions directly: from the user's point of view, this project lets you manage documents and document fragments in Python, with a hierarchy of classes that matches the one used by pandoc (the Haskell library), and read/write whatever formats pandoc can read/write.
The major use case: read a markdown document, perform some automated analysis or transformation in Python, then export the result to the format you need (markdown, HTML, PDF, etc.).
Examples of documents managed with this project: https://github.com/boisgera/CDIS, https://eul.ink/complex-analysis/#lectures-tutorials (I hope you like maths!); but to be honest, no external project that I know of uses this library.

Under the hood:

  • The Python class hierarchy is automatically generated from the Haskell types. So, yeah, we reimplement the exact data model for documents in Python (or rather the data models since pandoc has changed this model quite a bit over the years).

  • The only converters we implement are JSON to Python and Python to JSON. The pandoc JSON format makes it possible to deal with pandoc documents in any language, but in my opinion it is also very unwieldy (in every language...).

  • For everything else (JSON to whatever and whatever to JSON) we leverage the original pandoc program.
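
The three points above can be pictured with a small standard-library sketch. It is not the library's code: the classes `Str`, `Space` and `Para` are written by hand here for illustration (the real hierarchy is generated from the Haskell types), and the helper names `from_json`, `to_json`, `run_pandoc` and `shout` are hypothetical. Only the `{"t": tag, "c": contents}` JSON layout and the `-f`/`-t` command-line options come from pandoc itself.

```python
import json
import subprocess
from dataclasses import dataclass


# Hand-written stand-ins for three pandoc types (illustration only).
@dataclass
class Str:
    text: str


@dataclass
class Space:
    pass


@dataclass
class Para:
    inlines: list


def from_json(node):
    """Build a Python object from one pandoc JSON node ({"t": ..., "c": ...})."""
    tag = node["t"]
    if tag == "Str":
        return Str(node["c"])
    if tag == "Space":
        return Space()
    if tag == "Para":
        return Para([from_json(child) for child in node["c"]])
    raise ValueError(f"type not covered by this sketch: {tag}")


def to_json(elt):
    """Inverse of from_json: turn a Python object back into a JSON node."""
    if isinstance(elt, Str):
        return {"t": "Str", "c": elt.text}
    if isinstance(elt, Space):
        return {"t": "Space"}
    if isinstance(elt, Para):
        return {"t": "Para", "c": [to_json(child) for child in elt.inlines]}
    raise TypeError(f"type not covered by this sketch: {type(elt).__name__}")


def run_pandoc(text, source, target):
    """Delegate a format conversion to the pandoc executable (must be on PATH)."""
    result = subprocess.run(
        ["pandoc", "-f", source, "-t", target],
        input=text, capture_output=True, text=True, check=True,
    )
    return result.stdout


def shout(markdown):
    """markdown -> JSON -> Python objects -> JSON -> markdown, uppercasing text.

    Only handles documents made of plain single-line paragraphs.
    """
    document = json.loads(run_pandoc(markdown, "markdown", "json"))
    blocks = [from_json(block) for block in document["blocks"]]
    for block in blocks:
        for inline in block.inlines:
            if isinstance(inline, Str):
                inline.text = inline.text.upper()
    document["blocks"] = [to_json(block) for block in blocks]
    return run_pandoc(json.dumps(document), "json", "markdown")
```

With the pandoc executable installed, `shout("Hello world!")` should return the same paragraph in capitals. The JSON/Python conversion in the middle is the only part that lives in Python; both ends of the pipeline are handed to pandoc.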

Thanks for the explanation. As a quick, intermediate solution you could simply add the three bullet points to the Readme - this would have been enough information for me then.

The major use case: read a markdown document, perform some automated analysis or transformation in Python, then export the result to the format you need (markdown, HTML, PDF, etc.).

Can you do html to markdown too?

Sure, to the extent that pandoc (the CLI / Haskell tool that we leverage) supports that.

Consider the following README.html file:

<h1 id="pandoc-python-library">Pandoc – 🐍 Python Library</h1>
<p><img src="https://img.shields.io/pypi/pyversions/pandoc.svg" alt="Python" /> <a href="https://pypi.python.org/pypi/pandoc"><img src="https://img.shields.io/pypi/v/pandoc.svg" alt="PyPI version" /></a> <a href="https://boisgera.github.io/pandoc"><img src="https://img.shields.io/badge/doc-mkdocs-845ed7.svg" alt="Mkdocs" /></a> <a href="https://github.com/boisgera/pandoc/discussions"><img src="https://img.shields.io/badge/discuss-online-845ef7" alt="GitHub discussions" /></a> <a href="https://pepy.tech/project/pandoc"><img src="https://pepy.tech/badge/pandoc" alt="Downloads" /></a> <a href="https://github.com/boisgera/pandoc/stargazers"><img src="https://img.shields.io/github/stars/boisgera/pandoc?style=flat" alt="GitHub stars" /></a> <a href="https://github.com/boisgera/pandoc/actions/workflows/build.yml"><img src="https://github.com/boisgera/pandoc/actions/workflows/build.yml/badge.svg" alt="build" /></a></p>
<h2 id="getting-started">🚀 Getting started</h2>
<p><a href="https://pandoc.org/">Pandoc</a> – the general markup converter (and Haskell library) written by <a href="https://johnmacfarlane.net/">John MacFarlane</a> – needs to be available. You may follow the official <a href="https://pandoc.org/installing.html">installation instructions</a> or use <a href="https://docs.conda.io">conda</a>:</p>
<pre><code>$ conda install -c conda-forge pandoc</code></pre>
<p>Then, install the latest stable version of the pandoc Python library with pip:</p>
<pre><code>$ pip install --upgrade pandoc</code></pre>
<h2 id="overview">🌌 Overview</h2>
<p>This project brings <a href="https://pandoc.org/">Pandoc</a>’s data model for markdown documents to Python:</p>
<pre><code>$ echo &quot;Hello world!&quot; | python -m pandoc read 
Pandoc(Meta({}), [Para([Str(&#39;Hello&#39;), Space(), Str(&#39;world!&#39;)])])</code></pre>
<p>It can be used to analyze, create and transform documents, in Python :</p>
<pre><code>&gt;&gt;&gt; import pandoc
&gt;&gt;&gt; text = &quot;Hello world!&quot;
&gt;&gt;&gt; doc = pandoc.read(text)
&gt;&gt;&gt; doc
Pandoc(Meta({}), [Para([Str(&#39;Hello&#39;), Space(), Str(&#39;world!&#39;)])])

&gt;&gt;&gt; paragraph = doc[1][0]
&gt;&gt;&gt; paragraph
Para([Str(&#39;Hello&#39;), Space(), Str(&#39;world!&#39;)])
&gt;&gt;&gt; from pandoc.types import Str
&gt;&gt;&gt; paragraph[0][2] = Str(&#39;Python!&#39;)
&gt;&gt;&gt; text = pandoc.write(doc)
&gt;&gt;&gt; print(text)
Hello Python!</code></pre>
<p>For more information, refer to the <a href="https://boisgera.github.io/pandoc">📖 documentation</a>.</p>

In Python you can do:

>>> import pandoc
>>> doc = pandoc.read(file="README.html")
>>> doc
Pandoc(Meta({}), [Header(1, ('pandoc-python-library', [], []), [Str('Pandoc'), Space(), Str('–'), Space(), Str('🐍'), Space(), Str('Python'), Space(), Str('Library')]), Para([Image(('', [], []), [Str('Python')], ('https://img.shields.io/pypi/pyversions/pandoc.svg', '')), Space(), Link(('', [], []), [Image(('', [], []), [Str('PyPI'), Space(), Str('version')], ('https://img.shields.io/pypi/v/pandoc.svg', ''))], ('https://pypi.python.org/pypi/pandoc', '')), Space(), Link(('', [], []), [Image(('', [], []), [Str('Mkdocs')], ('https://img.shields.io/badge/doc-mkdocs-845ed7.svg', ''))], ('https://boisgera.github.io/pandoc', '')), Space(), Link(('', [], []), [Image(('', [], []), [Str('GitHub'), Space(), Str('discussions')], ('https://img.shields.io/badge/discuss-online-845ef7', ''))], ('https://github.com/boisgera/pandoc/discussions', '')), Space(), Link(('', [], []), [Image(('', [], []), [Str('Downloads')], ('https://pepy.tech/badge/pandoc', ''))], ('https://pepy.tech/project/pandoc', '')), Space(), Link(('', [], []), [Image(('', [], []), [Str('GitHub'), Space(), Str('stars')], ('https://img.shields.io/github/stars/boisgera/pandoc?style=flat', ''))], ('https://github.com/boisgera/pandoc/stargazers', '')), Space(), Link(('', [], []), [Image(('', [], []), [Str('build')], ('https://github.com/boisgera/pandoc/actions/workflows/build.yml/badge.svg', ''))], ('https://github.com/boisgera/pandoc/actions/workflows/build.yml', ''))]), Header(2, ('getting-started', [], []), [Str('🚀'), Space(), Str('Getting'), Space(), Str('started')]), Para([Link(('', [], []), [Str('Pandoc')], ('https://pandoc.org/', '')), Space(), Str('–'), Space(), Str('the'), Space(), Str('general'), Space(), Str('markup'), Space(), Str('converter'), Space(), Str('(and'), Space(), Str('Haskell'), Space(), Str('library)'), Space(), Str('written'), Space(), Str('by'), Space(), Link(('', [], []), [Str('John'), Space(), Str('MacFarlane')], ('https://johnmacfarlane.net/', '')), Space(), Str('–'), Space(), Str('needs'), Space(), Str('to'), Space(), Str('be'), Space(), Str('available.'), Space(), Str('You'), Space(), Str('may'), Space(), Str('follow'), Space(), Str('the'), Space(), Str('official'), Space(), Link(('', [], []), [Str('installation'), Space(), Str('instructions')], ('https://pandoc.org/installing.html', '')), Space(), Str('or'), Space(), Str('use'), Space(), Link(('', [], []), [Str('conda')], ('https://docs.conda.io', '')), Str(':')]), CodeBlock(('', [], []), '$ conda install -c conda-forge pandoc'), Para([Str('Then,'), Space(), Str('install'), Space(), Str('the'), Space(), Str('latest'), Space(), Str('stable'), Space(), Str('version'), Space(), Str('of'), Space(), Str('the'), Space(), Str('pandoc'), Space(), Str('Python'), Space(), Str('library'), Space(), Str('with'), Space(), Str('pip:')]), CodeBlock(('', [], []), '$ pip install --upgrade pandoc'), Header(2, ('overview', [], []), [Str('🌌'), Space(), Str('Overview')]), Para([Str('This'), Space(), Str('project'), Space(), Str('brings'), Space(), Link(('', [], []), [Str('Pandoc')], ('https://pandoc.org/', '')), Str('’s'), Space(), Str('data'), Space(), Str('model'), Space(), Str('for'), Space(), Str('markdown'), Space(), Str('documents'), Space(), Str('to'), Space(), Str('Python:')]), CodeBlock(('', [], []), '$ echo "Hello world!" | python -m pandoc read \nPandoc(Meta({}), [Para([Str(\'Hello\'), Space(), Str(\'world!\')])])'), Para([Str('It'), Space(), Str('can'), Space(), Str('be'), Space(), Str('used'), Space(), Str('to'), Space(), Str('analyze,'), Space(), Str('create'), Space(), Str('and'), Space(), Str('transform'), Space(), Str('documents,'), Space(), Str('in'), Space(), Str('Python'), Space(), Str(':')]), CodeBlock(('', [], []), '>>> import pandoc\n>>> text = "Hello world!"\n>>> doc = pandoc.read(text)\n>>> doc\nPandoc(Meta({}), [Para([Str(\'Hello\'), Space(), Str(\'world!\')])])\n\n>>> paragraph = doc[1][0]\n>>> paragraph\nPara([Str(\'Hello\'), Space(), Str(\'world!\')])\n>>> from pandoc.types import Str\n>>> paragraph[0][2] = Str(\'Python!\')\n>>> text = pandoc.write(doc)\n>>> print(text)\nHello Python!'), Para([Str('For'), Space(), Str('more'), Space(), Str('information,'), Space(), Str('refer'), Space(), Str('to'), Space(), Str('the'), Space(), Link(('', [], []), [Str('📖'), Space(), Str('documentation')], ('https://boisgera.github.io/pandoc', '')), Str('.')])])

then perform whatever analysis/transformation you want on the document, and finally get the corresponding markdown:

>>> markdown = pandoc.write(doc)  # or use _ = pandoc.write(doc, file="README.md") to write to README.md
>>> print(markdown)
# Pandoc -- 🐍 Python Library

![Python](https://img.shields.io/pypi/pyversions/pandoc.svg) [![PyPI
version](https://img.shields.io/pypi/v/pandoc.svg)](https://pypi.python.org/pypi/pandoc)
[![Mkdocs](https://img.shields.io/badge/doc-mkdocs-845ed7.svg)](https://boisgera.github.io/pandoc)
[![GitHub
discussions](https://img.shields.io/badge/discuss-online-845ef7)](https://github.com/boisgera/pandoc/discussions)
[![Downloads](https://pepy.tech/badge/pandoc)](https://pepy.tech/project/pandoc)
[![GitHub
stars](https://img.shields.io/github/stars/boisgera/pandoc?style=flat)](https://github.com/boisgera/pandoc/stargazers)
[![build](https://github.com/boisgera/pandoc/actions/workflows/build.yml/badge.svg)](https://github.com/boisgera/pandoc/actions/workflows/build.yml)

## 🚀 Getting started

[Pandoc](https://pandoc.org/) -- the general markup converter (and
Haskell library) written by [John
MacFarlane](https://johnmacfarlane.net/) -- needs to be available. You
may follow the official [installation
instructions](https://pandoc.org/installing.html) or use
[conda](https://docs.conda.io):

    $ conda install -c conda-forge pandoc

Then, install the latest stable version of the pandoc Python library
with pip:

    $ pip install --upgrade pandoc

## 🌌 Overview

This project brings [Pandoc](https://pandoc.org/)'s data model for
markdown documents to Python:

    $ echo "Hello world!" | python -m pandoc read 
    Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])

It can be used to analyze, create and transform documents, in Python :

    >>> import pandoc
    >>> text = "Hello world!"
    >>> doc = pandoc.read(text)
    >>> doc
    Pandoc(Meta({}), [Para([Str('Hello'), Space(), Str('world!')])])

    >>> paragraph = doc[1][0]
    >>> paragraph
    Para([Str('Hello'), Space(), Str('world!')])
    >>> from pandoc.types import Str
    >>> paragraph[0][2] = Str('Python!')
    >>> text = pandoc.write(doc)
    >>> print(text)
    Hello Python!

For more information, refer to the [📖
documentation](https://boisgera.github.io/pandoc).
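
As one possible analysis step on the document above, the sketch below lists the URL of every link. It assumes that the library's `pandoc.iter` function walks the whole document tree and that `Link` is exposed by `pandoc.types`, like `Str` in the README example; the helper name `link_targets` is made up for illustration.

```python
def link_targets(doc):
    """Return the URL of every Link element found in a pandoc document."""
    # Imported here so that this file still loads on a machine where the
    # pandoc Python library is not installed.
    import pandoc
    from pandoc.types import Link

    # A Link holds (attributes, inline content, (url, title)), as in the
    # output printed earlier, so elt[2][0] is the URL.
    # Assumption: pandoc.iter yields every element of the tree.
    return [elt[2][0] for elt in pandoc.iter(doc) if isinstance(elt, Link)]


# Usage, with the README.html file from above:
#     import pandoc
#     doc = pandoc.read(file="README.html")
#     for url in link_targets(doc):
#         print(url)
```

The same loop with a different `isinstance` test (for instance on `Image` or `Header`) would collect other kinds of elements.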