boisgera/pandoc

Support Pandoc 3.0

Closed this issue · 8 comments

Pandoc 3.0 and 3.0.1 are associated with pandoc-types 1.23 (>= 1.23, < 1.24). The new document model is:

data Pandoc = Pandoc !Meta ![Block]
newtype Meta = Meta {unMeta :: Map Text MetaValue}
data MetaValue
  = MetaMap !(Map Text MetaValue)
  | MetaList ![MetaValue]
  | MetaBool !Bool
  | MetaString !Text
  | MetaInlines ![Inline]
  | MetaBlocks ![Block]
type ListAttributes = (Int, ListNumberStyle, ListNumberDelim)
data ListNumberStyle
  = DefaultStyle
  | Example
  | Decimal
  | LowerRoman
  | UpperRoman
  | LowerAlpha
  | UpperAlpha
data ListNumberDelim = DefaultDelim | Period | OneParen | TwoParens
type Attr = (Text, [Text], [(Text, Text)])
newtype Format = Format Text
newtype RowHeadColumns = RowHeadColumns Int
data Alignment
  = AlignLeft | AlignRight | AlignCenter | AlignDefault
data ColWidth = ColWidth !Double | ColWidthDefault
type ColSpec = (Alignment, ColWidth)
data Row = Row !Attr ![Cell]
data TableHead = TableHead !Attr ![Row]
data TableBody = TableBody !Attr !RowHeadColumns ![Row] ![Row]
data TableFoot = TableFoot !Attr ![Row]
type ShortCaption = [Inline]
data Caption = Caption !(Maybe ShortCaption) ![Block]
data Cell = Cell !Attr !Alignment !RowSpan !ColSpan ![Block]
newtype RowSpan = RowSpan Int
newtype ColSpan = ColSpan Int
data Block
  = Plain ![Inline]
  | Para ![Inline]
  | LineBlock ![[Inline]]
  | CodeBlock !Attr !Text
  | RawBlock !Format !Text
  | BlockQuote ![Block]
  | OrderedList !ListAttributes ![[Block]]
  | BulletList ![[Block]]
  | DefinitionList ![([Inline], [[Block]])]
  | Header !Int !Attr ![Inline]
  | HorizontalRule
  | Table !Attr
          !Caption
          ![ColSpec]
          !TableHead
          ![TableBody]
          !TableFoot
  | Figure !Attr !Caption ![Block]
  | Div !Attr ![Block]
data QuoteType = SingleQuote | DoubleQuote
type Target = (Text, Text)
data MathType = DisplayMath | InlineMath
data Inline
  = Str !Text
  | Emph ![Inline]
  | Underline ![Inline]
  | Strong ![Inline]
  | Strikeout ![Inline]
  | Superscript ![Inline]
  | Subscript ![Inline]
  | SmallCaps ![Inline]
  | Quoted !QuoteType ![Inline]
  | Cite ![Citation] ![Inline]
  | Code !Attr !Text
  | Space
  | SoftBreak
  | LineBreak
  | Math !MathType !Text
  | RawInline !Format !Text
  | Link !Attr ![Inline] !Target
  | Image !Attr ![Inline] !Target
  | Note ![Block]
  | Span !Attr ![Inline]
data Citation
  = Citation {citationId :: !Text,
              citationPrefix :: ![Inline],
              citationSuffix :: ![Inline],
              citationMode :: !CitationMode,
              citationNoteNum :: !Int,
              citationHash :: !Int}
data CitationMode = AuthorInText | SuppressAuthor | NormalCitation
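
To make the Figure constructor in the model above more concrete, here is a rough sketch of how such a block might look in the JSON form that pandoc reads and writes. It assumes the usual pandoc-types encoding (a "t" tag with a "c" contents list, and null for a missing short caption); the helper names attr, caption, figure and plain are hypothetical and not part of any library.

```python
import json

def attr(identifier="", classes=(), keyvals=()):
    # Attr = (Text, [Text], [(Text, Text)])
    return [identifier, list(classes), [list(kv) for kv in keyvals]]

def caption(blocks, short=None):
    # Caption = Caption (Maybe ShortCaption) [Block]; Nothing is assumed to map to null
    return [short, list(blocks)]

def plain(text):
    # Plain [Inline] holding a single Str (text should contain no spaces)
    return {"t": "Plain", "c": [{"t": "Str", "c": text}]}

def figure(blocks, caption_blocks=(), identifier=""):
    # Figure Attr Caption [Block]
    return {
        "t": "Figure",
        "c": [attr(identifier), caption(caption_blocks), list(blocks)],
    }

# A minimal document tagged with the pandoc-types 1.23 API version.
doc = {
    "pandoc-api-version": [1, 23],
    "meta": {},
    "blocks": [figure([plain("content")], [plain("caption")], identifier="fig:demo")],
}
encoded = json.dumps(doc)
print(encoded)
```

Whether pandoc accepts this exact document has not been checked here; the point is only to show where Attr, Caption and the [Block] payload sit relative to each other.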

Very few changes overall:

  1. ! is a strictness flag that doesn't matter for us; we can filter it out.
  2. No more Null block.
  3. A new Figure block: Figure Attr Caption [Block].
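
Filtering out the strictness flags (item 1) could be sketched as follows. This is a minimal illustration, assuming that in these declarations a ! only ever appears directly in front of a type (an identifier, a [ or a ( ); the function name strip_strictness is hypothetical, and the library's actual parser may handle this differently.

```python
import re

# A "!" immediately followed by a type name, "[" or "(" is treated as a strictness flag.
_STRICTNESS = re.compile(r"!(?=[A-Za-z\[(])")

def strip_strictness(declaration: str) -> str:
    """Remove Haskell strictness flags from a type declaration."""
    return _STRICTNESS.sub("", declaration)

print(strip_strictness("data Pandoc = Pandoc !Meta ![Block]"))
print(strip_strictness("  | MetaMap !(Map Text MetaValue)"))
```

With the flags removed, the 1.23 declarations have the same shape as the earlier ones, apart from the missing Null and the new Figure.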

Need to update:

  • the documentation: API + Pandoc's Markdown.

  • the list of extensions that are recognized (e.g. Jupyter notebooks).

  • add a cookbook for notebooks (get rid of the old, manual one? Or keep it?)

yeus commented

First, great library! Thank you. I am playing around with it right now and ran into this issue when using the pandoc.write command:

ERROR:pydoxtools.document_base:problem with extractor 'full_text'
Traceback (most recent call last):
  File "/home/dev/git/pydoxtools/pydoxtools/document_base.py", line 513, in x
    res = extractor_func._mapped_call(self, *args, config_params=params, **kwargs)
  File "/home/dev/git/pydoxtools/pydoxtools/document_base.py", line 157, in _mapped_call
    output = self(*args, **mapped_kwargs)
 
## ▼▼▼▼▼▼▼▼▼ pandoc-relevant part ▼▼▼▼▼▼▼▼▼:


  File "/home/dev/git/pydoxtools/pydoxtools/extract_pandoc.py", line 64, in __call__
    full_text = pandoc.write(pandoc_document, format=output_format)
  File "/home/dev/.cache/pypoetry/virtualenvs/pydoxtools-UuJZOkke-py3.10/lib/python3.10/site-packages/pandoc/__init__.py", line 355, in write
    pandoc(options)
  File "/home/dev/.cache/pypoetry/virtualenvs/pydoxtools-UuJZOkke-py3.10/lib/python3.10/site-packages/plumbum/commands/base.py", line 113, in __call__
    return self.run(args, **kwargs)[1]
  File "/home/dev/.cache/pypoetry/virtualenvs/pydoxtools-UuJZOkke-py3.10/lib/python3.10/site-packages/plumbum/commands/base.py", line 252, in run
    return p.run()
  File "/home/dev/.cache/pypoetry/virtualenvs/pydoxtools-UuJZOkke-py3.10/lib/python3.10/site-packages/plumbum/commands/base.py", line 215, in runner
    return run_proc(p, retcode, timeout)
  File "/home/dev/.cache/pypoetry/virtualenvs/pydoxtools-UuJZOkke-py3.10/lib/python3.10/site-packages/plumbum/commands/processes.py", line 304, in run_proc
    return _check_process(proc, retcode, timeout, stdout, stderr)
  File "/home/dev/.cache/pypoetry/virtualenvs/pydoxtools-UuJZOkke-py3.10/lib/python3.10/site-packages/plumbum/commands/processes.py", line 17, in _check_process
    proc.verify(retcode, timeout, stdout, stderr)
  File "/home/dev/.cache/pypoetry/virtualenvs/pydoxtools-UuJZOkke-py3.10/lib/python3.10/site-packages/plumbum/machines/base.py", line 27, in verify
    raise ProcessExecutionError(
plumbum.commands.processes.ProcessExecutionError: Unexpected exit code: 64
Command line: | /usr/bin/pandoc -t markdown -o /tmp/tmph_3tsfz8/output -f json /tmp/tmph_3tsfz8/input.js
Stderr:       | JSON parse error: Error in $: Incompatible API versions: encoded with [1,22,2,1] but attempted to decode with [1,23].

All I am doing is calling pandoc.write on a previously opened document:

import pandoc

# raw_content: the docx input, defined elsewhere in the calling code
pandoc_document = pandoc.read(raw_content, format="docx")
pandoc.write(pandoc_document, format="markdown")

It works with pandoc 2.19.2 (https://github.com/jgm/pandoc/releases/tag/2.19.2), but with pandoc 3.0.1 (https://github.com/jgm/pandoc/releases/tag/3.0.1) I get the error above. So I am not sure whether it's related to the Python library or to pandoc itself.

yeus commented

(I forgot to mention that I am running the latest version of your library, 2.3, on an Ubuntu 22.04 system.)

Hi @yeus! 👋

Thanks for the report! 🙏 It's probably an issue with the Python library, since version 2.3 of this project (the latest available on PyPI) does not support pandoc 3.x yet (see https://boisgera.github.io/pandoc/changelog/).
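
The error message in the traceback already names both sides of the mismatch: the JSON handed to /usr/bin/pandoc was encoded as pandoc-types 1.22.2.1, while the pandoc 3.0.1 executable decodes 1.23. As a small illustration, the two versions can be pulled out of the stderr text like this (the function name parse_api_mismatch is hypothetical, and the pattern only assumes the wording shown in the report):

```python
import re

# Matches the wording of the error reported above.
_MISMATCH = re.compile(
    r"encoded with \[([\d,]+)\] but attempted to decode with \[([\d,]+)\]"
)

def parse_api_mismatch(stderr: str):
    """Return (encoded_version, decoder_version) as int tuples, or None if no match."""
    match = _MISMATCH.search(stderr)
    if match is None:
        return None
    encoded, decoder = (
        tuple(int(n) for n in group.split(",")) for group in match.groups()
    )
    return encoded, decoder

message = (
    "JSON parse error: Error in $: Incompatible API versions: "
    "encoded with [1,22,2,1] but attempted to decode with [1,23]."
)
print(parse_api_mismatch(message))
```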

Last time I had some free time for the project, the conda packaging of pandoc 3.x was not ready, which was a (soft) blocker for me. I have started to take the new document model into account in the main branch, but nothing has been tested.

Now pandoc 3.1.1 is available for conda, so I should be able to make some progress on the tests and release on PyPI.

Cheers,

SB

Hello @yeus,

Could you try again with the current (beta) version of the Pandoc Python Library on PyPI?

$ pip install pandoc==2.4b0

Cheers,

SB

yeus commented

Will do! I will come back in a couple of days, as soon as I have the time. Thanks for helping out!

yeus commented

Hi, it seems to work! 👍

Here are the versions I am using:

pip install pandoc==2.4b0

pandoc.configuration:

{'auto': True,
'path': '/usr/bin/pandoc',
'version': '3.1.2',
'pandoc_types_version': '1.23'}

python 3.10

Thank you!
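
For anyone who wants to guard against the same mismatch, a configuration like the one reported above can be compared with the API version stored in a JSON document before it is handed to pandoc. The sketch below assumes that only the first two version components need to agree, which is consistent with the error earlier in this thread but is not verified here; major_api_version and versions_agree are hypothetical helper names, not part of the library.

```python
def major_api_version(version):
    """Return the first two components of a version given as "1.23" or [1, 23]."""
    if isinstance(version, str):
        version = [int(part) for part in version.split(".")]
    return tuple(version[:2])

def versions_agree(config, document_api_version):
    """Compare the configured pandoc-types version with a document's API version."""
    configured = major_api_version(config["pandoc_types_version"])
    return configured == major_api_version(document_api_version)

# Configuration copied from the report above.
config = {
    "auto": True,
    "path": "/usr/bin/pandoc",
    "version": "3.1.2",
    "pandoc_types_version": "1.23",
}

print(versions_agree(config, [1, 23]))
print(versions_agree(config, [1, 22, 2, 1]))
```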