boisgera/pandoc

Docx list items with tables indents text in each cell to align with list nesting

Closed this issue · 2 comments

import pandoc
from pandoc.types import *

ast = Pandoc(Meta({'title': MetaString('Listed Tables')}), [OrderedList((1, Decimal(), Period()), [[Plain([Str('First'), Space(), Str('item')]), OrderedList((1, Decimal(), Period()), [[Plain([Str('First'), Space(), Str('sub-item')])]]), Para([]), Table(('', [], []), Caption(None, []), [(AlignDefault(), ColWidthDefault()), (AlignDefault(), ColWidthDefault())], TableHead(('', [], []), []), [TableBody(('', [], []), RowHeadColumns(0), [], [Row(('', [], []), [Cell(('', [], []), AlignDefault(), RowSpan(1), ColSpan(1), [Plain([Str('dsfdfa')])]), Cell(('', [], []), AlignDefault(), RowSpan(1), ColSpan(1), [Plain([Str('asgdsagd')])])]), Row(('', [], []), [Cell(('', [], []), AlignDefault(), RowSpan(1), ColSpan(1), [Plain([Str('gsagdsaf')])]), Cell(('', [], []), AlignDefault(), RowSpan(1), ColSpan(1), [Plain([Str('gdsagdsa')])])]), Row(('', [], []), [Cell(('', [], []), AlignDefault(), RowSpan(1), ColSpan(1), [Plain([Str('gdsagdsag')])]), Cell(('', [], []), AlignDefault(), RowSpan(1), ColSpan(1), [Plain([Str('gdsagdsa')])])])])], TableFoot(('', [], []), []))], [Plain([Str('Second'), Space(), Str('item')]), OrderedList((1, Decimal(), Period()), [[Plain([Str('First'), Space(), Str('sub-ite')])]]), Para([]), Table(('', [], []), Caption(None, []), [(AlignDefault(), ColWidthDefault()), (AlignDefault(), ColWidthDefault())], TableHead(('', [], []), []), [TableBody(('', [], []), RowHeadColumns(0), [], [Row(('', [], []), [Cell(('', [], []), AlignDefault(), RowSpan(1), ColSpan(1), [Plain([Str('SAGEQRHEWR')])]), Cell(('', [], []), AlignDefault(), RowSpan(1), ColSpan(1), [Plain([Str('KDHKGLAD')])])]), Row(('', [], []), [Cell(('', [], []), AlignDefault(), RowSpan(1), ColSpan(1), [Plain([Str('iopetueoptr')])]), Cell(('', [], []), AlignDefault(), RowSpan(1), ColSpan(1), [Plain([Str('dsknaiw')])])]), Row(('', [], []), [Cell(('', [], []), AlignDefault(), RowSpan(1), ColSpan(1), [Plain([Str('oirenvl')])]), Cell(('', [], []), AlignDefault(), RowSpan(1), ColSpan(1), 
[Plain([Str('dfsipwiunec')])])])])], TableFoot(('', [], []), []))]])])
document = pandoc.write(ast, format="docx", options=['--standalone'])
# Binary formats such as docx come back as bytes; text formats as str.
if isinstance(document, bytes):
    file_mode = "wb"
else:
    file_mode = "w"
with open("page.docx", file_mode) as f:
    f.write(document)

This outputs a file like this:
current.docx

As you can see, each cell indents the text like it's a list item itself. I expected this output:
expected.docx

Hi @sHermanGriffiths!

I used the pandoc CLI to get the pandoc AST (in JSON instead of Python) of your expected docx document, then generated the corresponding docx document from it.

$ pandoc -o expected.json expected.docx
$ pandoc -o expected-roundtrip.docx expected.json

I see the same kind of difference between expected.docx and expected-roundtrip.docx as the one you pointed out. So I guess that either something in the description of docx tables is "lost in translation" by the original pandoc CLI tool, or the pandoc docx writer has an issue; the behavior of the pandoc Python library is merely consistent with that.
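
To narrow down where the two files differ, one option is to look at the paragraph properties inside the table cells of each docx. Below is a minimal sketch using only the Python standard library. The helper name `cell_paragraph_properties` is made up for illustration, and which property actually causes the indentation (a `w:ind`, a `w:numPr`, or a paragraph style) is not established here; the sketch only lists what is present.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by word/document.xml inside a .docx archive.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def cell_paragraph_properties(docx):
    """List the w:pPr child element names of each paragraph found in a table cell.

    `docx` is a path or a binary file-like object. Hypothetical helper, for
    diagnosis only. With nested tables, inner paragraphs are reported once per
    enclosing cell.
    """
    with zipfile.ZipFile(docx) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    result = []
    for cell in root.iter(f"{{{W}}}tc"):          # every table cell
        for para in cell.iter(f"{{{W}}}p"):       # every paragraph in that cell
            ppr = para.find(f"{{{W}}}pPr")        # paragraph properties, if any
            names = [] if ppr is None else [child.tag.split("}")[1] for child in ppr]
            result.append(names)
    return result
```

Running it on both files, for example `cell_paragraph_properties("expected.docx")` and `cell_paragraph_properties("expected-roundtrip.docx")`, and comparing the two lists should show whether the round-tripped cells carry extra indentation or numbering properties.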

I don't know if this is to be expected given the pandoc table model or would be considered a bug in the original pandoc project (the pandoc table specification is quite complex and I have very little knowledge of the docx format). I guess that the next step would be to ask this question on https://groups.google.com/g/pandoc-discuss.

Are you ok with that?

Best regards,

Sébastien

@boisgera sounds good; sorry, I didn't check for that first. Thanks so much for this library! The company I work for is building a whole platform around it!