Docx: tables inside list items indent the text in each cell to align with the list nesting
import pandoc
from pandoc.types import *  # Pandoc, Meta, OrderedList, Table, ... used below
ast = Pandoc(Meta({'title': MetaString('Listed Tables')}), [OrderedList((1, Decimal(), Period()), [[Plain([Str('First'), Space(), Str('item')]), OrderedList((1, Decimal(), Period()), [[Plain([Str('First'), Space(), Str('sub-item')])]]), Para([]), Table(('', [], []), Caption(None, []), [(AlignDefault(), ColWidthDefault()), (AlignDefault(), ColWidthDefault())], TableHead(('', [], []), []), [TableBody(('', [], []), RowHeadColumns(0), [], [Row(('', [], []), [Cell(('', [], []), AlignDefault(), RowSpan(1), ColSpan(1), [Plain([Str('dsfdfa')])]), Cell(('', [], []), AlignDefault(), RowSpan(1), ColSpan(1), [Plain([Str('asgdsagd')])])]), Row(('', [], []), [Cell(('', [], []), AlignDefault(), RowSpan(1), ColSpan(1), [Plain([Str('gsagdsaf')])]), Cell(('', [], []), AlignDefault(), RowSpan(1), ColSpan(1), [Plain([Str('gdsagdsa')])])]), Row(('', [], []), [Cell(('', [], []), AlignDefault(), RowSpan(1), ColSpan(1), [Plain([Str('gdsagdsag')])]), Cell(('', [], []), AlignDefault(), RowSpan(1), ColSpan(1), [Plain([Str('gdsagdsa')])])])])], TableFoot(('', [], []), []))], [Plain([Str('Second'), Space(), Str('item')]), OrderedList((1, Decimal(), Period()), [[Plain([Str('First'), Space(), Str('sub-ite')])]]), Para([]), Table(('', [], []), Caption(None, []), [(AlignDefault(), ColWidthDefault()), (AlignDefault(), ColWidthDefault())], TableHead(('', [], []), []), [TableBody(('', [], []), RowHeadColumns(0), [], [Row(('', [], []), [Cell(('', [], []), AlignDefault(), RowSpan(1), ColSpan(1), [Plain([Str('SAGEQRHEWR')])]), Cell(('', [], []), AlignDefault(), RowSpan(1), ColSpan(1), [Plain([Str('KDHKGLAD')])])]), Row(('', [], []), [Cell(('', [], []), AlignDefault(), RowSpan(1), ColSpan(1), [Plain([Str('iopetueoptr')])]), Cell(('', [], []), AlignDefault(), RowSpan(1), ColSpan(1), [Plain([Str('dsknaiw')])])]), Row(('', [], []), [Cell(('', [], []), AlignDefault(), RowSpan(1), ColSpan(1), [Plain([Str('oirenvl')])]), Cell(('', [], []), AlignDefault(), RowSpan(1), ColSpan(1), 
[Plain([Str('dfsipwiunec')])])])])], TableFoot(('', [], []), []))]])])
document = pandoc.write(ast, format="docx", options=['--standalone'])
# docx output comes back as bytes; text formats come back as str
if isinstance(document, bytes):
    file_mode = "wb"
else:
    file_mode = "w"
with open("page.docx", file_mode) as f:
    f.write(document)
This produces a file like this:
current.docx
As you can see, each cell indents the text like it's a list item itself. I expected this output:
expected.docx
I used the pandoc CLI to extract the pandoc AST (as JSON rather than Python) from your expected docx document, then generated a docx document back from that AST:
$ pandoc -o expected.json expected.docx
$ pandoc -o expected-roundtrip.docx expected.json
I see the same kind of difference between expected.docx and expected-roundtrip.docx as the one you pointed out. So I guess that either something in the description of docx tables is "lost in translation" when the pandoc CLI reads the file, or the pandoc docx writer has an issue; the behavior of the pandoc Python library is merely consistent with that.
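To narrow down where the two files differ, one option is to look at the paragraph properties inside the table cells of each docx file. The sketch below uses only the Python standard library; `cell_paragraph_props` is a hypothetical helper name, and it assumes (this is not confirmed) that the extra indentation shows up as a paragraph style, a `w:ind` element, or a `w:numPr` element in `word/document.xml`.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used in word/document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
TC = f"{{{W}}}tc"  # table cell
P = f"{{{W}}}p"    # paragraph


def _paragraphs_in_cells(elem, in_cell=False):
    """Yield every paragraph that has a table cell among its ancestors."""
    if elem.tag == TC:
        in_cell = True
    if elem.tag == P and in_cell:
        yield elem
    for child in elem:
        yield from _paragraphs_in_cells(child, in_cell)


def cell_paragraph_props(docx_file):
    """Summarize style, indentation and list membership of table-cell paragraphs.

    docx_file is a path or a binary file-like object (hypothetical helper,
    not part of pandoc or of the pandoc Python library).
    """
    with zipfile.ZipFile(docx_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    report = []
    for para in _paragraphs_in_cells(root):
        ppr = para.find("w:pPr", NS)
        style = ppr.find("w:pStyle", NS) if ppr is not None else None
        ind = ppr.find("w:ind", NS) if ppr is not None else None
        num = ppr.find("w:numPr", NS) if ppr is not None else None
        report.append({
            "text": "".join(t.text or "" for t in para.iterfind(".//w:t", NS)),
            "style": style.get(f"{{{W}}}val") if style is not None else None,
            # Strip the namespace from attribute names such as w:left
            "indent": ({k.split("}")[1]: v for k, v in ind.attrib.items()}
                       if ind is not None else None),
            "in_numbered_list": num is not None,
        })
    return report
```

Comparing the result of `cell_paragraph_props("expected.docx")` with that of `cell_paragraph_props("expected-roundtrip.docx")` should show whether the cell paragraphs differ in style or explicit indentation. If they do not, the difference presumably lies elsewhere (for example in the table properties or in the style definitions), which this sketch does not inspect.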
I don't know whether this is expected given the pandoc table model or would be considered a bug in the original pandoc project (the pandoc table specification is quite complex and I have very little knowledge of the docx format). I guess that the next step would be to ask this question on the pandoc-discuss mailing list: https://groups.google.com/g/pandoc-discuss.
Are you ok with that?
Best regards,
Sébastien
@boisgera sounds good; sorry, I didn't check for that first. Thanks so much for this library! The company I work for is building a whole platform around it!