TinoDidriksen/Transfuse

DOCX allows multiple w:t per w:r

TinoDidriksen opened this issue · 1 comment

Google Docs, and presumably other editors, may produce DOCX files with XML akin to <w:p><w:r><w:t>...</w:t><w:br/><w:br/><w:t>...</w:t><w:br/></w:r></w:p> - that is, each w:r can contain multiple w:t and w:br elements intermingled.

MS Word itself will produce <w:p><w:r><w:t>...</w:t></w:r><w:r><w:br/></w:r><w:r><w:br/><w:t>...</w:t></w:r><w:r><w:br/></w:r></w:p> - that is, each w:r holds at most one w:t.

Unfortunately, the schema does allow multiple w:t elements per w:r: http://www.datypic.com/sc/ooxml/e-w_r-2.html
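
One possible way to cope with this would be to normalise such runs before extraction, splitting every w:r so that it ends up with at most one w:t. The following is only a rough sketch of that idea, not what Transfuse actually does: `split_runs` and `q` are hypothetical helper names, it uses Python's standard `xml.etree.ElementTree`, it only looks at w:r elements that are direct children of the given w:p (runs nested in e.g. w:hyperlink are ignored), and unlike the MS Word output above it leaves each w:br in the same run as the w:t preceding it.

```python
import copy
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)


def q(tag):
    """Hypothetical helper: qualify a local name with the WordprocessingML namespace."""
    return f"{{{W_NS}}}{tag}"


def split_runs(paragraph):
    """Rewrite the direct w:r children of a w:p so each holds at most one w:t."""
    for run in list(paragraph.findall(q("r"))):
        if len(run.findall(q("t"))) <= 1:
            continue  # already in the shape MS Word produces

        rpr = run.find(q("rPr"))  # run properties, copied into every new run
        children = [c for c in run if c.tag != q("rPr")]

        # Start a new group whenever a w:t would join a group that already has one.
        groups = [[]]
        for child in children:
            if child.tag == q("t") and any(c.tag == q("t") for c in groups[-1]):
                groups.append([])
            groups[-1].append(child)

        # Replace the original run with one new run per group, in the same position.
        index = list(paragraph).index(run)
        paragraph.remove(run)
        for offset, group in enumerate(groups):
            new_run = ET.Element(q("r"), run.attrib)
            if rpr is not None:
                new_run.append(copy.deepcopy(rpr))
            new_run.extend(group)
            paragraph.insert(index + offset, new_run)
```

Applied to the Google Docs style paragraph above, this would yield two runs: one with the first w:t and the two w:br that follow it, and one with the second w:t and the trailing w:br.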

(cf. apertium/apertium#110)

Probably also an issue for PPTX's a:t