marhop/pandoc-unicode-math

Expanded UnicodeMath support

bwiernik opened this issue · 4 comments

Is there any possibility you might expand this plugin to support UnicodeMath syntax more broadly? I really prefer the nearly plain-text syntax of UnicodeMath, especially for things like fractions, to LaTeX.

Hi,

Thanks for the link to the Unicode Technical Note, I did not know about this! This is a really nice idea and it feels like the perfect logical consequence of this filter.

I fear it is a little out of scope at the moment though, mostly for technical reasons. The filter currently uses a very simple implementation: it reads exactly one character at a time (no further context!), looks it up in a translation table and, if found, replaces it with the corresponding LaTeX command. Implementing UnicodeMath as specified in the Tech Note would instead require a real parser capable of handling more complex syntactic constructs like (2+3)/5, where the parenthesized group has to be recognized as the numerator of a fraction.
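To illustrate the limitation, here is a minimal Python sketch of the one-character-at-a-time approach described above. It is not the filter's actual code: the table `SYMBOLS`, the function `translate` and the trailing space after each command are assumptions made up for this example.

```python
# Hypothetical translation table; the real filter's table is far larger.
SYMBOLS = {
    "α": r"\alpha",
    "β": r"\beta",
    "∑": r"\sum",
    "≤": r"\leq",
}


def translate(math):
    """Replace each known Unicode character with a LaTeX command, one at a time."""
    out = []
    for ch in math:
        if ch in SYMBOLS:
            # Assumption: a trailing space keeps the command from merging
            # with a letter that follows it.
            out.append(SYMBOLS[ch] + " ")
        else:
            # Unknown characters pass through unchanged.
            out.append(ch)
    return "".join(out)


print(translate("α≤β"))
print(translate("(2+3)/5"))
```

Single symbols are translated, but `(2+3)/5` comes back unchanged: no individual character in it maps to a fraction, so producing something like `\frac{2+3}{5}` needs a parser that sees the whole expression.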

I'm not saying I won't think about it (because I probably will) but you should not expect anything close to a working solution anytime soon ... Sorry.

Best,
Martin

It occurs to me that pandoc already has parsers for the Microsoft Office implementation of UnicodeMath (labeled as readOMML and writeOMML). Do you know if it is possible to call those functions from a filter?

Yeah, that should be possible: these functions are implemented in the texmath library (GitHub/API docs), which you can import into a Pandoc filter.

I fear though that this is not really what you are looking for. OMML is not the same as UnicodeMath, but an XML dialect (remotely similar to MathML) that is used by Microsoft Word for the definition of math structures. From a slightly more recent version of the Unicode Technical Note, page 3:

In Word, the structures are defined in OMML (Office MathML) and built up by Word, while for the other apps, the structures are defined in UnicodeMath and built up by RichEdit.

I guess that means that, while you can enter UnicodeMath in Word (you can, right? I don't know), it is stored as OMML internally. That's why Pandoc does not need to read/write UnicodeMath but "only" OMML to process Word documents.
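To make the difference concrete, here is a rough Python sketch of how a simple fraction looks in OMML, using only the standard library. The fragment is hand-written for this example rather than taken from a real Word file, and the helper `fraction_parts` is hypothetical; in UnicodeMath the same fraction would simply be typed as `1/2`.

```python
import xml.etree.ElementTree as ET

# Namespace used by Office Math Markup Language elements.
M = "http://schemas.openxmlformats.org/officeDocument/2006/math"

# Hand-written OMML fragment for the fraction 1/2 (illustrative only).
OMML_FRACTION = f"""
<m:oMath xmlns:m="{M}">
  <m:f>
    <m:num><m:r><m:t>1</m:t></m:r></m:num>
    <m:den><m:r><m:t>2</m:t></m:r></m:den>
  </m:f>
</m:oMath>
"""


def fraction_parts(omml):
    """Return (numerator, denominator) text of the first m:f element."""
    ns = {"m": M}
    root = ET.fromstring(omml.strip())
    frac = root.find("m:f", ns)
    # Collect the text runs below the numerator and the denominator.
    num = "".join(t.text or "" for t in frac.findall("m:num//m:t", ns))
    den = "".join(t.text or "" for t in frac.findall("m:den//m:t", ns))
    return num, den


num, den = fraction_parts(OMML_FRACTION)
print(f"{num}/{den}")
```

The structure (numerator, denominator) is already explicit in the XML, so a reader of OMML never has to parse the linear `1/2` notation.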

Ah, got it. The writings of the UnicodeMath author about this are somewhat hard to follow regarding exactly what everything is.