ixmatus/orgmode-parse

Markup parsers should only consider markup on word boundaries

Opened this issue · 16 comments

Discovered while testing the HyperLink parser. The parser will incorrectly parse the following:

/[[https://orgmode.org/manual/Link-format.html][The Org Manual: Link format]]/

... as:

Right [Paragraph [Italic [Plain "[[https:"],Italic [Plain "orgmode.org"],Plain "manual",Italic [Plain "Link-format.html][The Org Manual: Link format]]"]]]

This should be easy to fix since formatting markup is only treated as such if the beginning sentinel character is preceded by whitespace and followed by a non-whitespace character.
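As a minimal sketch of the rule described above (not the fix from the unpushed branch), the opening check can be written as a pure predicate. The name `opensMarkup` and its argument shape are hypothetical, and treating the start of input as a boundary is an assumption that the rule as stated does not spell out.

```haskell
import Data.Char (isSpace)

-- | Hypothetical helper, not part of orgmode-parse.
-- @prev@ is the character before the sentinel (Nothing at start of input);
-- @rest@ is the input starting at the sentinel itself.
opensMarkup :: Maybe Char -> String -> Bool
opensMarkup prev rest = boundaryBefore && nonSpaceAfter
  where
    -- Assumption: start of input counts as a boundary, like whitespace.
    boundaryBefore = maybe True isSpace prev
    -- The character right after the sentinel must exist and be non-whitespace.
    nonSpaceAfter = case rest of
      (_sentinel : next : _) -> not (isSpace next)
      _                      -> False
```

Under this check the leading `/` of `/http://someurl.com//` may open a span, while the `/` characters after `http:` may not, because they are preceded by `:` or `/` rather than whitespace.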

CC: @zhujinxuan (only CC'ing to let you know; I already have a branch with a fix in place, but I haven't pushed it yet because I want to add more thorough tests).

Hi, I think we should write the Hyperlink parser like the LaTeX parser: the elements inside the [] should be treated as Text rather than as Markup Text.

@zhujinxuan I agree with you. However, this is still a problem with the markup parser, as it treats some text as "marked up" that org-mode's fontification does not. I have the latter fixed and I will also implement your suggestion.

@ixmatus I think we can guard that by typing. If we define

data Markup a = LaTeX Text

Then we will not need to worry about whether the content of LaTeX is parsed as markup.
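To illustrate the typing argument, here is a small sketch with a hypothetical, simplified `Inline` type. It is not the library's actual definition, and the `HyperLink` constructor shape is an assumption. Constructors that hold raw `Text` have no nested markup, so their content cannot end up parsed as Italic or Bold.

```haskell
import Data.Text (Text)
import qualified Data.Text as T

-- | Hypothetical, simplified inline type; not the library's actual definition.
data Inline
  = Plain Text
  | Italic [Inline]
  | Bold [Inline]
  | LaTeX Text                    -- raw content, stored as-is
  | HyperLink Text (Maybe Text)   -- raw target and optional raw description
  deriving (Eq, Show)

-- | Nested markup directly below a node. LaTeX and HyperLink carry only Text,
-- so there is nothing beneath them for a markup parser to have produced.
children :: Inline -> [Inline]
children (Italic xs) = xs
children (Bold xs)   = xs
children _           = []

-- | One possible representation of the link from the report under this type.
example :: Inline
example =
  Italic
    [ HyperLink
        (T.pack "https://orgmode.org/manual/Link-format.html")
        (Just (T.pack "The Org Manual: Link format"))
    ]
```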

I don't think that's a problem. I mean that:

/http://someurl.com//

... is parsed incorrectly. Disregarding org-mode hyperlink markup syntax, we expect /http://someurl.com// to parse to an Italic [ Plain "http://someurl.com/" ]; however, it parses into the following:

Paragraph
[ Italic [ Plain "http:"]
, Italic [Plain "someurl.com"]
, Plain "/"
]

As an example, some of the tests demonstrate incorrect behavior too, for instance:

*text *

... should not parse as Bold [ Plain "text" ] but as Plain "*text *".
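The two examples above can be checked with a rough sketch of a span reader. It assumes a closing rule that mirrors the opening one (the closing marker is preceded by non-whitespace and followed by whitespace or end of input), which is an inference from these examples rather than something stated in the thread. The name `emphasis` is hypothetical, and the caller is assumed to have already checked the character before the opening marker.

```haskell
import Data.Char (isSpace)

-- | Hypothetical sketch: read one emphasized span that starts at the head of
-- the input. Returns the span content and the remaining input, or Nothing
-- when the text should stay plain.
emphasis :: Char -> String -> Maybe (String, String)
emphasis marker (m : body@(c : _))
  | m == marker && not (isSpace c) = go [] body
  where
    -- @seen@ holds the content consumed so far, most recent character first.
    go seen@(p : _) (x : rest)
      | x == marker && not (isSpace p) && closesAt rest = Just (reverse seen, rest)
    go seen (x : rest) = go (x : seen) rest
    go _ [] = Nothing

    -- Assumed closing rule: end of input or whitespace must follow the marker.
    closesAt []      = True
    closesAt (n : _) = isSpace n
emphasis _ _ = Nothing
```

With these assumptions, `emphasis '/' "/http://someurl.com//"` yields `Just ("http://someurl.com/", "")`, while `emphasis '*' "*text *"` and `emphasis '*' "* text *"` both yield `Nothing`.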

@ixmatus Do you have a reference document for org-mode markup syntax? It seems many corner cases are not documented in https://orgmode.org/manual/Markup.html

@ixmatus I agree. I tested this in Emacs org-mode. I am wondering whether we should consider * test * as marked up?
[screenshot 2018-11-26 16 59 16]

No, I don't think we should. I think we should follow org-mode's fontification behavior and treat * text * as plain text (that is what my stashed change does now).

I'm finding lots of corner cases (by adding tests) that we didn't account for in the markup parser that I need to resolve before I can push up my work.

@ixmatus Can you open up a PR with some of the tests?

I will when I clean up some of my experiments :) I will probably get to it throughout the week, no stress!

I haven't had much free time to finish this up, but I have some coming up over the holidays and I will be working on this then.

@ixmatus Hi, have you been working on this recently? If not, I will start on the fix next Saturday (Mar 30th).

@zhujinxuan my real life and job have taken an intense turn. I won't get to this until late May now, so if you're able to take it on, that would be great.

Thank you.