Markup parsers should only consider markup on word boundaries
Discovered while testing the HyperLink parser. The parser will incorrectly parse the following:
/[[https://orgmode.org/manual/Link-format.html][The Org Manual: Link format]]/
... as:
Right [Paragraph [Italic [Plain "[[https:"],Italic [Plain "orgmode.org"],Plain "manual",Italic [Plain "Link-format.html][The Org Manual: Link format]]"]]]
This should be easy to fix since formatting markup is only treated as such if the beginning sentinel character is preceded by whitespace and followed by a non-whitespace character.
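A minimal sketch of the opening-sentinel rule described above, written over plain `String` rather than the project's actual parser. The names `isSentinel`, `canOpen`, and `openerPositions` are hypothetical, the sentinel set is an assumption, and treating the start of input as a boundary is also an assumption:

```haskell
import Data.Char (isSpace)

-- | Characters treated as markup sentinels (an assumed set; the real
-- parser may recognise a different one).
isSentinel :: Char -> Bool
isSentinel c = c `elem` "*/_=~+"

-- | A sentinel may open a span only if it is preceded by whitespace
-- (or, by assumption, sits at the start of input) and is followed by a
-- non-whitespace character.
canOpen :: Maybe Char -> Char -> Maybe Char -> Bool
canOpen prev c next =
  isSentinel c
    && maybe True isSpace prev
    && maybe False (not . isSpace) next

-- | Indices of all characters that may open a markup span.
openerPositions :: String -> [Int]
openerPositions s =
  [ i
  | (i, (prev, c, next)) <- zip [0 ..] (zip3 prevs s nexts)
  , canOpen prev c next
  ]
  where
    prevs = Nothing : map Just s                -- character before each position
    nexts = map Just (drop 1 s) ++ [Nothing]    -- character after each position
```

Under this rule only the leading `/` of `/http://someurl.com//` qualifies as an opener, so the slashes inside the URL no longer start new italic spans.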
CC: @zhujinxuan (only CC'ing to let you know; I already have a branch with a fix in place, but I haven't pushed it yet because I want to add more thorough tests).
Hi, I think we should write the Hyperlink parser like the LaTeX parser: the elements inside the [] should be treated as Text rather than Markup Text.
@zhujinxuan I agree with you. However, this is still a problem with the markup parser, as it considers some text "marked up" that org-mode's fontification does not. I have the latter fixed and I will also implement your suggestion.
@ixmatus I think we can guard against that with types. If we define
data Markup a = LaTeX Text
then we will not need to worry about whether the content of LaTeX is parsed as markup.
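To illustrate the typing idea, here is a small sketch in which both LaTeX fragments and link contents hold raw `Text`, so the type itself rules out nested markup inside them. The type name `MarkupText`, the fields of `HyperLink`, and the helper `linkTargets` are assumptions made for illustration, not the library's actual definitions:

```haskell
import Data.Text (Text)
import qualified Data.Text as T

-- | Hypothetical inline type: markup constructors nest, while LaTeX and
-- HyperLink carry opaque Text that is never re-parsed as markup.
data MarkupText
  = Plain Text
  | Bold [MarkupText]
  | Italic [MarkupText]
  | LaTeX Text
  | HyperLink Text (Maybe Text) -- link target and optional description
  deriving (Eq, Show)

-- | Collect every link target, descending only through nesting markup.
linkTargets :: MarkupText -> [Text]
linkTargets m = case m of
  Plain _         -> []
  Bold xs         -> concatMap linkTargets xs
  Italic xs       -> concatMap linkTargets xs
  LaTeX _         -> []
  HyperLink url _ -> [url]

-- | The link from the original report, as one italic span around a link.
example :: MarkupText
example =
  Italic
    [ HyperLink
        (T.pack "https://orgmode.org/manual/Link-format.html")
        (Just (T.pack "The Org Manual: Link format"))
    ]
```

With a shape like this, the slashes in the URL can never become `Italic` nodes, because the link target is not a list of `MarkupText`.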
I don't think that's a problem. I mean that:
/http://someurl.com//
... is parsed incorrectly. Disregarding org-mode hyperlink markup syntax, we expect /http://someurl.com// to parse to Italic [ Plain "http://someurl.com/" ]; however, it parses into the following:
Paragraph
[ Italic [ Plain "http:"]
, Italic [Plain "someurl.com"]
, Plain "/"
]
Some of the existing tests demonstrate incorrect behavior too. For instance:
*text *
... should not parse as Bold [ Plain "text" ] but as Plain "*text *".
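The two expectations stated so far (the italic URL and the plain `*text *`) can be captured in a toy whole-span check. This is only a sketch under assumptions: the `Inline` type here is a stand-in using `String`, `parseSpan` is a hypothetical helper that looks at a single token rather than a full paragraph, and the requirement that the body must not end in whitespace is inferred from the `*text *` example:

```haskell
import Data.Char (isSpace)

-- | Stand-in for the parser's inline type (String instead of Text).
data Inline = Plain String | Bold [Inline] | Italic [Inline]
  deriving (Eq, Show)

-- | True when the body neither starts nor ends with whitespace.
tight :: String -> Bool
tight body = case (body, reverse body) of
  (b : _, e : _) -> not (isSpace b) && not (isSpace e)
  _              -> False

-- | Treat the whole input as one candidate span: it is markup only if
-- it starts and ends with the same sentinel and the body is tight.
parseSpan :: String -> Inline
parseSpan s = case s of
  (open : rest@(_ : _))
    | Just wrap <- lookup open wrappers
    , last rest == open
    , let body = init rest
    , tight body
    -> wrap [Plain body]
  _ -> Plain s
  where
    wrappers = [('*', Bold), ('/', Italic)]
```

Here `parseSpan "/http://someurl.com//"` yields `Italic [Plain "http://someurl.com/"]`, while `parseSpan "*text *"` stays `Plain "*text *"`.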
@ixmatus Do you have a reference document for the org-mode markup syntax? It seems many corner cases are not documented in https://orgmode.org/manual/Markup.html
@ixmatus I agree. I tested this in Emacs org-mode. I am wondering whether we should consider * test * as marked up?
No, I don't think we should. I think we should follow org-mode's fontification behavior and treat * text * as plain text (that is what my stashed change does now).
I'm finding lots of corner cases (by adding tests) that we didn't account for in the markup parser that I need to resolve before I can push up my work.
@ixmatus Can you open up a PR with some of the tests?
I will when I clean up some of my experiments :) I will probably get to it throughout the week, no stress!
I haven't had much free time to finish this up, but I do have free time coming up for the holidays and I will be working on this then.
@ixmatus Hi, have you been working on this recently? If not, I will begin the fix next Saturday (Mar 30th).
@zhujinxuan my real life and job have taken an intense turn. I won't get to this until late May now, so if you're able to pick it up, that would be great.
Thank you.