Should not use markdown-escaping inside of HTML-syntax
EvitanRelta opened this issue · 2 comments
The problem
Currently, this HTML:
<p align="center">
<tag>
</p>
converts to:
<p align="center">
\<tag>
</p>
which incorrectly uses markdown's backslash-escaping, instead of HTML's &lt; escaping.
Edge cases
Most of the time, while inside HTML tags, markdown-syntax (including backslash escaping) doesn't work.
However, there are times when it does, specifically in tags which are:
- In-line (eg. text-formatting tags like <em>/<code> & <span>)
- On a single line in the markdown
For example, these markdown-syntax containing tags render properly:
<code>\<tag> \ **Bold**</code>
<sup>\<tag> \ **Bold**</sup>
<span>\<tag> \ **Bold**</span>
Rendered as:
<tag> Bold
<tag> Bold
<tag> Bold
But when they are broken up across multiple lines, the markdown-syntax stops working:
<code>
\<tag> \ **Bold**
</code>
<sup>
\<tag> \ **Bold**
</sup>
<span>
\<tag> \ **Bold**
</span>
Rendered as:
\ \ **Bold**
\ \ **Bold**
\ \ **Bold**
To keep the output as readable as possible, with as little HTML as possible, I propose to escape a character based on what's around it,
instead of always escaping the usual 5 characters for HTML (ie. &, <, >, ", '), or even the 3 main characters (ie. &, <, >).
For example:
<p forcehtml>&lt;div&gt;</p>
<p forcehtml>I <3 Justin Bieber</p>
<p forcehtml>Cookies & cream</p>
<p forcehtml>Empty ampersand escape: &;</p>
would be converted to:
<p>&lt;div></p>
<p>I <3 Justin Bieber</p>
<p>Cookies & cream</p>
<p>Empty ampersand escape: &;</p>
Which properly renders as:
<div>
I <3 Justin Bieber
Cookies & cream
Empty ampersand escape: &;
Update:
Turns out there are more rules than I thought on which characters must be escaped.
For example, this:
<p>"&#xA": &#xA;</p>
<p>"&#3": &#3;</p>
<p>"&lt;/>": </></p>
<p>"&lt;?>": <?></p>
<p>"&lt;!>": <!></p>
Renders in GitHub as:
"
":
"": �
"</>":
"<?>":
"<!>":
I've settled on these 2 regexes:
/&(?=#[0-9]|#x\w|\w)/g
which escapes to: "&amp;"
/<(?=[!?/a-z])/gi
which escapes to: "&lt;"
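As a rough sketch of how the two regexes could be applied together (the function name conservativeEscape is made up for illustration and is not part of the project), the ampersand pass runs first so that the "&lt;" inserted by the second pass is not escaped again:

```typescript
// Hypothetical helper: the name and signature are assumptions, not the project's API.
function conservativeEscape(text: string): string {
  return text
    // Escape "&" only where it could start a character reference.
    .replace(/&(?=#[0-9]|#x\w|\w)/g, "&amp;")
    // Escape "<" only where it could open a tag, comment or processing instruction.
    .replace(/<(?=[!?/a-z])/gi, "&lt;");
}

console.log(conservativeEscape("<div>"));              // &lt;div>
console.log(conservativeEscape("I <3 Justin Bieber")); // I <3 Justin Bieber
console.log(conservativeEscape("Cookies & cream"));    // Cookies & cream
console.log(conservativeEscape("&#xA"));               // &amp;#xA
```

A "<" followed by a digit or a space, and a "&" followed by a space or ";", are left alone, which matches the examples above.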
Then maybe add an option to turn off this "conservative escaping" feature, to either escape the 3 characters (ie. &, <, >) or all 5 characters (ie. &, <, >, ", ').
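One possible shape for such an option, purely as a sketch: the mode names ("conservative", "three", "five") and the function escapeHtmlText are assumptions made for this example, not existing settings.

```typescript
// Hypothetical option values: none of these names exist in the project.
type EscapeMode = "conservative" | "three" | "five";

function escapeHtmlText(text: string, mode: EscapeMode = "conservative"): string {
  if (mode === "conservative") {
    // Context-sensitive escaping using the two regexes from this issue.
    return text
      .replace(/&(?=#[0-9]|#x\w|\w)/g, "&amp;")
      .replace(/<(?=[!?/a-z])/gi, "&lt;");
  }
  // "&" is replaced first so the entities added afterwards are not re-escaped.
  let out = text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
  if (mode === "five") {
    out = out.replace(/"/g, "&quot;").replace(/'/g, "&#39;");
  }
  return out;
}

const sample = `a & "b" <c>`;
console.log(escapeHtmlText(sample));          // a & "b" &lt;c>
console.log(escapeHtmlText(sample, "three")); // a &amp; "b" &lt;c&gt;
console.log(escapeHtmlText(sample, "five"));  // a &amp; &quot;b&quot; &lt;c&gt;
```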