&amp;, &lt; and &gt;
jean-gui opened this issue · 4 comments
Is there a reason why &amp;, &lt; and &gt; are not converted to &, < and >?
use League\HTMLToMarkdown\HtmlConverter;
$converter = new HtmlConverter();
$html = "&amp; &lt; &gt; &quot; '";
$markdown = $converter->convert($html);
echo $markdown;
Expected result:
& < >
Actual result:
&amp; &lt; &gt; " '
It's to ensure that converting the resulting Markdown back into HTML gives consistent results. Take the following HTML for example:
<p>&gt; test</p>
<p>{</p>
<p>&lt;pre&gt; test &lt;/pre&gt;</p>
If we didn't encode them, you'd end up with this Markdown:
> test
{
<pre> test </pre>
Which, if converted back into HTML, would result in:
<blockquote><p>test</p></blockquote>
{
<pre> test </pre>
Which does not match the original HTML.
Where possible, this library tries to produce Markdown which, if run through league/commonmark, would convert back to HTML that is as close to the original input as possible.
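To make the escaping step concrete, here is a minimal sketch that runs PHP's built-in htmlspecialchars() over the text content of the three example paragraphs. The ENT_NOQUOTES flag is an assumption chosen to match the output reported in this issue, where quotes come through unescaped; it is not taken from the library's source.

```php
<?php
// Text content of each example paragraph, after the HTML parser has decoded entities.
$texts = ['> test', '{', '<pre> test </pre>'];

foreach ($texts as $text) {
    // Assumption: ENT_NOQUOTES mirrors the observed behaviour (quotes left alone).
    echo htmlspecialchars($text, ENT_NOQUOTES, 'UTF-8'), PHP_EOL;
}
// Prints:
// &gt; test
// {
// &lt;pre&gt; test &lt;/pre&gt;
```

With the leading > and the <pre> tags escaped, a Markdown parser treats them as literal text rather than as a blockquote or raw HTML.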
Thanks for your response, I understand the rationale.
Is there a way to change that behavior through config options? I'm using it with Symfony mailer to produce the text version of emails (see https://symfony.com/doc/4.4/mailer.html#text-content), so converting back to HTML is not useful in this specific use case. If not, I guess there should be a way for me to str_replace those entities.
There's no built-in config option for that, but since we convert them using htmlspecialchars(), it shouldn't be too hard to write a little bit of code to convert those back once you get the Markdown back from this library.
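One possible way to do that, as a sketch under assumptions: PHP's built-in htmlspecialchars_decode() reverses htmlspecialchars(). The helper name decodeMarkdownEntities is hypothetical, and ENT_NOQUOTES is chosen because the output reported above leaves quotes unescaped. The sample string stands in for the value returned by $converter->convert($html).

```php
<?php
// Hypothetical helper (not part of league/html-to-markdown): undo the
// &amp; / &lt; / &gt; escaping for plain-text use, e.g. the text part of an email.
function decodeMarkdownEntities(string $markdown): string
{
    // ENT_NOQUOTES decodes only &amp;, &lt; and &gt;; quote entities are left alone.
    return htmlspecialchars_decode($markdown, ENT_NOQUOTES);
}

// Stand-in for the string returned by $converter->convert($html).
$markdown = '&amp; &lt; &gt; " \'';

echo decodeMarkdownEntities($markdown), PHP_EOL; // prints: & < > " '
```

The decoded text is no longer safe to run back through a Markdown parser, which should not matter when it is only used as the plain-text body of an email.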
OK, thanks for the info. I'm closing this issue.