baynezy/Html2Markdown

Escaped brackets &lt; and &gt; in the HTML source are converted to < and > in the output HTML.

Closed this issue · 4 comments

Escaped brackets &lt; and &gt; in the HTML source are converted to < and > in the output HTML.

@msander1983 to clarify. You are saying that if you have &lt; in your HTML then you get a < in the resultant Markdown?

@baynezy Yes, exactly!

@msander1983 - so this is intentional and was implemented in #25

I can completely understand that there are use cases where keeping the entities escaped would be required. This is entirely possible with the current implementation by creating your own implementation of IScheme, which is documented in the README. However, I acknowledge that this would require duplicating the Html2Markdown.Scheme.Markdown class almost entirely. To this end I have implemented #107, which makes this very straightforward.

I will be publishing the new NuGet package, 3.4.0, shortly. Let me know if you have any issues.

@msander1983 - I meant to give you an example, sorry:-

If you create the following:-

public class CustomMarkdownScheme : AbstractScheme
{
    public CustomMarkdownScheme()
    {
        // Note: no entity-decoding replacement group is registered here,
        // so escaped entities such as &lt; are expected to be left as-is.
        AddReplacementGroup(_replacers, new TextFormattingReplacementGroup());
        AddReplacementGroup(_replacers, new HeadingReplacementGroup());
        AddReplacementGroup(_replacers, new IllegalHtmlReplacementGroup());
        AddReplacementGroup(_replacers, new LayoutReplacementGroup());
    }
}

Then you can work like this:-

var scheme = new CustomMarkdownScheme();
var html = "Something to <strong>convert</strong> &lt;";
var converter = new Converter(scheme);
var markdown = converter.Convert(html);