baynezy/Html2Markdown

Span tags with styling unmodified

Closed this issue · 3 comments

Issue

Span tags in the HTML pass through the converter unmodified.

Expected

The span, or at least its "style" or "class" attribute, is removed.

Actual

The span is left unchanged.

Original

```html
<p>this is <span style="color:red;">red text</span></p>
```

Expected Conversion

```html
<p>this is <span>red text</span></p>
```

@EltonInAtlanta - thanks for your interest in this project. I understand you want Html2Markdown to sanitise the span tag by removing any style attributes, or to remove the span altogether. However, span has no Markdown equivalent, so there is nothing to convert it to. Also, as stated in the spec:

Span-level HTML tags — e.g. <span>, <cite>, or <del> — can be used anywhere in a Markdown paragraph, list item, or header. If you want, you can even use HTML tags instead of Markdown formatting; e.g. if you’d prefer to use HTML <a> or <img> tags instead of Markdown’s link or image syntax, go right ahead.

So I don't see how removing or modifying spans is in keeping with converting HTML to Markdown. Can you explain your thinking?

@baynezy Thanks for responding. I'm coming from a different and perhaps invalid mindset. I'm converting existing HTML from a user-entered dialog to Markdown. The old client was an old ASP.NET site that handled the HTML. The new client is Angular, and it's not so happy with injected HTML. The existing data is very span-happy (bold! red! huge!). I was perhaps looking more for a Markdown sanitizer: a mode that would emit only plain text and Markdown syntax. I cloned the project, and it should be easy to extend it for what I need to do in this limited case. Thanks.

@EltonInAtlanta if you want to do this then the easiest way would be to implement IScheme and extend Markdown with your custom HTML handling.

Then you can instantiate Converter like this:

```csharp
IScheme customScheme = new CustomMarkdownScheme();
Converter converter = new Converter(customScheme);
```

You can read more about it in the README.
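For the span case discussed in this thread, the custom HTML handling could be as small as a regular-expression replacement. The following is only a minimal sketch under stated assumptions: `SpanStripper` and `StripSpans` are hypothetical names and are not part of Html2Markdown, and the sketch deliberately does not show how to wire the helper into an `IScheme` implementation, since that depends on the interface described in the README. It also assumes that no attribute value contains a `>` character; a regular expression is not a full HTML parser.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Html2Markdown.
public static class SpanStripper
{
    // Matches an opening <span ...> tag (with any attributes) or a closing </span> tag.
    // The \b keeps other tag names that merely start with "span" from matching.
    private static readonly Regex SpanTag =
        new Regex(@"</?span\b[^>]*>", RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Removes the span tags themselves but keeps the text between them.
    public static string StripSpans(string html)
    {
        if (html == null) throw new ArgumentNullException(nameof(html));
        return SpanTag.Replace(html, string.Empty);
    }
}
```

Applied to the example from the issue, `SpanStripper.StripSpans("<p>this is <span style=\"color:red;\">red text</span></p>")` returns `<p>this is red text</p>`. One option is to call it on the HTML before handing it to the converter; another is to call it from whatever custom replacement logic the custom scheme ends up containing.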