JohannesKaufmann/html-to-markdown

Configure elements to keep in `<code>`

zrcoder opened this issue · 1 comment

Describe the bug
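With `Keep("sup")` enabled, a `<sup>` element in normal text is preserved, but a `<sup>` element inside `<code>` is stripped from the generated Markdown.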

HTML Input

<p>The ordinal number "fifth" can be abbreviated in various languages as follows:</p>
<ul>
	<li><code>English: 5<sup>th</sup></code></li>
	<li>French: 5<sup>ème</sup></li>
</ul>

Generated Markdown

The ordinal number "fifth" can be abbreviated in various languages as follows:

- `English: 5th`
- French: 5<sup>ème</sup>

Expected Markdown

The ordinal number "fifth" can be abbreviated in various languages as follows:

- `English: 5<sup>th</sup>`
- French: 5<sup>ème</sup>

Additional context
I use `NewConverter("", true, nil).Keep("sup")` to convert.

Here is how the HTML is displayed in the browser:

[Screenshot 2022-12-30 at 13 02 42: browser rendering of the HTML input]

The Markdown representation `English: 5th` renders as English: 5th,
while the Markdown representation `English: 5<sup>th</sup>` renders with the tags as literal text, English: 5<sup>th</sup>, because the content of a code span is not interpreted as HTML.

As such, the current implementation is closest to what the browser would display.
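To illustrate the reasoning above, here is a minimal sketch of what a Markdown renderer does with a code span. `renderCodeSpan` is a hypothetical helper, not part of html-to-markdown; it only assumes the CommonMark behavior that code span content is treated as literal text and HTML-escaped on output.

```go
package main

import (
	"fmt"
	"html"
)

// renderCodeSpan is a hypothetical helper that mimics how a Markdown renderer
// emits an inline code span: the content is literal text, so HTML special
// characters are escaped rather than interpreted as tags.
func renderCodeSpan(content string) string {
	return "<code>" + html.EscapeString(content) + "</code>"
}

func main() {
	// What the converter currently generates: displays as "English: 5th".
	fmt.Println(renderCodeSpan("English: 5th"))

	// What the issue expects: the tags are escaped, so the reader sees
	// the literal characters "<sup>" and "</sup>" instead of a superscript.
	fmt.Println(renderCodeSpan("English: 5<sup>th</sup>"))
}
```

The second call produces `&lt;sup&gt;` in the rendered HTML, so neither Markdown form gives an actual superscript inside the code span.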


The `Keep()` function applies to normal HTML elements, not to elements inside a code block. For example, it keeps a `<sup>` element inside a paragraph.

If many people need it, one could add something like a `KeepInsideCode` function, but that is not planned for now.


@zrcoder If you want behavior that differs from the current implementation, you can write your own rule. Let me know if you encounter any problems with this...