jugglerchris/rust-html2text

Feature request: Comply with CommonMark specification


Your library looks very promising. Unfortunately I can not use it because:

  1. html2text's output is not yet sufficiently CommonMark compliant, and
  2. the HTML's metadata is not converted into a YAML metadata block (see pandoc's yaml_metadata_block).
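
To illustrate point 2, here is a minimal sketch of what such a conversion could look like for the page title alone. Both `extract_title` and `yaml_metadata_block` are hypothetical helpers written for this example; they are not part of html2text or pandoc. The title lookup is a naive string search rather than a real HTML parser, and it does not decode entities such as `&amp;`.

```rust
/// Hypothetical helper (not part of html2text): returns the text of the
/// first <title> element, found with a naive case-insensitive search.
fn extract_title(html: &str) -> Option<String> {
    // ASCII lowercasing keeps byte offsets identical to the original string.
    let lower = html.to_ascii_lowercase();
    let open = lower.find("<title")?;
    // Skip past the end of the opening tag, including any attributes.
    let start = open + lower[open..].find('>')? + 1;
    let end = start + lower[start..].find("</title")?;
    Some(html[start..end].trim().to_string())
}

/// Hypothetical helper: renders a title as a YAML metadata block,
/// delimited by `---` lines as in pandoc's yaml_metadata_block extension.
fn yaml_metadata_block(title: &str) -> String {
    // Double-quote the value and escape backslashes and quotes so that
    // characters such as ':' in the title do not break the YAML.
    let escaped = title.replace('\\', "\\\\").replace('"', "\\\"");
    format!("---\ntitle: \"{}\"\n---\n", escaped)
}
```

The converted document body would follow the block. A real implementation would presumably take the title and other metadata from the parsed DOM instead of searching the raw string.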

Let me explain my use case in more detail. I tested whether I could replace this command in my toolchain:

pandoc --standalone -f html -t markdown_strict+yaml_metadata_block+pipe_tables

with:

html2text

This would allow me to do this:

curl $(xclip -o) | html2text | tp-note

and even integrate your library into tp-note. Then the above would look like this:

curl $(xclip -o) | tp-note

Tp-Note comes with a document viewer that renders the content with pulldown-cmark, which is compliant with the CommonMark specification.

As CommonMark is the de facto standard specification for Markdown, making html2text's output compatible with it would open up a wider range of use cases (mine included).
Another advantage: CommonMark has a validation test suite.

What do you think?

Hi,

As you've noticed, the text output is currently inspired by Markdown rather than trying to be actually valid Markdown.

However, I do like the idea of a full Markdown output mode, and CommonMark would seem to be a good choice. (I don't think I'd want Markdown to be the default for the executable, but it'd be fine to add an option html2text --md or even a separate example program html2md.)