snarfed/bridgy

github publish: some markup conversion failures

sknebel opened this issue · 9 comments

I just used a fairly complex GH issue (that I had written on GH) to test POSSE-ing to a test repo:
https://www.svenknebel.de/posts/2018/11/8/ to sknebel/random-test-repo#1

The HTML in my post is a cleaned-up version of GitHub's HTML (with mentions of users and other issues removed to cut down the noise).

The following things in the created issue were unexpected:

  • the nested lists didn't convert correctly
  • < > were added around bare links
  • in the "output format" section, a space was added before the italicized "not"
  • the JSON code example was cut off. I had forgotten to escape a < in it; the browser still displayed the code that followed (probably because <------ clearly isn't a valid HTML tag), but it's understandable that bridgy (or maybe even mf2py?) choked on it. I have since edited the post to use &lt;.

EDIT: feel free to move this issue to granary or ask me to split it up or ... - happy to help you as much as I can.

hey, thanks for the report! yeah, converting HTML to markdown will often be imperfect, and mostly at the mercy of html2text, but i can take a look!

Just tested: lxml and mf2py handle the <----------- correctly (and escape the < in the HTML output of the e-content), which makes it surprising that it doesn't make it through.
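For comparison, a quick check with Python's standard-library `html.parser` (which html2text's `HTML2Text` class subclasses) suggests it also treats `<------` as literal text rather than the start of a tag. This is a minimal sketch: the sample markup is made up, and it says nothing about where in the bridgy pipeline the snippet actually got cut.

```python
from html.parser import HTMLParser


class TagAndTextCollector(HTMLParser):
    """Records start tags and text data so we can see how '<' is tokenized."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        # The parser may deliver text in several pieces; collect them all.
        self.chunks.append(data)


# Hypothetical sample resembling the post's JSON example with an unescaped '<'.
sample = '<pre><code>{"type": ["h-entry"]}  <------ unescaped</code></pre>'

parser = TagAndTextCollector()
parser.feed(sample)
parser.close()

text = "".join(parser.chunks)
print(parser.tags)  # only the real tags, pre and code
print(text)         # the '<------' survives as plain text
```

A `<` followed by something other than a letter, `/`, `!` or `?` is passed through as data here, which matches how browsers recover from the same mistake.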

fixed the < and > around linked URLs.
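One way to get the same effect, sketched as a post-processing step and not necessarily what bridgy actually changed: html2text wraps bare links (anchors whose text is the URL itself) in angle brackets, i.e. CommonMark autolinks such as `<https://...>`. GitHub Flavored Markdown's autolink extension already links bare `http://` and `https://` URLs, so the brackets can simply be dropped. The helper name `unwrap_autolinks` is hypothetical, and the regex doesn't skip code spans or fenced blocks.

```python
import re

# Matches a CommonMark autolink for an http(s) URL, e.g. "<https://example.com/x>".
# Autolinks can't contain whitespace or angle brackets, so the URL part excludes them.
_AUTOLINK = re.compile(r"<(https?://[^\s<>]+)>")


def unwrap_autolinks(markdown: str) -> str:
    """Hypothetical helper: turn <https://...> autolinks into bare URLs.

    GitHub links bare http(s) URLs on its own, so the brackets are redundant
    there. Caveat: this naive version also rewrites matches inside code spans
    and fenced code blocks.
    """
    return _AUTOLINK.sub(r"\1", markdown)


if __name__ == "__main__":
    md = "See <https://www.svenknebel.de/posts/2018/11/8/> and [docs](https://example.com)."
    print(unwrap_autolinks(md))
```

Regular `[text](url)` links and non-URL angle-bracket content such as `<div>` are left alone.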

the nested lists and the space before the italicized "not" are afaict bugs in html2text. i may narrow them down and file issues; we'll see.

It seems the nested list is a case where the original Markdown implementation, and those based on it, accept html2text's output (the Markdown documentation doesn't appear to describe nested lists at all), but CommonMark, on which GitHub's Markdown support is based, specifies nesting explicitly in a way that requires deeper indentation. Its specification has a section on this history: https://spec.commonmark.org/0.28/#motivation
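To make the indentation rule concrete: in CommonMark, a nested list has to start at the parent item's content column, i.e. the parent's own indentation plus the width of its marker and the space after it. That is 2 columns under `* item` but 3 under `1. item` (and 4 under `10. item`), so a sublist indented by only 2 spaces under an ordered item is not attached to it. The sketch below renders a small list tree that way; the `ListNode`/`Item` types and `render` function are hypothetical, not html2text's API or the actual change in Alir3z4/html2text#345.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Item:
    text: str
    sublist: ListNode | None = None


@dataclass
class ListNode:
    ordered: bool
    items: list[Item] = field(default_factory=list)


def render(lst: ListNode, indent: int = 0) -> list[str]:
    """Render a (hypothetical) list tree as CommonMark-compatible Markdown lines."""
    lines: list[str] = []
    for number, item in enumerate(lst.items, start=1):
        marker = f"{number}." if lst.ordered else "*"
        prefix = marker + " "
        lines.append(" " * indent + prefix + item.text)
        if item.sublist is not None:
            # A child block must begin at this item's content column:
            # the current indent plus the width of the marker and its space.
            lines.extend(render(item.sublist, indent + len(prefix)))
    return lines


doc = ListNode(ordered=True, items=[
    Item("first", ListNode(ordered=False, items=[
        Item("nested a", ListNode(ordered=False, items=[Item("deeper")])),
        Item("nested b"),
    ])),
    Item("second"),
])
print("\n".join(render(doc)))
```

Computing the offset per item rather than per nesting level also keeps `10.` and later items correct, since their markers are one column wider.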

Looks like the space before __not__ is this html2text bug: Alir3z4/html2text#324

I've filed Alir3z4/html2text#344 for the list bug, and a PR that fixes it in Alir3z4/html2text#345.

fixed! the first three at least, if not the <------ one. thanks for your patience!