matthewwithanm/python-markdownify

The unexpected line break occurred during the process of handling HTML

Closed this issue · 6 comments

When I was trying to convert a web page to Markdown, the original HTML <p> tags did not contain any line breaks. The expected behavior was for it to be output as a single Markdown paragraph, but when markdownify processed this paragraph, it inserted extra line breaks.

This is my code:

(Code has been removed.)

At the end of the output file, 1.md, you can observe the unexpected line break behavior.

I tried using a custom converter:

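from markdownify import MarkdownConverter

class CustomMarkdownConverter(MarkdownConverter):
    def convert_p(self, el, text, convert_as_inline):
        # Debugging hook: a convenient place for a breakpoint on the affected paragraph
        if "People" in text:
            pass
        return f"{text}\n\n"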

But the output still has extra line breaks.

@Randark-JMT - can you please simplify this to a minimum-but-sufficient set of 2-3 commands that reproduces the behavior? Please take a look at other issues to see examples:

https://github.com/matthewwithanm/python-markdownify/issues?q=is%3Aissue

Sorry, my bad. Give me a few minutes to simplify this.

After debugging, I think this may not be a bug:

from markdownify import markdownify

content = """
<p>
       People often wonder how others become hackers (security consultants) or defenders (security analysts fighting
        cybercrime), and the answer is simple. Break it down, learn an area of cyber security you're interested in, and
        regularly practice using hands-on exercises. Build a habit of learning a little bit each day on TryHackMe, and
        you'll acquire the knowledge to get your first job in the industry.
      </p>"""

print(markdownify(content))

Now I want to know how to convert the <p> HTML content to Markdown without the extra \n

@Randark-JMT - the extra newlines are fixed by pull request #181, which will be available in the next release of Markdownify.
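Until a release with that fix is installed, one possible workaround is to collapse whitespace in the HTML before conversion. The sketch below is not part of markdownify; `collapse_whitespace` is a hypothetical helper, and it assumes the input has no `<pre>` or other whitespace-sensitive elements, since it flattens every run of whitespace, newlines included.

```python
import re


def collapse_whitespace(html: str) -> str:
    """Replace each run of whitespace (spaces, tabs, newlines) with one space.

    Hypothetical helper, not a markdownify API. Not safe for <pre> blocks.
    """
    return re.sub(r"\s+", " ", html).strip()


# Shortened version of the paragraph from the report above.
content = """
<p>
       People often wonder
        how others become hackers.
      </p>"""

flattened = collapse_whitespace(content)
print(flattened)
```

The flattened string can then be passed to `markdownify(flattened)` in place of the original `content`. How the single spaces left just inside the `<p>` tags are handled may depend on the installed markdownify version.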

Great, thank you for your pull request. I will wait for the next release and try it.