The unexpected line break occurred during the process of handling HTML
When I was trying to convert a web page to Markdown, the original HTML <p> tags did not contain any line breaks. The expected behavior was for it to be output as a single Markdown paragraph, but when markdownify processed this paragraph, it inserted extra line breaks.
This is my code:
(Code has been removed.)
At the end of the output file, 1.md, you can observe the unexpected line break behavior.
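I tried using a custom converter:
from markdownify import MarkdownConverter

class CustomMarkdownConverter(MarkdownConverter):
    def convert_p(self, el, text, convert_as_inline):
        # Debugging hook: a convenient place to set a breakpoint on the affected paragraph
        if "People" in text:
            pass
        return f"{text}\n\n"
But the output still has extra line breaks.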
@Randark-JMT - can you please simplify this to a minimum-but-sufficient set of 2-3 commands that reproduces the behavior? Please take a look at other issues to see examples:
https://github.com/matthewwithanm/python-markdownify/issues?q=is%3Aissue
Sorry, my bad. Just give me a few minutes to simplify this.
After debugging, I think maybe it's not a bug.
from markdownify import markdownify
content = """
<p>
People often wonder how others become hackers (security consultants) or defenders (security analysts fighting
cybercrime), and the answer is simple. Break it down, learn an area of cyber security you're interested in, and
regularly practice using hands-on exercises. Build a habit of learning a little bit each day on TryHackMe, and
you'll acquire the knowledge to get your first job in the industry.
</p>"""
print(markdownify(content))
Now I want to know how to convert the <p> HTML content to Markdown without the extra \n characters.
@Randark-JMT - the extra newlines are fixed by pull request #181, which will be available in the next release of Markdownify.
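Until that release is available, one possible workaround is to flatten the whitespace in the HTML before handing it to markdownify. The sketch below is only an illustration, not what the pull request does: `collapse_whitespace` is a hypothetical helper name, and it assumes the input contains no whitespace-sensitive elements such as <pre>, whose contents it would also flatten.

```python
import re


def collapse_whitespace(html_fragment: str) -> str:
    """Replace every run of whitespace (spaces, tabs, newlines) with one space.

    Hypothetical helper for illustration only; it is not part of markdownify.
    """
    return re.sub(r"\s+", " ", html_fragment).strip()


content = """
<p>
    People often wonder how others become hackers (security consultants) or
    defenders (security analysts fighting cybercrime), and the answer is simple.
</p>"""

# The paragraph now sits on a single line, so no source newline
# can leak into the converted Markdown.
flattened = collapse_whitespace(content)
print(flattened)

# Assumed usage with the library call from the snippet above:
#   from markdownify import markdownify
#   print(markdownify(flattened))
```

This trades precision for simplicity: it treats the whole document as inline text, so it is only reasonable for pages made of ordinary paragraphs.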
So cool, thank you for your pull request. I will wait for the next release and try it then.