The strip_document option works on BeautifulSoup objects but not Tag objects
chrispy-snps commented
The default value of the strip_document option is STRIP, which strips leading and trailing whitespace from the Markdown.
If I have a BeautifulSoup object:
import bs4
import markdownify
html = """
<html>
<body>
<div>
<p>hello</p>
</div>
</body>
</html>"""
soup = bs4.BeautifulSoup(html, "lxml")

strip_document is applied for the BeautifulSoup object:
print(repr(markdownify.MarkdownConverter().convert_soup(soup)))
# 'hello'

but not for a Tag object:
print(repr(markdownify.MarkdownConverter().convert_soup(soup.find("html"))))
# '\n\n\nhello\n\n'
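
Until this is fixed, one possible workaround is to strip the Markdown returned for a Tag object by hand. The sketch below is only an illustration: strip_markdown is a hypothetical helper, not part of markdownify, and its mode strings ("strip", "lstrip", "rstrip", None) are assumptions made for this example. It runs on the string reported above, so it does not need markdownify installed.

```python
from typing import Optional


def strip_markdown(text: str, mode: Optional[str] = "strip") -> str:
    """Strip leading and/or trailing whitespace from converted Markdown.

    Hypothetical helper for this sketch; the mode names are assumptions,
    not markdownify's documented constants.
    """
    if mode == "strip":
        return text.strip()    # remove leading and trailing whitespace
    if mode == "lstrip":
        return text.lstrip()   # remove leading whitespace only
    if mode == "rstrip":
        return text.rstrip()   # remove trailing whitespace only
    if mode is None:
        return text            # leave the text untouched
    raise ValueError(f"unknown strip mode: {mode!r}")


# The string below is the Tag output reported above.
tag_markdown = "\n\n\nhello\n\n"
print(repr(strip_markdown(tag_markdown)))
# 'hello'
```

In practice the result of convert_soup(soup.find("html")) would be passed to the helper in place of the literal string.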