matthewwithanm/python-markdownify

The strip_document option works on BeautifulSoup objects but not Tag objects

The default value of the strip_document option is STRIP, which strips leading and trailing whitespace from the Markdown.

If I have a BeautifulSoup object:

import bs4
import markdownify

html = """
<html>
 <body>
  <div>
   <p>hello</p>
  </div>
 </body>
</html>"""
soup = bs4.BeautifulSoup(html, "lxml")

strip_document is applied for the BeautifulSoup object:

print(repr(markdownify.MarkdownConverter().convert_soup(soup)))
# 'hello'

but not for a Tag object, where the leading and trailing newlines are kept:

print(repr(markdownify.MarkdownConverter().convert_soup(soup.find("html"))))
# '\n\n\nhello\n\n'
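
Until this is addressed, one possible workaround is to strip the converted string by hand. This is only a minimal sketch, assuming that removing leading and trailing newlines is the desired result for a Tag; `strip_document_newlines` is a hypothetical helper of my own, not part of markdownify:

```python
def strip_document_newlines(markdown: str) -> str:
    """Remove leading and trailing newlines from a converted Markdown string.

    Hypothetical helper (not a markdownify API): it post-processes the
    string returned by the converter when the input was a Tag.
    """
    return markdown.strip("\n")


# The string reported above for the Tag object
tag_output = "\n\n\nhello\n\n"
print(repr(strip_document_newlines(tag_output)))
# 'hello'
```

In practice the helper would wrap the return value of convert_soup(soup.find("html")), which is cheap but has to be repeated at every call site.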