matthewwithanm/python-markdownify

[bug] Markdownify fails to parse `<img>` tags inside `<h5>` elements correctly

Closed · 1 comment

When using Markdownify to convert HTML to Markdown, an `<img>` tag inside an `<h5>` element is not converted to a Markdown image link. The `<h5>` is rendered as a heading, but the image is replaced by its alt text and the `src` is dropped.

Reproduction:

from markdownify import markdownify as md

# Example 1: `<img>` inside `<h5>` element
html_string_with_h5 = """
<h5><img src="https://sample_image" alt="Example Image"></h5>
"""
markdown_with_h5 = md(html_string_with_h5)
print(markdown_with_h5)

Output:

##### Example Image

Expected Output:

##### ![Example Image](https://sample_image)

However, if the <img> tag is outside of the <h5> element, Markdownify successfully converts it to a Markdown image link:

html_string_without_h5 = """
<img src="https://sample_image" alt="Example Image">
"""
markdown_without_h5 = md(html_string_without_h5)
print(markdown_without_h5)

Output:

![Example Image](https://sample_image)

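Hey! This was fixed in #61 by adding the keep_inline_images_in parameter. By default, images inside headings are converted to their alt text; to keep them as Markdown images, pass the list of parent tags that may contain inline images: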

markdown_with_h5 = md(html_string_with_h5, keep_inline_images_in=['h5'])
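For anyone wondering why the default output is only the alt text, here is a rough stdlib-only sketch of the decision the option controls. It is not markdownify's actual code: the `InlineImageSketch` class and `convert` helper are hypothetical names made up for illustration, and the toy parser only understands headings and `<img>`.

```python
from html.parser import HTMLParser


class InlineImageSketch(HTMLParser):
    """Toy converter (hypothetical, not markdownify's implementation)."""

    HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self, keep_inline_images_in=()):
        super().__init__()
        self.keep = set(keep_inline_images_in)  # parents allowed to hold images
        self.stack = []  # currently open heading tags
        self.out = []    # output fragments

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self.stack.append(tag)
            self.out.append("#" * int(tag[1]) + " ")
        elif tag == "img":
            a = dict(attrs)
            alt, src = a.get("alt") or "", a.get("src") or ""
            parent = self.stack[-1] if self.stack else None
            if parent is not None and parent not in self.keep:
                # Inside a heading that is not whitelisted: alt text only.
                self.out.append(alt)
            else:
                self.out.append(f"![{alt}]({src})")

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
            self.out.append("\n")

    def handle_data(self, data):
        # Keep heading text as-is; drop stray whitespace between elements.
        self.out.append(data if self.stack else data.strip())


def convert(html, keep_inline_images_in=()):
    parser = InlineImageSketch(keep_inline_images_in)
    parser.feed(html)
    parser.close()
    return "".join(parser.out).strip()


html = '<h5><img src="https://sample_image" alt="Example Image"></h5>'
print(convert(html))                                # alt text only
print(convert(html, keep_inline_images_in=["h5"]))  # Markdown image kept
```

In this sketch the image falls back to alt text whenever its enclosing heading is not listed in `keep_inline_images_in`, which mirrors the behaviour reported above; the real library may differ in whitespace and in which parent tags it treats as inline contexts.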