Link Conversion Fails with Preceding or Following Square Brackets
syntaxsurge opened this issue · 2 comments
Issue Description
I've encountered an issue with the markdown2 library where links are not properly converted to HTML when there are square brackets immediately before or after the link syntax. This issue disrupts the expected formatting and link functionality in the converted Markdown text.
Steps to Reproduce
- Use the markdown2 library to convert a Markdown string that includes a link surrounded by square brackets.
- Observe that the link is not correctly converted into an HTML anchor tag.
Sample Python Code to Reproduce the Issue
The following Python code snippet can be used to reproduce the issue with the markdown2 library:
import markdown2
# Sample Markdown string with a link surrounded by square brackets
markdown_text = '''
[before]
[Triggers of Alarm Systems](https://www.youtube.com/watch?v=aJOTlE1K90k&list=RDGMEMQ1dJ7wXfLlqCjwV0xfSNbA&index=3)
[after]
'''
# Convert Markdown to HTML
html_output = markdown2.markdown(markdown_text, extras=['tables', 'footnotes', 'markdown-in-html', 'cuddled-lists'])
# Print the HTML output
print("HTML Output:")
print(html_output)
Expected Result
The link in the Markdown string should be converted into an HTML anchor tag, producing an output similar to:
<p>[before]
<a href="https://www.youtube.com/watch?v=aJOTlE1K90k&list=RDGMEMQ1dJ7wXfLlqCjwV0xfSNbA&index=3">Triggers of Alarm Systems</a>
[after]</p>
Actual Result
The actual output keeps the Markdown link syntax without converting it into an HTML anchor tag:
<p>[before]
[Triggers of Alarm Systems](https://www.youtube.com/watch?v=aJOTlE1K90k&list=RDGMEMQ1dJ7wXfLlqCjwV0xfSNbA&index=3)
[after]</p>
This code snippet should help in replicating the issue for troubleshooting and resolving the problem.
Additional Context
This issue seems to be specific to cases where square brackets immediately surround the Markdown link syntax. Removing the surrounding square brackets results in correct HTML conversion.
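To narrow down which bracket placement triggers the problem, the variants can be compared side by side. The following is a minimal sketch, not part of markdown2: the helper names link_converted and compare_variants are hypothetical, and the check is a plain substring test for an anchor tag. It makes no claim about which variants fail on a given markdown2 version.

```python
from typing import Callable, Dict

LINK = (
    "[Triggers of Alarm Systems]"
    "(https://www.youtube.com/watch?v=aJOTlE1K90k&list=RDGMEMQ1dJ7wXfLlqCjwV0xfSNbA&index=3)"
)


def link_converted(html: str) -> bool:
    """Return True if the HTML contains an anchor tag (simple substring check)."""
    return "<a href=" in html


def compare_variants(convert: Callable[[str], str], link: str) -> Dict[str, bool]:
    """Run the converter on the link with and without bracketed neighbour lines.

    `convert` is any Markdown-to-HTML callable; the variant names are
    illustrative labels chosen for this sketch.
    """
    variants = {
        "link only": link,
        "bracket line before": "[before]\n" + link,
        "bracket line after": link + "\n[after]",
        "bracket lines on both sides": "[before]\n" + link + "\n[after]",
    }
    return {name: link_converted(convert(text)) for name, text in variants.items()}


if __name__ == "__main__":
    try:
        import markdown2
    except ImportError:
        print("markdown2 is not installed; skipping the comparison")
    else:
        # Same extras as in the reproduction snippet above.
        extras = ["tables", "footnotes", "markdown-in-html", "cuddled-lists"]
        results = compare_variants(
            lambda text: markdown2.markdown(text, extras=extras), LINK
        )
        for name, ok in results.items():
            print(f"{name}: {'converted' if ok else 'NOT converted'}")
```

Passing the converter in as a callable keeps the comparison independent of the extras in use, so the same check can be rerun with a different extras list or library version.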
Additional Context: Challenges with 'markdown-in-html' Extra and Dynamic Content
I attempted to use the markdown-in-html extra provided by markdown2 to address this issue. However, this approach is not ideal for dynamic content because it requires multiple processing steps. This method leads to two significant problems:
1. Unintended Conversions with Dynamic Content
When working with dynamic content, processing all HTML tags twice can lead to unexpected conversions. Specifically, there are instances where text that is meant to be included literally (as part of the text, not as Markdown) gets incorrectly converted into HTML. This unintended conversion distorts the intended output and complicates the handling of dynamic content.
Example of Unintended Conversion Issue
Consider the following Markdown input and its conversion process:
Markdown Input:
[before]
[Triggers of Alarm Systems](https://www.youtube.com/watch?v=aJOTlE1K90k&list=RDGMEMQ1dJ7wXfLlqCjwV0xfSNbA&index=3)
[after]
First Conversion to HTML:
<p>[before]
[Triggers of Alarm Systems](https://www.youtube.com/watch?v=aJOTlE1K90k&list=RDGMEMQ1dJ7wXfLlqCjwV0xfSNbA&index=3)
[after]</p>
Adding the markdown="1" Attribute for Markdown Processing:
<p markdown="1">[before]
[Triggers of Alarm Systems](https://www.youtube.com/watch?v=aJOTlE1K90k&list=RDGMEMQ1dJ7wXfLlqCjwV0xfSNbA&index=3)
[after]</p>
2. Inconsistent HTML Structure with Nested Paragraph Tags
The use of markdown-in-html results in an inconsistent HTML structure. Specifically, when a link is converted within a paragraph (<p> tag), it generates a new paragraph tag for the link. This creates nested paragraph tags, which is not valid HTML and can lead to display and styling issues.
Example of Nested Paragraph Tag Issue
After adding the markdown="1" attribute for Markdown processing, the output is:
Processed HTML Output:
<p>[before]
<p><a href="https://www.youtube.com/watch?v=aJOTlE1K90k&list=RDGMEMQ1dJ7wXfLlqCjwV0xfSNbA&index=3">Triggers of Alarm Systems</a></p>
[after]</p>
Here, the link is wrapped in its own paragraph tag, creating a nested structure within the original paragraph tag. The link is no longer inline, which disrupts the intended flow and structure of the content.
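One way to catch this kind of output automatically is to check the generated HTML for a <p> opened inside another <p>. The sketch below uses only the standard library html.parser module; the class and function names are hypothetical and chosen for illustration. HTML parsers in browsers typically close an open paragraph when a new <p> starts, so nesting like this tends to be silently restructured rather than rendered as written.

```python
from html.parser import HTMLParser


class ParagraphDepthChecker(HTMLParser):
    """Track how deeply <p> start tags are nested in the raw markup."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.depth += 1
            self.max_depth = max(self.max_depth, self.depth)

    def handle_endtag(self, tag):
        if tag == "p" and self.depth > 0:
            self.depth -= 1


def has_nested_paragraphs(html: str) -> bool:
    """Return True if any <p> is opened before the previous one was closed."""
    checker = ParagraphDepthChecker()
    checker.feed(html)
    checker.close()
    return checker.max_depth > 1


if __name__ == "__main__":
    nested = (
        "<p>[before]\n"
        '<p><a href="https://www.youtube.com/watch?v=aJOTlE1K90k">'
        "Triggers of Alarm Systems</a></p>\n"
        "[after]</p>"
    )
    print(has_nested_paragraphs(nested))
```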
When working with dynamic content, processing all HTML tags twice can lead to unexpected conversions
Were you referring to the & -> &amp; conversions in the snippet? Not 100% sure on what the issue is
The use of markdown-in-html results in an inconsistent HTML structure.
I believe the way this works internally is we take the text inside the tags and run the snippet through the markdown parser, which includes forming paragraphs. Not sure how we would avoid the nested paragraphs issue. We could turn off paragraph forming when markdown="1" is attached to a <p> tag? Or maybe add a postprocess that "flattens" a level of paragraphs?
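As a rough illustration of the "flatten" idea, here is a minimal sketch of a post-processing step that could be run on the output by the caller. It is not markdown2 code: flatten_nested_paragraphs is a hypothetical helper, and it assumes the paragraph tags are well formed and that no literal <p> text appears inside comments or code blocks.

```python
import re

# Matches opening and closing paragraph tags such as <p>, <p markdown="1">, </p>.
# The \b keeps tags like <pre> or <param> from matching.
_P_TAG = re.compile(r"<(/?)p\b[^>]*>", re.IGNORECASE)


def flatten_nested_paragraphs(html: str) -> str:
    """Drop every <p>/</p> pair that sits inside another paragraph.

    Only the outermost paragraph tags are kept; the content of inner
    paragraphs is left in place.
    """
    depth = 0

    def replace(match):
        nonlocal depth
        if match.group(1) == "/":
            # Closing tag: keep it only when it closes the outermost paragraph.
            depth = max(depth - 1, 0)
            return match.group(0) if depth == 0 else ""
        # Opening tag: keep it only when no paragraph is currently open.
        depth += 1
        return match.group(0) if depth == 1 else ""

    return _P_TAG.sub(replace, html)


if __name__ == "__main__":
    nested = (
        "<p>[before]\n"
        '<p><a href="https://example.com">Triggers of Alarm Systems</a></p>\n'
        "[after]</p>"
    )
    print(flatten_nested_paragraphs(nested))
```

Applied to the nested output shown earlier in the issue, this leaves a single outer paragraph with the link inline. A regex pass like this is fragile compared with changing paragraph forming inside the parser, so it is better read as a stopgap than as a proposed fix.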
Is it possible to wrap the content in <div> instead?