Issue with Table Conversion - Last Row Not Converted
syntaxsurge opened this issue · 1 comment
Hello markdown2 team,
I've encountered an issue with markdown-to-HTML conversion of tables: the last row of a markdown table isn't converted to HTML. Below are the markdown input, the expected HTML output, and the actual HTML output for reference.
Markdown Input:
<p markdown="1"><strong>OpenAI's Growth Trajectory:</strong>
| Version | Parameters | Abilities |
|---------|------------|---------------------------------|
| GPT | 117M | Basic understanding of language |
| GPT-2 | 1.5B | More nuanced language processing|
| GPT-3 | 175B | Highly advanced AI capabilities |</p>
Expected HTML Output:
<p><strong>OpenAI's Growth Trajectory:</strong>
<table>
<thead>
<tr>
<th>Version</th>
<th>Parameters</th>
<th>Abilities</th>
</tr>
</thead>
<tbody>
<tr>
<td>GPT</td>
<td>117M</td>
<td>Basic understanding of language</td>
</tr>
<tr>
<td>GPT-2</td>
<td>1.5B</td>
<td>More nuanced language processing</td>
</tr>
<tr>
<td>GPT-3</td>
<td>175B</td>
<td>Highly advanced AI capabilities</td>
</tr>
</tbody>
</table></p>
Actual HTML Output:
<p><strong>OpenAI's Growth Trajectory:</strong>
<table>
<thead>
<tr>
<th>Version</th>
<th>Parameters</th>
<th>Abilities</th>
</tr>
</thead>
<tbody>
<tr>
<td>GPT</td>
<td>117M</td>
<td>Basic understanding of language</td>
</tr>
<tr>
<td>GPT-2</td>
<td>1.5B</td>
<td>More nuanced language processing</td>
</tr>
</tbody>
</table>
| GPT-3 | 175B | Highly advanced AI capabilities |</p>
As you can see in the actual output, the last row of the table (the GPT-3 row) is not converted to HTML. It is left as raw markdown after the closing </table> tag, on the same line as the closing </p>.
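To catch this symptom automatically, for example in a regression test, one option is to scan the rendered HTML for lines that still look like pipe-table rows. This is a minimal sketch, not part of markdown2: the function name unconverted_table_rows and the regular expression are my own assumptions, and the check is deliberately simple (it only flags lines that begin with a pipe and end with a pipe, optionally followed by a closing tag).

```python
import re

# Illustrative pattern (an assumption, not markdown2 code): a line that still
# looks like a raw markdown table row, e.g. "| GPT-3 | 175B | ... |", optionally
# followed by a closing tag such as "</p>".
ROW_PATTERN = re.compile(r"^\s*\|.*\|\s*(</\w+>)?\s*$")


def unconverted_table_rows(html: str) -> list[str]:
    """Return lines of rendered HTML that still look like markdown table rows."""
    return [line for line in html.splitlines() if ROW_PATTERN.match(line)]


# Tail of the actual output reported above.
actual = """<tbody>
<tr>
<td>GPT-2</td>
</tr>
</tbody>
</table>
| GPT-3 | 175B | Highly advanced AI capabilities |</p>"""

print(unconverted_table_rows(actual))
# ['| GPT-3 | 175B | Highly advanced AI capabilities |</p>']
```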
I am using the following options for conversion: extras=['tables', 'footnotes', 'markdown-in-html', 'cuddled-lists'].
Could you please look into this issue?
Thank you for your assistance.
This seems to be a continuation of #546. The linked PR (#547) implemented a fix for the case where markdown-in-html tags (tags with markdown="1") are on the same line as the markdown itself. The problem is that this fix is only applied when the snippet is fewer than 3 lines long.
The assumption here is that the snippet would look like this:
<div markdown="1">Some **text**
</div>
This completely breaks down on longer snippets such as:
<div markdown="1">Some **text**
Followed by more **text**
</div>
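To make the "fewer than 3 lines" condition concrete, here is a hypothetical reconstruction of the heuristic as described above. It is not markdown2's actual source; the name gets_split and the regex are illustrative assumptions, and the sketch only models the decision of whether the same-line fix gets applied.

```python
import re

# Hypothetical model of the check (not markdown2's code): an opening tag that
# carries markdown="1", matched at the start of the first line.
OPEN_TAG = re.compile(r'^<(\w+)[^>]*\bmarkdown="1"[^>]*>')


def gets_split(snippet: str) -> bool:
    """Return True if the same-line fix would be applied to this snippet."""
    lines = snippet.strip().splitlines()
    match = OPEN_TAG.match(lines[0]) if lines else None
    # Markdown text shares the line with the opening tag...
    inline_markdown = match is not None and lines[0][match.end():].strip() != ""
    # ...but the fix is only attempted for snippets shorter than three lines.
    return inline_markdown and len(lines) < 3


short_snippet = '<div markdown="1">Some **text**\n</div>'
long_snippet = '<div markdown="1">Some **text**\nFollowed by more **text**\n</div>'
print(gets_split(short_snippet), gets_split(long_snippet))  # True False
```

The table from the original report is six lines long, so under this model it would never be split either.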
The fix here would be either to implement a better check for HTML that shares a line with markdown, or to skip the check entirely and always split the HTML tags and the markdown onto separate lines.
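A minimal sketch of the second option (always split), assuming the input snippet is exactly one markdown="1" block. The helper name split_markdown_block and its regex are hypothetical and not part of markdown2; it moves any text that shares a line with the opening or the matching closing tag onto its own line, regardless of how long the snippet is.

```python
import re

# Hypothetical pattern (not markdown2 code): one HTML block with markdown="1",
# capturing the opening tag, the body, and the matching closing tag.
# \2 back-references the tag name so <p> pairs with </p> and <div> with </div>.
BLOCK = re.compile(r'^(<(\w+)[^>]*\bmarkdown="1"[^>]*>)(.*?)(</\2>)\s*$', re.DOTALL)


def split_markdown_block(snippet: str) -> str:
    """Put the opening tag, the markdown body, and the closing tag on separate lines."""
    match = BLOCK.match(snippet.strip())
    if match is None:
        return snippet  # not a single markdown="1" block; leave it untouched
    open_tag, _tag_name, body, close_tag = match.groups()
    return f"{open_tag}\n{body.strip()}\n{close_tag}"


long_snippet = '<div markdown="1">Some **text**\nFollowed by more **text**\n</div>'
print(split_markdown_block(long_snippet))
# <div markdown="1">
# Some **text**
# Followed by more **text**
# </div>
```

Applied to the table from the original report, the closing </p> ends up on its own line, so the "| GPT-3 |" row no longer shares a line with it.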