trentm/python-markdown2

Issue with Table Conversion - Last Row Not Converted

syntaxsurge opened this issue · 1 comment

Hello markdown2 team,

I've encountered an issue with the markdown-to-HTML conversion process, specifically concerning tables. The last row of a markdown table isn't being converted correctly into HTML. Below, I've provided a sample of the markdown input and the resulting HTML output for your reference.

Markdown Input:

<p markdown="1"><strong>OpenAI's Growth Trajectory:</strong>
| Version | Parameters | Abilities                       |
|---------|------------|---------------------------------|
| GPT     | 117M       | Basic understanding of language |
| GPT-2   | 1.5B       | More nuanced language processing|
| GPT-3   | 175B       | Highly advanced AI capabilities |</p>

Expected HTML Output:

<p><strong>OpenAI's Growth Trajectory:</strong>
<table>
<thead>
<tr>
  <th>Version</th>
  <th>Parameters</th>
  <th>Abilities</th>
</tr>
</thead>
<tbody>
<tr>
  <td>GPT</td>
  <td>117M</td>
  <td>Basic understanding of language</td>
</tr>
<tr>
  <td>GPT-2</td>
  <td>1.5B</td>
  <td>More nuanced language processing</td>
</tr>
<tr>
  <td>GPT-3</td>
  <td>175B</td>
  <td>Highly advanced AI capabilities</td>
</tr>
</tbody>
</table></p>

Actual HTML Output:

<p><strong>OpenAI's Growth Trajectory:</strong>
<table>
<thead>
<tr>
  <th>Version</th>
  <th>Parameters</th>
  <th>Abilities</th>
</tr>
</thead>
<tbody>
<tr>
  <td>GPT</td>
  <td>117M</td>
  <td>Basic understanding of language</td>
</tr>
<tr>
  <td>GPT-2</td>
  <td>1.5B</td>
  <td>More nuanced language processing</td>
</tr>
</tbody>
</table>

| GPT-3   | 175B       | Highly advanced AI capabilities |</p>

As you can see in the actual output, the last row of the table (pertaining to GPT-3) is not being converted into HTML format and remains in markdown.

I am using the following options for conversion: extras=['tables', 'footnotes', 'markdown-in-html', 'cuddled-lists'].
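For anyone trying to reproduce this, a minimal script along the following lines should be enough. It uses the input and extras given above; the names `SOURCE`, `EXTRAS` and `convert` are only illustrative, and `markdown2` is imported inside the function so the constants can be inspected without the package installed.

```python
# Minimal reproduction sketch. SOURCE, EXTRAS and convert() are illustrative
# names chosen for this example; they are not part of markdown2.
SOURCE = "\n".join([
    '<p markdown="1"><strong>OpenAI\'s Growth Trajectory:</strong>',
    "| Version | Parameters | Abilities                       |",
    "|---------|------------|---------------------------------|",
    "| GPT     | 117M       | Basic understanding of language |",
    "| GPT-2   | 1.5B       | More nuanced language processing|",
    "| GPT-3   | 175B       | Highly advanced AI capabilities |</p>",
])

# The extras listed in this report.
EXTRAS = ["tables", "footnotes", "markdown-in-html", "cuddled-lists"]


def convert(text: str) -> str:
    """Convert text to HTML with the extras from this report."""
    import markdown2  # third-party: pip install markdown2

    return markdown2.markdown(text, extras=EXTRAS)
```

Running `print(convert(SOURCE))` on an affected version should show the GPT-3 row left as raw markdown, as in the actual output above.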

Could you please look into this issue?

Thank you for your assistance.

This seems to be a continuation of #546. The linked PR (#547) implemented a solution for the case where the markdown-in-html tags are on the same line as the markdown itself. The problem is that this solution is only applied when the snippet is fewer than 3 lines long.

The assumption here is that the snippet would look like this:

<div markdown="1">Some **text**
</div>

This completely breaks down on longer snippets such as:

<div markdown="1">Some **text**
Followed by more **text**
</div>

The fix here would be to either implement a better check for HTML tags sharing a line with markdown, or to not check at all and always attempt to split the tags from the markdown.
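As a rough illustration of the "always split" option, here is a minimal sketch in plain Python. It is not markdown2's actual code: `split_markdown_block` is a hypothetical helper, and it assumes the block starts with a single opening tag carrying `markdown="1"` and ends with the matching closing tag. Nested tags with the same name are not handled.

```python
import re

# Hypothetical helper for illustration only; not part of markdown2.
_OPEN_TAG = re.compile(r'\s*(<(\w+)\b[^>]*\bmarkdown\s*=\s*["\']1["\'][^>]*>)')


def split_markdown_block(block: str):
    """Split a markdown-in-html block into (open_tag, inner, close_tag).

    The split is attempted regardless of how many lines the block has.
    Returns None if the block does not have the assumed shape.
    """
    match = _OPEN_TAG.match(block)
    if match is None:
        return None
    open_tag, tag_name = match.group(1), match.group(2)
    close_tag = f"</{tag_name}>"

    rest = block[match.end():].rstrip()
    if not rest.endswith(close_tag):
        return None

    # Everything between the tags is markdown, even when it shares a line
    # with the opening or closing tag.
    inner = rest[: -len(close_tag)].strip("\n")
    return open_tag, inner, close_tag
```

With this approach the two-line and three-line `<div>` snippets above are handled the same way, and in the table case the final row no longer has `</p>` attached to it before table processing would run.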