xijo/reverse_markdown

Improper nested ol/ul parsing

Closed this issue · 3 comments

The Gem (although awesome) seems to struggle a bit with nested lists, both numbered and unnumbered.

Take this example HTML:

  <ol>
    <li>One
      <ol>
        <li>Sub one
          <ol>
            <li>Sub sub one</li>
            <li>Sub sub two</li>
          </ol>
        </li>
      </ol>
    </li>
  </ol>

Line breaks add whitespace

The above is outputted as:

2. Two
  1. Sub one
    1. Sub sub one
     2. Sub sub two

Note the 5 spaces before the "sub sub two" list item. Boggled my mind how it could get a number of spaces that wasn't a multiple of 2 (the indent function), but it turns out the line break after the preceding </li> inserts and extra space before the subsequent <li> at some point in the pipeline. Completely stripping new lines before passing to reverse markdown solves the problem.

Numbering

The numbering of numbered lists properly resets when a subsequent ol is nested deeper than the preceding ol, but seems to continue numbering if it is shallower.

1. One
  1. Sub one
  2. Sub two

3. Two
  1. Sub one
    1. Sub sub one
    2. Sub sub two

  3. Sub two

4. Three

I tried to implement the same logic in my own Gem before realizing that reverse markdown should handle it. Perhaps it'd make more sense to store the current_li value in an array, where each element represented one level of nesting (the value being the current number). Upon discovering a less indented list item, you'd reset all elements in the array after the current nesting level. Not a show stopper, because still valid markdown, but still an odd behavior.

Using the Gem to convert Word to Markdown. Awesome stuff!

xijo commented

Hi Ben,

first of all thanks for your feedback and detailed error descriptions, this is very helpful to find nasty bugs that are out in the wild there.

After your last issue concerning unknown tags (#24) I decided to go the more fundamental way and change the 'opening/closing' system in favor of something more flexible. However that might take some time.

I'll keep you updated in the issue comments from time to time, alright?

Sounds good. Coded around both for now. It's a great gem.

One idea might be to use Nokogiri to parse the document, and iterate down through each node (using node_name to get the tag as you get it now). That'd allow you out outsource the closing tag logic.

xijo commented

After my rewrite at the weekend I added a spec for the case you described. The changes will be available after the next major release.
Those two issues should be fixed now and I'm closing this issue. Thanks for letting me know! 👍