matthewwithanm/python-markdownify

Improve runtime for parent element context checking


There are various places where conversion functions must know about their parent/ancestor context, such as:

  • Performing inline (single-line) text conversion in <h1>-<h6> and <th>/<td> tags
  • Suppressing formatting inside <pre>, <code>, <kbd>, and <samp> tags
  • Avoiding newline collapsing in <pre> tags
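All three cases reduce to asking whether certain tag names appear among an element's ancestors. The sketch below assumes that set of ancestor tag names is already available. The helper `context_flags` and its flag names are hypothetical and are not part of markdownify's API:

```python
# Hypothetical helper (not part of markdownify): derive the three
# context-dependent decisions listed above from a set of ancestor tag names.
HEADING_OR_CELL = {"h1", "h2", "h3", "h4", "h5", "h6", "th", "td"}
CODE_LIKE = {"pre", "code", "kbd", "samp"}


def context_flags(ancestor_tags):
    """Return which context-dependent behaviours apply to an element."""
    tags = set(ancestor_tags)
    return {
        # Convert text on a single line inside headings and table cells.
        "inline": bool(tags & HEADING_OR_CELL),
        # Do not emit Markdown formatting (e.g. ** for <b>) inside code-like tags.
        "suppress_formatting": bool(tags & CODE_LIKE),
        # Keep whitespace and newlines as-is inside <pre>.
        "preserve_newlines": "pre" in tags,
    }


print(context_flags({"body", "table", "td"}))
# {'inline': True, 'suppress_formatting': False, 'preserve_newlines': False}
```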

The current code uses Beautiful Soup's find_parent() method to check the parent/ancestor context. Each call walks up the tree one ancestor at a time, and reaches the root whenever no matching tag exists. This imposes runtime overhead, particularly for deeply nested content (such as nested <div>, <section>, and <article> hierarchies).
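The following sketch makes the overhead concrete. It models the upward check with a toy `Node` class, a hypothetical stand-in for Beautiful Soup's `Tag` rather than its API, and counts how many parent links are followed. In a chain of n nested elements where the searched-for tag is absent, the element at depth i follows i links. One context check per element therefore costs 0 + 1 + ... + (n - 1) = n(n - 1)/2 steps, which is O(n²):

```python
class Node:
    """Toy stand-in for a parsed HTML element: a tag name plus a parent link."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


def has_ancestor(node, names, counter):
    """Walk upward in the style of find_parent(), counting each parent link followed."""
    current = node.parent
    while current is not None:
        counter[0] += 1
        if current.name in names:
            return True
        current = current.parent
    return False


def nested_chain(depth):
    """Build <div><div>...</div></div> nested `depth` levels deep; return every node."""
    nodes = [Node("div")]
    for _ in range(depth - 1):
        nodes.append(Node("div", nodes[-1]))
    return nodes


def upward_cost(depth):
    """Total parent links followed when every node checks for a <pre> ancestor."""
    counter = [0]
    for node in nested_chain(depth):
        has_ancestor(node, {"pre"}, counter)  # no <pre> anywhere: worst case
    return counter[0]


for n in (10, 100, 1000):
    print(n, upward_cost(n))
# 10 45
# 100 4950
# 1000 499500
```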

This enhancement issue requests that context be propagated downward instead of checked upward. Each downward step adds only one level of context, so no element has to re-scan all of its ancestors. For a document with n elements, the worst case (a deeply nested chain) would drop from O(n²) total work, the sum of every element's depth, to O(n).
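The sketch below shows downward propagation under stated assumptions. It uses a toy `Node` tree (a hypothetical stand-in for Beautiful Soup's `Tag`) and a hypothetical `convert_all` traversal, and it is not markdownify's code. Each element receives its parent's context and adds at most its own tag name. Only the context-relevant tag names from the list of cases above are kept, so the context stays bounded in size and each update takes constant time. Copying the full ancestor list at every level would instead make each step cost time proportional to depth.

```python
from dataclasses import dataclass, field

# Tag names that affect conversion, taken from the cases listed in this issue.
CONTEXT_TAGS = frozenset({"h1", "h2", "h3", "h4", "h5", "h6", "th", "td",
                          "pre", "code", "kbd", "samp"})


@dataclass
class Node:
    """Toy stand-in for a parsed HTML element with child links."""
    name: str
    children: list = field(default_factory=list)


def convert_all(root):
    """Visit every node once, passing each one the context tags of its ancestors.

    Returns (visited, steps): visited is a list of (tag name, ancestor context
    tags) pairs in document order; steps counts the context updates performed.
    """
    visited = []
    steps = 0
    stack = [(root, frozenset())]  # explicit stack instead of recursion
    while stack:
        node, parent_tags = stack.pop()
        visited.append((node.name, parent_tags))
        # One bounded-size update per node: add this tag only if it matters.
        if node.name in CONTEXT_TAGS:
            child_tags = parent_tags | {node.name}
        else:
            child_tags = parent_tags
        steps += 1
        # Push children in reverse so they are popped in document order.
        for child in reversed(node.children):
            stack.append((child, child_tags))
    return visited, steps


def nested_chain(depth):
    """Build <div><div>...</div></div> nested `depth` levels deep; return the root."""
    root = current = Node("div")
    for _ in range(depth - 1):
        child = Node("div")
        current.children.append(child)
        current = child
    return root


doc = Node("table", [Node("td", [Node("b")]),
                     Node("pre", [Node("code", [Node("span")])])])
visited, steps = convert_all(doc)
for name, tags in visited:
    print(name, sorted(tags))
# table []
# td []
# b ['td']
# pre []
# code ['pre']
# span ['code', 'pre']

print(convert_all(nested_chain(1000))[1])  # 1000
```

The explicit stack means a deeply nested input cannot hit Python's default recursion limit. The same 1000-level chain that costs 499,500 parent-link steps with upward checking needs only 1000 context updates here.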