wilsonzlin/minify-html

Unbounded tree depth causes eventual segfault due to excessive stack frames

kevinhu opened this issue · 1 comment

Hi! This is a super useful tool. I'm using this package for a scraper, and I ran into a segfault when running minify_html 0.11.1 (default settings) with this particular website: https://gist.github.com/kevinhu/1c60437a9cecf3b8c741c3f006d35b8f

To reproduce:

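import minify_html

# bad_website.html is the page saved from the gist linked above.
with open("./bad_website.html", "r") as f:
    long_html = f.read()

# Segfaults here on 0.11.1 with default settings.
minified = minify_html.minify(long_html)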

I also tried using minify_html_onepass, which fails gracefully with the following error:

SyntaxError: Closing tag name does not match opening tag (expected "span", got "a"). [Character 2824653]

I did a quick test, and this appears to be caused by an extremely deep tree: there were over 4000 stack frames before the segfault. This isn't really solvable, but I could implement a feature to limit the maximum parse depth.
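
Until something like that exists, a caller-side check might serve as a stopgap. The sketch below is not part of minify_html. It uses Python's standard html.parser, which scans the input in a loop rather than recursing, to estimate how deeply tags are nested. The names DepthCounter and estimate_max_depth are made up for illustration, and the count is only approximate because unclosed or mismatched tags (as in the page above) are not repaired the way a real HTML parser would repair them.

```python
from html.parser import HTMLParser

# Void elements never have a closing tag, so they must not add a nesting level.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
})


class DepthCounter(HTMLParser):
    """Hypothetical helper: tracks the deepest open-tag nesting seen so far."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0
        self.max_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in VOID_ELEMENTS:
            return
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/> opens nothing, so depth is unchanged.
        pass

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        # Clamp at zero so stray closing tags cannot push the depth negative.
        self.depth = max(0, self.depth - 1)


def estimate_max_depth(html):
    """Return a rough upper estimate of tag nesting depth in an HTML string."""
    counter = DepthCounter()
    counter.feed(html)
    counter.close()
    return counter.max_depth
```

A scraper could call estimate_max_depth(long_html) first and skip minification (or fall back to the unminified page) when the result exceeds some threshold. The right threshold is an assumption to tune per platform, since the stack size, and therefore the depth at which the segfault occurs, will vary.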