Quadratic behavior when parsing emphasis
nwellnhof opened this issue · 0 comments
nwellnhof commented
Parsing a *t
sequence followed by a _t*_
sequence exhibits quadratic behavior:
$ for n in 5000 10000 20000; do python3 -c "print('*t '*$n+'_t*_ '*$n)" |time -f "$n elems: %e secs" build/src/cmark >/dev/null; done
5000 elems: 0.19 secs
10000 elems: 0.88 secs
20000 elems: 5.45 secs
Sample pattern
*t *t *t *t _t*_ _t*_ _t*_ _t*_
Analysis
- After matching a
*
closer, the preceding_
opener is removed. - After parsing a
_
closer, no opener is found. The correspondingopeners_bottom
entry is set to preceding*
opener. - After matching the next
*
closer, the corresponding opener is removed. This is the entry inopeners_bottom
. - The entry in
openers_bottom
becomes a dangling pointer. - Subsequent searches for an opener for
_
closers will test against the dangling pointer (which is undefined behavior). - Since the opener was removed, there will never be a match.
- The
openers_bottom
optimization stops working, resulting in quadratic behavior with abusive input patterns.
Possible solutions
- Adjust
openers_bottom
when removing delimiters (somewhat expensive). - Number the delimiters and store the delimiter position instead of a pointer in
openers_bottom
.