Mistletoe hangs when parsing some specifically formatted Footnotes
>>> import mistletoe
>>> input = "foo bar [1]:\r\nfoo bar\r\n\r\n[1]: https://example.org/\r\nhttps://example.org"
>>> mistletoe.markdown(input)
This never returns, or at least does not return within the limits of my patience.
Hi, it looks like this is caused by mistletoe not quite expecting CRLF line-endings in the input - see #64. From my quick testing, it freezes because of the last \r\n. The stacktrace is like this (after pressing ctrl+c):
$ python issue-124.py
Traceback (most recent call last):
  File "issue-124.py", line 3, in <module>
    print(mistletoe.markdown(input))
  File "d:\projects\my-forks\mistletoe\mistletoe\__init__.py", line 22, in markdown
    return renderer.render(Document(iterable))
  File "d:\projects\my-forks\mistletoe\mistletoe\block_token.py", line 150, in __init__
    self.children = tokenize(lines)
  File "d:\projects\my-forks\mistletoe\mistletoe\block_token.py", line 49, in tokenize
    return tokenizer.tokenize(lines, _token_types)
  File "d:\projects\my-forks\mistletoe\mistletoe\block_tokenizer.py", line 51, in tokenize
    return make_tokens(tokenize_block(iterable, token_types))
  File "d:\projects\my-forks\mistletoe\mistletoe\block_tokenizer.py", line 67, in tokenize_block
    result = token_type.read(lines)
  File "d:\projects\my-forks\mistletoe\mistletoe\block_token.py", line 734, in read
    match_info = cls.match_reference(lines, string, offset)
  File "d:\projects\my-forks\mistletoe\mistletoe\block_token.py", line 754, in match_reference
    match_info = cls.match_link_dest(string, label_end)
  File "d:\projects\my-forks\mistletoe\mistletoe\block_token.py", line 793, in match_link_dest
    offset = shift_whitespace(string, offset+1)
  File "d:\projects\my-forks\mistletoe\mistletoe\core_tokens.py", line 381, in shift_whitespace
    for i, c in enumerate(string[index:], start=index):
KeyboardInterrupt
So I would classify this as an enhancement with a workaround: use simple \n if you need to create an input string with line-endings programmatically (or possibly use a multi-line string).
OK for now?
I would not classify a problem in which any input causes the library to hang forever as in need of an enhancement, but rather suffering from a bug. Consider that this is a DoS vector.
I will apply an appropriate workaround (converting CRLF to LF) in my software, but this is definitely a bug and probably an urgent one at that.
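For anyone else hitting this, the CRLF-to-LF workaround could look roughly like the sketch below. The helper name normalize_newlines is hypothetical (it is not part of mistletoe), and also converting lone \r characters is an extra assumption beyond what was described above.

```python
def normalize_newlines(text: str) -> str:
    """Convert CRLF line endings to LF before handing text to the parser.

    Hypothetical helper, not part of mistletoe. Lone CR characters are
    also converted, which is an assumption beyond the CRLF case.
    """
    return text.replace("\r\n", "\n").replace("\r", "\n")


# The input from the original report, with CRLF line endings.
source = "foo bar [1]:\r\nfoo bar\r\n\r\n[1]: https://example.org/\r\nhttps://example.org"
cleaned = normalize_newlines(source)
```

The cleaned string can then be passed to mistletoe.markdown() in place of the raw input. This only addresses the line-ending case reported here; it does not strip other trailing whitespace, since trailing spaces can be significant in Markdown.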
Good news, it looks like I found the culprit in the Footnote.backtrack() method / call. I guess I can come up with a fix soon.
Fixed in the master branch. It turned out that any whitespace character before \n can break the parsing, not just \r.
Thanks!