`MarkdownRenderer` is not concurrent safe
Opened this issue · 5 comments
What
Constructing a MarkdownRenderer
modifies a global array (mistletoe.block_token.remove_token
), which is not concurrent safe
mistletoe/mistletoe/markdown_renderer.py
Line 112 in b911e5b
As a result, using renderers in multiple threads at the same time results in exception being raised
To reproduce
Reproducable with mistletoe==1.2.1
import threading
import time
from mistletoe.markdown_renderer import MarkdownRenderer
def worker():
with MarkdownRenderer() as render:
time.sleep(1)
threading.Thread(target=worker).start()
threading.Thread(target=worker).start()
It will raise
Exception in thread Thread-2:
Traceback (most recent call last):
File "**\Python39\lib\threading.py", line 973, in _bootstrap_inner
self.run()
File "**\Python39\lib\threading.py", line 910, in run
self._target(*self._args, **self._kwargs)
File "**\example.py", line 8, in worker
with MarkdownRenderer() as render:
File "**\venv\lib\site-packages\mistletoe\markdown_renderer.py", line 101, in __init__
block_token.remove_token(block_token.Footnote)
File "**\venv\lib\site-packages\mistletoe\block_token.py", line 61, in remove_token
_token_types.remove(token_cls)
ValueError: list.remove(x): x not in list
Other notes
Yeah I'm aware of the following notes in the readme:
Important: As can be seen from the example above, the parsing phase is currently tightly connected with initiation and closing of a renderer. Therefore, you should never call Document(...) outside of a with ... as renderer block, unless you know what you are doing.
If the described issue is an expected behavior, I'll suggest to leave a warning in the readme as well, so people know they need a threading.Lock
for this
@Fallen-Breath, you are right. If nothing else (we could at least avoid that exception from calling remote_token()
?), this should be documented. I think that the other renderers aren't by-design guaranteed to be 100% thread-safe either (mainly in the case we would use different renderers in parallel), because of modifying the global lists of token classes. But I haven't found the time to investigate that deeply yet...
Another thing, while having just looked at #212, mistletoe also partly uses class attributes to temporarily store parsed data in between method calls (e.g. here). AFAIK, this could cause random errors when running mistletoe in parallel (within a single Python process), right? So, I think mistletoe as a whole was not written with having thread-safety on mind. Wondering how much of an issue this could be among common mistletoe users...?
I have a similar problem with this implementation. It's not concurrency but gets into issues when it has two instances of MarkdownRenderer().
with MarkdownRenderer() as render:
with MarkdownRenderer() as insider_render:
pass
I wonder if we can have an internal state for each renderer instance.
Mistletoe's reliance on global state has been annoying me for some time, so I used my unexpected free time to refactor it to explicitly pass state around instead. I tried some tricks to keep compatibility with old token code, but sadly none of them worked well enough, so I had to change a lot of mistletoe's API.
A compatibility layer could be created so that existing users of mistletoe could keep their code unchanged, but it wouldn't make their use of mistletoe thread-safe.
@pbodnar as the most active maintainer, would you consider merging something like this? A large breaking change to mistletoe's structure to make it concurrent and reentrant, with or without a compatibility layer? Or is the current state good enough and this change too disruptive?
@dvdkon, thanks for pushing this further. As you yourself write, it is not a small change. So I'm not sure if I can give it the required time nowadays. So far, we've got just 3 upvotes for this issue, which is a relatively small number. But I agree it would be great to have mistletoe thread-safe. I guess I will look at this closer one day, but I cannot promise anything...