huggingface/tokenizers

Unsound use of unsafe in `src/utils/parallelism.rs`

albertsgarde opened this issue · 1 comment

The static variable `USED_PARALLELISM` is read in `has_parallelism_been_used` and modified in `MaybeParallelIterator::into_maybe_par_iter` and `MaybeParallelBridge::maybe_par_bridge`.
Each of these accesses is `unsafe`: if a write happens at the same time as another read or write on a different thread, that is a data race, which is undefined behaviour.
The problem is that all of these are safe functions, and nothing checks that the unsafe operations inside them are actually sound.
This means UB can be triggered from safe Rust simply by calling these functions from separate threads.
There may be reasons to believe this is unlikely or impossible given how the rest of the library uses these functions (I don't know the code base well enough to say), but that does not change the fact that the API is unsound.
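To make the problem concrete, here is a hypothetical, reduced version of the pattern, assuming the variable is declared as a `static mut bool`. `USED_PARALLELISM` and `has_parallelism_been_used` are names from this issue; `mark_parallelism_used` is an invented stand-in for the writes in `into_maybe_par_iter` and `maybe_par_bridge`. The bodies are illustrative, not the library's actual code.

```rust
// Hypothetical reduction of the reported pattern (not the library's code).
// `mark_parallelism_used` is an invented stand-in for the writes in
// `into_maybe_par_iter` / `maybe_par_bridge`.
static mut USED_PARALLELISM: bool = false;

/// A *safe* function that reads the `static mut`.
pub fn has_parallelism_been_used() -> bool {
    // Nothing guarantees this is sound: a concurrent write from another
    // thread would make this read a data race (undefined behaviour).
    unsafe { USED_PARALLELISM }
}

/// A *safe* function that writes the `static mut`.
pub fn mark_parallelism_used() {
    // Same problem: an unsynchronized write reachable from safe code.
    unsafe {
        USED_PARALLELISM = true;
    }
}

fn main() {
    // Single-threaded use like this is fine. Calling these two functions
    // from different threads at the same time is what makes the safe API
    // unsound, even though no caller ever writes `unsafe` themselves.
    mark_parallelism_used();
    assert!(has_parallelism_been_used());
}
```

The key point is that the `unsafe` blocks are hidden inside `pub fn`s with safe signatures, so the compiler cannot stop callers from using them concurrently.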

The easiest way to fix this would be to wrap the variable in a `Mutex`. I propose an implementation of this in #1492.
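A minimal sketch of the `Mutex` approach, reusing the hypothetical `mark_parallelism_used` stand-in for the writers. This only illustrates the idea; the actual change in #1492 may differ in its details. It relies on `Mutex::new` being usable in a `static` (a `const fn` since Rust 1.63).

```rust
use std::sync::Mutex;
use std::thread;

// Sketch of the Mutex-based fix; not necessarily identical to #1492.
// No `static mut` and no `unsafe`: the lock serializes every access.
static USED_PARALLELISM: Mutex<bool> = Mutex::new(false);

pub fn has_parallelism_been_used() -> bool {
    // A poisoned lock only means another thread panicked while holding it;
    // the bool inside is still meaningful, so recover the guard.
    *USED_PARALLELISM.lock().unwrap_or_else(|e| e.into_inner())
}

/// Hypothetical stand-in for the writes in the parallel-iterator helpers.
pub fn mark_parallelism_used() {
    *USED_PARALLELISM.lock().unwrap_or_else(|e| e.into_inner()) = true;
}

fn main() {
    // Several threads reading and writing at once: this is now well-defined.
    let handles: Vec<_> = (0..8)
        .map(|i| {
            thread::spawn(move || {
                if i % 2 == 0 {
                    mark_parallelism_used();
                } else {
                    let _ = has_parallelism_been_used();
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // Thread 0 always marks the flag, so it is set once all threads finish.
    assert!(has_parallelism_been_used());
}
```

For a single flag like this, `std::sync::atomic::AtomicBool` (with `load`/`store`) would also remove the `unsafe` without taking a lock; the `Mutex` version is shown because it is what this issue proposes.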

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.