Dataloder thread issues
Closed this issue · 1 comments
Thanks for awesome paper and model! I was experimenting with multi-node training and sometimes (I can't trace the problem) I end up running into:
Error catched when processing sample from viola: The global thread pool has not been initialized.: ThreadPoolBuildError { kind: GlobalPoolAlreadyIniti alized }
issues coming from the tokenizer of the llm. Have you seen this error before? I run the producer, fill the buffer and then run the training.
I assume the producer is blocking some threads for the rust tokenizer fast of the T5.
Maybe you can disable the parallelism of the tokenizer:
import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"
In any case, I would suggest you use the HDF5VLADataset rather than the TensorFlow Dataset. That way, you will not need any producer...