Load transformer config and tokenizer from disk when n>1 for n-fold training
Closed this issue · 0 comments
kermitt2 commented
In case we have:
- a transformer in the architecture
- the transformer is initially loaded from the Hugging Face Hub
When performing n-fold training for text classification or sequence labeling, we currently reload the transformer configuration and the tokenizer via AutoModel and the Hugging Face Hub n times, once for each fold model. To limit access to the Hugging Face Hub (not very reliable), we should make an online access only the first time, for n=1, and then load the transformer configuration and the transformer tokenizer from file, because both have been saved when building the model for n=1.