Can you use the pre-trained BERT models, but add novel tokens to the vocabulary?
mepster opened this issue · 0 comments
mepster commented
Can you use the pre-trained BERT models, but add novel tokens to the vocabulary during fine-tuning? Any tips on what's needed for this?
Or MUST you use the same vocab.txt file during fine-tuning that was used in pre-training?
I want to add some of the IUPAC ambiguity symbols, for example Y, which means "T or C", so this would expand my vocabulary quite a bit.
But I don't have the resources to pre-train from scratch.
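For context, here is roughly the kind of thing I had in mind, a minimal sketch using the Hugging Face transformers API (assuming a PyTorch checkpoint; the model name and token list below are just placeholders, and I'm not sure this carries over if the repo uses the original TF codebase):

```python
from transformers import BertForMaskedLM, BertTokenizer

# Placeholder checkpoint name; substitute the actual pre-trained model
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Example IUPAC ambiguity codes to add (Y = "T or C", etc.)
new_tokens = ["Y", "R", "W", "S", "K", "M"]
num_added = tokenizer.add_tokens(new_tokens)
print(f"Added {num_added} new tokens")

# Grow the embedding matrix to cover the new token ids;
# the new rows are randomly initialized and would only be
# learned during fine-tuning
model.resize_token_embeddings(len(tokenizer))
```

My concern is that the new embeddings start out random, so I don't know whether a fine-tuning dataset alone is enough to learn them well.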
Related, though I believe it's about training from scratch:
#81