Creating new embeddings with a portion of the data
SukruHan opened this issue · 2 comments
Hi,
First of all thanks for this great effort!
I need to create BERT models with new configurations and re-train them from scratch, using a reduced amount of data and, if possible, fewer protein types.
What kind of path should I follow?
My initial problem is that I couldn't find a way to re-train a BERT model from scratch to create new embeddings. (I am not talking about fine-tuning the parameters.)
I have checked issue #89 and learned about modifying the model.
But I think I need more guidance on this.
Thanks, Best Regards
Since this question gets asked a lot, and since TAPE's training machinery is a bit old and not well maintained, I went ahead and wrote a tutorial on how to train a language model using fairseq, Facebook AI's sequence modeling framework. It's very simple, and all you need is a FASTA file.
Here's the Colab with the tutorial. It's meant to get you started, although you can actually train the model in Colab just fine.
https://colab.research.google.com/drive/1JrKtL7bHTSyYYRvqfQhezy0qqiZkuNEb?usp=sharing
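For the "portion of the data" part of the question, the FASTA file usually has to be filtered, subsampled and converted before training. The sketch below is a hypothetical helper, not code taken from the Colab. It assumes a plain FASTA input, that protein types can be selected by matching text in the header line (the `keep` predicate is an illustrative assumption), and that the model consumes one sequence per line with residues separated by spaces, which is how fairseq's default tokenizer splits lines into tokens.

```python
import random
from typing import Callable, Iterable, Iterator, List, Optional, Tuple

Record = Tuple[str, str]  # (FASTA header without ">", sequence)


def parse_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs; multi-line sequences are joined."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def to_fairseq_line(sequence: str) -> str:
    """Space-separate residues so each amino acid becomes one whitespace token."""
    return " ".join(sequence.upper())


def split_records(
    records: Iterable[Record],
    fraction: float = 1.0,
    valid_fraction: float = 0.1,
    keep: Optional[Callable[[str], bool]] = None,
    seed: int = 0,
) -> Tuple[List[Record], List[Record]]:
    """Filter by header (hypothetical `keep`), subsample, then split train/valid."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    selected = [r for r in records if keep is None or keep(r[0])]
    random.Random(seed).shuffle(selected)  # seeded so the subset is reproducible
    selected = selected[: int(len(selected) * fraction)]
    n_valid = int(len(selected) * valid_fraction)
    return selected[n_valid:], selected[:n_valid]


def write_fairseq_text(records: Iterable[Record], path: str) -> None:
    """Write one space-separated sequence per line."""
    with open(path, "w") as out:
        for _, sequence in records:
            out.write(to_fairseq_line(sequence) + "\n")
```

As a usage example, you could open your FASTA file, pass the handle to `parse_fasta`, call `split_records(..., fraction=0.1, keep=lambda h: "kinase" in h.lower())` to keep 10% of the kinase entries, and write the two lists to `train.txt` and `valid.txt` with `write_fairseq_text`. Those files could then be binarized with something like `fairseq-preprocess --only-source --trainpref train.txt --validpref valid.txt --destdir data-bin` before running `fairseq-train`; check the Colab for the exact task, architecture and training settings it uses.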
Hope this helps!
Hi,
Thanks for the quick answer and guidance.
Best Regards,