wasiahmad/NeuralCodeSum

How to do transfer learning on the pretrained models?


I obtained the pretrained models from one of the threads where @vrmasrv had posted them.
I am totally new to NLP, so forgive me if I am wrong, but is it possible to do transfer learning on this model?
If so, how can I do it?

What do you mean by transfer learning? Can you elaborate?

By transfer learning, I mean that instead of training the network from scratch, we could add some additional layers to your model, make only the decoder part trainable, or something along those lines. I want to run this model on 4 clusters of data to see how it performs on each cluster. Each cluster has around 100,000 code-snippet pairs. I don't have the computing infrastructure needed for long training runs, so I thought of doing transfer learning.

Yeah, you can do that for any seq2seq model, and ours is no exception. You can freeze any layer/module and train the remaining part of the model. What is your task? Source code summarization, right? Otherwise, our pretrained model won't be useful. Also, check whether the vocabularies overlap; if there are too many OOV words, the pretrained model won't help you. One more thing: since you have 100k examples per cluster, you may not even need to use our model as a pretrained starting point.
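
For reference, a minimal sketch of "freeze the encoder, fine-tune only the decoder" in PyTorch. The sub-module names (`encoder`, `decoder`) and the toy stand-in model below are illustrative, not the actual NeuralCodeSum model class; check the real attribute names in the repo and adapt accordingly.

```python
import torch
import torch.nn as nn

# Stand-in for the loaded pretrained seq2seq model; in practice you would
# build the NeuralCodeSum model and load the released checkpoint instead.
model = nn.ModuleDict({
    'encoder': nn.TransformerEncoder(
        nn.TransformerEncoderLayer(d_model=512, nhead=8), num_layers=6),
    'decoder': nn.TransformerDecoder(
        nn.TransformerDecoderLayer(d_model=512, nhead=8), num_layers=6),
})
# model.load_state_dict(torch.load('pretrained.mdl'))  # load released weights here

# Freeze every encoder parameter so no gradients are computed for it.
for p in model['encoder'].parameters():
    p.requires_grad = False

# Hand the optimizer only the parameters that are still trainable.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-4)
print(f'{sum(p.numel() for p in trainable):,} trainable parameters')
```

Passing only the unfrozen parameters to the optimizer also keeps optimizer state small, which helps when compute and memory are limited.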

Hi @wasiahmad,

is it possible to do transfer learning by adding new tokens to the src_dict and trg_dict, extending the embedding layer with new rows for those tokens, and then fine-tuning the model on the new dataset?
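
Roughly what I have in mind (a sketch only, not tested against this repo; `extend_embedding` and the sizes are made up for illustration):

```python
import torch
import torch.nn as nn

def extend_embedding(old_embedding: nn.Embedding, num_new_tokens: int) -> nn.Embedding:
    """Return a larger embedding whose first rows are copied from the old one."""
    old_vocab, dim = old_embedding.weight.shape
    new_embedding = nn.Embedding(old_vocab + num_new_tokens, dim,
                                 padding_idx=old_embedding.padding_idx)
    with torch.no_grad():
        # Keep the pretrained rows; the new rows keep their random init
        # (they could also be initialized to the mean of the old rows).
        new_embedding.weight[:old_vocab] = old_embedding.weight
    return new_embedding

# Usage: replace the model's source/target embedding, then fine-tune as usual.
old = nn.Embedding(50000, 512, padding_idx=0)   # stands in for the pretrained embedding
extended = extend_embedding(old, num_new_tokens=137)
print(extended.weight.shape)                     # torch.Size([50137, 512])
```

I assume the output projection over the target vocabulary (if it is tied to the target embedding) would have to be resized the same way, otherwise the logits would not cover the new target tokens.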

Thanks.