songlab-cal/tape

The size of the pretrained model's input data

willow-yll opened this issue · 1 comment

Hi,
I want to use the pretrained model and add some layers on top to train a downstream task.
What kind of data should I feed it?
Should I convert the amino acid sequences to one-hot encodings?
Or does TAPE provide a utility for processing amino acid sequences?
I don't know what input size the model expects....

My model looks like this:

import torch.nn as nn
from tape import UniRepModel

# Load the pretrained UniRep babbler (1900-dim hidden states)
pretrained = UniRepModel.from_pretrained('babbler-1900', force_download=False)

class UniRep_bilstm(nn.Module):

    def __init__(self, emb_dim, hidden_dim, num_layers, output_dim, max_len):
        super(UniRep_bilstm, self).__init__()
        self.unirep = pretrained
        # bilstm is my own downstream module, defined elsewhere
        self.bilstm = bilstm(emb_dim, hidden_dim, num_layers, output_dim, max_len)

    def forward(self, input_ids):
        # UniRep returns a tuple; index 0 is the per-residue sequence output
        unirep_outputs = self.unirep(input_ids)[0]
        outputs = self.bilstm(unirep_outputs)
        return outputs

Could you please tell me the size and type of the input data, and how to process the amino acid sequences? Thanks :)