THUDM/ProteinLM

Extracting embeddings?

ddofer opened this issue · 1 comment

Could you please provide an example for extracting embeddings (per position and per sequence/batch) from the models?

Hi, Ddofer! Thank you for your interest in our work.
I think transformer_output is what you are looking for (protein embeddings). You can find transformer_output here.
If you want to use ProteinLM to encode protein sequences, the easiest way is to dump the output of the transformer model directly and then load the saved embeddings for your downstream tasks. If you want to train end to end instead, you will need to add fine-tuning code on top of ProteinLM.
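For the per-sequence case, one common option is to pool the dumped per-position embeddings with a masked mean. The snippet below is only a rough sketch, not code from the ProteinLM repository: it assumes the dumped transformer_output has already been converted to a NumPy array of shape [batch, seq_len, hidden] and that a 0/1 padding mask is available. The function name per_sequence_embedding and the random stand-in data are hypothetical, and the axis order in ProteinLM may differ, so check it before pooling.

```python
import numpy as np


def per_sequence_embedding(token_embeddings, attention_mask):
    """Masked mean pooling over the sequence axis.

    token_embeddings: float array, ASSUMED shape [batch, seq_len, hidden]
    attention_mask:   0/1 array of shape [batch, seq_len]; 1 marks a real residue
    Returns an array of shape [batch, hidden].
    """
    # Broadcast the mask over the hidden dimension: [batch, seq_len, 1]
    mask = attention_mask[..., None].astype(token_embeddings.dtype)
    # Sum only the embeddings at non-padding positions: [batch, hidden]
    summed = (token_embeddings * mask).sum(axis=1)
    # Number of real positions per sequence, clipped to avoid division by zero
    counts = np.clip(mask.sum(axis=1), 1.0, None)
    return summed / counts


# Random stand-in for a dumped transformer_output (hypothetical data):
# 2 sequences, padded to length 5, hidden size 8.
rng = np.random.default_rng(0)
token_embeddings = rng.normal(size=(2, 5, 8)).astype(np.float32)
attention_mask = np.array([[1, 1, 1, 1, 1],
                           [1, 1, 1, 0, 0]])  # second sequence has 2 padding positions

sequence_embeddings = per_sequence_embedding(token_embeddings, attention_mask)
print(sequence_embeddings.shape)  # (2, 8)
```

The per-position embeddings are the rows of token_embeddings itself, so no extra step is needed for those. Arrays like these can be written to disk with np.save and read back with np.load for downstream tasks.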
Hope my answer solves your problem :)