Question about the input sequences for ESM
Closed · 1 comment
huawen-poppy commented
Hi! Thank you for your nice tool!
I noticed that before running the ESM model, you remove the sequences that contain '*' (stop codons). May I ask why all such sequences need to be removed? Is this step necessary?
Thanks!
Yanay1 commented
I believe these were removed because the '*' character is not in the ESM vocabulary. For the Ensembl proteomes, I think the stop codon is implied at the end of the sequence.
I am not sure, but I don't think the removal step is necessary.
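As a rough illustration of the point above, here is a minimal sketch of one possible alternative to dropping every sequence that contains '*': strip a trailing '*' and discard only sequences that still have one internally. This is not the tool's actual preprocessing code; the function names `strip_terminal_stop` and `clean_sequences` and the example sequences are hypothetical, and it assumes sequences are held in a plain dict of name to amino-acid string.

```python
from typing import Dict


def strip_terminal_stop(seq: str) -> str:
    """Remove any trailing '*' (stop marker) from a protein sequence."""
    return seq.rstrip("*")


def clean_sequences(seqs: Dict[str, str]) -> Dict[str, str]:
    """Strip terminal stops, then drop sequences with an internal '*'.

    Hypothetical helper: an internal '*' usually indicates a premature stop
    or a translation problem, so those sequences are skipped here.
    """
    cleaned = {}
    for name, seq in seqs.items():
        seq = strip_terminal_stop(seq)
        if "*" in seq:
            continue  # '*' remains inside the sequence, so skip it
        cleaned[name] = seq
    return cleaned


# Made-up example sequences, for illustration only.
example = {"p1": "MKTAYIAK*", "p2": "MKT*AYIAK", "p3": "MKTAYIAK"}
print(clean_sequences(example))
# → {'p1': 'MKTAYIAK', 'p3': 'MKTAYIAK'}
```

Whether keeping the stripped sequences is appropriate depends on the downstream analysis, so treat this as a starting point rather than a recommendation.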