alexarnimueller/LSTM_peptides

About the pseudo-random peptide sequences

jkwang93 opened this issue · 5 comments

Hello Alex,
I recently read your work and was very inspired. In your code I found the following snippet for generating pseudo-random peptide sequences:


self.ran = Random(len(self.generated), np.min(d.descriptor), np.max(d.descriptor)) # generate rand seqs
probas = count_aas(''.join(seq_desc.sequences)).values() # get the aa distribution of training seqs
self.ran.generate_sequences(proba=probas)
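For context, the idea behind that snippet (pseudo-random sequences whose amino-acid frequencies match the training set) can be sketched in plain Python. This is a minimal, hypothetical re-implementation, not modlamp itself; the function names `aa_distribution` and `random_peptides` and the two example training sequences are made up for illustration:

```python
import random
from collections import Counter

AAS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 canonical amino acids

def aa_distribution(sequences):
    """Relative amino-acid frequencies over all training sequences."""
    counts = Counter("".join(sequences))
    total = sum(counts[a] for a in AAS)
    return [counts[a] / total for a in AAS]

def random_peptides(n, lenmin, lenmax, probas, seed=42):
    """Draw n sequences with lengths in [lenmin, lenmax]; residues are
    sampled according to probas (aligned with AAS)."""
    rng = random.Random(seed)
    seqs = []
    for _ in range(n):
        length = rng.randint(lenmin, lenmax)
        seqs.append("".join(rng.choices(AAS, weights=probas, k=length)))
    return seqs

# Hypothetical training set, just to make the sketch runnable:
training = ["GLFDIVKKVVGALGSL", "KWKLFKKIEKVGQNIR"]
probas = aa_distribution(training)
rand_seqs = random_peptides(10, 7, 28, probas)
```

Sequences produced this way share only the residue composition of the training set, not any positional or motif structure.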

However, the pseudo-random peptide sequences I generated with this code are completely different from the ones provided in your appendix: the CAMP predictor classifies only 15% of my sequences as AMPs, whereas more than 70% of the pseudo-random sequences you provided are predicted to be AMPs.
Could you explain this discrepancy? How were your pseudo-random peptide sequences generated?

Looking forward to your reply!

Sorry, it was my mistake~

Hey jkwang93,
Thanks for your interest, and great that you could fix it.
Good luck with your work!

Hello Alex,

Here are some questions I hope you can answer.
I trained an RNN, and it overfits, generating many peptides that duplicate the training set. The random sequences, on the other hand, do not repeat the training set, show good diversity (I checked pairwise with ClustalW), and are active; the helical (Helices) class even has a higher ratio of active sequences. So I am confused: what advantages do the many subsequent AI-based AMP design methods have over the Random and Helices generators? Or, where do Random and Helices fall short?
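One simple way to quantify the overfitting described above is to count how many generated peptides are verbatim copies of training sequences. This is a minimal sketch with a hypothetical helper (`copy_rate`) and made-up example sequences, not code from the repository:

```python
def copy_rate(generated, training):
    """Fraction of generated sequences that appear verbatim in the
    training set (one symptom of an overfit generative model)."""
    train = set(training)
    if not generated:
        return 0.0
    copies = sum(1 for s in generated if s in train)
    return copies / len(generated)

# Toy example: one of two generated peptides duplicates the training set.
training = ["GLFDIVKKVVGALGSL", "KWKLFKKIEKVGQNIR"]
generated = ["GLFDIVKKVVGALGSL", "ACDEFGHIK"]
rate = copy_rate(generated, training)  # 0.5
```

A purely random generator would score near zero here, while a memorizing model scores high; near-duplicates would additionally need a similarity measure such as pairwise identity.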

Thank you very much~

Did you have a look at the documentation at modlamp.org? It explains the different classes of sequences that can be constructed.

Thank you, I have read the documentation. Would it be convenient for you to share an email address (mine is jikewang@whu.edu.cn) so that we can communicate more easily?
Thank you so much.