protein encoding
wawpaopao opened this issue · 1 comments
May I ask how to encode the active site using the BLOSUM62 matrix?
Hi @wawpaopao thanks for your interest in the work.
You can achieve this using the ProteinFeatureLanguage class from the pytoda package (which is a dependency of this repo).
from pytoda.proteins import ProteinFeatureLanguage
protein_language = ProteinFeatureLanguage(features='blosum_norm')
aas = "ABCDEF"
protein_language.sequence_to_token_indexes(aas)
If you don't have pytoda yet, it is now also available via pip. The 'blosum_norm' key used above selects the normalized version of the BLOSUM matrix, which we recommend for training neural networks. If you change the key to 'blosum', you will get the integer values (as in the original matrix).
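To make the difference between the two keys concrete, here is a minimal, library-free sketch of the idea: each residue is replaced by its row of substitution scores, either as raw integers or rescaled. It uses only a 5x5 excerpt of BLOSUM62, and the min-max scaling is an assumption chosen for illustration; pytoda's 'blosum_norm' may normalize differently. The names `BLOSUM62_EXCERPT` and `encode` are hypothetical and not part of pytoda.

```python
# Illustrative only: a 5x5 excerpt of BLOSUM62 for residues A, R, N, D, C.
# A real encoding would use the full matrix.
RESIDUES = "ARNDC"
BLOSUM62_EXCERPT = {
    "A": [4, -1, -2, -2, 0],
    "R": [-1, 5, 0, -2, -3],
    "N": [-2, 0, 6, 1, -3],
    "D": [-2, -2, 1, 6, -3],
    "C": [0, -3, -3, -3, 9],
}


def encode(sequence, normalize=False):
    """Map each residue to its row of substitution scores (hypothetical helper)."""
    rows = [BLOSUM62_EXCERPT[aa] for aa in sequence]
    if not normalize:
        # Integer scores, as in the original matrix.
        return [list(row) for row in rows]
    # Assumption: min-max scaling to [0, 1] over the whole excerpt.
    # This is NOT necessarily what pytoda's 'blosum_norm' does.
    flat = [v for row in BLOSUM62_EXCERPT.values() for v in row]
    lo, hi = min(flat), max(flat)
    return [[(v - lo) / (hi - lo) for v in row] for row in rows]


raw = encode("ACD")
scaled = encode("ACD", normalize=True)
print(raw[0])     # integer scores for A against A, R, N, D, C
print(scaled[0])  # the same row rescaled to [0, 1]
```

Rescaled values keep the inputs in a small, bounded range, which is the usual motivation for preferring a normalized encoding when training neural networks.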
The code in this repo should allow you to reproduce the configurations with learned embeddings, but not those using the BLOSUM62 matrix, which we ran for the revision. If you need an example, please let me know and I can add a code snippet; it requires only a minor adaptation of the training/testing script.