PaccMann/paccmann_kinase_binding_residues

protein encoding

wawpaopao opened this issue · 1 comment

May I ask how to encode the active site using the BLOSUM62 matrix?

Hi @wawpaopao, thanks for your interest in the work.

You can achieve this using the ProteinFeatureLanguage from the pytoda package (which is a dependency of this repo).

```python
from pytoda.proteins import ProteinFeatureLanguage

# 'blosum_norm' selects the normalized BLOSUM62 features
protein_language = ProteinFeatureLanguage(features='blosum_norm')
aas = "ABCDEF"
protein_language.sequence_to_token_indexes(aas)
```

This is the normalized version of the BLOSUM matrix, which we recommend for training neural networks. If you change the features key to 'blosum', you will get the integer values (as in the original matrix). If you don't have pytoda yet, you can now also install it via pip (pip install pytoda).
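To make explicit what the integer and normalized BLOSUM features amount to, here is a minimal pure-Python sketch that encodes a residue string (such as an active site) as one BLOSUM62 row per residue. It is not pytoda's implementation: the names encode_blosum and encode_blosum_norm are hypothetical, the matrix is transcribed for the 20 standard amino acids only (so ambiguity codes like the B in "ABCDEF" above are not covered), and the min-max rescaling to [0, 1] is an assumed normalization that may differ from what pytoda's 'blosum_norm' actually uses. Please verify the values against an authoritative BLOSUM62 file before relying on them.

```python
from __future__ import annotations

# BLOSUM62 scores for the 20 standard amino acids (transcribed for illustration;
# verify against an authoritative copy). Rows and columns follow AMINO_ACIDS.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"
BLOSUM62_ROWS = [
    [4, -1, -2, -2, 0, -1, -1, 0, -2, -1, -1, -1, -1, -2, -1, 1, 0, -3, -2, 0],  # A
    [-1, 5, 0, -2, -3, 1, 0, -2, 0, -3, -2, 2, -1, -3, -2, -1, -1, -3, -2, -3],  # R
    [-2, 0, 6, 1, -3, 0, 0, 0, 1, -3, -3, 0, -2, -3, -2, 1, 0, -4, -2, -3],  # N
    [-2, -2, 1, 6, -3, 0, 2, -1, -1, -3, -4, -1, -3, -3, -1, 0, -1, -4, -3, -3],  # D
    [0, -3, -3, -3, 9, -3, -4, -3, -3, -1, -1, -3, -1, -2, -3, -1, -1, -2, -2, -1],  # C
    [-1, 1, 0, 0, -3, 5, 2, -2, 0, -3, -2, 1, 0, -3, -1, 0, -1, -2, -1, -2],  # Q
    [-1, 0, 0, 2, -4, 2, 5, -2, 0, -3, -3, 1, -2, -3, -1, 0, -1, -3, -2, -2],  # E
    [0, -2, 0, -1, -3, -2, -2, 6, -2, -4, -4, -2, -3, -3, -2, 0, -2, -2, -3, -3],  # G
    [-2, 0, 1, -1, -3, 0, 0, -2, 8, -3, -3, -1, -2, -1, -2, -1, -2, -2, 2, -3],  # H
    [-1, -3, -3, -3, -1, -3, -3, -4, -3, 4, 2, -3, 1, 0, -3, -2, -1, -3, -1, 3],  # I
    [-1, -2, -3, -4, -1, -2, -3, -4, -3, 2, 4, -2, 2, 0, -3, -2, -1, -2, -1, 1],  # L
    [-1, 2, 0, -1, -3, 1, 1, -2, -1, -3, -2, 5, -1, -3, -1, 0, -1, -3, -2, -2],  # K
    [-1, -1, -2, -3, -1, 0, -2, -3, -2, 1, 2, -1, 5, 0, -2, -1, -1, -1, -1, 1],  # M
    [-2, -3, -3, -3, -2, -3, -3, -3, -1, 0, 0, -3, 0, 6, -4, -2, -2, 1, 3, -1],  # F
    [-1, -2, -2, -1, -3, -1, -1, -2, -2, -3, -3, -1, -2, -4, 7, -1, -1, -4, -3, -2],  # P
    [1, -1, 1, 0, -1, 0, 0, 0, -1, -2, -2, 0, -1, -2, -1, 4, 1, -3, -2, -2],  # S
    [0, -1, 0, -1, -1, -1, -1, -2, -2, -1, -1, -1, -1, -2, -1, 1, 5, -2, -2, 0],  # T
    [-3, -3, -4, -4, -2, -2, -3, -2, -2, -3, -2, -3, -1, 1, -4, -3, -2, 11, 2, -3],  # W
    [-2, -2, -2, -3, -2, -1, -2, -3, 2, -1, -1, -2, -1, 3, -3, -2, -2, 2, 7, -1],  # Y
    [0, -3, -3, -3, -1, -2, -2, -3, -3, 3, 1, -2, 1, -1, -2, -2, 0, -3, -1, 4],  # V
]

# Residue -> row of substitution scores (ambiguity codes B, Z, X not included).
BLOSUM62 = dict(zip(AMINO_ACIDS, BLOSUM62_ROWS))

# Global extremes of the matrix, used by the assumed min-max normalization.
SCORE_MIN = min(min(row) for row in BLOSUM62_ROWS)
SCORE_MAX = max(max(row) for row in BLOSUM62_ROWS)


def encode_blosum(sequence: str) -> list[list[int]]:
    """Integer encoding: one 20-dim BLOSUM62 row (copied) per residue."""
    return [list(BLOSUM62[aa]) for aa in sequence.upper()]


def encode_blosum_norm(sequence: str) -> list[list[float]]:
    """Assumed normalization: min-max rescale every score to [0, 1]."""
    span = SCORE_MAX - SCORE_MIN
    return [[(s - SCORE_MIN) / span for s in row] for row in encode_blosum(sequence)]


if __name__ == "__main__":
    site = "DFG"  # hypothetical active-site residues
    raw = encode_blosum(site)
    norm = encode_blosum_norm(site)
    print(len(raw), len(raw[0]))  # 3 20
    print(raw[0][:5])  # D row, first five columns: [-2, -2, 1, 6, -3]
```

Rescaling with the matrix-wide minimum and maximum keeps every feature in the same [0, 1] range regardless of residue, which is one common reason normalized inputs are preferred for neural networks; a per-row or z-score normalization would be equally plausible alternatives.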

The code in this repo should allow you to reproduce the configurations with learned embeddings, but not those using the BLOSUM62 matrix, which we added during the revision. If you need an example, please let me know and I can add the code snippet; it is only a minor adaptation in the training/testing script.
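For orientation only, here is a sketch of what swapping a learned embedding for a fixed BLOSUM encoding typically looks like in PyTorch. It is not the code from this repo's training/testing script: the helper names (make_learned_embedding, make_blosum_embedding, tokenize) are hypothetical, and the 3 x 3 matrix is just the A/R/N slice of BLOSUM62, used to keep the example short. The key call is nn.Embedding.from_pretrained(..., freeze=True), which turns the matrix rows into a lookup table that is not updated during training, in contrast to a plain nn.Embedding, whose weights are learned.

```python
import torch
import torch.nn as nn

# Hypothetical 3-residue slice of BLOSUM62 (rows/columns: A, R, N); a real
# setup would use every row of the matrix and the repo's own tokenizer.
RESIDUES = "ARN"
BLOSUM_SLICE = [
    [4, -1, -2],
    [-1, 5, 0],
    [-2, 0, 6],
]


def make_learned_embedding(vocab_size: int, dim: int) -> nn.Embedding:
    """Trainable lookup table (the 'learned embedding' configuration)."""
    return nn.Embedding(vocab_size, dim)


def make_blosum_embedding(rows) -> nn.Embedding:
    """Frozen lookup table whose i-th row is the BLOSUM vector of token i."""
    weights = torch.tensor(rows, dtype=torch.float32)
    # freeze=True keeps the substitution scores fixed during training
    return nn.Embedding.from_pretrained(weights, freeze=True)


def tokenize(sequence: str) -> torch.Tensor:
    """Hypothetical tokenizer: residue -> its row index in RESIDUES."""
    return torch.tensor([RESIDUES.index(aa) for aa in sequence])


if __name__ == "__main__":
    tokens = tokenize("NRA").unsqueeze(0)  # shape (batch=1, length=3)
    blosum_embedding = make_blosum_embedding(BLOSUM_SLICE)
    encoded = blosum_embedding(tokens)
    print(encoded.shape)  # torch.Size([1, 3, 3])
    print(encoded[0, 0].tolist())  # row for N: [-2.0, 0.0, 6.0]
    learned = make_learned_embedding(len(RESIDUES), 8)
    print(learned(tokens).shape)  # torch.Size([1, 3, 8])
```

With the frozen table, the embedding dimension is fixed by the number of matrix columns rather than chosen as a hyperparameter, so any layer consuming the embeddings has to be sized accordingly.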