
Method for finding protein family similarity

Primary language: Jupyter Notebook. License: Apache License 2.0 (Apache-2.0).

ProtCNNSim: Discovery of new protein family relationships with Deep Learning

Data

Per-family embeddings and their Pfam labels can be downloaded from Zenodo (DOI: 10.5281/zenodo.10091909).

The Pfam 35 family-to-clan mapping can be downloaded from the Pfam FTP site (direct download link).

Place the embeddings, labels, and Pfam-A clan descriptions in the data/ directory.
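As a rough illustration of how the family-to-clan mapping might be read once it is in data/, here is a minimal sketch. It assumes a tab-separated file whose first column is the family accession and whose second column is the clan accession (empty when the family has no clan); check this layout against the file you actually downloaded. The function name `parse_clan_mapping` and the sample rows are hypothetical and are not part of this repository.

```python
import io


def parse_clan_mapping(handle):
    """Return {family_accession: clan_accession} for families assigned to a clan.

    Assumed layout (not confirmed by this README): tab-separated columns,
    family accession first, clan accession second, empty if no clan.
    """
    mapping = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        family, clan = fields[0].strip(), fields[1].strip()
        if family and clan:
            mapping[family] = clan
    return mapping


# Invented rows for illustration only; these are not real Pfam entries.
sample = io.StringIO(
    "FAM_A\tCLAN_1\tclan_one\tfamily_a\tfirst toy family\n"
    "FAM_B\tCLAN_1\tclan_one\tfamily_b\tsecond toy family\n"
    "FAM_C\t\t\tfamily_c\ttoy family without a clan\n"
)
family_to_clan = parse_clan_mapping(sample)
print(family_to_clan)
```

With a real file, the same function could be called on `open(path)` in place of the in-memory sample.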

Calculating scores

ProtCNNSim_examples.ipynb contains the code for calculating the scores and building the sensitivity curve.

The helper functions the notebook relies on are in utils.py.
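For readers who want a feel for the workflow before opening the notebook, the sketch below scores every pair of families and builds a simple sensitivity curve. It is not the code from ProtCNNSim_examples.ipynb or utils.py: the use of cosine similarity as the score, the definition of sensitivity as the fraction of same-clan pairs recovered among the top-ranked pairs, and all names (`cosine`, `score_pairs`, `sensitivity_curve`, the toy data) are assumptions made for illustration.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def score_pairs(embeddings):
    """Score all unordered family pairs and return them sorted by score, highest first."""
    pairs = [
        (cosine(embeddings[a], embeddings[b]), a, b)
        for a, b in combinations(sorted(embeddings), 2)
    ]
    pairs.sort(key=lambda item: item[0], reverse=True)
    return pairs


def sensitivity_curve(scored_pairs, family_to_clan):
    """For each rank k, the fraction of all same-clan pairs found in the top k pairs."""
    labels = []
    for _, a, b in scored_pairs:
        clan_a, clan_b = family_to_clan.get(a), family_to_clan.get(b)
        labels.append(clan_a is not None and clan_a == clan_b)
    total = sum(labels)
    curve, hits = [], 0
    for is_same_clan in labels:
        hits += is_same_clan
        curve.append(hits / total if total else 0.0)
    return curve


# Toy data, invented for illustration: two clans with two families each.
toy_embeddings = {
    "famA": [1.0, 0.0],
    "famB": [0.9, 0.1],
    "famC": [0.0, 1.0],
    "famD": [0.1, 0.9],
}
toy_clans = {"famA": "clan1", "famB": "clan1", "famC": "clan2", "famD": "clan2"}

ranked = score_pairs(toy_embeddings)
curve = sensitivity_curve(ranked, toy_clans)
print(ranked[:2])
print(curve)
```

In this toy case the two same-clan pairs receive the highest scores, so the curve reaches full sensitivity after the second ranked pair. Real embeddings would typically be loaded as arrays, where a vectorised similarity computation would be far faster than this pure-Python loop.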