PLMAlign utilizes per-residue embeddings as input to obtain specific alignments and more refined similarity

PLMAlign

  • 2024.6.5 Update: We have uploaded the PLMSearch & PLMAlign dataset to Zenodo.

This is the implementation of PLMAlign, the pairwise protein sequence alignment tool described in "PLMSearch: Protein language model powers accurate and fast sequence search for remote homology". PLMAlign takes per-residue embeddings as input and produces specific alignments with their corresponding alignment scores.

PLMAlign supports both local and global alignment. Its algorithm and parameters are similar to the Smith-Waterman (SW) and Needleman-Wunsch (NW) algorithms as implemented by EMBL-EBI and pLM-BLAST. However, by replacing the fixed substitution matrix with a similarity computed as the dot product of per-residue embeddings, PLMAlign is able to capture deep evolutionary information and performs better on remote homology protein pairs.
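
To illustrate the idea of scoring residue pairs by embedding dot products instead of a fixed substitution matrix, here is a minimal sketch of a local (SW-style) alignment. It is not PLMAlign's actual code: the function names (`dot`, `similarity_matrix`, `local_alignment`), the linear gap penalty, and the toy two-dimensional embeddings are all assumptions made for illustration. Real per-residue embeddings are much higher-dimensional, and the real tool's gap handling and score scaling may differ.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def similarity_matrix(emb_a, emb_b):
    """sim[i][j] is the dot product of residue i of A and residue j of B."""
    return [[dot(u, v) for v in emb_b] for u in emb_a]


def local_alignment(emb_a, emb_b, gap=1.0):
    """Smith-Waterman-style local alignment over embedding similarities.

    Uses a linear gap penalty (an assumption made to keep the sketch short).
    Returns (best_score, pairs), where pairs lists aligned 0-based
    residue indices (i, j).
    """
    sim = similarity_matrix(emb_a, emb_b)
    n, m = len(emb_a), len(emb_b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0.0, (0, 0)

    # Fill the dynamic-programming table; scores are floored at zero,
    # which is what makes the alignment local.
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                0.0,
                score[i - 1][j - 1] + sim[i - 1][j - 1],  # match residues
                score[i - 1][j] - gap,                    # gap in B
                score[i][j - 1] - gap,                    # gap in A
            )
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached.
    pairs = []
    i, j = best_pos
    while i > 0 and j > 0 and score[i][j] > 0:
        if score[i][j] == score[i - 1][j - 1] + sim[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best, pairs


# Toy embeddings (hypothetical values): 3 residues vs. 2 residues.
emb_a = [[1, 0], [0, 1], [-1, 0]]
emb_b = [[0, 1], [-1, 0]]
best, pairs = local_alignment(emb_a, emb_b)
print(best, pairs)
```

In this toy case the last two residues of the first sequence align with the two residues of the second. A global (NW-style) variant would drop the zero floor and trace back from the bottom-right cell instead of the best-scoring one.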

Quick links

Webserver

PLMAlign web server : dmiip.sjtu.edu.cn/PLMAlign ✈️

PLMSearch web server : dmiip.sjtu.edu.cn/PLMSearch 🚀

PLMSearch source code : github.com/maovshao/PLMSearch 🚁

Requirements

Follow the steps in requirements.sh

Data preparation

We have released our experimental data, which can be downloaded from plmalign_data or Zenodo.

# Use the following command or download it from https://zenodo.org/records/11480660
wget https://dmiip.sjtu.edu.cn/PLMAlign/static/download/plmalign_data.tar.gz
tar zxvf plmalign_data.tar.gz

Reproduce all our experiments

Reproduce all our experiments with good visualization by following the steps in:

Notice: Detailed results are saved in data/alignment_benchmark/result/.

Notice: Detailed results are saved in data/scope40_test/output/.

Run PLMAlign locally

Notice: the inputs and outputs of the example are saved in example/.

Citation

Liu, W., Wang, Z., You, R. et al. PLMSearch: Protein language model powers accurate and fast sequence search for remote homology. Nat Commun 15, 2775 (2024). https://doi.org/10.1038/s41467-024-46808-5