/alphaFold-ECOD-PDB

Workflow to compare AlphaFold models with PDB structures and ECOD domains

Primary LanguagePythonMIT LicenseMIT

alphaFold-ECOD-PDB

Workflow to compare AlphaFold models with PDB structures and ECOD domains

Requirements

The following tools should be installed in the working directory:

  • Foldseek $ wget --no-check-certificate https://mmseqs.com/foldseek/foldseek-linux-sse41.tar.gz; tar xvzf foldseek-linux-sse41.tar.gz; mv foldseek/bin ./; rm -r foldseek; rm foldseek-linux-sse41.tar.gz;
  • TMalign $ wget https://zhanggroup.org/TM-align/TMalign.gz; gunzip TMalign.gz; chmod u+x TMalign;
  • Python >= 3.6

Python packages

  • mysqlclient >= 2.0
  • requests >= 2.25
  • mysql-connector-python >= 8.0.27

Run the comparison

Fill the config file (config_comp.ini).

  • ecod_pdb_file should be <path>/ecod.latest.F70.pdb.tar.gz
  • ecod_file and matches_file should have .txt extension.
  • output_file should have .m8 extension
  • af2pfam_file should have .tsv extension

Execute $ ./pipeline.sh config_comp.ini

The pipeline is divided in 4 steps:

Find the AlphaFold models

This is executed by the python script find_pfam_duf.py. Only the AlphaFold models matching Pfam without PDB structures are used for the comparison.

Download files from ECOD

2 files need to be downloaded:

Run Foldseek

Run Foldseek using the AlphaFold pdb files and ECOD pdb files downloaded.

Extract relevant matches

This is executed by the python script find_relevant_matches.py. From the output of Foldseek, we only consider matches with an e-value < e-05. Additionnally, TMalign is run and only matches with TMscores > 0.6 for normalized by length of Chain_2 are kept. Annotations to Pfam domains and ECOD domains are also provided when available.