
Workflow to compare AlphaFold models with PDB structures and ECOD domains

Primary language: Python. License: MIT.




The following tools should be installed in the working directory:

  • Foldseek $ wget --no-check-certificate https://mmseqs.com/foldseek/foldseek-linux-sse41.tar.gz; tar xvzf foldseek-linux-sse41.tar.gz; mv foldseek/bin ./; rm -r foldseek; rm foldseek-linux-sse41.tar.gz;
  • TMalign $ wget https://zhanggroup.org/TM-align/TMalign.gz; gunzip TMalign.gz; chmod u+x TMalign;
  • Python >= 3.6

Python packages

  • mysqlclient >= 2.0
  • requests >= 2.25
  • mysql-connector-python >= 8.0.27

Run the comparison

Fill in the config file (config_comp.ini):

  • ecod_pdb_file should be <path>/ecod.latest.F70.pdb.tar.gz.
  • ecod_file and matches_file should have a .txt extension.
  • output_file should have a .m8 extension.
  • af2pfam_file should have a .tsv extension.
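
The naming rules listed above can be checked before launching the pipeline. The following is a minimal sketch, not part of the repository: it assumes config_comp.ini is a standard INI file with at least one section header, and because the section names are not documented here it looks for the keys in every section. The function name check_config is hypothetical.

```python
import configparser

# Expected ending for each key, taken from the list above.
EXPECTED_SUFFIXES = {
    "ecod_pdb_file": "ecod.latest.F70.pdb.tar.gz",
    "ecod_file": ".txt",
    "matches_file": ".txt",
    "output_file": ".m8",
    "af2pfam_file": ".tsv",
}


def check_config(path):
    """Return a list of human-readable problems found in the config file."""
    # Interpolation is disabled so that '%' characters in paths are kept as-is.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(path):
        return [f"cannot read {path}"]

    # Assumption: section names are unknown, so gather keys from all sections.
    values = {}
    for section in parser.sections():
        values.update(parser[section])

    problems = []
    for key, suffix in EXPECTED_SUFFIXES.items():
        value = values.get(key)
        if value is None:
            problems.append(f"missing key: {key}")
        elif not value.strip().endswith(suffix):
            problems.append(f"{key} should end with {suffix}: {value}")
    return problems
```

An empty list from check_config("config_comp.ini") means every key is present and follows the naming rules.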

Execute $ ./pipeline.sh config_comp.ini

The pipeline is divided into four steps:

Find the AlphaFold models

This step is executed by the Python script find_pfam_duf.py. Only AlphaFold models that match Pfam entries without a PDB structure are used for the comparison.

Download files from ECOD

2 files need to be downloaded:

Run Foldseek

Run Foldseek on the AlphaFold PDB files and the downloaded ECOD PDB files.
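
As an illustration of this step, the sketch below builds and runs a Foldseek search from Python. It is not taken from pipeline.sh: the use of the easy-search subcommand, the choice of AlphaFold models as query and ECOD domains as target, and the directory names in the usage note are all assumptions. The ./bin/foldseek path follows from the installation command above.

```python
import subprocess


def build_foldseek_command(query_dir, target_dir, output_file,
                           tmp_dir="tmp", foldseek="./bin/foldseek"):
    """Build the argument list for a `foldseek easy-search` run."""
    # easy-search takes: query, target, result file, temporary directory.
    return [foldseek, "easy-search", query_dir, target_dir, output_file, tmp_dir]


def run_foldseek(query_dir, target_dir, output_file, **kwargs):
    """Run Foldseek and raise CalledProcessError if it exits with an error."""
    command = build_foldseek_command(query_dir, target_dir, output_file, **kwargs)
    subprocess.run(command, check=True)
```

For example, run_foldseek("af_models", "ecod_pdb", "results.m8") would write the tabular hits to results.m8, where af_models and ecod_pdb are hypothetical directories holding the two sets of structures.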

Extract relevant matches

This step is executed by the Python script find_relevant_matches.py. From the Foldseek output, only matches with an E-value < 1e-05 are considered. Additionally, TMalign is run on these matches, and only those with a TM-score > 0.6 (normalized by the length of Chain_2) are kept. Annotations to Pfam domains and ECOD domains are also provided when available.
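
The two filters described above can be sketched as follows. This is an illustration under stated assumptions, not the code of find_relevant_matches.py: it assumes the .m8 file uses Foldseek's default 12-column tabular layout (E-value in the 11th column) and that TMalign prints a line of the form "TM-score= ... (if normalized by length of Chain_2, ...)". All function names are hypothetical.

```python
import re

EVALUE_CUTOFF = 1e-5
TMSCORE_CUTOFF = 0.6

# Matches the TMalign report line for the score normalized by Chain_2.
TM_CHAIN2 = re.compile(
    r"TM-score=\s*([0-9.]+)\s*\(if normalized by length of Chain_2"
)


def filter_m8(lines, cutoff=EVALUE_CUTOFF):
    """Yield (query, target, evalue) for hits below the E-value cutoff."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        evalue = float(fields[10])  # 11th column in the default layout
        if evalue < cutoff:
            yield fields[0], fields[1], evalue


def chain2_tmscore(tmalign_stdout):
    """Return the Chain_2-normalized TM-score, or None if it is absent."""
    match = TM_CHAIN2.search(tmalign_stdout)
    return float(match.group(1)) if match else None


def passes_tmscore(tmalign_stdout, cutoff=TMSCORE_CUTOFF):
    """True when the Chain_2-normalized TM-score exceeds the cutoff."""
    score = chain2_tmscore(tmalign_stdout)
    return score is not None and score > cutoff
```

In use, each hit yielded by filter_m8 would be passed to ./TMalign (for instance through subprocess.run with capture_output=True), and the captured standard output handed to passes_tmscore.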