/msa-af2-nf

Nextflow pipeline for the computation of structure-based MSAs with AlphaFold2 models

Primary LanguageJupyter NotebookMIT LicenseMIT

Highly significant improvement of protein sequence alignments with AlphaFold2

Data, documentation, analysis and nextflow pipeline for the manuscript "Highly significant improvement of protein sequence alignments with AlphaFold2".

Credits

This work has been carried out in Notredame Lab at the Centre for Genomic Regulation - CRG

The authors who contributed to the analysis and manuscript are:

  • Athanasios Baltzis
  • Leila Mansouri
  • Suzanne Jin
  • Bjorn Langer
  • Ionas Erb
  • Cedric Notredame

Notebooks

This repository contains a series of Jupyter Notebooks that contain the steps for replicating the analysis, tables and figures in the manuscript using R.

Pipeline and containers

The pipeline for predicting the AF2 models and producing the MSAs is built using Nextflow. It comes with a singularity container (the recipe is available here) for running AF2 and a docker container (available on DockerHub here).

Usage

  • Download the genetic databases required for AlphaFold2 using the provided script.
  • Download and format the database used for PSI-Coffee blast search (by default Uniref50).
  • Make sure you have singularity installed in your system.
  • Install the Nextflow runtime by running the following command:
     curl -fsSL get.nextflow.io | bash
    
  • You can launch the pipeline execution by entering the command shown below:
     nextflow run athbaltzis/msa-af2-nf
    

By default the pipeline is executed against the provided example dataset. You can modify the input data as well as the other available parameteres listed below:

--input_fasta

Input sequences (FASTA)

--list

Input lists of sequences

--template

Input template lists

--pdbs

Input experimentally determined PDB structures

--db

Input path to Database for PSI-Coffee

--predict

Predict structures with AF2 [true or false(default)]

--AF2

Path to AF2 predicted models (if --predict false)

--pdb_for_dssp

Input PDB structures for secondary structure assignment