Highly significant improvement of protein sequence alignments with AlphaFold2

Data, documentation, analysis and Nextflow pipeline for the manuscript "Highly significant improvement of protein sequence alignments with AlphaFold2".

Credits

This work has been carried out in the Notredame Lab at the Centre for Genomic Regulation (CRG).

The authors who contributed to the analysis and manuscript are:

Athanasios Baltzis
Leila Mansouri
Suzanne Jin
Bjorn Langer
Ionas Erb
Cedric Notredame

Notebooks

This repository contains a series of Jupyter Notebooks with the steps for replicating the analysis, tables and figures of the manuscript using R.

Pipeline and containers

The pipeline for predicting the AF2 models and producing the MSAs is built using Nextflow. It comes with a Singularity container (the recipe is available here) for running AF2 and a Docker container (available on DockerHub here).

Usage

Download the genetic databases required for AlphaFold2 using the provided script.
Download and format the database used for PSI-Coffee blast search (by default Uniref50).
Make sure you have Singularity installed on your system.
Install the Nextflow runtime by running the following command:
```
curl -fsSL get.nextflow.io | bash
```
You can launch the pipeline execution by entering the command shown below:
```
nextflow run athbaltzis/msa-af2-nf
```
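
Before the first launch, it can help to confirm that the tools named in the steps above are reachable from the shell. The following is a minimal sketch for a POSIX shell; the `check_tool` helper is hypothetical and not part of the pipeline.

```shell
#!/bin/sh
# Hypothetical helper (not part of the pipeline): report whether a command
# is available on the PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Tools named in the steps above. This only reports; it installs nothing.
for tool in singularity nextflow; do
    check_tool "$tool" || true
done
```

Note that the Nextflow installer writes the `nextflow` launcher into the current directory, so it may be reported as missing until it is moved to a directory on the `PATH` (or invoked as `./nextflow`).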

By default, the pipeline runs on the provided example dataset. You can change the input data and the other available parameters listed below:

`--input_fasta`

Input sequences (FASTA)

`--list`

Input lists of sequences

`--template`

Input template lists

`--pdbs`

Input experimentally determined PDB structures

`--db`

Path to the database for PSI-Coffee

`--predict`

Predict structures with AF2 [true or false (default)]

`--AF2`

Path to AF2 predicted models (used when `--predict false`)

`--pdb_for_dssp`

Input PDB structures for secondary structure assignment
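
As an illustration, several of the options above can be combined in a single launch. This sketch assumes that precomputed AF2 models are reused (`--predict false` together with `--AF2`). Every path is a placeholder, and whether each option expects a file, a directory or a glob pattern is not specified here, so check the pipeline defaults before adapting it. The command is printed first and only executed when `RUN=1` is set, which is a convention of this sketch and not of the pipeline.

```shell
#!/bin/sh
# All paths below are placeholders; replace them with your own data.
DATA=/path/to/data

# Assemble the command from options documented above.
set -- nextflow run athbaltzis/msa-af2-nf \
    --input_fasta "$DATA/sequences.fasta" \
    --db "$DATA/psicoffee_db" \
    --predict false \
    --AF2 "$DATA/af2_models"

# Print the command so it can be reviewed before running.
echo "$@"

# Execute only when explicitly requested (RUN belongs to this sketch).
if [ "${RUN:-0}" = "1" ]; then
    "$@"
fi
```

Saved as, say, `launch.sh` (a hypothetical file name), running `sh launch.sh` prints the command, and `RUN=1 sh launch.sh` executes it.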
