LazyAF Pipeline

This pipeline is currently designed to work with ColabFold_BATCH for high-throughput protein structure and complex prediction. I wanted to create an accessible way of taking genome FASTA files from NCBI, quickly appending a bait protein sequence to each coding sequence, performing large-scale in silico pulldown predictions, and creating a table of results to analyse. This all happens through Google Drive and Colaboratory.

This pipeline consists of three parts:

Part 1 produces an individual FASTA file for each coding sequence in the genome, modified to include the bait protein sequence as a second chain.
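
As a rough illustration of what Part 1 does, here is a minimal Python sketch. It is not the code from the LazyAF notebooks. The function names (parse_fasta, write_bait_pairs) are hypothetical, and it assumes the input is a protein FASTA and that the two chains are joined with a ":" separator, which is how ColabFold reads a complex from a single FASTA record.

```python
from pathlib import Path


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one first.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def write_bait_pairs(genome_fasta_text, bait_seq, out_dir):
    """Write one FASTA file per coding sequence, with the bait as a second chain.

    Assumption: chains are separated by ':' (ColabFold's complex convention).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for header, seq in parse_fasta(genome_fasta_text):
        # Use the first token of the header as the file name, made filesystem-safe.
        name = header.split()[0]
        safe = "".join(c if c.isalnum() or c in "._-" else "_" for c in name)
        path = out_dir / f"{safe}.fasta"
        path.write_text(f">{safe}\n{seq}:{bait_seq}\n")
        written.append(path)
    return written
```

For example, write_bait_pairs(Path("genome.faa").read_text(), "MSEQ", "pairs") would fill a "pairs" folder with one two-chain FASTA file per protein; the file name and bait sequence here are placeholders.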

Part 2 uses ColabFold_BATCH to perform AlphaFold2-based model prediction on each protein-protein pair.

Part 3 then takes the raw output from these predictions, copies the rank_001 JSON files into a new folder, extracts the pTM and ipTM scores, calculates a ranking_confidence score, and writes everything to a .csv table for analysis.
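
The Part 3 steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the notebook's actual code: the function names and CSV column names are hypothetical, the score files are assumed to contain "rank_001" in their names and to hold "ptm" and "iptm" keys, and ranking_confidence is assumed to follow the AlphaFold-Multimer weighting of 0.8 x ipTM + 0.2 x pTM.

```python
import csv
import json
import shutil
from pathlib import Path


def ranking_confidence(ptm, iptm):
    """Assumed AlphaFold-Multimer weighting: 0.8 * ipTM + 0.2 * pTM."""
    return 0.8 * iptm + 0.2 * ptm


def summarise_predictions(results_dir, json_dir, csv_path):
    """Copy rank_001 score files to json_dir and tabulate their scores as CSV."""
    results_dir, json_dir = Path(results_dir), Path(json_dir)
    json_dir.mkdir(parents=True, exist_ok=True)
    rows = []
    # Non-recursive search; json_dir should be a different folder from results_dir.
    for src in sorted(results_dir.glob("*rank_001*.json")):
        shutil.copy2(src, json_dir / src.name)
        scores = json.loads(src.read_text())
        ptm, iptm = scores.get("ptm"), scores.get("iptm")
        if ptm is None or iptm is None:
            # Skip files without both scores (e.g. single-chain predictions).
            continue
        rows.append({
            "name": src.stem.split("_scores_")[0],
            "pTM": ptm,
            "ipTM": iptm,
            "ranking_confidence": ranking_confidence(ptm, iptm),
        })
    with open(csv_path, "w", newline="") as fh:
        writer = csv.DictWriter(
            fh, fieldnames=["name", "pTM", "ipTM", "ranking_confidence"]
        )
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Sorting the resulting table by ranking_confidence is then a quick way to shortlist candidate interactors for closer inspection.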

For further details, refer to our manuscript.

Thank you,
Tom McLean
@TomMcLean05
