This pipeline is currently designed to work with ColabFold_BATCH for high-throughput protein structure and complex prediction. I wanted to create an accessible way of taking genome FASTA files from NCBI, quickly adding a bait protein sequence to each coding sequence, performing large-scale in silico pulldown predictions, and creating a table of results to analyse. This all happens through Google Drive and Colaboratory.
This pipeline consists of three parts:
Part 1 produces individual fasta files for each coding sequence in the genome which have been modified to include the bait protein sequence as a second chain.
Part 2 uses ColabFold_BATCH to perform AlphaFold2-based model prediction on each protein-protein pair.
Part 3 then takes the raw output from these predictions and copies the rank_001 JSON files into a new folder. It then extracts the pTM and ipTM scores, calculates a ranking_confidence score, and writes everything to a .csv table for analysis.
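
To make Parts 1 and 3 more concrete, here is a minimal Python sketch of the file handling they describe. It is not the pipeline's actual notebook code, and it rests on several assumptions: the input is a protein FASTA of translated coding sequences, ColabFold treats two sequences joined by `:` as separate chains of one complex, the score files contain `rank_001` in their names and hold `ptm` and `iptm` keys, and ranking_confidence uses the AlphaFold-Multimer weighting of 0.8 * ipTM + 0.2 * pTM. The function names, the `pair_00001` naming scheme and the CSV column names are hypothetical.

```python
import csv
import json
import shutil
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def write_pair_fastas(genome_fasta, bait_seq, out_dir):
    """Part 1 sketch: one FASTA per coding sequence, bait added as chain 2."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for index, (header, seq) in enumerate(read_fasta(genome_fasta), start=1):
        name = f"pair_{index:05d}"  # hypothetical naming scheme
        target = out_dir / f"{name}.fasta"
        # Assumption: "SEQ1:SEQ2" is read as two chains of one complex.
        # A trailing "*" (stop symbol) is dropped from the prey sequence.
        target.write_text(f">{name} {header}\n{seq.rstrip('*')}:{bait_seq}\n")
        written.append(target)
    return written


def ranking_confidence(ptm, iptm):
    """Assumed AlphaFold-Multimer weighting: 0.8 * ipTM + 0.2 * pTM."""
    return 0.8 * iptm + 0.2 * ptm


def summarise_scores(pred_dir, json_dir, csv_path):
    """Part 3 sketch: copy rank_001 JSON files, then tabulate their scores."""
    json_dir = Path(json_dir)
    json_dir.mkdir(parents=True, exist_ok=True)
    rows = []
    for source in sorted(Path(pred_dir).glob("*rank_001*.json")):
        copied = Path(shutil.copy2(source, json_dir / source.name))
        scores = json.loads(copied.read_text())
        # Assumption: the score files expose "ptm" and "iptm" keys.
        ptm, iptm = scores.get("ptm"), scores.get("iptm")
        if ptm is None or iptm is None:
            continue  # skip files without both scores
        rows.append({
            "file": copied.name,
            "pTM": ptm,
            "ipTM": iptm,
            "ranking_confidence": round(ranking_confidence(ptm, iptm), 4),
        })
    with open(csv_path, "w", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=["file", "pTM", "ipTM", "ranking_confidence"]
        )
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

In this sketch, `write_pair_fastas("genome.faa", bait_seq, "pairs")` would fill a folder that Part 2 can read, and `summarise_scores("predictions", "rank1_json", "results.csv")` would build the results table once the predictions have finished. The paths shown are placeholders.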
For further details, refer to our manuscript.
Thank you,
Tom McLean
@TomMcLean05