/blasting

BLAST pipeline

Primary LanguagePython

BLASTing

Dependencies

Python3 should be installed.

  • python3 command is necessary (preferably in the command path).

BLAST+ should be installed.

  • blastp command is necessary (preferably in the command path).
  • install_blast.sh will install lib/blast/bin/blastp

Usage

Data preparation

1_download_genomes.py [-o OUT_DIR] homologene_species.tsv
  • The default OUT_DIR is data.
2_process_genomes.py [-p path/to/makeblastdb] data/genomes_downloaded.tsv

Run BLAST

3_blast_all_vs_all.py [-p path/to/blastp] [-o OUT_DIR] -n 45 data/genomes_downloaded.tsv > blast.queue 2> blast.err &
3_blast_time.sh [-o OUT_DIR] > blast.time
  • The default OUT_DIR is blast.out.

Summarize results

cd blast.out
4_extract_top_score.py *out -n 90
for i in {1..21}; do ../bin/perl/extract_threshold.pl $i > $i.threshold; done &
for i in {1..21}; do ../bin/perl/extract_paralogs.pl $i-$i.out $i.threshold > $i-$i.paralog_score & done
for i in {1..21}; do c $i-$i.paralog_score | ../bin/perl/summarize_paralog.pl > $i.paralogs & done
cd ..
5_extract_bbh_org.py input/genomes_downloaded.tsv -i blast.out -o blast.out -n 50
cd blast.out
6_make_matrix.pl ../homologene_species.tsv ../input/gene_refseq.human > ../blast.out.mat &