Workflow for 1kp Assemblies
NOT WORKING WITH NEW BLAST DATABASE VERSION 5
How to Use
make download
downloads and decompresses assemblies and stats files
make split
divides a specified genome in FASTA format into individual genes
make blastdb
creates a BLAST database for each assembly
make search
performs a BLAST search on all databases using all genes from a specified genome
make create_alignment
creates an alignment for each gene found in each species
make align
automatically aligns the alignment created by make create_alignment
Specify a genome by appending genome=CODE to the above commands, for example make split genome=CODE (see Codes below for CODE details)
Roadmap
Prepare Query Sequences
- Download an annotated genome and convert names to the form CODE-geneName
- Split each gene into its own file
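The renaming and splitting steps above could be sketched in Python as follows. This is only an illustration, not the code behind make split: the function names are hypothetical, and it assumes the gene name is the first whitespace-delimited word of each FASTA header and is safe to use as a file name.

```python
from pathlib import Path


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def rename_record(code, header):
    """Build a CODE-geneName identifier (assumes the first word is the gene name)."""
    gene_name = header.split()[0]
    return f"{code}-{gene_name}"


def split_genes(text, code, out_dir):
    """Write each gene to its own file, CODE-geneName.fasta, inside out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for header, seq in parse_fasta(text):
        name = rename_record(code, header)
        path = out_dir / f"{name}.fasta"
        path.write_text(f">{name}\n{seq}\n")
        written.append(path)
    return written
```

Calling split_genes(genome_text, "Arath", "genes") would then produce one file per gene, where "Arath" stands in for whatever CODE the genome uses.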
Prepare Local Blast Database
- Download and decompress the desired 1kp SOAPdenovoTrans assembly
- Filter out "bad" scaffolds using Transrate stats file
- Create a blast database using the filtered 1kp assembly
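A rough sketch of the filtering and database steps is given below. What counts as a "bad" scaffold is not defined here, so the score threshold is purely an assumption, as are the column names contig_name and score (taken from the usual layout of a Transrate contigs file; the 1kp stats files may differ). The helper names are hypothetical. The makeblastdb arguments shown are standard BLAST+ options.

```python
import csv
import io


def good_scaffold_names(stats_text, min_score=0.1,
                        name_col="contig_name", score_col="score"):
    """Return names of scaffolds whose score meets the (assumed) threshold."""
    reader = csv.DictReader(io.StringIO(stats_text))
    return {row[name_col] for row in reader
            if float(row[score_col]) >= min_score}


def filter_records(records, keep):
    """Keep (header, sequence) pairs whose first header word is in keep."""
    return [(h, s) for h, s in records if h.split()[0] in keep]


def makeblastdb_command(fasta_path, db_name):
    """Build the argument list for a nucleotide BLAST database."""
    return ["makeblastdb", "-in", fasta_path,
            "-dbtype", "nucl", "-out", db_name]
```

The command list can be passed to subprocess.run(cmd, check=True) once the filtered assembly has been written to fasta_path.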
Search
- Use blastn and tblastx to find a gene in the 1kp assembly from an annotated genome
- Create different versions of search that can be modified from the command line (e.g. blastp, megablast, hmmer3)
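One way to make the search program selectable is to build the command from a parameter, as in the sketch below. The function names, the output file naming, and the default e-value are illustrative assumptions, not what make search actually does. The flags themselves (-query, -db, -out, -outfmt, -evalue) are standard BLAST+ options shared by blastn and tblastx.

```python
def blast_command(program, query, db, out, evalue=1e-5):
    """Build a BLAST+ argument list that writes tabular (outfmt 6) hits."""
    allowed = {"blastn", "tblastx"}
    if program not in allowed:
        raise ValueError(f"unsupported program: {program}")
    return [program, "-query", query, "-db", db, "-out", out,
            "-outfmt", "6", "-evalue", str(evalue)]


def search_all(program, gene_files, db_names):
    """Build one command per (gene file, database) pair."""
    commands = []
    for gene in gene_files:
        for db in db_names:
            # Hypothetical naming scheme for the result files.
            out = f"{gene}.{db}.{program}.tsv"
            commands.append(blast_command(program, gene, db, out))
    return commands
```

Each command can be run with subprocess.run(cmd, check=True). Supporting further programs would mean extending the allowed set and, for tools such as hmmer3, a separate command builder.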
Alignment
- Collect search hits from each database into an alignment for each gene
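The collection step might look roughly like the sketch below, which assumes tabular BLAST output (outfmt 6, with the query ID in column 1, the subject ID in column 2, and the bit score in column 12). Keeping only the best-scoring hit per gene and taking the whole scaffold rather than the matched region are simplifying assumptions. All names are hypothetical.

```python
def best_hits(tabular_text):
    """Map each query ID to its highest-scoring (subject ID, bit score)."""
    best = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        query, subject, bitscore = fields[0], fields[1], float(fields[11])
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def collect_by_gene(hits_by_db, seqs_by_db):
    """Group hit sequences by gene across all databases.

    hits_by_db: {database code: output of best_hits}
    seqs_by_db: {database code: {scaffold name: sequence}}
    """
    per_gene = {}
    for db_code, hits in hits_by_db.items():
        for gene, (subject, _score) in hits.items():
            seq = seqs_by_db[db_code][subject]
            per_gene.setdefault(gene, []).append((f"{db_code}-{subject}", seq))
    return per_gene


def to_fasta(records):
    """Format (name, sequence) pairs as unaligned FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```

The FASTA text for each gene is still unaligned at this point and would be handed to an aligner afterwards.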
Codes
- For 1kp assemblies, use the assigned 4-letter code
- For annotated genomes, use the first three letters of the genus and the first two letters of the species
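The naming rule for annotated genomes can be written as a small helper. The function names are hypothetical, and letter case is kept as given because no casing convention is stated here.

```python
def genome_code(genus, species):
    """First three letters of the genus plus first two of the species."""
    return genus[:3] + species[:2]


def looks_like_1kp_code(code):
    """Check only the stated shape of a 1kp code: exactly four letters."""
    return len(code) == 4 and code.isalpha()


print(genome_code("Arabidopsis", "thaliana"))  # prints "Arath"
```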