Map contigs to reference sequence and build consensus

generate-consensus.sh

Bash script that takes reference sequences (e.g., exons, UCE loci), maps assembled contigs to them, and builds a consensus sequence from the mapping.

Requires:

  1. BWA
  2. SAMtools
  3. BCFtools
  4. VCFtools

Inputs:

  1. Directory of fasta contigs (or other fasta). Could be modified to work on read files (to do?)
  2. Fasta reference file to map the contigs to

This runs on directories of contigs, but may also work on single contig files (untested).

To run:

chmod +x generate-consensus.sh
./generate-consensus.sh ./path/to/contigs-dir ./path/to/output-dir ./path/to/reference_fasta_file.fasta num_processors
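
For orientation, here is a rough sketch of the kind of commands a mapping-and-consensus run might chain together for one contig file. It is an assumption, not a copy of generate-consensus.sh: the function name, the output file names and the choice of bwa mem, bcftools call -c and vcfutils.pl vcf2fq (a helper script distributed with BCFtools) are illustrative only. The function builds the command strings and does not run anything.

```python
import shlex
from pathlib import Path


def build_commands(contigs, out_dir, reference, threads=1):
    """Return shell command strings for one contig file (dry run only).

    Hypothetical helper: the real script may use different tools or flags.
    """
    contigs, out_dir = Path(contigs), Path(out_dir)
    stem = contigs.stem
    bam = out_dir / f"{stem}.sorted.bam"          # assumed output name
    fastq = out_dir / f"{stem}.consensus.fastq"   # assumed output name

    def q(path):
        # Quote paths so spaces or special characters survive the shell.
        return shlex.quote(str(path))

    return [
        # Index the reference once so BWA can map against it.
        f"bwa index {q(reference)}",
        # Map contigs to the reference and sort the alignments.
        f"bwa mem -t {int(threads)} {q(reference)} {q(contigs)}"
        f" | samtools sort -o {q(bam)} -",
        f"samtools index {q(bam)}",
        # Call a consensus and write it out as fastq.
        f"bcftools mpileup -f {q(reference)} {q(bam)}"
        f" | bcftools call -c | vcfutils.pl vcf2fq > {q(fastq)}",
    ]
```

Printing the returned list for each file in the contigs directory gives a quick way to check paths and the processor count before running the real script.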

generate-alignments.py

Multi-faceted python script. Originally intended to convert the fastq output from the above bash script to fasta files sorted by locus.
It was later expanded to generate phyluce inputs (post matching/single fasta file generation), convert to nexus, concatenate nexus files, and prepare a phylip file.

Run generate-alignments.py -h for additional information on options.

Requires:

  1. Biopython

Inputs:

  1. Directory containing fastq output from above script.
  2. Fasta reference file used in above script.
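
The original purpose described above (fastq in, fasta sorted by locus out) can be sketched roughly as follows. This is a minimal illustration, not the code in generate-alignments.py: it uses only the standard library instead of Biopython, the function names are made up, and it assumes that each fastq record is named after the reference locus it was mapped to and that sequence and quality strings may be wrapped over several lines.

```python
from collections import defaultdict


def read_fastq(lines):
    """Yield (name, sequence) pairs from fastq lines, allowing wrapped records."""
    it = iter(lines)
    for line in it:
        line = line.strip()
        if not line.startswith("@"):
            continue
        name = line[1:].split()[0]
        # Collect sequence lines up to the '+' separator.
        seq_parts = []
        for line in it:
            line = line.strip()
            if line.startswith("+"):
                break
            seq_parts.append(line)
        seq = "".join(seq_parts)
        # Skip quality lines until they cover the whole sequence, so a
        # quality line that starts with '@' is not mistaken for a header.
        qual_len = 0
        while qual_len < len(seq):
            qual = next(it, None)
            if qual is None:
                break
            qual_len += len(qual.strip())
        yield name, seq


def group_by_locus(samples):
    """Map locus -> [(sample, sequence), ...] from {sample: fastq lines}."""
    by_locus = defaultdict(list)
    for sample, lines in samples.items():
        for locus, seq in read_fastq(lines):
            by_locus[locus].append((sample, seq))
    return dict(by_locus)


def to_fasta(records):
    """Format [(sample, sequence), ...] as fasta text for one locus."""
    return "".join(f">{sample}\n{seq}\n" for sample, seq in records)
```

In practice the values of the samples dictionary would be open file handles, one per fastq file in the input directory, and each locus would be written to its own fasta file.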

Need to look more into pruning / cleaning the mapped 'alignments.' Right now, each 'alignment' is generated by position during the mapping, not by a traditional multiple sequence alignment program. This creates problems with low-coverage contigs and large references (i.e., many N's are present in the datasets I've tried). Gblocks may help with this, but it is currently untested.
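
Until a proper trimming step is tested, one simple way to gauge and reduce the N problem is to threshold on the fraction of missing characters. The sketch below is a stand-in for illustration, not Gblocks and not part of the scripts here: the function names and the 0.5 default cut-offs are arbitrary, and it assumes every sequence in a locus has the same length (true for position-based 'alignments').

```python
MISSING = set("Nn-?")  # characters treated as missing data (assumption)


def missing_fraction(seq):
    """Fraction of a sequence made up of missing characters."""
    if not seq:
        return 1.0
    return sum(c in MISSING for c in seq) / len(seq)


def filter_sequences(alignment, max_missing=0.5):
    """Drop sequences whose missing fraction exceeds max_missing."""
    return {
        name: seq
        for name, seq in alignment.items()
        if missing_fraction(seq) <= max_missing
    }


def drop_sparse_columns(alignment, max_missing=0.5):
    """Drop alignment columns where too many sequences are missing."""
    names = list(alignment)
    if not names:
        return {}
    # zip(*) walks the alignment column by column.
    columns = zip(*(alignment[name] for name in names))
    kept = [
        col for col in columns
        if sum(c in MISSING for c in col) / len(col) <= max_missing
    ]
    return {
        name: "".join(col[i] for col in kept)
        for i, name in enumerate(names)
    }
```

Running filter_sequences first and drop_sparse_columns second removes near-empty samples before they inflate the per-column missing counts.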

Parts of the Python code are from Phyluce.