Automated analysis and visualization generation of the LD12 genomes and the freshwater genomes comparison paper.
- All code written by Lucas Sinclair.
The published paper for which this pipeline was made can be found here:
http://www.nature.com/ismej/journal/vaop/ncurrent/full/ismej2015260a.html
The code is documented with many docstrings. In addition, here is an overview of what happens in the analysis:
The command line tool supports a few optional arguments:
- `e_value`: Minimum e-value in similarity search. Defaults to `0.0001`.
- `mcl_factor`: The MCL clustering factor. Defaults to `1.5`.
- `seq_type`: Either `nucl` or `prot`. Defaults to `prot`.
- `num_threads`: Number of threads to use. Defaults to the number of cores on the current machine.
- `min_identity`: Minimum identity in similarity search. Defaults to `0.97`.
- `min_coverage`: Minimum query coverage in similarity search. Defaults to `0.97`.
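To make the defaults above concrete, here is a minimal sketch of how such options could be declared with Python's standard `argparse` module. It is not the pipeline's actual code: the function name `build_parser`, the `--option` flag spelling, and the help texts are assumptions made for illustration, and only the option names and default values come from the list above.

```python
import argparse
import multiprocessing


def build_parser():
    """Hypothetical parser mirroring the documented options and defaults."""
    parser = argparse.ArgumentParser(
        description="Illustrative option parser (not the real tool)."
    )
    # Similarity search settings.
    parser.add_argument("--e_value", type=float, default=0.0001,
                        help="E-value setting for the similarity search.")
    parser.add_argument("--min_identity", type=float, default=0.97,
                        help="Minimum identity in the similarity search.")
    parser.add_argument("--min_coverage", type=float, default=0.97,
                        help="Minimum query coverage in the similarity search.")
    # Clustering setting.
    parser.add_argument("--mcl_factor", type=float, default=1.5,
                        help="The MCL clustering factor.")
    # Sequence type is restricted to the two documented values.
    parser.add_argument("--seq_type", choices=["nucl", "prot"], default="prot",
                        help="Either nucl or prot.")
    # Defaults to the number of cores on the current machine.
    parser.add_argument("--num_threads", type=int,
                        default=multiprocessing.cpu_count(),
                        help="Number of threads to use.")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--seq_type", "nucl", "--num_threads", "4"])
print(args.seq_type, args.num_threads, args.e_value, args.mcl_factor)
```

Options that are not given on the command line keep the defaults listed above, so in this sketch only `seq_type` and `num_threads` are overridden.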