rbturnbull/orthoflow

trimming step missing from workflow

Closed this issue · 1 comments

After the sequences are aligned, sites with low phylogenetic information (e.g., gap-rich sites) should be removed from each multiple sequence alignment. In the workflow diagram, this is step 8, which requires ClipKIT. The command to run ClipKIT is: clipkit <input>

An additional argument, -o, can be used to specify the name of the output file. If possible, I think it would be great to save the stdout from trimming each alignment. It provides helpful information, e.g., how much of the alignment was removed during trimming.
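As a rough illustration of the request above, here is a minimal Python sketch that calls `clipkit <input> -o <output>` and writes whatever ClipKIT prints to stdout into a log file. The function names, the `runner` parameter, and the file names are hypothetical and only for illustration; this is not how orthoflow implements the step.

```python
import subprocess
from pathlib import Path


def build_clipkit_command(alignment, output):
    """Return the argument list for `clipkit <input> -o <output>`."""
    return ["clipkit", str(alignment), "-o", str(output)]


def trim_alignment(alignment, output, log_path, runner=subprocess.run):
    """Trim one alignment and save ClipKIT's stdout to log_path.

    `runner` is a hypothetical hook (defaults to subprocess.run) so the
    function can be exercised without ClipKIT installed.
    """
    command = build_clipkit_command(alignment, output)
    # check=True raises CalledProcessError if ClipKIT exits with an error.
    result = runner(command, capture_output=True, text=True, check=True)
    # Keep the trimming report next to the trimmed alignment.
    Path(log_path).write_text(result.stdout)
    return result.stdout
```

Something like `trim_alignment("OG0001.aln.fa", "OG0001.trimmed.fa", "OG0001.clipkit.log")` could then be called once per alignment, with the file names here being placeholders.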

Nucleotide sequences can be trimmed after they have been threaded onto the protein alignment with the thread_dna function in PhyKIT. In other words, the codon-based alignment can be trimmed.
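The thread-then-trim order could be sketched in Python as follows. This assumes that `phykit thread_dna` takes the protein alignment via `-p` and the nucleotide FASTA via `-n` and prints the codon alignment to stdout; please check `phykit thread_dna --help` before relying on that. The function names and the `runner` parameter are hypothetical.

```python
import subprocess
from pathlib import Path


def build_thread_dna_command(protein_alignment, nucleotide_fasta):
    """Argument list for PhyKIT's thread_dna (the -p/-n flags are an assumption)."""
    return ["phykit", "thread_dna",
            "-p", str(protein_alignment),
            "-n", str(nucleotide_fasta)]


def thread_then_trim(protein_alignment, nucleotide_fasta,
                     codon_alignment, trimmed_alignment,
                     runner=subprocess.run):
    """Thread DNA onto the protein alignment, then trim the codon alignment.

    `runner` is a hypothetical hook (defaults to subprocess.run) so the
    steps can be exercised without PhyKIT or ClipKIT installed.
    """
    threaded = runner(
        build_thread_dna_command(protein_alignment, nucleotide_fasta),
        capture_output=True, text=True, check=True,
    )
    # Assumption: thread_dna writes the codon alignment to stdout.
    Path(codon_alignment).write_text(threaded.stdout)
    trimmed = runner(
        ["clipkit", str(codon_alignment), "-o", str(trimmed_alignment)],
        capture_output=True, text=True, check=True,
    )
    # Return ClipKIT's stdout so the caller can log it.
    return trimmed.stdout
```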

complete