*BEAST issues
rvosa opened this issue · 5 comments
There are some issues with the *BEAST implementation:
- It appears that we never actually have clade alignments with multiple sequences per species. This defeats the purpose of the multispecies, multilocus coalescent. One way to address this is to add an option so that, during `smrt clademerge`, for each merged sequence we look for up to X similar sequences (i.e. same species, same seed GI) and profile-align those against the existing clade alignment.
- There should be an option `smrt cladeinfer --append`, which appends log files and tree files to a previously existing run, so that users can raise ESS incrementally. Two things to keep in mind here: (i) check that the translation tables in the tree files are identically sorted, so that the tree descriptions can be appended verbatim to the existing Nexus file; (ii) make sure burn-in is properly removed before appending.
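A minimal sketch of check (i), assuming BEAST-style Nexus tree files with a single `Translate` block of `index taxon` pairs whose taxon labels contain no commas or semicolons. `read_translate_table` and `tables_match` are hypothetical helpers, not part of `smrt`; the idea is that `--append` would refuse to append when the two tables disagree.

```python
import re


def read_translate_table(nexus_text: str) -> list[tuple[str, str]]:
    """Return the (index, taxon) pairs of the first Translate block, in file order."""
    # Assumes the block runs from the keyword 'translate' to the next ';'.
    match = re.search(r"\btranslate\b(.*?);", nexus_text, re.IGNORECASE | re.DOTALL)
    if match is None:
        raise ValueError("no Translate block found")
    pairs = []
    for entry in match.group(1).split(","):
        entry = entry.strip()
        if not entry:  # tolerate a trailing comma before the ';'
            continue
        index, taxon = entry.split(None, 1)
        pairs.append((index, taxon.strip()))
    return pairs


def tables_match(existing_text: str, new_text: str) -> bool:
    """True if both files map the same indices to the same taxa, in the same order."""
    return read_translate_table(existing_text) == read_translate_table(new_text)
```

Comparing the full ordered list (rather than a dict) also catches tables that contain the same taxa but are sorted differently.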
We set `sortTranslationTable="true"` in the BEAST XML, which means we should be able to append tree descriptions verbatim.
`--append` writes the new run to the next available suffix for the `*.nex` and `*.log` files (e.g. `*.nex.1`). At the end of the run, the additional tree results are appended to the stem file, minus the burn-in specified by `--burnin`.
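The suffix and append steps could look roughly like this. It is a sketch that assumes `--burnin` is a count of sampled trees (if it is a fraction of the chain, it would need converting first) and that tree descriptions are the lines starting with `tree`. `next_suffix_path` and `append_trees` are hypothetical names; log files are not handled here.

```python
import os


def next_suffix_path(stem: str) -> str:
    """Return the first unused '<stem>.<n>' path, e.g. 'run.nex.1', then 'run.nex.2'."""
    n = 1
    while os.path.exists(f"{stem}.{n}"):
        n += 1
    return f"{stem}.{n}"


def append_trees(stem_text: str, part_text: str, burnin: int) -> str:
    """Splice the post-burn-in trees of a continuation file into the stem file."""
    # Tree descriptions are lines such as 'tree STATE_1000 = ...'.
    new_trees = [line for line in part_text.splitlines()
                 if line.lstrip().lower().startswith("tree ")]
    kept = new_trees[burnin:]  # drop the first `burnin` sampled trees
    lines = stem_text.splitlines()
    # Insert before the last 'End;', which closes the stem file's trees block.
    end_lines = [i for i, line in enumerate(lines) if line.strip().lower() == "end;"]
    if not end_lines:
        raise ValueError("stem file has no closing 'End;'")
    end_idx = end_lines[-1]
    return "\n".join(lines[:end_idx] + kept + lines[end_idx:]) + "\n"
```

The trees are copied verbatim, which is only safe once the translation tables are known to match; the `STATE_` labels of the appended trees are not renumbered.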
The problem with including more haplotypes in *BEAST is that we have a separate CLADE_MAX_DISTANCE setting, so we can't re-use the results from `smrt orthologize`, because these are filtered using BACKBONE_MAX_DISTANCE. We basically need to redo the orthologize step, which is why most of its functionality has been moved to SequenceGetter, as of e1f4973.
Perhaps the easiest way to do this is to implement something like `smrt clademerge --enrich`, which adds up to CLADE_HAPLOTYPES (e.g. 3) similar sequences to each species while verifying that CLADE_MAX_DISTANCE is not exceeded.
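One possible shape for the `--enrich` selection, sketched under several assumptions: the candidate sequences (same species, same seed GI) are already aligned to the species' existing sequence, and distance is the uncorrected p-distance over sites without gaps or missing data. The real check against CLADE_MAX_DISTANCE may well use a different measure. `p_distance` and `enrich_species` are hypothetical helpers.

```python
def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two aligned sequences.

    Sites where either sequence has a gap ('-'), '?' or 'N' are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-?N" or y in "-?N":
            continue
        compared += 1
        if x != y:
            diffs += 1
    # With no comparable sites, treat the pair as maximally distant.
    return diffs / compared if compared else 1.0


def enrich_species(reference: str, candidates: list[str],
                   clade_haplotypes: int, clade_max_distance: float) -> list[str]:
    """Pick up to `clade_haplotypes` extra sequences for one species.

    `reference` is the species' sequence already in the clade alignment;
    `candidates` are same-species, same-seed-GI sequences aligned to it.
    """
    kept = [reference]
    added: list[str] = []
    # Try the most similar candidates first.
    for cand in sorted(candidates, key=lambda c: p_distance(reference, c)):
        if len(added) >= clade_haplotypes:
            break
        # Accept only if no pairwise distance within the species exceeds the cap.
        if all(p_distance(cand, k) <= clade_max_distance for k in kept):
            kept.append(cand)
            added.append(cand)
    return added
```

Because each new haplotype is checked against every sequence already kept, no pair within the enriched species set ends up further apart than `clade_max_distance`.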